Key entries by link if missing ID

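Resolves the problem of incorrectly duplicated entries in feeds that
update content but don’t explicitly provide entry IDs. Without an ID,
the fallback key is a hash of the title, description, and link, so each
change to the description yields a new key. Example feed: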

  - https://www.to-rss.xyz/wikipedia/current_events/

Example entry:

    <item>
      <title>Current events: 2022-07-13</title>
      <link>https://en.wikipedia.org/wiki/Portal:Current_events/2022_July_13</link>
      <description>[VARIABLE CONTENT]</description>
      <pubDate>Wed, 13 Jul 2022 00:00:00 -0000</pubDate>
    </item>

This behavior is suggested by the common practice of using an entry’s
link as its ID value, and is consistent with typical feed aggregators
such as Feedbin and Inoreader.
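
A minimal sketch of the fallback order, for illustration only. The
helper names `entry_key` and `legacy_key` are hypothetical, entries are
modeled with `SimpleNamespace` rather than real parser objects, and
`legacy_key` assumes the earlier code hashed the fields whenever the ID
was missing:

```python
import hashlib
from types import SimpleNamespace


def _field_hash(entry) -> str:
    # SHA-1 over title, description, and link, as in the diff below.
    return hashlib.sha1(
        " ".join(
            [
                getattr(entry, "title", ""),
                getattr(entry, "description", ""),
                getattr(entry, "link", ""),
            ]
        ).encode("utf-8")
    ).hexdigest()


def entry_key(entry) -> str:
    # Explicit ID first, then the link, then the field hash.
    return (
        getattr(entry, "id", None)
        or getattr(entry, "link", None)
        or _field_hash(entry)
    )


def legacy_key(entry) -> str:
    # Assumed earlier behavior: no link fallback before hashing.
    return getattr(entry, "id", None) or _field_hash(entry)


link = "https://en.wikipedia.org/wiki/Portal:Current_events/2022_July_13"
title = "Current events: 2022-07-13"
# Two fetches of the same item; only the description differs.
first = SimpleNamespace(title=title, link=link, description="revision 1")
second = SimpleNamespace(title=title, link=link, description="revision 2")

# The hash-based key changes with the description; the link-based key does not.
print(legacy_key(first) == legacy_key(second))
print(entry_key(first) == entry_key(second))
```

With the link fallback, both fetches map to the same key, so the
updated item is not stored as a second entry.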
This commit is contained in:
Andrew Kvalheim 2022-07-14 09:41:33 -07:00
parent 30ad459870
commit 03bb128005

@@ -279,12 +279,12 @@ class RSSBot(Plugin):
            feed_id=feed_id,
            id=(
                getattr(entry, "id", None)
                or getattr(entry, "link", None)
                or hashlib.sha1(
                    " ".join(
                        [
                            getattr(entry, "title", ""),
                            getattr(entry, "description", ""),
                            getattr(entry, "link", ""),
                        ]
                    ).encode("utf-8")
                ).hexdigest()