Deduplication in news portals
Deduplication is the removal of duplicate information. When an agent finds the same article or posting more than once, the posting is shown only once in your alert. This keeps your news portal from becoming cluttered with redundant or duplicated information.
How do we do this?
There are two layers of deduplication in the Digimind platform. The first is applied inside the agent, as it brings information into the platform. The second layer is applied on the dashboard when alerts are displayed.
- If your agent is set up to look for keywords:
Any postings or information returned with the same URL are automatically deduplicated.
- If your agent monitors source Packs from the URL Store, or uses the DCF as a source:
Any postings with the same URL are deduplicated.
- If your agent monitors RSS feeds or Webnews sources:
Any postings with the same URL or the same title are deduplicated.
- In the news portal, deduplication is applied by default:
Any postings with the same title, the same URL or the same résumé (summary) are deduplicated.
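To make the matching rules above concrete, here is a minimal sketch in Python that applies each layer's fields to a small list of postings. It is an illustration only, not Digimind's implementation: the `Posting` type, the layer names in `RULES` and the `deduplicate` function are hypothetical, and the sketch assumes exact string comparison (the platform may normalize URLs or titles differently). A posting is treated as a duplicate if any one of the compared fields repeats a value already seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    url: str
    title: str
    resume: str  # the posting's summary ("résumé")


# Hypothetical mapping of each deduplication layer to the fields it compares,
# following the rules listed in the article.
RULES = {
    "keyword_agent": ("url",),
    "source_pack_agent": ("url",),
    "rss_webnews_agent": ("url", "title"),
    "news_portal": ("url", "title", "resume"),
}


def deduplicate(postings, fields):
    """Keep the first occurrence; drop any later posting that repeats a value
    already seen in ANY of the given fields (exact match assumed)."""
    seen = {field: set() for field in fields}
    kept = []
    for posting in postings:
        if any(getattr(posting, f) in seen[f] for f in fields):
            continue  # duplicate on at least one compared field
        for f in fields:
            seen[f].add(getattr(posting, f))
        kept.append(posting)
    return kept


# Illustrative data, not real alerts.
postings = [
    Posting("https://example.com/a", "Launch news", "Summary 1"),
    Posting("https://example.com/a", "Launch news (updated)", "Summary 2"),
    Posting("https://example.org/b", "Launch news", "Summary 3"),
    Posting("https://example.org/c", "Other story", "Summary 1"),
]

for layer, fields in RULES.items():
    print(layer, len(deduplicate(postings, fields)))
# keyword_agent 3
# source_pack_agent 3
# rss_webnews_agent 2
# news_portal 1
```

The more fields a layer compares, the more postings it removes, which is why the news portal (URL, title and résumé) filters more aggressively than a keyword agent (URL only).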
However, you can change this in the display options by clicking Advanced options.
Check the "Duplicated alerts" box to see duplicated information in your news portal.
Why see duplicated information?
The answer depends on your reasons for conducting competitive intelligence (CI). In many cases, you won't want to see the same piece of information twice.
But imagine that, as part of your online reputation management, your objective is to track the spread of a viral campaign or to see how many times the same post is shared on different websites. In that case, tracking duplicated alerts could become essential.