Item19: Optimize harvester to remove duplicates

Priority: Enhancement
CurrentState: Closed
AppliesTo: frontend
Component:
WaitingFor: ErikBorra

Details

The harvester currently does not recognize that a URL and the same URL with a trailing slash point to the same page, so both end up in the results as duplicates. The same happens for www.host and host.

Solution:

  • Strip http:// from the front and the slash from the end. Remove www too, then compare what's left. Afterwards, put www back if it was there and put http:// in front again.

-- ErikBorra - 28 Feb 2008

Decided not to strip www; did the rest.

-- ErikBorra - 09 Apr 2008
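
For illustration, the comparison discussed above could be sketched roughly as follows. This is not the harvester's actual code: the function names normalize_url and remove_duplicates, the strip_www flag and the example.org hostnames are all hypothetical. The flag defaults to False to match the decision not to strip www; setting it to True gives the originally proposed behaviour, where www is ignored for the comparison only and kept in the output if it was there.

```python
def normalize_url(url, strip_www=False):
    """Return a canonical form of url for comparison (illustrative sketch only)."""
    rest = url.strip()
    # Strip http:// from the front, if present.
    if rest.lower().startswith("http://"):
        rest = rest[len("http://"):]
    # Strip a single slash from the end, if present.
    if rest.endswith("/"):
        rest = rest[:-1]
    # Optionally remove a leading www. as well (hypothetical flag).
    if strip_www and rest.lower().startswith("www."):
        rest = rest[len("www."):]
    # Put http:// in front again.
    return "http://" + rest


def remove_duplicates(urls, strip_www=False):
    """Keep the first occurrence of each URL, comparing normalized forms."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url, strip_www)
        if key not in seen:
            seen.add(key)
            # Output keeps www if the URL had it; only the key may drop it.
            unique.append(normalize_url(url))
    return unique
```

With the default setting, http://example.org/ and http://example.org collapse into one entry, while http://www.example.org stays separate.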

ItemTemplate
Summary Optimize harvester to remove duplicates
ReportedBy ErikBorra
AppliesTo frontend
Priority Enhancement
CurrentState Closed
WaitingFor ErikBorra
Topic revision: r2 - 09 Apr 2008 - 13:03:42 - ErikBorra