Item26: Crawler sometimes finishes but does not terminate

Priority: CurrentState: AppliesTo: Component: WaitingFor:
Normal Confirmed crawler   KoenMartens

Details

From time to time, the crawler finishes a crawl, but then does not terminate properly. The log ends with (excerpt from node2:/home/crawler/cbsd/var/crawler-a-1206359140.log):

[1206368119803] [DEBUG:net.oneworld.issuemapper.IssueCrawler worm 7]GET:[http://bbbeugosto.blogspot.com/2008/03/rafinha-o-novo-lder.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=WEEKLY-1205636400000&toggleopen=WEEKLY-1206241200000]
[1206368121804] [DEBUG:net.oneworld.issuemapper.IssueCrawler worm 12]HostRecord: Requesting link at 1206368118003
[1206368121826] [DEBUG:net.oneworld.issuemapper.IssueCrawler worm 12]GET:[http://bbbeugosto.blogspot.com/2008/03/rafinha-o-novo-lder.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=WEEKLY-1205031600000&toggleopen=WEEKLY-1206241200000]
[1206368121826] [DEBUG:net.oneworld.issuemapper.IssueCrawler worm 4]DOWNLOADING page at time 1206368121826
[1206368121826] [DEBUG:net.oneworld.issuemapper.IssueCrawler worm 7]DOWNLOADING page at time 1206368121826
[1206368121848] [DEBUG:net.oneworld.issuemapper.IssueCrawler worm 12]DOWNLOADING page at time 1206368121848
[GC 51022K->47258K(65656K), 0.0018625 secs]
Mon Mar 24 14:15:24 GMT 2008: visited=2303 tested=214018 queue=633 process=1
Queues: nHostsRemaining=0

Mon Mar 24 14:15:29 GMT 2008: visited=2303 tested=214018 queue=633 process=1
Queues: nHostsRemaining=0

Mon Mar 24 14:15:34 GMT 2008: visited=2304 tested=214138 queue=633 process=0
Queues: nHostsRemaining=0

Mon Mar 24 14:15:39 GMT 2008: visited=2304 tested=214138 queue=633 process=0
Queues: nHostsRemaining=0

Mon Mar 24 14:15:45 GMT 2008: visited=2304 tested=214138 queue=633 process=0
Queues: nHostsRemaining=0

[1206368149445] [DEBUG:CRAWLSESSION]checking to see if finished: finished nq=633 np=0 more=false
[Crawlevent: stopped]
[1206368149467] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 0,5,main] awaiting exit, dead=true state=1
[1206368149491] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 1,5,main] awaiting exit, dead=true state=1
[1206368149513] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 2,5,main] awaiting exit, dead=true state=1
[1206368149534] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 3,5,main] awaiting exit, dead=true state=1
[1206368149556] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 4,5,main] awaiting exit, dead=true state=1
[1206368149577] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 5,5,main] awaiting exit, dead=true state=1
[1206368149599] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 6,5,main] awaiting exit, dead=true state=1
[1206368149620] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 7,5,main] awaiting exit, dead=true state=1
[1206368149642] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 8,5,main] awaiting exit, dead=true state=1
[1206368149663] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 9,5,main] awaiting exit, dead=true state=1
[1206368149685] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 10,5,main] awaiting exit, dead=true state=1
[1206368149707] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 11,5,main] awaiting exit, dead=true state=1
[1206368149728] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 12,5,main] awaiting exit, dead=true state=1
[1206368149750] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 13,5,main] awaiting exit, dead=true state=1
[1206368149771] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 14,5,main] awaiting exit, dead=true state=1
[1206368149793] [DEBUG:CRAWLSESSION]Thread[net.oneworld.issuemapper.IssueCrawler worm 15,5,main] awaiting exit, dead=true state=1
[1206368149815] [INFO]Total bytes downloaded: 720872926
[1206368149836] [INFO]Bytes per second: 311996.9
[1206368149859] [DEBUG:CRAWLSESSION]total crawled urls discovered 36300
[1206368149881] [DEBUG:CRAWLSESSION]total crawled urls processed 19941
[1206368149904] [DEBUG:CRAWLSESSION]Getting host...
[GC 51348K->47041K(65656K), 0.0023527 secs]
[1206368150356] [DEBUG:CRAWLSESSION]Updating colinkees information....
[1206368158307] [DEBUG:CRAWLSESSION]Finding Inward and External Links...
[1206368280685] [DEBUG:CRAWLSESSION]Updating statistics Information...
[1206368653727] [DEBUG:CRAWLSESSION]Stopping stats logger...
[1206368655414] [DEBUG:CRAWLSESSION]Stats logger stopped
[1206368655448] [DEBUG:CRAWLSESSION]TimeOuts : true
[1206368735153] [DEBUG:CRAWLSESSION]PagesDownloaded : true
[1206368735227] [INFO]Generating network info...
[1206368735364] [DEBUG:CRAWLSESSION]session finished
DB PARAMS - URL=jdbc:postgresql://82.94.245.16/issuemaps_a USER=crawler PW=d7e6d552

-- KoenMartens - 26 Mar 2008

Problem seems to be around line 620 of CrawlSession? .java, where a synchronized section is being entered. Apparently, is still locked by some thread (deadlock?).

-- KoenMartens - 09 May 2008

Recompiled YINYANG kernel, changed ULE to 4BSD.. testing....

-- KoenMartens - 09 May 2008

Here are the running threads at that point:

node3# tail -f crawler-b-1210341573.log [1210343790238] [DEBUG:CRAWLSESSION]session finished [1210343790239] [DEBUG:CRAWLSESSION]0: Thread[Reference Handler,10,system] [1210343790241] [DEBUG:CRAWLSESSION]0: Thread[Finalizer,8,system] [1210343790242] [DEBUG:CRAWLSESSION]0: Thread[Signal Dispatcher,9,system] [1210343790243] [DEBUG:CRAWLSESSION]1: Thread[main,5,main] [1210343790244] [DEBUG:CRAWLSESSION]1: Thread[Timer-0,5,main] [1210343790245] [DEBUG:CRAWLSESSION]1: Thread[CRAWLSESSION,5,main] [1210343790246] [DEBUG:CRAWLSESSION]1: Thread[Timer Manager,5,main] DB PARAMS - URL=jdbc:postgresql://10.0.0.15/issuemaps_b USER=crawler PW=d7e6d552 [GC 4282K->3393K(15436K), 0.0024230 secs]

-- KoenMartens - 09 May 2008

ItemTemplate
Summary Crawler sometimes finishes but does not terminate
ReportedBy KoenMartens
AppliesTo crawler
Priority Normal
CurrentState Confirmed
WaitingFor KoenMartens
Topic revision: r3 - 09 May 2008 - 16:19:48 - KoenMartens