6 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Noah Levitt | 5f3c247e0c | trick to avoid crawling same url again too quickly | 2015-07-09 21:49:55 -07:00 |
| Noah Levitt | 7cc777661d | fix dumb bug | 2015-07-09 18:54:09 -07:00 |
| Noah Levitt | 783794ca37 | basic of site/seed crawling with scoping | 2015-07-09 18:36:07 -07:00 |
| Noah Levitt | 92ea701987 | rudimentary crawling in parallel with multiple browsers | 2015-07-08 18:50:18 -07:00 |
| Noah Levitt | 4022cc0162 | simple in-memory frontier with prioritized queues by host | 2015-07-08 17:44:38 -07:00 |
| Noah Levitt | 4042f22497 | rudimentary link extraction and crawling | 2015-07-07 16:45:52 -07:00 |