37 Commits

Author SHA1 Message Date
Noah Levitt
92ea701987 rudimentary crawling in parallel with multiple browsers 2015-07-08 18:50:18 -07:00
Noah Levitt
4022cc0162 simple in-memory frontier with prioritized queues by host 2015-07-08 17:44:38 -07:00
Noah Levitt
4042f22497 rudimentary link extraction and crawling 2015-07-07 16:45:52 -07:00
Noah Levitt
d8a962b29e experimenting with captureScreenshot 2015-06-16 18:42:21 -07:00
Noah Levitt
9053279b4e change default routing key to "urls" 2014-11-03 11:54:59 -08:00
Noah Levitt
2ab767eaa9 make drain-queue output actual json instead of python dict syntax 2014-08-26 23:46:00 +00:00
Noah Levitt
fe1d9e01eb utility queue-json to publish an arbitrary json blob to amqp 2014-08-26 23:45:42 +00:00
Noah Levitt
6306c16698 kill -HUP to immediately close and reopen amqp consumer connection 2014-06-23 17:18:27 -07:00
Noah Levitt
9b32f9a3d1 ugh, it was better with the default width, in spite of the ridiculous behavior.script 2014-06-20 14:40:12 -07:00
Noah Levitt
2cf69bdaff seriously, don't try to wrap any lines, pprint 2014-06-20 14:37:33 -07:00
Noah Levitt
c6fa00812c when dumping state on SIGQUIT, build the whole string before printing to avoid stuff getting intermingled with other logging and stuff 2014-06-20 14:33:01 -07:00
Noah Levitt
ead46d5716 more elaborate dumping of state on SIGQUIT to replace faulthandler 2014-06-20 14:05:33 -07:00
Noah Levitt
ebb14ff889 get rid of chrome_wait straggler 2014-06-18 17:31:28 -07:00
Noah Levitt
025db91dea get rid of --browser-wait and --routing-key in favor of sensible defaults, some other tweaks 2014-06-11 10:58:08 -07:00
Noah Levitt
a78e60f1da wait for a browser to become available and start it up before reading the next url from amqp; ack the message only after completing the browsing process successfully, and requeue if it's not successful; some refactoring to make the timing work for this 2014-06-09 13:15:05 -07:00
Noah Levitt
b2e27b99d2 nice log message when fully shut down 2014-05-30 17:32:01 -07:00
Noah Levitt
c9d503e690 log version number at startup 2014-05-30 15:00:01 -07:00
Noah Levitt
3127e02cbb fancy --version that includes git branch and timestamp of last commit if available 2014-05-29 20:43:00 -07:00
Noah Levitt
9c08be2699 sigterm and sigint both request shutdown, which stops consuming urls and waits for active browsers to finish; a second sigint/sigterm immediately shuts down active browsers 2014-05-24 01:52:22 -07:00
Noah Levitt
2c4ba005b5 make umbra amenable to clustering by using a pool of n browsers and removing the browser-clientId affinity (not useful currently since we start a fresh browser instance for each page browsed), and set prefetch_count=1 on amqp consumers to round-robin incoming urls among umbra instances 2014-05-23 21:59:34 -07:00
Noah Levitt
8d269f4c56 add options --verbose, --exchange, --queue, --routing-key 2014-05-23 13:39:39 -07:00
Noah Levitt
bd3f979b56 capitalize AMQP in description 2014-05-23 13:39:08 -07:00
Noah Levitt
d7cfcbf233 new helper utility to browse urls provided as command line args 2014-05-20 17:11:16 -07:00
Noah Levitt
6c69b68771 organize imports, tweak command line args 2014-05-20 17:10:41 -07:00
Noah Levitt
1e18c2ca74 improve helper utilities 2014-05-20 16:44:13 -07:00
Noah Levitt
b59e76a5b9 clean shutdown without draining entire amqp queue (only consume urls from amqp when browser activity isn't saturated) 2014-05-20 03:02:48 -07:00
Noah Levitt
3e4232f32c refactor umbra.py into controller.py and browser.py, improve class names 2014-05-20 02:42:40 -07:00
Noah Levitt
f69edd5a87 handle multiple clients, browsers 2014-02-13 01:59:09 -08:00
Eldon
bdf00cc515 Refactor to pull Chrome execution inside of umbra, simplify some things 2014-02-12 19:31:03 -05:00
Eldon
8afe7d90a2 Replace js evaluation with direct page navigation, add default for dump_queue 2014-01-28 00:10:31 -05:00
Noah Levitt
8eb92b28e6 make load_url handle arguments similarly to umbra 2014-01-27 19:34:54 -08:00
Eldon
bd0183058d Incognito messes with currently running chromium instances, disable it 2014-01-23 18:26:20 -05:00
Eldon
6dc20e660f Remove debugging output, improve support scripts 2014-01-22 18:41:00 +00:00
Eldon
4e38a142d4 Some refactor/testing and utility scripts 2014-01-22 18:03:02 +00:00
Eldon
428d6cb7da Rework executable script so that it uses a main 2014-01-22 02:30:12 +00:00
Eldon
7b219ab011 Fix parameter passing and work with chromium's wrapper stuff 2014-01-22 02:22:16 +00:00
Eldon
dd72311e2d Create executable umbra script 2014-01-21 18:23:11 +00:00