122 Commits

Author SHA1 Message Date
Noah Levitt
c2153be288 start behaviors again on any Page.loadEventFired, because if we don't do that, we keep asking the page if the behavior thinks it's finished, and it doesn't know what we're talking about 2014-06-03 18:06:02 -07:00
Noah Levitt
bfb6cac25f use temp dir as $HOME instead of just chromium user-data-dir, because sometimes we have been seeing chrome print this error message and hang "[1975:2001:0603/215855:ERROR:nss_util.cc(444)] Error initializing NSS with a persistent database (sql:/home/archiveit/.pki/nssdb): NSS error code: -8187" 2014-06-03 16:02:00 -07:00
Noah Levitt
e619e013b6 sleep for 5 seconds after starting a browser, since starting 20 at once brings the computer to its knees 2014-06-03 15:57:12 -07:00
Noah Levitt
1f91018d91 even more patience killing chrome, send another sigterm every ten seconds if chrome is still alive 2014-06-02 12:09:15 -07:00
Noah Levitt
c6bd2417d7 good smarter killing of chrome 2014-06-02 11:58:11 -07:00
Noah Levitt
1ae9b83dab Merge branch 'dev' of github.com:nlevitt/umbra into dev 2014-05-30 23:07:54 -07:00
Noah Levitt
56a721f059 dump stack trace and don't return browser to pool on critical error where chrome process might still be running 2014-05-30 23:07:39 -07:00
Noah Levitt
b2e27b99d2 nice log message when fully shut down 2014-05-30 17:32:01 -07:00
Noah Levitt
c9d503e690 log version number at startup 2014-05-30 15:00:01 -07:00
Noah Levitt
ed92f3bd53 for the version string, use abbreviated commit hash instead of attempting to use the branch name 2014-05-29 23:33:14 -07:00
Noah Levitt
bef57e2819 for version string, try to handle case where head is detached 2014-05-29 20:57:33 -07:00
Noah Levitt
3127e02cbb fancy --version that includes git branch and timestamp of last commit if available 2014-05-29 20:43:00 -07:00
Noah Levitt
0bcc583b40 think it's safer to use a range of ports 9200 thru 9200+n than to try to choose random ports and hold them with socket.bind() (don't know how we can be sure a port is available) 2014-05-29 17:55:00 -07:00
Noah Levitt
94c2e4390b debugging for and mitigation of problem "[Errno 98] Address already in use" 2014-05-28 18:57:21 -07:00
vonrosen
2dc30cc8bc Merge pull request #23 from nlevitt/master
improve robustness, refactor, tweaks
2014-05-28 12:38:11 -07:00
Noah Levitt
9c08be2699 sigterm and sigint both request shutdown, which stops consuming urls and waits for active browsers to finish; a second sigint/sigterm immediately shuts down active browsers 2014-05-24 01:52:22 -07:00
Noah Levitt
b67d9fadf0 log ports chosen for browsers, and give threads nice names to make logs easier to understand 2014-05-23 22:30:25 -07:00
Noah Levitt
2c4ba005b5 make umbra amenable to clustering by using a pool of n browsers and removing the browser-clientId affinity (not useful currently since we start a fresh browser instance for each page browsed), and set prefetch_count=1 on amqp consumers to round-robin incoming urls among umbra instances 2014-05-23 21:59:34 -07:00
Noah Levitt
8d269f4c56 add options --verbose, --exchange, --queue, --routing-key 2014-05-23 13:39:39 -07:00
Noah Levitt
bd3f979b56 capitalize AMQP in description 2014-05-23 13:39:08 -07:00
Noah Levitt
6f61d0289b improve readme, mentioning archive-it per kristine 2014-05-23 13:34:51 -07:00
Noah Levitt
a7cd872b95 sleep for 0.5 sec before attempting to reconnect to amqp; documentation tweaks 2014-05-23 13:34:07 -07:00
Noah Levitt
155db96461 provide abbreviated api 2014-05-23 13:27:00 -07:00
Noah Levitt
bf3afcccb9 oops, Browser.__init__ doesn't take client_id anymore 2014-05-20 19:27:53 -07:00
Noah Levitt
d7cfcbf233 new helper utility to browse urls provided as command line args 2014-05-20 17:11:16 -07:00
Noah Levitt
6c69b68771 organize imports, tweak command line args 2014-05-20 17:10:41 -07:00
Noah Levitt
d4693b2aba remove unused param to __init__, avoid exception when on_request callback not provided 2014-05-20 17:07:42 -07:00
Noah Levitt
99d219dfda not sure why /bin/ et al were in .gitignore... replace with a couple of useful things 2014-05-20 17:06:26 -07:00
Noah Levitt
1e18c2ca74 improve helper utilities 2014-05-20 16:44:13 -07:00
Noah Levitt
8749b97811 oops, check in browser.py 2014-05-20 03:10:33 -07:00
Noah Levitt
b59e76a5b9 clean shutdown without draining entire amqp queue (only consume urls from amqp when browser activity isn't saturated) 2014-05-20 03:02:48 -07:00
Noah Levitt
3e4232f32c refactor umbra.py into controller.py and browser.py, improve class names 2014-05-20 02:42:40 -07:00
Noah Levitt
6fdcdd0bf0 configurable max number of instances of chrome simultaneously browsing pages (default=3); close and reopen connection to amqp every 15 minutes (consumer only); increase default browser wait to 60 sec 2014-05-20 01:09:11 -07:00
Noah Levitt
cc0ffee508 only websocket-client-py3==0.13.1 works right with python3 at the moment, see https://github.com/liris/websocket-client/issues/84 2014-05-20 00:57:07 -07:00
Eldon
154eb6f334 Merge pull request #22 from nlevitt/master
whole bunch of changes (already deployed on QA)
2014-05-06 09:13:56 -04:00
Noah Levitt
05e673917d "wasThrown" is necessarily always included in the result message from chrome for Runtime.evaluate 2014-05-05 19:58:41 -07:00
Noah Levitt
93b16f28b9 improve facebook behavior: when we expect a "close" button to appear, wait for it before moving on to other actions; and when we discover a missed click target above, scroll back up to click on it 2014-05-05 18:39:16 -07:00
Noah Levitt
fa6e3eebb2 clear UmbraWorker.self._behavior when finished with a page (after the first page, nothing was getting behaviors); bump hard timeout to 20 minutes 2014-05-05 18:37:39 -07:00
Noah Levitt
55fad80553 UmbraWorker.send_to_chrome() - central place to send message to chrome via websocket 2014-05-05 12:26:39 -07:00
Noah Levitt
a62a07e6b7 change magic first line of behavior js files to a commented-out json blob, which should include the fields 'url_regex' and 'request_idle_timeout_sec'; behavior.is_finished() incorporates the custom idle timeout into its check; also rename variables in behavior scripts with umbra/UMBRA_ prefix to sort of namespace them; and add "finished" logic to facebook and vimeo behaviors (flickr needs work to support it) 2014-05-05 11:58:55 -07:00
Noah Levitt
2a9633ad77 Bunch of improvements, most importantly a default fallback behavior script which scrolls to the bottom of the page, and rearchitecting some stuff so that the behavior script can have some say on when it's finished with the page. Also some doc comments. 2014-05-04 21:33:13 -07:00
Adam Miller
602459bb42 Merge pull request #21 from nlevitt/disable-google-analytics
disable google analytics by setting a breakpoint in www.google-analytics...
2014-05-02 18:32:35 -07:00
Noah Levitt
8679ee0ea7 disable google analytics by setting a breakpoint in www.google-analytics.com/analytics.js and replacing the content of that script when the breakpoint is hit 2014-05-02 18:30:28 -07:00
Noah Levitt
d6b696ded8 Merge pull request #20 from adam-miller/master
Removing first run ui checks
2014-05-02 17:42:53 -07:00
Adam Miller
9cf20f195c Removing first run ui checks 2014-05-02 17:37:10 -07:00
Eldon
e7353fbb4b Merge pull request #19 from nlevitt/ari-3814
ARI-3814 try to recover from rabbitmq communication problems
2014-04-09 13:25:22 -04:00
Noah Levitt
89e41e7c82 remove exception raised for testing 2014-04-07 11:45:54 -07:00
Noah Levitt
aacb886b62 ARI-3814 try to recover from rabbitmq communication problems 2014-04-07 11:45:12 -07:00
Eldon
4e72cbae58 Merge pull request #18 from nlevitt/ari-3771
to address ARI-3771 "Lasalle Facebook last scrolldown doesn't work", scr...
2014-04-04 16:04:38 -04:00
Eldon
beeb4a2a2c Merge pull request #17 from nlevitt/ari-3811
thread dump on SIGQUIT a la java
2014-04-04 15:21:41 -04:00
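
Commit 1f91018d91 above describes re-sending SIGTERM every ten seconds while chrome is still alive. The following is a minimal sketch of that idea, not the project's actual code: the function name `terminate_patiently` and its parameters (`resend_every`, `max_signals`, `poll_interval`) are hypothetical, and it assumes the browser was started with `subprocess.Popen`.

```python
import subprocess
import sys
import time


def terminate_patiently(proc, resend_every=10.0, max_signals=6, poll_interval=0.5):
    """Send SIGTERM, then send it again every `resend_every` seconds while
    the process is still alive (hypothetical helper, for illustration only).

    Returns the exit code, or None if the process survived `max_signals`
    SIGTERMs; the caller decides whether to escalate further.
    """
    for _ in range(max_signals):
        if proc.poll() is not None:
            return proc.returncode
        proc.terminate()  # SIGTERM on POSIX
        deadline = time.monotonic() + resend_every
        # Poll until the process exits or it is time to signal again.
        while time.monotonic() < deadline:
            if proc.poll() is not None:
                return proc.returncode
            time.sleep(poll_interval)
    return proc.poll()


if __name__ == "__main__":
    # Stand-in for a browser process: a child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", terminate_patiently(child, resend_every=1.0, poll_interval=0.05))
```

Returning None rather than escalating to SIGKILL inside the helper leaves the caller free to decide, for example, to keep the browser out of the pool, as commit 56a721f059 describes for critical errors.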