110 Commits

Author SHA1 Message Date
Hunter Stern
91f9788eb2 Add iframe css path to target id for soundcloud buttons. 2015-01-21 16:28:29 -08:00
Hunter Stern
e9451f88d8 Merge branch 'master' of github.com:internetarchive/umbra into ari-3774 2015-01-21 16:21:13 -08:00
Hunter Stern
5ea12fd053 More refinements. 2014-12-19 15:52:13 -08:00
Hunter Stern
8d225b8859 More debugging. 2014-12-19 15:13:02 -08:00
Hunter Stern
5304f2909d Less verbose logging. 2014-12-19 14:35:11 -08:00
Hunter Stern
ae60205648 Fix for https://webarchive.jira.com/browse/ARI-4150 2014-12-19 14:17:50 -08:00
Noah Levitt
1108ef9362 Merge pull request #33 from adam-miller/ARI-4016
ARI-4016 - Support: embedded videos on marquette.edu
2014-11-21 15:10:53 -08:00
Adam Miller
7f8e6802de Implementing suggestions in pull request. 2014-11-07 15:56:05 -08:00
Noah Levitt
ab86426475 properly handle socket.error from amqp conn.drain_events (was previously diagnosed as error starting browser) 2014-11-03 11:54:10 -08:00
vonrosen
01ed5a7d4d Merge pull request #28 from internetarchive/ari-3940
Ari 3940 - prioritize scrolling all the way to the bottom
2014-10-09 21:21:02 +00:00
Hunter Stern
1ee45053c5 Even more formatting changes. 2014-09-22 14:22:52 -07:00
Hunter Stern
6af3455dbf Improve formatting. 2014-09-22 14:21:00 -07:00
Adam Miller
eb3ea95b87 Cleanup timeout logic 2014-09-17 15:26:13 -07:00
Adam Miller
5a3c8e9a05 ARI-4016 - Support: embedded videos on marquette.edu 2014-09-15 11:06:33 -07:00
Hunter Stern
a2ea2501db More soundcloud changes. 2014-09-12 16:07:32 -07:00
Hunter Stern
e320654d1e Allow selector to detect https and http soundcloud widget. 2014-09-12 09:56:41 -07:00
Hunter Stern
0e7fd93967 Merge remote-tracking branch 'internetarchive/master' into ari-3774 2014-08-26 15:12:13 -07:00
Noah Levitt
c886b57d3a reject (discard) bad messages 2014-08-19 18:51:43 -07:00
Noah Levitt
9d90b5830a facebook - scroll all the way to the bottom before scrolling back up to click more stuff 2014-08-01 16:53:13 -07:00
Noah Levitt
dd9ef50484 suppress logging of umbraBehaviorFinished() message which is sent a lot 2014-08-01 16:22:45 -07:00
Hunter Stern
6a5d1e2266 Disable web security in chromium so iframes on different domains can be accessed by behavior javascript. 2014-07-24 16:46:06 -07:00
Hunter Stern
80f3a4a067 Enhancement to allow embedded soundcloud audio files to be detected 2014-07-24 16:44:05 -07:00
Noah Levitt
ae838af25d set amqp prefetch count to the number of urls we can handle at a time, i.e. max_active_browsers (with prefetch=1 umbra was only browsing one url at a time, after quickly burning through urls already on the queue when started) 2014-07-02 10:30:51 -07:00
Noah Levitt
6306c16698 kill -HUP to immediately close and reopen amqp consumer connection 2014-06-23 17:18:27 -07:00
Noah Levitt
02c054c284 do not wait forever for zombie websocket threads (this change should also reveal how we get these sometimes) 2014-06-20 18:13:45 -07:00
Noah Levitt
ead46d5716 more elaborate dumping of state on SIGQUIT to replace faulthandler 2014-06-20 14:05:33 -07:00
Noah Levitt
17ef9d9f28 close and reopen the amqp consumer connection only every 2.5 hours instead of every 15 minutes, because now that we have to wait for all browsers to close when we do the reconnection, it slows us down a lot 2014-06-18 14:58:44 -07:00
Noah Levitt
025db91dea get rid of --browser-wait and --routing-key in favor of sensible defaults, some other tweaks 2014-06-11 10:58:08 -07:00
Noah Levitt
a78e60f1da wait for a browser to become available and start it up before reading the next url from amqp; ack the message only after completing the browsing process successfully, and requeue if it's not successful; some refactoring to make the timing work for this 2014-06-09 13:15:05 -07:00
Hunter Stern
41270af223 Allow flash requests to be detected. 2014-06-06 10:47:29 -07:00
Noah Levitt
dd2d36328f scroll up faster on facebook 2014-06-04 12:34:20 -07:00
Noah Levitt
c2153be288 start behaviors again on any Page.loadEventFired, because if we don't do that, we keep asking the page if the behavior thinks it's finished, and it doesn't know what we're talking about 2014-06-03 18:06:02 -07:00
Noah Levitt
bfb6cac25f use temp dir as $HOME instead of just chromium user-data-dir, because sometimes we have been seeing chrome print this error message and hang "[1975:2001:0603/215855:ERROR:nss_util.cc(444)] Error initializing NSS with a persistent database (sql:/home/archiveit/.pki/nssdb): NSS error code: -8187" 2014-06-03 16:02:00 -07:00
Noah Levitt
e619e013b6 sleep for 5 seconds after starting a browser, since starting 20 at once brings the computer to its knees 2014-06-03 15:57:12 -07:00
Noah Levitt
1f91018d91 even more patience killing chrome, send another sigterm every ten seconds if chrome is still alive 2014-06-02 12:09:15 -07:00
Noah Levitt
c6bd2417d7 good smarter killing of chrome 2014-06-02 11:58:11 -07:00
Noah Levitt
56a721f059 dump stack trace and don't return browser to pool on critical error where chrome process might still be running 2014-05-30 23:07:39 -07:00
Noah Levitt
3127e02cbb fancy --version that includes git branch and timestamp of last commit if available 2014-05-29 20:43:00 -07:00
Noah Levitt
0bcc583b40 think it's safer to use a range of ports 9200 thru 9200+n than to try to choose random ports and hold them with socket.bind() (don't know how we can be sure a port is available) 2014-05-29 17:55:00 -07:00
Noah Levitt
94c2e4390b debugging to and mitigation for problem "[Errno 98] Address already in use" 2014-05-28 18:57:21 -07:00
Noah Levitt
9c08be2699 sigterm and sigint both request shutdown, which stops consuming urls and waits for active browsers to finish; a second sigint/sigterm immediately shuts down active browsers 2014-05-24 01:52:22 -07:00
Noah Levitt
b67d9fadf0 log ports chosen for browsers, and give threads nice names to make logs easier to understand 2014-05-23 22:30:25 -07:00
Noah Levitt
2c4ba005b5 make umbra amenable to clustering by using a pool of n browsers and removing the browser-clientId affinity (not useful currently since we start a fresh browser instance for each page browsed), and set prefetch_count=1 on amqp consumers to round-robin incoming urls among umbra instances 2014-05-23 21:59:34 -07:00
Noah Levitt
a7cd872b95 sleep for 0.5 sec before attempting to reconnect to amqp; documentation tweaks 2014-05-23 13:34:07 -07:00
Noah Levitt
155db96461 provide abbreviated api 2014-05-23 13:27:00 -07:00
Noah Levitt
bf3afcccb9 oops, Browser.__init__ doesn't take client_id anymore 2014-05-20 19:27:53 -07:00
Noah Levitt
d4693b2aba remove unused param to __init__, avoid exception when on_request callback not provided 2014-05-20 17:07:42 -07:00
Noah Levitt
8749b97811 oops, check in browser.py 2014-05-20 03:10:33 -07:00
Noah Levitt
b59e76a5b9 clean shutdown without draining entire amqp queue (only consume urls from amqp when browser activity isn't saturated) 2014-05-20 03:02:48 -07:00
Noah Levitt
3e4232f32c refactor umbra.py into controller.py and browser.py, improve class names 2014-05-20 02:42:40 -07:00