Noah Levitt
|
fcc63b6675
|
fancier prioritization takes into account hops from seed, path depth; and clean shutdown
|
2015-07-09 22:35:37 -07:00 |
|
Noah Levitt
|
783794ca37
|
basics of site/seed crawling with scoping
|
2015-07-09 18:36:07 -07:00 |
|
Noah Levitt
|
4042f22497
|
rudimentary link extraction and crawling
|
2015-07-07 16:45:52 -07:00 |
|
Noah Levitt
|
d8a962b29e
|
experimenting with captureScreenshot
|
2015-06-16 18:42:21 -07:00 |
|
Noah Levitt
|
0901cac2e0
|
Merge pull request #38 from nlevitt/bump-browser-timeout
increase browser start and stop timeouts, since sometimes we strand brow...
|
2015-01-26 21:22:18 -08:00 |
|
Noah Levitt
|
e9c2fc61dd
|
increase browser start and stop timeouts, since sometimes we strand browser processes after starting them, when the machine is very busy
|
2015-01-26 21:09:56 -08:00 |
|
Hunter Stern
|
e9451f88d8
|
Merge branch 'master' of github.com:internetarchive/umbra into ari-3774
|
2015-01-21 16:21:13 -08:00 |
|
Noah Levitt
|
dd9ef50484
|
suppress logging of umbraBehaviorFinished() message which is sent a lot
|
2014-08-01 16:22:45 -07:00 |
|
Hunter Stern
|
6a5d1e2266
|
Disable web security in chromium so iframes on different domains can be accessed by behavior javascript.
|
2014-07-24 16:46:06 -07:00 |
|
Noah Levitt
|
02c054c284
|
do not wait forever for zombie websocket threads (this change should also reveal how we get these sometimes)
|
2014-06-20 18:13:45 -07:00 |
|
Noah Levitt
|
ead46d5716
|
more elaborate dumping of state on SIGQUIT to replace faulthandler
|
2014-06-20 14:05:33 -07:00 |
|
Noah Levitt
|
025db91dea
|
get rid of --browser-wait and --routing-key in favor of sensible defaults, some other tweaks
|
2014-06-11 10:58:08 -07:00 |
|
Noah Levitt
|
a78e60f1da
|
wait for a browser to become available and start it up before reading the next url from amqp; ack the message only after completing the browsing process successfully, and requeue if it's not successful; some refactoring to make the timing work for this
|
2014-06-09 13:15:05 -07:00 |
|
Hunter Stern
|
41270af223
|
Allow flash requests to be detected.
|
2014-06-06 10:47:29 -07:00 |
|
Noah Levitt
|
c2153be288
|
start behaviors again on any Page.loadEventFired, because if we don't do that, we keep asking the page if the behavior thinks it's finished, and it doesn't know what we're talking about
|
2014-06-03 18:06:02 -07:00 |
|
Noah Levitt
|
bfb6cac25f
|
use temp dir as $HOME instead of just chromium user-data-dir, because sometimes we have been seeing chrome print this error message and hang "[1975:2001:0603/215855:ERROR:nss_util.cc(444)] Error initializing NSS with a persistent database (sql:/home/archiveit/.pki/nssdb): NSS error code: -8187"
|
2014-06-03 16:02:00 -07:00 |
|
Noah Levitt
|
1f91018d91
|
even more patience killing chrome, send another sigterm every ten seconds if chrome is still alive
|
2014-06-02 12:09:15 -07:00 |
|
Noah Levitt
|
c6bd2417d7
|
good smarter killing of chrome
|
2014-06-02 11:58:11 -07:00 |
|
Noah Levitt
|
0bcc583b40
|
think it's safer to use a range of ports 9200 thru 9200+n than to try to choose random ports and hold them with socket.bind() (don't know how we can be sure a port is available)
|
2014-05-29 17:55:00 -07:00 |
|
Noah Levitt
|
94c2e4390b
|
debugging of and mitigation for problem "[Errno 98] Address already in use"
|
2014-05-28 18:57:21 -07:00 |
|
Noah Levitt
|
9c08be2699
|
sigterm and sigint both request shutdown, which stops consuming urls and waits for active browsers to finish; a second sigint/sigterm immediately shuts down active browsers
|
2014-05-24 01:52:22 -07:00 |
|
Noah Levitt
|
b67d9fadf0
|
log ports chosen for browsers, and give threads nice names to make logs easier to understand
|
2014-05-23 22:30:25 -07:00 |
|
Noah Levitt
|
2c4ba005b5
|
make umbra amenable to clustering by using a pool of n browsers and removing the browser-clientId affinity (not useful currently since we start a fresh browser instance for each page browsed), and set prefetch_count=1 on amqp consumers to round-robin incoming urls among umbra instances
|
2014-05-23 21:59:34 -07:00 |
|
Noah Levitt
|
d4693b2aba
|
remove unused param to __init__, avoid exception when on_request callback not provided
|
2014-05-20 17:07:42 -07:00 |
|
Noah Levitt
|
8749b97811
|
oops, check in browser.py
|
2014-05-20 03:10:33 -07:00 |
|