147 Commits

Author SHA1 Message Date
Noah Levitt
6c69b68771 organize imports, tweak command line args 2014-05-20 17:10:41 -07:00
Noah Levitt
d4693b2aba remove unused param to __init__, avoid exception when on_request callback not provided 2014-05-20 17:07:42 -07:00
Noah Levitt
99d219dfda not sure why /bin/ et al were in .gitignore... replace with a couple of useful things 2014-05-20 17:06:26 -07:00
Noah Levitt
1e18c2ca74 improve helper utilities 2014-05-20 16:44:13 -07:00
Noah Levitt
8749b97811 oops, check in browser.py 2014-05-20 03:10:33 -07:00
Noah Levitt
b59e76a5b9 clean shutdown without draining entire amqp queue (only consume urls from amqp when browser activity isn't saturated) 2014-05-20 03:02:48 -07:00
Noah Levitt
3e4232f32c refactor umbra.py into controller.py and browser.py, improve class names 2014-05-20 02:42:40 -07:00
Noah Levitt
6fdcdd0bf0 configurable max number of instances of chrome simultaneously browsing pages (default=3); close and reopen connection to amqp every 15 minutes (consumer only); increase default browser wait to 60 sec 2014-05-20 01:09:11 -07:00
Noah Levitt
cc0ffee508 only websocket-client-py3==0.13.1 works right with python3 at the moment, see https://github.com/liris/websocket-client/issues/84 2014-05-20 00:57:07 -07:00
Eldon
154eb6f334 Merge pull request #22 from nlevitt/master: whole bunch of changes (already deployed on QA) 2014-05-06 09:13:56 -04:00
Noah Levitt
05e673917d "wasThrown" is necessarily always included in the result message from chrome for Runtime.evaluate 2014-05-05 19:58:41 -07:00
Noah Levitt
93b16f28b9 improve facebook behavior: when we expect a "close" button to appear, wait for it before moving on to other actions; and when we discover a missed click target above, scroll back up to click on it 2014-05-05 18:39:16 -07:00
Noah Levitt
fa6e3eebb2 clear UmbraWorker.self._behavior when finished with a page (after the first page, nothing was getting behaviors); bump hard timeout to 20 minutes 2014-05-05 18:37:39 -07:00
Noah Levitt
55fad80553 UmbraWorker.send_to_chrome() - central place to send message to chrome via websocket 2014-05-05 12:26:39 -07:00
Noah Levitt
a62a07e6b7 change magic first line of behavior js files to a commented-out json blob, which should include the fields 'url_regex' and 'request_idle_timeout_sec'; behavior.is_finished() incorporates the custom idle timeout into its check; also rename variables in behavior scripts with umbra/UMBRA_ prefix to sort of namespace them; and add "finished" logic to facebook and vimeo behaviors (flickr needs work to support it) 2014-05-05 11:58:55 -07:00
Noah Levitt
2a9633ad77 Bunch of improvements, most importantly a default fallback behavior script which scrolls to the bottom of the page, and rearchitecting some stuff so that the behavior script can have some say on when it's finished with the page. Also some doc comments. 2014-05-04 21:33:13 -07:00
Adam Miller
602459bb42 Merge pull request #21 from nlevitt/disable-google-analytics: disable google analytics by setting a breakpoint in www.google-analytics... 2014-05-02 18:32:35 -07:00
Noah Levitt
8679ee0ea7 disable google analytics by setting a breakpoint in www.google-analytics.com/analytics.js and replacing the content of that script when the breakpoint is hit 2014-05-02 18:30:28 -07:00
Noah Levitt
d6b696ded8 Merge pull request #20 from adam-miller/master: Removing first run ui checks 2014-05-02 17:42:53 -07:00
Adam Miller
9cf20f195c Removing first run ui checks 2014-05-02 17:37:10 -07:00
Eldon
e7353fbb4b Merge pull request #19 from nlevitt/ari-3814: ARI-3814 try to recover from rabbitmq communication problems 2014-04-09 13:25:22 -04:00
Noah Levitt
89e41e7c82 remove exception raised for testing 2014-04-07 11:45:54 -07:00
Noah Levitt
aacb886b62 ARI-3814 try to recover from rabbitmq communication problems 2014-04-07 11:45:12 -07:00
Eldon
4e72cbae58 Merge pull request #18 from nlevitt/ari-3771: to address ARI-3771 "Lasalle Facebook last scrolldown doesn't work", scr... 2014-04-04 16:04:38 -04:00
Eldon
beeb4a2a2c Merge pull request #17 from nlevitt/ari-3811: thread dump on SIGQUIT a la java 2014-04-04 15:21:41 -04:00
Noah Levitt
be9115fd11 to address ARI-3771 "Lasalle Facebook last scrolldown doesn't work", scroll by 200 pixels each time instead of 100 on facebook, which avoids hitting the 15 second idle timeout in my tests; also detect when unclicked targets are above the screen/viewport and not below and log it as such, instead of trying to continue scrolling down 2014-04-04 12:16:00 -07:00
Noah Levitt
da975bc586 thread dump on SIGQUIT a la java 2014-04-03 21:19:08 -07:00
Eldon
e1c297269c Merge pull request #15 from nlevitt/master: setup.py - include behaviors.d/*.js in installation 2014-03-13 11:09:34 -04:00
Noah Levitt
f3a540b92d setup.py - include behaviors.d/*.js in installation 2014-03-13 00:00:32 -07:00
vonrosen
b3bd959ab2 Merge pull request #14 from eldondev/master: Check to see if the object has a click method before calling it 2014-03-10 12:01:20 -07:00
Eldon
427b74ebfc Check to see if the object has a click method before calling it 2014-03-10 14:58:16 -04:00
vonrosen
a16ce4abeb Merge pull request #13 from nlevitt/master: facebook, logging, timeout tweaks 2014-03-09 16:47:19 -07:00
Noah Levitt
3fd792fddb lengthen timeouts and improve timeout handling; log js console messages from browser 2014-03-07 19:39:27 -08:00
Noah Levitt
5637e7111f use *rel=["theater"] to click on photos and videos that won't navigate to a new page; don't click on comments links for now, since it might interfere with other stuff; more verbose logging of click targets 2014-03-07 19:37:43 -08:00
vonrosen
a0f8474a73 Merge pull request #12 from nlevitt/master: vimeo, tweaks 2014-03-07 11:32:14 -08:00
Noah Levitt
5a7a24083f simplify checking for *.js 2014-03-07 11:29:43 -08:00
Noah Levitt
a30b5d8dd2 only reset idle timer on Network.requestWillBeSent instead of all events (otherwise long-running videos keep the browser open unnecessarily) 2014-03-06 18:35:04 -08:00
Noah Levitt
9d9014c864 start the hard stop timer 2014-03-06 18:32:30 -08:00
Noah Levitt
52db581a3c restore logging 2014-03-06 18:25:46 -08:00
Noah Levitt
12d66982d1 only load behaviors files named like *.js (avoids vim .swp files and stuff); tweak logging 2014-03-06 18:25:35 -08:00
Noah Levitt
9cb9172a4d behavior for vimeo - click on <video> elements 2014-03-06 18:24:12 -08:00
Noah Levitt
9848c41d5f make regexes the same that crawlman puts in crawler-beans.cxml 2014-03-06 18:23:31 -08:00
vonrosen
5b1992a8c0 Merge pull request #11 from eldondev/master: Convert behaviors to independent, runnable javascript files, hard timeout on pages 2014-03-06 11:08:45 -08:00
Eldon
393df3f16e Update behaviors for facebook theater 2014-03-05 23:44:52 -05:00
Eldon
f2f78d2ced Convert from one big json file, to js files with a regex as a comment at the top. 2014-03-05 23:19:09 -05:00
Eldon
4c22891093 Merge pull request #10 from nlevitt/master: remove unused function 2014-02-25 17:35:26 -05:00
Noah Levitt
b763d6550f remove unused function 2014-02-25 14:26:10 -08:00
Eldon
b4675a7cd2 Merge pull request #9 from nlevitt/master: add behaviors, handle multiple browsers 2014-02-25 16:23:27 -05:00
Noah Levitt
11da122ec2 remove old commented out line of code 2014-02-18 13:20:18 -08:00
Noah Levitt
b96d8856d4 create temp dir for user profile rather than rely on --temp-profile 2014-02-14 19:45:16 -08:00