Noah Levitt
c6e6b34e82
handle case where websocket connection is unexpectedly closed during the post-behavior phase
2016-07-06 18:17:01 -05:00
Noah Levitt
3bf3c80720
implement timeout and retries to work around issue where sometimes we receive no result message after requesting outlinks
2016-07-06 17:54:36 -05:00
Noah Levitt
01e38ea8c7
oops didn't mean to leave that windows-only subprocess flag
2016-07-01 14:07:04 -05:00
Noah Levitt
79ad57669c
do not send more than one SIGTERM when shutting down browser process, because on recent chromium on linux, the second sigterm abruptly ends the process, and sometimes leaves orphan subprocesses; also send TERM/KILL signals to the whole process group, another measure to avoid orphans; and adjust logging levels for captured chrome output
2016-06-30 17:10:27 -05:00
Noah Levitt
9fd78fdbe8
implement timeout to work around issue where sometimes we receive no result message after requesting scroll to top
2016-06-30 11:45:19 -05:00
Noah Levitt
79beddfc44
set Browser._chrome_instance=None if _chrome_instance.start() throws exception, to avoid endless loop after one failure
2016-06-29 19:47:25 -05:00
Noah Levitt
ffcf26b6c9
undo accidentally committed change to browser startup timeout, and remove now misleading comment about browser ports (see https://github.com/internetarchive/brozzler/pull/3 )
2016-06-29 18:53:32 -05:00
Noah Levitt
479713e25b
--trace level logging
2016-06-29 18:29:45 -05:00
Noah Levitt
772bcf0df6
handle "undefined" in list of frames when extracting outlinks (fixes ARI-4988)
2016-06-28 12:23:32 -05:00
Noah Levitt
0bd687abde
avoid hanging in case a page has no outlinks
2016-06-28 11:25:04 -05:00
Noah Levitt
2038598f41
fix bug in case no outlinks are found, make brozzler.browser.browse_page() return an empty set instead of a set with one element which is an empty string {''}
2016-06-22 17:43:53 +00:00
Noah Levitt
d198a69e45
recurse through all frames to find outlinks
2016-06-22 11:39:31 -05:00
Barbara Miller
1c1237d07e
disable browser extensions
2016-05-27 22:51:38 -07:00
Noah Levitt
317a5eb99d
without sudo, psutil.net_connections() raises psutil.AccessDenied on mac; in this case, silently try running chrome on the unvetted configured port
2016-05-09 17:25:14 -07:00
Adam Miller
1f7f55a14a
browser.py - Fix port search logic
2016-05-05 22:55:45 +00:00
Adam Miller
8e84465ff9
browser.py - Check for open ports before starting Chrome. Open next available on conflict
2016-05-05 22:31:07 +00:00
Noah Levitt
8d618ed135
refactor post-behavior stuff into separate interval function for clarity
2016-05-05 10:37:00 -07:00
Noah Levitt
31356d526a
Merge branch 'master' into AITFIVE-832
* master:
  copy over latest behaviors and stuff from umbra
  support for host rules in outlink scoping
  recover from rethinkdb error updating service registry
2016-05-05 10:06:12 -07:00
Noah Levitt
cea192b4b3
copy over latest behaviors and stuff from umbra
2016-05-05 00:58:26 -07:00
Adam Miller
61cec15fff
Restructure browser.py to take screenshot after behavior script.
2016-05-03 22:06:03 +00:00
Noah Levitt
df61e55b6b
add license headers
2016-04-25 20:02:11 +00:00
Noah Levitt
68abb3cb94
log "behavior finished"/"hard timeout" only once
2016-04-21 22:02:50 +00:00
Noah Levitt
7bc726f717
fix bug preventing links from being extracted if hard timeout is reached
2016-04-20 17:24:18 -07:00
Noah Levitt
4874eaccbb
Merge remote-tracking branch 'umbra/master'
* umbra/master:
  Handle Python to JS boolean conversion
  Allow clicking on already clicked element to continue in behaviors if click_until_hard_timeout is set to true
  Make Umbra click on 'Load More' button for youtube pages
  catch and log exception deleting temporary work directory
  update detection of modal close button for facebook changes
  Add custom behavior for Brooklyn Museum.
2016-03-07 17:37:12 -08:00
Noah Levitt
343b5c0f82
register with service registry; only start chrome right before using it, so that web console vnc windows aren't always full of about:blank
2015-11-12 02:56:27 +00:00
Noah Levitt
ddce1cdc71
fix mistakenly removed import; try to shut down chrome in case of unexpected exception
2015-08-19 20:04:46 +00:00
Noah Levitt
2533229fa1
add __all__ to modules
2015-08-19 19:01:28 +00:00
Noah Levitt
a878730e02
goodbye sqlite and rabbitmq, hello rethinkdb
2015-08-18 21:44:54 +00:00
Noah Levitt
fc75e18928
handle "aw snap" or "he's dead jim" from chrome
2015-08-11 18:14:53 +00:00
Noah Levitt
ce154fc3db
more robustness improvements
2015-08-10 20:11:46 +00:00
Noah Levitt
a47292dab5
thread to read and selectively log output from chrome
2015-08-07 22:36:07 +00:00
Noah Levitt
e6eeca6ae2
handle 420 Reached limit when fetching robots in brozzler-hq
2015-08-01 17:54:29 +00:00
Noah Levitt
511e19ff4d
handle 420 "Limit reached" when browser receives it
2015-08-01 01:26:59 +00:00
Noah Levitt
11fbbc9d49
change browse-url command to brozzle-page, which does some more stuff as if it were in brozzler, like youtube_dl, warcprox features, etc
2015-07-31 00:03:13 +00:00
Noah Levitt
f9c049a69e
navigate to about:blank before the real url to avoid situation where we navigate to the same page that we're currently on, perhaps with a different #fragment, which prevents Page.loadEventFired from happening
2015-07-21 20:39:19 +00:00
Noah Levitt
b5cb94fc8b
some additional logging and error handling to avoid mysterious messages
2015-07-21 06:33:02 +00:00
Noah Levitt
2ba5bd4d4b
support adding extra http request headers
2015-07-17 13:45:27 -07:00
Noah Levitt
d2650a2547
update scope if seed redirects
2015-07-16 18:27:47 -07:00
Noah Levitt
140a441eb5
honor site proxy setting; remove brozzler-worker options that are now configured at the site level (and in the case of ignore_cert_errors, always on, no longer an option); use "reppy" library for robots.txt handling; fix some bugs
2015-07-16 17:19:12 -07:00
Noah Levitt
fd0c3322ee
update readme, s/umbra/brozzler/ in most places, delete non-brozzler stuff
2015-07-13 17:09:39 -07:00