Commit graph

190 commits

Author SHA1 Message Date
Noah Levitt
f7427219cf restore handling of "aw snap" or "he's dead jim" 2016-12-21 14:21:20 -08:00
Noah Levitt
a0b61408b9 convert behaviors to jinja2, move them to new subdir js-templates, along with javascript previously stored as a string in browser.py 2016-12-20 16:33:25 -08:00
Noah Levitt
06fd0a0d79 add hack for submitting a login form containing an element with name or id "submit", which masks the form submit() method 2016-12-20 11:24:26 -08:00
Noah Levitt
2f8f20bbb4 detect <input type="email"> as potential username field for login 2016-12-19 18:08:10 -08:00
Noah Levitt
86ac48d6c3 generalized support for login doing automatic detection of login form on a page 2016-12-19 17:30:09 -08:00
Noah Levitt
bc6e0d243f yet more refactoring of browser.py, clearer separation of purpose, Browser class manages browsing, sends most of the messages to chrome, WebsockReceiverThread handles messages that come back from chrome 2016-12-16 13:52:12 -08:00
Noah Levitt
c71854127d major refactoring of browsing code to make it easier to add functionality 2016-12-15 16:42:45 -08:00
Noah Levitt
ef8bc83928 Merge branch 'refactor-browsing' into qa
* refactor-browsing:
  don't log every little message from chrome
2016-12-15 13:21:38 -08:00
Noah Levitt
cb6a00f4f0 don't log every little message from chrome 2016-12-15 13:21:30 -08:00
Noah Levitt
7a68599057 Merge branch 'refactor-browsing' into qa
* refactor-browsing:
  more shutdown tweaks
  improving shutdown process
  working on major refactoring of browser management
2016-12-15 12:28:21 -08:00
Noah Levitt
4186869bf9 Merge branch 'master' into qa
* master:
  fix bug handling page with zero outlinks
  avoid infinite loop in case youtube-dl encounters redirect loop (which can be ok if cookies have been set or something)
  brozzler logo svg with small default size
  travis-ci slack integration
  fix _find_available_port and its unit test
  little fixes
  avoid broken version of websocket-client to fix https://github.com/internetarchive/brozzler/issues/28
  wrong branch of warcprox in ansible install
  move cookie db management code into chrome.py
  move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test
  split Chrome class into its own module
  new utility brozzler-list-captures for looking up entries in the "captures" table
2016-12-15 12:07:29 -08:00
Noah Levitt
4bdad4729a more shutdown tweaks 2016-12-14 16:13:14 -08:00
Noah Levitt
5fa96b6438 improving shutdown process 2016-12-14 14:49:41 -08:00
Noah Levitt
f23f928c16 working on major refactoring of browser management 2016-12-09 16:50:11 -08:00
Noah Levitt
d68053764c fix bug handling page with zero outlinks 2016-12-09 16:43:23 -08:00
Noah Levitt
d3063fbd2b move cookie db management code into chrome.py 2016-12-06 18:04:51 -08:00
Noah Levitt
ce03381b92 move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test 2016-12-06 17:12:20 -08:00
Noah Levitt
74009852d6 split Chrome class into its own module 2016-12-06 12:50:38 -08:00
Noah Levitt
a80d6bcc9a Merge branch 'master' into qa
* master:
  use \n to delimit outlinks because urls can contain spaces (and anything else except [\n\t\0]) in the fragment part even after browser canonicalization
2016-11-11 14:19:37 -08:00
Noah Levitt
26b571219b use \n to delimit outlinks because urls can contain spaces (and anything else except [\n\t\0]) in the fragment part even after browser canonicalization 2016-11-11 14:14:47 -08:00
Noah Levitt
0eb07c9ca2 Merge branch 'master' into qa
* master:
  pass behavior_parameters from job configuration into Site objects
  add --behavior-parameters argument to brozzler-new-site
  fix bug in final_bounces (not sure what I was thinking)
  restore accidentally removed functionality handling page redirects and friends
  cat logs on travis-ci failure
  reppy 0.4.1 has a significantly different api apparently, so for now let's go back to 0.3.4
  still trying to get installation of pip to work on travis-ci
  update for reppy api change and pin to current version of reppy
  tweaks to ansible config to try to get the deployment to run on travis-ci
2016-11-09 13:43:24 -08:00
Noah Levitt
8889e4ab20 restore accidentally removed functionality handling page redirects and friends 2016-11-08 18:17:48 -08:00
Barbara Miller
49e7d81239 just test gzip,deflate after all 2016-10-18 22:04:38 -07:00
Barbara Miller
49e38f608a test extra headers 2016-10-18 22:04:38 -07:00
Barbara Miller
6a23ab9f7a correct local_state 2016-10-18 22:03:16 -07:00
Barbara Miller
108503cd2a try saving only Brotli config 2016-10-18 22:03:16 -07:00
Barbara Miller
f2bb1f44cd disable SDCH? 2016-10-18 22:03:16 -07:00
Barbara Miller
91a1b03508 make data_dir 2016-10-18 22:03:16 -07:00
Barbara Miller
7a104d317a simplify to troubleshoot 2016-10-18 22:03:16 -07:00
Barbara Miller
7291f4684c better variable name 2016-10-18 22:03:16 -07:00
Barbara Miller
e461627300 read/write chromium local state 2016-10-18 22:03:16 -07:00
Barbara Miller
4dfda90df2 clean up Browser dirs and add flags_location 2016-10-18 22:03:16 -07:00
Barbara Miller
dd4fce8763 read/write chromium local state 2016-10-18 22:03:16 -07:00
Barbara Miller
5b30d644b9 clean up Browser dirs and add flags_location 2016-10-18 22:03:16 -07:00
Barbara Miller
0a6d8ed3da Accept-Encoding fix II 2016-10-17 12:50:31 -07:00
Alex Osborne
743b5a4347 Add user_agent option
Currently doesn't apply to requests made by youtube-dl as I
couldn't see a thread-safe way of doing that.
2016-10-05 04:25:09 +11:00
Noah Levitt
0e096dd4e4 don't try to read the browser's cookie database if the browser hasn't been started (which can happen if the page is simply fetched rather than browsed because it's not html) 2016-10-03 15:03:08 -07:00
Noah Levitt
1c5c9417d2 avoid "Uncaught TypeError: Cannot read property 'querySelectorAll' of undefined" from outlinks script 2016-08-25 13:10:30 -07:00
Noah Levitt
20f9934dd9 avoid "Uncaught RangeError: Maximum call stack size exceeded" compiling outlinks 2016-08-04 17:33:06 -07:00
Noah Levitt
e62055d7d6 logging tweak 2016-08-04 15:54:05 -07:00
Noah Levitt
cfc18e6845 add docstring to _chain_chrome_messages, remove debug logging, tweak name of websock thread 2016-07-28 20:29:11 -05:00
Noah Levitt
2046ee36e0 add a timeout to the one post-behavior step that didn't already have one (getting a screenshot), and majorly refactored the post-behavior code to incorporate timeouts automatically into each step, and hopefully make it easier to follow 2016-07-28 19:59:28 -05:00
Noah Levitt
b2b07b79a9 logging tweaks 2016-07-28 10:19:30 -05:00
Noah Levitt
dd2d8c89e3 reduce log level of messages from chrome, since it spews stuff that looks bad but usually isn't 2016-07-27 18:48:13 -05:00
Noah Levitt
c4bdb6c1fd pass behavior template parameters on to behavior - fixes umbra's ability to log in with parameters received from amqp 2016-07-26 19:47:09 -05:00
Adam Miller
c2dc2fee2a Changing EnvironmentError to OSError 2016-07-26 00:46:16 +00:00
Adam Miller
77dabd4057 Fix naming conventions. 2016-07-26 00:39:50 +00:00
Adam Miller
2029964a74 Create cookie directory if it doesn't exist. Add debug messages for cookie db read/write. 2016-07-25 23:36:14 +00:00
Adam Miller
1cb6653fab Read/Write Cookie DB file when creating and stopping browser instance. 2016-07-22 00:22:28 +00:00
Noah Levitt
c902a70450 tweak thread names 2016-07-19 14:33:57 -05:00