Noah Levitt
bc6e0d243f
yet more refactoring of browser.py, clearer separation of purpose, Browser class manages browsing, sends most of the messages to chrome, WebsockReceiverThread handles messages that come back from chrome
2016-12-16 13:52:12 -08:00
Noah Levitt
534d2e63d6
bump version number in setup.py
2016-12-15 16:43:27 -08:00
Noah Levitt
c71854127d
major refactoring of browsing code to make it easier to add functionality
2016-12-15 16:42:45 -08:00
Noah Levitt
ef8bc83928
Merge branch 'refactor-browsing' into qa
* refactor-browsing:
don't log every little message from chrome
2016-12-15 13:21:38 -08:00
Noah Levitt
cb6a00f4f0
don't log every little message from chrome
2016-12-15 13:21:30 -08:00
Noah Levitt
f6333df6ef
back to dev version number
2016-12-15 12:34:26 -08:00
Noah Levitt
85de2fad6a
i dub thee 1.1b8
2016-12-15 12:33:34 -08:00
Noah Levitt
7a68599057
Merge branch 'refactor-browsing' into qa
* refactor-browsing:
more shutdown tweaks
improving shutdown process
working on major refactoring of browser management
2016-12-15 12:28:21 -08:00
Noah Levitt
4186869bf9
Merge branch 'master' into qa
* master:
fix bug handling page with zero outlinks
avoid infinite loop in case youtube-dl encounters redirect loop (which can be ok if cookies have been set or something)
brozzler logo svg with small default size
travis-ci slack integration
fix _find_available_port and its unit test
little fixes
avoid broken version of websocket-client to fix https://github.com/internetarchive/brozzler/issues/28
wrong branch of warcprox in ansible install
move cookie db management code into chrome.py
move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test
split Chrome class into its own module
new utility brozzler-list-captures for looking up entries in the "captures" table
2016-12-15 12:07:29 -08:00
Noah Levitt
4bdad4729a
more shutdown tweaks
2016-12-14 16:13:14 -08:00
Noah Levitt
5fa96b6438
improving shutdown process
2016-12-14 14:49:41 -08:00
Noah Levitt
f23f928c16
working on major refactoring of browser management
2016-12-09 16:50:11 -08:00
Noah Levitt
d68053764c
fix bug handling page with zero outlinks
2016-12-09 16:43:23 -08:00
Noah Levitt
af1e1c75ec
avoid infinite loop in case youtube-dl encounters redirect loop (which can be ok if cookies have been set or something)
2016-12-09 14:16:27 -08:00
Noah Levitt
f6a25aa4f0
brozzler logo svg with small default size
2016-12-08 15:16:02 -08:00
Noah Levitt
40b4d9bfe8
travis-ci slack integration
2016-12-07 14:46:29 -08:00
Noah Levitt
9bcec54f4b
fix _find_available_port and its unit test
2016-12-07 14:08:34 -08:00
Noah Levitt
eed8b9ec30
little fixes
2016-12-07 11:20:10 -08:00
Noah Levitt
0b6c5346bd
avoid broken version of websocket-client to fix https://github.com/internetarchive/brozzler/issues/28
2016-12-07 11:18:41 -08:00
Noah Levitt
e250c4ca89
wrong branch of warcprox in ansible install
2016-12-07 09:33:06 -08:00
Noah Levitt
d3063fbd2b
move cookie db management code into chrome.py
2016-12-06 18:04:51 -08:00
Noah Levitt
ce03381b92
move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test
2016-12-06 17:12:20 -08:00
Noah Levitt
74009852d6
split Chrome class into its own module
2016-12-06 12:50:38 -08:00
Noah Levitt
3c43fdaced
new utility brozzler-list-captures for looking up entries in the "captures" table
2016-11-30 00:52:14 +00:00
Noah Levitt
2eea50dcfb
Merge branch 'master' into qa
* master:
in warcprox 2.0b2, captures table field has been renamed to "record_length"
remove flickr behavior, flickr is better off with the default behavior for now
Update README.rst
add travis-ci badge
2016-11-21 16:21:30 -08:00
Noah Levitt
9567c088c8
in warcprox 2.0b2, captures table field has been renamed to "record_length"
2016-11-21 16:21:21 -08:00
Noah Levitt
55c9ae07b7
remove flickr behavior, flickr is better off with the default behavior for now
2016-11-16 17:16:48 -08:00
Noah Levitt
899ee8a8dd
Update README.rst
2016-11-16 12:26:50 -08:00
Noah Levitt
6bb9d68dce
add travis-ci badge
2016-11-16 12:26:33 -08:00
Noah Levitt
eaa32ad3fc
Merge branch 'master' into qa
* master:
don't check robots.txt when scheduling a new site to be crawled, but mark the seed Page as needs_robots_check, and delegate the robots check to brozzler-worker; new test of robots.txt adherence
robots.txt for testing
monkey-patch reppy to support substring user-agent matching
give vagrant vm enough memory so that tests pass consistently
need warcprox to listen on public address because that's what it puts in the service registry
looks like the problem may have been a bug in ansible 2.2.0.0, so pin to 2.1.3.0
2016-11-16 12:24:30 -08:00
Noah Levitt
72816d1058
don't check robots.txt when scheduling a new site to be crawled, but mark the seed Page as needs_robots_check, and delegate the robots check to brozzler-worker; new test of robots.txt adherence
2016-11-16 12:23:59 -08:00
Noah Levitt
24cc8377fb
robots.txt for testing
2016-11-16 12:12:17 -08:00
Noah Levitt
3aead6de93
monkey-patch reppy to support substring user-agent matching
2016-11-16 11:41:34 -08:00
Noah Levitt
398871d46b
give vagrant vm enough memory so that tests pass consistently
2016-11-14 18:26:00 -08:00
Noah Levitt
2b0a47c914
Merge pull request #27 from internetarchive/i2
update Instagram behavior, mostly css selectors
2016-11-14 12:40:55 -08:00
Noah Levitt
a74247412c
need warcprox to listen on public address because that's what it puts in the service registry
2016-11-14 10:03:40 -08:00
Noah Levitt
c9b45a7e76
looks like the problem may have been a bug in ansible 2.2.0.0, so pin to 2.1.3.0
2016-11-14 09:58:13 -08:00
Barbara Miller
e01739743f
Merge branch 'i2' into qa
2016-11-14 09:25:58 -08:00
Barbara Miller
12a054e6dc
update behavior, mostly css selectors
2016-11-14 09:20:40 -08:00
Noah Levitt
28b010a2ba
back to dev version number
2016-11-11 14:58:55 -08:00
Noah Levitt
7aca046905
1.1b7
2016-11-11 14:58:07 -08:00
Barbara Miller
eb3fad9c84
cp feature branch instagram.js
2016-11-11 14:51:11 -08:00
Barbara Miller
54ec6cf15b
Merge branch 'i2' into qa
2016-11-11 14:44:10 -08:00
Barbara Miller
bb9334d757
jslint edits
2016-11-11 14:21:08 -08:00
Barbara Miller
d162a85a65
update markup, & simplify big image browse?
2016-11-11 14:21:08 -08:00
Noah Levitt
a80d6bcc9a
Merge branch 'master' into qa
* master:
use \n to delimit outlinks because urls can contain spaces (and anything else except [\n\t\0]) in the fragment part even after browser canonicalization
2016-11-11 14:19:37 -08:00
Noah Levitt
26b571219b
use \n to delimit outlinks because urls can contain spaces (and anything else except [\n\t\0]) in the fragment part even after browser canonicalization
2016-11-11 14:14:47 -08:00
Barbara Miller
7093e66360
Merge branch 'i2' into qa
2016-11-11 13:34:44 -08:00
Barbara Miller
51dfb2a899
jslint edits
2016-11-11 13:33:09 -08:00
Barbara Miller
3c3a09f5c0
Merge branch 'i2' into qa
2016-11-10 17:21:33 -08:00