337 Commits

Author SHA1 Message Date
Noah Levitt
534d2e63d6 bump version number in setup.py 2016-12-15 16:43:27 -08:00
Noah Levitt
f6333df6ef back to dev version number 2016-12-15 12:34:26 -08:00
Noah Levitt
85de2fad6a i dub thee 1.1b8 2016-12-15 12:33:34 -08:00
Noah Levitt
d68053764c fix bug handling page with zero outlinks 2016-12-09 16:43:23 -08:00
Noah Levitt
af1e1c75ec avoid infinite loop in case youtube-dl encounters redirect loop (which can be ok if cookies have been set or something) 2016-12-09 14:16:27 -08:00
Noah Levitt
f6a25aa4f0 brozzler logo svg with small default size 2016-12-08 15:16:02 -08:00
Noah Levitt
40b4d9bfe8 travis-ci slack integration 2016-12-07 14:46:29 -08:00
Noah Levitt
9bcec54f4b fix _find_available_port and its unit test 2016-12-07 14:08:34 -08:00
Noah Levitt
eed8b9ec30 little fixes 2016-12-07 11:20:10 -08:00
Noah Levitt
0b6c5346bd avoid broken version of websocket-client to fix https://github.com/internetarchive/brozzler/issues/28 2016-12-07 11:18:41 -08:00
Noah Levitt
e250c4ca89 wrong branch of warcprox in ansible install 2016-12-07 09:33:06 -08:00
Noah Levitt
d3063fbd2b move cookie db management code into chrome.py 2016-12-06 18:04:51 -08:00
Noah Levitt
ce03381b92 move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test 2016-12-06 17:12:20 -08:00
Noah Levitt
74009852d6 split Chrome class into its own module 2016-12-06 12:50:38 -08:00
Noah Levitt
3c43fdaced new utility brozzler-list-captures for looking up entries in the "captures" table 2016-11-30 00:52:14 +00:00
Noah Levitt
9567c088c8 in warcprox 2.0b2, captures table field has been renamed to "record_length" 2016-11-21 16:21:21 -08:00
Noah Levitt
55c9ae07b7 remove flickr behavior, flickr is better off with the default behavior for now 2016-11-16 17:16:48 -08:00
Noah Levitt
72816d1058 don't check robots.txt when scheduling a new site to be crawled, but mark the seed Page as needs_robots_check, and delegate the robots check to brozzler-worker; new test of robots.txt adherence 2016-11-16 12:23:59 -08:00
Noah Levitt
3aead6de93 monkey-patch reppy to support substring user-agent matching 2016-11-16 11:41:34 -08:00
Noah Levitt
398871d46b give vagrant vm enough memory so that tests pass consistently 2016-11-14 18:26:00 -08:00
Noah Levitt
a74247412c need warcprox to listen on public address because that's what it puts in the service registry 2016-11-14 10:03:40 -08:00
Noah Levitt
28b010a2ba back to dev version number 2016-11-11 14:58:55 -08:00
Noah Levitt
7aca046905 1.1b7 2016-11-11 14:58:07 -08:00
Noah Levitt
26b571219b use \n to delimit outlinks because urls can contain spaces (and anything else except [\n\t\0]) in the fragment part even after browser canonicalization 2016-11-11 14:14:47 -08:00
Noah Levitt
02bf23059e pass behavior_parameters from job configuration into Site objects 2016-11-09 13:43:10 -08:00
Noah Levitt
8e115b44fa add --behavior-parameters argument to brozzler-new-site 2016-11-09 13:12:36 -08:00
Noah Levitt
953e50d9a6 fix bug in final_bounces (not sure what I was thinking) 2016-11-09 13:12:14 -08:00
Noah Levitt
054cb255ac cat logs on travis-ci failure 2016-11-08 14:26:12 -08:00
Noah Levitt
125a31165a reppy 0.4.1 has a significantly different api apparently, so for now let's go back to 0.3.4 2016-11-08 14:11:46 -08:00
Noah Levitt
fe18d915f5 still trying to get installation of pip to work on travis-ci 2016-11-08 13:50:12 -08:00
Noah Levitt
f10b4c71e6 update for reppy api change and pin to current version of reppy 2016-11-08 13:39:32 -08:00
Noah Levitt
cba5fa4a0b tweaks to ansible config to try to get the deployment to run on travis-ci 2016-11-08 13:31:52 -08:00
Noah Levitt
9d66f294ec move behavior_parameters into top level of site configuration 2016-11-07 18:16:04 -08:00
Noah Levitt
abca90a128 install the virtualenv package with pip because the apt version is old and conflicts with the recent version of pip we're using 2016-11-07 17:51:43 -08:00
Noah Levitt
99feeab581 logging tweak 2016-11-04 17:53:02 -07:00
Noah Levitt
5ac8994a24 rename webconsole to dashboard 2016-11-04 17:46:23 -07:00
Noah Levitt
5bd4908e1d punycode host part of url to avoid errors doing WARCPROX_WRITE_RECORD 2016-10-26 13:50:23 -07:00
Noah Levitt
f30c143c66 avoid exception in case of url without host part 2016-10-26 12:45:24 -07:00
Noah Levitt
332912acd7 apparently response.status doesn't work sometimes; response.getcode() is documented so hopefully it keeps working 2016-10-25 17:50:49 -07:00
Noah Levitt
70ce642bee integer job ids are permitted as well as string 2016-10-21 21:25:16 +00:00
Noah Levitt
21891476c4 avoid use of __double_underscore member variables because they're special https://shahriar.svbtle.com/underscores-in-python 2016-10-18 18:57:11 -07:00
Noah Levitt
becd832ea3 bump version after merging accept-encoding pull request 2016-10-18 17:55:00 -07:00
Noah Levitt
aae34452f5 bump version number after merging travis-ci pull request 2016-10-18 17:48:45 -07:00
Noah Levitt
68a32fcbe2 bump version number after mouse's pull request 2016-10-18 17:45:55 -07:00
Noah Levitt
a370e7b987 tiny fix, and now the test passes for me 2016-10-14 19:21:26 -07:00
Noah Levitt
4044fcb647 fix pywb/brozzler replay of revisit records 2016-10-14 19:15:23 -07:00
Noah Levitt
27452990ee toward getting initial tests to pass 2016-10-14 18:26:48 -07:00
Noah Levitt
5a373466a3 some vagrant/ansible fixes 2016-10-14 13:47:54 -07:00
Noah Levitt
3627209be1 move ansible directory to top level; generalize formerly vagrant-specific ansible configuration; let upstart manage logging with "console log" 2016-10-13 17:21:55 -07:00
Noah Levitt
56e651baeb working on basic integration tests 2016-10-13 17:12:35 -07:00