Noah Levitt
|
f7427219cf
|
restore handling of "aw snap" or "he's dead jim"
|
2016-12-21 14:21:20 -08:00 |
|
Noah Levitt
|
a5d48a9fdb
|
add seed username/password parameters to job config schema
|
2016-12-20 18:06:20 -08:00 |
|
Noah Levitt
|
86d6060a2d
|
loosen the find_available_port test slightly, since it seems to be not 100% predictable for reasons i haven't investigated
|
2016-12-20 17:52:21 -08:00 |
|
Noah Levitt
|
edf0a3a50d
|
convert mouseovers and simpleclicks to jinja2
|
2016-12-20 17:34:29 -08:00 |
|
Noah Levitt
|
e2dbf68ccd
|
remove obsolete facebook login code
|
2016-12-20 16:38:11 -08:00 |
|
Noah Levitt
|
a0b61408b9
|
convert behaviors to jinja2, move them to new subdir js-templates, along with javascript previously stored as a string in browser.py
|
2016-12-20 16:33:25 -08:00 |
|
Noah Levitt
|
06fd0a0d79
|
add hack for submitting a login form containing an element with name or id "submit", which masks the form submit() method
|
2016-12-20 11:24:26 -08:00 |
|
Noah Levitt
|
b24b229cb2
|
how did i miss this file?
|
2016-12-20 11:13:48 -08:00 |
|
Noah Levitt
|
7a40822e64
|
forgot to git add new test data
|
2016-12-19 18:10:07 -08:00 |
|
Noah Levitt
|
2f8f20bbb4
|
detect <input type="email"> as potential username field for login
|
2016-12-19 18:08:10 -08:00 |
|
Noah Levitt
|
86ac48d6c3
|
generalized support for login doing automatic detection of login form on a page
|
2016-12-19 17:30:09 -08:00 |
|
Noah Levitt
|
bc6e0d243f
|
yet more refactoring of browser.py, clearer separation of purpose, Browser class manages browsing, sends most of the messages to chrome, WebsockReceiverThread handles messages that come back from chrome
|
2016-12-16 13:52:12 -08:00 |
|
Noah Levitt
|
534d2e63d6
|
bump version number in setup.py
|
2016-12-15 16:43:27 -08:00 |
|
Noah Levitt
|
c71854127d
|
major refactoring of browsing code to make it easier to add functionality
|
2016-12-15 16:42:45 -08:00 |
|
Noah Levitt
|
f6333df6ef
|
back to dev version number
|
2016-12-15 12:34:26 -08:00 |
|
Noah Levitt
|
85de2fad6a
|
i dub thee 1.1b8
|
2016-12-15 12:33:34 -08:00 |
|
Noah Levitt
|
d68053764c
|
fix bug handling page with zero outlinks
|
2016-12-09 16:43:23 -08:00 |
|
Noah Levitt
|
af1e1c75ec
|
avoid infinite loop in case youtube-dl encounters redirect loop (which can be ok if cookies have been set or something)
|
2016-12-09 14:16:27 -08:00 |
|
Noah Levitt
|
f6a25aa4f0
|
brozzler logo svg with small default size
|
2016-12-08 15:16:02 -08:00 |
|
Noah Levitt
|
40b4d9bfe8
|
travis-ci slack integration
|
2016-12-07 14:46:29 -08:00 |
|
Noah Levitt
|
9bcec54f4b
|
fix _find_available_port and its unit test
|
2016-12-07 14:08:34 -08:00 |
|
Noah Levitt
|
eed8b9ec30
|
little fixes
|
2016-12-07 11:20:10 -08:00 |
|
Noah Levitt
|
0b6c5346bd
|
avoid broken version of websocket-client to fix https://github.com/internetarchive/brozzler/issues/28
|
2016-12-07 11:18:41 -08:00 |
|
Noah Levitt
|
e250c4ca89
|
wrong branch of warcprox in ansible install
|
2016-12-07 09:33:06 -08:00 |
|
Noah Levitt
|
d3063fbd2b
|
move cookie db management code into chrome.py
|
2016-12-06 18:04:51 -08:00 |
|
Noah Levitt
|
ce03381b92
|
move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test
|
2016-12-06 17:12:20 -08:00 |
|
Noah Levitt
|
74009852d6
|
split Chrome class into its own module
|
2016-12-06 12:50:38 -08:00 |
|
Noah Levitt
|
3c43fdaced
|
new utility brozzler-list-captures for looking up entries in the "captures" table
|
2016-11-30 00:52:14 +00:00 |
|
Noah Levitt
|
9567c088c8
|
in warcprox 2.0b2, captures table field has been renamed to "record_length"
|
2016-11-21 16:21:21 -08:00 |
|
Noah Levitt
|
55c9ae07b7
|
remove flickr behavior, flickr is better off with the default behavior for now
|
2016-11-16 17:16:48 -08:00 |
|
Noah Levitt
|
899ee8a8dd
|
Update README.rst
|
2016-11-16 12:26:50 -08:00 |
|
Noah Levitt
|
6bb9d68dce
|
add travis-ci badge
|
2016-11-16 12:26:33 -08:00 |
|
Noah Levitt
|
72816d1058
|
don't check robots.txt when scheduling a new site to be crawled, but mark the seed Page as needs_robots_check, and delegate the robots check to brozzler-worker; new test of robots.txt adherence
|
2016-11-16 12:23:59 -08:00 |
|
Noah Levitt
|
24cc8377fb
|
robots.txt for testing
|
2016-11-16 12:12:17 -08:00 |
|
Noah Levitt
|
3aead6de93
|
monkey-patch reppy to support substring user-agent matching
|
2016-11-16 11:41:34 -08:00 |
|
Noah Levitt
|
398871d46b
|
give vagrant vm enough memory so that tests pass consistently
|
2016-11-14 18:26:00 -08:00 |
|
Noah Levitt
|
2b0a47c914
|
Merge pull request #27 from internetarchive/i2
update Instagram behavior, mostly css selectors
|
2016-11-14 12:40:55 -08:00 |
|
Noah Levitt
|
a74247412c
|
need warcprox to listen on public address because that's what it puts in the service registry
|
2016-11-14 10:03:40 -08:00 |
|
Noah Levitt
|
c9b45a7e76
|
looks like the problem may have been a bug in ansible 2.2.0.0, so pin to 2.1.3.0
|
2016-11-14 09:58:13 -08:00 |
|
Barbara Miller
|
12a054e6dc
|
update behavior, mostly css selectors
|
2016-11-14 09:20:40 -08:00 |
|
Noah Levitt
|
28b010a2ba
|
back to dev version number
|
2016-11-11 14:58:55 -08:00 |
|
Noah Levitt
|
7aca046905
|
1.1b7
|
2016-11-11 14:58:07 -08:00 |
|
Noah Levitt
|
26b571219b
|
use \n to delimit outlinks because urls can contain spaces (and anything else except [\n\t\0]) in the fragment part even after browser canonicalization
|
2016-11-11 14:14:47 -08:00 |
|
Noah Levitt
|
02bf23059e
|
pass behavior_parameters from job configuration into Site objects
|
2016-11-09 13:43:10 -08:00 |
|
Noah Levitt
|
8e115b44fa
|
add --behavior-parameters argument to brozzler-new-site
|
2016-11-09 13:12:36 -08:00 |
|
Noah Levitt
|
953e50d9a6
|
fix bug in final_bounces (not sure what I was thinking)
|
2016-11-09 13:12:14 -08:00 |
|
Noah Levitt
|
8889e4ab20
|
restore accidentally removed functionality handling page redirects and friends
|
2016-11-08 18:17:48 -08:00 |
|
Noah Levitt
|
054cb255ac
|
cat logs on travis-ci failure
|
2016-11-08 14:26:12 -08:00 |
|
Noah Levitt
|
125a31165a
|
reppy 0.4.1 has a significantly different api apparently, so for now let's go back to 0.3.4
|
2016-11-08 14:11:46 -08:00 |
|
Noah Levitt
|
fe18d915f5
|
still trying to get installation of pip to work on travis-ci
|
2016-11-08 13:50:12 -08:00 |
|