1492 Commits

Author SHA1 Message Date
Barbara Miller
c744bb2f92
update copyright 2020-09-01 19:05:21 -07:00
Barbara Miller
d599778c27
Merge pull request #206 from internetarchive/galgeek-patch-1
bump version after merge
2020-08-05 09:24:28 -07:00
Barbara Miller
84d6bb43fa
bump version after merge 2020-08-05 09:23:58 -07:00
Barbara Miller
5a6ecb09d5
Merge pull request #205 from vbanos/behavior-timeout-zero
Skip loading behavior when behavior_timeout=0

behavior_timeout is an existing parameter to `Browser.browse_page`
2020-08-04 16:18:58 -07:00
Neil Minton
12913cccf0
Merge pull request #204 from galgeek/noplaylist-ydl
youtube-dl option noplaylist: True
2020-08-04 14:12:14 -04:00
Vangelis Banos
8b10587031 Skip loading behavior when behavior_timeout=0
The user may set `behavior_timeout=0`. This means that they don't want
to run the behavior. As it is now, Brozzler will invoke
`brozzler.behavior_script` to load the script and `self.run_behavior`
to execute it.
We will run the behavior using `Runtime.evaluate` but then it will be
terminated immediately because of timeout=0.

It is better to skip behavior loading and running when
`behavior_timeout=0`.
2020-08-04 06:27:21 +00:00
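
The guard described in this commit can be sketched roughly as follows. This is a simplified stand-in, not Brozzler's actual code: `maybe_run_behavior`, `load_script`, and `run_script` are hypothetical names, with the two callables standing in for `brozzler.behavior_script` and `self.run_behavior`.

```python
def maybe_run_behavior(behavior_timeout, load_script, run_script):
    """Load and run a behavior only when the timeout leaves room for it.

    load_script and run_script are hypothetical callables standing in for
    brozzler.behavior_script and Browser.run_behavior (assumption).
    Returns True if the behavior path was taken, False if it was skipped.
    """
    if behavior_timeout == 0:
        # A zero timeout would terminate the script immediately,
        # so skip both loading and running it.
        return False
    script = load_script()
    if script:
        run_script(script, timeout=behavior_timeout)
    return True
```

With `behavior_timeout=0` neither callable is invoked, which is the saving the commit message argues for.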
Barbara Miller
dc0d99470a
Merge pull request #203 from miku/update-readme-proxy
Thank you, @miku!
2020-07-28 13:43:19 -07:00
Martin Czygan
8e670ca814 readme: remove proxy from job configuration
It has been removed in 934190084c73699747cf3f4c4d2ee7e268927eae.
2020-07-28 22:21:05 +02:00
Barbara Miller
e3a067cf60 youtube-dl option noplaylist: True 2020-07-24 16:22:50 -07:00
jkafader
1b9ebca13c
Merge pull request #202 from galgeek/limit_downloadThroughput
configurable limit for Chromium download throughput
2020-07-23 14:14:20 -07:00
Barbara Miller
739d09294e make configurable 2020-07-14 10:12:28 -07:00
Barbara Miller
36b4f80350 try SPN2 downloadThroughput limit 2020-07-14 10:12:28 -07:00
Barbara Miller
03594413f9
Merge pull request #200 from NGTmeaty/fix-test
Merging for the current fixes—thanks, @NGTmeaty!
2020-06-18 13:22:41 -07:00
NGTmeaty
25313a97de
Fix tests:
Update the RethinkDB pubkey location and repo location based on their guide https://rethinkdb.com/docs/install/ubuntu/
NumPy no longer supports Python 3.5, so on 3.5 we install an earlier version of NumPy to maintain compatibility.
2020-06-02 03:26:33 -04:00
Neil Minton
3c5d1f24e0
Merge pull request #199 from galgeek/ARI-6097
instagram selector update
2020-05-26 17:11:35 -04:00
Barbara Miller
8da3ae9274 instagram update 2020-05-07 17:55:26 -07:00
jkafader
212111f581
Merge pull request #196 from galgeek/no-cache-dir-ydl
youtube-dl cache_dir: False
2020-04-30 15:05:00 -07:00
Barbara Miller
926de9c853 cache_dir: False 2020-04-30 11:11:38 -07:00
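
As a rough illustration of the youtube-dl options named in these commits, the sketch below builds an options dict. The helper `build_ydl_opts` is hypothetical and not taken from Brozzler. Note also that the commit message writes `cache_dir`, while the key youtube-dl documents for disabling its filesystem cache is `cachedir`; the sketch assumes the documented key.

```python
def build_ydl_opts(extra=None):
    """Return a youtube-dl options dict (hypothetical helper, for illustration).

    Only the two options mentioned in the commit log are set here.
    """
    opts = {
        "cachedir": False,   # disable youtube-dl's filesystem cache
        "noplaylist": True,  # prefer a single video over a whole playlist
    }
    # Caller-supplied options override the defaults above.
    opts.update(extra or {})
    return opts
```

A dict like this would typically be passed to `youtube_dl.YoutubeDL(opts)`.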
Barbara Miller
5b2381ef1f
bump version after merge 2020-04-22 10:54:06 -07:00
jkafader
17f173f12a
Merge pull request #162 from galgeek/ARI-5980
capture onclick links...
2020-04-22 10:07:04 -07:00
Barbara Miller
4df280a9b6
Merge pull request #194 from NGTmeaty/improve-login
Expanding Brozzler's logging in capabilities...

Thanks, @NGTmeaty and @vbanos! 

A couple of qa test crawls show the new code works as advertised.
2020-04-18 19:30:40 -07:00
Barbara Miller
04fba79d34 faster regex match 2020-04-16 18:09:03 -07:00
Jake L
09f938410a
Reduce the number of times querySelectorAll is called.
Fix formatting issues.
2020-04-15 17:26:24 -04:00
Jake L
78365c9f35
Expanding Brozzler's logging in capabilities
Some sites don't allow you to log in without clicking on a button to open a retracted modal.

This update to the login code allows Brozzler to click on all elements that we think are related to opening a login modal.

Then, if there isn't a regular form, we will attempt to fill out abnormal form schemes.

The test_try_login test has been expanded for the new type of login form we are supporting.
2020-04-14 17:19:53 -04:00
Barbara Miller
973af2c16e
bump version after merge, update copyright 2020-04-14 09:44:20 -07:00
Barbara Miller
a8734bcc11
Merge pull request #193 from vbanos/login-tests
Thanks, @vbanos!
2020-04-14 09:42:20 -07:00
Vangelis Banos
041feaf426 Add missing super().do_POST() 2020-04-14 09:39:48 +00:00
Barbara Miller
ae7248fff0 add dblclick (and fix typo) 2020-04-13 19:38:18 -07:00
Vangelis Banos
782aab3048 Add unit tests for try_login behavior
Add unit tests for the code that detects and tries to use login forms
automatically (`Browser.try_login`).

Add `htdocs/favicon.ico` because it is loaded automatically when the
browser tries to use the test web server and it causes a "missing"
warning.

Create a new dir `tests/htdocs/site11` which is used for login related
test html files.
2020-04-13 19:16:10 +00:00
Barbara Miller
a3b70fcb27 audio, too 2020-04-07 11:27:32 -07:00
jkafader
e22d80b9a4
Merge pull request #192 from galgeek/ss-fix
ss['stop'] not always set here
2020-04-07 11:12:12 -07:00
Barbara Miller
d2b8171fb0 logging 2020-04-07 09:27:21 -07:00
Barbara Miller
f4f0c02064 ss['stop'] not always set 2020-04-06 18:37:13 -07:00
Barbara Miller
401ba7293c
Merge pull request #191 from galgeek/skip-behaviors-on-error
bump version after merge
2020-04-02 14:24:56 -07:00
Barbara Miller
3647939af5 bump version after merge 2020-04-02 12:37:56 -07:00
Barbara Miller
ffea189d15
Merge pull request #190 from vbanos/skip-behaviors-on-error
Thank you, @vbanos!
2020-04-01 20:27:22 -07:00
Vangelis Banos
80341b9106 Add option simpler404 to enable this behavior
It is disabled by default.
2020-04-01 16:08:43 +00:00
Barbara Miller
cebdb20972
Merge pull request #188 from galgeek/xfail-interstitial
xfail test — didn't we already merge this simple update?
2020-03-27 16:35:59 -07:00
Vangelis Banos
140c27abe8 Skip running behaviors when page is 4xx or 5xx
Currently, when we run `Browser.browse_page`, we run JS behaviors after
we navigate to a page regardless of its status.
Maybe the page wasn't found (4xx) or unreachable for any reason (5xx).
In that case, we could skip running behaviors to save time and
resources.

With this PR, we add a new var to store navigated page HTTP status in
`WebsockReceiverThread.page_status`. We use this in
`Browser.browse_page` to skip behaviors, outlink and hashtag extraction
when page status is 4xx/5xx.

Note that we don't skip screenshots as it could be useful to have a
picture of an error page in some cases.
2020-03-23 16:21:57 +00:00
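
A minimal sketch of the decision this commit describes, under stated assumptions: `is_error_status` and `browse_steps` are hypothetical names, the step order is illustrative only, and the `simpler404` flag is the opt-in option added by the follow-up commit listed above (disabled by default).

```python
def is_error_status(page_status):
    """Return True for HTTP 4xx/5xx; None means the status is unknown."""
    return page_status is not None and 400 <= page_status <= 599


def browse_steps(page_status, simpler404=False):
    """List the steps that would run for a page (hypothetical helper).

    Screenshots are always kept, since a picture of an error page
    can still be useful.
    """
    steps = ["screenshot"]
    if simpler404 and is_error_status(page_status):
        # Skip behaviors, outlink extraction, and hashtag extraction.
        return steps
    steps += ["behavior", "outlinks", "hashtags"]
    return steps
```

For example, `browse_steps(404, simpler404=True)` keeps only the screenshot step, while the same call without the flag runs everything.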
jkafader
3b249333a4
Merge pull request #189 from galgeek/ARI-6041
icaew.com behavior
2020-03-12 15:07:49 -07:00
Barbara Miller
4c0785fbfc
Merge pull request #187 from internetarchive/optimizes-rethinkdb-load-query
With the last commit, the only test failure is unrelated test_brozzling.py::test_page_interstitial_exception (already marked xfail in qa).
2020-03-11 21:29:01 -07:00
Barbara Miller
c4beeefe01 address var 2020-03-11 20:56:52 -07:00
Barbara Miller
2dfe3632f5 xfail test 2020-03-11 20:37:30 -07:00
James Kafader
313cec3139 coerce to dict not list 2020-03-11 19:31:02 -07:00
James Kafader
b9c5e4b66c fix output format 2020-03-11 19:15:57 -07:00
James Kafader
3defd49677 new selection function, based on optimized query 2020-03-11 16:09:16 -07:00
jkafader
1d9a95dfc2
Merge pull request #186 from galgeek/simpler_choose_warcprox
Simpler choose warcprox
2020-03-11 14:16:57 -07:00
Barbara Miller
f8f7aa1dca maybe fewer warcproxes 2020-03-11 14:08:34 -07:00
Barbara Miller
d190122a6d random.choice 2020-03-11 14:00:07 -07:00
Barbara Miller
af39b8cc6f skip active_sites query 2020-03-11 13:40:37 -07:00