Vangelis Banos
fdc84fb848
Add chrome options --disable-sync and --disable-breakpad
`--disable-sync` disables syncing to a Google account.
`--disable-breakpad` disables crashdump collection.
These options aren't useful for Brozzler. They are already used in
Puppeteer:
https://github.com/puppeteer/puppeteer/blob/main/src/node/Launcher.ts#L211
Documentation in chrome-launcher:
https://github.com/GoogleChrome/chrome-launcher/blob/master/docs/chrome-flags-for-tools.md
2022-01-18 10:09:39 +00:00
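A minimal sketch, not Brozzler's actual launcher code, showing where the two flags would appear in a Chrome command line. The helper `build_chrome_args` and its parameters are hypothetical. `--remote-debugging-port` and `--user-data-dir` are standard Chrome flags, included only for context.

```python
def build_chrome_args(chrome_exe, remote_debugging_port, user_data_dir):
    """Return an argv list for launching Chrome for automated capture.

    Hypothetical helper: only --disable-sync and --disable-breakpad come
    from the commit; the rest is illustrative context.
    """
    return [
        chrome_exe,
        f"--remote-debugging-port={remote_debugging_port}",
        f"--user-data-dir={user_data_dir}",
        # Don't sync browser state to a Google account.
        "--disable-sync",
        # Don't collect crash dumps (Breakpad).
        "--disable-breakpad",
    ]
```

The resulting list could be passed directly to `subprocess.Popen`.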
Alex Dempsey
427908e821
Merge pull request #233 from cclauss/codespell
Fix typos
2021-10-12 12:34:37 -07:00
Christian Clauss
a5ed291e65
Fix typos
2021-10-12 10:19:48 +02:00
Adam Miller
0f72233f3b
Adding support for hop path information to be stored and passed along to warcprox
2021-08-31 19:44:55 +00:00
Barbara Miller
4f301f4e03
Merge pull request #225 from internetarchive/wt-376-yt-user-page-fix
Added new extractor type to brozzler's youtube-dl playlist handling
2021-06-08 14:43:42 -07:00
Barbara Miller
c311fbb41f
bump version, update copyright
2021-05-25 17:14:21 -07:00
Barbara Miller
b59c4395ed
Merge pull request #223 from vbanos/fix-AddressValueError
Skip invalid outlink
2021-05-25 17:12:35 -07:00
Vangelis Banos
7aabc5f655
Skip invalid outlink
When one of the outlinks is `http://-1/`, `urlcanon.whatwg` raises an
unhandled exception `ipaddress.AddressValueError` and the capture fails.
We can skip the problematic outlink and keep the rest without crashing.
2021-05-23 11:31:47 +00:00
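A minimal sketch of the skip-and-continue pattern described above. `canonicalize` is a hypothetical stand-in for `urlcanon.whatwg`. It reproduces the failure mode by passing numeric-looking hosts to `ipaddress.IPv4Address`. `canonicalize_outlinks` is likewise hypothetical, not Brozzler's actual function.

```python
import ipaddress
import logging
import urllib.parse


def canonicalize(url):
    """Hypothetical stand-in for urlcanon.whatwg.

    Numeric-looking hosts are validated as IPv4 addresses, so a host such
    as "-1" raises ipaddress.AddressValueError, mimicking the failure above.
    """
    host = urllib.parse.urlsplit(url).hostname or ""
    if host.lstrip("-").replace(".", "").isdigit():
        ipaddress.IPv4Address(host)  # raises AddressValueError for "-1"
    return url


def canonicalize_outlinks(outlinks):
    """Canonicalize each outlink, logging and skipping any that fail."""
    result = []
    for url in outlinks:
        try:
            result.append(canonicalize(url))
        except ipaddress.AddressValueError as e:
            # Drop only the bad outlink instead of failing the whole capture.
            logging.warning("skipping invalid outlink %r: %s", url, e)
    return result
```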
Pravin Visakan
eabdeb0238
Added user page extractor type to ytdl monkeypatch
2021-05-04 16:50:38 -07:00
Barbara Miller
0f27c9995a
bump version
2020-10-29 17:12:14 -07:00
jkafader
5005c619f6
Merge pull request #211 from internetarchive/galgeek-websocket-url-timeout
configurable websocket url timeout, default 60
2020-10-29 17:08:48 -07:00
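A sketch of a polling loop with a configurable timeout that defaults to 60, here assumed to be seconds. It assumes the websocket URL is obtained by repeatedly querying Chrome's DevTools HTTP endpoint until a page target appears. `wait_for_websocket_url` and `fetch_url` are hypothetical names and do not reflect the signature of `Chrome.start`.

```python
import time


def wait_for_websocket_url(fetch_url, timeout=60, poll_interval=0.5):
    """Poll fetch_url() until it returns a websocket URL.

    fetch_url is a hypothetical callable that returns None while Chrome is
    still starting up. Raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        url = fetch_url()
        if url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no websocket url after {timeout} seconds")
        time.sleep(poll_interval)
```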
Barbara Miller
11c5cfa865
add param for Chrome.start
2020-10-21 15:39:46 -07:00
Barbara Miller
dc50fe1db2
Merge pull request #212 from internetarchive/bump-version-to-1.5.23
bump version after merge
2020-10-13 15:21:18 -07:00
Barbara Miller
052c3552ca
bump version after merge
2020-10-13 15:19:50 -07:00
Barbara Miller
f2ebdca597
configurable websocket url timeout, default 60
2020-10-13 15:12:32 -07:00
Barbara Miller
bb7594a14d
Merge pull request #209 from vbanos/outlinks-timeout
Thanks, @vbanos!
2020-10-13 15:01:55 -07:00
Vangelis Banos
8addaf31d5
Add option extract_outlinks_timeout
`Browser.extract_outlinks` has a default `timeout=60` param that cannot be
changed in any way (it is always invoked as `extract_outlinks()`).
We add param `extract_outlinks_timeout=60` to `BrozzlerWorker` and
`Browser.browse_page` to allow that.
2020-10-04 15:39:30 +00:00
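A simplified sketch of the parameter plumbing described above: the worker holds `extract_outlinks_timeout` and passes it through `browse_page` to `extract_outlinks`. These classes are stand-ins that model only the timeout. `brozzle_page` here is an illustrative method, not a copy of Brozzler's code.

```python
class Browser:
    """Simplified stand-in: models only the outlink-timeout plumbing."""

    def extract_outlinks(self, timeout=60):
        # A real implementation would evaluate JavaScript in the page and
        # wait up to `timeout` seconds; here we just record the value.
        self.last_outlinks_timeout = timeout
        return set()

    def browse_page(self, page_url, extract_outlinks_timeout=60):
        # (navigation, behaviors, etc. omitted)
        return self.extract_outlinks(timeout=extract_outlinks_timeout)


class BrozzlerWorker:
    """Simplified stand-in that forwards its configured timeout."""

    def __init__(self, extract_outlinks_timeout=60):
        self._extract_outlinks_timeout = extract_outlinks_timeout

    def brozzle_page(self, browser, page_url):
        return browser.browse_page(
            page_url, extract_outlinks_timeout=self._extract_outlinks_timeout)
```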
Barbara Miller
18d3f5f930
Merge pull request #208 from internetarchive/galgeek-patch-2
based on PR #207; thanks, @cclauss!
2020-09-21 18:06:03 -07:00
Barbara Miller
297eaac6dd
update travis.yml and test!
2020-09-21 17:08:39 -07:00
Barbara Miller
c744bb2f92
update copyright
2020-09-01 19:05:21 -07:00
Barbara Miller
d599778c27
Merge pull request #206 from internetarchive/galgeek-patch-1
bump version after merge
2020-08-05 09:24:28 -07:00
Barbara Miller
84d6bb43fa
bump version after merge
2020-08-05 09:23:58 -07:00
Barbara Miller
5a6ecb09d5
Merge pull request #205 from vbanos/behavior-timeout-zero
Skip loading behavior when behavior_timeout=0
`behavior_timeout` is an existing parameter to `Browser.browse_page`.
2020-08-04 16:18:58 -07:00
Neil Minton
12913cccf0
Merge pull request #204 from galgeek/noplaylist-ydl
youtube-dl option noplaylist: True
2020-08-04 14:12:14 -04:00
Vangelis Banos
8b10587031
Skip loading behavior when behavior_timeout=0
The user may set `behavior_timeout=0`, meaning that they don't want
to run the behavior. Currently, Brozzler still invokes
`brozzler.behavior_script` to load the script and `self.run_behavior`
to execute it.
The behavior starts running via `Runtime.evaluate` but is then
terminated immediately because of timeout=0.
It is better to skip loading and running the behavior when
`behavior_timeout=0`.
2020-08-04 06:27:21 +00:00
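A minimal sketch of the guard described above. The hypothetical callables `load_script` and `run_script` stand in for `brozzler.behavior_script` and `self.run_behavior`, and `maybe_run_behavior` is an illustrative name.

```python
def maybe_run_behavior(page_url, behavior_timeout, load_script, run_script):
    """Load and run the behavior only when behavior_timeout > 0.

    Returns True if the behavior was run, False if it was skipped.
    """
    if behavior_timeout <= 0:
        # A zero timeout means "no behavior": skip loading and evaluating
        # the script instead of starting it only to time out immediately.
        return False
    script = load_script(page_url)
    run_script(script, timeout=behavior_timeout)
    return True
```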
Barbara Miller
dc0d99470a
Merge pull request #203 from miku/update-readme-proxy
Thank you, @miku!
2020-07-28 13:43:19 -07:00
Martin Czygan
8e670ca814
readme: remove proxy from job configuration
It has been removed in 934190084c73699747cf3f4c4d2ee7e268927eae.
2020-07-28 22:21:05 +02:00
Barbara Miller
e3a067cf60
youtube-dl option noplaylist: True
2020-07-24 16:22:50 -07:00
jkafader
1b9ebca13c
Merge pull request #202 from galgeek/limit_downloadThroughput
configurable limit for Chromium download throughput
2020-07-23 14:14:20 -07:00
Barbara Miller
739d09294e
make configurable
2020-07-14 10:12:28 -07:00
Barbara Miller
36b4f80350
try SPN2 downloadThroughput limit
2020-07-14 10:12:28 -07:00
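The commits above do not show how the limit is applied. One way to throttle downloads in Chromium is the DevTools Protocol method `Network.emulateNetworkConditions`, whose `downloadThroughput` parameter is in bytes per second (-1 disables throttling). The sketch below only builds that message. The helper name is hypothetical, and Brozzler's implementation may differ.

```python
import itertools
import json

_message_ids = itertools.count(1)


def network_conditions_message(download_throughput=-1):
    """Build a DevTools Protocol message capping download throughput.

    download_throughput is in bytes per second; -1 disables throttling.
    """
    return json.dumps({
        "id": next(_message_ids),
        "method": "Network.emulateNetworkConditions",
        "params": {
            "offline": False,
            "latency": 0,  # no added latency (milliseconds)
            "downloadThroughput": download_throughput,
            "uploadThroughput": -1,  # leave uploads unthrottled
        },
    })
```

The JSON string would be sent over the page's DevTools websocket.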
Barbara Miller
03594413f9
Merge pull request #200 from NGTmeaty/fix-test
Merging for the current fixes—thanks, @NGTmeaty!
2020-06-18 13:22:41 -07:00
NGTmeaty
25313a97de
Fix tests:
Update the RethinkDB pubkey location and repo location based on their guide https://rethinkdb.com/docs/install/ubuntu/
NumPy no longer supports Python 3.5, so on 3.5 we install an earlier version of NumPy to maintain compatibility.
2020-06-02 03:26:33 -04:00
Neil Minton
3c5d1f24e0
Merge pull request #199 from galgeek/ARI-6097
instagram selector update
2020-05-26 17:11:35 -04:00
Barbara Miller
8da3ae9274
instagram update
2020-05-07 17:55:26 -07:00
jkafader
212111f581
Merge pull request #196 from galgeek/no-cache-dir-ydl
youtube-dl cache_dir: False
2020-04-30 15:05:00 -07:00
Barbara Miller
926de9c853
cache_dir: False
2020-04-30 11:11:38 -07:00
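Here is a minimal options sketch for the youtube-dl settings named in the commits above. youtube-dl's Python options dict spells the cache setting `cachedir`; the CLI equivalent is `--no-cache-dir`. `make_ydl_opts` is a hypothetical helper.

```python
def make_ydl_opts(**overrides):
    """Return a youtube-dl options dict with the two settings above.

    Hypothetical helper; any other keys are supplied by the caller.
    """
    opts = {
        # Download only the single video when a URL could also refer to
        # a playlist.
        "noplaylist": True,
        # Disable youtube-dl's filesystem cache.
        "cachedir": False,
    }
    opts.update(overrides)
    return opts

# Usage (requires the youtube_dl package):
#   import youtube_dl
#   with youtube_dl.YoutubeDL(make_ydl_opts(quiet=True)) as ydl:
#       info = ydl.extract_info(url, download=False)
```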
Barbara Miller
5b2381ef1f
bump version after merge
2020-04-22 10:54:06 -07:00
jkafader
17f173f12a
Merge pull request #162 from galgeek/ARI-5980
capture onclick links...
2020-04-22 10:07:04 -07:00
Barbara Miller
4df280a9b6
Merge pull request #194 from NGTmeaty/improve-login
Expanding Brozzler's logging in capabilities...
Thanks, @NGTmeaty and @vbanos!
A couple of QA test crawls show the new code works as advertised.
2020-04-18 19:30:40 -07:00
Barbara Miller
04fba79d34
faster regex match
2020-04-16 18:09:03 -07:00
Jake L
09f938410a
Reduce the number of times `querySelectorAll` is called.
Fix formatting issues.
2020-04-15 17:26:24 -04:00
Jake L
78365c9f35
Expanding Brozzler's logging in capabilities
Some sites don't allow you to log in without first clicking a button that opens a hidden modal.
This update to the login code allows Brozzler to click on all elements that we think are related to opening a login modal.
Then, if there isn't a regular form, we attempt to fill out non-standard login forms.
The `test_try_login` test has been expanded for the new type of login form we are supporting.
2020-04-14 17:19:53 -04:00
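Brozzler's login handling runs in the browser, so the following is only an offline Python sketch of the kind of heuristic described above. It scans markup for clickable elements whose text suggests they open a login form. The regex, the tag set, and `find_login_triggers` are hypothetical choices, not Brozzler's actual selectors.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristic: text suggesting a control opens a login form.
LOGIN_TEXT = re.compile(r"\b(?:log\s*-?\s*in|sign\s*-?\s*in)\b", re.IGNORECASE)
CLICKABLE_TAGS = {"a", "button"}


class LoginTriggerFinder(HTMLParser):
    """Collect (tag, text) for clickable elements with login-like text."""

    def __init__(self):
        super().__init__()
        self._open = []  # stack of [tag, text_parts] for open clickables
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag in CLICKABLE_TAGS:
            self._open.append([tag, []])

    def handle_data(self, data):
        # Attribute text, including text in nested tags, to the innermost
        # open clickable element.
        if self._open:
            self._open[-1][1].append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, parts = self._open.pop()
            text = " ".join("".join(parts).split())  # normalize whitespace
            if LOGIN_TEXT.search(text):
                self.matches.append((tag, text))


def find_login_triggers(html):
    finder = LoginTriggerFinder()
    finder.feed(html)
    finder.close()
    return finder.matches
```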
Barbara Miller
973af2c16e
bump version after merge, update copyright
2020-04-14 09:44:20 -07:00
Barbara Miller
a8734bcc11
Merge pull request #193 from vbanos/login-tests
Thanks, @vbanos!
2020-04-14 09:42:20 -07:00
Vangelis Banos
041feaf426
Add missing super().do_POST()
2020-04-14 09:39:48 +00:00
Barbara Miller
ae7248fff0
add dblclick (and fix typo)
2020-04-13 19:38:18 -07:00
Vangelis Banos
782aab3048
Add unit tests for try_login behavior
Add unit tests for the code that detects and tries to use login forms
automatically (`Browser.try_login`).
Add `htdocs/favicon.ico` because it is loaded automatically when the
browser tries to use the test web server and it causes a "missing"
warning.
Create a new dir `tests/htdocs/site11` which is used for login related
test html files.
2020-04-13 19:16:10 +00:00
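Here is a minimal sketch of a static test server for an `htdocs` directory. It shows why a `favicon.ico` file avoids the "missing" warning: the browser requests it automatically, and the server answers 404 when the file is absent. `start_test_server` is hypothetical, and Brozzler's test suite may set up its server differently.

```python
import functools
import http.server
import threading


def start_test_server(htdocs_dir):
    """Serve static files from htdocs_dir on an ephemeral localhost port.

    Returns the server; call shutdown() and server_close() when done.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=htdocs_dir)
    httpd = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    # Serve in a background thread so the test can issue requests.
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd
```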
Barbara Miller
a3b70fcb27
audio, too
2020-04-07 11:27:32 -07:00
jkafader
e22d80b9a4
Merge pull request #192 from galgeek/ss-fix
ss['stop'] not always set here
2020-04-07 11:12:12 -07:00