1279 Commits

Author SHA1 Message Date
James Kafader
c87178e230 Merge branch 'optimizes-rethinkdb-load-query' into qa 2020-03-11 19:31:36 -07:00
James Kafader
313cec3139 coerce to dict not list 2020-03-11 19:31:02 -07:00
James Kafader
8a4743bc8b Merge branch 'optimizes-rethinkdb-load-query' into qa 2020-03-11 19:21:45 -07:00
James Kafader
b9c5e4b66c fix output format 2020-03-11 19:15:57 -07:00
James Kafader
59b67ac3d5 Merge branch 'optimizes-rethinkdb-load-query' into qa 2020-03-11 16:09:52 -07:00
James Kafader
3defd49677 new selection function, based on optimized query 2020-03-11 16:09:16 -07:00
jkafader
1d9a95dfc2
Merge pull request #186 from galgeek/simpler_choose_warcprox
Simpler choose warcprox
2020-03-11 14:16:57 -07:00
Barbara Miller
c952ae019b Merge branch 'simpler_choose_warcprox' into qa 2020-03-11 14:09:03 -07:00
Barbara Miller
f8f7aa1dca maybe fewer warcproxes 2020-03-11 14:08:34 -07:00
Barbara Miller
379be437a2 Merge branch 'simpler_choose_warcprox' into qa 2020-03-11 14:00:57 -07:00
Barbara Miller
d190122a6d random.choice 2020-03-11 14:00:07 -07:00
Barbara Miller
af39b8cc6f skip active_sites query 2020-03-11 13:40:37 -07:00
Barbara Miller
12c39050c9 Merge branch 'ARI-6041' into qa 2020-03-03 11:48:06 -08:00
Barbara Miller
414d1579fc icaew.com behavior 2020-03-03 11:47:34 -08:00
Noah Levitt
c2a1ca018a bump version after merge 2019-12-10 10:43:01 -08:00
Noah Levitt
558a0dd615
Merge pull request #184 from nlevitt/limit-failures
consider page completed after 3 failures
2019-12-10 10:42:43 -08:00
Barbara Miller
bb45419de4 Merge branch 'ARI-5995-too' into qa 2019-12-09 17:50:24 -08:00
Barbara Miller
ca3550af13 instagram interval 2000ms 2019-12-09 17:50:07 -08:00
Noah Levitt
597f2b5b33 reveal bad value when job conf validation fails 2019-12-04 15:11:53 -08:00
Noah Levitt
7915220ab7 consider page completed after 3 failures
https://github.com/internetarchive/brozzler/pull/183#issuecomment-560562807

"We've had a number of cases where a page kept failing for one reason or
another, and it's bad. We can end up with tons of duplicate captures,
the crawl is not able to make progress, and the overall performance of
the cluster is impacted in cases like yours, where a browser is sitting
there doing nothing for five minutes."
2019-12-04 12:38:22 -08:00
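The policy described in this commit message can be sketched roughly as follows. This is a minimal illustration, not brozzler's actual code: the `Page` class, the `failed_attempts` field, and `record_failure` are hypothetical names, and only the threshold of 3 comes from the commit subject.

```python
# Hypothetical sketch: treat a page as completed once it has failed 3 times,
# so the crawl can move on instead of retrying the same page indefinitely.
MAX_FAILED_ATTEMPTS = 3  # threshold taken from the commit subject


class Page:
    """Illustrative stand-in for a crawl frontier entry (assumed structure)."""

    def __init__(self, url):
        self.url = url
        self.failed_attempts = 0
        self.completed = False


def record_failure(page, max_failures=MAX_FAILED_ATTEMPTS):
    """Count one failed attempt; mark the page completed at the threshold.

    Returns True if the page is now considered completed.
    """
    page.failed_attempts += 1
    if page.failed_attempts >= max_failures:
        page.completed = True
    return page.completed
```

Marking the page completed, rather than deleting it, keeps a record that it was attempted while stopping further retries.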
Noah Levitt
060adaffd0
Merge pull request #182 from CorentinB/patch-2
Fix Facebook ads variant selector
2019-11-27 16:10:55 -08:00
Noah Levitt
5aeaf47b6b bump version after merge 2019-11-27 12:41:16 -08:00
Noah Levitt
d6ac80af93
Merge pull request #181 from vbanos/no-sandbox
Enable running in docker / k8s
2019-11-27 12:40:42 -08:00
Vangelis Banos
3bc2f434ef Split extra chrome args on whitespace
This is in case multiple args are used.
2019-11-27 20:18:41 +00:00
Noah Levitt
64da843dc8 fix travis badge 2019-11-25 16:04:13 -08:00
Vangelis Banos
62cb051f93 Pass extra CLI params to chrome using ENV variable
If ENV var `BROZZLER_EXTRA_CHROME_ARGS` is set, pass its contents as
extra chromium cli options.

Remove `--no-sandbox` option. It's not good from a security point of
view.
2019-11-25 20:44:25 +00:00
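The behavior described in this commit, together with the later "Split extra chrome args on whitespace" change, might look something like the sketch below. Only the environment variable name `BROZZLER_EXTRA_CHROME_ARGS` comes from the commit message; the function names `extra_chrome_args` and `build_chrome_command` are assumptions made for illustration and are not brozzler's API.

```python
import os


def extra_chrome_args(environ=None):
    """Return extra Chrome CLI args from BROZZLER_EXTRA_CHROME_ARGS.

    The value is split on whitespace so that several options can be
    supplied in one variable. An unset or empty variable yields [].
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("BROZZLER_EXTRA_CHROME_ARGS", "")
    return raw.split()


def build_chrome_command(chrome_exe, base_args, environ=None):
    """Append the extra args to a base Chrome command line (hypothetical helper)."""
    return [chrome_exe, *base_args, *extra_chrome_args(environ)]
```

For example, setting `BROZZLER_EXTRA_CHROME_ARGS="--no-sandbox --disable-gpu"` in a container would add both options without hard-coding `--no-sandbox` for every deployment. Plain whitespace splitting does not preserve quoted arguments that contain spaces; `shlex.split` would be one alternative if that were needed.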
Corentin Barreau
ff523b3bba Fix: facebook ads variant selector 2019-11-25 17:48:33 +01:00
Barbara Miller
15921504e8 Merge branch 'master' into qa 2019-11-18 15:07:15 -08:00
Noah Levitt
5094267ae8 bump version after merge 2019-11-15 20:38:05 -08:00
Noah Levitt
dcba6c58e3
Merge pull request #168 from CorentinB/facebook
Implement facebook.js with behaviors.yaml
2019-11-15 20:37:31 -08:00
Corentin Barreau
0c7e93c941 Remove custom interval 2019-11-16 02:11:05 +01:00
Noah Levitt
0cf3a5c12a bump version after merge 2019-11-15 11:08:57 -08:00
Noah Levitt
3136eefb77
Merge pull request #180 from galgeek/UmbraBFB
scroll down, and down, then scroll up
2019-11-15 11:08:29 -08:00
Vangelis Banos
35c5fa482f Enable running in docker / k8s
When trying to run Brozzler in docker, we get the following error:
```
Failed to move to new namespace: PID namespaces supported, Network
namespace supported, but failed: errno = Operation not permitted
Trace/breakpoint trap
```
This happens because chromium uses sandboxing for increased security by
default, and it's not supported when running in a container.
Adding chromium option `--no-sandbox` fixes the problem.

This issue is common; I found various reports about it, such as this one:
https://github.com/Zenika/alpine-chrome/issues/33
2019-11-15 13:20:30 +00:00
Barbara Miller
8bb79029a3 Merge branch 'UmbraBFB' into qa 2019-11-14 17:49:47 -08:00
Barbara Miller
9001449c70 prioritize scrolling down 2019-11-14 17:46:34 -08:00
Barbara Miller
ef70907040
Merge pull request #179 from CorentinB/fix-fb-ads-variants
Fix Facebook Ads Library variants selector
2019-11-13 13:08:47 -08:00
Corentin Barreau
beb80da7d2 Fix ads variant selector 2019-11-13 18:11:48 +01:00
Barbara Miller
c66d131886 Merge branch '168' into qa 2019-11-07 16:05:50 -08:00
Noah Levitt
395ff69f0a bump version after merge 2019-11-06 13:28:45 -08:00
Noah Levitt
802fbff986
Merge pull request #178 from galgeek/ARI-5995-tidied
ARI-5995 instagram capture updates
2019-11-06 13:26:56 -08:00
Corentin Barreau
06fba51b7f Restore 500ms interval speed 2019-11-06 14:11:19 +01:00
Barbara Miller
b4d9b6d20b Merge branch 'ARI-5995-tidied' into qa 2019-11-05 17:43:32 -08:00
Barbara Miller
ac4a3f9914 simpler check, interval; 500 2019-11-05 17:23:01 -08:00
Barbara Miller
69250359bc Merge branch 'ARI-5995-min' into qa 2019-11-05 16:26:09 -08:00
Barbara Miller
4f9b6a8fab skip unneeded check 2019-11-05 16:25:04 -08:00
Barbara Miller
aa8010e93f Merge branch 'ARI-5995-min' into qa 2019-11-05 16:20:55 -08:00
Barbara Miller
45ffce19ec skip unneeded check 2019-11-05 16:20:24 -08:00
Barbara Miller
3c5f0e25ff Merge remote-tracking branch 'upstream/master' into qa 2019-11-04 15:21:48 -08:00
Noah Levitt
754b92cb96 bump version after merge 2019-11-04 15:20:58 -08:00