Noah Levitt
8906037d82
bump dev version after PR #102
2018-05-16 17:33:52 -07:00
Noah Levitt
e90e7345a5
Merge pull request #102 from nlevitt/docs
...
complete job configuration documentation
2018-05-16 17:31:27 -07:00
Noah Levitt
331d07fe88
these ssurts are strings too
2018-05-16 17:11:08 -07:00
Noah Levitt
67558528cb
fix bad copy/paste
2018-05-16 16:43:38 -07:00
Noah Levitt
5bb392ec7c
ssurts are strings now
...
because they're friendlier that way in rethinkdb
2018-05-16 16:43:10 -07:00
Noah Levitt
399c097c7c
travis-ci install warcprox from github
2018-05-16 15:48:29 -07:00
Noah Levitt
ac735639ff
incorporate urlcanon fix
2018-05-16 14:41:49 -07:00
Noah Levitt
338d2e48f9
update warcprox dependency to include recent fixes
2018-05-16 14:26:51 -07:00
Noah Levitt
b9b8dcd062
backward compatibility for old scope["surt"]
...
and make sure to store ssurt as string in rethinkdb
2018-05-16 14:19:23 -07:00
Noah Levitt
1572fd3ed6
missed a spot where is_permitted_by_robots needs monkeying
2018-05-15 16:52:48 -07:00
Noah Levitt
a8de9b70d1
handle new chrome cookie db schema
2018-05-15 11:41:02 -07:00
Noah Levitt
de1f240e25
describe scope rule conditions
...
plus a bunch of tweaks and fixes
2018-05-15 11:01:09 -07:00
Noah Levitt
a327cb626f
more explication of scoping
2018-05-14 17:31:45 -07:00
Noah Levitt
2cf474aa1d
update docs to match new seed ssurt behavior
2018-05-14 16:59:55 -07:00
Noah Levitt
fc05cac338
ok seriously tests
2018-05-14 15:38:28 -07:00
Noah Levitt
05f8ab3495
fix more tests for new approach sans scope['surt']
2018-05-14 15:38:28 -07:00
Noah Levitt
85a4757527
s/max_hops_off_surt/max_hops_off/
2018-05-14 15:38:28 -07:00
Noah Levitt
5ebd2fb709
new test of max_hops_off
2018-05-14 15:38:28 -07:00
Noah Levitt
b83d3cb9df
rename page.hops_off_surt to page.hops_off
2018-05-14 15:38:28 -07:00
Noah Levitt
60f2b99cc0
doublethink had a bug fix
2018-05-14 15:38:28 -07:00
Noah Levitt
526a4d718f
tests for new approach without scope['surt']
...
replaced by an accept rule (two rules in some cases of seed redirects)
2018-05-14 15:38:28 -07:00
Noah Levitt
245e27a21a
tests for new approach without scope['surt']
...
replaced by an accept rule (two rules in some cases of seed redirects)
2018-05-14 15:38:28 -07:00
Noah Levitt
f26712ce93
WIP add an accept rule instead of modifying surt
...
in place for seed redirects
2018-05-14 15:38:28 -07:00
Noah Levitt
98ce67ef36
WIP some words on scoping
2018-05-14 15:38:28 -07:00
Noah Levitt
88214236bb
WIP starting to flesh out "scoping" section
2018-05-14 15:38:28 -07:00
Noah Levitt
6df2c1cf22
WIP some explanation of automatic login
2018-05-14 15:38:28 -07:00
Noah Levitt
914289b414
WIP documentation!
2018-05-14 15:38:28 -07:00
Noah Levitt
a1af18230c
Merge pull request #103 from internetarchive/ARI-5671
...
instagram updates
2018-03-23 14:18:04 -07:00
Barbara Miller
426ca48554
less is more
2018-03-23 14:17:22 -07:00
Barbara Miller
51977908ec
uncomment; now tested
2018-03-20 10:39:14 -07:00
Barbara Miller
9e871a9f81
instagram umbraBehavior & vanishing elem fix
2018-03-20 10:22:55 -07:00
Noah Levitt
6aa8af9d80
Merge pull request #101 from galgeek/ARI-5617
...
repeatSameElement, firstMatchOnly, configurable interval timing, for ARI-5617
2018-03-19 16:36:52 -07:00
Barbara Miller
1e2e7213c8
better booleans for umbraBehavior
2018-03-19 16:31:23 -07:00
Barbara Miller
bc5a36e8a3
better booleans
2018-03-19 16:28:47 -07:00
Barbara Miller
745e6cc942
log behavior params better
2018-03-19 16:28:14 -07:00
Barbara Miller
ae6f72769a
better config names
2018-03-19 16:02:07 -07:00
Barbara Miller
74fc7cd102
update behaviors.yaml
2018-03-19 14:44:29 -07:00
Barbara Miller
cc207763d5
add onceOnly config; other tweaks
2018-03-19 14:44:29 -07:00
Barbara Miller
8f861389ba
amerciaspresidents.si.edu/gallery behavior
2018-03-19 14:44:29 -07:00
Barbara Miller
5dfb081bb4
skipIDcheck, default false / no / 0
2018-03-19 14:44:29 -07:00
Barbara Miller
8f12f0b0c0
better idCheck and configurable interval timing
2018-03-19 14:44:04 -07:00
Barbara Miller
c31f13e47f
add idCheck feature, default: true
2018-03-19 14:44:04 -07:00
Noah Levitt
8e273b2e6b
Merge pull request #100 from nlevitt/max-claimed-sites
...
reimplement max_claimed_sites
2018-03-15 15:05:46 -07:00
Noah Levitt
dc00f5de32
reimplement max_claimed_sites
...
Other approach was too slow and caused db contention.
New approach avoids (slow) rethinkdb join by copying the max_claimed_sites job
parameter to each of the job's sites. Uses rethinkdb fold() to count
claimed sites and enforce max_claimed_sites within a single query.
2018-03-15 12:57:49 -07:00
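Note on dc00f5de32: the fold-based counting described above can be illustrated in plain Python. This is only a sketch of the idea, not brozzler's actual query. The function name `claimable_sites` and the site fields `id`, `job_id`, `claimed`, and `max_claimed_sites` are assumptions made for the illustration, and `functools.reduce` stands in for rethinkdb's fold().

```python
from functools import reduce

def claimable_sites(sites, n=1):
    """Pick up to n unclaimed site ids without exceeding any job's limit.

    Hypothetical sketch: each site dict is assumed to carry its job's
    max_claimed_sites value, so no join against a jobs table is needed.
    """
    # Visit already-claimed sites first so per-job counts are known
    # before any unclaimed site is considered (sorted() is stable).
    ordered = sorted(sites, key=lambda s: not s["claimed"])

    def step(acc, site):
        counts, picked = acc
        job_id = site["job_id"]
        limit = site.get("max_claimed_sites")  # None means no limit
        current = counts.get(job_id, 0)
        if site["claimed"]:
            # Existing claim: count it against the job.
            counts = {**counts, job_id: current + 1}
        elif len(picked) < n and (limit is None or current < limit):
            # New claim: allowed only while the job is under its limit.
            counts = {**counts, job_id: current + 1}
            picked = picked + [site["id"]]
        return counts, picked

    _, picked = reduce(step, ordered, ({}, []))
    return picked

example_sites = [
    {"id": "a", "job_id": 1, "claimed": True, "max_claimed_sites": 2},
    {"id": "b", "job_id": 1, "claimed": False, "max_claimed_sites": 2},
    {"id": "c", "job_id": 1, "claimed": False, "max_claimed_sites": 2},
    {"id": "d", "job_id": 2, "claimed": False, "max_claimed_sites": None},
]
print(claimable_sites(example_sites, n=3))
```

Because the counting and the selection happen in one pass over the sites, the limit is enforced without a second lookup per job, which is the property the commit message attributes to the single-query approach.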
Noah Levitt
55701ae373
bump version number after merge
2018-03-08 16:49:28 -08:00
jkafader
7d61673d3e
Merge pull request #97 from nlevitt/max-claimed-sites
...
Max claimed sites
2018-03-08 16:48:31 -08:00
Noah Levitt
4daac3dfc5
fix timely time limit enforcement
...
by including current brozzling session duration in time accounting
2018-03-05 17:05:41 -08:00
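Note on 4daac3dfc5: the accounting change amounts to adding the in-progress session's duration to the time already recorded before comparing against the limit. A minimal sketch, assuming hypothetical names (`time_limit_reached`, `prior_seconds`, `session_start`) that are not taken from brozzler's code:

```python
import time

def time_limit_reached(prior_seconds, session_start, time_limit, now=None):
    """Return True once recorded time plus the current session hits the limit.

    prior_seconds: seconds accumulated in earlier, finished sessions
    session_start: epoch time at which the current session began
    time_limit:    limit in seconds, or None for no limit
    """
    if time_limit is None:
        return False
    now = time.time() if now is None else now
    # Include the current session so the limit is enforced promptly,
    # rather than only after the session has ended and been recorded.
    elapsed = prior_seconds + (now - session_start)
    return elapsed >= time_limit
```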
Noah Levitt
318ae13bcb
honor stop request before choosing proxy
...
makes test_warcprox_outage_resiliency pass again
2018-03-05 16:08:24 -08:00
Noah Levitt
a914fb8461
Merge pull request #99 from vbanos/chromium-single-process
...
Use single process model for chromium-browser
2018-03-05 12:06:20 -08:00
Vangelis Banos
171ce8d854
Use single process model for chromium-browser
...
By default chromium creates multiple renderer processes (each running
multiple threads) for each instance of a site the user visits. What we
see from `ps auxcf` output is the following:
```
\_ chromium-browse
\_ chromium-browse
| \_ chromium-browse
| \_ chromium-browse
| \_ chromium-browse
| \_ chromium-browse
```
Using the `--single-process` option, we run all renderers in the same
process, saving the overhead of running multiple processes. `ps auxcf`
output is the following:
```
\_ chromium-browse
\_ chromium-browse
\_ chromium-browse
```
Performance improves slightly, and I expect the effect to be larger in
large-scale Brozzler deployments.
The potential problem of `--single-process` is stability (if a renderer
crashes, the whole browser also crashes) but since we use very short-lived
instances of chromium, we don't worry about this.
Details on chromium process models:
https://www.chromium.org/developers/design-documents/process-models
2018-03-04 20:48:29 +00:00
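Note on 171ce8d854: a minimal sketch of how a launcher might add the `--single-process` option to the chromium command line. The helper name `chromium_args` and its parameters are hypothetical and do not reflect brozzler's actual launcher code; only the executable name and the flag come from the commit message.

```python
def chromium_args(executable="chromium-browser", single_process=True, extra=()):
    """Build an argument list for launching chromium (illustrative only)."""
    args = [executable]
    if single_process:
        # Run all renderers in the browser process. A renderer crash then
        # takes down the whole browser, which is acceptable for the
        # short-lived instances described in the commit message.
        args.append("--single-process")
    args.extend(extra)
    return args

print(chromium_args())
```

The resulting list can be passed to `subprocess.Popen` as-is; keeping the flag behind a parameter makes it easy to fall back to the default multi-process model if stability becomes a concern.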