brozzler

mirror of https://github.com/internetarchive/brozzler.git synced 2025-02-24 16:49:56 -05:00

Author	SHA1	Message	Date
Noah Levitt	a327cb626f	more explication of scoping	2018-05-14 17:31:45 -07:00
Noah Levitt	2cf474aa1d	update docs to match new seed ssurt behavior	2018-05-14 16:59:55 -07:00
Noah Levitt	fc05cac338	ok seriously tests	2018-05-14 15:38:28 -07:00
Noah Levitt	05f8ab3495	fix more tests for new approach sans scope['surt']	2018-05-14 15:38:28 -07:00
Noah Levitt	85a4757527	s/max_hops_off_surt/max_hops_off/	2018-05-14 15:38:28 -07:00
Noah Levitt	5ebd2fb709	new test of max_hops_off	2018-05-14 15:38:28 -07:00
Noah Levitt	b83d3cb9df	rename page.hops_off_surt to page.hops_off	2018-05-14 15:38:28 -07:00
Noah Levitt	60f2b99cc0	doublethink had a bug fix	2018-05-14 15:38:28 -07:00
Noah Levitt	526a4d718f	tests for new approach without scope['surt'] replaced by an accept rule (two rules in some cases of seed redirects)	2018-05-14 15:38:28 -07:00
Noah Levitt	245e27a21a	tests for new approach without of scope['surt'] replaced by an accept rule (two rules in some cases of seed redirects)	2018-05-14 15:38:28 -07:00
Noah Levitt	f26712ce93	WIP add an accept rule instead of modifying surt in place for seed redirects	2018-05-14 15:38:28 -07:00
Noah Levitt	98ce67ef36	WIP some words on scoping	2018-05-14 15:38:28 -07:00
Noah Levitt	88214236bb	WIP starting to flesh out "scoping" section	2018-05-14 15:38:28 -07:00
Noah Levitt	6df2c1cf22	WIP some explanation of automatic login	2018-05-14 15:38:28 -07:00
Noah Levitt	914289b414	WIP documentation!	2018-05-14 15:38:28 -07:00
Noah Levitt	a1af18230c	Merge pull request #103 from internetarchive/ARI-5671 instagram updates	2018-03-23 14:18:04 -07:00
Barbara Miller	426ca48554	less is more	2018-03-23 14:17:22 -07:00
Barbara Miller	51977908ec	uncomment; now tested	2018-03-20 10:39:14 -07:00
Barbara Miller	9e871a9f81	instagram umbraBehavior & vanishing elem fix	2018-03-20 10:22:55 -07:00
Noah Levitt	6aa8af9d80	Merge pull request #101 from galgeek/ARI-5617 repeatSameElement, firstMatchOnly, configurable interval timing, for ARI-5617	2018-03-19 16:36:52 -07:00
Barbara Miller	1e2e7213c8	better booleans for umbraBehavior	2018-03-19 16:31:23 -07:00
Barbara Miller	bc5a36e8a3	better booleans	2018-03-19 16:28:47 -07:00
Barbara Miller	745e6cc942	log behavior params better	2018-03-19 16:28:14 -07:00
Barbara Miller	ae6f72769a	better config names	2018-03-19 16:02:07 -07:00
Barbara Miller	74fc7cd102	update behaviors.yaml	2018-03-19 14:44:29 -07:00
Barbara Miller	cc207763d5	add onceOnly config; other tweaks	2018-03-19 14:44:29 -07:00
Barbara Miller	8f861389ba	amerciaspresidents.si.edu/gallery behavior	2018-03-19 14:44:29 -07:00
Barbara Miller	5dfb081bb4	skipIDcheck, default false / no / 0	2018-03-19 14:44:29 -07:00
Barbara Miller	8f12f0b0c0	better idCheck and configurable interval timing	2018-03-19 14:44:04 -07:00
Barbara Miller	c31f13e47f	add idCheck feature, default: true	2018-03-19 14:44:04 -07:00
Noah Levitt	8e273b2e6b	Merge pull request #100 from nlevitt/max-claimed-sites reimplement max_claimed_sites	2018-03-15 15:05:46 -07:00
Noah Levitt	dc00f5de32	reimplement max_claimed_sites Other approach was too slow and caused db contention. New approach avoids (slow) rethinkdb join by max_claimed_sites job parameter to each of the job's sites. Uses rethinkdb fold() to count claimed sites and enforce max_claimed_sites within a single query.	2018-03-15 12:57:49 -07:00
Noah Levitt	55701ae373	bump version number after merge	2018-03-08 16:49:28 -08:00
jkafader	7d61673d3e	Merge pull request #97 from nlevitt/max-claimed-sites Max claimed sites	2018-03-08 16:48:31 -08:00
Noah Levitt	4daac3dfc5	fix timely time limit enforcement by including current brozzling session duration in time accounting	2018-03-05 17:05:41 -08:00
Noah Levitt	318ae13bcb	honor stop request before choosing proxy makes test_warcprox_outage_resiliency pass again	2018-03-05 16:08:24 -08:00
Noah Levitt	a914fb8461	Merge pull request #99 from vbanos/chromium-single-process Use single process model for chromium-browser	2018-03-05 12:06:20 -08:00
Vangelis Banos	171ce8d854	Use single process model for chromium-browser By default chromium creates multiple renderer processes (each running multiple threads) for each instance of a site the user visits. What we see from `ps auxcf` output is the following: ``` \_ chromium-browse \_ chromium-browse \| \_ chromium-browse \| \_ chromium-browse \| \_ chromium-browse \| \_ chromium-browse ``` Using the `--single-process` option, we run all renderers in the same process, saving the overhead of running multiple processes. `ps auxcf` output is the following: ``` \_ chromium-browse \_ chromium-browse \_ chromium-browse ``` Performance is improved a bit and I guess that using this in large scale Brozzler deployments will have even better performance effects. The potential problem of `--single-process` is stability (if a renderer crashes, the whole browser also crashes) but since we use very short-lived instances of chromium, we don't worry about this. Details on chromium process models: https://www.chromium.org/developers/design-documents/process-models	2018-03-04 20:48:29 +00:00
Noah Levitt	2639d7b991	fix query to make tests pass?	2018-03-02 16:30:35 -08:00
Noah Levitt	f9834ca77d	bump after merge	2018-03-02 11:51:50 -08:00
Noah Levitt	a0710b605c	Merge pull request #96 from vbanos/jinja2-auto-reload Disable Jinja2 template auto_reload for higher performance	2018-03-02 11:51:11 -08:00
Noah Levitt	f26d711a89	new job setting max_claimed_sites Puts a cap on the number of sites belonging to a given job that can be brozzled simultaneously across the cluster. Addresses the problem of a job with many seeds starving out other jobs. For AITFIVE-1578.	2018-03-01 17:17:54 -08:00
Noah Levitt	d7512fbeb6	move time limit enforcement now it's next to stop request enforcement which makes more sense and supports more timely action	2018-03-01 11:28:30 -08:00
Vangelis Banos	ce473897a3	Disable Jinja2 template auto_reload for higher performance Every time we run a JS behavior, we load a Jinja2 template. By default, Jinja2 has option `auto_reload=True`. This mean that every time a template is requested the loader checks if the source file changed and if yes, it will reload the template. For higher performance it’s possible to disable that. Also note that Jinja caches 400 templates by default. Ref: http://jinja.pocoo.org/docs/2.10/api/ In Brozzler, we don't make changes to JS templates while the system is running. So, there is no point in having auto_reload=True.	2018-02-25 20:24:25 +00:00
Noah Levitt	b438cdd33e	Merge pull request #94 from vbanos/json-compact Send more compact JSON to browser	2018-02-21 09:53:16 -08:00
Vangelis Banos	646faa8ab0	Invalid syntax in WebsockReceiverThread._javascript_dialog_open Fix `)` position	2018-02-21 07:34:36 +00:00
Noah Levitt	eda5133301	Merge pull request #95 from vbanos/configurable-wait-interval Make Browser._wait_for sleep time a varible	2018-02-20 15:05:34 -08:00
Vangelis Banos	e2128b42f0	Make Browser._wait_for sleep time a varible Useful to be able to tweak this value in other apps using `Browser`.	2018-02-18 23:08:51 +00:00
Vangelis Banos	d6c707d941	Send more compact JSON to browser Use JSON separators without spaces to reduce json size. Its already used elsewhere in Brozzler but not here.	2018-02-18 19:03:36 +00:00
Noah Levitt	0d605d0a88	Merge pull request #90 from vbanos/chrome-flags-performance Add chromium CLI flags to improve capture performance	2018-02-15 10:54:34 -08:00
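
Commit d6c707d941 above describes using JSON separators without spaces to reduce the size of messages sent to the browser. The following is a minimal sketch of that idea using only the standard-library `json` module. It is not brozzler's actual code: the `message` dict is a hypothetical payload chosen for illustration.

```python
import json

# Hypothetical payload for illustration only; not taken from brozzler.
message = {
    "id": 1,
    "method": "Page.navigate",
    "params": {"url": "https://example.com/"},
}

# json.dumps defaults to separators (", ", ": "), which puts a space
# after every comma and colon.
default_encoding = json.dumps(message)

# Passing separators without spaces yields the most compact encoding.
compact_encoding = json.dumps(message, separators=(",", ":"))

print(len(default_encoding), len(compact_encoding))
```

Both strings decode to the same object. The saving is one byte per separator, so it grows with the number of keys and list items in the message.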

1404 Commits