mirror of
https://github.com/internetarchive/brozzler.git
synced 2025-02-23 16:19:49 -05:00
WIP starting to flesh out "scoping" section
This commit is contained in:
parent
6df2c1cf22
commit
88214236bb
33
job-conf.rst
@ -251,12 +251,26 @@ becomes::
|
|||||||
+============+==========+===========+
|
+============+==========+===========+
|
||||||
| dictionary | no | ``false`` |
|
| dictionary | no | ``false`` |
|
||||||
+------------+----------+-----------+
|
+------------+----------+-----------+
|
||||||
Scope rules. *TODO*
|
Scope specification for the seed. See the "Scoping" section which follows.
|
||||||
|
|
||||||
Scoping
|
Scoping
|
||||||
=======
|
=======
|
||||||
|
|
||||||
*TODO* explanation of scoping and scope rules
|
The scope of a seed determines which links are scheduled for crawling and which
|
||||||
|
are not. Example::
|
||||||
|
|
||||||
|
    scope:
|
||||||
|
      accepts:
|
||||||
|
      - parent_url_regex: ^https?://(www\.)?youtube.com/(user|channel)/.*$
|
||||||
|
        regex: ^https?://(www\.)?youtube.com/watch\?.*$
|
||||||
|
      - surt: +http://(com,google,video,
|
||||||
|
      - surt: +http://(com,googlevideo,
|
||||||
|
      blocks:
|
||||||
|
      - domain: youngscholars.unimelb.edu.au
|
||||||
|
        substring: wp-login.php?action=logout
|
||||||
|
      - domain: malware.us
|
||||||
|
      max_hops: 20
|
||||||
|
      max_hops_off_surt: 0
|
||||||
|
|
||||||
Scope settings
|
Scope settings
|
||||||
--------------
|
--------------
|
||||||
@ -285,6 +299,21 @@ Scope settings
|
|||||||
| list | no | *none* |
|
| list | no | *none* |
|
||||||
+------+----------+---------+
|
+------+----------+---------+
|
||||||
|
|
||||||
|
``max_hops``
|
||||||
|
~~~~~~~~~~~~
|
||||||
|
+--------+----------+---------+
|
||||||
|
| type | required | default |
|
||||||
|
+========+==========+=========+
|
||||||
|
| number | no | *none* |
|
||||||
|
+--------+----------+---------+
|
||||||
|
|
||||||
|
``max_hops_off_surt``
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
+--------+----------+---------+
|
||||||
|
| type | required | default |
|
||||||
|
+========+==========+=========+
|
||||||
|
| number | no | 0 |
|
||||||
|
+--------+----------+---------+
|
||||||
|
|
||||||
Scope rule settings
|
Scope rule settings
|
||||||
-------------------
|
-------------------
|
||||||
|
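The scope example added in this commit lists rules that combine several conditions (``regex``, ``parent_url_regex``, ``domain``, ``substring``). The commit does not yet document how those conditions are evaluated, so the following is only a rough sketch of one plausible reading: a rule applies when every condition it contains matches. The function name ``rule_applies`` is hypothetical and not part of brozzler, and ``surt`` conditions are left out because their matching is not described here.

```python
import re
from urllib.parse import urlsplit


def rule_applies(rule, url, parent_url=None):
    """Hypothetical helper: True if every condition present in `rule` matches.

    Assumes AND semantics across conditions; `surt` conditions are ignored.
    """
    if "regex" in rule and not re.search(rule["regex"], url):
        return False
    if "parent_url_regex" in rule:
        # A rule that constrains the parent page cannot match without one.
        if parent_url is None or not re.search(rule["parent_url_regex"], parent_url):
            return False
    if "substring" in rule and rule["substring"] not in url:
        return False
    if "domain" in rule:
        host = urlsplit(url).hostname or ""
        domain = rule["domain"]
        # Assumed: a domain condition also covers its subdomains.
        if host != domain and not host.endswith("." + domain):
            return False
    return True


# Rules copied from the example in the diff.
accept_rule = {
    "parent_url_regex": r"^https?://(www\.)?youtube.com/(user|channel)/.*$",
    "regex": r"^https?://(www\.)?youtube.com/watch\?.*$",
}
block_rules = [
    {"domain": "youngscholars.unimelb.edu.au",
     "substring": "wp-login.php?action=logout"},
    {"domain": "malware.us"},
]
```

Under this reading, a ``watch`` link would be accepted only when it was found on a YouTube user or channel page, and the logout link would be blocked only on the named domain.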