WIP starting to flesh out "scoping" section

Noah Levitt 2018-03-19 17:23:49 -07:00
parent 6df2c1cf22
commit 88214236bb


@ -251,12 +251,26 @@ becomes::
+============+==========+===========+
| dictionary | no | ``false`` |
+------------+----------+-----------+
Scope rules. *TODO*
Scope specification for the seed. See the "Scoping" section which follows.
Scoping
=======
*TODO* explanation of scoping and scope rules
The scope of a seed determines which links are scheduled for crawling and which
are not. Example::

    scope:
      accepts:
      - parent_url_regex: ^https?://(www\.)?youtube.com/(user|channel)/.*$
        regex: ^https?://(www\.)?youtube.com/watch\?.*$
      - surt: +http://(com,google,video,
      - surt: +http://(com,googlevideo,
      blocks:
      - domain: youngscholars.unimelb.edu.au
        substring: wp-login.php?action=logout
      - domain: malware.us
      max_hops: 20
      max_hops_off_surt: 0
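
To make the example above concrete, here is a minimal Python sketch of how a scope like this could be evaluated for one candidate link. It is not the crawler's actual implementation. The function names ``rule_matches`` and ``in_scope`` are hypothetical, and the semantics are assumptions made for illustration: every condition within a rule must match, a matching block overrides any accept, and ``max_hops`` caps the hop count. ``surt`` rules and ``max_hops_off_surt`` are left out, so rules that use them never match here.

```python
import re
from urllib.parse import urlsplit

# Rule keys this sketch understands; "surt" is deliberately not handled.
SUPPORTED_KEYS = {"domain", "substring", "regex", "parent_url_regex"}


def rule_matches(rule, url, parent_url=None):
    """Return True only if every condition in the rule matches (assumed semantics)."""
    if not rule or set(rule) - SUPPORTED_KEYS:
        return False  # empty rule, or a key this sketch does not implement
    host = urlsplit(url).hostname or ""
    if "domain" in rule:
        domain = rule["domain"]
        # Assumption: a domain condition also covers its subdomains.
        if host != domain and not host.endswith("." + domain):
            return False
    if "substring" in rule and rule["substring"] not in url:
        return False
    if "regex" in rule and not re.search(rule["regex"], url):
        return False
    if "parent_url_regex" in rule:
        if parent_url is None or not re.search(rule["parent_url_regex"], parent_url):
            return False
    return True


def in_scope(scope, url, parent_url=None, hops=0):
    """Decide whether a link would be scheduled, under the assumptions above."""
    max_hops = scope.get("max_hops")
    if max_hops is not None and hops > max_hops:
        return False
    if any(rule_matches(r, url, parent_url) for r in scope.get("blocks", [])):
        return False  # assumption: blocks take precedence over accepts
    return any(rule_matches(r, url, parent_url) for r in scope.get("accepts", []))


# The example scope as a Python dict, minus the surt rules.
example_scope = {
    "accepts": [
        {
            "parent_url_regex": r"^https?://(www\.)?youtube.com/(user|channel)/.*$",
            "regex": r"^https?://(www\.)?youtube.com/watch\?.*$",
        },
    ],
    "blocks": [
        {
            "domain": "youngscholars.unimelb.edu.au",
            "substring": "wp-login.php?action=logout",
        },
        {"domain": "malware.us"},
    ],
    "max_hops": 20,
}
```

Under these assumptions, a ``watch`` link found on a YouTube user or channel page is accepted, the same link with no matching parent page is not, and anything on ``malware.us`` is rejected regardless of the accept rules.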
Scope settings
--------------
@ -285,6 +299,21 @@ Scope settings
| list | no | *none* |
+------+----------+---------+
``max_hops``
~~~~~~~~~~~~

+--------+----------+---------+
| type   | required | default |
+========+==========+=========+
| number | no       | *none*  |
+--------+----------+---------+

``max_hops_off_surt``
~~~~~~~~~~~~~~~~~~~~~

+--------+----------+---------+
| type   | required | default |
+========+==========+=========+
| number | no       | 0       |
+--------+----------+---------+
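
The defaults in the two tables above can be read as follows. This is a small Python sketch with hypothetical names (``SCOPE_DEFAULTS``, ``with_scope_defaults``), not code from the crawler. It assumes that an unset ``max_hops`` is represented as ``None``, which is one reading of the *none* default.

```python
# Defaults taken from the tables above; None stands in for "none" (an assumption).
SCOPE_DEFAULTS = {"max_hops": None, "max_hops_off_surt": 0}


def with_scope_defaults(scope):
    """Return a copy of the scope dict with any missing hop settings filled in."""
    merged = dict(SCOPE_DEFAULTS)
    merged.update(scope or {})
    return merged
```

For example, ``with_scope_defaults({"max_hops": 20})`` keeps ``max_hops`` at 20 and fills in ``max_hops_off_surt`` as 0.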
Scope rule settings
-------------------