mirror of https://github.com/internetarchive/brozzler.git (synced 2025-08-02 03:26:11 -04:00)
WIP some explanation of automatic login
parent 914289b414
commit 6df2c1cf22
1 changed files with 17 additions and 8 deletions
25
job-conf.rst
@@ -40,12 +40,11 @@ Example
 How inheritance works
 =====================
 
-Most of the available options apply to seeds. Such options can also be
-specified at the top level, in which case the seeds inherit the options. If
-an option is specified both at the top level and at the level of an individual
-seed, the results are merged with the seed-level value taking precedence in
-case of conflicts. It's probably easiest to make sense of this by way of an
-example.
+Most of the settings that apply to seeds can also be specified at the top
+level, in which case all seeds inherit those settings. If an option is
+specified both at the top level and at the level of an individual seed, the
+results are merged with the seed-level value taking precedence in case of
+conflicts. It's probably easiest to make sense of this by way of an example.
 
 In the example yaml above, ``warcprox_meta`` is specified at the top level and
 at the seed level for the seed http://one.example.org/. At the top level we
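
Note on the hunk above: the inheritance rule it describes (seed-level values are merged over top-level values and win on conflict) can be sketched in Python roughly as follows. This is only an illustration, not brozzler's code. The helper name ``merge_settings`` and the sample values are made up, and the sketch assumes that nested dictionaries are merged key by key while every other value type is simply replaced.

```python
from copy import deepcopy


def merge_settings(top_level, seed):
    """Return the seed's settings layered over the top-level settings.

    Hypothetical helper for illustration only. Assumption: nested
    dictionaries are merged key by key; any other value is replaced.
    """
    merged = deepcopy(top_level)
    for key, value in seed.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both levels define a dictionary: merge them recursively.
            merged[key] = merge_settings(merged[key], value)
        else:
            # Conflict or new key: the seed-level value takes precedence.
            merged[key] = deepcopy(value)
    return merged


# Made-up settings, loosely modelled on a job configuration.
top_level = {
    "time_limit": 60,
    "warcprox_meta": {"warc-prefix": "job1", "dedup-bucket": "job1"},
}
seed = {
    "url": "http://one.example.org/",
    "warcprox_meta": {"warc-prefix": "job1-seed1"},
}

effective = merge_settings(top_level, seed)
print(effective)
```

In this sketch the seed keeps the inherited ``time_limit`` and ``dedup-bucket`` but overrides ``warc-prefix``, and the top-level dictionary itself is left unmodified.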
@@ -117,7 +116,7 @@ seeds starving out other jobs.
 +------------------------+----------+---------+
 List of seeds. Each item in the list is a dictionary (associative array) which
 defines the seed. It must specify ``url`` (see below) and can additionally
-specify any *seed* settings.
+specify any seed settings.
 
 Seed-level-only settings
 ------------------------
@@ -131,7 +130,7 @@ settings, which can also be specified at the top level.
 +========+==========+=========+
 | string | yes      | *n/a*   |
 +--------+----------+---------+
-The seed url.
+The seed url. Crawling starts here.
 
 ``username``
 ~~~~~~~~~~~~
@@ -140,6 +139,8 @@ The seed url.
 +========+==========+=========+
 | string | no       | *none*  |
 +--------+----------+---------+
+If set, used to populate automatically detected login forms. See explanation at
+"password" below.
 
 ``password``
 ~~~~~~~~~~~~
@@ -148,6 +149,14 @@ The seed url.
 +========+==========+=========+
 | string | no       | *none*  |
 +--------+----------+---------+
+If set, used to populate automatically detected login forms. If ``username``
+and ``password`` are configured for a seed, brozzler will look for a login form
+on each page it crawls for that seed. A form that has a single text or email
+field (the username), a single password field (``<input type="password">``),
+and has ``method="POST"`` is considered to be a login form. The form may have
+other fields like checkboxes and hidden fields. For these, brozzler will leave
+the default values in place. Login form detection and submission happen after
+page load, then brozzling proceeds as usual.
 
 Seed-level / top-level settings
 -------------------------------
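
Note on the ``password`` text added by this commit: the login form heuristic it describes (exactly one text or email field, exactly one password field, and ``method="POST"``) can be illustrated with a small Python sketch. This is not brozzler's implementation, which acts on the page after it has loaded; the sketch only parses static HTML with the standard library. The names ``LoginFormFinder``, ``is_login_form`` and ``find_login_forms`` and the sample page are hypothetical, and treating an ``<input>`` without a ``type`` attribute as a text field is an assumption based on the HTML default.

```python
from html.parser import HTMLParser


class LoginFormFinder(HTMLParser):
    """Record, for each <form>, its action, method and <input> types."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form encountered
        self._current = None  # form being parsed, None when outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                # HTML forms default to GET when no method is given.
                "method": (attrs.get("method") or "get").lower(),
                "input_types": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # Assumption: a missing type counts as "text" (the HTML default).
            input_type = (attrs.get("type") or "text").lower()
            self._current["input_types"].append(input_type)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def is_login_form(form):
    """Apply the heuristic: POST, one text/email field, one password field."""
    types = form["input_types"]
    username_fields = sum(1 for t in types if t in ("text", "email"))
    password_fields = types.count("password")
    return (
        form["method"] == "post"
        and username_fields == 1
        and password_fields == 1
    )


def find_login_forms(html):
    parser = LoginFormFinder()
    parser.feed(html)
    parser.close()
    return [form for form in parser.forms if is_login_form(form)]


# Made-up page: a search form, a login form, and a signup form.
PAGE = """
<form action="/search" method="get"><input type="text" name="q"></form>
<form action="/login" method="POST">
  <input type="hidden" name="csrf" value="abc">
  <input type="email" name="user">
  <input type="password" name="pass">
  <input type="checkbox" name="remember" checked>
</form>
<form action="/signup" method="post">
  <input type="text" name="user">
  <input type="password" name="p1">
  <input type="password" name="p2">
</form>
"""

login_forms = find_login_forms(PAGE)
print([form["action"] for form in login_forms])
```

Only the ``/login`` form matches here: the search form uses GET, and the signup form has two password fields. Hidden fields and checkboxes do not affect the match, in line with the description that such fields keep their default values.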