Some sites don't let you log in until you click a button that opens a hidden login modal.
This update to the login code lets Brozzler click on every element that we think is related to opening a login modal.
Then, if there isn't a regular login form, we attempt to fill out nonstandard login forms.
The test_try_login test has been expanded to cover the new type of login form we now support.
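A minimal sketch of the "click everything that looks like it opens a login modal" idea, assuming elements have already been extracted from the page as dicts with a tag name, visible text and a few attributes. The keyword regex, `is_login_trigger` and `login_triggers` are hypothetical illustrations, not Brozzler's actual heuristic.

```python
import re

# Hypothetical keyword list; Brozzler's real heuristic may use different terms.
LOGIN_TRIGGER_RE = re.compile(r"\b(log ?in|sign ?in)\b", re.IGNORECASE)

# Tags that commonly act as clickable modal openers.
CLICKABLE_TAGS = ("a", "button", "input", "div", "span")


def is_login_trigger(tag, text="", attrs=None):
    """Guess whether a clickable element is likely to open a login modal."""
    attrs = attrs or {}
    if tag.lower() not in CLICKABLE_TAGS:
        return False
    # Check the visible text plus attributes that often carry a label.
    haystack = " ".join(
        [text] + [attrs.get(k, "") for k in ("id", "class", "aria-label", "value")]
    )
    return bool(LOGIN_TRIGGER_RE.search(haystack))


def login_triggers(elements):
    """Return every element that looks like a login-modal trigger, in page order."""
    return [el for el in elements if is_login_trigger(**el)]
```

Clicking all candidates (rather than only the first) matches the description above: a page may have several "Sign in" links, and only one of them might actually reveal the form.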
Add unit tests for the code that detects and tries to use login forms
automatically (`Browser.try_login`).
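For illustration, here is a minimal sketch of the kind of detection such tests exercise: distinguishing a regular login form (a `<form>` containing a password input) from a formless layout (a password input outside any `<form>`). It uses only the standard-library `html.parser`; `LoginFormFinder` and `classify_login` are hypothetical names, and `Browser.try_login` itself works in the browser, not like this.

```python
from html.parser import HTMLParser


class LoginFormFinder(HTMLParser):
    """Record, for each <form>, whether it contains a password input."""

    def __init__(self):
        super().__init__()
        self.forms = []                # one dict per <form> seen
        self._current = None           # the <form> currently open, if any
        self.orphan_password = False   # password input outside any <form>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "has_password": False}
            self.forms.append(self._current)
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            if self._current is not None:
                self._current["has_password"] = True
            else:
                self.orphan_password = True

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def classify_login(html):
    """Return "form", "formless", or None for a chunk of HTML."""
    finder = LoginFormFinder()
    finder.feed(html)
    finder.close()
    if any(f["has_password"] for f in finder.forms):
        return "form"
    if finder.orphan_password:
        return "formless"
    return None
```

Self-closing tags such as `<input type="password"/>` are handled because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.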
Add `htdocs/favicon.ico` because the browser requests it automatically
when it loads pages from the test web server, and its absence causes a
"missing" warning.
Create a new directory, `tests/htdocs/site11`, for login-related test
HTML files.
https://github.com/internetarchive/brozzler/pull/183#issuecomment-560562807
"We've had a number of cases where a page kept failing for one reason or
another, and it's bad. We can end up with tons of duplicate captures,
the crawl is not able to make progress, and the overall performance of
the cluster is impacted in cases like yours, where a browser is sitting
there doing nothing for five minutes."
We used `self.headers.getheader`, which no longer works: in Python 3 the
headers object is an `http.client.HTTPMessage`, which has no `getheader`
method. We replace it with `self.headers.get`.
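A small sketch of the replacement call. In Python 3, `http.server` builds `self.headers` with `http.client.parse_headers`, so parsing a raw header block the same way gives an equivalent object to experiment with. The `Authorization` header is only a hypothetical example value.

```python
import http.client
import io

# Build the same kind of object http.server exposes as self.headers
# (an http.client.HTTPMessage, a subclass of email.message.Message).
raw = b"Host: localhost\r\nAuthorization: Basic dXNlcjpwYXNz\r\n\r\n"
headers = http.client.parse_headers(io.BytesIO(raw))

# Python 3: use .get(); lookup is case-insensitive and returns None
# (or the supplied default) when the header is absent.
auth = headers.get("Authorization")
missing = headers.get("Cookie", "")
```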
We change the code to write bytes to `self.wfile`, because writing a
`str` or `None` raises an exception.
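A minimal sketch of a test handler that only ever writes bytes, assuming a simple in-memory page table (`PAGES` and `BytesHandler` are hypothetical, not the actual test server). `self.wfile.write()` raises `TypeError` for `str` or `None`, so text is encoded first and a missing page gets an error response instead of a `None` body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/": "<html><body>hello</body></html>"}  # hypothetical content


class BytesHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        text = PAGES.get(self.path)
        if text is None:
            # Never pass None to wfile.write(); send an error page instead.
            self.send_error(404)
            return
        body = text.encode("utf-8")  # wfile expects bytes, not str
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch(path):
    """Start the server on a free port, fetch one path, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), BytesHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        resp = conn.getresponse()
        status, data = resp.status, resp.read()
        conn.close()
        return status, data
    finally:
        server.shutdown()
        server.server_close()
```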
enforce time limit based on all the time that a site was in active
rotation, including time it spent waiting for its turn to be brozzled;
this undoes the change from b9640b8a30c934, which now seems to have
been the wrong decision (brozzler jobs with many seeds and low
max_claimed_sites were hanging around forever)
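A minimal sketch of "time in active rotation", assuming the site keeps a hypothetical list of `{"start": ..., "stop": ...}` intervals (epoch seconds) that opens when the site enters rotation and closes only when it leaves. Because the interval stays open while the site waits for a browser, waiting time counts toward the limit.

```python
import time


def elapsed(starts_and_stops, now=None):
    """Total seconds a site has spent in active rotation.

    Hypothetical format: each entry is {"start": t0, "stop": t1 or None};
    an open interval (stop is None) counts up to `now`. Time spent waiting
    for a turn to be brozzled is included because the interval stays open.
    """
    now = time.time() if now is None else now
    total = 0.0
    for interval in starts_and_stops:
        stop = interval["stop"] if interval["stop"] is not None else now
        total += stop - interval["start"]
    return total


def time_limit_exceeded(starts_and_stops, time_limit, now=None):
    """A falsy time_limit (None or 0) means no limit."""
    return bool(time_limit) and elapsed(starts_and_stops, now) > time_limit
```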
Puts a cap on the number of sites belonging to a given job that can be brozzled
simultaneously across the cluster. Addresses the problem of a job with many
seeds starving out other jobs. For AITFIVE-1578.
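A minimal sketch of the per-job cap, assuming candidate sites arrive as hypothetical `(site_id, job_id)` pairs in priority order and that the job IDs of sites already claimed anywhere in the cluster are known. `claimable_sites` is an illustrative name, not Brozzler's actual claiming query.

```python
from collections import Counter


def claimable_sites(candidates, claimed, max_claimed_sites):
    """Pick which candidate sites may be claimed now.

    candidates: list of (site_id, job_id) in priority order.
    claimed: job_ids of sites already claimed anywhere in the cluster.
    max_claimed_sites: dict job_id -> cap; jobs without an entry are uncapped.
    """
    counts = Counter(claimed)
    chosen = []
    for site_id, job_id in candidates:
        cap = max_claimed_sites.get(job_id)
        if cap is not None and counts[job_id] >= cap:
            continue  # job already has its share; let other jobs' sites through
        counts[job_id] += 1
        chosen.append(site_id)
    return chosen
```

With a cap, a job with many seeds can no longer fill every browser slot, so sites from other jobs (here `b1`) still get claimed.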
* master:
back to dev version number
commit for beta release
this should fix travis build?
fix tests
update brozzler-easy for current warcprox api
simpleclicks for minutes PDF
--warcprox-auto: distribute assigned sites evenly
When running with --warcprox-auto, choose the warcprox instance with
the fewest assigned sites, instead of the one with the lowest load in
the service registry. In practice we often start brozzling a whole bunch
of sites at approximately the same time, and because it takes time for
that to affect the "load" reported by warcprox instances, sites end up
being distributed very unevenly.
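A minimal sketch of why counting assigned sites avoids the lag problem, assuming hypothetical instance IDs and an in-memory count of sites per instance. Each assignment bumps the count immediately, so a burst of new sites spreads evenly, whereas reported load would not change until the instances had started work.

```python
def choose_warcprox(instances, assigned_counts):
    """Return the instance with the fewest assigned sites.

    instances: list of instance ids (e.g. "host:port"), in registry order.
    assigned_counts: dict instance -> number of sites currently assigned.
    Ties go to the instance listed first (min() keeps the first minimum).
    """
    return min(instances, key=lambda inst: assigned_counts.get(inst, 0))


def assign_batch(site_ids, instances, assigned_counts):
    """Assign a burst of sites one by one, updating counts as we go."""
    counts = dict(assigned_counts)
    assignment = {}
    for site_id in site_ids:
        inst = choose_warcprox(instances, counts)
        assignment[site_id] = inst
        counts[inst] = counts.get(inst, 0) + 1  # visible to the next choice at once
    return assignment, counts
```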