Some sites don't allow you to log in without first clicking a button that opens a hidden login modal.
This update to the login code allows Brozzler to click on all elements that we think are related to opening a login modal.
Then, if the page has no regular login form, we attempt to fill out nonstandard form layouts.
The test_try_login test has been expanded for the new type of login form we are supporting.
Add unit tests for the code that detects and tries to use login forms
automatically (`Browser.try_login`).
Add `htdocs/favicon.ico` because it is loaded automatically when the
browser tries to use the test web server and it causes a "missing"
warning.
Create a new dir `tests/htdocs/site11` which is used for login-related
test HTML files.
Currently, when we run `Browser.browse_page`, we run JS behaviors after
we navigate to a page regardless of its status.
The page may have returned a client error such as not found (4xx) or a
server error (5xx).
In that case, we could skip running behaviors to save time and
resources.
With this PR, we add a new var to store the HTTP status of the navigated
page in `WebsockReceiverThread.page_status`. We use this in
`Browser.browse_page` to skip behaviors and outlink and hashtag
extraction when the page status is 4xx/5xx.
Note that we don't skip screenshots as it could be useful to have a
picture of an error page in some cases.
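The decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual `Browser.browse_page` code: the helper names `is_error_status` and `steps_to_run` and the step labels are hypothetical, and the order of steps is only illustrative.

```python
from typing import List, Optional


def is_error_status(page_status: Optional[int]) -> bool:
    """Return True when the navigated page answered with HTTP 4xx or 5xx."""
    # page_status may be None if no status was recorded for the navigation
    # (an assumption of this sketch); treat that as "not an error".
    return page_status is not None and 400 <= page_status <= 599


def steps_to_run(page_status: Optional[int]) -> List[str]:
    """List the post-navigation steps to perform for a given page status."""
    # Screenshots are always taken: a picture of an error page can be useful.
    steps = ["screenshot"]
    if not is_error_status(page_status):
        # Behaviors and outlink/hashtag extraction only run for non-error pages.
        steps += ["behaviors", "outlinks", "hashtags"]
    return steps
```

For example, `steps_to_run(404)` yields only the screenshot step, while `steps_to_run(200)` yields all four.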
https://github.com/internetarchive/brozzler/pull/183#issuecomment-560562807
"We've had a number of cases where a page kept failing for one reason or
another, and it's bad. We can end up with tons of duplicate captures,
the crawl is not able to make progress, and the overall performance of
the cluster is impacted in cases like yours, where a browser is sitting
there doing nothing for five minutes."
If the ENV var `BROZZLER_EXTRA_CHROME_ARGS` is set, pass its contents as
extra Chromium CLI options.
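A minimal sketch of how such an environment variable could be turned into extra command-line arguments. The helper names `extra_chrome_args` and `build_chrome_command` are hypothetical, and the use of `shlex.split` (so that quoted values survive) is an assumption of this sketch, not necessarily how Brozzler splits the value.

```python
import os
import shlex
from typing import List, Mapping


def extra_chrome_args(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Parse BROZZLER_EXTRA_CHROME_ARGS into a list of CLI arguments."""
    raw = environ.get("BROZZLER_EXTRA_CHROME_ARGS", "")
    # shlex.split handles shell-style quoting; an unset or empty value
    # yields an empty list.
    return shlex.split(raw)


def build_chrome_command(
    chrome_exe: str,
    base_args: List[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Append the extra arguments after the base ones (hypothetical helper)."""
    return [chrome_exe, *base_args, *extra_chrome_args(environ)]
```

With `BROZZLER_EXTRA_CHROME_ARGS="--no-sandbox --disable-gpu"`, the resulting command gains those two options at the end; with the variable unset, the command is unchanged.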
Remove the `--no-sandbox` option. It's not good from a security point of
view.
When trying to run Brozzler in docker, we get the following error:
```
Failed to move to new namespace: PID namespaces supported, Network
namespace supported, but failed: errno = Operation not permitted
Trace/breakpoint trap
```
This happens because Chromium uses sandboxing for increased security by
default, and it's not supported when running in a container.
Adding the Chromium option `--no-sandbox` fixes the problem.
This issue is common; there are various reports of it, such as:
https://github.com/Zenika/alpine-chrome/issues/33