1724 Commits

Author SHA1 Message Date
Barbara Miller
faf176793d handle more/better browsing timeouts 2019-10-04 17:40:01 -07:00
Barbara Miller
4fc7b612a5 Merge branch 'ARI-5995-instagram' into qa 2019-10-04 17:29:57 -07:00
Barbara Miller
be40a0b56f try only closeSelector 2019-10-04 17:29:39 -07:00
Barbara Miller
3226dfef5c Merge branch 'ARI-5995-instagram' of github.com:galgeek/brozzler into ARI-5995-instagram 2019-10-04 15:22:38 -07:00
Barbara Miller
a4835f5de7 Merge branch 'ARI-5995-instagram' into qa 2019-10-01 16:36:25 -07:00
Barbara Miller
84b99aec33 open/close, then click through? 2019-10-01 16:35:53 -07:00
Barbara Miller
d6af8d3145 skip downloading videos from instagram user page 2019-10-01 16:08:53 -07:00
Barbara Miller
bc2d4903e8 update copyright 2019-10-01 16:08:53 -07:00
Barbara Miller
dd2921af69 Merge remote-tracking branch 'upstream/master' into qa 2019-10-01 16:06:25 -07:00
Barbara Miller
bbd2bd71bf handle timeout trying to extract tertiary assets 2019-10-01 15:43:07 -07:00
Noah Levitt
85e6027838 bump version after merge 2019-09-27 10:40:59 -07:00
Noah Levitt
996070b35c
Merge pull request #167 from vbanos/console-debug-only
Enable Console and Runtime outputs only when debugging
2019-09-27 10:40:17 -07:00
Vangelis Banos
fed5e6b741 Enable Console and Runtime outputs only when debugging
When capturing a page, we receive a LOT of messages from chrome.
Examining these messages, we see that we can reduce them a bit to speed
up Brozzler.

We always use `Console.enable` which returns all browser console output.
Also, we always use `Runtime.enable`. Doc says:
https://chromedevtools.github.io/devtools-protocol/1-3/Runtime#method-enable

Enables reporting of execution contexts creation by means of
executionContextCreated event. When the reporting gets enabled the event
will be sent immediately for each existing execution context.

These outputs are useful when debugging but not in production.
If we disable them, we reduce the websocket traffic and improve
performance. With this PR, we enable them only when the current logging
level is `DEBUG`.

Counting the number of messages before and after the change, we see
improvements like:

https://www.gnome.org/technologies/ 220 -> 202 messages.

https://www.whitehouse.gov/issues/budget-spending/  203 -> 189 messages
2019-09-27 13:24:06 +00:00
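
The idea in fed5e6b741 (send `Console.enable` and `Runtime.enable` only when the logging level is `DEBUG`) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `domains_to_enable` is hypothetical, and treating `Network.enable` and `Page.enable` as always required is an assumption made only for the example.

```python
import logging


def domains_to_enable(logger):
    """Return the DevTools 'enable' methods to send when connecting.

    Hypothetical helper: the always-on list below is an assumption,
    not taken from the brozzler source.
    """
    methods = ["Network.enable", "Page.enable"]
    # Console/Runtime events are chatty, so request them only when
    # the logger would actually emit DEBUG records.
    if logger.isEnabledFor(logging.DEBUG):
        methods += ["Console.enable", "Runtime.enable"]
    return methods


debug_logger = logging.getLogger("sketch.debug")
debug_logger.setLevel(logging.DEBUG)

info_logger = logging.getLogger("sketch.info")
info_logger.setLevel(logging.INFO)
```

With these two loggers, `domains_to_enable(debug_logger)` includes the Console and Runtime methods, while `domains_to_enable(info_logger)` omits them, so fewer websocket messages would be expected in production.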
Noah Levitt
7273c7c3a2
Merge pull request #166 from CorentinB/facebook-ads-lib
Add support for Facebook ads library and fix closing
2019-09-26 14:13:47 -07:00
Barbara Miller
2c29dc0333 make instagram use default interval, like prod 2019-09-26 13:28:26 -07:00
Corentin Barreau
e701e3f101 Add: break after closing the first visible element 2019-09-26 21:44:25 +02:00
Barbara Miller
ce7e7447b7 reset instagram interval closer to default 2019-09-26 12:44:20 -07:00
Barbara Miller
f4c57f5d30 Merge branch 'ARI-5980' into qa 2019-09-25 11:36:42 -07:00
Barbara Miller
4a5bc51a72 still better regex; rm old code already 2019-09-25 11:36:15 -07:00
Corentin Barreau
101f7f2e4a Remove: useless comment 2019-09-25 19:48:38 +02:00
Corentin Barreau
fb30fb9aa3 Add: isVisible check for close selectors
Modify: doTarget - Revert to initial code
2019-09-25 16:19:41 +02:00
Corentin Barreau
5c5743ea11 Fix: closeSelector not being clicked
Add: support for facebook.com/ads/library - Open and close metrics for ads
2019-09-25 16:10:59 +02:00
Barbara Miller
53671da941 Merge branch 'ARI-5980' into qa 2019-09-24 16:41:11 -07:00
Barbara Miller
ac9950a1ea better regex, outlinks.push(m[2]) 2019-09-24 16:40:45 -07:00
Noah Levitt
efa185a8dc
Merge pull request #160 from vbanos/behavior-timeout
More accurate JS behavior timeout
2019-09-24 12:11:37 -07:00
Noah Levitt
eb30ba0c33
Merge pull request #165 from vbanos/stderr-stdout-exception-handling
Improve exception handling when reading STDIN/STDERR
2019-09-24 12:03:06 -07:00
Barbara Miller
1603229315 Merge branch 'ARI-5995-instagram' into qa 2019-09-19 15:21:43 -07:00
Barbara Miller
9054daf3c4 skip downloading videos from instagram user page 2019-09-19 15:20:14 -07:00
Barbara Miller
f0a17da851 Merge branch 'fb-ad' into qa 2019-09-19 15:01:25 -07:00
Barbara Miller
3799f2747c Merge branch 'ARI-5995-instagram' into qa 2019-09-19 14:58:02 -07:00
Barbara Miller
4a0ce9da04 skip downloading videos from instagram user page 2019-09-19 14:57:17 -07:00
Barbara Miller
c46f29eaae update copyright 2019-09-19 14:41:55 -07:00
Vangelis Banos
f42ff08da1 Improve exception handling when reading STDIN/STDERR
When the chrome process dies and we try to read STDIN/STDERR, we get
`ValueError: I/O operation on closed file` or
`OSError: [Errno 9] Bad file descriptor`.

We modify `readline_nonblock` method to return the buffer it read up to
this point.
2019-09-19 20:08:55 +00:00
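
A rough sketch of the behavior described in f42ff08da1: a non-blocking line reader that returns whatever it has buffered so far when the underlying file turns out to be closed. Only the name `readline_nonblock` and the two exception types come from the commit message; the body below is an assumed implementation (byte-at-a-time reads guarded by `select`, which works on pipes on POSIX systems) and may differ from the real one.

```python
import select


def readline_nonblock(f):
    """Read up to one line from f without blocking.

    Returns the bytes read so far, possibly empty or a partial line.
    If the file is closed underneath us (ValueError) or its descriptor
    is invalid (OSError), return the buffer collected up to that point.
    """
    buf = b""
    try:
        # A zero timeout makes select() a pure readiness poll.
        while select.select([f], [], [], 0)[0]:
            ch = f.read(1)
            if not ch:  # EOF: the writer closed its end
                break
            buf += ch
            if ch == b"\n":
                break
    except (ValueError, OSError):
        # The process died and its pipe was closed; keep what we have.
        pass
    return buf
```

For example, given the unbuffered read end of an `os.pipe()`, the function returns a complete line when one is available, a partial line when no more data is ready, and an empty buffer (rather than raising) once the file object has been closed.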
Barbara Miller
5ff7536c60 support fb ads pages? 2019-09-16 12:49:41 -07:00
Barbara Miller
4ba6efd9c9 Merge branch 'ARI-5980' into qa 2019-09-10 17:57:47 -07:00
Barbara Miller
2ec284c88b add selector video to default 2019-09-10 17:56:20 -07:00
Barbara Miller
7bb52faca9 add pop urls using regex for better match 2019-09-10 17:49:37 -07:00
Barbara Miller
0755210b47 Merge branch 'senate-videos' into qa 2019-09-03 14:48:12 -07:00
Barbara Miller
6431f4e803 add pop urls using regex for better match 2019-09-03 14:47:48 -07:00
Barbara Miller
57a5814884 Merge branch 'senate-videos' into qa 2019-08-22 16:28:03 -07:00
Barbara Miller
5b393837b8 add selector video to default 2019-08-22 16:26:24 -07:00
Vangelis Banos
0b28a4a57f More accurate JS behavior timeout
If you use a JS behavior timeout smaller than 7 sec, the JS behavior
will always need 7 sec because `sleep(7)` is hard-coded there.

We make a minor addition to use `min(timeout, 7)` for sleep so it will
finish faster when using a smaller JS behavior timeout.
2019-08-22 21:15:44 +00:00
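
The `min(timeout, 7)` change in 0b28a4a57f can be illustrated with a small polling loop. This is a sketch under assumptions: `poll_interval`, `wait_for_behavior`, and the injectable `sleep`/`clock` parameters are hypothetical names introduced here so the loop can be exercised without real waiting; only the 7-second constant and the `min(timeout, 7)` expression come from the commit message.

```python
import time

DEFAULT_POLL_SECONDS = 7  # the previously hard-coded sleep


def poll_interval(timeout):
    """Sleep no longer than the behavior timeout itself."""
    return min(timeout, DEFAULT_POLL_SECONDS)


def wait_for_behavior(is_finished, timeout, sleep=time.sleep, clock=time.monotonic):
    """Poll is_finished() until it returns True or timeout seconds elapse."""
    start = clock()
    while clock() - start < timeout:
        if is_finished():
            return True
        sleep(poll_interval(timeout))
    # One last check after the deadline has passed.
    return is_finished()
```

With a 3-second timeout the loop sleeps once for 3 seconds instead of 7, while a 20-second timeout still polls in 7-second steps.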
Barbara Miller
5304ba4491 Merge branch 'senate-videos' into qa 2019-08-21 15:11:04 -07:00
Barbara Miller
14e3d56cd2 add popup urls as outlinks 2019-08-20 15:13:35 -07:00
Noah Levitt
16f886259d
Merge pull request #158 from galgeek/aitfive-1668-soundcoud
capture soundcloud user page before capturing tracks
2019-08-15 15:46:55 -07:00
Barbara Miller
c6308fe754 Revert "initial commit"
This reverts commit 5368a840665dcf9770ede6006d685ff113c84a3f.
2019-08-08 14:03:30 -07:00
Noah Levitt
94cd6cacb6 bump version after merge 2019-07-18 11:07:27 -07:00
Noah Levitt
726c6effed
Merge pull request #157 from vbanos/block-amp-analytics
Block AMP analytics JS script
2019-07-18 11:07:09 -07:00
Barbara Miller
d0c46db746 Merge branch 'aitfive-1668-soundcoud' into qa 2019-07-17 17:45:39 -07:00
Barbara Miller
9cc60449d7 skip downloading tracks from soundcloud user page 2019-07-17 17:45:02 -07:00