Commit graph

410 commits

Author SHA1 Message Date
Barbara Miller
8b93c078b7 Merge branch 'instaInterval' into qa 2018-12-21 14:41:27 -08:00
Noah Levitt
6b8e597a43 bump version after merge 2018-12-20 11:30:49 -08:00
Barbara Miller
bf8bbfba27 Merge branch 'no-skipIframes' into qa 2018-12-20 11:25:54 -08:00
Noah Levitt
034f7938c4 catch common exception in default behavior 2018-12-20 10:46:05 -08:00
Noah Levitt
2cd64811b3 bump version after merge 2018-12-17 15:10:26 -08:00
Barbara Miller
cbd6f0f90a Merge branch 'insta18q4' into qa 2018-12-13 17:29:36 -08:00
Noah Levitt
15870e6010 avoid IndexError
in some cases we receive this event from the browser:
{"method":"ServiceWorker.workerVersionUpdated","params":{"versions":[]}}
2018-12-13 15:49:38 -08:00
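A minimal sketch of the kind of guard the commit message above describes. It is not the project's actual code: the helper name `first_worker_version` is hypothetical, and only the event payload is taken from the message. The idea is to check for an empty `versions` list before indexing into it.

```python
import json


def first_worker_version(message):
    """Return the first entry of params.versions, or None if there is none.

    Hypothetical helper for illustration only; not part of brozzler.
    """
    versions = message.get("params", {}).get("versions") or []
    # Indexing versions[0] directly would raise IndexError on an empty list.
    return versions[0] if versions else None


# The event quoted in the commit message, with an empty "versions" list.
event = json.loads(
    '{"method":"ServiceWorker.workerVersionUpdated","params":{"versions":[]}}'
)
result = first_worker_version(event)
```

With the quoted event, `result` is `None` rather than an exception.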
Noah Levitt
b577fe3c36 log browser uncaught exceptions at debug level
didn't realize these weren't showing up as console messages
2018-12-13 15:45:35 -08:00
Noah Levitt
b447063099 Merge branch 'master' into qa
* master:
  bump version after merge
  change time limit enforcement
2018-11-29 14:52:32 -08:00
Noah Levitt
ebcc063fe2 bump version after merge 2018-11-29 14:52:11 -08:00
Barbara Miller
b204e9aec1 Merge branch 'service-worker' into qa 2018-11-27 12:58:47 -08:00
Noah Levitt
574af7846e bump version after merge 2018-11-16 15:10:46 -08:00
Noah Levitt
9db7744f2c Merge branch 'master' into qa
* master:
  fail quickly if browser dies at startup
2018-11-01 15:57:52 -07:00
Noah Levitt
15610fa990 fail quickly if browser dies at startup
instead of trying to retrieve /json for 600 seconds
2018-11-01 15:57:03 -07:00
Noah Levitt
27ba877932 Merge branch 'master' into qa
* master:
  handle exceptions extracting links
  fix reported chromium crash by removing argument
  bump version after merge
  remove stray bad logging line
  tests expect outlinks to be a set
  tidy up some comments and docs
  watch pages as outlinks from youtube-dl playlists
  silence youtube-dl's logging, use only our own
  use a thread-local callback in monkey-patched
  skip downloading videos from youtube playlists
  trace-level logging for all the chrome output
2018-10-29 17:45:09 -07:00
Noah Levitt
1073431f76 handle exceptions extracting links
like this one:
Uncaught DOMException: Blocked a frame with origin "https://www.youtube.com" from accessing a cross-origin frame.
    at __brzl_compileOutlinks (<anonymous>:4:24)
    at __brzl_compileOutlinks (<anonymous>:10:29)
    at <anonymous>:16:1
__brzl_compileOutlinks @ VM194:4
__brzl_compileOutlinks @ VM194:10

not sure exactly why this happens but we just have to handle it
2018-10-29 17:42:25 -07:00
Noah Levitt
af85f28908 fix reported chromium crash by removing argument
--single-process
https://github.com/internetarchive/brozzler/issues/128
2018-10-22 14:28:31 -07:00
Noah Levitt
20996fa501 bump version after merge 2018-10-12 12:46:09 -07:00
Noah Levitt
82cf5c6dbb skip downloading videos from youtube playlists
because we expect to capture videos from individual watch pages, and
often processing thousands of videos with youtube-dl before the page is
ever opened in the browser is not desired behavior and is a crawling
problem
2018-10-11 15:46:30 -07:00
Noah Levitt
16c56fed5a Merge branch 'master' into qa
* master:
  hopefully fixes lingering ydl concurrency issue
2018-10-11 13:43:06 -07:00
Noah Levitt
1e95441ce7 hopefully fixes lingering ydl concurrency issue
which was causing awfulness like this:
2018-09-30 04:39:54,410 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '0 - unknown' video stitched-up as application/octet-stream (228243844 bytes) to warcprox at wbgrp-svc408.us.archive.org:8000 with url youtube-dl:00001:https://www.facebook.com/CongresswomanRosaDeLauro/
2018-09-30 04:39:58,092 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '0 - 1080x607' video stitched-up as video/mp4 (228243844 bytes) to warcprox at wbgrp-svc045.us.archive.org:8001 with url youtube-dl:00037:https://instagram.com/p/BfJvqhfnQ0C/
2018-09-30 04:40:05,120 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '0 - unknown' video stitched-up as application/octet-stream (228243844 bytes) to warcprox at wbgrp-svc107.us.archive.org:8000 with url youtube-dl:00009:https://www.facebook.com/LDS
2018-09-30 04:40:09,450 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '22 - 1280x720 (hd720)' video stitched-up as video/mp4 (228243844 bytes) to warcprox at wbgrp-svc407.us.archive.org:8000 with url youtube-dl:00048:https://www.youtube.com/watch?v=-gH28zrMmAM
2018-09-30 04:40:14,327 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing 'hls-2176 - 1280x720' video stitched-up as video/mp4 (228243844 bytes) to warcprox at wbgrp-svc108.us.archive.org:8000 with url youtube-dl:00001:https://twitter.com/RepTedLieu/status/1010212963897233408
2018-09-30 04:40:23,018 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '0 - unknown' video stitched-up as application/octet-stream (228243844 bytes) to warcprox at wbgrp-svc048.us.archive.org:8001 with url youtube-dl:00005:https://www.facebook.com/SenDuckworth/
2018-09-30 04:40:29,553 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '0 - unknown' video stitched-up as application/octet-stream (228243844 bytes) to warcprox at wbgrp-svc045.us.archive.org:8000 with url youtube-dl:00009:http://www.facebook.com/repkathleenrice
2018-09-30 04:40:37,057 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '22 - 1280x720 (hd720)' video stitched-up as video/mp4 (228243844 bytes) to warcprox at wbgrp-svc406.us.archive.org:8000 with url youtube-dl:00023:https://www.youtube.com/watch?v=MaamqVF87mE
2018-09-30 04:40:41,298 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing '22 - 1280x720 (hd720)' video stitched-up as video/mp4 (228243844 bytes) to warcprox at wbgrp-svc403.us.archive.org:8000 with url youtube-dl:00039:https://www.youtube.com/watch?v=pRpMp4H8El0
2018-09-30 04:40:45,613 19101 INFO BrozzlingThread:58486 brozzler.ydl._build_youtube_dl.<locals>._YoutubeDL._push_stitched_up_vid_to_warcprox(ydl.py:164) pushing 'hls-2176 - 1280x720' video stitched-up as video/mp4 (228243844 bytes) to warcprox at wbgrp-svc408.us.archive.org:8000 with url youtube-dl:00001:https://twitter.com/RepKevinCramer/status/999771072206639104
i.e. pushing the same stitched-up video to a bunch of wrong places :(
2018-10-11 13:40:57 -07:00
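This log does not show the fix itself. An earlier entry mentions "a thread-local callback", so the sketch below illustrates that general technique under stated assumptions, with hypothetical names (`set_callback`, `shared_hook`, `worker`). It is not the code from this commit. Per-thread state lets a function shared by the whole process route data back to the thread that requested it.

```python
import threading

_state = threading.local()  # each thread sees its own attributes


def set_callback(callback):
    """Register a callback visible only to the calling thread."""
    _state.callback = callback


def shared_hook(data):
    """Stand-in for a monkey-patched, process-wide function."""
    callback = getattr(_state, "callback", None)
    if callback is not None:
        callback(data)


def worker(name, results):
    # Each thread registers its own callback, then calls the shared hook.
    set_callback(lambda data: results.append((name, data)))
    shared_hook("video for " + name)


results_a, results_b = [], []
threads = [
    threading.Thread(target=worker, args=("a", results_a)),
    threading.Thread(target=worker, args=("b", results_b)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread's list ends up holding only its own item, whereas a single module-level callback could be overwritten by whichever thread registered last.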
Noah Levitt
ff64e32bd3 Merge branch 'master' into qa
* master:
  brozzler-worker log version number at startup
2018-10-11 13:31:47 -07:00
Noah Levitt
e519616f8e brozzler-worker log version number at startup 2018-10-11 13:31:37 -07:00
Noah Levitt
a75632bd95 Merge branch 'master' into qa
* master:
  bump version after merge
  fix another oversight
  ugh. oops
  Revert "add a github PR template for this repo"
  improve performance of brozzler-new-job
2018-09-28 15:27:51 -07:00
Noah Levitt
362a2347b9 bump version after merge 2018-09-28 15:27:40 -07:00
Noah Levitt
87ec0f2f90 Merge branch 'master' into qa
* master:
  bump doublethink dependency
  verbiage tweaks
  safety check and --force for brozzler-purge
  new command brozzler-purge
2018-09-28 11:12:35 -07:00
Noah Levitt
2386e85a37 bump doublethink dependency 2018-09-27 14:25:49 -07:00
Noah Levitt
174178e02e new command brozzler-purge 2018-09-25 14:56:26 -07:00
Barbara Miller
60cfd684b2 Merge branch 'pageInterstitialShown' into qa 2018-09-25 10:30:02 -07:00
Noah Levitt
48bf185746 bump version after merge 2018-09-18 11:08:44 -07:00
Neil Minton
3c7fdeae2c Merge branch 'ari-5777' into qa 2018-09-12 12:07:45 -04:00
Noah Levitt
efb0696833 bump version number after merge 2018-09-06 16:17:59 -07:00
jkafader
8368cd2bcb
Merge pull request #115 from nlevitt/ydl-stitched
Ydl stitched
2018-09-06 16:15:52 -07:00
Noah Levitt
c4fdbe578d Merge branch 'master' into qa
* master:
  oops, back to dev version number
  wait 20 seconds to claim sites if none were avail-
  tweak logging
  why did those tests fail??? (#117)
  Add screenshots
  Add screenshots
  back to dev version
  1.4 for pypi
  explain --warcprox-auto briefly
  vagrant readme fixes (thanks funkyfuture)
  update cryptography dep version
2018-09-04 10:54:26 -07:00
Noah Levitt
a4eacb5b8f oops, back to dev version number 2018-09-04 10:52:34 -07:00
Noah Levitt
88d3d3b310
why did those tests fail??? (#117)
1.4 for pypi
2018-08-22 14:35:39 -07:00
Noah Levitt
2a2952e810 back to dev version 2018-08-21 15:18:18 -07:00
Noah Levitt
b63661ea70 1.4 for pypi 2018-08-21 15:15:38 -07:00
Noah Levitt
eaf7ef74be explain --warcprox-auto briefly 2018-08-17 12:06:04 -07:00
Noah Levitt
8cdc3dee21 Merge branch 'master' into ydl-stitched
* master:
  vagrant readme fixes (thanks funkyfuture)
  update cryptography dep version
2018-08-17 10:34:00 -07:00
Noah Levitt
d19e139101 vagrant readme fixes (thanks funkyfuture) 2018-08-17 10:31:01 -07:00
Noah Levitt
ffa8021968 update cryptography dep version
github tells me there's a vulnerability <2.3
2018-08-16 14:32:03 -07:00
Noah Levitt
cbeba3a6b9 Merge branch 'ydl-stitched' into qa
* ydl-stitched:
  fix failing tests
  test for youtube-dl stitch-up
  add missing imports and fix mimetype issue
  move youtube-dl code into separate file
  push youtube-dl's stitched up videos to warcprox
2018-08-16 12:10:44 -07:00
Noah Levitt
418a3ef20c Merge branch 'master' into qa
* master:
  expose more brozzle-page args
  update pillow dependency to get rid of github vul-
  more readme edits
  reformat readme to 80 columns
  Copy edits to job-conf readme
  bump up heartbeat interval (see comment)
  Copy edits
  back to dev version
  version 1.3 (messed up 1.2)
  setuptools wants README not readme
  back to dev version number
  version 1.2
  bump dev version after merge
  is test_time_limit is failing because of timing?
2018-08-16 12:08:48 -07:00
Noah Levitt
3c27132aaa test for youtube-dl stitch-up 2018-08-15 17:42:53 -07:00
Noah Levitt
39155ebcc5 push youtube-dl's stitched up videos to warcprox
(no tests yet)
2018-08-13 15:40:48 -07:00
Noah Levitt
4e398e1da2 expose more brozzle-page args 2018-08-13 15:38:24 -07:00
Noah Levitt
b44a444dc2 update pillow dependency to get rid of github vulnerability warning
2018-07-24 16:37:25 -05:00
Noah Levitt
9d18dc6aeb bump up heartbeat interval (see comment) 2018-07-03 18:35:08 -05:00
Noah Levitt
783fd0ea87 back to dev version 2018-06-25 19:32:27 +00:00