1500 Commits

Author SHA1 Message Date
Noah Levitt
ecb2d70369 youtube-dl: skip youtube dash manifests
missed in last merge commit
replay of 6a4bcaca1b
2018-08-16 14:25:37 -07:00
Noah Levitt
cbeba3a6b9 Merge branch 'ydl-stitched' into qa
* ydl-stitched:
  fix failing tests
  test for youtube-dl stitch-up
  add missing imports and fix mimetype issue
  move youtube-dl code into separate file
  push youtube-dl's stitched up videos to warcprox
2018-08-16 12:10:44 -07:00
Noah Levitt
418a3ef20c Merge branch 'master' into qa
* master:
  expose more brozzle-page args
  update pillow dependency to get rid of github vul-
  more readme edits
  reformat readme to 80 columns
  Copy edits to job-conf readme
  bump up heartbeat interval (see comment)
  Copy edits
  back to dev version
  version 1.3 (messed up 1.2)
  setuptools wants README not readme
  back to dev version number
  version 1.2
  bump dev version after merge
  is test_time_limit failing because of timing?
2018-08-16 12:08:48 -07:00
Noah Levitt
e7d2273856 fix failing tests 2018-08-16 11:40:54 -07:00
Noah Levitt
3c27132aaa test for youtube-dl stitch-up 2018-08-15 17:42:53 -07:00
Noah Levitt
c2ad8427e1 add missing imports and fix mimetype issue 2018-08-15 17:41:35 -07:00
Noah Levitt
33520da8f9 move youtube-dl code into separate file 2018-08-14 15:10:48 -07:00
Noah Levitt
39155ebcc5 push youtube-dl's stitched up videos to warcprox
(no tests yet)
2018-08-13 15:40:48 -07:00
Noah Levitt
4e398e1da2 expose more brozzle-page args 2018-08-13 15:38:24 -07:00
Noah Levitt
b44a444dc2 update pillow dependency to get rid of github vulnerability warning
2018-07-24 16:37:25 -05:00
Noah Levitt
771d6aa626 more readme edits 2018-07-23 19:05:49 -05:00
Noah Levitt
073fc713f4 Merge pull request #113 from nlevitt/karl-readme
Karl readme copy edits
2018-07-23 18:36:00 -05:00
Noah Levitt
f7407a87c1 reformat readme to 80 columns 2018-07-23 23:32:56 +00:00
Noah Levitt
a7fb7bcc37 Merge branch 'master' into karl
* master:
  bump up heartbeat interval (see comment)
  back to dev version
  version 1.3 (messed up 1.2)
  setuptools wants README not readme
  back to dev version number
  version 1.2
  bump dev version after merge
  is test_time_limit failing because of timing?
  fix bug in test, add another one
  treat any error fetching robots.txt as "allow all"
  update instagram behavior
2018-07-23 23:28:42 +00:00
Karl-Rainer Blumenthal
bd78e07232 Copy edits to job-conf readme
Good reading and rampant pedantry!
2018-07-06 15:24:12 -04:00
Noah Levitt
9d18dc6aeb bump up heartbeat interval (see comment) 2018-07-03 18:35:08 -05:00
Barbara Miller
98c21d9d1f Merge branch 'ARI-5689' into qa 2018-07-03 14:50:51 -07:00
Barbara Miller
687d51de20 Revert "skip login for fb groups"
This reverts commit 5e1c86421e0f9a0807a5b0a6ca5bb39715b10e87.
2018-07-03 14:49:58 -07:00
Karl-Rainer Blumenthal
eebbc1d279 Copy edits 2018-06-28 12:59:22 -04:00
Barbara Miller
c3a19d3186 Revert "switch to group discussion tab"
This reverts commit 01bb731f54784e1ca271fb1f9b48ec2827e5dd80.
2018-06-27 16:56:34 -07:00
Barbara Miller
9a8fc15ff2 Merge branch 'ARI-5689' into qa 2018-06-27 16:54:41 -07:00
Barbara Miller
423e91a69f Revert "switch to group discussion tab"
This reverts commit 01bb731f54784e1ca271fb1f9b48ec2827e5dd80.
2018-06-27 16:51:55 -07:00
Barbara Miller
5e1c86421e skip login for fb groups 2018-06-27 16:51:38 -07:00
Barbara Miller
01bb731f54 switch to group discussion tab 2018-06-27 16:51:37 -07:00
Barbara Miller
1fffaa9eee Merge branch 'ARI-5689' into qa 2018-06-27 16:47:46 -07:00
Barbara Miller
76ec00c930 skip login for fb groups 2018-06-27 16:47:32 -07:00
Noah Levitt
783fd0ea87 back to dev version 2018-06-25 19:32:27 +00:00
Noah Levitt
bd63908fb9 version 1.3 (messed up 1.2) 1.3 2018-06-25 19:30:39 +00:00
Noah Levitt
2780c92569 setuptools wants README not readme 2018-06-25 19:10:57 +00:00
Noah Levitt
032c7d2898 back to dev version number 2018-06-25 12:33:34 -05:00
Noah Levitt
442d02b26a version 1.2 1.2 2018-06-25 12:21:00 -05:00
Noah Levitt
196cd555ea bump dev version after merge 2018-06-25 11:44:45 -05:00
Noah Levitt
05ec6a68b0 Merge pull request #110 from nlevitt/robots-errors
treat any error fetching robots.txt as "allow all"
2018-06-25 11:44:18 -05:00
Noah Levitt
d4db8ba9bc is test_time_limit failing because of timing?
give it up to ten seconds to mark the job finished
2018-06-25 10:35:24 -05:00
Noah Levitt
09dbb4ce1d Merge branch 'robots-errors' into qa
* robots-errors:
  fix bug in test, add another one
2018-06-22 16:10:51 -05:00
Noah Levitt
c52c16c260 fix bug in test, add another one 2018-06-22 16:10:23 -05:00
Noah Levitt
aff67c3b29 Merge branch 'robots-errors' into qa
* robots-errors:
  treat any error fetching robots.txt as "allow all"
2018-06-22 16:01:21 -05:00
Noah Levitt
aeb7c3f825 treat any error fetching robots.txt as "allow all" 2018-06-22 14:50:57 -05:00
Neil Minton
f5f9a1a137 Merge pull request #109 from internetarchive/ARI-5747
update instagram behavior
2018-06-22 09:24:14 -07:00
Barbara Miller
ad5f409078 Merge branch 'ARI-5744' into qa 2018-06-19 14:17:08 -07:00
Barbara Miller
1f93f70bfe Revert "test behavior for event.crowdcompass.com"
This reverts commit 565a472fb0004f89434f1f775c154e9c4393d380.
2018-06-19 14:07:02 -07:00
Barbara Miller
96014606ec Merge branch 'ARI-5747' into qa 2018-06-18 10:37:09 -07:00
Barbara Miller
89e54fd2e6 update instagram behavior 2018-06-18 10:36:13 -07:00
Barbara Miller
0857fffeb6 Merge branch 'ARI-5747' into qa 2018-06-13 12:43:35 -07:00
Barbara Miller
5893b1f982 update instagram behavior 2018-06-13 12:43:12 -07:00
Barbara Miller
6b753623b7 Merge branch 'ARI-5744' into qa 2018-06-11 18:34:50 -07:00
Barbara Miller
565a472fb0 test behavior for event.crowdcompass.com 2018-06-11 18:30:40 -07:00
Noah Levitt
27bdfb65d2 monkey-patch youtube-dl to short-circuit
video extraction using generic extractor in case of very large url (more
than 20 mb) that youtube-dl interprets as html, to avoid spinning
forever here:

Traceback (most recent call first):
  File "/opt/brozzler-ve3/lib/python3.5/re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2878, in _real_extract
    'uploader': video_uploader,
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 503, in extract
    ie_result = self._real_extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 792, in extract_info
    ie_result = ie.extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 302, in _try_youtube_dl
    info = ydl.extract_info(str(urlcanon.whatwg(page.url)))
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 361, in brozzle_page
    self._try_youtube_dl(ydl, site, page)
2018-06-11 11:50:22 -07:00
Noah Levitt
109d05c59a Merge branch 'master' into qa
* master:
  monkey-patch youtube-dl to short-circuit
2018-06-11 11:11:09 -07:00
Noah Levitt
a90a29968c monkey-patch youtube-dl to short-circuit
video extraction using generic extractor in case of very large url (more
than 20 mb) that youtube-dl interprets as html, to avoid spinning
forever here:

Traceback (most recent call first):
  File "/opt/brozzler-ve3/lib/python3.5/re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2878, in _real_extract
    'uploader': video_uploader,
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 503, in extract
    ie_result = self._real_extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 792, in extract_info
    ie_result = ie.extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 302, in _try_youtube_dl
    info = ydl.extract_info(str(urlcanon.whatwg(page.url)))
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 361, in brozzle_page
    self._try_youtube_dl(ydl, site, page)
2018-06-11 11:08:20 -07:00