video extraction using generic extractor in case of a very large URL (more
than 20 MB) that youtube-dl interprets as HTML, to avoid spinning
forever here:
Traceback (most recent call first):
  File "/opt/brozzler-ve3/lib/python3.5/re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2878, in _real_extract
    'uploader': video_uploader,
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 503, in extract
    ie_result = self._real_extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 792, in extract_info
    ie_result = ie.extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 302, in _try_youtube_dl
    info = ydl.extract_info(str(urlcanon.whatwg(page.url)))
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 361, in brozzle_page
    self._try_youtube_dl(ydl, site, page)
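
a minimal sketch of the kind of size guard described above, not the actual
brozzler or youtube-dl code. the function name, the constant, and the plain
dict of response headers are all hypothetical, and it assumes the server
reports a Content-Length; only the 20 MB threshold comes from this message.

```python
# Hypothetical guard: none of these names exist in brozzler or youtube-dl.
# Threshold taken from the commit message ("more than 20 mb").
MAX_GENERIC_EXTRACT_BYTES = 20 * 1024 * 1024


def too_large_for_generic_extractor(headers, max_bytes=MAX_GENERIC_EXTRACT_BYTES):
    """Return True if the reported body size exceeds max_bytes.

    `headers` is assumed to be a plain dict of response headers with a
    "Content-Length" key; a real HTTP client may use a case-insensitive
    mapping instead.
    """
    raw = headers.get("Content-Length")
    try:
        size = int(raw)
    except (TypeError, ValueError):
        # Header missing or malformed: size unknown, so this sketch does
        # not block extraction.
        return False
    return size > max_bytes


if __name__ == "__main__":
    big = {"Content-Length": str(21 * 1024 * 1024)}
    small = {"Content-Length": "1024"}
    print(too_large_for_generic_extractor(big))
    print(too_large_for_generic_extractor(small))
```

a caller could check this before handing the url to the generic extractor
and skip extraction when it returns True, so the regex scan in the
traceback never runs over a huge body.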
* master:
lowercase readme.rst
explain brozzler use of warcprox_meta
update README copyright date
bump dev version after PR #102
these ssurts are strings too
fix bad copy/paste
ssurts are strings now
travis-ci install warcprox from github
incorporate urlcanon fix
update warcprox dependency to include recent fixes
backward compatibility for old scope["surt"]
missed a spot where is_permitted_by_robots needs monkeying
handle new chrome cookie db schema
describe scope rule conditions
more explication of scoping
update docs to match new seed ssurt behavior
ok seriously tests
fix more tests for new approach sans scope['surt']
s/max_hops_off_surt/max_hops_off/
new test of max_hops_off
rename page.hops_off_surt to page.hops_off
doublethink had a bug fix
tests for new approach without scope['surt']
tests for new approach without of scope['surt']
WIP add an accept rule instead of modifying surt
WIP some words on scoping
WIP starting to flesh out "scoping" section
WIP some explanation of automatic login
WIP documentation!