monkey-patch youtube-dl to short-circuit

mirror of https://github.com/internetarchive/brozzler.git synced 2025-12-13 07:38:55 -05:00

video extraction using generic extractor in case of a very large URL
response (more than 20 MB) that youtube-dl interprets as HTML, to avoid
spinning forever here:

Traceback (most recent call first):
  File "/opt/brozzler-ve3/lib/python3.5/re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2878, in _real_extract
    'uploader': video_uploader,
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 503, in extract
    ie_result = self._real_extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 792, in extract_info
    ie_result = ie.extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 302, in _try_youtube_dl
    info = ydl.extract_info(str(urlcanon.whatwg(page.url)))
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 361, in brozzle_page
    self._try_youtube_dl(ydl, site, page)
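
A minimal sketch of the short-circuit idea described above, not the code from this commit. To stay self-contained it patches a hypothetical stand-in class (StandInExtractor, with made-up methods read_content and extract) instead of youtube-dl itself. The 20000000-character threshold is an assumption based on the "more than 20 mb" wording. The wrapper calls the original method and returns an empty string when the content is too large, so the expensive scan has nothing to work on.

```python
import logging

# Assumed threshold, taken from the "more than 20 mb" wording above.
MAX_CONTENT_LENGTH = 20000000


class StandInExtractor:
    """Hypothetical stand-in for a third-party extractor class."""

    def read_content(self, body):
        # Pretend this decodes a fetched response body into text.
        return body

    def extract(self, body):
        content = self.read_content(body)
        # Stand-in for the expensive scan over the whole page.
        return content.count("<video")


# Keep a reference to the original method before replacing it.
_orig_read_content = StandInExtractor.read_content


def _patched_read_content(self, *args, **kwargs):
    content = _orig_read_content(self, *args, **kwargs)
    if len(content) > MAX_CONTENT_LENGTH:
        logging.warning(
            "skipping extraction, content too large (%s characters)",
            len(content))
        # Empty content gives the later scan nothing to chew on.
        return ""
    return content


# The monkey-patch: every instance now uses the guarded method.
StandInExtractor.read_content = _patched_read_content
```

With the patch applied, StandInExtractor().extract() behaves as before on small inputs and returns 0 for any input longer than the threshold. Patching the class attribute rather than a single instance means extractor objects created elsewhere are covered as well.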


Noah Levitt

2018-06-11 11:50:22 -07:00

parent b41ccd7e6b

commit 27bdfb65d2

2 changed files with 12 additions and 1 deletions

									
setup.py

@@ -32,7 +32,7 @@ def find_package_data(package):

 setuptools.setup(
         name='brozzler',
-        version='1.1b13.dev290',
+        version='1.1b13.dev291',
         description='Distributed web crawling with browsers',
         url='https://github.com/internetarchive/brozzler',
         author='Noah Levitt',