video extraction using generic extractor in case of very large url (more
than 20 mb) that youtube-dl interprets as html, to avoid spinning
forever here:
Traceback (most recent call first):
File "/opt/brozzler-ve3/lib/python3.5/re.py", line 213, in findall
return _compile(pattern, flags).findall(string)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2878, in _real_extract
'uploader': video_uploader,
File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 503, in extract
ie_result = self._real_extract(url)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 792, in extract_info
ie_result = ie.extract(url)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 302, in _try_youtube_dl
info = ydl.extract_info(str(urlcanon.whatwg(page.url)))
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 361, in brozzle_page
self._try_youtube_dl(ydl, site, page)
We've been seeing some of this:
2018-02-14 20:16:44,011 17816 CRITICAL BrozzlingThread:36444 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:559) unexpected exception
Traceback (most recent call last):
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 528, in brozzle_site
enable_youtube_dl=not self._skip_youtube_dl)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 385, in brozzle_page
on_request)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 447, in _browse_page
cookie_db=site.get('cookie_db'))
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 338, in start
self._wait_for(lambda: self.websock_thread.is_open, timeout=10)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 311, in _wait_for
elapsed, callback))
brozzler.browser.BrowsingTimeout: timed out after 11.1s waiting for: <function Browser.start.<locals>.<lambda> at 0x7fb2dc772bd8>
Mostly at startup. Now that brozzler claims sites in batches for
brozzling, we have situations where we start up a whole bunch of
browsers at the same time. That's probably why in some cases they are
slow to establish the websocket connection.
avoids this kind of error:
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 youtube_dl.to_stderr(YoutubeDL.py:514) ERROR: Unable to download webpage: <urlopen error no host given> (caused by URLError('no host given',))
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 root.brozzle_site(worker.py:521) proxy error (site.proxy=wbgrp-svc400.us.archive.org:8002), will try to choose a healthy instance next time site is brozzled: youtube-dl hit apparent proxy error from https:/www.laphil.com/press1718