Noah Levitt
2a2952e810
back to dev version
2018-08-21 15:18:18 -07:00
Noah Levitt
b63661ea70
1.4 for pypi
2018-08-21 15:15:38 -07:00
Noah Levitt
eaf7ef74be
explain --warcprox-auto briefly
2018-08-17 12:06:04 -07:00
Karl-Rainer Blumenthal
2081e6388a
Merge pull request #2 from internetarchive/master
...
Updating to upstream origin
2018-08-17 14:26:46 -04:00
Noah Levitt
8cdc3dee21
Merge branch 'master' into ydl-stitched
...
* master:
vagrant readme fixes (thanks funkyfuture)
update cryptography dep version
2018-08-17 10:34:00 -07:00
Noah Levitt
d19e139101
vagrant readme fixes (thanks funkyfuture)
2018-08-17 10:31:01 -07:00
Noah Levitt
ffa8021968
update cryptography dep version
...
github tells me there's a vulnerability <2.3
2018-08-16 14:32:03 -07:00
Noah Levitt
e7d2273856
fix failing tests
2018-08-16 11:40:54 -07:00
Noah Levitt
3c27132aaa
test for youtube-dl stitch-up
2018-08-15 17:42:53 -07:00
Noah Levitt
c2ad8427e1
add missing imports and fix mimetype issue
2018-08-15 17:41:35 -07:00
Noah Levitt
33520da8f9
move youtube-dl code into separate file
2018-08-14 15:10:48 -07:00
Noah Levitt
39155ebcc5
push youtube-dl's stitched up videos to warcprox
...
(no tests yet)
2018-08-13 15:40:48 -07:00
Noah Levitt
4e398e1da2
expose more brozzle-page args
2018-08-13 15:38:24 -07:00
Noah Levitt
b44a444dc2
update pillow dependency to get rid of github vulnerability warning
2018-07-24 16:37:25 -05:00
Noah Levitt
771d6aa626
more readme edits
2018-07-23 19:05:49 -05:00
Noah Levitt
073fc713f4
Merge pull request #113 from nlevitt/karl-readme
...
Karl readme copy edits
2018-07-23 18:36:00 -05:00
Noah Levitt
f7407a87c1
reformat readme to 80 columns
2018-07-23 23:32:56 +00:00
Noah Levitt
a7fb7bcc37
Merge branch 'master' into karl
...
* master:
bump up heartbeat interval (see comment)
back to dev version
version 1.3 (messed up 1.2)
setuptools wants README not readme
back to dev version number
version 1.2
bump dev version after merge
is test_time_limit failing because of timing?
fix bug in test, add another one
treat any error fetching robots.txt as "allow all"
update instagram behavior
2018-07-23 23:28:42 +00:00
Karl-Rainer Blumenthal
bd78e07232
Copy edits to job-conf readme
...
Good reading and rampant pedantry!
2018-07-06 15:24:12 -04:00
Noah Levitt
9d18dc6aeb
bump up heartbeat interval (see comment)
2018-07-03 18:35:08 -05:00
Karl-Rainer Blumenthal
eebbc1d279
Copy edits
2018-06-28 12:59:22 -04:00
Noah Levitt
783fd0ea87
back to dev version
2018-06-25 19:32:27 +00:00
Noah Levitt
bd63908fb9
version 1.3 (messed up 1.2)
1.3
2018-06-25 19:30:39 +00:00
Noah Levitt
2780c92569
setuptools wants README not readme
2018-06-25 19:10:57 +00:00
Noah Levitt
032c7d2898
back to dev version number
2018-06-25 12:33:34 -05:00
Noah Levitt
442d02b26a
version 1.2
1.2
2018-06-25 12:21:00 -05:00
Noah Levitt
196cd555ea
bump dev version after merge
2018-06-25 11:44:45 -05:00
Noah Levitt
05ec6a68b0
Merge pull request #110 from nlevitt/robots-errors
...
treat any error fetching robots.txt as "allow all"
2018-06-25 11:44:18 -05:00
Noah Levitt
d4db8ba9bc
is test_time_limit failing because of timing?
...
give it up to ten seconds to mark the job finished
2018-06-25 10:35:24 -05:00
Noah Levitt
c52c16c260
fix bug in test, add another one
2018-06-22 16:10:23 -05:00
Noah Levitt
aeb7c3f825
treat any error fetching robots.txt as "allow all"
2018-06-22 14:50:57 -05:00
Neil Minton
f5f9a1a137
Merge pull request #109 from internetarchive/ARI-5747
...
update instagram behavior
2018-06-22 09:24:14 -07:00
Barbara Miller
89e54fd2e6
update instagram behavior
2018-06-18 10:36:13 -07:00
Noah Levitt
27bdfb65d2
monkey-patch youtube-dl to short-circuit
...
video extraction using generic extractor in case of very large url (more
than 20 mb) that youtube-dl interprets as html, to avoid spinning
forever here:
Traceback (most recent call first):
  File "/opt/brozzler-ve3/lib/python3.5/re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2878, in _real_extract
    'uploader': video_uploader,
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 503, in extract
    ie_result = self._real_extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 792, in extract_info
    ie_result = ie.extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 302, in _try_youtube_dl
    info = ydl.extract_info(str(urlcanon.whatwg(page.url)))
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 361, in brozzle_page
    self._try_youtube_dl(ydl, site, page)
2018-06-11 11:50:22 -07:00
Noah Levitt
b41ccd7e6b
Merge pull request #108 from nlevitt/docs
...
Docs
2018-05-31 14:15:12 -07:00
Noah Levitt
62bb540a11
lowercase readme.rst
2018-05-31 18:46:37 +00:00
Noah Levitt
a00b5a7fd5
explain brozzler use of warcprox_meta
2018-05-30 18:06:39 -07:00
Noah Levitt
aef4c40993
Merge pull request #107 from internetarchive/copyright-2018
...
update README copyright date
2018-05-17 11:30:46 -07:00
Barbara Miller
135a13b1c9
update README copyright date
2018-05-17 11:21:47 -07:00
Noah Levitt
8906037d82
bump dev version after PR #102
2018-05-16 17:33:52 -07:00
Noah Levitt
e90e7345a5
Merge pull request #102 from nlevitt/docs
...
complete job configuration documentation
2018-05-16 17:31:27 -07:00
Noah Levitt
331d07fe88
these ssurts are strings too
2018-05-16 17:11:08 -07:00
Noah Levitt
67558528cb
fix bad copy/paste
2018-05-16 16:43:38 -07:00
Noah Levitt
5bb392ec7c
ssurts are strings now
...
because they're friendlier that way in rethinkdb
2018-05-16 16:43:10 -07:00
Noah Levitt
399c097c7c
travis-ci install warcprox from github
2018-05-16 15:48:29 -07:00
Noah Levitt
ac735639ff
incorporate urlcanon fix
2018-05-16 14:41:49 -07:00
Noah Levitt
338d2e48f9
update warcprox dependency to include recent fixes
2018-05-16 14:26:51 -07:00
Noah Levitt
b9b8dcd062
backward compatibility for old scope["surt"]
...
and make sure to store ssurt as string in rethinkdb
2018-05-16 14:19:23 -07:00
Noah Levitt
1572fd3ed6
missed a spot where is_permitted_by_robots needs monkeying
2018-05-15 16:52:48 -07:00
Noah Levitt
a8de9b70d1
handle new chrome cookie db schema
2018-05-15 11:41:02 -07:00