1052 Commits

Author SHA1 Message Date
Noah Levitt
02e98f101d
Merge pull request #116 from kblumenthal/master
Add screenshots
2018-08-22 14:34:52 -07:00
Karl-Rainer Blumenthal
ff1645ef7d
Add screenshots
Add Brozzler Dashboard and Wayback screenshots to readme
2018-08-22 13:02:08 -04:00
Karl-Rainer Blumenthal
7c8b597ad3
Add screenshots
Add screenshots of Brozzler Dashboard and Wayback
2018-08-22 12:55:10 -04:00
Noah Levitt
2a2952e810 back to dev version 2018-08-21 15:18:18 -07:00
Noah Levitt
b63661ea70 1.4 for pypi 2018-08-21 15:15:38 -07:00
Noah Levitt
eaf7ef74be explain --warcprox-auto briefly 2018-08-17 12:06:04 -07:00
Karl-Rainer Blumenthal
2081e6388a
Merge pull request #2 from internetarchive/master
Updating to upstream origin
2018-08-17 14:26:46 -04:00
Noah Levitt
d19e139101 vagrant readme fixes (thanks funkyfuture) 2018-08-17 10:31:01 -07:00
Noah Levitt
ffa8021968 update cryptography dep version
github tells me there's a vulnerability <2.3
2018-08-16 14:32:03 -07:00
Noah Levitt
4e398e1da2 expose more brozzle-page args 2018-08-13 15:38:24 -07:00
Noah Levitt
b44a444dc2 update pillow dependency to get rid of github vulnerability warning
2018-07-24 16:37:25 -05:00
Noah Levitt
771d6aa626 more readme edits 2018-07-23 19:05:49 -05:00
Noah Levitt
073fc713f4
Merge pull request #113 from nlevitt/karl-readme
Karl readme copy edits
2018-07-23 18:36:00 -05:00
Noah Levitt
f7407a87c1 reformat readme to 80 columns 2018-07-23 23:32:56 +00:00
Noah Levitt
a7fb7bcc37 Merge branch 'master' into karl
* master:
  bump up heartbeat interval (see comment)
  back to dev version
  version 1.3 (messed up 1.2)
  setuptools wants README not readme
  back to dev version number
  version 1.2
  bump dev version after merge
  is test_time_limit is failing because of timing?
  fix bug in test, add another one
  treat any error fetching robots.txt as "allow all"
  update instagram behavior
2018-07-23 23:28:42 +00:00
Karl-Rainer Blumenthal
bd78e07232
Copy edits to job-conf readme
Good reading and rampant pedantry!
2018-07-06 15:24:12 -04:00
Noah Levitt
9d18dc6aeb bump up heartbeat interval (see comment) 2018-07-03 18:35:08 -05:00
Karl-Rainer Blumenthal
eebbc1d279
Copy edits 2018-06-28 12:59:22 -04:00
Noah Levitt
783fd0ea87 back to dev version 2018-06-25 19:32:27 +00:00
Noah Levitt
bd63908fb9 version 1.3 (messed up 1.2) 1.3 2018-06-25 19:30:39 +00:00
Noah Levitt
2780c92569 setuptools wants README not readme 2018-06-25 19:10:57 +00:00
Noah Levitt
032c7d2898 back to dev version number 2018-06-25 12:33:34 -05:00
Noah Levitt
442d02b26a version 1.2 1.2 2018-06-25 12:21:00 -05:00
Noah Levitt
196cd555ea bump dev version after merge 2018-06-25 11:44:45 -05:00
Noah Levitt
05ec6a68b0
Merge pull request #110 from nlevitt/robots-errors
treat any error fetching robots.txt as "allow all"
2018-06-25 11:44:18 -05:00
Noah Levitt
d4db8ba9bc is test_time_limit is failing because of timing?
give it up to ten seconds to mark the job finished
2018-06-25 10:35:24 -05:00
Noah Levitt
c52c16c260 fix bug in test, add another one 2018-06-22 16:10:23 -05:00
Noah Levitt
aeb7c3f825 treat any error fetching robots.txt as "allow all" 2018-06-22 14:50:57 -05:00
Neil Minton
f5f9a1a137
Merge pull request #109 from internetarchive/ARI-5747
update instagram behavior
2018-06-22 09:24:14 -07:00
Barbara Miller
89e54fd2e6 update instagram behavior 2018-06-18 10:36:13 -07:00
Noah Levitt
27bdfb65d2 monkey-patch youtube-dl to short-circuit
video extraction using generic extractor in case of very large url (more
than 20 mb) that youtube-dl interprets as html, to avoid spinning
forever here:

Traceback (most recent call first):
  File "/opt/brozzler-ve3/lib/python3.5/re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 2878, in _real_extract
    'uploader': video_uploader,
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 503, in extract
    ie_result = self._real_extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 792, in extract_info
    ie_result = ie.extract(url)
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 302, in _try_youtube_dl
    info = ydl.extract_info(str(urlcanon.whatwg(page.url)))
  File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 361, in brozzle_page
    self._try_youtube_dl(ydl, site, page)
2018-06-11 11:50:22 -07:00
Noah Levitt
b41ccd7e6b
Merge pull request #108 from nlevitt/docs
Docs
2018-05-31 14:15:12 -07:00
Noah Levitt
62bb540a11 lowercase readme.rst 2018-05-31 18:46:37 +00:00
Noah Levitt
a00b5a7fd5 explain brozzler use of warcprox_meta 2018-05-30 18:06:39 -07:00
Noah Levitt
aef4c40993
Merge pull request #107 from internetarchive/copyright-2018
update README copyright date
2018-05-17 11:30:46 -07:00
Barbara Miller
135a13b1c9
update README copyright date 2018-05-17 11:21:47 -07:00
Noah Levitt
8906037d82 bump dev version after PR #102 2018-05-16 17:33:52 -07:00
Noah Levitt
e90e7345a5
Merge pull request #102 from nlevitt/docs
complete job configuration documentation
2018-05-16 17:31:27 -07:00
Noah Levitt
331d07fe88 these ssurts are strings too 2018-05-16 17:11:08 -07:00
Noah Levitt
67558528cb fix bad copy/paste 2018-05-16 16:43:38 -07:00
Noah Levitt
5bb392ec7c ssurts are strings now
because they're friendlier that way in rethinkdb
2018-05-16 16:43:10 -07:00
Noah Levitt
399c097c7c travis-ci install warcprox from github 2018-05-16 15:48:29 -07:00
Noah Levitt
ac735639ff incorporate urlcanon fix 2018-05-16 14:41:49 -07:00
Noah Levitt
338d2e48f9 update warcprox dependency to include recent fixes 2018-05-16 14:26:51 -07:00
Noah Levitt
b9b8dcd062 backward compatibility for old scope["surt"]
and make sure to store ssurt as string in rethinkdb
2018-05-16 14:19:23 -07:00
Noah Levitt
1572fd3ed6 missed a spot where is_permitted_by_robots needs monkeying 2018-05-15 16:52:48 -07:00
Noah Levitt
a8de9b70d1 handle new chrome cookie db schema 2018-05-15 11:41:02 -07:00
Noah Levitt
de1f240e25 describe scope rule conditions
plus a bunch of tweaks and fixes
2018-05-15 11:01:09 -07:00
Noah Levitt
a327cb626f more explication of scoping 2018-05-14 17:31:45 -07:00
Noah Levitt
2cf474aa1d update docs to match new seed ssurt behavior 2018-05-14 16:59:55 -07:00