138 Commits

Author SHA1 Message Date
Andy Jackson
976f627996
Update Node version for linting
Node 12 is very old and linting is failing. So trying the most recent LTS version.
2024-01-18 15:41:59 +00:00
Andy Jackson
fee5cf873a
Add TOC headings 2024-01-18 15:28:40 +00:00
Andy Jackson
0dd0af911f
Add Services section
Adding a Web Archive Services section to list hosted and self-hostable web archiving options.
2024-01-18 15:23:56 +00:00
kokomo123
86c769597d
Add IA Library to Utilities (#143) 2023-12-20 08:53:11 -05:00
Ed Summers
f0b7cdbae0
Added warcdb (#142) 2023-10-16 12:21:54 -04:00
Ross Spencer
4b12cc7b32
Update the details around HTTPreserve.info (#141) 2023-08-30 07:14:42 -04:00
Ed Summers
034582f3aa
Adjusted jwarc description (#140) 2023-08-01 11:55:09 -04:00
IIPC
d6ca8af2c0
Update README.md (#139)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2023-07-14 07:31:45 -04:00
Greg Lindahl
5d41023b2b
add cc analysis (#138)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Nick Ruest <ruestn@gmail.com>
2023-07-04 12:54:21 -04:00
Greg Lindahl
d4673d008e
add cdx-toolkit (#135)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:37:05 +01:00
Greg Lindahl
d395bb1b44
add common crawl mailing list (#136)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:36:05 +01:00
Greg Lindahl
bf9664ff45
add web data commons (#137)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:34:33 +01:00
Greg Lindahl
54110410bf
warcio was stable a long time ago (#134)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
2023-07-04 09:33:09 +01:00
Greg Lindahl
4c04474998
this link works (#131) 2023-06-28 11:38:31 +01:00
Nick Ruest
11fee57dcb
Fix linter error: Ignore double IA Wayback link. (#129) 2023-06-01 15:58:29 -04:00
Rustem Kamalov
232966c4cb
Add gogetcrawl (#128)
* Add `gogetcrawl`
2023-06-01 15:33:15 -04:00
Nick Ruest
d8631ddf05
Add crau. (#127)
- Resolves #95
2023-04-30 20:05:45 -04:00
Matteo Cargnelutti
4ecc363191
Adding @harvard-lil/scoop (#126) 2023-04-26 16:56:25 -04:00
Ed Summers
46dc9518e4
added warcdedupe (#125) 2023-04-18 20:28:39 -04:00
Andy Jackson
b309687f88
Update runs-on
Was pointing at defunct base image.
2023-04-13 14:46:14 +01:00
Andy Jackson
6bdb3373cb
Add two tools that can do WARC deduplication (#124) 2023-04-12 11:00:52 -04:00
Hendursaga
fc1a73d22d
Rename 22120 to DiskerNet (#123) 2023-01-20 07:59:41 -05:00
Andy Jackson
248f9dc42e
Update README.md (#122) 2022-10-17 19:47:33 -04:00
Mat Kelly
0104c202c8
Fix typo (#121) 2022-09-27 10:46:37 -04:00
Andy Jackson
6b7a3372d4
Add the Bellingcat Auto Archiver (#120) 2022-09-23 22:38:51 -04:00
IIPC
f1a10b71b1
Update README.md 2022-08-23 12:10:15 -04:00
Nick Ruest
62515809d6
Add ARCH and Sparkling (#119)
* Remove Archives Unleashed Cloud.

* Add Sparkling and ARCH.

* linter
2022-05-25 22:30:37 +01:00
IIPC
0391cce057
Update README.md 2022-05-12 00:04:39 -04:00
IIPC
36dadbf3c4
Update README.md 2022-05-11 23:41:43 -04:00
Ross Spencer
82e512bde2
Correct the link for "Web as History" (#118) 2022-03-03 12:19:52 -05:00
Akash Mahanty
232ef44fd2
+ waybackpy (https://github.com/akamhy/waybackpy) (#117) 2022-01-22 23:51:23 -05:00
Mat Kelly
d3cbc44fbd
Add Unwarcit (#115) 2022-01-05 10:32:05 -05:00
Mat Kelly
921cf36496
Add FastWARC (#114)
* Update README.md

* Capitalize the description to appease the linter
2021-12-13 11:30:53 -05:00
Alex Osborne
30661eacd0
Add warc2html to Replay section (#113) 2021-11-08 00:53:24 -05:00
Wayback Archiver
9ff76782d1
Add Wayback to Acquisition (#112) 2021-10-07 08:45:48 -04:00
Andy Jackson
393919d9ee
Add gowarcserver by Norsk nettarkiv (#111) 2021-07-20 09:15:59 -04:00
Andy Jackson
7b5c80c44f
Adding WCT and a separate curation section. (#110)
* Adding WCT and a separate curation section.

WCT should clearly be on this list.

The curation section is a proposal to capture any tools that integrate web archiving into curation workflows and tools.

* Fix spacing of bullet
2021-07-13 08:33:08 -04:00
Nick Ruest
a9daaebc34
Remove Archives Unleashed Cloud. (#109)
😢
2021-06-30 20:20:30 +01:00
Youssef Eldakar
cf1c8ff4f1
Add Warcprox (#108)
Also moved WAIL up the list to correct alphabetical order.
2021-06-22 14:39:37 +02:00
Ed Summers
5e11c22564
Added Browsertrix and ArchiveWeb.page (#107)
* added browsertrix and archiveweb.page

* wording change

* fixed browsertrix link

* minor tweaks

* capitalization

* capitalization

* note that archiveweb.page is also available as a desktop app
2021-05-28 14:45:29 -04:00
Michael L. Nelson
f2ae23d5ae
added @WebSciDL (#106)
* added @WebSciDL
2021-04-27 14:46:18 -04:00
WaybackBot
9fe7d3558b
Add playback (#105) 2021-04-24 13:36:54 -04:00
Alex Osborne
9d2356b766
Add httrack2warc utility (#104) 2021-04-16 09:08:07 -04:00
Thomas Egense
821eaf9fbc
Patch 1 (#103)
* Update README.md

* Update README.md
2021-03-05 13:02:10 +00:00
Cris Stringfellow
3de3d8c59b
Add 22120 (#102) 2020-11-09 10:54:58 -05:00
WaybackBot
19fc5214e1
Add Cairn and Obelisk to the list. (#100)
* Add Cairn and Obelisk to the list
* Fix awesome lint issues
* Resolve #97
2020-11-06 13:41:21 -05:00
Mat Kelly
98f6832c15
Sort Replay section alphabetically to align with other sections (#96)
Full disclosure: I am one of the authors of ipwb and by no means am attempting to promote it with this change, but figured the consistency would be appreciated.
2020-09-18 10:01:26 +09:00
IIPC PCO
b3ef2514e0
Update README.md 2020-09-16 21:42:41 +00:00
IIPC PCO
ac682223a6
Update README.md 2020-09-16 21:37:20 +00:00
Nick Ruest
d2c8ff8ae2
Move Lentil to deprecated list. (#94) 2020-06-23 09:29:46 +09:00