Commit Graph

140 Commits

Author SHA1 Message Date
IIPC
1953151aae
Update README.md 2024-09-10 10:26:04 -04:00
Natanael Arndt
168526a62c
Fix the jwat link(s) according to answers in the #os-sos@iipc.slack.com channel (#149) 2024-05-08 08:44:42 -04:00
lasztoth
99241ae461
Added warc-safe to list (#148) 2024-05-06 08:26:07 -04:00
Henry Wilkinson
8e713a4388
Update list with current Webrecorder related URLs (#147)
* Update list with current Webrecorder URLs

A few terms have changed! These should all be the most current, Conifer is notably duped, could remove one of them?

* Remove dupe Conifer link, updates Webrecorder tools

- Update PYWB link
- Update "ReplayWeb.page" casing

* Add stable tag to ReplayWeb.page

* Update README.md

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-04-25 09:43:26 +01:00
Andy Jackson
101ee998d9
Adding a Web Archive Services section to list hosted and self-hostable web archiving options. (#144)
* Add Services section
* Add TOC headings
* Update Node version for linting
  * Node 12 is very old and linting is failing. So trying the most recent LTS version.
* Fix up linting problems
2024-01-18 10:57:01 -05:00
kokomo123
86c769597d
Add IA Library to Utilities (#143) 2023-12-20 08:53:11 -05:00
Ed Summers
f0b7cdbae0
Added warcdb (#142) 2023-10-16 12:21:54 -04:00
Ross Spencer
4b12cc7b32
Update the details around HTTPreserve.info (#141) 2023-08-30 07:14:42 -04:00
Ed Summers
034582f3aa
Adjusted jwarc description (#140) 2023-08-01 11:55:09 -04:00
IIPC
d6ca8af2c0
Update README.md (#139)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2023-07-14 07:31:45 -04:00
Greg Lindahl
5d41023b2b
add cc analysis (#138)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Nick Ruest <ruestn@gmail.com>
2023-07-04 12:54:21 -04:00
Greg Lindahl
d4673d008e
add cdx-toolkit (#135)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:37:05 +01:00
Greg Lindahl
d395bb1b44
add common crawl mailing list (#136)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:36:05 +01:00
Greg Lindahl
bf9664ff45
add web data commons (#137)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:34:33 +01:00
Greg Lindahl
54110410bf
warcio was stable a long time ago (#134)
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
2023-07-04 09:33:09 +01:00
Greg Lindahl
4c04474998
this link works (#131) 2023-06-28 11:38:31 +01:00
Nick Ruest
11fee57dcb
Fix linter error: Ignore double IA Wayback link. (#129) 2023-06-01 15:58:29 -04:00
Rustem Kamalov
232966c4cb
Add gogetcrawl (#128)
* Add `gogetcrawl`
2023-06-01 15:33:15 -04:00
Nick Ruest
d8631ddf05
Add crau. (#127)
- Resolves #95
2023-04-30 20:05:45 -04:00
Matteo Cargnelutti
4ecc363191
Adding @harvard-lil/scoop (#126) 2023-04-26 16:56:25 -04:00
Ed Summers
46dc9518e4
added warcdedupe (#125) 2023-04-18 20:28:39 -04:00
Andy Jackson
b309687f88
Update runs-on
Was pointing at defunct base image.
2023-04-13 14:46:14 +01:00
Andy Jackson
6bdb3373cb
Add two tools that can do WARC deduplication (#124) 2023-04-12 11:00:52 -04:00
Hendursaga
fc1a73d22d
Rename 22120 to DiskerNet (#123) 2023-01-20 07:59:41 -05:00
Andy Jackson
248f9dc42e
Update README.md (#122) 2022-10-17 19:47:33 -04:00
Mat Kelly
0104c202c8
Fix typo (#121) 2022-09-27 10:46:37 -04:00
Andy Jackson
6b7a3372d4
Add the Bellingcat Auto Archiver (#120) 2022-09-23 22:38:51 -04:00
IIPC
f1a10b71b1
Update README.md 2022-08-23 12:10:15 -04:00
Nick Ruest
62515809d6
Add ARCH and Sparkling (#119)
* Remove Archives Unleashed Cloud.

* Add Sparkling and ARCH.

* linter
2022-05-25 22:30:37 +01:00
IIPC
0391cce057
Update README.md 2022-05-12 00:04:39 -04:00
IIPC
36dadbf3c4
Update README.md 2022-05-11 23:41:43 -04:00
Ross Spencer
82e512bde2
Correct the link for "Web as History" (#118) 2022-03-03 12:19:52 -05:00
Akash Mahanty
232ef44fd2
+ waybackpy (https://github.com/akamhy/waybackpy) (#117) 2022-01-22 23:51:23 -05:00
Mat Kelly
d3cbc44fbd
Add Unwarcit (#115) 2022-01-05 10:32:05 -05:00
Mat Kelly
921cf36496
Add FastWARC (#114)
* Update README.md

* Capitalize the description to appease the linter
2021-12-13 11:30:53 -05:00
Alex Osborne
30661eacd0
Add warc2html to Replay section (#113) 2021-11-08 00:53:24 -05:00
Wayback Archiver
9ff76782d1
Add Wayback to Acquisition (#112) 2021-10-07 08:45:48 -04:00
Andy Jackson
393919d9ee
Add gowarcserver by Norsk nettarkiv (#111) 2021-07-20 09:15:59 -04:00
Andy Jackson
7b5c80c44f
Adding WCT and a separate curation section. (#110)
* Adding WCT and a separate curation section.

WCT should clearly be on this list.

The curation section is a proposal to capture any tools that integrate web archiving into curation workflows and tools.

* Fix spacing of bullet
2021-07-13 08:33:08 -04:00
Nick Ruest
a9daaebc34
Remove Archives Unleashed Cloud. (#109)
😢
2021-06-30 20:20:30 +01:00
Youssef Eldakar
cf1c8ff4f1
Add Warcprox (#108)
Also moved WAIL up the list to correct alphabetical order.
2021-06-22 14:39:37 +02:00
Ed Summers
5e11c22564
Added Browsertrix and ArchiveWeb.page (#107)
* added browsertrix and archiveweb.page

* wording change

* fixed browsertrix link

* minor tweaks

* capitalization

* capitalization

* note that archiveweb.page is also available as a desktop app
2021-05-28 14:45:29 -04:00
Michael L. Nelson
f2ae23d5ae
added @WebSciDL (#106)
* added @WebSciDL
2021-04-27 14:46:18 -04:00
WaybackBot
9fe7d3558b
Add playback (#105) 2021-04-24 13:36:54 -04:00
Alex Osborne
9d2356b766
Add httrack2warc utility (#104) 2021-04-16 09:08:07 -04:00
Thomas Egense
821eaf9fbc
Patch 1 (#103)
* Update README.md

* Update README.md
2021-03-05 13:02:10 +00:00
Cris Stringfellow
3de3d8c59b
Add 22120 (#102) 2020-11-09 10:54:58 -05:00
WaybackBot
19fc5214e1
Add Cairn and Obelisk to the list. (#100)
* Add Cairn and Obelisk to the list
* Fix awesome lint issues
* Resolve #97
2020-11-06 13:41:21 -05:00
Mat Kelly
98f6832c15
Sort Replay section alphabetically to align with other sections (#96)
Full disclosure: I am one of the authors of ipwb and by no means am attempting to promote it with this change, but figured the consistency would be appreciated.
2020-09-18 10:01:26 +09:00
IIPC PCO
b3ef2514e0
Update README.md 2020-09-16 21:42:41 +00:00