Greg Lindahl
d4673d008e
add cdx-toolkit ( #135 )
...
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:37:05 +01:00
Greg Lindahl
d395bb1b44
add common crawl mailing list ( #136 )
...
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:36:05 +01:00
Greg Lindahl
bf9664ff45
add web data commons ( #137 )
...
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
Co-authored-by: Andy Jackson <Andrew.Jackson@bl.uk>
2023-07-04 09:34:33 +01:00
Greg Lindahl
54110410bf
warcio was stable a long time ago ( #134 )
...
Co-authored-by: Greg Lindahl <greg@commomncrawl.org>
2023-07-04 09:33:09 +01:00
Greg Lindahl
4c04474998
this link works ( #131 )
2023-06-28 11:38:31 +01:00
Nick Ruest
11fee57dcb
Fix linter error: Ignore double IA Wayback link. ( #129 )
2023-06-01 15:58:29 -04:00
Rustem Kamalov
232966c4cb
Add gogetcrawl ( #128 )
...
* Add `gogetcrawl`
2023-06-01 15:33:15 -04:00
Nick Ruest
d8631ddf05
Add crau. ( #127 )
...
- Resolves #95
2023-04-30 20:05:45 -04:00
Matteo Cargnelutti
4ecc363191
Adding @harvard-lil/scoop ( #126 )
2023-04-26 16:56:25 -04:00
Ed Summers
46dc9518e4
added warcdedupe ( #125 )
2023-04-18 20:28:39 -04:00
Andy Jackson
b309687f88
Update runs-on
...
Was pointing at defunct base image.
2023-04-13 14:46:14 +01:00
Andy Jackson
6bdb3373cb
Add two tools that can do WARC deduplication ( #124 )
2023-04-12 11:00:52 -04:00
Hendursaga
fc1a73d22d
Rename 22120 to DiskerNet ( #123 )
2023-01-20 07:59:41 -05:00
Andy Jackson
248f9dc42e
Update README.md ( #122 )
2022-10-17 19:47:33 -04:00
Mat Kelly
0104c202c8
Fix typo ( #121 )
2022-09-27 10:46:37 -04:00
Andy Jackson
6b7a3372d4
Add the Bellingcat Auto Archiver ( #120 )
2022-09-23 22:38:51 -04:00
IIPC
f1a10b71b1
Update README.md
2022-08-23 12:10:15 -04:00
Nick Ruest
62515809d6
Add ARCH and Sparkling ( #119 )
...
* Remove Archives Unleashed Cloud.
* Add Sparkling and ARCH.
* linter
2022-05-25 22:30:37 +01:00
IIPC
0391cce057
Update README.md
2022-05-12 00:04:39 -04:00
IIPC
36dadbf3c4
Update README.md
2022-05-11 23:41:43 -04:00
Ross Spencer
82e512bde2
Correct the link for "Web as History" ( #118 )
2022-03-03 12:19:52 -05:00
Akash Mahanty
232ef44fd2
+ waybackpy ( https://github.com/akamhy/waybackpy ) ( #117 )
2022-01-22 23:51:23 -05:00
Mat Kelly
d3cbc44fbd
Add Unwarcit ( #115 )
2022-01-05 10:32:05 -05:00
Mat Kelly
921cf36496
Add FastWARC ( #114 )
...
* Update README.md
* Capitalize the description to appease the linter
2021-12-13 11:30:53 -05:00
Alex Osborne
30661eacd0
Add warc2html to Replay section ( #113 )
2021-11-08 00:53:24 -05:00
Wayback Archiver
9ff76782d1
Add Wayback to Acquisition ( #112 )
2021-10-07 08:45:48 -04:00
Andy Jackson
393919d9ee
Add gowarcserver by Norsk nettarkiv ( #111 )
2021-07-20 09:15:59 -04:00
Andy Jackson
7b5c80c44f
Adding WCT and a separate curation section. ( #110 )
...
* Adding WCT and a separate curation section.
WCT should clearly be on this list.
The curation section is a proposal to capture any tools that integrate web archiving into curation workflows and tools.
* Fix spacing of bullet
2021-07-13 08:33:08 -04:00
Nick Ruest
a9daaebc34
Remove Archives Unleashed Cloud. ( #109 )
...
😢
2021-06-30 20:20:30 +01:00
Youssef Eldakar
cf1c8ff4f1
Add Warcprox ( #108 )
...
Also moved WAIL up the list to correct alphabetical order.
2021-06-22 14:39:37 +02:00
Ed Summers
5e11c22564
Added Browsertrix and ArchiveWeb.page ( #107 )
...
* added browsertrix and archiveweb.page
* wording change
* fixed browsertrix link
* minor tweaks
* capitalization
* capitalization
* note that archiveweb.page is also available as a desktop app
2021-05-28 14:45:29 -04:00
Michael L. Nelson
f2ae23d5ae
added @WebSciDL ( #106 )
...
* added @WebSciDL
2021-04-27 14:46:18 -04:00
WaybackBot
9fe7d3558b
Add playback ( #105 )
2021-04-24 13:36:54 -04:00
Alex Osborne
9d2356b766
Add httrack2warc utility ( #104 )
2021-04-16 09:08:07 -04:00
Thomas Egense
821eaf9fbc
Patch 1 ( #103 )
...
* Update README.md
* Update README.md
2021-03-05 13:02:10 +00:00
Cris Stringfellow
3de3d8c59b
Add 22120 ( #102 )
2020-11-09 10:54:58 -05:00
WaybackBot
19fc5214e1
Add Cairn and Obelisk to the list. ( #100 )
...
* Add Cairn and Obelisk to the list
* Fix awesome lint issues
* Resolve #97
2020-11-06 13:41:21 -05:00
Mat Kelly
98f6832c15
Sort Replay section alphabetically to align with other sections ( #96 )
...
Full disclosure: I am one of the authors of ipwb and by no means am attempting to promote it with this change, but figured the consistency would be appreciated.
2020-09-18 10:01:26 +09:00
IIPC PCO
b3ef2514e0
Update README.md
2020-09-16 21:42:41 +00:00
IIPC PCO
ac682223a6
Update README.md
2020-09-16 21:37:20 +00:00
Nick Ruest
d2c8ff8ae2
Move Lentil to deprecated list. ( #94 )
2020-06-23 09:29:46 +09:00
Alex Wendland
36ac91b158
Update WebRecorder's replay tool to ReplayWeb.Page ( #93 )
...
* Update WebRecorder's replay tool to ReplayWeb.Page
See the WebRecorder announcement on June 11th describing the transition https://webrecorder.net/2020/06/11/webrecorder-conifer-and-replayweb-page.html
* Move WebRecorder's replay tool to deprecated.md
2020-06-22 08:30:25 -04:00
Nick Ruest
8d6217af8d
Update link to aut documentation, and remove Warcbase workshop. ( #92 )
2020-06-10 11:40:50 +09:00
Andy Jackson
078fc3adc1
Adding GLAM Workbench and Awesome Lists section ( #91 )
...
* Adding GLAM Workbench and Awesome Lists section
Adding the results of the GLAM Workbench work and re-jigging to
make clearer links to other community resources.
Also added a link to the Web Archiving Community list.
* Fix formatting problems.
* Update WebMemex link
The `.org` link is not working right now, so switching to https://github.com/WebMemex
* Fixed TOC
Using VS Code plugin 'Markdown All In One'
* Fix linting 'problems'
Jeez that's a picky linter. Stripping out Markdown All In One conventions.
2020-06-05 11:40:01 +01:00
Sawood Alam
84d213689c
Elaborate description and make tags italic ( #90 )
...
* Elaborate on the description of the list
* Make tags after description italic
* Prevent a wildcard from being treated as a markdown symbol
* Make list indentation consistent
* Update description with inspiration from the Wikipedia page
2020-03-26 17:00:22 -04:00
Sawood Alam
cb72a26752
Fix a typo in #87 ( #89 )
2020-03-10 12:33:17 -04:00
Sawood Alam
ea98c25983
Add DSHR Blog ( #87 )
2020-03-07 10:37:24 +00:00
Andy Jackson
3a96fb2d16
Add @ato's guidelines from Slack discussion. ( #81 )
...
* Add @ato's guidelines from Slack discussion.
* Words are hard.
2020-03-05 15:16:20 +00:00
Mat Kelly
3c0dc1e1b5
Update blog description to be more objective. ( #85 )
...
* Update blog description to be more objective.
* Rm note regarding blog's activity.
* Updated Roundtable Blog description per @ruebot in #85
2020-03-03 15:19:44 -05:00
Mat Kelly
78bc949dab
Big D, like the other instances, per CONTRIBUTING.md ( #86 )
2020-03-03 14:40:31 -05:00