mirror of https://github.com/iipc/awesome-web-archiving.git, synced 2025-03-26 02:28:12 -04:00
Adding GLAM Workbench and Awesome Lists section (#91)
* Adding GLAM Workbench and Awesome Lists section

  Adding the results of the GLAM Workbench work and re-jigging to make clearer links to other community resources. Also added a link to the Web Archiving Community list.

* Fix formatting problems.

* Update WebMemex link

  The `.org` link is not working right now, so switching to https://github.com/WebMemex

* Fixed TOC

  Using VS Code plugin 'Markdown All In One'

* Fix linting 'problems'

  Jeez that's a picky linter. Stripping out Markdown All In One conventions.
parent 84d213689c
commit 078fc3adc1
23
README.md
@@ -16,7 +16,11 @@ Web archiving is the process of collecting portions of the World Wide Web to ens
 * [Analysis](#analysis)
 * [Quality Assurance](#quality-assurance)
 * [Community Resources](#community-resources)
-
+* [Other Awesome Lists](#other-awesome-lists)
+* [Blogs and Scholarship](#blogs-and-scholarship)
+* [Mailing Lists](#mailing-lists)
+* [Slack](#slack)
+* [Twitter](#twitter)

 ## Training/Documentation

@@ -28,12 +32,9 @@ Web archiving is the process of collecting portions of the World Wide Web to ens
 * The WARC Standard:
 * The [warc-specifications](https://iipc.github.io/warc-specifications/) community HTML version of the official specification and hub for new proposals.
 * The [offical ISO 28500 WARC specification homepage](http://bibnum.bnf.fr/WARC/).
-* More advanced material:
-* [Awesome Memento](https://github.com/machawk1/awesome-memento)
+* For researchers using web archives:
+* [GLAM Workbench: Web Archives](https://glam-workbench.github.io/web-archives/) - See also [this related blog post on 'Asking questions with web archives'](https://netpreserveblog.wordpress.com/2020/05/28/asking-questions-with-web-archives/).
 * [Archives Unleashed Toolkit documentation](https://github.com/archivesunleashed/aut-docs)
-* [Heritrix Walkthrough](https://github.com/web-archive-group/heritrix-walkthrough) *(In Development)*
-* [The WARC Ecosystem](http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem)
-* [The Web Crawl section of COPTR](http://coptr.digipres.org/Category:Web_Crawl)
 * [warcbase workshop](https://github.com/web-archive-group/warcbase_workshop_vagrant)


@@ -64,6 +65,7 @@ This list of tools and software is intended to briefly describe some of the most
 * [freeze-dry](https://github.com/WebMemex/freeze-dry) - JavaScript library to turn page into static, self-contained HTML document; useful for browser extensions. *(In Development)*
 * [grab-site](https://github.com/ArchiveTeam/grab-site) - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns. *(Stable)*
 * [Heritrix](https://github.com/internetarchive/heritrix3/wiki) - An open source, extensible, web-scale, archival quality web crawler. *(Stable)*
+* [Heritrix Walkthrough](https://github.com/web-archive-group/heritrix-walkthrough) *(In Development)*
 * [html2warc](https://github.com/steffenfritz/html2warc) - A simple script to convert offline data into a single WARC file. *(Stable)*
 * [HTTrack](http://www.httrack.com/) - An open source website copying utility. *(Stable)*
 * [Lentil](https://github.com/NCSU-Libraries/lentil) - A Ruby on Rails Engine that supports the harvesting of images from Instagram and provides several browsing views, mechanisms for sharing, tools for users to select their favorite images, an administrative interface for moderating images, and a system for harvesting images and submitting donor agreements in preparation of ingest into external repositories. *(Stable)*
@@ -78,7 +80,7 @@ This list of tools and software is intended to briefly describe some of the most
 * [Warcworker](https://github.com/peterk/warcworker) - An open source, dockerized, queued, high fidelity web archiver based on Squidwarc with a simple web GUI. *(Stable)*
 * [WAIL](https://machawk1.github.io/wail/) - A graphical user interface (GUI) atop multiple web archiving tools intended to be used as an easy way for anyone to preserve and replay web pages; [Python](https://machawk1.github.io/wail/), [Electron](https://github.com/n0tan3rd/wail). *(Stable)*
 * [Web2Warc](https://github.com/helgeho/Web2Warc) - An easy-to-use and highly customizable crawler that enables anyone to create their own little Web archives (WARC/CDX). *(Stable)*
-* [WebMemex](https://webmemex.org/) - Browser extension for Firefox and Chrome which lets you archive web pages you visit. *(In Development)*
+* [WebMemex](https://github.com/WebMemex) - Browser extension for Firefox and Chrome which lets you archive web pages you visit. *(In Development)*
 * [Webrecorder](https://webrecorder.io/) - Create high-fidelity, interactive recordings of any web site you browse. *(Stable)*
 * [Wget](http://www.gnu.org/software/wget/) - An open source file retrieval utility that of [version 1.14 supports writing warcs](http://www.archiveteam.org/index.php?title=Wget_with_WARC_output). *(Stable)*
 * [Wget-lua](https://github.com/alard/wget-lua) - Wget with Lua extension. *(Stable)*
@@ -161,6 +163,13 @@ This list of tools and software is intended to briefly describe some of the most

 ## Community Resources

+### Other Awesome Lists
+
+* [Web Archiving Community](https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Community)
+* [Awesome Memento](https://github.com/machawk1/awesome-memento)
+* [The WARC Ecosystem](http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem)
+* [The Web Crawl section of COPTR](http://coptr.digipres.org/Category:Web_Crawl)
+
 ### Blogs and Scholarship

 * [IIPC Blog](https://netpreserveblog.wordpress.com/)