added html2warc (#16)

added html2warc, a simple script to convert offline data into a single warc file
Steffen 2017-06-18 16:26:35 +02:00 committed by Nick Ruest
parent 4e413e2342
commit c370e303dc

@@ -54,6 +54,8 @@ To the extent possible under law, the owner has waived all copyright and related
* [Heritrix](https://webarchive.jira.com/wiki/display/Heritrix/Heritrix) (Stable) - An open source, extensible, web-scale, archival quality web crawler.
* [html2warc](https://github.com/steffenfritz/html2warc) (Stable) - A simple script to convert offline data into a single warc file
* [grab-site](https://github.com/ludios/grab-site) (Stable) - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns.
* [HTTrack](http://www.httrack.com/) (Stable) - An open source website copying utility.