From c370e303dccabde722eb860168891a8b007aae71 Mon Sep 17 00:00:00 2001
From: Steffen
Date: Sun, 18 Jun 2017 16:26:35 +0200
Subject: [PATCH] added html2warc (#16)

added html2warc, a simple script to convert offline data into a single warc file
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 80023a1..7991aec 100644
--- a/README.md
+++ b/README.md
@@ -54,6 +54,8 @@ To the extent possible under law, the owner has waived all copyright and related
 
 * [Heritrix](https://webarchive.jira.com/wiki/display/Heritrix/Heritrix) (Stable) - An open source, extensible, web-scale, archival quality web crawler.
 
+* [html2warc](https://github.com/steffenfritz/html2warc) (Stable) - A simple script to convert offline data into a single WARC file.
+
 * [grab-site](https://github.com/ludios/grab-site) (Stable) - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns.
 
 * [HTTrack](http://www.httrack.com/) (Stable) - An open source website copying utility.