From 82ac3141038b442e8ed388bc26eb8fb88aed3d57 Mon Sep 17 00:00:00 2001
From: Julien Nioche
Date: Tue, 25 Jun 2019 15:30:45 +0100
Subject: [PATCH] added StormCrawler (#66)

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 8683180..b8487fe 100644
--- a/README.md
+++ b/README.md
@@ -104,6 +104,8 @@ This list of tools and software is intended to briefly describe some of the most
 
 * [Squidwarc](https://github.com/N0taN3rd/Squidwarc) (In Development) - An [open source, high-fidelity, page interacting](http://ws-dl.blogspot.com/2017/07/2017-07-24-replacing-heritrix-with.html) archival crawler that uses Chrome or Chrome Headless directly.
 
+* [StormCrawler](http://stormcrawler.net/) (Stable) - A collection of resources for building low-latency, scalable web crawlers on Apache Storm.
+
 * [twarc](https://github.com/docnow/twarc) (Stable) - A command line tool and Python library for archiving Twitter JSON data.
 
 * [WARCreate](http://matkelly.com/warcreate/) (Stable) - A [Google Chrome](https://www.google.com/intl/en/chrome/browser/) extension for archiving an individual webpage or website to a WARC file.