Updates to PR#24 by @kant as recommended by @ruebot (#27)

* Minor fixes

* Changes per @ruebot in PR#14
Mat Kelly 2017-07-08 09:54:17 -04:00 committed by Nick Ruest
parent c5b04a33e8
commit 31389f46b9

@@ -18,7 +18,7 @@ Please ensure your pull request adheres to the following guidelines:
* Use the following format:
* `[Name](link)` (Status: **Stable** or **In Development**) - Brief Description of what the module does
* Make an individual pull request for each new item.
-* Link additions should be inserted alphabetically to the relavant category.
+* Link additions should be inserted alphabetically to the relevant category.
* New categories or improvements to the existing categorization are welcome.
* Check your spelling and grammar.
* The pull request and commit should have a useful title.
@@ -57,7 +57,7 @@ This list of tools and software is intended to briefly describe some of the most
#### Acquisition
-* [ArchiveFacebook](https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/) (Stable) - A [Mozilla Firefox](http://www.mozilla.org/en-US/firefox/new/) add-on for individuals to archive their Facebook accounts.
+* [ArchiveFacebook](https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/) (Stable) - A [Mozilla Firefox](https://www.mozilla.org/en-US/firefox/new/) add-on for individuals to archive their Facebook accounts.
* [Brozzler](https://github.com/internetarchive/brozzler) (Stable) - A distributed web crawler (爬虫) that uses a real browser (chrome or chromium) to fetch pages and embedded urls and to extract links.
@@ -67,7 +67,7 @@ This list of tools and software is intended to briefly describe some of the most
* [Heritrix](https://webarchive.jira.com/wiki/display/Heritrix/Heritrix) (Stable) - An open source, extensible, web-scale, archival quality web crawler.
-* [html2warc](https://github.com/steffenfritz/html2warc) (Stable) - A simple script to covert offline data into a single WARC file.
+* [html2warc](https://github.com/steffenfritz/html2warc) (Stable) - A simple script to convert offline data into a single WARC file.
* [HTTrack](http://www.httrack.com/) (Stable) - An open source website copying utility.
@@ -115,9 +115,9 @@ This list of tools and software is intended to briefly describe some of the most
* [har2warc](https://github.com/webrecorder/har2warc) - Convert HTTP Archive (HAR) -> Web Archive (WARC) format. (Python)
-* [Jwat](https://sbforge.org/display/JWAT/JWAT) (Stable) - Libraries and tools for reading/writting/validating WARC/ARC/GZIP files. (Java)
+* [Jwat](https://sbforge.org/display/JWAT/JWAT) (Stable) - Libraries and tools for reading/writing/validating WARC/ARC/GZIP files. (Java)
-* [node-warc](https://github.com/N0taN3rd/node-warc) - Parse Web ARChive (WARC) files with node.js.
+* [node-warc](https://github.com/N0taN3rd/node-warc) - Parse Web ARChive (WARC) files with `node.js`.
* [The Archive Browser](https://archivebrowser.c3.cx/) - The Archive Browser is a program that lets you browse the contents of archives, as well as extract them. It will let you open files from inside archives, and lets you preview them using Quick Look. WARC is supported. (OSX only, Proprietary app)
@@ -141,7 +141,7 @@ This list of tools and software is intended to briefly describe some of the most
* [ArchiveSpark](https://github.com/helgeho/ArchiveSpark) (Stable) - An Apache Spark framework (not only) for Web Archives that enables easy data processing, extraction as well as derivation.
-* [warcbase](http://warcbase.org/) (Stable) - Warcbase is an open-source platform for managing analyzing web archives.
+* [warcbase](http://warcbase.org/) (Stable) - Warcbase is an open-source platform for managing & analyzing web archives.
### Community Resources