Update README.md

libertysoft3 2020-06-30 03:24:20 -07:00 committed by GitHub
parent 78a9e155d6
commit da0624f40e
GPG key ID: 4AEE18F83AFDEB23

@@ -6,26 +6,26 @@ pulls reddit data from the [pushshift](https://github.com/pushshift/api) api and
requires python 3 on linux, OSX, or Windows
-sudo apt-get install pip
-pip install psaw
-git clone https://github.com/chid/snudown
-cd snudown
-sudo python setup.py install
-cd ..
-git clone [this repo]
-cd reddit-html-archiver
-chmod u+x *.py
+$ sudo apt-get install pip
+$ pip install psaw
+$ git clone https://github.com/chid/snudown
+$ cd snudown
+$ sudo python setup.py install
+$ cd ..
+$ git clone [this repo]
+$ cd reddit-html-archiver
+$ chmod u+x *.py
Windows users may need to run
-chcp 65001
-set PYTHONIOENCODING=utf-8
+> chcp 65001
+> set PYTHONIOENCODING=utf-8
before running `fetch_links.py` or `write_html.py` to resolve encoding errors such as 'codec can't encode character'.
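
For context on the encoding note above, here is a minimal sketch of how such an error can arise. It assumes the console uses a legacy code page such as cp1252 (an assumption about the Windows setup, not something this README states), and `try_write` is a hypothetical helper, not part of this repo.

```python
import io


def try_write(text, encoding):
    """Write text to an in-memory stream that stands in for stdout.

    Returns None on success, or the error message if the text
    cannot be represented in the given encoding.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# A post title containing a character that cp1252 cannot represent.
title = "caf\u00e9 \u2603"

print(try_write(title, "cp1252"))  # an error message mentioning "can't encode character"
print(try_write(title, "utf-8"))   # None: UTF-8 can encode any Unicode text
```

Setting `PYTHONIOENCODING=utf-8` makes Python use UTF-8 for its standard streams, which is why the same text then writes without error.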
### fetch reddit data
-data is fetched by subreddit and date range and is stored as csv files in `data`. You may need to explicitly run the script python3 if it is not the default on your system.
+data is fetched by subreddit and date range and is stored as csv files in `data`. You may need to explicitly run the script with python3 if it is not the default on your system.
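
As an illustration of what a date range like `2017-1-1 2017-2-1` can mean in practice, the sketch below converts such strings to epoch seconds. It is not taken from `fetch_links.py`: `to_epoch` is a hypothetical helper, and interpreting the dates as UTC midnight is an assumption.

```python
from datetime import datetime, timezone


def to_epoch(date_text):
    """Convert a YYYY-M-D string to epoch seconds (assumes UTC midnight)."""
    parsed = datetime.strptime(date_text, "%Y-%m-%d")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


start = to_epoch("2017-1-1")
end = to_epoch("2017-2-1")

# Number of whole days covered by the range.
print((end - start) // 86400)  # 31
```

Note that `strptime` accepts months and days without zero padding, so `2017-1-1` and `2017-01-01` parse to the same date.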
$ python3 ./fetch_links.py politics 2017-1-1 2017-2-1
# or add some link/post filtering to download less data