Update README.md

libertysoft3 2020-06-30 03:24:20 -07:00 committed by GitHub
parent 78a9e155d6
commit da0624f40e
GPG key ID: 4AEE18F83AFDEB23

@ -6,26 +6,26 @@ pulls reddit data from the [pushshift](https://github.com/pushshift/api) api and
 requires python 3 on linux, OSX, or Windows
-sudo apt-get install pip
-pip install psaw
-git clone https://github.com/chid/snudown
-cd snudown
-sudo python setup.py install
-cd ..
-git clone [this repo]
-cd reddit-html-archiver
-chmod u+x *.py
+$ sudo apt-get install pip
+$ pip install psaw
+$ git clone https://github.com/chid/snudown
+$ cd snudown
+$ sudo python setup.py install
+$ cd ..
+$ git clone [this repo]
+$ cd reddit-html-archiver
+$ chmod u+x *.py
 Windows users may need to run
-chcp 65001
-set PYTHONIOENCODING=utf-8
+> chcp 65001
+> set PYTHONIOENCODING=utf-8
 before running `fetch_links.py` or `write_html.py` to resolve encoding errors such as 'codec can't encode character'.
 ### fetch reddit data
-data is fetched by subreddit and date range and is stored as csv files in `data`. You may need to explicitly run the script python3 if it is not the default on your system.
+data is fetched by subreddit and date range and is stored as csv files in `data`. You may need to explicitly run the script with python3 if it is not the default on your system.
 $ python3 ./fetch_links.py politics 2017-1-1 2017-2-1
 # or add some link/post filtering to download less data