diff --git a/README.md b/README.md
index 92dce99..14ff729 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ pulls reddit data from the [pushshift](https://github.com/pushshift/api) api and
 
 ### install
 
-requires python 3 on linux, OSX, or Windows
+requires python 3 on linux, OSX, or Windows. warning: if `$ python --version` outputs a python 2 version, then replace all occurrences of `python` with `python3` in the commands below.
 
     $ sudo apt-get install pip
     $ pip install psaw
@@ -25,25 +25,33 @@ before running `fetch_links.py` or `write_html.py` to resolve encoding errors su
 
 ### fetch reddit data
 
-data is fetched by subreddit and date range and is stored as csv files in `data`. You may need to explicitly run the script with python3 if it is not the default on your system.
+fetch data by subreddit and date range, writing to csv files in `data`:
 
-    $ python3 ./fetch_links.py politics 2017-1-1 2017-2-1
-    # or add some link/post filtering to download less data
-    $ ./fetch_links.py --self_only --score "> 2000" politics 2015-1-1 2016-1-1
-    # show available filters
-    $ ./fetch_links.py -h
+    $ python ./fetch_links.py politics 2017-1-1 2017-2-1
+
+or you can filter links/posts to download less data:
+
+    $ python ./fetch_links.py --self_only --score "> 2000" politics 2015-1-1 2016-1-1
+
+to show all available options and filters run:
+
+    $ python ./fetch_links.py -h
 
 decrease your date range or adjust `pushshift_rate_limit_per_minute` in `fetch_links.py` if you are getting connection errors.
 
 ### write web pages
 
-write html files for all subreddits to `r`.
+write html files for all subreddits to `r`:
 
-    $ ./write_html.py
-    # or add some output filtering for less fluff or a smaller archive size
-    $ ./write_html.py --min-score 100 --min-comments 100 --hide-deleted-comments
-    # show available filters
-    $ ./write_html.py -h
+    $ python ./write_html.py
+
+you can add some output filtering to have fewer empty posts or a smaller archive size:
+
+    $ python ./write_html.py --min-score 100 --min-comments 100 --hide-deleted-comments
+
+to show all available filters run:
+
+    $ python ./write_html.py -h
 
 your html archive has been written to `r`. once you are satisfied with your archive feel free to copy/move the contents of `r` to elsewhere and to delete the git repos you have created. everything in `r` is fully self contained.
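
note on the install warning: instead of replacing `python` with `python3` by hand, the interpreter could be picked once up front. this is only a minimal sketch, assuming a POSIX shell; the `PY` variable name is hypothetical and is not part of the repository or of this patch.

```shell
# pick the first interpreter on PATH that reports a python 3 version
# (PY is a hypothetical variable name used only for this illustration)
PY=""
for candidate in python python3; do
    if command -v "$candidate" >/dev/null 2>&1 &&
       "$candidate" -c 'import sys; sys.exit(0 if sys.version_info[0] == 3 else 1)' >/dev/null 2>&1; then
        PY="$candidate"
        break
    fi
done

if [ -z "$PY" ]; then
    echo "no python 3 interpreter found on PATH" >&2
else
    echo "using $PY"
fi
```

with `PY` set, the commands in the README would become e.g. `"$PY" ./fetch_links.py -h` and `"$PY" ./write_html.py -h`.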