mirror of
https://github.com/sys-nyx/red-arch.git
synced 2025-05-06 08:45:31 -04:00
Update README.md
parent da0624f40e
commit 8522fdad96
1 changed file with 21 additions and 13 deletions
README.md: 34 lines changed
@@ -4,7 +4,7 @@ pulls reddit data from the [pushshift](https://github.com/pushshift/api) api and
 
 ### install
 
-requires python 3 on linux, OSX, or Windows
+requires python 3 on linux, OSX, or Windows. warning: if `$ python --version` outputs a python 2 version, then replace all occurrences of `python` with `python3` in the commands below.
 
 $ sudo apt-get install pip
 $ pip install psaw
@@ -25,25 +25,33 @@ before running `fetch_links.py` or `write_html.py` to resolve encoding errors su
 
 ### fetch reddit data
 
-data is fetched by subreddit and date range and is stored as csv files in `data`. You may need to explicitly run the script with python3 if it is not the default on your system.
+fetch data by subreddit and date range, writing to csv files in `data`:
 
-$ python3 ./fetch_links.py politics 2017-1-1 2017-2-1
-# or add some link/post filtering to download less data
-$ ./fetch_links.py --self_only --score "> 2000" politics 2015-1-1 2016-1-1
-# show available filters
-$ ./fetch_links.py -h
+$ python ./fetch_links.py politics 2017-1-1 2017-2-1
+
+or you can filter links/posts to download less data:
+
+$ python ./fetch_links.py --self_only --score "> 2000" politics 2015-1-1 2016-1-1
+
+to show all available options and filters run:
+
+$ python ./fetch_links.py -h
 
 decrease your date range or adjust `pushshift_rate_limit_per_minute` in `fetch_links.py` if you are getting connection errors.
 
 ### write web pages
 
-write html files for all subreddits to `r`.
+write html files for all subreddits to `r`:
 
-$ ./write_html.py
-# or add some output filtering for less fluff or a smaller archive size
-$ ./write_html.py --min-score 100 --min-comments 100 --hide-deleted-comments
-# show available filters
-$ ./write_html.py -h
+$ python ./write_html.py
+
+you can add some output filtering to have fewer empty posts and a smaller archive size:
+
+$ python ./write_html.py --min-score 100 --min-comments 100 --hide-deleted-comments
+
+to show all available filters run:
+
+$ python ./write_html.py -h
 
 your html archive has been written to `r`. once you are satisfied with your archive feel free to copy/move the contents of `r` to elsewhere and to delete the git repos you have created. everything in `r` is fully self contained.
 
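The README change above asks readers to swap `python` for `python3` by hand when the default interpreter is python 2. One way around that, sketched here as an illustration only, is to drive both scripts from a small Python wrapper that reuses whichever interpreter is running it (`sys.executable`). The helper names `build_commands` and `run_pipeline` are hypothetical and not part of the repository; the script names, positional arguments, and the `--min-score` flag are taken from the README text.

```python
import shlex
import subprocess
import sys


def build_commands(subreddit, start, end, min_score=None):
    """Return argv lists for the fetch step and the write step, in order.

    Hypothetical helper: it only assembles the commands shown in the README.
    """
    python = sys.executable  # same interpreter that runs this wrapper
    fetch = [python, "./fetch_links.py", subreddit, start, end]
    write = [python, "./write_html.py"]
    if min_score is not None:
        # --min-score is the output filter named in the README
        write += ["--min-score", str(min_score)]
    return [fetch, write]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline if a step exits non-zero
            subprocess.run(argv, check=True)
```

Calling `run_pipeline(build_commands("politics", "2017-1-1", "2017-2-1", min_score=100))` from the repository root prints the two commands without running them; passing `dry_run=False` would execute them in order.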