Mirror of https://github.com/Watchful1/PushshiftDumps.git, synced 2025-07-01 09:56:44 -04:00
Example scripts for the pushshift dump files
This repo contains example Python scripts for processing the Reddit dump files created by Pushshift. The files can be downloaded from here or torrented from here.
single_file.py
decompresses and iterates over a single zst compressed file
iterate_folder.py
does the same, but for all files in a folder
combine_folder_multiprocess.py
uses separate processes to iterate over multiple files in parallel, writing lines that match the criteria passed in to text files, then combining them into a final zst compressed file
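
As a rough illustration of the single-file case, the sketch below streams a zst compressed file in chunks and yields one JSON object per line. It is not the code from single_file.py. The function names `iter_lines` and `iter_zst_objects` are made up for this example, and it assumes the third-party `zstandard` package is installed and that each line of a dump file holds one JSON object. The large `max_window_size` is also an assumption about what the dump files need.

```python
import codecs
import io
import json


def iter_lines(reader, chunk_size=2**20):
    """Yield decoded text lines from a binary file-like object, chunk by chunk."""
    # The incremental decoder holds back a multi-byte UTF-8 character that is
    # split across two chunks instead of raising a decode error.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        buffer += decoder.decode(chunk)
        lines = buffer.split("\n")
        buffer = lines.pop()  # the last piece may be an incomplete line
        yield from lines
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer  # final line without a trailing newline


def iter_zst_objects(path):
    """Yield one parsed JSON object per line of a zst compressed file."""
    # Imported here so iter_lines can be used without the package installed.
    import zstandard  # third-party: pip install zstandard

    with open(path, "rb") as fh:
        # Assumption: the dump files need a window size larger than the default.
        decompressor = zstandard.ZstdDecompressor(max_window_size=2**31)
        reader = decompressor.stream_reader(fh)
        for line in iter_lines(reader):
            if line.strip():
                yield json.loads(line)
```

A caller would loop over `iter_zst_objects(path)` with the path of a downloaded dump file and filter on whichever fields it cares about. Keeping the line splitting in `iter_lines` separate from the decompression means the chunk handling can be tried out on any binary stream, such as an `io.BytesIO`.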