mirror of
https://github.com/Watchful1/PushshiftDumps.git
synced 2025-12-18 09:42:13 -05:00
This repo contains example Python scripts for processing the Reddit dump files created by Pushshift. The files can be torrented from here.
- `single_file.py` decompresses and iterates over a single zst compressed file
- `iterate_folder.py` does the same, but for all files in a folder
- `combine_folder_multiprocess.py` uses separate processes to iterate over multiple files in parallel, writing lines that match the criteria passed in to text files, then combining them into a final zst compressed file
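
For a rough idea of what decompressing and iterating over a single zst file involves, here is a minimal sketch. It is not the code from `single_file.py`. The function names `iter_lines` and `iter_zst_lines` are made up for illustration. It assumes the third-party `zstandard` package is installed, that the dumps hold one record per line, and that they need a large decompression window (`max_window_size=2**31` is an assumption here).

```python
import codecs


def iter_lines(stream, chunk_size=2**20):
    """Yield text lines from a binary stream that is read in chunks.

    An incremental decoder is used so a multi-byte UTF-8 character that
    is split across two chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += decoder.decode(chunk)
        lines = buffer.split("\n")
        buffer = lines.pop()  # the last piece may be an incomplete line
        yield from lines
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer  # final line without a trailing newline


def iter_zst_lines(path):
    """Stream-decompress a zst file and yield its lines (hypothetical helper)."""
    import zstandard  # third-party: pip install zstandard

    with open(path, "rb") as fh:
        # Assumption: the dumps were compressed with a long window, so the
        # default window limit may be too small to decompress them.
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        yield from iter_lines(dctx.stream_reader(fh))
```

Usage would look like `for line in iter_zst_lines(path): obj = json.loads(line)`, assuming each line is a JSON object. Reading in chunks keeps memory use flat, which matters because the decompressed dumps can be far larger than the compressed files.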