Merge branch 'master' of github.com:sys-nyx/red-arch

sys-nyx 2024-12-26 17:03:57 -08:00
commit f28098302b


@ -1,3 +1,62 @@
## Red-arch Overview
The goal of this project is to provide a framework for archiving websites and social media, with a particular focus on subreddits, and for compiling that information in forms that non-tech-savvy people can easily consume, copy, and distribute.
[reddit-html-archiver](https://github.com/libertysoft3/reddit-html-archiver) was chosen as the base for this project for a number of reasons:
- It generates a static website. A static site is the best fit for this project's goal of compiling data that is easy to copy and distribute.
- It's styled nicely.
- It's written in Python, which makes integration with other web scrapers or data dumps simple.
- It takes minimal changes to accept data from popular Reddit data dumps such as Pushshift.
At the moment, this project is limited to creating static sites from https://academictorrents.com/details/56aa49f9653ba545f48df2e33679f014d2829c10. The user responsible for those uploads provides a repo [here](https://github.com/Watchful1/PushshiftDumps) with some tools for parsing the files contained in the torrent. This repo provides a modified version of their 'single_file.py' as 'watchful.py' (named after its creator), which can be used to convert the subreddit dumps into JSON files. Those files can then be added to config.toml and used to create a website using reddit-html-archiver.
### Install
```
git clone https://github.com/sys-nyx/red-arch
cd red-arch/
# Init a virtual environment first if you prefer
pip install -r requirements.txt
```
### Usage
```
cd red-arch/
python3 watchful.py subname1_submissions.zst subname1_submissions.json
python3 watchful.py subname1_comments.zst subname1_comments.json
python3 watchful.py subname2_submissions.zst subname2_submissions.json
python3 watchful.py subname2_comments.zst subname2_comments.json
```
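When several subreddits need converting, the repeated commands above can be generated rather than typed. The following is a minimal sketch, assuming the dump files follow the `<name>_submissions.zst` / `<name>_comments.zst` naming used in the example and that `watchful.py` takes an input path followed by an output path, as shown. The function names `build_conversion_commands` and `convert_all` are hypothetical helpers and are not part of red-arch.

```python
import subprocess
import sys
from pathlib import Path


def build_conversion_commands(subreddits, dump_dir="."):
    """Return one watchful.py command per dump file (submissions, then comments)."""
    commands = []
    for name in subreddits:
        for kind in ("submissions", "comments"):
            source = Path(dump_dir) / f"{name}_{kind}.zst"
            target = Path(dump_dir) / f"{name}_{kind}.json"
            # Assumed CLI shape: python3 watchful.py <input.zst> <output.json>
            commands.append([sys.executable, "watchful.py", str(source), str(target)])
    return commands


def convert_all(subreddits, dump_dir="."):
    """Run each conversion in order, raising on the first failure."""
    for command in build_conversion_commands(subreddits, dump_dir):
        subprocess.run(command, check=True)
```

Calling `convert_all(["subname1", "subname2"])` from inside `red-arch/` would run the same four conversions as the commands above.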
```
nano config.toml
```
```
[subname1]
comments = "subname1_comments.json"
posts = "subname1_submissions.json"
[subname2]
comments = "subname2_comments.json"
posts = "subname2_submissions.json"
```
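The config file can also be generated from a list of subreddit names instead of being edited by hand. This is a rough sketch, assuming one section per subreddit with `comments` and `posts` keys as in the example, and assuming the file names are written as quoted TOML strings. `render_config` and `write_config` are hypothetical helpers, not functions provided by this repo.

```python
from pathlib import Path


def render_config(subreddits):
    """Build config.toml text with one [section] per subreddit."""
    sections = []
    for name in subreddits:
        sections.append(
            f"[{name}]\n"
            f'comments = "{name}_comments.json"\n'
            f'posts = "{name}_submissions.json"\n'
        )
    # A blank line between sections keeps the file readable.
    return "\n".join(sections)


def write_config(subreddits, path="config.toml"):
    """Write the rendered config to disk and return the path used."""
    Path(path).write_text(render_config(subreddits), encoding="utf-8")
    return path
```

For example, `write_config(["subname1", "subname2"])` would produce a file with the same two sections shown above.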
```
python3 dumps.py config.toml
```
The resulting website will be located in the 'r/' directory and can be viewed by placing it in the webroot of any HTTP server or by opening index.html in your browser.
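For a quick local preview without setting up a separate web server, Python's built-in `http.server` module is usually enough. The sketch below assumes you point it at whichever directory holds the generated output. `make_preview_server` is a hypothetical helper name, and port 8000 is an arbitrary default.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_preview_server(directory, port=8000):
    """Create a local-only HTTP server that serves files from `directory`."""
    # Bind the handler to the chosen directory instead of the current one.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    # Listening on 127.0.0.1 keeps the preview private to this machine.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To use it, create the server with `server = make_preview_server(".")`, call `server.serve_forever()`, and open `http://127.0.0.1:8000/` in a browser. Press Ctrl+C to stop it.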
The maintainers of this repo are NOT responsible for any problems with your system or any data loss that might occur from using anything contained within this repo to modify your local files. Please make copies of your data before you begin modifying it.
## Below is the readme from the original repository. [reddit-html-archiver](https://github.com/libertysoft3/reddit-html-archiver)
Please note that it is ONLY included here for archival purposes and does not necessarily reflect the goals/intentions/usage of red-arch.
python3
## reddit html archiver
pulls reddit data from the [pushshift](https://github.com/pushshift/api) api and renders offline compatible html pages. uses the reddit markdown renderer.