sys-nyx 2024-12-26 17:01:32 -08:00 committed by GitHub
parent b8aace4b7f
commit d74c180cf7

@ -1,15 +1,13 @@
## Red-arch Overview
The goal of this project is to provide a framework for archiving websites and social media - with a particular focus on subreddits - and for creating compilations of information that are very easy for non-tech-savvy people to consume, copy, and distribute.
[reddit-html-archiver](https://github.com/libertysoft3/reddit-html-archiver) was chosen as the base for this project for a number of reasons:
- It generates a static website. This is very important because a static website is the best option for compiling data according to the needs of this project.
- It's styled nicely.
- It's written in Python, which makes integration with other web scrapers or data dumps very simple.
- It takes minimal changes to accept data from popular Reddit data dumps such as Pushshift.
At the moment this project is limited to creating static sites from https://academictorrents.com/details/56aa49f9653ba545f48df2e33679f014d2829c10. The user responsible for those uploads provides a repo [here](https://github.com/Watchful1/PushshiftDumps) with some tools for parsing the files contained in the torrent. This repo provides a modified version of their 'single_file.py' as 'watchful.py' (named after its creator), which can be used to convert the subreddit dumps into JSON files. Those files can then be added to config.toml and used to create a website using reddit-html-archiver.
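As a rough illustration of the conversion step described above, here is a minimal sketch. It is not the actual 'watchful.py'. It assumes the dump has already been decompressed into newline-delimited JSON text with one record per line, and the function names `iter_records` and `dump_to_json` are made up for this example.

```python
import json
from typing import Any, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed record per non-empty line, skipping malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Assumption: a damaged line is dropped rather than aborting the run.
            continue


def dump_to_json(lines: Iterable[str], out_path: str) -> int:
    """Write all records from `lines` to `out_path` as one JSON array.

    Returns the number of records written. This loads everything into
    memory, which is fine for a sketch but not for very large dumps.
    """
    records = list(iter_records(lines))
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh)
    return len(records)
```

Any iterable of text lines works as input, for example an open text file or a decompression stream wrapped in a text reader. How the resulting file is referenced in config.toml is not shown here, since that layout is defined by this repo.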
### install