mirror of https://github.com/sys-nyx/red-arch.git, synced 2025-05-06 08:45:31 -04:00
Merge branch 'master' of github.com:sys-nyx/red-arch
commit 83e0580de9
1 changed file with 13 additions and 20 deletions
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ The goal of this project is to provide a framework for archiving websites and so
 - Its written in python which will make integration with other web scrapers or data dumps very simple.
 - Takes minimal changes to accept data from popular reddit data dumps such as pushshift
 
-At the moment this project is limited to creating static sites from https://academictorrents.com/details/56aa49f9653ba545f48df2e33679f014d2829c10. the user responsible for those uploads provides a repo [here](https://github.com/Watchful1/PushshiftDumps) with some tools for parsing through the files contained in the torrent. This repo provides a modified version of their 'single_file.py' as 'watchful.py' (named after its creator) which can be used as to convert the subreddit dumps into json files. Those files can then be added to config.toml and used to create a website using reddit-html-archiver.
+At the moment this project is limited to creating static sites from https://academictorrents.com/details/56aa49f9653ba545f48df2e33679f014d2829c10. the user responsible for those uploads provides a repo [here](https://github.com/Watchful1/PushshiftDumps) with some tools for parsing through the files contained in the torrent. This repo (red-arch) provides a modified version of their 'single_file.py' as 'watchful.py' (named after its creator) which can be used as to convert the subreddit dumps into valid python dictionaries and then used to create a website using reddit-html-archiver.
 
 ### install
 
@@ -20,16 +20,6 @@ pip install requirements.txt
 ```
 
 ### Usage
-create json files for multiple subreddits
-```
-cd red-arch/
-python3 watchful.py subname1_submissions.zst subname1_submissions.json
-python3 watchful.py subname1_comments.zst subname1_comments.json
-
-python3 watchful.py subname2_submissions.zst subname2_submissions.json
-python3 watchful.py subname2_comments.zst subname2_comments.json
-
-```
 
 ```
 nano config.toml
@@ -39,17 +29,17 @@ add multiple entries to config (or just one)
 
 ```
 [subname1]
-comments= subname1_comments.json
-posts= subname1_submissions.json
+comments= subname1_comments.zst
+posts= subname1_submissions.zst
 
 [subname2]
-comments= subname2_comments.json
-posts= subname2_submissions.json
+comments= subname2_comments.zst
+posts= subname2_submissions.zst
 ```
 
-build the site
+Build the site.
 ```
-dumps.py config.toml
+redarch.py config.toml
 ```
 
 The resulting website will be located within the 'r/' directory and can be viewed by placing it in the webroot of any http server OR by opening index.html in your browser.
@@ -58,10 +48,12 @@ The maintainers of this repo are NOT responsible for any problems with your syst
 
 
 ## TODO
-- Create a unified script for building from zst files from push shift dumps
-- Create a more modular API for parsing data from a variety of sources
 - Incorporate a local, static site search such as [lunrjs](https://github.com/olivernn/lunr.js)
+- Create a more modular API for parsing data from a variety of sources
 - Create a web scraper with a more robust feature set
+- Refactor code and improve buildtime
+- Reduce final build size
+- Incorporate a real templating engine such as Jinja
 
 ## Contribute
 if you would like to contribute just let me know!
@@ -69,7 +61,7 @@ if you would like to contribute just let me know!
 ## Below is the readme from the original repository. [reddit-html-archiver](https://github.com/libertysoft3/reddit-html-archiver)
 Please note that it is ONLY included here for archival purposes and does not necessarily reflect the goals/intentions/usageopinons/etc of red-arch.
 
-
+```
 ## reddit html archiver
 
 pulls reddit data from the [pushshift](https://github.com/pushshift/api) api and renders offline compatible html pages. uses the reddit markdown renderer.
@@ -168,3 +160,4 @@ copy the contents of the `r` directory to a web root or appropriately served git
 
 
 
+```
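The config hunk in this commit switches each subreddit section from converted `.json` files to the raw `.zst` dumps. A minimal sketch of a pre-build sanity check for such a config follows, assuming it has already been parsed into a dict with one section per subreddit. `check_config` and its rules are hypothetical illustrations and are not part of red-arch.

```python
from typing import Dict, List

# Keys shown in the README's config example.
REQUIRED_KEYS = ("comments", "posts")


def check_config(config: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of readable problems; an empty list means nothing was flagged.

    Hypothetical helper: assumes one section per subreddit, each holding
    'comments' and 'posts' paths that should point at .zst dumps.
    """
    problems = []
    for sub, entry in config.items():
        for key in REQUIRED_KEYS:
            path = entry.get(key)
            if path is None:
                problems.append(f"[{sub}] missing '{key}'")
            elif not path.endswith(".zst"):
                problems.append(f"[{sub}] {key} is not a .zst file: {path}")
    return problems


example = {
    "subname1": {
        "comments": "subname1_comments.zst",
        "posts": "subname1_submissions.zst",
    },
    # Entry left in the pre-commit style, with one key missing.
    "subname2": {"comments": "subname2_comments.json"},
}

for problem in check_config(example):
    print(problem)
```

A check like this could catch config entries that still point at the old `.json` files after updating to this commit.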