Commit graph

45 commits

Author SHA1 Message Date
Watchful1 ec977a76b2 Didn't mean to commit that 2024-03-22 19:25:53 -07:00
Watchful1 ef186b7bd7 Update merge to not overwrite scores 2024-03-07 20:41:04 -08:00
Watchful1 fe8fef722f Fix csv script for comments without a permalink 2024-02-19 21:20:37 -08:00
Watchful1 ef12dc5694 Don't crash when the dict is empty 2024-02-09 21:51:58 -08:00
Watchful1 09334829f6 Bit of cleanup for to_csv 2024-02-04 20:32:03 -08:00
Watchful1 7727934db0 Initial implementation of multiprocess recompress script 2024-01-25 22:01:17 -08:00
Watchful1 6114afb53f Fix last month of year 2024-01-07 10:45:51 -08:00
Watchful1 82966bf7f6 Add partial matching support to multiprocess script 2023-12-23 17:36:08 -08:00
Watchful1 4e140e218b Add recompress file test script 2023-11-12 16:18:04 -08:00
Watchful1 f35762e203 Add split blocks by minute 2023-11-02 21:04:03 -07:00
Watchful1 8a0256285f Update comment 2023-08-22 22:13:37 -07:00
Watchful1 d7beff9a08 Support multiple files in filter_file 2023-08-22 22:12:30 -07:00
Watchful1 0827eee152 Evidently this is a string sometimes 2023-08-22 19:43:38 -07:00
Watchful1 78c1814a60 Support empty filter 2023-08-21 21:45:58 -07:00
Watchful1 f7146593a0 Log on bad lines too 2023-08-09 19:42:56 -07:00
Watchful1 4110374fe8 Add more logging to filter file 2023-08-09 19:38:16 -07:00
Watchful1 4a50ca6605 Add overlapping users finder 2023-05-25 18:28:37 -07:00
Watchful1 897332b1d7 Update filter file with id output and filter by input list 2023-05-25 18:13:05 -07:00
Watchful1 e103298be3 Update iterate folder with the new decode method 2023-03-16 18:26:40 -07:00
Watchful1 f7286b7572 Full redesign of multiprocess 2023-03-08 17:48:59 -08:00
Watchful1 31ad7179dc Update the parse here, switch to counting instead of adding scores 2023-03-08 17:48:48 -08:00
Watchful1 1f7a3137f4 Update multiprocess to handle large numbers of output files 2023-03-06 20:37:15 -08:00
Watchful1 8dcc65abf7 Initial work on filter file 2023-03-02 18:41:50 -08:00
Watchful1 8282a5e765 Count matched lines 2023-01-30 17:53:02 -08:00
Watchful1 4e8d6c9b6b Fix to_csv chunk sizing 2023-01-30 17:05:03 -08:00
Watchful1 33b5b938c1 Change to FileHandle 2023-01-28 11:39:27 -08:00
Watchful1 87d2b22a73 Change the pool chunksize to 1 to reduce parallelization 2023-01-24 20:52:53 -08:00
Watchful1 2358bf555b Add value_list argument to take a large list of values to filter on 2023-01-24 20:51:10 -08:00
Watchful1 3415c7880e Remove filter here 2023-01-24 09:40:41 -08:00
Watchful1 cae4434c33 Bit more cleanup for combine and add count 2023-01-20 11:33:08 -08:00
Watchful1 894961c3ee Save the arguments in the status json so we don't accidentally reuse the same files for a different run 2023-01-17 22:37:25 -08:00
Watchful1 edf82d3d90 Change default encoding 2023-01-16 09:58:48 -08:00
Watchful1 1a3789c298 Work on multiprocess, change up argument format, handle comments and submissions at the same time, split the output 2023-01-12 16:46:58 -08:00
Watchful1 c4d652d0cf Update frame sizes 2022-07-17 15:56:06 -07:00
Watchful1 3fa63048e3 Merge remote-tracking branch 'origin/master' 2022-07-15 23:39:45 -07:00
Watchful1 1a99630073 Some cleanup, optimize multiprocess 2022-07-15 23:39:37 -07:00
Peter Eckersley ff8d844d43 Make matches case insensitive by default. Since things like subreddits are unhelpfully cased. --case-sensitive turns the default behaviour back on. 2022-04-26 13:50:00 -07:00
Watchful1 461028b401 Add csv script 2022-02-14 16:04:27 -08:00
Watchful1 c08f5f212f Add word counting script 2021-12-10 21:08:22 -08:00
Watchful1 e4e8ad480c Add mongo libraries 2021-11-20 19:05:10 -08:00
Watchful1 50be918a1c Add support for multiple values 2021-10-14 19:33:25 -07:00
Watchful1 4501ec236f More fixes 2021-09-10 22:37:55 -07:00
Watchful1 021d033732 Fix comments 2021-09-10 19:20:50 -07:00
Watchful1 dd12687141 Clean up 2021-09-09 22:24:14 -07:00
Watchful1 bd7378ff91 Initial commit 2021-09-04 23:17:53 -07:00