mirror of https://github.com/internetarchive/brozzler.git
synced 2025-04-20 23:56:34 -04:00
some thoughts on distributed database
parent ce154fc3db
commit 3d70776ce3
@@ -50,4 +50,10 @@ brozzler-worker
- reads urls from brozzler.{site_id}.crawl_urls
- after each(?) (every n?) urls, feeds brozzler.{site_id}.completed_urls
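
The "every n urls" option could look roughly like the sketch below. It is only an illustration: in-memory deques stand in for the two queues, and `run_worker` and `batch_size` are hypothetical names, not part of brozzler.

```python
from collections import deque


def run_worker(crawl_urls, completed_urls, batch_size=3):
    """Drain crawl_urls, reporting finished urls in batches of batch_size.

    crawl_urls and completed_urls are plain deques standing in for the
    brozzler.{site_id}.crawl_urls and brozzler.{site_id}.completed_urls
    queues (an assumption made for illustration only).
    Returns the number of batches sent.
    """
    pending = []
    batches_sent = 0
    while crawl_urls:
        url = crawl_urls.popleft()
        # a real worker would browse the url here
        pending.append(url)
        if len(pending) >= batch_size:
            completed_urls.append(list(pending))
            pending.clear()
            batches_sent += 1
    if pending:
        # flush the remainder so no completed url goes unreported
        completed_urls.append(list(pending))
        batches_sent += 1
    return batches_sent
```

Setting `batch_size=1` gives the "after each url" behavior, so the open question above reduces to choosing one parameter.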
=== considering distributed database ===
preferred database requirements:
- secondary index (so we can look up by url or priority)
- good performance on updates since we will be doing many updates
- good secondary index performance on updates that change the value of a secondarily indexed field
- ideally strong consistency, for multiple instances of brozzler-hq will
- redundancy, fault tolerance
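
To make the first three requirements concrete, here is a minimal sketch of the access pattern using Python's built-in sqlite3. SQLite is not distributed and is not a candidate here; it only shows the lookups and updates a real choice would need to handle well. The table, index, and function names are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory table with secondary indexes on url and priority."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE urls ("
        " id INTEGER PRIMARY KEY,"
        " site_id TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " priority INTEGER NOT NULL)"
    )
    # secondary indexes: look up by url, or by priority within a site
    db.execute("CREATE INDEX urls_by_url ON urls (url)")
    db.execute("CREATE INDEX urls_by_priority ON urls (site_id, priority)")
    return db


def add_url(db, site_id, url, priority):
    db.execute(
        "INSERT INTO urls (site_id, url, priority) VALUES (?, ?, ?)",
        (site_id, url, priority),
    )


def reprioritize(db, url, priority):
    # this update changes an indexed field, so the index must be updated too
    db.execute("UPDATE urls SET priority = ? WHERE url = ?", (priority, url))


def next_url(db, site_id):
    """Return the highest-priority url for a site, or None if there is none."""
    row = db.execute(
        "SELECT url FROM urls WHERE site_id = ?"
        " ORDER BY priority DESC, id LIMIT 1",
        (site_id,),
    ).fetchone()
    return row[0] if row else None
```

`reprioritize` is the case the third requirement worries about: every call rewrites an entry in `urls_by_priority`, so a database where index maintenance is expensive would struggle under this workload.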