mirror of https://github.com/internetarchive/brozzler.git (synced 2025-08-07 05:52:27 -04:00)
some thoughts on distributed database
parent ce154fc3db
commit 3d70776ce3
1 changed file with 7 additions and 1 deletion
@@ -50,4 +50,10 @@ brozzler-worker
- reads urls from brozzler.{site-id}.crawl_urls
- after each(?) (every n?) urls, feeds brozzler.{site_id}.completed_urls

=== considering distributed database ===
preferred database requirements:
- secondary index (so we can look up by url or priority)
- good performance on updates since we will be doing many updates
- good performance of secondary index on updates that change the value of secondarily indexed field
- ideally strong consistency, for multiple instances of brozzler-hq will
- redundancy, fault tolerance
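
To make the secondary-index requirements above concrete, here is a minimal in-memory sketch in Python. It is not brozzler code and does not model any particular database; the names UrlRecord, IndexedUrlTable, get_by_url, ids_with_priority and set_priority are hypothetical, chosen only for illustration. It shows why an update that changes an indexed field (priority) costs more than a plain update: the index entry has to be removed from the old bucket and added to the new one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UrlRecord:
    # Hypothetical record shape; the notes only say lookups are by url or priority.
    id: int
    url: str
    priority: int


class IndexedUrlTable:
    """Toy table with a primary key (id) and secondary indexes on url and priority."""

    def __init__(self):
        self._rows = {}                       # id -> UrlRecord
        self._by_url = {}                     # url -> id
        self._by_priority = defaultdict(set)  # priority -> set of ids

    def insert(self, rec):
        self._rows[rec.id] = rec
        self._by_url[rec.url] = rec.id
        self._by_priority[rec.priority].add(rec.id)

    def get_by_url(self, url):
        rec_id = self._by_url.get(url)
        return self._rows.get(rec_id) if rec_id is not None else None

    def ids_with_priority(self, priority):
        return sorted(self._by_priority.get(priority, ()))

    def set_priority(self, rec_id, new_priority):
        # Changing an indexed field means two index writes plus the row write.
        rec = self._rows[rec_id]
        old = rec.priority
        self._by_priority[old].discard(rec_id)
        if not self._by_priority[old]:
            del self._by_priority[old]
        rec.priority = new_priority
        self._by_priority[new_priority].add(rec_id)


table = IndexedUrlTable()
table.insert(UrlRecord(1, "http://example.com/", 10))
table.insert(UrlRecord(2, "http://example.com/a", 10))
table.set_priority(2, 5)
print(table.ids_with_priority(10), table.ids_with_priority(5))
```

In a distributed database the same index maintenance happens on every such update, possibly across nodes, which is presumably why the notes list update performance on the indexed field as its own requirement.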