Mirror of https://github.com/binhnguyennus/awesome-scalability.git, synced 2024-10-01 01:06:14 -04:00
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Awesome Scalability, Availability, and Stability Back-end Design Patterns
A curated list of selected readings to illustrate Scalability, Availability, and Stability Design Patterns in Back-end Development.
What if your backend went slow?
Identify which problem you have: a performance problem (slow for a single user) or a scalability problem (fast for a single user but slow under heavy load). Reviewing some basic design principles helps tell them apart.
What if your backend went down?
"Even if you lose all one day, you can build all over again if you retain your calm!" - Thuan Pham, CTO at Uber Technologies Inc.
Contributing
Please take a look at the contribution guidelines first. Contributions are always welcome!
Contents
Principles
- CAP Theorem and Trade-offs
- Scaling Up and Scaling Out
- ACID and BASE
- Synchronous and Asynchronous
- SQL and NoSQL
- Understand why Cache is King!
- Understand Latency
- Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO
- 20 Common Bottlenecks
- Advantages and Drawbacks of Microservices
- Avoid Overengineering
- Don't Repeat Yourself (DRY)
- Design for Loose-coupling
- Design for Resiliency
- Design for Self-healing when failures occur
- Design for Scale out
- Best Practices for Scaling Out
- Design for Evolution
Scalability
-
- Understanding When to use RabbitMQ or Apache Kafka
- Running Kafka at scale at LinkedIn
- Delaying Asynchronous Message Processing with RabbitMQ at Indeed
- Real-time Data Pipeline with Kafka at Yelp
- Audit Kafka End-to-End at Uber (count each message exactly once, audit a message across tiers)
- Deduplication Techniques
-
- Why SQL is beating NoSQL, and what this means for the future of data
- Sharding MySQL at Pinterest
- How Airbnb Partitioned Main MySQL Database in Two Weeks
- Replication is the Key for Scalability & High Availability
- How Twitch uses PostgreSQL
- Scaling MySQL-based financial reporting system at Airbnb
- Scaling to 100M at Wix: MySQL is a Better NoSQL
- Why Uber Engineering Switched from Postgres to MySQL
- Handling Growth with Postgres at Instagram
-
- Scalable Deep Learning Platform On Spark In Baidu
- Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow
- Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp
- TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo
- CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo
Availability
- Fail-over
- Replication
- NodeJS High Availability at Yahoo
- Every Day Is Monday in Operations - LinkedIn (11-part series)
Stability
- Circuit Breaker
- Always use timeouts (if possible)
- Let it crash/Supervisors: Embrace failure as a natural state in the life-cycle of the application
- Crash early: An error now is better than a response tomorrow
- Bulkheads: Partition and tolerate failure in one part
- Steady state: Always put logs on a separate disk
- Throttling: Maintain a steady pace
- Multi-clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn
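As a rough illustration of the first pattern in this list, here is a minimal circuit breaker sketch in Python. It is not taken from any of the linked readings; the class name `CircuitBreaker`, the parameters `max_failures` and `reset_timeout`, and the injectable `clock` are all assumptions made for the example. The idea it tries to show: after a few consecutive failures, stop calling the dependency and fail fast, then let a single trial call through once a cool-down period has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # All names and defaults here are hypothetical, chosen for illustration.
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: reject immediately instead of waiting on a failing dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so this one call is allowed through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for your own function, and treat `CircuitOpenError` as a signal to serve a fallback. This sketch is single-threaded and counts every exception as a failure; a production breaker would usually add locking and distinguish error types.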
Others
- Distributed Git server at Palantir
- Configuration management for distributed systems (using GitHub and cfg4j) at Flickr
- Seagull: Distributed system that helps run more than 20 million tests per day at Yelp
- Cloud Bouncer: Distributed Rate Limiting at Yahoo
- Scalable gaming patterns on AWS (Sep 2017)
- Building a modern bank backend at Monzo
- Selecting a cloud provider at Etsy
- Architecture of Tripod (Flickr’s Backend)
- How eBay's Shopping Cart used compression techniques to solve network I/O bottlenecks
- Optimizing web servers for high throughput and low latency at Dropbox
Books
- The Art of Scalability
- Designing Data-Intensive Applications
- Web Scalability for Startup Engineers
- Scalability Rules: 50 Principles for Scaling Web Sites
Talks
- Harvard CS75 - Lecture 9: Scalability
- How We've Scaled Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox
- Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook
- Scaling Instagram Infrastructure - Lisa Guo, Instagram Engineering
- Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer
- Designing for Failure: Scaling Uber's Backend by Breaking Everything - Matt Ranney, Chief Systems Architect at Uber
- Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix
Special Thanks
- Jonas Bonér, CTO at Lightbend, for the original inspiration