mirror of
https://github.com/autistic-symposium/tensorflow-for-deep-learning-py.git
synced 2025-05-12 03:34:59 -04:00
# Curated Resources on ETL, Machine Learning, and ML Pipelines

```
The goal of this repository is to cover resources for deploying machine learning
in production environments, a task that includes data sourcing, data ingestion, data
transformation, pre-processing data for use in training, training a model, and hosting
the model.
```

Most data pipelines are designed and structured around three conceptual steps:

* **Extract**: sensors wait for upstream data sources to become available.
* **Transform**: business logic is applied (e.g. filtering, grouping, and aggregation to translate raw data into analysis-ready datasets).
* **Load**: processed data is transported to a final destination.
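
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not how any of the tools listed below work: the `RAW_CSV` data, the function names, and the `totals` table are all hypothetical, an in-memory CSV string stands in for the upstream source, and SQLite stands in for the final destination.

```python
import csv
import io
import sqlite3
from collections import defaultdict
from typing import Dict, List

# Hypothetical raw input standing in for an upstream data source.
RAW_CSV = """user,country,amount
ana,BR,10.0
bob,US,5.5
cy,BR,2.5
dee,US,-1.0
"""


def extract(raw: str) -> List[Dict[str, str]]:
    """Extract: read rows from the upstream source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Transform: filter out invalid rows, then group by country and sum."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        amount = float(row["amount"])
        if amount <= 0:  # filtering: drop non-positive amounts
            continue
        totals[row["country"]] += amount  # grouping and aggregation
    return dict(totals)


def load(totals: Dict[str, float], conn: sqlite3.Connection) -> None:
    """Load: write the aggregated result to its destination (here, SQLite)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals (country TEXT PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO totals VALUES (?, ?)", totals.items())
    conn.commit()


def run_pipeline(raw: str, conn: sqlite3.Connection) -> None:
    """Chain the three steps: extract, then transform, then load."""
    load(transform(extract(raw)), conn)
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` leaves one row per country in the `totals` table. In a real deployment each step would typically be a separate task in a scheduler such as Airflow, so that a failed step can be retried independently.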

## Tools & Code Samples

* [Data science resources](https://github.com/davidyakobovitch/data_science_resources).
* [Incubator Airflow data pipelining](https://github.com/apache/incubator-airflow).
* [Awesome Airflow Resources](https://github.com/jghoman/awesome-apache-airflow).
* [Airflow in Kubernetes](https://github.com/rolanddb/airflow-on-kubernetes).
* [Lore data pipelining](https://github.com/instacart/lore).
* [Astronomer: Airflow as a Service](https://github.com/astronomer/astronomer).
* [AWS Data pipeline samples](https://github.com/aws-samples/data-pipeline-samples/tree/master/samples).

## MOOCs

* [Coursera's Big Data Pipeline course](https://www.coursera.org/lecture/big-data-integration-processing/big-data-processing-pipelines-c4Wyd).
* [Udemy's Airflow for Beginners](https://www.udemy.com/airflow-basic-for-beginners/).

## Tutorials & Articles

#### 2019

* [How to Code Neat Machine Learning Pipelines](https://www.neuraxio.com/en/blog/neuraxle/2019/10/26/neat-machine-learning-pipelines.html).

## Enterprise Solutions

* [Netflix data pipeline](https://medium.com/netflix-techblog/evolution-of-the-netflix-data-pipeline-da246ca36905).
* [Netflix data videos](https://www.youtube.com/channel/UC00QATOrSH4K2uOljTnnaKw).
* [Yelp data pipeline](https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html).
* [Gusto data pipeline](https://engineering.gusto.com/building-a-data-informed-culture/).
* [500px data pipeline](https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83).
* [Twitter data pipeline](https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html).
* [Coursera data pipeline](https://medium.com/@zhaojunzhang/building-data-infrastructure-in-coursera-15441ebe18c2).
* [Cloudflare data pipeline](https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/).
* [Pandora data pipeline](https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee).
* [Heroku data pipeline](https://medium.com/@damesavram/running-airflow-on-heroku-ed1d28f8013d).
* [Zillow data pipeline](https://www.zillow.com/data-science/airflow-at-zillow/).
* [Airbnb data pipeline](https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166).
* [Walmart data pipeline](https://medium.com/walmartlabs/how-we-built-a-data-pipeline-with-lambda-architecture-using-spark-spark-streaming-9d3b4b4555d3).
* [Robinhood data pipeline](https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8).
* [Lyft data pipeline](https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff).
* [Slack data pipeline](https://speakerdeck.com/vananth22/operating-data-pipeline-with-airflow-at-slack).
* [Remind data pipeline](https://medium.com/@RemindEng/beyond-a-redshift-centric-data-model-1e5c2b542442).
* [Wish data pipeline](https://medium.com/wish-engineering/scaling-analytics-at-wish-619eacb97d16).
* [Databricks data pipeline](https://databricks.com/blog/2017/03/31/delivering-personalized-shopping-experience-apache-spark-databricks.html).

## Talks

* [Industrial Machine Learning Talk](https://www.youtube.com/watch?v=3JYDT8lap5U).