diff --git a/README.md b/README.md
index 3e44651..ab3b581 100644
--- a/README.md
+++ b/README.md
@@ -1,39 +1,57 @@
-# Resources for Machine Learning & Deep Learning
+# Curated Resources on ETL, Machine Learning, and ML Pipelines
+
+```
+The moral of this repository is that machine learning involves tasks that include data sourcing, data ingestion, data transformation, pre-processing data for use in training, training a model, and hosting the model.
+```
+
+Most data pipelines are designed and structured around three conceptual steps:
+
+* **Extract**: sensors wait for upstream data sources.
+* **Transform**: business logic is applied (e.g. filtering, grouping, and aggregation to translate raw data into analysis-ready datasets).
+* **Load**: processed data is transported to a final destination.
-## In this Repository
+---
-* [Tensorflow_examples](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/TensorFlow): examples in TF.
-* [Caffe_examples](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Caffe): examples in Caffe.
-* [DeepArt](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Numpy): deep learning generated art.
-* [ML Notebooks](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Notebooks): jupyter notebooks with ML examples.
-* [Numpy examples](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Numpy): some snippetes in Numpy.
+## Learning References
+
+### Courses and Lists
+
+* [Data science resources](https://github.com/davidyakobovitch/data_science_resources).
+* [Lore data pipelining](https://github.com/instacart/lore).
+* [Incubator Airflow data pipelining](https://github.com/apache/incubator-airflow).
+* [Udemy's Airflow for Beginners](https://www.udemy.com/airflow-basic-for-beginners/).
+* [Awesome Airflow Resources](https://github.com/jghoman/awesome-apache-airflow).
+* [Airflow on Kubernetes](https://github.com/rolanddb/airflow-on-kubernetes).
+* [Astronomer: Airflow as a Service](https://github.com/astronomer/astronomer).
+* [Data pipeline samples](https://github.com/aws-samples/data-pipeline-samples/tree/master/samples).
+* [Awesome Scalability: a lot of articles and resources on the subject](https://github.com/binhnguyennus/awesome-scalability).
+* [Coursera's Big Data Pipeline course](https://www.coursera.org/lecture/big-data-integration-processing/big-data-processing-pipelines-c4Wyd).
+* [Industrial Machine Learning Talk](https://www.youtube.com/watch?v=3JYDT8lap5U).
+
+#### Enterprise Solutions
+
+* [Netflix data pipeline](https://medium.com/netflix-techblog/evolution-of-the-netflix-data-pipeline-da246ca36905).
+* [Netflix data videos](https://www.youtube.com/channel/UC00QATOrSH4K2uOljTnnaKw).
+* [Yelp data pipeline](https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html).
+* [Gusto data pipeline](https://engineering.gusto.com/building-a-data-informed-culture/).
+* [500px data pipeline](https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83).
+* [Twitter data pipeline](https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html).
+* [Coursera data pipeline](https://medium.com/@zhaojunzhang/building-data-infrastructure-in-coursera-15441ebe18c2).
+* [Cloudflare data pipeline](https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/).
+* [Pandora data pipeline](https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee).
+* [Heroku data pipeline](https://medium.com/@damesavram/running-airflow-on-heroku-ed1d28f8013d).
+* [Zillow data pipeline](https://www.zillow.com/data-science/airflow-at-zillow/).
+* [Airbnb data pipeline](https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166).
+* [Walmart data pipeline](https://medium.com/walmartlabs/how-we-built-a-data-pipeline-with-lambda-architecture-using-spark-spark-streaming-9d3b4b4555d3).
+* [Robinhood data pipeline](https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8).
+* [Lyft data pipeline](https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff).
+* [Slack data pipeline](https://speakerdeck.com/vananth22/operating-data-pipeline-with-airflow-at-slack).
+* [Remind data pipeline](https://medium.com/@RemindEng/beyond-a-redshift-centric-data-model-1e5c2b542442).
+* [Wish data pipeline](https://medium.com/wish-engineering/scaling-analytics-at-wish-619eacb97d16).
+* [Databricks data pipeline](https://databricks.com/blog/2017/03/31/delivering-personalized-shopping-experience-apache-spark-databricks.html).
-## Learning
-### Introductory Courses
-
-* [Stanford's Machine Learning Course](http://cs229.stanford.edu/)
-* [Google's Developer Machine Learning Course](https://developers.google.com/machine-learning)
-
-
-### Deep Learning
-
-
-* [A Chart of Neural Networks](http://www.asimovinstitute.org/neural-network-zoo/).
-* [UCL Course on RL](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)
-* [Stanford's Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)
-* [The 9 CNN Papers You Need To Know About](https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html).
-* [NVIDIA Deep Learning Course](https://www.youtube.com/playlist?list=PL5B692fm6--tI-ijknnVZWbXU2H4JpSYe)
-* [DeepBench](https://github.com/baidu-research/DeepBench).
-
-#### Deep Learning Applications
-
-* [Deep Fake source code](https://github.com/deepfakes/faceswap/).
-
-#### Deep Learning Tools
-
-* [Tensorflow plaground](http://playground.tensorflow.org).
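
Note on the README introduction added in this diff: the Extract, Transform, and Load steps it describes can be illustrated with a minimal sketch. Nothing here comes from the repository itself. The names `RAW_CSV`, `extract`, `transform`, `load`, and `run_pipeline`, the `totals` table, and the sample data are all hypothetical. An in-memory CSV string stands in for an upstream source, and an in-memory SQLite database stands in for the final destination.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical raw input standing in for an upstream data source.
RAW_CSV = """name,amount
alice,10
bob,5
alice,7
bob,-3
carol,4
"""


def extract(raw_text):
    """Extract: read rows from the upstream source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop non-positive amounts, then group and sum per name."""
    totals = defaultdict(int)
    for row in rows:
        amount = int(row["amount"])
        if amount > 0:
            totals[row["name"]] += amount
    return sorted(totals.items())


def load(records, connection):
    """Load: write the aggregated records to the destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS totals (name TEXT PRIMARY KEY, total INTEGER)"
    )
    connection.executemany("INSERT OR REPLACE INTO totals VALUES (?, ?)", records)
    connection.commit()


def run_pipeline(raw_text, connection):
    """Chain the three steps: extract, then transform, then load."""
    load(transform(extract(raw_text)), connection)


# Assumption: an in-memory SQLite database plays the role of the destination.
connection = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, connection)
result = connection.execute("SELECT name, total FROM totals ORDER BY name").fetchall()
print(result)  # [('alice', 17), ('bob', 5), ('carol', 4)]
```

Keeping each step as a separate function mirrors how orchestrators such as Airflow, linked in the added lists, typically schedule stages as independent tasks. The sketch itself uses only the Python standard library.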