mirror of
https://github.com/The-Art-of-Hacking/h4cker.git
synced 2024-10-01 01:25:43 -04:00
Create ml_ai_datasets.md
parent
8b47729a3d
commit
a222f076e5
62
ai_security/ML_Fundamentals/ml_ai_datasets.md
Normal file
@@ -0,0 +1,62 @@
# Datasets for AI / ML Research
1. **UCI Machine Learning Repository**: A collection of databases, domain theories, and data generators widely used by the machine learning community.
Website: [UCI ML Repository](https://archive.ics.uci.edu/ml/index.php)
2. **Kaggle Datasets**: Offers a wide variety of datasets in different domains including economics, biology, computer vision, and natural language processing.
Website: [Kaggle](https://www.kaggle.com/datasets)
3. **AWS Public Datasets**: Amazon Web Services offers a variety of public datasets that anyone can access.
Website: [AWS Public Datasets](https://registry.opendata.aws/)
4. **Google Dataset Search**: A tool that enables the discovery of datasets stored across the web.
Website: [Google Dataset Search](https://datasetsearch.research.google.com/)
5. **Microsoft Research Open Data**: A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain-specific sciences.
Website: [Microsoft Research Open Data](https://msropendata.com/)
6. **OpenML**: An online platform for collaborative machine learning - easily share data, models, and experiments.
Website: [OpenML](https://www.openml.org/)
7. **Data.gov**: The home of the U.S. Government’s open data, providing data, tools, and resources.
Website: [Data.gov](https://www.data.gov/)
8. **EU Open Data Portal**: Provides access to an expanding range of data from the European Union institutions and other EU bodies.
Website: [EU Open Data Portal](https://data.europa.eu/euodp/en/home)
9. **Awesome Public Datasets on GitHub**: A collection of high-quality open datasets in public domains.
GitHub Repository: [Awesome Public Datasets](https://github.com/awesomedata/awesome-public-datasets)
10. **World Bank Open Data**: Free and open access to global development data.
Website: [World Bank Open Data](https://data.worldbank.org/)
11. **CERN Open Data Portal**: Provides access to data produced by experiments at the Large Hadron Collider and by other CERN research.
Website: [CERN Open Data Portal](http://opendata.cern.ch/)
12. **National Aeronautics and Space Administration (NASA)**: Offers a wide range of datasets related to space and Earth sciences.
Website: [NASA](https://data.nasa.gov/)
13. **NOAA Data Sets**: Provides access to national and global data on climate, weather, oceans, and coasts.
Website: [NOAA](https://www.noaa.gov/data)
14. **ImageNet**: A dataset of more than 14 million labeled images organized into roughly 22,000 categories (WordNet synsets).
Website: [ImageNet](http://www.image-net.org/)
15. **COCO (Common Objects in Context)**: A large-scale dataset of roughly 330,000 images of everyday objects in complex scenes, annotated for object detection, segmentation, and captioning.
Website: [COCO Dataset](https://cocodataset.org/)
16. **Wikipedia: List of datasets for machine-learning research**: A Wikipedia article that catalogs datasets used in machine-learning research.
Website: [Wikipedia List](https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research)
17. **Natural Earth Data**: Offers free vector and raster map data at various scales.
Website: [Natural Earth Data](https://www.naturalearthdata.com/)
18. **Reddit Datasets**: A subreddit where members of the Reddit community share datasets.
Website: [Reddit Datasets](https://www.reddit.com/r/datasets/)
19. **Quandl**: Provides financial, economic, and alternative datasets; now part of Nasdaq and rebranded as Nasdaq Data Link.
Website: [Quandl](https://www.quandl.com/)
20. **Stanford Large Network Dataset Collection**: A collection of large network datasets including social networks, web graphs, etc.
Website: [Stanford Network Analysis Project](http://snap.stanford.edu/data/index.html)

These sources cover a wide range of domains; explore them according to your specific machine learning requirements and interests.