Datasets for AI / ML Research
- UCI Machine Learning Repository: A collection of databases, domain theories, and data generators widely used by the machine learning community. Website: UCI ML Repository
- Kaggle Datasets: Offers a wide variety of datasets in different domains including economics, biology, computer vision, and natural language processing. Website: Kaggle
- AWS Public Datasets: Amazon Web Services hosts a variety of public datasets that anyone can access, now catalogued in the Registry of Open Data on AWS. Website: AWS Public Datasets
- Google Dataset Search: A tool that enables the discovery of datasets stored across the web. Website: Google Dataset Search
- Microsoft Research Open Data: A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain-specific sciences. Website: Microsoft Research Open Data
- OpenML: An online platform for collaborative machine learning where users can share data, models, and experiments. Website: OpenML
- Data.gov: The home of the U.S. Government’s open data, providing data, tools, and resources. Website: Data.gov
- EU Open Data Portal: Provides access to an expanding range of data from the European Union institutions and other EU bodies. Website: EU Open Data Portal
- Awesome Public Datasets on GitHub: A topic-centric list of high-quality public data sources. GitHub Repository: Awesome Public Datasets
- World Bank Open Data: Free and open access to global development data. Website: World Bank Open Data
- CERN Open Data Portal: Provides access to data generated by the Large Hadron Collider and other CERN experiments. Website: CERN Open Data Portal
- National Aeronautics and Space Administration (NASA): Offers a wide range of datasets related to space and Earth sciences. Website: NASA
- NOAA Data Sets: Provides access to national and global data on climate, weather, oceans, and coasts. Website: NOAA
- ImageNet: A dataset of more than 14 million labeled images organized into roughly 22,000 categories (WordNet synsets). Website: ImageNet
- COCO (Common Objects in Context): A large-scale dataset of about 330,000 images of complex everyday scenes (more than 200,000 of them labeled), with 1.5 million annotated object instances across 80 object categories. Website: COCO Dataset
- Wikipedia: List of datasets for machine-learning research: A Wikipedia article cataloguing datasets used in machine-learning research. Website: Wikipedia List
- Natural Earth Data: Offers free vector and raster map data at various scales. Website: Natural Earth Data
- Reddit Datasets: The r/datasets subreddit, where community members share and request datasets. Website: Reddit Datasets
- Quandl (now Nasdaq Data Link): Provides financial, economic, and alternative datasets. Website: Quandl
- Stanford Large Network Dataset Collection: A collection of large network datasets including social networks, web graphs, etc. Website: Stanford Network Analysis Project
These sources cover a wide range of domains; choose among them based on the specific requirements of your machine learning project.
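Many datasets in the Stanford Large Network Dataset Collection (SNAP) are distributed as gzipped plain-text edge lists: header lines begin with `#`, and each remaining line holds two whitespace-separated integer node IDs. The following is a minimal sketch, using only the Python standard library, that reads such a file into an adjacency map. The helpers `read_snap_edge_list` and `load_snap_gz` are hypothetical names, the toy edge list is made up, and the `directed` flag is an assumption you should set from each dataset's own header.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Set


def read_snap_edge_list(lines: Iterable[str], directed: bool = True) -> Dict[int, Set[int]]:
    """Build an adjacency map from SNAP-style edge-list lines (hypothetical helper).

    Lines starting with '#' are treated as header comments; every other
    non-blank line is assumed to hold two whitespace-separated integer node IDs.
    """
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        src, dst = int(fields[0]), int(fields[1])
        adjacency[src].add(dst)
        if directed:
            adjacency.setdefault(dst, set())  # keep sink nodes as keys
        else:
            adjacency[dst].add(src)  # store the reverse edge as well
    return dict(adjacency)


def load_snap_gz(path: str, directed: bool = True) -> Dict[int, Set[int]]:
    """Read a gzipped edge list, e.g. a downloaded SNAP *.txt.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return read_snap_edge_list(fh, directed=directed)


# Made-up toy edge list in the same layout as a SNAP file.
SAMPLE = """# Directed graph: toy example
# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
"""

graph = read_snap_edge_list(SAMPLE.splitlines())
undirected = read_snap_edge_list(SAMPLE.splitlines(), directed=False)
print(graph)  # {0: {1, 2}, 1: {2}, 2: set()}
```

A plain dict of sets keeps the sketch dependency-free; for the larger SNAP graphs, a graph library such as NetworkX (`networkx.read_edgelist`) is likely a more practical choice.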
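COCO annotations are distributed as JSON files whose top-level `images`, `annotations`, and `categories` arrays are linked by integer IDs; each annotation's `bbox` is `[x, y, width, height]` in pixels, with `(x, y)` at the top-left corner. The following is a minimal sketch that parses such a file with the standard `json` module, counts object instances per category, and converts boxes to corner coordinates. The helper names are hypothetical, and the three-annotation sample is made up and trimmed (fields such as `segmentation` are omitted); the official `pycocotools` package provides a fuller API for real annotation files.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def instances_per_category(coco: dict) -> Dict[str, int]:
    """Count annotated object instances per category name (hypothetical helper)."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return dict(Counter(names[ann["category_id"]] for ann in coco["annotations"]))


def corner_boxes(coco: dict, image_id: int) -> List[Tuple[str, Box]]:
    """Return (category name, (x_min, y_min, x_max, y_max)) for one image."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    result = []
    for ann in coco["annotations"]:
        if ann["image_id"] != image_id:
            continue
        x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
        result.append((names[ann["category_id"]], (x, y, x + w, y + h)))
    return result


# Made-up miniature file in COCO layout (real files are far larger).
SAMPLE_JSON = """
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0]},
    {"id": 11, "image_id": 1, "category_id": 18, "bbox": [50.0, 60.0, 20.0, 10.0]},
    {"id": 12, "image_id": 1, "category_id": 1, "bbox": [100.0, 100.0, 25.0, 50.0]}
  ],
  "categories": [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 18, "name": "dog", "supercategory": "animal"}
  ]
}
"""

coco = json.loads(SAMPLE_JSON)
counts = instances_per_category(coco)
boxes = corner_boxes(coco, image_id=1)
print(counts)  # {'person': 2, 'dog': 1}
print(boxes[0])  # ('person', (10.0, 20.0, 40.0, 60.0))
```

Building the ID-to-name map once per call keeps the sketch short; for a full annotation file you would likely build it once and index annotations by `image_id` up front.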