The following is a list of free data sources that can be used for predictive modeling, machine learning and text mining projects.
If you have found a data source useful and would like it added to this list, please share its name in the comment box below.
1. Datasets for Regression and Classification
It contains many datasets that can be used for solving regression and classification problems. It is maintained by the University of Toronto.
Link : http://goo.gl/irjNg4
2. Datasets for Text Mining
It contains data and R code for the book "R and Data Mining".
Link : http://www.rdatamining.com/data
3. Kaggle Competition
It is one of the best places to discover and analyze publicly available data. It hosts hundreds of publicly available datasets.
Link : https://goo.gl/fHKuII
4. UCI Machine Learning Repository
It is one of the largest repositories of public datasets that can be used for regression, classification and other machine learning projects.
5. Amazon Public Datasets
It provides a centralized repository of public datasets, free of charge, for the analytics community.
Link : http://goo.gl/jRMHro
6. Microsoft Research
It is a repository of many useful large datasets that can be used for practicing data science and machine learning techniques. For example, one dataset contains 38 million tweets collected for the analysis of social media messages related to the 2012 U.S. presidential election.
Link : http://goo.gl/YE4Vi4
7. Yahoo Datasets
Yahoo has released what it described as the largest machine learning dataset ever made available to researchers and engineers.
Link : http://goo.gl/ajcIpt
8. IMDB Database
It is a collection of IMDb data files that can be used for text mining and other data science projects.
Link : http://www.imdb.com/interfaces
9. AppliedPredictiveModeling (R package)
It is an R package containing the datasets used in the book "Applied Predictive Modeling", written by the developer of 'caret', one of the most popular R packages.
Link : https://goo.gl/QPJAXB
10. Machine Learning Data Set Repository
It is a repository of machine learning data. It contains hundreds of datasets from various domains.
Link : http://mldata.org/
11. Million Song Dataset
It is a collection of audio features and metadata for a million songs, created as a collaboration between The Echo Nest and LabROSA (Laboratory for the Recognition and Organization of Speech and Audio) at Columbia University.
Link : http://goo.gl/88CcEa
12. Doing Data Science Book
It is the sample dataset that accompanies the book "Doing Data Science" by Cathy O'Neil and Rachel Schutt.
Link : https://goo.gl/cU5aQf
13. Revolution R Datasets
It is a repository of sample datasets used in Revolution R (now Microsoft R).
Link : http://goo.gl/Im2xOl
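Many of the sources above distribute their data as CSV or other delimited text files. As a rough starting point for a regression or classification project, here is a minimal Python sketch that reads such a file and holds out part of it for testing. It assumes the file has a header row. The helper names `load_rows` and `train_test_split`, the column names, and the sample values are all made up for illustration and do not come from any of the listed sources.

```python
import csv
import io
import random


def load_rows(source, delimiter=","):
    """Read delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(source, delimiter=delimiter)
    return [dict(row) for row in reader]


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up sample standing in for a downloaded file (illustrative values only).
SAMPLE_CSV = """x1,x2,label
1.0,2.0,a
1.5,1.8,a
5.0,8.0,b
8.0,8.0,b
1.0,0.6,a
9.0,11.0,b
2.0,1.0,a
7.5,9.5,b
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
train, test = train_test_split(rows, test_fraction=0.25)
print(len(rows), len(train), len(test))

# For a real download, pass an open file instead of the in-memory sample:
# with open("path/to/downloaded_file.csv", newline="") as f:
#     rows = load_rows(f)
```

All values are read as strings, so numeric columns need to be converted (for example with `float`) before modeling. For tab-separated files, pass `delimiter="\t"`.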