Free Data Sources for Predictive Modeling and Text Mining

The following is a list of free data sources that can be used for predictive modeling, machine learning and text mining projects.

1. Datasets for Regression and Classification

This collection contains many datasets suited to regression and classification problems. It is maintained by the University of Toronto.

2. Datasets for Text Mining

It provides the data and R code that accompany the book "R and Data Mining".

3. Kaggle Competition

It is one of the best places to discover and analyze publicly available data, with a repository of hundreds of public datasets.

4. UCI Machine Learning Repository

It is one of the largest repositories of public datasets that can be used for regression, classification and other machine learning projects.
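
Many files in repositories of this kind are plain comma-separated text with no header row, where the last column holds the class label. The snippet below is a minimal sketch of reading such a file into features and labels using only the Python standard library. The three sample rows follow the layout of the classic Iris file and are embedded in the code for illustration; the function name load_headerless_csv is hypothetical, and the sketch assumes every column except the last is numeric.

```python
import csv
import io

# Illustrative rows in a headerless layout (assumption: four numeric
# measurements followed by a class label, as in the classic Iris file).
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_headerless_csv(text):
    """Split each row into numeric features and a trailing class label.

    Hypothetical helper: assumes the last column is the label and all
    other columns can be converted to float.
    """
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, which some files end with
            continue
        features.append([float(value) for value in row[:-1]])
        labels.append(row[-1])
    return features, labels


features, labels = load_headerless_csv(SAMPLE)
print(len(features), "rows; classes:", sorted(set(labels)))
```

For a real download, the same function could be applied to the text read from the local file. Datasets with missing values or non-numeric columns would need extra handling.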

5. Amazon Public Datasets

It provides a centralized repository of public datasets, free of charge, for the analytics community.

6. Microsoft Research

It is a repository of many useful large datasets that can be used for practicing data science and machine learning techniques. For example, one dataset identifies 38 million tweets collected for the analysis of social media messages related to the 2012 U.S. Presidential election.

7. Yahoo Datasets

Yahoo has released what it described as the largest machine learning dataset ever made available to researchers and engineers.

8. IMDB Database

It is a collection of IMDB data files that can be used for text mining and other data science projects.

9. AppliedPredictiveModeling (R package)

It is an R package containing the datasets used in the book "Applied Predictive Modeling", co-authored by the developer of 'caret', one of the most popular R packages.

10. Machine Learning Data Set Repository

It is a repository of machine learning data, containing hundreds of datasets across various domains.

11. Million Song Dataset

It is a collaboration between The Echo Nest and Columbia University’s LabROSA (Laboratory for the Recognition and Organization of Speech and Audio).

12. Doing Data Science Book

It is the sample dataset that accompanies the book "Doing Data Science" by Cathy O'Neil and Rachel Schutt.

13. Revolution R Datasets

It is a repository of sample datasets used in Revolution R (now Microsoft R).

