Free Data Sources for Predictive Modeling and Text Mining

Deepanshu Bhalla
The following is a list of free data sources that can be used for predictive modeling, machine learning and text mining projects.
Data for Statistics

1. Datasets for Regression and Classification

It contains many datasets that can be used for regression and classification problems. It is maintained by the University of Toronto.

2. Datasets for Text Mining

It contains the data and R code for the book "R and Data Mining".

3. Kaggle Competition

It is a popular place to discover and analyze publicly available data, and it hosts a repository of hundreds of public datasets.

4. UCI Machine Learning Repository

It is one of the largest repositories of public datasets that can be used for regression, classification and other machine learning projects.
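Most of the repositories in this list distribute their datasets as plain CSV files. As a rough sketch of a first step after downloading one, the snippet below separates the target column from the features and holds out a test set, using only the Python standard library. The sample data, the column names and the helper functions `load_rows` and `split_train_test` are hypothetical and are not taken from any of the sources listed here.

```python
import csv
import io
import random


def load_rows(csv_text, target_column):
    """Parse CSV text into a list of feature dicts and a list of target values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, targets = [], []
    for row in reader:
        targets.append(row.pop(target_column))  # separate the label from the inputs
        features.append(row)
    return features, targets


def split_train_test(features, targets, test_fraction=0.25, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [features[i] for i in train_idx],
        [targets[i] for i in train_idx],
        [features[i] for i in test_idx],
        [targets[i] for i in test_idx],
    )


# Hypothetical stand-in for a downloaded file; not data from any real repository.
SAMPLE_CSV = """height,width,label
1.0,2.0,a
1.5,1.8,a
3.2,0.5,b
2.9,0.7,b
1.1,2.2,a
3.0,0.4,b
1.3,1.9,a
3.4,0.6,b
"""

features, targets = load_rows(SAMPLE_CSV, target_column="label")
X_train, y_train, X_test, y_test = split_train_test(features, targets)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

For a real file, you would read its contents from disk and pass them to `load_rows`. Note that the values stay as strings here, so numeric columns would still need converting before modeling.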

5. Amazon Public Datasets

It provides a centralized repository of public datasets, free of charge, for the analytics community.

6. Microsoft Research

It is a repository of many useful large datasets that can be used for practicing data science and machine learning techniques. For example, one dataset identifies 38M tweets collected for the analysis of social media messages related to the 2012 U.S. presidential election.

7. Yahoo Datasets

Yahoo has released what it described as the largest-ever machine learning dataset for researchers and engineers.

8. IMDB Database

It is a collection of IMDB data files that can be used for text mining and other data science projects.

9. AppliedPredictiveModeling (R package)

It is an R package containing the datasets used in the book "Applied Predictive Modeling", written by the developer of 'caret', one of the most popular R packages.

10. Machine Learning Data Set Repository

It is a repository of machine learning data, containing hundreds of datasets across various domains.

11. Million Song Datasets

It is a collaboration between The Echo Nest and Columbia University's LabROSA (Laboratory for the Recognition and Organization of Speech and Audio).

12. Doing Data Science Book

It is the sample dataset that accompanies the book "Doing Data Science" by Cathy O'Neil and Rachel Schutt.

13. Revolution R Datasets

It is a repository of sample datasets used in Revolution R (now Microsoft R).

If you know of a useful data source that should be added to this list, please share its name in the comment box below.

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective: to make analytics easy to understand and follow. He has over 10 years of experience in data science. During this time, he has worked with global clients in various domains such as Banking, Insurance, Private Equity, Telecom and HR.

4 Responses to "Free Data Sources for Predictive Modeling and Text Mining"
  1. I am recommending Listen Data as a clear leader in Data Science content. This list especially is well curated.

    1. Thank you for your lovely words. You have made my day :-)

  2. I am impressed with your information.

  3. Good to see all information of DS under one umbrella. Thank you