When you are in an in-person interview and the interviewers ask you to demonstrate some machine learning skills, you may get nervous and wonder what data to use. The datasets bundled with sklearn are very helpful in this situation. As we know, machine learning can be divided into three parts: supervised learning, unsupervised learning and reinforcement learning.
1. Supervised learning
1.1 Regression
Several datasets are commonly used for regression, such as load_boston and load_linnerud. We can also use real-world data from sklearn datasets, such as fetch_california_housing, which is downloaded the first time it is called. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so it is only available in older versions.
For example, if we want to apply Linear Regression, Lasso Regression, Ridge Regression and ElasticNet Regression, we can use the Boston housing price dataset. Using datasets from sklearn saves time, because some of them are already cleaned up and ready to use. We can do a train/test split and compare different models on the data directly.
PS: the Boston dataset does not need 'feature selection'.
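To make the regression workflow concrete, here is a minimal sketch of the train/test split and model comparison described above. Since load_boston is not available in recent scikit-learn releases, this sketch swaps in load_diabetes, another bundled regression dataset that needs no download; that choice, the alpha values and the split settings are my assumptions, not tuned values.

```python
import math

from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): load_diabetes replaces the removed load_boston.
X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the rows for evaluation; random_state fixes the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Regularization strengths below are illustrative, not tuned.
models = {
    "LinearRegression": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Ridge": Ridge(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, score() returns R^2 on the given data.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

The same loop works with any other regression dataset: only the line that loads X and y needs to change.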
1.2 Classification
1.2.1 Logistic Regression
We can apply Logistic Regression using load_breast_cancer.
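As a quick illustration, the following is a minimal sketch of Logistic Regression on load_breast_cancer. The scaling step, max_iter and the split settings are my own assumptions; they are there to help the solver converge, not because the dataset requires them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification: 569 samples, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)

# Stratify so both classes keep their proportions in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardize the features first, then fit the classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# For classifiers, score() returns mean accuracy.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```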
1.2.2 Random Forest
We can apply Random Forest using the load_wine dataset. In my run, the best setting was n_estimators=39 with random_state=90, which gave a score of 1.
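One way to look for a good n_estimators is to compare a few candidates with cross-validation. The sketch below assumes 5-fold cross-validation and a small hand-picked candidate list (both are my assumptions), so its scores will not necessarily match the numbers quoted above.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Three-class classification: 178 samples, 13 numeric features.
X, y = load_wine(return_X_y=True)

# Candidate forest sizes (illustrative); 39 is the value mentioned in the text.
candidates = [10, 20, 39, 50, 100]

best_n, best_score = None, -1.0
for n in candidates:
    forest = RandomForestClassifier(n_estimators=n, random_state=90)
    # Mean accuracy over 5 stratified folds.
    mean_score = cross_val_score(forest, X, y, cv=5).mean()
    print(f"n_estimators={n}: mean CV accuracy = {mean_score:.3f}")
    if mean_score > best_score:
        best_n, best_score = n, mean_score

print(f"Best n_estimators: {best_n} (mean CV accuracy {best_score:.3f})")
```

A score from a single train/test split can come out higher or lower than the cross-validated mean, so it is worth saying in the interview which one you are reporting.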