DS-3 Data reduction tasks using Scikit-learn

Variance Threshold

Univariate Feature Selection

  • Univariate feature selection works by selecting the best features based on univariate statistical tests.
  • Each feature is compared to the target variable to see whether there is a statistically significant relationship between them.
  • When we analyze the relationship between one feature and the target variable, we ignore the other features. That is why it is called ‘univariate’.
  • Each feature has its own test score.
  • Finally, all the test scores are compared, and the features with the top scores are selected.
  • The scikit-learn selector objects (such as SelectKBest and SelectPercentile) take as input a scoring function that returns univariate scores and p-values (or only scores for SelectKBest and SelectPercentile):
  • For regression: f_regression, mutual_info_regression
  • For classification: chi2, f_classif, mutual_info_classif
  1. f_classif (ANOVA)
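
The steps above can be sketched with SelectKBest and the f_classif scoring function. This is a minimal illustration, not the article's original code: the Iris dataset and the choice of k=2 are assumptions made only to keep the example small and self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: Iris is used purely as a convenient classification dataset
# (150 samples, 4 numeric features, 3 classes).
X, y = load_iris(return_X_y=True)

# Score every feature against the target with the ANOVA F-test,
# then keep the k highest-scoring features (k=2 is an arbitrary choice).
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

# Each feature gets its own test score and p-value.
scores = selector.scores_
pvalues = selector.pvalues_

# Column indices of the features that were kept.
selected = selector.get_support(indices=True)

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
for idx, (score, p) in enumerate(zip(scores, pvalues)):
    print(f"feature {idx}: F={score:.2f}, p={p:.3g}")
print("Selected feature indices:", selected)
```

Swapping f_classif for chi2 or mutual_info_classif (or for f_regression on a regression target) changes only the score_func argument. Mutual-information scorers return scores without p-values.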

Recursive Feature Elimination

Principal Component Analysis (PCA)




Happy Makadiya