Feature screening algorithm for high dimensional data

https://doi.org/10.23939/mmc2023.03.703
Received: February 15, 2023
Revised: July 03, 2023
Accepted: July 05, 2023

Mathematical Modeling and Computing, Vol. 10, No. 3, pp. 703–711 (2023)

1, 2, 3 Faculty of Sciences Ain Chock, Hassan II University

Feature screening is becoming an important topic in machine learning and high-dimensional data analysis.  Filtering out irrelevant features from a set of variables is an important preliminary step that should be performed before any data analysis.  Many approaches to this problem have been proposed since the work of Fan and Lv (J. Royal Stat. Soc., Ser. B.  70 (5), 849–911 (2008)), who introduced the sure screening property.  However, the reported performance of these methods differs from one paper to another.  In this work, we add to this list a new feature screening algorithm for a continuous response variable, inspired by the Kendall interaction filter (J. Appl. Stat. 50 (7), 1496–1514 (2020)).  The good behavior of the proposed algorithm is demonstrated through a comparison with an existing method under several simulation scenarios.
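
To make the idea of marginal feature screening concrete, the following is a minimal sketch of a generic rank-correlation filter for a continuous response.  It is not the algorithm proposed in this paper, whose details are not given in the abstract.  It assumes that each feature is scored by the absolute value of Kendall's tau (tau-a) with the response and that the d highest-scoring features are retained.  The function names `kendall_tau` and `screen_features`, the cutoff `d`, and the toy data are all illustrative assumptions.

```python
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs."""
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1      # concordant pair
            elif prod < 0:
                score -= 1      # discordant pair (ties contribute 0)
    return score / (n * (n - 1) / 2)


def screen_features(X, y, d):
    """Rank the columns of X by |tau(X_j, y)| and keep the d best indices.

    Illustrative marginal filter only; the cutoff d is a user choice.
    """
    p = len(X[0])
    scores = []
    for j in range(p):
        column = [row[j] for row in X]
        scores.append(abs(kendall_tau(column, y)))
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:d], scores


# Toy data (assumed for illustration): only features 0 and 3 drive y.
rng = random.Random(0)
n, p = 100, 200
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.5) for row in X]

selected, scores = screen_features(X, y, d=10)
print(sorted(selected))
```

In this toy setting the retained set is expected to contain the two active features, which is the behavior that the sure screening property formalizes as the dimension grows.  Because Kendall's tau depends only on ranks, the scores are unchanged under monotone transformations of the response.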

  1. Mai Q., Zou H.  The fused Kolmogorov filter: A nonparametric model-free screening method.  The Annals of Statistics.  43 (4), 1471–1497 (2015).
  2. Fan J., Song R.  Sure Independence Screening in Generalized Linear Models With NP-Dimensionality.  The Annals of Statistics.  38 (6), 3567–3604 (2010).
  3. Huang D., Li R., Wang H.  Feature Screening for Ultrahigh Dimensional Categorical Data with Applications.  Journal of Business & Economic Statistics.  32 (2), 237–244 (2014).
  4. Fan Y., Kong Y., Li D., Lv J.  Interaction pursuit with feature screening and selection.  Preprint arXiv:1605.08933 (2016).
  5. Fan J., Lv J.  Sure independence screening for ultrahigh dimensional feature space.  Journal of the Royal Statistical Society, Series B: Statistical Methodology.  70 (5), 849–911 (2008).
  6. Anzarmou Y., Mkhadri A., Oualkacha K.  The Kendall interaction filter for variable interaction screening in ultra high dimensional classification problems.  Journal of Applied Statistics.  50 (7), 1496–1514 (2020).
  7. Reese R., Dai X., Fu G.  Strong Sure Screening of Ultra-high Dimensional Data with Interaction Effects.  Preprint arXiv:1801.07785 (2018).
  8. Hao N., Zhang H. H.  Interaction Screening for Ultrahigh-Dimensional Data.  Journal of the American Statistical Association.  109 (507), 1285–1301 (2014).
  9. Niu Y. S., Hao N., Zhang H. H.  Interaction screening by partial correlation.  Statistics and Its Interface.  11 (2), 317–325 (2018).
  10. Moore J. H.  The ubiquitous nature of epistasis in determining susceptibility to common human diseases.  Human Heredity.  56 (1–3), 73–82 (2003).
  11. Cordell H. J.  Detecting gene-gene interactions that underlie human diseases.  Nature Reviews Genetics.  10 (6), 392–404 (2009).
  12. Cook R. D., Zhang X.  Fused estimators of the central subspace in sufficient dimension reduction.  Journal of the American Statistical Association.  109 (506), 815–827 (2014).