The Precision Motion and Intelligent Robotics Technology Group from the Ningbo Institute of Materials Technology and Engineering (NIMTE) of the Chinese Academy of Sciences (CAS) has proposed a robust feature selection method that removes noise entropy from mutual information. The study was published in IEEE Transactions on Industrial Informatics.
Feature selection is a key problem in machine learning and data mining and has attracted increasing attention. As a common dimensionality reduction technique, feature selection removes irrelevant and redundant features from a dataset to obtain an optimal feature subset.
Data obtained while setting up equipment or products in various industrial sectors typically have small sample sizes and high dimensionality. Using such raw data without preprocessing can incur substantial computational costs and cause models to overfit.
Information-theoretic feature selection methods use mutual information to identify the most discriminative features effectively. However, sensor noise in the features can bias mutual information and degrade classification accuracy.
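As a hedged illustration of how a mutual information metric scores a feature, and how noise can distort that score, the sketch below uses a simple plug-in estimator on discrete toy data. The function name and data are hypothetical, and the two flipped readings are only a stand-in for sensor noise, not the censored normal noise model used by the researchers.

```python
from collections import Counter
import math


def mutual_information(xs, ys):
    """Plug-in (empirical) estimate of I(X; Y) in nats for discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and equally long")
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    clean = list(labels)               # hypothetical feature that predicts the label perfectly
    noisy = [0, 0, 0, 1, 1, 1, 1, 0]   # same feature with two readings flipped by "noise"
    print(round(mutual_information(clean, labels), 4))  # 0.6931
    print(round(mutual_information(noisy, labels), 4))  # 0.1308
```

In this toy case, flipping just two of eight readings drops the estimated relevance from ln 2 (about 0.69 nats) to about 0.13 nats, so a noisy but informative feature can be ranked below features that are actually less useful.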
Researchers at NIMTE modeled the feature noise as a censored normal distribution. Based on the principle of maximum entropy, they then determined the noise entropy by solving the variance equation in transmission.
In addition, they developed a noise-free mutual information metric to assess the relevance between the label and noise-corrupted features. This metric removes the entropy of the unknown feature noise from mutual information while retaining the noisy samples, thereby eliminating the impact of noise on classification when samples are limited.
Compared with conventional methods, the proposed method provides a more reliable noise assessment for industrial data because it accounts for feature noise across all noisy samples.
Finally, the researchers proposed a novel maximal noise-free relevance and minimal redundancy (MNFR-MR) criterion to achieve robust feature selection.
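The criterion's name mirrors the classic max-relevance, min-redundancy (mRMR) scheme. The sketch below shows only that classic greedy scheme in its difference form (relevance to the label minus mean redundancy with already-selected features), using ordinary plug-in mutual information on discrete features. It is an assumption-laden illustration: it does not implement the paper's noise-free relevance term, and the function names and toy data are hypothetical.

```python
from collections import Counter
import math


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mrmr_score(name, features, relevance, selected):
    """Relevance to the label minus mean redundancy with already-selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(mutual_information(features[name], features[s])
                     for s in selected) / len(selected)
    return relevance[name] - redundancy


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature names by the mRMR difference criterion.

    features: dict of feature name -> list of discrete values, one per sample
    labels:   list of class labels, same length as each feature column
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(remaining),
                   key=lambda name: mrmr_score(name, features, relevance, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1, 1, 1]
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    features = {
        "sensor_a": a,                          # strongly relevant
        "sensor_b": list(a),                    # exact duplicate of sensor_a (redundant)
        "sensor_c": [0, 0, 1, 1, 0, 0, 1, 1],   # weakly relevant, independent of sensor_a
    }
    print(mrmr_select(features, labels, 2))  # ['sensor_a', 'sensor_c']
```

Although sensor_b is exactly as relevant as sensor_a, its redundancy penalty pushes it below the weakly relevant but independent sensor_c. Under MNFR-MR, the relevance term would instead be the noise-free mutual information described above.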
Experiments on fifteen industrial datasets from the telecommunications, cybersecurity, medical, and pharmaceutical sectors validated the method's effectiveness in improving classification accuracy.
The method addresses key challenges in processing industrial data with limited samples. As industries increasingly adopt data-driven intelligence, this versatile approach could help extract actionable insights in domains such as the industrial Internet of Things (IIoT) and digital twins.
Fig. Framework of the proposed robust feature selection method (Image by NIMTE)
Contact
CHEN Silu
Ningbo Institute of Materials Technology and Engineering
E-mail: chensilu@nimte.ac.cn