Data Discretization in Data Mining and Machine Learning
Synopsis
Data discretization is one of the fundamental preprocessing techniques used in Knowledge Discovery, Data Mining (DM), and Machine Learning (ML). Its objective is to transform continuous attributes into discrete ones, so that quantitative data can be treated as qualitative. This transformation plays an important role in improving the efficiency and interpretability of ML models and in making data compatible with algorithms that expect discrete inputs. This paper presents the basic concept of data discretization, its importance, the discretization process, methods, framework, advantages, and associated limitations. The study covers the theoretical concepts behind discretization techniques such as equal-width binning, equal-frequency binning, clustering-based discretization, and entropy-based methods. It also discusses the role of discretization in enhancing classification accuracy and reducing computational complexity.
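As a brief illustration of the two simplest techniques named above, the following minimal sketch contrasts equal-width and equal-frequency binning on a small list of values. The function names, the sample data, and the choice of four bins are assumptions made for this example only; they are not taken from a specific library or from the framework discussed in this paper.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so a single bin is enough.
        return [0] * len(values)
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Bins are assigned by rank, so tied values may land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


# Hypothetical sample attribute (ages) with one outlier.
ages = [18, 22, 25, 31, 38, 45, 52, 90]
width_labels = equal_width_bins(ages, 4)
freq_labels = equal_frequency_bins(ages, 4)
print(width_labels)
print(freq_labels)
```

On this sample, the outlier stretches the equal-width intervals so that most values crowd into the lowest bins and one bin stays empty, while equal-frequency binning places two values in every bin.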








