An Overview of the Classification Problem in Unbalanced Datasets Using the Statistical Construction of European Community Economic Activities
DOI:
https://doi.org/10.46291/ICONTECHvol5iss3pp31-37
Keywords:
Text Mining, Unbalanced Dataset, Classifiers, NACE
Abstract
Applying classical classifiers to unbalanced, multi-class datasets has long been problematic. In this study, text mining with well-known classifiers was applied to the definitions of the codes of the Statistical Classification of Economic Activities in the European Community (NACE). The classifiers were first run on the original, unbalanced data. The data were then balanced by class-based weighting, and the classifiers were retested to measure the change in performance. Decision Trees, Naive Bayes, Support Vector Machines, Radial Basis Functions, and Random Forest were used in the tests. After data balancing, Decision Trees gave the best performance, with the F-score rising from 17.43% to 92%.
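The abstract does not state which weighting formula or which F-score averaging was used, so the following is only a minimal sketch under assumptions: it uses the common inverse-frequency heuristic for class weights and a macro-averaged F-score. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: the study's class-based weighting resembles this common
    heuristic; the abstract does not give the exact formula.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the classes in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (NACE section letters), not the paper's data.
labels = ["C"] * 8 + ["A"] * 1 + ["G"] * 1
weights = balanced_class_weights(labels)

# A classifier that always predicts the majority class looks accurate
# (8 of 10 correct) but scores poorly on the macro-averaged F-score.
majority_pred = ["C"] * len(labels)
majority_f1 = macro_f1(labels, majority_pred)
```

With these weights every class contributes the same total weight to training, which is why rare classes stop being ignored. The macro-averaged score of the majority-class predictor illustrates how an unbalanced dataset can produce a low F-score even when most predictions are correct.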
License
Copyright (c) 2021 ICONTECH INTERNATIONAL JOURNAL
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.