An Overview of the Classification Problem in Unbalanced Datasets Using the Statistical Construction of European Community Economic Activities

Authors

  • Yasin Bektas, Mersin University
  • Jale BEKTAŞ

DOI:

https://doi.org/10.46291/ICONTECHvol5iss3pp31-37

Keywords:

Text Mining, Unbalanced Dataset, Classifiers, NACE

Abstract

Applying classical classifiers to unbalanced, multi-class data sets has long been problematic. In this study, text mining with well-known classifiers was applied to the definitions of the codes in the Statistical Classification of Economic Activities in the European Community (NACE). The classifiers were first run on the original, unbalanced data. The data were then balanced by class-based weighting, and the classifiers were retested to measure the change in performance. The classifiers tested were Decision Trees, Naive Bayes, Support Vector Machines, Radial Basis Functions and Random Forest. After data balancing, Decision Trees gave the best performance, with the F-score rising from 17.43% to 92%.
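
The abstract does not state which weighting formula or which F-score averaging was used, so the following is only a minimal sketch under two assumptions: class weights follow the common inverse-frequency heuristic (n_samples / (n_classes * class_count)), and the F-score is macro-averaged over classes. The labels are hypothetical toy values, not data from the study. It illustrates why a classifier that ignores a rare class scores poorly on F-score even when most of its predictions are correct.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count(class)).

    Assumed heuristic; the study's exact weighting scheme is not given.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (two NACE section letters), 8 vs. 2 samples.
y_true = ["C"] * 8 + ["A"] * 2
# A degenerate classifier that always predicts the majority class.
always_majority = ["C"] * 10

weights = balanced_class_weights(y_true)
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
score = macro_f1(y_true, always_majority)

print(weights)   # the rare class "A" receives the larger weight
print(accuracy)  # looks acceptable despite never predicting "A"
print(score)     # macro F1 penalises the missed class
```

With these weights each class contributes the same total weight to training (8 × 0.625 = 2 × 2.5), which is the sense in which weighting "balances" the data without resampling it.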

Published

2021-09-25

How to Cite

Bektas, Y., & BEKTAŞ, J. (2021). An Overview of the Classification Problem in Unbalanced Datasets Using the Statistical Construction of European Community Economic Activities. ICONTECH INTERNATIONAL JOURNAL, 5(3), 31-37. https://doi.org/10.46291/ICONTECHvol5iss3pp31-37

Section

Articles