Acessibilidade / Reportar erro
This document is related to:
This document is related to:

Product description classification in portuguese: performance assessment of machine learning algorithms, preprocessing and attribute extraction

Abstract:

The growing demand for automated product classification systems in e-commerce platforms has fueled the search for efficient solutions for product categorization, particularly in Portuguese. This study investigates the adaptation of classical information retrieval techniques, such as bag-of-words, TF, and TF-IDF, for the task of classifying short product descriptions. The research evaluates different preprocessing and tokenization strategies, including analyzing normalization impact. The results show that simple information retrieval methods, when combined with appropriate preprocessing and parameter optimization, can achieve significantly superior performance.

Keywords:
machine learning; natural language processing; text classification; product description; short text; bag of words; term frequency; inverse document frequency

Universidade Federal do Rio Grande do Sul Rua Ramiro Barcelos, 2705, sala 519 , CEP: 90035-007., Fone: +55 (51) 3308- 2141 - Porto Alegre - RS - Brazil
E-mail: emquestao@ufrgs.br