University of Macau Portuguese learner corpus and teaching of Portuguese L2

Abstract

This article presents the University of Macau Portuguese Learners Corpus (UMPLC), a corpus of Chinese learners of Portuguese L2 annotated with part-of-speech (PoS) tags and lemmas. It highlights the corpus's potential for quantitative and qualitative analysis of linguistic patterns among learners, and thus its contribution to the teaching of Portuguese L2. UMPLC contains 933 compositions produced by 122 students of Portuguese at the University of Macau over three consecutive years of study. PoS and lemma annotation was performed with Stanza, an automatic annotator developed by Qi et al. (2020), and the results were manually reviewed to ensure annotation consistency. The PoS and lemma information makes it possible to investigate, both quantitatively and qualitatively, lexical phenomena in the corpus and their diachronic changes. Two studies were conducted using a contrastive approach, comparing the Portuguese of the learners in the corpus with native Portuguese. They revealed non-native linguistic characteristics, which can help Portuguese L2 teachers focus on areas requiring corrective work.
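
As a rough illustration of the kind of quantitative contrast that PoS and lemma annotation makes possible, the Python sketch below compares PoS relative frequencies and lemma type-token ratio between a learner sample and a native sample. It is not the authors' procedure: the toy sentences, the function names (`annotate`, `pos_distribution`, `lemma_ttr`, `pos_contrast`), and the choice of metrics are all assumptions made for illustration. The `annotate` helper shows how Stanza could produce such annotations, assuming the `stanza` package and its Portuguese models are installed; it is not called in the example itself.

```python
from collections import Counter

# Hypothetical hand-annotated toy samples as (token, lemma, UPOS) triples.
# These are NOT data from UMPLC; they only illustrate the data shape.
learner = [
    ("Eu", "eu", "PRON"), ("gosto", "gostar", "VERB"),
    ("de", "de", "ADP"), ("estudar", "estudar", "VERB"),
    ("Eu", "eu", "PRON"), ("gosto", "gostar", "VERB"),
    ("de", "de", "ADP"), ("ler", "ler", "VERB"),
]
native = [
    ("Gosto", "gostar", "VERB"), ("muito", "muito", "ADV"),
    ("de", "de", "ADP"), ("ler", "ler", "VERB"),
]


def annotate(text, lang="pt"):
    """Annotate raw text with Stanza (assumes stanza and the 'pt' models are installed)."""
    import stanza  # imported lazily so the rest of the sketch runs without it

    nlp = stanza.Pipeline(lang=lang, processors="tokenize,mwt,pos,lemma")
    doc = nlp(text)
    return [(w.text, w.lemma, w.upos) for s in doc.sentences for w in s.words]


def pos_distribution(tokens):
    """Relative frequency of each UPOS tag in a list of annotated tokens."""
    counts = Counter(upos for _, _, upos in tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def lemma_ttr(tokens):
    """Lemma-based type-token ratio: distinct lemmas divided by token count."""
    lemmas = [lemma for _, lemma, _ in tokens]
    return len(set(lemmas)) / len(lemmas)


def pos_contrast(a, b):
    """Per-tag difference in relative frequency (a minus b)."""
    tags = set(a) | set(b)
    return {tag: a.get(tag, 0.0) - b.get(tag, 0.0) for tag in tags}


diff = pos_contrast(pos_distribution(learner), pos_distribution(native))
for tag in sorted(diff):
    print(f"{tag}: {diff[tag]:+.2f}")
print(f"learner lemma TTR: {lemma_ttr(learner):.3f}")
```

A positive value in `diff` marks a tag that is relatively more frequent in the learner sample than in the native one; on real data such differences would need normalization by text length and a significance test before any pedagogical conclusion is drawn.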

Keywords:
Learner corpus; Chinese learners of Portuguese L2; Quantitative and qualitative analysis; Pedagogical applications

Universidade Federal de Minas Gerais (UFMG), Av. Antônio Carlos, 6627, Pampulha, CEP 31270-901, Belo Horizonte, Minas Gerais, Brazil. Tel: +55 (31) 3409-6009
E-mail: revistatextolivre@letras.ufmg.br