Kristin Hagen Textlab

1/12/2023

The Norwegian Dependency Treebank is a new syntactic treebank for Norwegian Bokmål and Nynorsk with manual syntactic and morphological annotation, developed at the National Library of Norway in collaboration with the University of Oslo. It is the first publicly available treebank for Norwegian. This paper presents the core principles behind the syntactic annotation and how these principles were employed in certain specific cases. We then present the selection of texts and distribution between genres, as well as the annotation process and an evaluation of the inter-annotator agreement. Finally, we present the first results of data-driven dependency parsing of Norwegian, contrasting four state-of-the-art dependency parsers trained on the treebank.

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Proceedings of the 21st Nordic Conference on Computational Linguistics 2016

Constructing a Norwegian Academic Wordlist

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) 2017

A modernised version of the Glossa corpus search system

The LIA Treebank of Spoken Norwegian Dialects
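The treebank abstract above mentions both a comparison of dependency parsers and an evaluation of inter-annotator agreement, without saying which metrics were used. As a rough illustration only, the sketch below assumes the common unlabeled and labeled attachment scores (UAS and LAS). The function name, the toy sentence, and the labels are hypothetical and are not taken from the treebank.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 stands for the artificial root. Labels are made up.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two analyses of the same token sequence.

    UAS counts tokens whose head matches; LAS counts tokens whose head
    and label both match. The same comparison can be run between two
    annotators instead of between gold data and a parser.
    """
    if len(gold) != len(pred):
        raise ValueError("both analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / n, las_hits / n


# Toy four-token sentence (hypothetical heads and labels).
gold = [(2, "SUBJ"), (0, "ROOT"), (2, "OBJ"), (2, "ADV")]
pred = [(2, "SUBJ"), (0, "ROOT"), (2, "ADV"), (3, "ADV")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Here three of the four heads match and two of the four tokens match on both head and label, so the labeled score is the stricter of the two.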

This paper describes an evaluation of five data-driven part-of-speech (PoS) taggers for spoken Norwegian. The taggers all rely on different machine learning mechanisms: decision trees, hidden Markov models (HMMs), conditional random fields (CRFs), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs). We go into some of the challenges posed by the task of tagging spoken, as opposed to written, language, and in particular a wide range of dialects as is found in the recordings of the LIA (Language Infrastructure made Accessible) project. The results show that the taggers based on either conditional random fields or neural networks perform much better than the rest, with the LSTM tagger getting the highest score.

Proceedings of the Twelfth Language Resources and Evaluation Conference 2020

Comparing Methods for Measuring Dialect Similarity in Norwegian

This paper presents the NDC Treebank of spoken Norwegian dialects in the Bokmål variety of Norwegian. It consists of dialect recordings made between 20[...] which have been digitised, segmented, transcribed and subsequently annotated with morphological and syntactic analysis. The nature of the spoken data gives rise to various challenges both in segmentation and annotation. We follow earlier efforts for Norwegian, in particular the LIA Treebank of spoken dialects transcribed in the Nynorsk variety of Norwegian, in the annotation principles to ensure interusability of the resources. We have developed a spoken language parser on the basis of the annotated material and report on its accuracy both on a test set across the dialects and by holding out single dialects.

In this paper, we consider variation in Verb Second (V2) word order in wh-questions across Norwegian dialects by investigating data from the Nordic Syntax Database (NSD), which consists of acceptability judgments collected at more than 100 locations in Norway. We trace the geographical distribution of the two main variables: phrasal vs. monosyllabic wh-elements (the latter argued to be heads) and subject vs. [...] In subject questions, non-V2 is realized by inserting the complementizer som in second position instead of the verb. We also discuss the connection between non-V2 and the possibility of inserting the complementizer som under extraction of a wh-subject from an embedded clause, i.e. [...] Based on synchronic data, we propose a diachronic account of the geographical distribution and argue that the development from V2 to non-V2 has started in subject questions, thus allowing us to relate the loss of the V2 requirement to changes in the properties of the complementizer som.
