Digitala Vetenskapliga Arkivet

  • 1.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Megyesi, Beáta
    Uppsala universitet, Institutionen för lingvistik och filologi.
    Proceedings of the Workshop on NLP and Pseudonymisation. 2019. Conference proceedings (editor) (Refereed)
  • 2.
    Ahrenberg, Lars
    Linköping University, Sweden.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Proceedings of the Workshop on NLP and Pseudonymisation. 2019. Conference proceedings (editor) (Refereed)
  • 3. Alemu, Atelach
    Hulth, Anette
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. Datorlingvistik.
    General-Purpose Text Categorization Applied to the Medical Domain. 2007. Report (Other academic)
    Abstract [en]

    This paper presents work where a general-purpose text categorization method was applied to categorize medical free-texts. The purpose of the experiments was to examine how such a method performs without any domain-specific knowledge, hand-crafting or tuning. Additionally, we compare the results from the general-purpose method with results from runs in which a medical thesaurus as well as automatically extracted keywords were used when building the classifiers. We show that standard text categorization techniques using stemmed unigrams as the basis for learning can be applied directly to categorize medical reports, yielding an F-measure of 83.9, and outperforming the more sophisticated methods.

  • 4.
    Andréasson, Maia
    Department of Swedish Language, University of Gothenburg.
    Borin, Lars
    Department of Swedish Language, University of Gothenburg.
    Forsberg, Markus
    Department of Swedish Language, University of Gothenburg.
    Beskow, Jonas
    School of Computer Science and Communication, KTH.
    Carlsson, Rolf
    School of Computer Science and Communication, KTH.
    Edlund, Jens
    School of Computer Science and Communication, KTH.
    Elenius, Kjell
    School of Computer Science and Communication, KTH.
    Hellmer, Kahl
    School of Computer Science and Communication, KTH.
    House, David
    School of Computer Science and Communication, KTH.
    Merkel, Magnus
    Department of Computer Science, Linköping University.
    Forsbom, Eva
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Eriksson, Anders
    Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg.
    Strömqvist, Sven
    Centre for Languages and Literature, Lund University.
    Swedish CLARIN Activities. 2009. In: Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources / [ed] Rickard Domeij, Kimmo Koskenniemi, Steven Krauwer, Bente Maegaard, Eiríkur Rögnvaldsson and Koenraad de Smedt, Northern European Association for Language Technology (NEALT), 2009, pp. 1-5. Conference paper (Refereed)
    Abstract [en]

    Although Sweden has yet to allocate funds specifically intended for CLARIN activities, there are some ongoing activities which are directly relevant to CLARIN, and which are explicitly linked to CLARIN. These activities have been funded by the Committee for Research Infrastructures and its subcommittee DISC (Database Infrastructure Committee) of the Swedish Research Council.

  • 5.
    Baró, Arnau
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona Bellaterra, Spain.
    Chen, Jialuo
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona Bellaterra, Spain.
    Fornés, Alicia
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona Bellaterra, Spain.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Towards a Generic Unsupervised Method for Transcription of Encoded Manuscripts. 2019. In: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage: DATeCH2019, New York: ACM, 2019. Conference paper (Refereed)
    Abstract [en]

    Historical ciphers, a special type of manuscripts, contain encrypted information, important for the interpretation of our history. The first step towards decipherment is to transcribe the images, either manually or by automatic image processing techniques. Despite the improvements in handwritten text recognition (HTR) thanks to deep learning methodologies, the need of labelled data to train is an important limitation. Given that ciphers often use symbol sets across various alphabets and unique symbols without any transcription scheme available, these supervised HTR techniques are not suitable to transcribe ciphers. In this paper we propose an unsupervised method for transcribing encrypted manuscripts based on clustering and label propagation, which has been successfully applied to community detection in networks. We analyze the performance on ciphers with various symbol sets, and discuss the advantages and drawbacks compared to supervised HTR methods.

  • 6. Bethelsen, Harald
    Megyesi, Beata
    Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora. 2000. In: Proceedings of the Third International Workshop on TEXT, SPEECH and DIALOGUE, 2000, pp. 27-32. Conference paper (Refereed)
    Abstract [en]

    In this paper we apply the ensemble approach to the identification of incorrectly annotated items (noise) in a training set. In a controlled experiment, memory-based, decision tree-based and transformation-based classifiers are used as a filter to detect and remove noise deliberately introduced into a manually tagged corpus. The results indicate that the method can be successfully applied to automatically detect errors in a corpus.

  • 7.
    Borin, Lars
    University of Gothenburg.
    Tahmasebi, Nina
    University of Gothenburg.
    Volodina, Elena
    University of Gothenburg.
    Ekman, Stefan
    Swedish National Data Service, University of Gothenburg.
    Jordan, Caspar
    Swedish National Data Service, University of Gothenburg.
    Viklund, Jon
    Uppsala University.
    Megyesi, Beáta
    Uppsala University.
    Näsman, Jesper
    Uppsala University.
    Palmér, Anne
    Uppsala University.
    Wirén, Mats
    Stockholm University.
    Björkenstam, Kristina
    Stockholm University.
    Grigonytė, Gintaré
    Stockholm University.
    Gustafson Capková, Sofia
    Stockholm University.
    Kosiński, Tomasz
    Chalmers University of Technology.
    Swe-Clarin: Language Resources and Technology for Digital Humanities. 2016. In: Extended Papers of the International Symposium on Digital Humanities, CEUR, 2016, Vol. 2021, pp. 29-51. Conference paper (Refereed)
    Abstract [en]

    CLARIN is a European Research Infrastructure Consortium (ERIC), which aims at (a) making extensive language-based materials available as primary research data to the humanities and social sciences (HSS); and (b) offering state-of-the-art language technology (LT) as an eresearch tool for this purpose, positioning CLARIN centrally in what is often referred to as the digital humanities (DH). The Swedish CLARIN node Swe-Clarin was established in 2015 with funding from the Swedish Research Council.

    In this paper, we describe the composition and activities of Swe-Clarin, aiming at meeting the requirements of all HSS and other researchers whose research involves using text and speech as primary research data, and spreading the awareness of what Swe-Clarin can offer these research communities. We focus on one of the central means for doing this: pilot projects conducted in collaboration between HSS researchers and Swe-Clarin, together formulating a research question, the addressing of which requires working with large language-based materials. Four such pilot projects are described in more detail, illustrating research on rhetorical history, second-language acquisition, literature, and political science. A common thread to these projects is an aspiration to meet the challenge of conducting research on the basis of very large amounts of textual data in a consistent way without losing sight of the individual cases making up the mass of data, i.e., to be able to move between Moretti’s “distant” and “close reading” modes.

    While the pilot projects clearly make substantial contributions to DH, they also reveal some needs for more development, and in particular a need for document-level access to the text materials. As a consequence of this, work has now been initiated in Swe-Clarin to meet this need, so that Swe-Clarin together with HSS scholars investigating intricate research questions can take on the methodological challenges of big-data language-based digital humanities.

  • 8.
    Borin, Lars
    Språkbanken, Department of Swedish, University of Gothenburg.
    Tahmasebi, Nina
    Språkbanken, Department of Swedish, University of Gothenburg.
    Volodina, Elena
    Språkbanken, Department of Swedish, University of Gothenburg.
    Ekman, Stefan
    Swedish National Data Service, University of Gothenburg.
    Jordan, Caspar
    Swedish National Data Service, University of Gothenburg.
    Viklund, Jon
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Historisk-filosofiska fakulteten, Litteraturvetenskapliga institutionen.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Näsman, Jesper
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Palmér, Anne
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för nordiska språk.
    Wirén, Mats
    Department of Linguistics, Stockholm University.
    Björkenstam, Kristina N.
    Department of Linguistics, Stockholm University.
    Grigonytė, Gintaré
    Department of Linguistics, Stockholm University.
    Gustafson Capková, Sofia
    Department of Linguistics, Stockholm University.
    Kosiński, Tomasz
    Department of Applied IT, Chalmers University of Technology.
    Swe-Clarin: Language Resources and Technology for Digital Humanities. 2016. In: Digital Humanities 2016: Extended Papers of the International Symposium on Digital Humanities (DH 2016), Växjö, Sweden, November 7-8, 2016 / [ed] Koraljka Golub; Marcelo Milrad, 2016, pp. 29-51. Conference paper (Refereed)
    Abstract [en]

    CLARIN is a European Research Infrastructure Consortium (ERIC), which aims at (a) making extensive language-based materials available as primary research data to the humanities and social sciences (HSS); and (b) offering state-of-the-art language technology (LT) as an eresearch tool for this purpose, positioning CLARIN centrally in what is often referred to as the digital humanities (DH). The Swedish CLARIN node Swe-Clarin was established in 2015 with funding from the Swedish Research Council.

    In this paper, we describe the composition and activities of Swe-Clarin, aiming at meeting the requirements of all HSS and other researchers whose research involves using text and speech as primary research data, and spreading the awareness of what Swe-Clarin can offer these research communities. We focus on one of the central means for doing this: pilot projects conducted in collaboration between HSS researchers and Swe-Clarin, together formulating a research question, the addressing of which requires working with large language-based materials. Four such pilot projects are described in more detail, illustrating research on rhetorical history, second-language acquisition, literature, and political science. A common thread to these projects is an aspiration to meet the challenge of conducting research on the basis of very large amounts of textual data in a consistent way without losing sight of the individual cases making up the mass of data, i.e., to be able to move between Moretti’s “distant” and “close reading” modes.

    While the pilot projects clearly make substantial contributions to DH, they also reveal some needs for more development, and in particular a need for document-level access to the text materials. As a consequence of this, work has now been initiated in Swe-Clarin to meet this need, so that Swe-Clarin together with HSS scholars investigating intricate research questions can take on the methodological challenges of big-data language-based digital humanities.

  • 9. Carlson, Rolf
    Granström, Björn
    Heldner, Mattias
    House, David
    Megyesi, Beata
    Strangert, Eva
    Swerts, Mark
    Boundaries and groupings - the structuring of speech in different communicative situations: a description of the GROG project. 2002. In: Proceedings of Fonetik 2002, 2002. Conference paper (Refereed)
    Abstract [en]

    The goal of the project is to model the prosodic structuring of speech in terms of boundaries and groupings. The modeling will include different communicative situations and be based on existing as well as new speech corpora. Production and perception studies will be used in parallel with automatic methods developed for analysis, modeling and prediction of prosody. The model will be perceptually evaluated using synthetic speech.

  • 10.
    Chen, Jialuo
    Computer Vision Center, Computer Science Department, Universitat Autonoma de Barcelona, Spain.
    Souibgui, Mohamed Ali
    Computer Vision Center, Computer Science Department, Universitat Autonoma de Barcelona, Spain.
    Fornes, Alicia
    Computer Vision Center, Computer Science Department, Universitat Autonoma de Barcelona, Spain.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images. 2021. In: Proceedings of the 4th International Conference on Historical Cryptology HistoCrypt 2021 / [ed] Carola Dahlke, 2021. Conference paper (Refereed)
    Abstract [en]

    Historical ciphers contain a wide range of symbols from various symbol sets. Identifying the cipher alphabet is a prerequisite before decryption can take place and is a time-consuming process. In this work we explore the use of image processing for identifying the underlying alphabet in cipher images, and to compare alphabets between ciphers. The experiments show that ciphers with similar alphabets can be successfully discovered through clustering.

  • 11.
    Chen, Jialuo
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona.
    Souibgui, Mohamed Ali
    Universitat Autònoma de Barcelona.
    Fornés, Alicia
    Universitat Autònoma de Barcelona.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    A Web-based Interactive Transcription Tool for Encrypted Manuscripts. 2020. In: Proceedings of the 3rd International Conference on Historical Cryptology HistoCrypt 2020 / [ed] Beáta Megyesi, Linköping, 2020. Conference paper (Refereed)
    Abstract [en]

    Manual transcription of handwritten text is a time-consuming task. In the case of encrypted manuscripts, the recognition is even more complex due to the huge variety of alphabets and symbol sets. To speed up and ease this process, we present a web-based tool aimed to (semi)-automatically transcribe the encrypted sources. The user uploads one or several images of the desired encrypted document(s) as input, and the system returns the transcription(s). This process is carried out in an interactive fashion with the user to obtain more accurate results. For discovering and testing, the developed web tool is freely available.

  • 12.
    Csató, Éva Ágnes
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Dahlqvist, Bengt
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik.
    Saxena, Anju
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik.
    Sågvall Hein, Anna
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik.
    A Turkish-Swedish parallel corpus: Orhan Pamuk Beyaz Kale-Vita Borgen. 2006. Other (Refereed)
  • 13.
    Csató, Éva Ágnes
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Kilimci, Songul
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Using Parallel Corpora in Data-Driven Teaching of Turkish in Sweden. 2010. Conference paper (Refereed)
    Abstract [en]

    The paper demonstrates how data-driven learning methods are applied in teaching Turkish as a foreign language at the Department of Linguistics and Philology, Uppsala University. In data-driven teaching, language corpora, concordance programs, and annotation tools developed in collaboration with computational linguists are employed. This paper illustrates how resources developed initially for research purposes in different subjects (such as Computational Linguistics, Linguistics, Turkic languages), are now being used in teaching environments.

    We present the Swedish-Turkish parallel corpus providing students and researchers with easily accessible annotated linguistic data. The web-based corpora can be used both by regular and distance students. They function also as learning tools for formulating and testing hypotheses concerning lexical, morphological and syntactic aspects of Turkish. Furthermore, they help the students to practice contrastive studies and translation between Swedish and Turkish.

  • 14. Dahlke, Carola
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Proceedings of the 5th International Conference on Historical Cryptology. 2022. Conference proceedings (editor) (Refereed)
  • 15. Dahlqvist, Bengt
    Megyesi, Beata
    Changing the tokenization in Talbanken to SUC2.0. 2007. Report (Other academic)
  • 16.
    Elenius, Kjell
    Speech, Music and Hearing, KTH.
    Forsbom, Eva
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Language Resources and Tools for Swedish: A Survey. 2008. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC 2008), Paris: European Language Resources Association (ELRA), 2008. Conference paper (Refereed)
    Abstract [en]

    Language resources and tools to create and process these resources are necessary components in human language technology and natural language applications. In this paper, we describe a survey of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on a questionnaire sent to industry and academia, institutions and organizations, and to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide.

  • 17.
    Elenius, Kjell
    Speech, Music and Hearing.
    Forsbom, Eva
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Survey on Swedish Language Resources. 2008. Report (Other academic)
    Abstract [en]

    Language resources, such as lexicons, databases, dictionaries, corpora, and tools to create and process these resources are necessary components in human language technology and natural language applications. In this survey, we describe the inventory process and the results of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on an investigation sent to industry and academia, institutions and organizations, to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide. This study is a result of the project called “An Infrastructure for Swedish language technology” supported by the Swedish Research Council's Committee for Research Infrastructures 2007 - 2008.

  • 18.
    Fornes, Alicia
    Computer Vision Center, Universitat Autònoma de Barcelona, Spain.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Mas, Joan
    Computer Vision Center, Universitat Autònoma de Barcelona, Spain.
    Transcription of Encoded Manuscripts with Image Processing Techniques. 2017. In: Proceedings of Digital Humanities 2017, Canada, 2017. Conference paper (Refereed)
  • 19. Fornés, Alicia
    Chen, Jialuo
    Torras, Pau
    Badal, Carles
    Megyesi, Beáta
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Waldispühl, Michelle
    Kopal, Nils
    Lasry, George
    ICDAR 2024 Competition on Handwriting Recognition of Historical Ciphers. 2024. In: Document Analysis and Recognition - ICDAR 2024, 2024. Conference paper (Refereed)
    Abstract [en]

    Handwritten Text Recognition (HTR) in low-resource scenarios (i.e. when the amount of labeled data is scarce) is a challenging problem. This is particularly true for historical encrypted manuscripts, commonly known as ciphers, which contain secret messages and were typically used in military or diplomatic correspondence, records of secret societies, or private letters. To hide their contents, the sender and receiver created their own secret method of writing. The cipher alphabets often include digits, Latin or Greek letters, Zodiac and alchemical signs, combined with various diacritics, as well as invented ones. The first step in the decryption process is the transcription of these manuscripts, which is difficult due to the great variation in handwriting styles and cipher alphabets with a limited number of pages. Although different strategies can be considered to deal with the insufficient amount of training data (e.g., few-shot learning, self-supervised learning), the performance of available HTR models is not yet satisfactory. Thus, the proposed competition, which includes ciphers with a large number of symbol sets and scribes, aims to boost research in HTR in low-resource scenarios.

  • 20. Gambardell, Maria Elena
    Pettersson, Eva
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Identifying Cleartext in Historical Ciphers. 2022. In: Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022. / [ed] Rachele Sprugnoli and Marco Passarotti, 2022. Conference paper (Refereed)
    Abstract [en]

    In historical encrypted sources we can find encrypted text sequences, also called ciphertext, as well as non-encrypted cleartexts written in a known language. While most of the cryptanalysis focuses on the decryption of ciphertext, cleartext is often overlooked although it can give us important clues about the historical interpretation and contextualisation of the manuscript. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in historical ciphers and to what extent we are able to identify its language. The problem is challenging as cleartext sequences in ciphers are often short, up to a few words, in different languages due to historical code-switching. To identify the sequences and the language(s), we chose a rule-based approach and run 7 different models using historical language models on various ciphertexts.

  • 21. García, Goio
    Torras, Pau
    Fornés, Alicia
    Megyesi, Beáta
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Exploring the Alignment of Transcriptions to Images of Encrypted Manuscripts. 2024. In: Proceedings of the 7th International Conference on Historical Cryptology HistoCrypt 2024 / [ed] Michelle Waldispühl; Beáta Megyesi, Tartu University Library, 2024, pp. 103-107. Conference paper (Refereed)
    Abstract [en]

    The automatic transcription of encrypted manuscripts is a challenge due to the different handwriting styles and the often invented symbol alphabets. Many transcription methods require annotated sources, including symbol locations. However, most existing transcriptions are provided at line or page level, making it necessary to find the bounding boxes of the transcribed symbols in the image, a process referred to as alignment. So, in this work, we develop several alignment methods, and discuss their performance on encrypted documents with various symbol sets.

  • 22. Gustafson-Capkova, Sofia
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    A Comparative Study of Pauses in Dialogues and Read Speech. 2001. In: Proceedings of Eurospeech 2001, 2001, pp. 931-935. Conference paper (Refereed)
    Abstract [en]

    This study aims to investigate the length, frequency and position of various types of pauses in three different speaking styles: elicited spontaneous dialogues, professional reading and non-professional reading.

  • 23. Gustafson-Capkova, Sofia
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Silence and Discourse Context in Read Speech and Dialogues in Swedish. 2002. In: Proceedings of the Speech Prosody 2002 conference, 2002, pp. 363-366. Conference paper (Refereed)
    Abstract [en]

    In this study, we investigate the correlation between silent pauses and discourse boundaries in the notion of theme shift. We examine three speaking styles in Swedish: professional and non-professional reading, and elicited spontaneous dialogues. Considerable attention is given to the syntactic and discourse context in which pauses appear, as well as the characteristics of the discourse structure in terms of pauses.

  • 24. Hall, Johan
    Nilsson, Jens
    Nivre, Joakim
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. Datorlingvistik.
    Eryigit, Gulsen
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. Datorlingvistik.
    Nisson, Mattias
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. datorlingvistik.
    Saers, Markus
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. datorlingvistik.
    Single Malt or Blended? A Study in Multilingual Parser Optimization. 2007. In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, 2007. Conference paper (Refereed)
  • 25.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Megyesi, Beáta
    KTH Speech, Music and Hearing.
    Exploring the prosody-syntax interface in conversations. 2003. In: Proceedings ICPhS 2003, Barcelona, Spain: ICPhS, 2003, pp. 2501-2504. Conference paper (Refereed)
  • 26. Hulth, Anette
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. Datorlingvistik.
    A Study on Automatically Extracted Keywords in Text Categorization. 2006. In: Proceedings of International Conference of Association for Computational Linguistics, 2006. Conference paper (Refereed)
    Abstract [en]

    This paper presents a study on if and how automatically extracted keywords can be used to improve text categorization. In summary we show that a higher performance, as measured by micro-averaged F-measure on a standard text categorization collection, is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full-texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, either represented as unigrams or intact. Of these two experiments, the unigrams have the best performance, although neither performs as well as headlines only.

  • 27. Héder, Mihály
    et al.
    Fornés, Alicia
    Kopal, Nils
    Szigeti, Ferenc
    Megyesi, Beáta
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Supporting Historical Cryptology: The Decrypt Pipeline. 2024. In: Proceedings of the 7th International Conference on Historical Cryptology HistoCrypt 2024 / [ed] Michelle Waldispühl; Beáta Megyesi, Tartu University Library, 2024, pp. 127-134. Conference paper (Refereed)
    Abstract [en]

    We present a set of resources and tools to support research and development in the field of historical cryptology. The tools aim to support transcription and decipherment of ciphertexts, developed to work together in a pipeline. It encompasses cataloging these documents into the Decode database, which houses ciphers dating from the 14th century to 1965, transcription using both manual and AI-assisted methods, cryptanalysis, and subsequent historical and linguistic analysis to contextualize decrypted content. The project encounters challenges with the accuracy of automated transcription technologies and the necessity for significant user involvement in the transcription and analysis processes. These insights highlight the critical balance between technological innovation and the indispensable input of domain expertise in advancing the field of historical cryptology.

  • 28. Héder, Mihály
    et al.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    The DECODE Database of Historical Ciphers and Keys: Version 2. 2022. In: Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022. / [ed] Carola Dahlke and Beáta Megyesi, Linköping: Linköping University Electronic Press, 2022, pp. 111-114. Conference paper (Refereed)
    Abstract [en]

    We report recent developments of the DECODE database aimed for the systematic collection and annotation of encrypted sources: ciphertexts, keys and related documents. We released a new, more functional graphical user interface, revised some metadata features and enlarged the collection and tripled its size.

  • 29. Ilinykh, Nikolai
    et al.
    Morger, Felix
    Dannélls, Dana
    Dobnik, Simon
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Nivre, Joakim
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. RISE.
    Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023). 2023. Proceedings (editor) (Refereed)
  • 30.
    Knight, Kevin
    et al.
    University of Southern California.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Schaefer, Christiane
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    The Copiale Cipher. 2011. Conference paper (Refereed)
    Abstract [en]

    The Copiale cipher is a 105-page enciphered book dated 1866. We describe the features of the book and the method by which we deciphered it.

  • 31.
    Knight, Kevin
    et al.
    University of Southern California.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Schaefer, Christiane
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    The Secrets of the Copiale Cipher. 2011. In: Research into Freemasonry and Fraternalism, ISSN 1757-2460, Vol. 2, no. 2, pp. 314-324. Journal article (Refereed)
    Abstract [en]

    The Copiale Cipher is a 105-page, hand-written encrypted manuscript from the mid-eighteenth century. Its code was cracked and the text was deciphered by using modern computational technology combined with philological methods. We describe the book, the features of the text, and give a brief summary of the method by which we deciphered it. Finally, we present the content and the secret society, namely the Oculists, who were hiding behind the cipher. 

  • 32. Lasry, George
    et al.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Kopal, Nils
    Deciphering Papal Ciphers from the 16th to the 18th Century. 2021. In: Cryptologia, ISSN 0161-1194, E-ISSN 1558-1586, Vol. 45, no. 6, pp. 479-540. Journal article (Refereed)
    Abstract [en]

    In Meister’s 1906 landmark study, “Die Geheimschrift im Dienste der päpstlichen Kurie von ihren Anfängen bis zum Ende des XVI Jahrhunderts”, the 16th Century papal cryptographic service is described as a vibrant, highly professional organization, at the forefront of the science of cryptography in the Late Renaissance. In his work from 1993, Alvarez concluded that by the 19th Century, “the reputation of papal cryptography, once so lustrous, has sadly faded.” However, until now, very little was known about the evolution of papal cryptography from the 16th to the 18th Century. In this article, we describe how we obtained a large collection of original papal ciphertexts from the Vatican archives, transcribed them, and how we were able to recover most of the keys, and to decipher the original plaintexts using novel cryptanalysis methods and the open-source e-learning CrypTool platform. The recovered keys and decipherments provide unique insights into papal cryptographic practices from the 16th to the 18th Century. The 16th Century is characterized by innovation and a high level of sophistication, with a primary focus on cryptographic security. From the 17th Century, only the simpler but also less secure forms of ciphers remain in use, and papal cryptography significantly lags behind other European states.

  • 33. Láng, Benedek
    et al.
    Megyesi, Beáta
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    An STS analysis of a digital humanities collaboration: trading zones, boundary objects, and interactional expertise in the DECRYPT project. 2024. In: Humanities and Social Sciences Communications, E-ISSN 2662-9992, Vol. 11, no. 1, article id 618. Journal article (Refereed)
    Abstract [en]

     A widely shared recognition over the past decade is that the methodology and the basic concepts of science and technology studies (STS) can be used to analyze collaborations in the cross-disciplinary field of digital humanities (DH). The concepts of trading zones (Galison, 2010), boundary objects (Star and Griesemer, 1989), and interactional expertise (Collins and Evans, 2007) are particularly fruitful for describing projects in which researchers from massively different epistemic cultures (Knorr Cetina, 1999) are trying to develop a common language. The literature, however, primarily concentrates on examples where only two parties, historians and IT experts, work together. More exciting perspectives open up for analysis when more than two, more nuanced and different epistemic cultures seek a common language and common research goals. In the DECRYPT project funded by the Swedish Research Council, computational linguists, historians, computer scientists and AI experts, cryptologists, computer vision specialists, historical linguists, archivists, and philologists collaborate with strikingly different methodologies, publication patterns, and approaches. They develop and use common resources (including a database and a large collection of European historical texts) and tools (among others a code-breaking software, a hand-written text recognition tool for transcription), researching partly overlapping topics (handwritten historical ciphers and keys) to reach common goals. In this article, we aim to show how the STS concepts are illuminating when describing the mechanisms of the DECRYPT collaboration and shed some light on the best practices and challenges of a truly cross-disciplinary DH project.

  • 34. Láng, Benedek
    et al.
    Megyesi, Beáta
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Kopal, Nils
    Mikhalev, Vasily
    Tudor, Crina
    Waldispühl, Michelle
    Cipher key instructions in early modern Europe: analysis and text edition. 2024. In: Cryptologia, ISSN 0161-1194, E-ISSN 1558-1586. Journal article (Refereed)
    Abstract [en]

    We present an overview of instructions for the use of European historical cipher keys in early modern times. We describe the structure of instructions and the content presented to the key users. We exemplify various key instruction types and give a text edition of typical examples in various languages. The study is based on the analysis of more than 1,600 cipher keys collected from archives and libraries in ten European countries. We examine the practical implementation of cipher keys to the extent that instructions offer insights into everyday cryptographic practices. We focus on the typical rules scribes were expected to adhere to and the common errors they were instructed to avoid. We aim to reconstruct the apprehensions and considerations of the authors of cipher keys: They sought to offer assistance to users while likely harboring concerns regarding the potential misuse of their intellectual product. Given the secretive nature of cryptology, the documentation of knowledge transfer is scarce. In addition to the detailed manuals authored by well-known cryptologists, anonymous cipher key instructions offer valuable insights into this knowledge transfer process. By studying these instructions, historians gain direct access to a realm of knowledge that would otherwise remain hidden from their view.

  • 35. Magnifico, Giacomo
    et al.
    Megyesi, Beáta
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Souibgui, Mohamed Ali
    Universitat Autonoma de Barcelona, Spain.
    Chen, Jialuo
    Universitat Autonoma de Barcelona, Spain.
    Fornés, Alicia
    Universitat Autonoma de Barcelona, Spain.
    Lost in Transcription of Graphic Signs in Ciphers. 2022. In: Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022 / [ed] Carola Dahlke and Beáta Megyesi, Linköping: Linköping University Electronic Press, 2022, pp. 153-158. Conference paper (Refereed)
    Abstract [en]

    Hand-written Text Recognition techniques with the aim to automatically identify and transcribe hand-written text have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.

  • 36. Heldner, Mattias
    et al.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Exploring the Prosody-Syntax Interface in Conversations. 2003. In: Proceedings of the 15th International Congress of Phonetic Sciences, 2003. Conference paper (Refereed)
    Abstract [en]

    The goal of this study is to investigate the structuring of speech in terms of prosodic boundaries in spontaneous dialogues in Swedish. In particular, the relation between boundaries as perceived by listeners, and their acoustic and linguistic realizations as uttered by the speakers is examined.

  • 37. Heldner, Mattias
    et al.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    The Acoustic and Morpho-Syntactic Context of Prosodic Boundaries in Dialogs. 2003. In: Proceedings of Fonetik 2003, 2003. Conference paper (Refereed)
    Abstract [en]

    This study investigates the structuring of speech in terms of prosodic boundaries. In particular, the relation between boundaries as perceived by the listeners, and their acoustic and linguistic realizations as uttered by speakers is examined.

  • 38.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Brill's PoS Tagger with Extended Lexical Templates for Hungarian. 1999. In: Proceedings of the Workshop (W01) on Machine Learning in Human Language Technology: ACAI'99, 1999, pp. 22-28. Conference paper (Refereed)
  • 39.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish. 2001. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2001), 2001. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is a systematic evaluation and comparison of four state-of-the-art data-driven learning algorithms applied to part of speech tagging of Swedish. The algorithms included in this study are Hidden Markov Model, Maximum Entropy, Memory-Based Learning, and Transformation-Based Learning. The systems are evaluated from several aspects. Both the effects of tag set and the effects of the size of training data are examined. The accuracy is calculated as well as the error rate for known and unknown tokens. The results show differences between the approaches due to the different linguistic information built into the systems.

  • 40.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Data-Driven Methods for PoS tagging and Chunking of Swedish. 2001. In: Proceedings of the Nordic Conference on Computational Linguistics, Nodalida 2001, 2001. Conference paper (Other (popular science, discussion, etc.))
    Abstract [en]

    In this paper well-known state-of-the-art data-driven algorithms are applied to part-of-speech tagging and shallow parsing of Swedish texts.

  • 41.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. datorlingvistik.
    Improving Brill's PoS Tagger for an Agglutinative Language. 1999. In: Proceedings of the Joint Sigdat Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: EMNLP/VLC '99, 1999, pp. 275-284. Conference paper (Refereed)
    Abstract [en]

    In this paper Brill's rule-based PoS tagger is tested and adapted for Hungarian. It is shown that the present system does not obtain as high accuracy for Hungarian as it does for English (and other Germanic languages) because of the structural difference between these languages. Hungarian, unlike English, has rich morphology, is agglutinative with some inflectional characteristics and has fairly free word order. The tagger has the greatest difficulties with parts-of-speech belonging to open classes because of their complicated morphological structure. It is shown that the accuracy of tagging can be increased from approximately 83% to 97% by simply changing the rule generating mechanisms, namely the lexical templates in the lexical training module.

  • 42. Megyesi, Beata
    Phrasal Parsing by Using Data-Driven PoS Taggers. 2001. In: Proceedings of the Conference on Recent Advances in Natural Language Processing: Euro Conference RANLP-2001, 2001, pp. 166-173. Conference paper (Refereed)
    Abstract [en]

    Three data-driven algorithms are applied to shallow parsing of Swedish texts by using PoS taggers as the basis for parsing. The constituent structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to. The results show that best performance can be obtained by training on the basis of PoS tags with labels marking the phrasal constituents without considering the words themselves. Transformation-based learning gives highest accuracy (94.44%) followed by the Maximum Entropy framework (mxpost) (92.47%) and the Hidden Markov model (TnT) (92.42%).

  • 43.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Proceedings of the 20th Nordic Conference of Computational Linguistics. 2015. Collection (editor) (Refereed)
  • 44.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi. Datorlingvistik.
    Production and Perception of Pauses and their Linguistic Context in Read and Spontaneous Speech in Swedish. 2002. In: Proceedings of ICSLP'2002 - 7th International Conference on Spoken Language Processing, 2002. Conference paper (Refereed)
    Abstract [en]

    We investigate the relationship between prosodic phrase boundaries in terms of pausing and the linguistic structure on morpho-syntactic and discourse levels in spontaneous dialogues as well as in read aloud speech in Swedish. Both the speakers' production and the listeners' perception of pausing are considered and mapped to the linguistic structure.

  • 45.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Shallow Parsing with PoS Taggers and Linguistic Features. 2002. In: Journal of Machine Learning Research: Special Issue on Shallow Parsing, Vol. 2, pp. 639-668. Journal article (Refereed)
    Abstract [en]

    Three data-driven publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts. The phrase structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to in the parse tree. The encoding is based on the concatenation of the phrase tags on the path from lowest to higher nodes. Various linguistic features are used in learning; the taggers are trained on the basis of lexical information only, part-of-speech only, and a combination of both, to predict the phrase structure of the tokens with or without part-of-speech. Special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as the taggers' sensitivity to the size and the various types of training data sets. The method can be easily transferred to other languages.

  • 46.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    The Open Source Tagger HunPoS for Swedish. 2009. In: Proceedings of the 17th Nordic Conference on Computational Linguistics (NODALIDA), 2009. Conference paper (Refereed)
    Abstract [en]

    HunPoS, a freely available open source part-of-speech tagger (a reimplementation of one of the best performing taggers, TnT), is applied to Swedish and evaluated when the tagger is trained on various sizes of training data. The tagger’s accuracy is compared to other data-driven taggers for Swedish. The results show that the tagging performance of HunPoS is as accurate as TnT and can be used efficiently to tag running text.

  • 47.
    Megyesi, Beata
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    The Open Source Tagger HunPoS for Swedish. 2008. Report (Other academic)
    Abstract [en]

    HunPoS, a freely available open source part-of-speech tagger—a reimplementation of one of the best performing taggers, TnT—is applied to Swedish and evaluated when the tagger is trained on various sizes of training data. The tagger’s accuracy is compared to other data-driven taggers for Swedish. The results show that the tagging performance of HunPoS is as accurate as TnT and can be used efficiently to tag running text.

  • 48. Megyesi, Beata
    et al.
    Carlson, Rolf
    Data-Driven Methods for Building a Swedish Treebank. 2002. In: Swedish Treebank Symposium, 2002. Conference paper (Other academic)
  • 49.
    Megyesi, Beata
    et al.
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Csató, Éva Ágnes
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Dahlqvist, Bengt
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Gustafson-Capková, Sofia
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Nivre, Joakim
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Pettersson, Eva
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Sågvall Hein, Anna
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Supporting Research Environment for Swedish and Turkish. 2008. Report (Other (popular science, discussion, etc.))
  • 50.
    Megyesi, Beata
    et al.
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Dahlqvist, Bengt
    Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
    Converting SUC2.0 to XCES with stand-off annotation. 2007. Report (Other (popular science, discussion, etc.))