Digitala Vetenskapliga Arkivet

  • 1.
    Ahrenberg, Lars
    et al.
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering.
    Megyesi, Beáta
    Uppsala University, Department of Linguistics and Philology.
    Proceedings of the Workshop on NLP and Pseudonymisation. 2019. Conference proceedings (editor) (Refereed)
  • 2.
    Ahrenberg, Lars
    et al.
    Linköping University, Sweden.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Proceedings of the Workshop on NLP and Pseudonymisation. 2019. Conference proceedings (editor) (Refereed)
  • 3. Alemu, Atelach
    et al.
    Hulth, Anette
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    General-Purpose Text Categorization Applied to the Medical Domain. 2007. Report (Other academic)
    Abstract [en]

    This paper presents work where a general-purpose text categorization method was applied to categorize medical free-texts. The purpose of the experiments was to examine how such a method performs without any domain-specific knowledge, hand-crafting or tuning. Additionally, we compare the results from the general-purpose method with results from runs in which a medical thesaurus as well as automatically extracted keywords were used when building the classifiers. We show that standard text categorization techniques using stemmed unigrams as the basis for learning can be applied directly to categorize medical reports, yielding an F-measure of 83.9, and outperforming the more sophisticated methods.

  • 4.
    Andréasson, Maia
    et al.
    Department of Swedish Language, University of Gothenburg.
    Borin, Lars
    Department of Swedish Language, University of Gothenburg.
    Forsberg, Markus
    Department of Swedish Language, University of Gothenburg.
    Beskow, Jonas
    School of Computer Science and Communication, KTH.
    Carlsson, Rolf
    School of Computer Science and Communication, KTH.
    Edlund, Jens
    School of Computer Science and Communication, KTH.
    Elenius, Kjell
    School of Computer Science and Communication, KTH.
    Hellmer, Kahl
    School of Computer Science and Communication, KTH.
    House, David
    School of Computer Science and Communication, KTH.
    Merkel, Magnus
    Department of Computer Science, Linköping University.
    Forsbom, Eva
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Eriksson, Anders
    Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg.
    Strömqvist, Sven
    Centre for Languages and Literature, Lund University.
    Swedish CLARIN Activities. 2009. In: Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources / [ed] Rickard Domeij, Kimmo Koskenniemi, Steven Krauwer, Bente Maegaard, Eiríkur Rögnvaldsson and Koenraad de Smedt, Northern European Association for Language Technology (NEALT), 2009, p. 1-5. Conference paper (Refereed)
    Abstract [en]

    Although Sweden has yet to allocate funds specifically intended for CLARIN activities, there are some ongoing activities which are directly relevant to CLARIN, and which are explicitly linked to CLARIN. These activities have been funded by the Committee for Research Infrastructures and its subcommittee DISC (Database Infrastructure Committee) of the Swedish Research Council.

  • 5.
    Baró, Arnau
    et al.
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona Bellaterra, Spain.
    Chen, Jialuo
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona Bellaterra, Spain.
    Fornés, Alicia
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona Bellaterra, Spain.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Towards a Generic Unsupervised Method for Transcription of Encoded Manuscripts. 2019. In: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage: DATeCH2019, New York: ACM, 2019. Conference paper (Refereed)
    Abstract [en]

    Historical ciphers, a special type of manuscripts, contain encrypted information, important for the interpretation of our history. The first step towards decipherment is to transcribe the images, either manually or by automatic image processing techniques. Despite the improvements in handwritten text recognition (HTR) thanks to deep learning methodologies, the need of labelled data to train is an important limitation. Given that ciphers often use symbol sets across various alphabets and unique symbols without any transcription scheme available, these supervised HTR techniques are not suitable to transcribe ciphers. In this paper we propose an unsupervised method for transcribing encrypted manuscripts based on clustering and label propagation, which has been successfully applied to community detection in networks. We analyze the performance on ciphers with various symbol sets, and discuss the advantages and drawbacks compared to supervised HTR methods.

  • 6. Berthelsen, Harald
    et al.
    Megyesi, Beata
    Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora. 2000. In: Proceedings of the Third International Workshop on TEXT, SPEECH and DIALOGUE, 2000, p. 27-32. Conference paper (Refereed)
    Abstract [en]

    In this paper we apply the ensemble approach to the identification of incorrectly annotated items (noise) in a training set. In a controlled experiment, memory-based, decision tree-based and transformation-based classifiers are used as a filter to detect and remove noise deliberately introduced into a manually tagged corpus. The results indicate that the method can be successfully applied to automatically detect errors in a corpus.

  • 7.
    Borin, Lars
    et al.
    University of Gothenburg.
    Tahmasebi, Nina
    University of Gothenburg.
    Volodina, Elena
    University of Gothenburg.
    Ekman, Stefan
    Swedish National Data Service, University of Gothenburg.
    Jordan, Caspar
    Swedish National Data Service, University of Gothenburg.
    Viklund, Jon
    Uppsala University.
    Megyesi, Beáta
    Uppsala University.
    Näsman, Jesper
    Uppsala University.
    Palmér, Anne
    Uppsala University.
    Wirén, Mats
    Stockholm University.
    Björkenstam, Kristina
    Stockholm University.
    Grigonytė, Gintaré
    Stockholm University.
    Gustafson Capková, Sofia
    Stockholm University.
    Kosiński, Tomasz
    Chalmers University of Technology.
    Swe-Clarin: Language Resources and Technology for Digital Humanities. 2016. In: Extended Papers of the International Symposium on Digital Humanities, CEUR, 2016, Vol. 2021, p. 29-51. Conference paper (Refereed)
    Abstract [en]

    CLARIN is a European Research Infrastructure Consortium (ERIC), which aims at (a) making extensive language-based materials available as primary research data to the humanities and social sciences (HSS); and (b) offering state-of-the-art language technology (LT) as an eresearch tool for this purpose, positioning CLARIN centrally in what is often referred to as the digital humanities (DH). The Swedish CLARIN node Swe-Clarin was established in 2015 with funding from the Swedish Research Council.

    In this paper, we describe the composition and activities of Swe-Clarin, aiming at meeting the requirements of all HSS and other researchers whose research involves using text and speech as primary research data, and spreading the awareness of what Swe-Clarin can offer these research communities. We focus on one of the central means for doing this: pilot projects conducted in collaboration between HSS researchers and Swe-Clarin, together formulating a research question, the addressing of which requires working with large language-based materials. Four such pilot projects are described in more detail, illustrating research on rhetorical history, second-language acquisition, literature, and political science. A common thread to these projects is an aspiration to meet the challenge of conducting research on the basis of very large amounts of textual data in a consistent way without losing sight of the individual cases making up the mass of data, i.e., to be able to move between Moretti’s “distant” and “close reading” modes.

    While the pilot projects clearly make substantial contributions to DH, they also reveal some needs for more development, and in particular a need for document-level access to the text materials. As a consequence of this, work has now been initiated in Swe-Clarin to meet this need, so that Swe-Clarin together with HSS scholars investigating intricate research questions can take on the methodological challenges of big-data language-based digital humanities.

  • 8.
    Borin, Lars
    et al.
    Språkbanken, Department of Swedish, University of Gothenburg.
    Tahmasebi, Nina
    Språkbanken, Department of Swedish, University of Gothenburg.
    Volodina, Elena
    Språkbanken, Department of Swedish, University of Gothenburg.
    Ekman, Stefan
    Swedish National Data Service, University of Gothenburg.
    Jordan, Caspar
    Swedish National Data Service, University of Gothenburg.
    Viklund, Jon
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Literature.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Näsman, Jesper
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Palmér, Anne
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Scandinavian Languages.
    Wirén, Mats
    Department of Linguistics, Stockholm University.
    Björkenstam, Kristina N.
    Department of Linguistics, Stockholm University.
    Grigonytė, Gintaré
    Department of Linguistics, Stockholm University.
    Gustafson Capková, Sofia
    Department of Linguistics, Stockholm University.
    Kosiński, Tomasz
    Department of Applied IT, Chalmers University of Technology.
    Swe-Clarin: Language Resources and Technology for Digital Humanities. 2016. In: Digital Humanities 2016: Extended Papers of the International Symposium on Digital Humanities (DH 2016), Växjö, Sweden, November 7-8, 2016 / [ed] Koraljka Golub; Marcelo Milrad, 2016, p. 29-51. Conference paper (Refereed)
    Abstract [en]

    CLARIN is a European Research Infrastructure Consortium (ERIC), which aims at (a) making extensive language-based materials available as primary research data to the humanities and social sciences (HSS); and (b) offering state-of-the-art language technology (LT) as an eresearch tool for this purpose, positioning CLARIN centrally in what is often referred to as the digital humanities (DH). The Swedish CLARIN node Swe-Clarin was established in 2015 with funding from the Swedish Research Council.

    In this paper, we describe the composition and activities of Swe-Clarin, aiming at meeting the requirements of all HSS and other researchers whose research involves using text and speech as primary research data, and spreading the awareness of what Swe-Clarin can offer these research communities. We focus on one of the central means for doing this: pilot projects conducted in collaboration between HSS researchers and Swe-Clarin, together formulating a research question, the addressing of which requires working with large language-based materials. Four such pilot projects are described in more detail, illustrating research on rhetorical history, second-language acquisition, literature, and political science. A common thread to these projects is an aspiration to meet the challenge of conducting research on the basis of very large amounts of textual data in a consistent way without losing sight of the individual cases making up the mass of data, i.e., to be able to move between Moretti’s “distant” and “close reading” modes.

    While the pilot projects clearly make substantial contributions to DH, they also reveal some needs for more development, and in particular a need for document-level access to the text materials. As a consequence of this, work has now been initiated in Swe-Clarin to meet this need, so that Swe-Clarin together with HSS scholars investigating intricate research questions can take on the methodological challenges of big-data language-based digital humanities.

  • 9. Carlson, Rolf
    et al.
    Granström, Björn
    Heldner, Mattias
    House, David
    Megyesi, Beata
    Strangert, Eva
    Swerts, Mark
    Boundaries and groupings - the structuring of speech in different communicative situations: a description of the GROG project. 2002. In: Proceedings of Fonetik 2002, 2002. Conference paper (Refereed)
    Abstract [en]

    The goal of the project is to model the prosodic structuring of speech in terms of boundaries and groupings. The modeling will include different communicative situations and be based on existing as well as new speech corpora. Production and perception studies will be used in parallel with automatic methods developed for analysis, modeling and prediction of prosody. The model will be perceptually evaluated using synthetic speech.

  • 10.
    Chen, Jialuo
    et al.
    Computer Vision Center, Computer Science Department, Universitat Autonoma de Barcelona, Spain.
    Souibgui, Mohamed Ali
    Computer Vision Center, Computer Science Department, Universitat Autonoma de Barcelona, Spain.
    Fornes, Alicia
    Computer Vision Center, Computer Science Department, Universitat Autonoma de Barcelona, Spain.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images. 2021. In: Proceedings of the 4th International Conference on Historical Cryptology HistoCrypt 2021 / [ed] Carola Dahlke, 2021. Conference paper (Refereed)
    Abstract [en]

    Historical ciphers contain a wide range of symbols from various symbol sets. Identifying the cipher alphabet is a prerequisite before decryption can take place and is a time-consuming process. In this work we explore the use of image processing for identifying the underlying alphabet in cipher images, and to compare alphabets between ciphers. The experiments show that ciphers with similar alphabets can be successfully discovered through clustering.

  • 11.
    Chen, Jialuo
    et al.
    Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona.
    Souibgui, Mohamed Ali
    Universitat Autònoma de Barcelona.
    Fornés, Alicia
    Universitat Autònoma de Barcelona.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    A Web-based Interactive Transcription Tool for Encrypted Manuscripts. 2020. In: Proceedings of the 3rd International Conference on Historical Cryptology HistoCrypt 2020 / [ed] Beáta Megyesi, Linköping, 2020. Conference paper (Refereed)
    Abstract [en]

    Manual transcription of handwritten text is a time-consuming task. In the case of encrypted manuscripts, the recognition is even more complex due to the huge variety of alphabets and symbol sets. To speed up and ease this process, we present a web-based tool aimed to (semi)-automatically transcribe the encrypted sources. The user uploads one or several images of the desired encrypted document(s) as input, and the system returns the transcription(s). This process is carried out in an interactive fashion with the user to obtain more accurate results. For discovering and testing, the developed web tool is freely available.

  • 12.
    Csató, Éva Ágnes
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Saxena, Anju
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Sågvall Hein, Anna
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    A Turkish-Swedish parallel corpus: Orhan Pamuk Beyaz Kale-Vita Borgen. 2006. Other (Refereed)
  • 13.
    Csató, Éva Ágnes
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kilimci, Songul
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Using Parallel Corpora in Data-Driven Teaching of Turkish in Sweden. 2010. Conference paper (Refereed)
    Abstract [en]

    The paper demonstrates how data-driven learning methods are applied in teaching Turkish as a foreign language at the Department of Linguistics and Philology, Uppsala University. In data-driven teaching, language corpora, concordance programs, and annotation tools developed in collaboration with computational linguists are employed. This paper illustrates how resources developed initially for research purposes in different subjects (such as Computational Linguistics, Linguistics, Turkic languages), are now being used in teaching environments.

    We present the Swedish-Turkish parallel corpus providing students and researchers with easily accessible annotated linguistic data. The web-based corpora can be used both by regular and distance students. They function also as learning tools for formulating and testing hypotheses concerning lexical, morphological and syntactic aspects of Turkish. Furthermore, they help the students to practice contrastive studies and translation between Swedish and Turkish.

  • 14. Dahlke, Carola
    et al.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Proceedings of the 5th International Conference on Historical Cryptology. 2022. Conference proceedings (editor) (Refereed)
  • 15. Dahlqvist, Bengt
    et al.
    Megyesi, Beata
    Changing the tokenization in Talbanken to SUC2.0. 2007. Report (Other academic)
  • 16.
    Elenius, Kjell
    et al.
    Speech, Music and Hearing, KTH.
    Forsbom, Eva
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Language Resources and Tools for Swedish: A Survey. 2008. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC 2008), Paris: European Language Resources Association (ELRA), 2008. Conference paper (Refereed)
    Abstract [en]

    Language resources and tools to create and process these resources are necessary components in human language technology and natural language applications. In this paper, we describe a survey of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on a questionnaire sent to industry and academia, institutions and organizations, and to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide.

  • 17.
    Elenius, Kjell
    et al.
    Speech, Music and Hearing.
    Forsbom, Eva
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Survey on Swedish Language Resources. 2008. Report (Other academic)
    Abstract [en]

    Language resources, such as lexicons, databases, dictionaries, corpora, and tools to create and process these resources are necessary components in human language technology and natural language applications. In this survey, we describe the inventory process and the results of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on an investigation sent to industry and academia, institutions and organizations, to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide. This study is a result of the project called “An Infrastructure for Swedish language technology” supported by the Swedish Research Council's Committee for Research Infrastructures 2007 - 2008.

  • 18.
    Fornes, Alicia
    et al.
    Computer Vision Center, Universitat Autònoma de Barcelona, Spain.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Mas, Joan
    Computer Vision Center, Universitat Autònoma de Barcelona, Spain.
    Transcription of Encoded Manuscripts with Image Processing Techniques. 2017. In: Proceedings of Digital Humanities 2017, Canada, 2017. Conference paper (Refereed)
  • 19. Gambardella, Maria Elena
    et al.
    Pettersson, Eva
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Identifying Cleartext in Historical Ciphers. 2022. In: Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022. / [ed] Rachele Sprugnoli and Marco Passarotti, 2022. Conference paper (Refereed)
    Abstract [en]

    In historical encrypted sources we can find encrypted text sequences, also called ciphertext, as well as non-encrypted cleartexts written in a known language. While most of the cryptanalysis focuses on the decryption of ciphertext, cleartext is often overlooked although it can give us important clues about the historical interpretation and contextualisation of the manuscript. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in historical ciphers and to what extent we are able to identify its language. The problem is challenging as cleartext sequences in ciphers are often short, up to a few words, in different languages due to historical code-switching. To identify the sequences and the language(s), we chose a rule-based approach and run 7 different models using historical language models on various ciphertexts.

  • 20.
    Goio, García
    et al.
    Universitat Autònoma de Barcelona, Spain.
    Torras, Pau
    Universitat Autònoma de Barcelona, Spain.
    Fornés, Alicia
    Universitat Autònoma de Barcelona, Spain.
    Megyesi, Beáta
    Exploring the Alignment of Transcriptions to Images of Encrypted Manuscripts. 2024. In: Proceedings of the 7th International Conference on Historical Cryptology (HistoCrypt 2024), Tartu University Library, 2024. Conference paper (Refereed)
    Abstract [en]

    The automatic transcription of encrypted manuscripts is a challenge due to the different handwriting styles and the often invented symbol alphabets. Many transcription methods require annotated sources, including symbol locations. However, most existing transcriptions are provided at line or page level, making it necessary to find the bounding boxes of the transcribed symbols in the image, a process referred to as alignment. So, in this work, we develop several alignment methods, and discuss their performance on encrypted documents with various symbol sets.

  • 21. Gustafson-Capkova, Sofia
    et al.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    A Comparative Study of Pauses in Dialogues and Read Speech. 2001. In: Proceedings of Eurospeech 2001, 2001, p. 931-935. Conference paper (Refereed)
    Abstract [en]

    This study aims to investigate the length, frequency and position of various types of pauses in three different speaking styles: elicited spontaneous dialogues, professional reading and non-professional reading.

  • 22. Gustafson-Capkova, Sofia
    et al.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Silence and Discourse Context in Read Speech and Dialogues in Swedish. 2002. In: Proceedings of the Speech Prosody 2002 conference, 2002, p. 363-366. Conference paper (Refereed)
    Abstract [en]

    In this study, we investigate the correlation between silent pauses and discourse boundaries in the notion of theme shift. We examine three speaking styles in Swedish: professional and non-professional reading, and elicited spontaneous dialogues. Considerable attention is given to the syntactic and discourse context in which pauses appear, as well as the characteristics of the discourse structure in terms of pauses.

  • 23. Hall, Johan
    et al.
    Nilsson, Jens
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    Eryigit, Gulsen
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    Nilsson, Mattias
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. datorlingvistik.
    Saers, Markus
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. datorlingvistik.
    Single Malt or Blended? A Study in Multilingual Parser Optimization. 2007. In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, 2007. Conference paper (Refereed)
  • 24.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Megyesi, Beáta
    KTH Speech, Music and Hearing.
    Exploring the prosody-syntax interface in conversations. 2003. In: Proceedings ICPhS 2003, Barcelona, Spain: ICPhS, 2003, p. 2501-2504. Conference paper (Refereed)
  • 25. Hulth, Anette
    et al.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    A Study on Automatically Extracted Keywords in Text Categorization. 2006. In: Proceedings of International Conference of Association for Computational Linguistics, 2006. Conference paper (Refereed)
    Abstract [en]

    This paper presents a study on if and how automatically extracted keywords can be used to improve text categorization. In summary we show that a higher performance, as measured by micro-averaged F-measure on a standard text categorization collection, is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full-texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, either represented as unigrams or intact. Of these two experiments, the unigrams have the best performance, although neither performs as well as headlines only.

  • 26.
    Héder, Mihály
    et al.
    Budapest University of Technology and Economics, Hungary.
    Fornés, Alicia
    Autonomous University of Barcelona, Spain.
    Kopal, Nils
    University of Siegen, Germany.
    Szigeti, Ferenc
    Budapest University of Technology and Economics, Hungary.
    Megyesi, Beáta
    Stockholm University.
    Supporting Historical Cryptology: The Decrypt Pipeline. 2024. In: Proceedings of the 7th International Conference on Historical Cryptology (HistoCrypt 2024), Tartu, Estonia: Tartu University Library, 2024. Conference paper (Refereed)
    Abstract [en]

    We present a set of resources and tools to support research and development in the field of historical cryptology. The tools aim to support transcription and decipherment of ciphertexts, developed to work together in a pipeline. It encompasses cataloging these documents into the Decode database, which houses ciphers dating from the 14th century to 1965, transcription using both manual and AI-assisted methods, cryptanalysis, and subsequent historical and linguistic analysis to contextualize decrypted content. The project encounters challenges with the accuracy of automated transcription technologies and the necessity for significant user involvement in the transcription and analysis processes. These insights highlight the critical balance between technological innovation and the indispensable input of domain expertise in advancing the field of historical cryptology.

  • 27. Héder, Mihály
    et al.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The DECODE Database of Historical Ciphers and Keys: Version 2. 2022. In: Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022 / [ed] Carola Dahlke and Beáta Megyesi, Linköping: Linköping University Electronic Press, 2022, p. 111-114. Conference paper (Refereed)
    Abstract [en]

    We report recent developments of the DECODE database aimed for the systematic collection and annotation of encrypted sources: ciphertexts, keys and related documents. We released a new, more functional graphical user interface, revised some metadata features and enlarged the collection and tripled its size.

  • 28. Ilinykh, Nikolai
    et al.
    Morger, Felix
    Dannélls, Dana
    Dobnik, Simon
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. RISE.
    Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023). 2023. Conference proceedings (editor) (Refereed)
  • 29.
    Knight, Kevin
    et al.
    University of Southern California.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Schaefer, Christiane
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The Copiale Cipher. 2011. Conference paper (Refereed)
    Abstract [en]

    The Copiale cipher is a 105-page enciphered book dated 1866. We describe the features of the book and the method by which we deciphered it.

  • 30.
    Knight, Kevin
    et al.
    University of Southern California.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Schaefer, Christiane
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The Secrets of the Copiale Cipher. 2011. In: Research into Freemasonry and Fraternalism, ISSN 1757-2460, Vol. 2, no 2, p. 314-324. Article in journal (Refereed)
    Abstract [en]

    The Copiale Cipher is a 105-page, hand-written encrypted manuscript from the mid-eighteenth century. Its code was cracked and the text was deciphered by using modern computational technology combined with philological methods. We describe the book, the features of the text, and give a brief summary of the method by which we deciphered it. Finally, we present the content and the secret society, namely the Oculists, who were hiding behind the cipher. 

  • 31. Lasry, George
    et al.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kopal, Nils
    Deciphering Papal Ciphers from the 16th to the 18th Century. 2021. In: Cryptologia, ISSN 0161-1194, E-ISSN 1558-1586, Vol. 45, no 6, p. 479-540. Article in journal (Refereed)
    Abstract [en]

    In Meister’s 1906 landmark study, “Die Geheimschrift im Dienste der päpstlichen Kurie von ihren Anfängen bis zum Ende des XVI Jahrhunderts”, the 16th Century papal cryptographic service is described as a vibrant, highly professional organization, at the forefront of the science of cryptography in the Late Renaissance. In his work from 1993, Alvarez concluded that by the 19th Century, “the reputation of papal cryptography, once so lustrous, has sadly faded.” However, until now, very little was known about the evolution of papal cryptography from the 16th to the 18th Century. In this article, we describe how we obtained a large collection of original papal ciphertexts from the Vatican archives, transcribed them, and how we were able to recover most of the keys, and to decipher the original plaintexts using novel cryptanalysis methods and the open-source e-learning CrypTool platform. The recovered keys and decipherments provide unique insights into papal cryptographic practices from the 16th to the 18th Century. The 16th Century is characterized by innovation and a high level of sophistication, with a primary focus on cryptographic security. From the 17th Century, only the simpler but also less secure forms of ciphers remain in use, and papal cryptography significantly lags behind other European states.

  • 32. Láng, Benedek
    et al.
    Megyesi, Beáta
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    An STS analysis of a digital humanities collaboration: trading zones, boundary objects, and interactional expertise in the DECRYPT project. 2024. In: Humanities and Social Sciences Communications, E-ISSN 2662-9992, Vol. 11, no 1, article id 618. Article in journal (Refereed)
    Abstract [en]

     A widely shared recognition over the past decade is that the methodology and the basic concepts of science and technology studies (STS) can be used to analyze collaborations in the cross-disciplinary field of digital humanities (DH). The concepts of trading zones (Galison, 2010), boundary objects (Star and Griesemer, 1989), and interactional expertise (Collins and Evans, 2007) are particularly fruitful for describing projects in which researchers from massively different epistemic cultures (Knorr Cetina, 1999) are trying to develop a common language. The literature, however, primarily concentrates on examples where only two parties, historians and IT experts, work together. More exciting perspectives open up for analysis when more than two, more nuanced and different epistemic cultures seek a common language and common research goals. In the DECRYPT project funded by the Swedish Research Council, computational linguists, historians, computer scientists and AI experts, cryptologists, computer vision specialists, historical linguists, archivists, and philologists collaborate with strikingly different methodologies, publication patterns, and approaches. They develop and use common resources (including a database and a large collection of European historical texts) and tools (among others a code-breaking software, a hand-written text recognition tool for transcription), researching partly overlapping topics (handwritten historical ciphers and keys) to reach common goals. In this article, we aim to show how the STS concepts are illuminating when describing the mechanisms of the DECRYPT collaboration and shed some light on the best practices and challenges of a truly cross-disciplinary DH project.

  • 33. Magnifico, Giacomo
    et al.
    Megyesi, Beáta
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Souibgui, Mohamed Ali
    Universitat Autonoma de Barcelona, Spain.
    Chen, Jialuo
    Universitat Autonoma de Barcelona, Spain.
    Fornés, Alicia
    Universitat Autonoma de Barcelona, Spain.
    Lost in Transcription of Graphic Signs in Ciphers. 2022. In: Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022 / [ed] Carola Dahlke and Beáta Megyesi, Linköping: Linköping University Electronic Press, 2022, p. 153-158. Conference paper (Refereed)
    Abstract [en]

    Hand-written Text Recognition techniques with the aim to automatically identify and transcribe hand-written text have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.

  • 34. Heldner, Mattias
    et al.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Exploring the Prosody-Syntax Interface in Conversations. 2003. In: Proceedings of the 15th International Congress of Phonetic Sciences, 2003. Conference paper (Refereed)
    Abstract [en]

    The goal of this study is to investigate the structuring of speech in terms of prosodic boundaries in spontaneous dialogues in Swedish. In particular, the relation between boundaries as perceived by listeners, and their acoustic and linguistic realizations as uttered by the speakers, is examined.

  • 35. Heldner, Mattias
    et al.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The Acoustic and Morpho-Syntactic Context of Prosodic Boundaries in Dialogs. 2003. In: Proceedings of Fonetik 2003, 2003. Conference paper (Refereed)
    Abstract [en]

    This study investigates the structuring of speech in terms of prosodic boundaries. In particular, the relation between boundaries as perceived by the listeners, and their acoustic and linguistic realizations as uttered by speakers is examined.

  • 36.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Brill's PoS Tagger with Extended Lexical Templates for Hungarian. 1999. In: Proceedings of the Workshop (W01) on Machine Learning in Human Language Technology: ACAI'99, 1999, p. 22-28. Conference paper (Refereed)
  • 37.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish. 2001. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2001), 2001. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is a systematic evaluation and comparison of four state-of-the-art data-driven learning algorithms applied to part of speech tagging of Swedish. The algorithms included in this study are Hidden Markov Model, Maximum Entropy, Memory-Based Learning, and Transformation-Based Learning. The systems are evaluated from several aspects. Both the effects of tag set and the effects of the size of training data are examined. The accuracy is calculated as well as the error rate for known and unknown tokens. The results show differences between the approaches due to the different linguistic information built into the systems.

  • 38.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Data-Driven Methods for PoS tagging and Chunking of Swedish. 2001. In: Proceedings of the Nordic Conference on Computational Linguistics, Nodalida 2001, 2001. Conference paper (Other (popular science, discussion, etc.))
    Abstract [en]

    In this paper well-known state-of-the-art data-driven algorithms are applied to part-of-speech tagging and shallow parsing of Swedish texts.

  • 39.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. datorlingvistik.
    Improving Brill's PoS Tagger for an Agglutinative Language. 1999. In: Proceedings of the Joint Sigdat Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: EMNLP/VLC '99, 1999, p. 275-284. Conference paper (Refereed)
    Abstract [en]

    In this paper Brill's rule-based PoS tagger is tested and adapted for Hungarian. It is shown that the present system does not obtain as high accuracy for Hungarian as it does for English (and other Germanic languages) because of the structural difference between these languages. Hungarian, unlike English, has rich morphology, is agglutinative with some inflectional characteristics and has fairly free word order. The tagger has the greatest difficulties with parts-of-speech belonging to open classes because of their complicated morphological structure. It is shown that the accuracy of tagging can be increased from approximately 83% to 97% by simply changing the rule generating mechanisms, namely the lexical templates in the lexical training module.

  • 40. Megyesi, Beata
    Phrasal Parsing by Using Data-Driven PoS Taggers. 2001. In: Proceedings of the Conference on Recent Advances in Natural Language Processing: Euro Conference RANLP-2001, 2001, p. 166-173. Conference paper (Refereed)
    Abstract [en]

    Three data-driven algorithms are applied to shallow parsing of Swedish texts by using PoS taggers as the basis for parsing. The constituent structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to. The results show that best performance can be obtained by training on the basis of PoS tags with labels marking the phrasal constituents without considering the words themselves. Transformation-based learning gives highest accuracy (94.44%) followed by the Maximum Entropy framework (mxpost) (92.47%) and the Hidden Markov model (TnT) (92.42%).

  • 41.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Proceedings of the 20th Nordic Conference of Computational Linguistics. 2015. Collection (editor) (Refereed)
  • 42.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    Production and Perception of Pauses and their Linguistic Context in Read and Spontaneous Speech in Swedish. 2002. In: Proceedings of ICSLP'2002 - 7th International Conference on Spoken Language Processing, 2002. Conference paper (Refereed)
    Abstract [en]

    We investigate the relationship between prosodic phrase boundaries in terms of pausing and the linguistic structure on morpho-syntactic and discourse levels in spontaneous dialogues as well as in read aloud speech in Swedish. Both the speakers' production and the listeners' perception of pausing are considered and mapped to the linguistic structure.

  • 43.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Shallow Parsing with PoS Taggers and Linguistic Features. 2002. In: Journal of Machine Learning Research: Special Issue on Shallow Parsing, Vol. 2, p. 639-668. Article in journal (Refereed)
    Abstract [en]

    Three data-driven publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts. The phrase structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to in the parse tree. The encoding is based on the concatenation of the phrase tags on the path from lowest to higher nodes. Various linguistic features are used in learning; the taggers are trained on the basis of lexical information only, part-of-speech only, and a combination of both, to predict the phrase structure of the tokens with or without part-of-speech. Special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as the taggers' sensitivity to the size and the various types of training data sets. The method can be easily transferred to other languages.

  • 44.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The Open Source Tagger HunPoS for Swedish. 2009. In: Proceedings of the 17th Nordic Conference on Computational Linguistics (NODALIDA), 2009. Conference paper (Refereed)
    Abstract [en]

    HunPoS, a freely available open source part-of-speech tagger (a reimplementation of one of the best performing taggers, TnT), is applied to Swedish and evaluated when the tagger is trained on various sizes of training data. The tagger’s accuracy is compared to other data-driven taggers for Swedish. The results show that the tagging performance of HunPoS is as accurate as TnT and can be used efficiently to tag running text.

  • 45.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The Open Source Tagger HunPoS for Swedish. 2008. Report (Other academic)
    Abstract [en]

    HunPoS, a freely available open source part-of-speech tagger—a reimplementation of one of the best performing taggers, TnT—is applied to Swedish and evaluated when the tagger is trained on various sizes of training data. The tagger’s accuracy is compared to other data-driven taggers for Swedish. The results show that the tagging performance of HunPoS is as accurate as TnT and can be used efficiently to tag running text.

  • 46. Megyesi, Beata
    et al.
    Carlson, Rolf
    Data-Driven Methods for Building a Swedish Treebank. 2002. In: Swedish Treebank Symposium, 2002. Conference paper (Other academic)
  • 47.
    Megyesi, Beata
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Csató, Éva Ágnes
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Gustafson-Capková, Sofia
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Pettersson, Eva
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Sågvall Hein, Anna
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Supporting Research Environment for Swedish and Turkish. 2008. Report (Other (popular science, discussion, etc.))
  • 48.
    Megyesi, Beata
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Converting SUC2.0 to XCES with stand-off annotation. 2007. Report (Other (popular science, discussion, etc.))
  • 49.
    Megyesi, Beata
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    The Swedish-Turkish Parallel Corpus and Tools for its Creation. 2007. In: Proceedings of NoDaLida 2007, 2007. Conference paper (Refereed)
  • 50.
    Megyesi, Beata
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Dahlqvist, Bengt
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Pettersson, Eva
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Gustafson-Capkova, Sofia
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Supporting Research Environment for Less Explored Languages: A Case Study of Swedish and Turkish. 2008. In: Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein / [ed] Nivre, Joakim, Dahllöf, Mats, Megyesi, Beáta, Uppsala: Uppsala universitet, 2008, p. 96-110. Chapter in book (Other academic)