Digitala Vetenskapliga Arkivet

  • 1. Agić, Zeljko
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Merkler, Danijela
    Krek, Simon
    Dobrovoljc, Kaja
    Moze, Sara
    Cross-lingual Dependency Parsing of Related Languages with Rich Morphosyntactic Tagsets. 2014. In: Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants, 2014, p. 13-24. Conference paper (Refereed)
  • 2. Ahrenberg, Lars
    et al.
    Merkel, Magnus
    Sågvall Hein, Anna
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Evaluation of LWA and UWA. 1999. Report (Other academic)
  • 3.
    Callin, Jimmy
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hardmeier, Christian
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Part-of-Speech Driven Cross-Lingual Pronoun Prediction with Feed-Forward Neural Networks. 2015. In: Proceedings of the Second Workshop on Discourse in Machine Translation (DiscoMT), Stroudsburg, PA: Association for Computational Linguistics, 2015, p. 59-64. Conference paper (Refereed)
    Abstract [en]

    For some language pairs, pronoun translation is a discourse-driven task which requires information that lies beyond its local context. This motivates the task of predicting the correct pronoun given a source sentence and a target translation, where the translated pronouns have been replaced with placeholders. For cross-lingual pronoun prediction, we suggest a neural network-based model using preceding nouns and determiners as features for suggesting antecedent candidates. Our model scores on par with similar models while having a simpler architecture.

  • 4.
    Guillou, Liane
    et al.
    University of Edinburgh.
    Hardmeier, Christian
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Smith, Aaron
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Webber, Bonnie
    University of Edinburgh.
    ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT. 2014. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Paris: European Language Resources Association, 2014, p. 3191-3198. Conference paper (Refereed)
    Abstract [en]

    We present ParCor, a parallel corpus of texts in which pronoun coreference – reduced coreference in which pronouns are used as referring expressions – has been annotated. The corpus is intended to be used both as a resource from which to learn systematic differences in pronoun use between languages and ultimately for developing and testing informed Statistical Machine Translation systems aimed at addressing the problem of pronoun coreference in translation. At present, the corpus consists of a collection of parallel English-German documents from two different text genres: TED Talks (transcribed planned speech), and EU Bookshop publications (written text). All documents in the corpus have been manually annotated with respect to the type and location of each pronoun and, where relevant, its antecedent. We provide details of the texts that we selected, the guidelines and tools used to support annotation and some corpus statistics. The texts in the corpus have already been translated into many languages, and we plan to expand the corpus into these other languages, as well as other genres, in the future.

  • 5.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nakov, Preslav
    Qatar Computing Research Institute.
    Stymne, Sara
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Versley, Yannick
    University of Heidelberg.
    Cettolo, Mauro
    Fondazione Bruno Kessler.
    Pronoun-Focused MT and Cross-Lingual Pronoun Prediction: Findings of the 2015 DiscoMT Shared Task on Pronoun Translation. 2015. In: Proceedings of the Second Workshop on Discourse in Machine Translation (DiscoMT), Stroudsburg, PA: Association for Computational Linguistics, 2015, p. 1-16. Conference paper (Other academic)
    Abstract [en]

    We describe the design, the evaluation setup, and the results of the DiscoMT 2015 shared task, which included two subtasks, relevant to both the machine translation (MT) and the discourse communities: (i) pronoun-focused translation, a practical MT task, and (ii) cross-lingual pronoun prediction, a classification task that requires no specific MT expertise and is interesting as a machine learning task in its own right. We focused on the English–French language pair, for which MT output is generally of high quality, but has visible issues with pronoun translation due to differences in the pronoun systems of the two languages. Six groups participated in the pronoun-focused translation task and eight groups in the cross-lingual pronoun prediction task.

  • 6.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Document-Wide Decoding for Phrase-Based Statistical Machine Translation. 2012. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Association for Computational Linguistics, 2012, p. 1179-1190. Conference paper (Refereed)
    Abstract [en]

    Independence between sentences is an assumption deeply entrenched in the models and algorithms used for statistical machine translation (SMT), particularly in the popular dynamic programming beam search decoding algorithm. This restriction is an obstacle to research on more sophisticated discourse-level models for SMT. We propose a stochastic local search decoding method for phrase-based SMT, which permits free document-wide dependencies in the models. We explore the stability and the search parameters of this method and demonstrate that it can be successfully used to optimise a document-level semantic language model.

  • 7.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tree Kernels for Machine Translation Quality Estimation. 2012. In: Proceedings of the 7th Workshop on Statistical Machine Translation, Association for Computational Linguistics, 2012, p. 109-113. Conference paper (Refereed)
    Abstract [en]

    This paper describes Uppsala University’s submissions to the Quality Estimation (QE) shared task at WMT 2012. We present a QE system based on Support Vector Machine regression, using a number of explicitly defined features extracted from the Machine Translation input, output and models in combination with tree kernels over constituency and dependency parse trees for the input and output sentences. We confirm earlier results suggesting that tree kernels can be a useful tool for QE system construction especially in the early stages of system design.

  • 8.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Stymne, Sara
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation. 2013. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, 2013, p. 193-198. Conference paper (Refereed)
    Abstract [en]

    We describe Docent, an open-source decoder for statistical machine translation that breaks with the usual sentence-by-sentence paradigm and translates complete documents as units. By taking translation to the document level, our decoder can handle feature models with arbitrary discourse-wide dependencies and constitutes an essential infrastructure component in the quest for discourse-aware SMT models.

  • 9.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Stymne, Sara
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Smith, Aaron
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Anaphora Models and Reordering for Phrase-Based SMT. 2014. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, Association for Computational Linguistics, 2014, p. 122-129. Conference paper (Refereed)
    Abstract [en]

    We describe the Uppsala University systems for WMT14. We look at the integration of a model for translating pronominal anaphora and a syntactic dependency projection model for English–French. Furthermore, we investigate post-ordering and tunable POS distortion models for English–German.

  • 10.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction. 2013. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2013, p. 380-391. Conference paper (Refereed)
    Abstract [en]

    This paper addresses the task of predicting the correct French translations of third-person subject pronouns in English discourse, a problem that is relevant as a prerequisite for machine translation and that requires anaphora resolution. We present an approach based on neural networks that models anaphoric links as latent variables and show that its performance is competitive with that of a system with separate anaphora resolution while not requiring any coreference-annotated training data. This demonstrates that the information contained in parallel bitexts can successfully be used to acquire knowledge about pronominal anaphora in an unsupervised way.

  • 11.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Translating Pronouns with Latent Anaphora Resolution. 2014. Conference paper (Other academic)
    Abstract [en]

    We discuss the translation of anaphoric pronouns in statistical machine translation from English into French. Pronoun translation requires resolving the antecedents of the pronouns in the input, a classic discourse processing problem that is usually approached through supervised learning from manually annotated data. We cast cross-lingual pronoun prediction as a classification task and present a neural network architecture that incorporates the links between anaphors and potential antecedents as latent variables, allowing us to train the classifier on parallel text without explicit supervision for the anaphora resolver. We demonstrate that our approach works just as well for classification as using an external coreference resolver whereas its impact in a practical translation experiment is more limited.

  • 12.
    Hardmeier, Christian
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Saers, Markus
    Hong Kong University of Science and Technology.
    Federico, Marcello
    Fondazione Bruno Kessler, Trento.
    Mathur, Prashant
    Fondazione Bruno Kessler, Trento.
    The Uppsala-FBK systems at WMT 2011. 2011. In: Proceedings of the Sixth (6th) Workshop on Statistical Machine Translation, Association for Computational Linguistics, 2011, p. 372-378. Conference paper (Refereed)
    Abstract [en]

    This paper presents our submissions to the shared translation task at WMT 2011. We created two largely independent systems for English-to-French and Haitian Creole-to-English translation to evaluate different features and components from our ongoing research on these language pairs. Key features of our systems include anaphora resolution, hierarchical lexical reordering, data selection for language modelling, linear transduction grammars for word alignment and syntax-based decoding with monolingual dependency information.

  • 13. van der Plas, Lonneke
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Manguin, Jean-Luc
    Automatic acquisition of synonyms for French using parallel corpora. 2010. In: Proceedings of the 4th International Workshop on Distributed Agent-Based Retrieval Tools, 2010. Conference paper (Refereed)
    Abstract [en]

    In this paper we describe an approach to acquire synonyms for French automatically that is easy to port across domains and across languages. The approach relies on automatic word alignments in parallel texts and uses distributional methods to compute the semantic similarity of words based on these word alignments. As a result the system outputs ranked lists of candidate synonyms for a given word. We compare the performance of the system with a system that uses syntactic contexts to acquire synonyms automatically. Evaluations are done on a large-scale French synonym dictionary. We show that the alignment-based method outperforms the syntactic method by a large margin. In addition we show that the method can easily be ported to a different language and to a different domain.

  • 14. Martinez Garcia, Eva
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    España-Bonet, Cristina
    Màrquez, Lluís
    Word’s Vector Representations meet Machine Translation. 2014. In: Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014, p. 132-134. Conference paper (Refereed)
  • 15. Nakov, Preslav
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages. 2012. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2012, p. 301-305. Conference paper (Refereed)
  • 16.
    Pettersson, Eva
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    An SMT Approach to Automatic Annotation of Historical Texts. 2013. In: Workshop on Computational Historical Linguistics, Nodalida 2013, 2013. Conference paper (Refereed)
    Abstract [en]

    In this paper we propose an approach to tagging and parsing of historical text, using character-based SMT methods for translating the historical spelling to a modern spelling before applying the NLP tools. This way, existing modern taggers and parsers may be used to analyse historical text instead of training new tools specialised in historical language, which might be hard considering the lack of linguistically annotated historical corpora. We show that our approach to spelling normalisation is successful even with small amounts of training data, and that it is generalisable to several languages. For the two languages presented in this paper, the proportion of tokens with a spelling identical to the modern gold standard spelling increases from 64.8% to 83.9%, and from 64.6% to 92.3% respectively, which has a positive impact on subsequent tagging and parsing using modern tools.

  • 17. Shao, Yan
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Machine Transliteration of Names from Different Language Origins into Chinese. 2014. Other (Other academic)
  • 18.
    Shao, Yan
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Boosting English-Chinese Machine Transliteration via High Quality Alignment and Multilingual Resources. 2015. In: Proceedings of the Fifth Named Entity Workshop, Association for Computational Linguistics, 2015, p. 56-60. Conference paper (Refereed)
    Abstract [en]

    This paper presents our machine transliteration systems developed for the NEWS 2015 machine transliteration shared task. Our systems are applied to two tasks: English to Chinese and Chinese to English. For standard runs, in which only official data sets are used, we build phrase-based transliteration models with refined alignments provided by the M2M-aligner. For non-standard runs, we add multilingual resources to the systems designed for the standard runs and build different language specific transliteration systems. Linear regression is adopted to rerank the outputs afterwards, which significantly improves the overall transliteration performance.

  • 19. Skadinš, Raivis
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Rozis, Roberts
    Deksne, Daiga
    Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus. 2014. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC-2014), 2014, p. 1850-1855. Conference paper (Refereed)
  • 20. Smith, Aaron
    et al.
    Hardmeier, Christian
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    BLEU Is Not the Colour: How Optimising BLEU Reduces Translation Quality. 2014. Other (Other academic)
  • 21.
    Stymne, Sara
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hardmeier, Christian
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Feature Weight Optimization for Discourse-Level SMT. 2013. In: Proceedings of the Workshop on Discourse in Machine Translation (DiscoMT), Association for Computational Linguistics, 2013, p. 60-69. Conference paper (Refereed)
    Abstract [en]

    We present an approach to feature weight optimization for document-level decoding. This is an essential task for enabling future development of discourse-level statistical machine translation, as it allows easy integration of discourse features in the decoding process. We extend the framework of sentence-level feature weight optimization to the document-level. We show experimentally that we can get competitive and relatively stable results when using a standard set of features, and that this framework also allows us to optimize document-level features, which can be used to model discourse phenomena.

  • 22.
    Stymne, Sara
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hardmeier, Christian
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tunable Distortion Limits and Corpus Cleaning for SMT. 2013. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, Association for Computational Linguistics, 2013, p. 225-231. Conference paper (Refereed)
    Abstract [en]

    We describe the Uppsala University system for WMT13, for English-to-German translation. We use the Docent decoder, a local search decoder that translates at the document level. We add tunable distortion limits, that is, soft constraints on the maximum distortion allowed, to Docent. We also investigate cleaning of the noisy Common Crawl corpus. We show that we can use alignment-based filtering for cleaning with good results. Finally we investigate effects of corpus selection for recasing.

  • 23.
    Stymne, Sara
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hardmeier, Christian
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Statistical Machine Translation with Readability Constraints. 2013. In: Proceedings of the 19th Nordic Conference on Computational Linguistics (NODALIDA 2013), Linköping, Sweden: Linköping University Electronic Press, 2013, p. 375-386. Conference paper (Refereed)
    Abstract [en]

    This paper presents experiments with document-level machine translation with readability constraints. We describe the task of producing simplified translations from a given source with the aim to optimize machine translation for specific target users such as language learners. In our approach, we introduce global features that are known to affect readability into a document-level SMT decoding framework. We show that the decoder is capable of incorporating those features and that we can influence the readability of the output as measured by common metrics. This study presents the first attempt of jointly performing machine translation and text simplification, which is demonstrated through the case of translating parliamentary texts from English to Swedish.

  • 24.
    Tan, Liling
    et al.
    Univ Saarland, D-66123 Saarbrucken, Germany.
    Zampieri, Marcos
    Univ Saarland, D-66123 Saarbrucken, Germany.
    Ljubesic, Nikola
    Univ Zagreb, Zagreb 41000, Croatia.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection. 2014. In: LREC 2014 - Ninth International Conference On Language Resources And Evaluation, 2014. Conference paper (Refereed)
    Abstract [en]

    This paper presents the compilation of the DSL corpus collection created for the DSL (Discriminating Similar Languages) shared task to be held at the VarDial workshop at COLING 2014. The DSL corpus collection was merged from three comparable corpora to provide a suitable dataset for automatic classification to discriminate similar languages and language varieties. Along with the description of the DSL corpus collection we also present results of baseline discrimination experiments reporting performance of up to 87.4% accuracy.

  • 25.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Automatic Construction of Weighted String Similarity Measures. 1999. In: Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99), Association for Computational Linguistics, 75 Paterson Street, Suite 9, New Brunswick, NJ 08901 USA, 1999, p. 213-219. Conference paper (Refereed)
    Abstract [en]

    String similarity metrics are used for several purposes in text-processing. One task is the extraction of cognates from bilingual text. In this paper three approaches to the automatic generation of language dependent string matching functions are present

  • 26.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Bitext Alignment. 2011. Book (Refereed)
    Abstract [en]

    This book provides an overview of various techniques for the alignment of bitexts. It describes general concepts and strategies that can be applied to map corresponding parts in parallel documents on various levels of granularity. Bitexts are valuable linguistic resources for many different research fields and practical applications. The most predominant application is machine translation, in particular, statistical machine translation. However, there are various other threads that can be followed which may be supported by the rich linguistic knowledge implicitly stored in parallel resources. Bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies to name just a few. The book covers the essential tasks that have to be carried out when building parallel corpora starting from the collection of translated documents up to sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques.

  • 27.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Book review of Markus Dickinson, Chris Brew and Detmar Meurers: Language and Computers2013In: Machine Translation, ISSN 0922-6567, E-ISSN 1573-0573, Vol. 27, no 3, p. 309-312Article in journal (Refereed)
  • 28.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Can bilingual word alignment improve monolingual phrasal term extraction?2001In: Terminology, ISSN 0929-9971, E-ISSN 1569-9994, Vol. 7, no 2, p. 199-215Article in journal (Refereed)
  • 29. Tiedemann, Jörg
    Character-based PSMT for Closely Related Languages2009In: Proceedings of 13th Annual Conference of the European Association for Machine Translation (EAMT’09), 2009, p. 12-19Conference paper (Refereed)
    Abstract [en]

    Translating unknown words between related languages using a character-based statistical machine translation model can be beneficial. In this paper, we describe a simple method to combine character-based models with standard word-based models to increase the coverage of a phrase-based SMT system. Using this approach, we can show a modest improvement when translating between Norwegian and Swedish. The potential of applying character-based models to closely related languages is also illustrated by applying the character model on its own. The performance of such an approach is similar to the word-level baseline and closer to the reference in terms of string similarity.

  • 30.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Combining Clues for Word Alignment2003In: Proceedings of the 10th Conference of the European Chapter of the ACL (EACL03), ACL , 2003Conference paper (Refereed)
  • 31.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Extraction of Translation Equivalents from Parallel Corpora.1998In: Proceedings of the 11th Nordic Conference on Computational Linguistics, Center for Sprogteknologi and Department of General and Applied Linguistics (IAAS), University of Copenhagen, Njalsgade 80, DK-2300 Copenhagen S, Denmark , 1998, p. 120-128Conference paper (Refereed)
  • 32.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Improved Sentence Alignment for Movie Subtitles2007In: Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP’07), 2007Conference paper (Refereed)
  • 33.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Improved Text Extraction from PDF Documents for Large-Scale Natural Language Processing2014In: Computational Linguistics and Intelligent Text Processing, CICLing 2014, Part I, 2014, p. 102-112Conference paper (Refereed)
    Abstract [en]

    The lack of reliable text extraction from arbitrary documents is often an obstacle for large-scale NLP based on resources crawled from the Web. One of the largest problems in the conversion of PDF documents is the detection of the boundaries of common textual units such as paragraphs, sentences and words. PDF is a file format optimized for printing and encapsulates a complete description of the layout of a document including text, fonts, graphics and so on. This paper describes a tool for extracting texts from arbitrary PDF files for the support of large-scale data-driven natural language processing. Our approach combines the benefits of several existing solutions for the conversion of PDF documents to plain text and adds a language-independent post-processing procedure that cleans the output for further linguistic processing. In particular, we use the PDF-rendering libraries pdfXtk, Apache Tika and Poppler in various configurations. From the output of these tools we recover proper boundaries using on-the-fly language models and language-independent extraction heuristics. In our research, we looked especially at publications from the European Union, which constitute a valuable multilingual resource, for example, for training statistical machine translation models. We use our tool for the conversion of a large multilingual database crawled from the EU bookshop with the aim of building parallel corpora. Our experiments show that our conversion software is capable of fixing various common issues, leading to cleaner data sets in the end.

  • 34.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Linguistic databases1999In: Computational linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 25, no 1, p. 167-169Article, book review (Refereed)
  • 35.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    MatsLex - a Multilingual Lexical Database for Machine Translation2002In: Proceedings of the Third International Conference on Linguistic Resources and Evaluation (LREC 2002), Imprenta Papeleria San Rafael, S.L. - Gran Canaria, Spain , 2002, Vol. VI, p. 1909-1912Conference paper (Refereed)
  • 36.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Parallel Corpora in Linköping, Uppsala and Göteborg (PLUG): The Corpus1999Report (Other academic)
  • 37.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Parallel Data, Tools and Interfaces in OPUS2012In: Workshop abstracts: eighth international conference on language resources and evaluation, 2012, p. 2214-2218Conference paper (Refereed)
    Abstract [en]

    This paper presents the current status of OPUS, a growing language resource of parallel corpora and related tools. The focus in OPUS is to provide freely available data sets in various formats together with basic annotation to be useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies. In this paper, we report on new data sets and their features, additional annotation tools and models provided from the website and essential interfaces and on-line services included in the project.

  • 38.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Predicting Translations in Context2001In: Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP), INCOMA Ltd., Shoumen, Bulgaria , 2001, p. 240-244Conference paper (Refereed)
  • 39.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Prospects and Trends in Data-Driven Machine Translation2008In: Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein / [ed] Joakim Nivre and Mats Dahllöf and Beata Megyesi, Uppsala: Uppsala University , 2008Chapter in book (Refereed)
  • 40.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Rediscovering Annotation Projection for Cross-Lingual Parser Induction2014In: Proceedings of COLING 2014, 2014Conference paper (Refereed)
  • 41.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Synchronizing Translated Movie Subtitles2008In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’2008), 2008Conference paper (Refereed)
  • 42.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    The Use of Parallel Corpora in Monolingual Lexicography - How Word Alignment can identify Morphological and Semantic Relations2001In: Proceedings of the 6th Conference on Computational Lexicography and Corpus Research (COMPLEX), Centre of Corpus Linguistics, Department of English, University of Birmingham, UK , 2001, p. 143-152Conference paper (Refereed)
  • 43.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Uplug - A Modular Corpus Tool for Parallel Corpora2002In: Parallel corpora, parallel worlds: Selected papers from the symposium on parallel corpora at Uppsala University, Sweden / [ed] Lars Borin, Amsterdam - New York: Rodopi , 2002, p. 181-197Chapter in book (Refereed)
  • 44.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Uplug - A Modular Corpus Tool for Parallel Corpora1999Report (Other academic)
  • 45.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Word Alignment - Step by Step.1999In: Proceedings of the 12th Nordic Conference on Computational Linguistics, University of Trondheim/Norway, Department of Linguistics, Norwegian University of Science and Technology, N-7491 Trondheim, Norway , 1999, p. 216-227Conference paper (Refereed)
  • 46.
    Tiedemann, Jörg
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hansen, Dorte Haltrup
    Offersgaard, Lene
    Olsen, Sussi
    Zumpe, Matthias
    A Distributed Resource Repository for Cloud-Based Machine Translation2012In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 2012, p. 2207-2213Conference paper (Refereed)
    Abstract [en]

    In this paper, we present the architecture of a distributed resource repository developed for collecting training data for building customized statistical machine translation systems. The repository is designed for the cloud-based translation service integrated in the Let'sMT! platform which is about to be launched to the public. The system includes important features such as automatic import and alignment of textual documents in a variety of formats, a flexible database for meta-information using modern key-value stores and a grid-based backend for running off-line processes. The entire system is very modular and supports highly distributed setups to enable a maximum of flexibility and scalability. The system uses secure connections and includes an effective permission management to ensure data integrity. In this paper, we also take a closer look at the task of sentence alignment. The process of alignment is extremely important for the success of translation models trained on the platform. Alignment decisions significantly influence the quality of SMT engines.

  • 47.
    Tiedemann, Jörg
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Mur, Jori
    Simple is Best: Experiments with Different Document Segmentation Strategies for Passage Retrieval2008In: Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering (IR4QA), 2008, p. 17-25Conference paper (Refereed)
  • 48.
    Tiedemann, Jörg
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nakov, Preslav
    Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets2013In: Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP’13), 2013, p. 676-684Conference paper (Refereed)
  • 49.
    Tiedemann, Jörg
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    van der Plas, Lonneke
    Villada Moirón, Begoña
    Bitexts as Semantic Mirrors2013Other (Other academic)
  • 50. Viereckel, Nina
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Identification of Idiomatic Expressions Using Parallel Subtitle Corpora2014Other (Other academic)