Digitala Vetenskapliga Arkivet

Results 51-100 of 2745
  • 51.
    Al Dakkak, O.
    et al.
    Higher Institute of Applied Science and Technology (HIAST).
    Ghneim, N.
    Higher Institute of Applied Science and Technology (HIAST).
    Abou Zliekha, M.
    Damascus University/Faculty of Information Technology.
    Al Moubayed, Samer
    Damascus University/Faculty of Information Technology.
    Emotional Inclusion in An Arabic Text-To-Speech (2005). In: Proceedings of the 13th European Signal Processing Conference (EUSIPCO), Antalya, Turkey, 2005. Conference paper (Refereed)
    Abstract [en]

    The goal of this paper is to present an emotional audio-visual text-to-speech system for the Arabic language. The system is based on two entities: an emotional audio text-to-speech system, which generates speech depending on the input text and the desired emotion type, and an emotional visual model, which generates the talking head by forming the corresponding visemes. The phoneme-to-viseme mapping and the emotion shaping use a 3-parametric face model based on the Abstract Muscle Model. We have thirteen viseme models and five emotions as parameters to the face model. The TTS produces the phonemes corresponding to the input text and the speech with the suitable prosody to include the prescribed emotion. In parallel, the system generates the visemes and sends the controls to the facial model to get the animation of the talking head in real time.
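    As an illustration of the phoneme-to-viseme mapping step described in this abstract, the sketch below shows a minimal table lookup feeding (viseme, emotion) control frames to a face model; the phoneme symbols, viseme labels and emotion names are hypothetical placeholders, not the paper's actual thirteen-viseme inventory.

        # Minimal sketch of a phoneme-to-viseme lookup; all symbols are invented placeholders.
        PHONEME_TO_VISEME = {
            "b": "V_BILABIAL", "p": "V_BILABIAL", "m": "V_BILABIAL",
            "f": "V_LABIODENTAL", "a": "V_OPEN", "i": "V_SPREAD", "u": "V_ROUNDED",
        }
        EMOTIONS = {"neutral", "happy", "sad", "angry", "surprised"}  # five emotion parameters

        def visemes_for(phonemes, emotion="neutral"):
            """Map a phoneme sequence to (viseme, emotion) control frames for the face model."""
            if emotion not in EMOTIONS:
                raise ValueError(f"unknown emotion: {emotion}")
            return [(PHONEME_TO_VISEME.get(ph, "V_NEUTRAL"), emotion) for ph in phonemes]

        print(visemes_for(["m", "a", "b", "i"], emotion="happy"))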

  • 52.
    Al Dakkak, O.
    et al.
    HIAST, Damascus, Syria.
    Ghneim, N.
    HIAST, Damascus, Syria.
    Abou Zliekha, M.
    Damascus University.
    Al Moubayed, Samer
    Damascus University.
    Prosodic Feature Introduction and Emotion Incorporation in an Arabic TTS (2006). In: Proceedings of IEEE International Conference on Information and Communication Technologies, Damascus, Syria, 2006, p. 1317-1322. Conference paper (Refereed)
    Abstract [en]

    Text-to-speech is a crucial part of many man-machine communication applications, such as phone booking and banking, vocal e-mail, and many others, in addition to applications for impaired persons, such as reading machines for the blind and talking machines for persons with speech difficulties. However, the main drawback of most speech synthesizers in such talking machines is their metallic sound. In order to sound natural, we have to incorporate prosodic features as close as possible to natural prosody; this helps to improve the quality of the synthetic speech. Current research worldwide is directed towards better "automatic prosody generation".

  • 53.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Prosodic Disambiguation in Spoken Systems Output (2009). In: Proceedings of Diaholmia'09: 2009 Workshop on the Semantics and Pragmatics of Dialogue / [ed] Jens Edlund, Joakim Gustafson, Anna Hjalmarsson, Gabriel Skantze, Stockholm, Sweden, 2009, p. 131-132. Conference paper (Refereed)
    Abstract [en]

    This paper presents work on using prosody in the output of spoken dialogue systems to resolve possible structural ambiguity of output utterances. An algorithm is proposed to discover ambiguous parses of an utterance and to add prosodic disambiguation events to deliver the intended structure. A pilot experiment shows that the automatic prosodic grouping applied to ambiguous sentences is able to deliver the intended interpretation of the sentences.

  • 54.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Towards rich multimodal behavior in spoken dialogues with embodied agents (2013). In: 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013 - Proceedings, IEEE Computer Society, 2013, p. 817-822. Conference paper (Refereed)
    Abstract [en]

    Spoken dialogue frameworks have traditionally been designed to handle a single stream of data - the speech signal. Research on human-human communication has provided substantial evidence quantifying the effects and the importance of a multitude of other multimodal nonverbal signals that people use in their communication and that shape and regulate their interaction. Driven by findings from multimodal human spoken interaction, and by the advancements of capture devices and robotics and animation technologies, new possibilities are arising for the development of multimodal human-machine interaction that is more affective, social, and engaging. In such face-to-face interaction scenarios, dialogue systems can have a large set of signals at their disposal to infer context and to enhance and regulate the interaction through the generation of verbal and nonverbal facial signals. This paper summarizes several design decisions and experiments that we have followed in attempts to build rich and fluent multimodal interactive systems using a newly developed hybrid robotic head called Furhat, and discusses issues and challenges that this effort is facing.

  • 55.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A robotic head using projected animated faces (2011). In: Proceedings of the International Conference on Audio-Visual Speech Processing 2011 / [ed] Salvi, G.; Beskow, J.; Engwall, O.; Al Moubayed, S., Stockholm: KTH Royal Institute of Technology, 2011, p. 71. Conference paper (Refereed)
    Abstract [en]

    This paper presents a setup which employs virtual animated agents for robotic heads. The system uses a laser projector to project animated faces onto a three dimensional face mask. This approach of projecting animated faces onto a three dimensional head surface, as an alternative to using flat, two dimensional surfaces, eliminates several deteriorating effects and illusions that come with flat surfaces for interaction purposes, such as exclusive mutual gaze and situated and multi-partner dialogues. In addition, it provides robotic heads with a flexible solution for facial animation which takes advantage of the advancements of facial animation using computer graphics over mechanically controlled heads.

  • 56.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ananthakrishnan, Gopal
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Acoustic-to-Articulatory Inversion based on Local Regression (2010). In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Japan, 2010, p. 937-940. Conference paper (Refereed)
    Abstract [en]

    This paper presents an Acoustic-to-Articulatory inversion method based on local regression. Two types of local regression, a non-parametric and a local linear regression, have been applied on a corpus containing simultaneous recordings of positions of articulators and the corresponding acoustics. A maximum likelihood trajectory smoothing using the estimated dynamics of the articulators is also applied on the regression estimates. The average root mean square error in estimating articulatory positions, given the acoustics, is 1.56 mm for the non-parametric regression and 1.52 mm for the local linear regression. The local linear regression is found to perform significantly better than regression using Gaussian Mixture Models using the same acoustic and articulatory features.
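    A minimal sketch of the local linear regression idea described above (a kernel-weighted least-squares fit around each acoustic query frame, evaluated with RMSE); the Gaussian kernel, bandwidth and array shapes are illustrative assumptions, and the maximum-likelihood trajectory smoothing step is omitted.

        import numpy as np

        def local_linear_predict(X_acoustic, Y_articulatory, x_query, bandwidth=1.0):
            """Kernel-weighted linear fit around x_query, then predict articulator positions."""
            d2 = np.sum((X_acoustic - x_query) ** 2, axis=1)             # squared distances to the query
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))                     # Gaussian kernel weights
            A = np.hstack([np.ones((len(X_acoustic), 1)), X_acoustic])   # design matrix with intercept
            W = np.diag(w)
            beta, *_ = np.linalg.lstsq(A.T @ W @ A, A.T @ W @ Y_articulatory, rcond=None)
            return np.concatenate([[1.0], x_query]) @ beta

        def rmse_mm(pred, truth):
            """Root mean square error, e.g. in millimetres for articulator coordinates."""
            return float(np.sqrt(np.mean((pred - truth) ** 2)))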

  • 57.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Ananthakrishnan, Gopal
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Enflo, Laura
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Automatic Prominence Classification in Swedish (2010). In: Proceedings of Speech Prosody 2010, Workshop on Prosodic Prominence, Chicago, USA, 2010. Conference paper (Refereed)
    Abstract [en]

    This study aims at automatically classifying levels of acoustic prominence on a dataset of 200 Swedish sentences of read speech by one male native speaker. Each word in the sentences was categorized by four speech experts into one of three groups depending on the level of prominence perceived. Six acoustic features at the syllable level and seven features at the word level were used. Two machine learning algorithms, namely Support Vector Machines (SVM) and Memory-Based Learning (MBL), were trained to classify the sentences into their respective classes. The MBL gave an average word-level accuracy of 69.08% and the SVM gave an average accuracy of 65.17% on the test set. These values were comparable with the average accuracy of the human annotators with respect to the average annotations. In this study, word duration was found to be the most important feature for classifying prominence in Swedish read speech.
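    The SVM side of such a prominence classifier can be sketched with scikit-learn as below; the random feature matrix stands in for the duration, pitch and energy measurements, the labels are random placeholders, and the memory-based learner (MBL) is not shown.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 7))          # placeholder word-level acoustic features
        y = rng.integers(0, 3, size=1000)       # placeholder 3-level prominence labels

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        clf = SVC(kernel="rbf").fit(X_tr, y_tr)
        print("word-level accuracy:", accuracy_score(y_te, clf.predict(X_te)))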

  • 58.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A novel Skype interface using SynFace for virtual speech reading support (2011). In: Proceedings from Fonetik 2011, June 8 - June 10, 2011: Speech, Music and Hearing, Quarterly Progress and Status Report, TMH-QPSR, Volume 51, Stockholm, Sweden, 2011, p. 33-36. Conference paper (Other academic)
    Abstract [en]

    We describe in this paper a support client interface to the IP telephony application Skype. The system uses a variant of SynFace, a real-time speech reading support system using facial animation. The new interface is designed for use by elderly persons, and tailored for use in systems supporting touch screens. The SynFace real-time facial animation system has previously shown the ability to enhance speech comprehension for hearing-impaired persons. In this study we employ at-home field studies on five subjects in the EU project MonAMI. We present insights from interviews with the test subjects on the advantages of the system, and on the limitations of such a real-time speech reading technology in reaching the homes of the elderly and the hard of hearing.

  • 59.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Effects of Visual Prominence Cues on Speech Intelligibility (2009). In: Proceedings of Auditory-Visual Speech Processing AVSP'09, Norwich, England, 2009. Conference paper (Refereed)
    Abstract [en]

    This study reports experimental results on the effect of visual prominence, presented as gestures, on speech intelligibility. 30 acoustically vocoded sentences, permutated into different gestural conditions, were presented audio-visually to 12 subjects. The analysis of correct word recognition shows a significant increase in intelligibility when focally accented (prominent) words are supplemented with head-nods or with eyebrow-raise gestures. The paper also examines coupling other acoustic phenomena to brow-raise gestures. As a result, the paper introduces new evidence on the ability of non-verbal movements in the visual modality to support audio-visual speech perception.

  • 60.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Perception of Nonverbal Gestures of Prominence in Visual Speech Animation (2010). In: Proceedings of the ACM/SSPNET 2nd International Symposium on Facial Analysis and Animation, Edinburgh, UK, 2010, p. 25. Conference paper (Refereed)
    Abstract [en]

    It has long been recognized that visual speech information is important for speech perception [McGurk and MacDonald 1976] [Summerfield 1992]. Recently there has been an increasing interest in the verbal and non-verbal interaction between the visual and the acoustic modalities from production and perception perspectives. One of the prosodic phenomena which attracts much focus is prominence. Prominence is defined as when a linguistic segment is made salient in its context.

  • 61.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Prominence Detection in Swedish Using Syllable Correlates (2010). In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Japan, 2010, p. 1784-1787. Conference paper (Refereed)
    Abstract [en]

    This paper presents an approach to estimating word-level prominence in Swedish using syllable-level features. The paper discusses the mismatch problem of annotations between word-level perceptual prominence and its acoustic correlates, context, and data scarcity. 200 sentences are annotated by 4 speech experts with prominence on 3 levels. A linear model for feature extraction is proposed on syllable-level features, and the weights of these features are optimized to match the word-level annotations. We show that using syllable-level features and estimating weights for the acoustic correlates to minimize the word-level estimation error gives better detection accuracy compared to word-level features, and that both feature sets exceed the baseline accuracy.
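    The weight-optimization idea, a linear combination of syllable-derived features fitted against word-level prominence annotations, can be sketched as an ordinary least-squares fit; the feature matrix and labels below are random stand-ins, and the paper's actual aggregation and optimization may differ.

        import numpy as np

        rng = np.random.default_rng(1)
        F = rng.normal(size=(200, 6))                         # per-word aggregates of syllable features
        labels = rng.integers(0, 3, size=200).astype(float)   # word-level prominence on 3 levels

        w, *_ = np.linalg.lstsq(F, labels, rcond=None)        # weights minimizing word-level error
        pred = np.clip(np.rint(F @ w), 0, 2)                  # round to the nearest prominence level
        print("fitted weights:", w)
        print("word-level agreement:", float(np.mean(pred == labels)))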

  • 62.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Mirning, N.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Talking with Furhat - multi-party interaction with a back-projected robot head (2012). In: Proceedings of Fonetik 2012, Gothenburg, Sweden, 2012, p. 109-112. Conference paper (Other academic)
    Abstract [en]

    This is a condensed presentation of some recent work on a back-projected robotic head for multi-party interaction in public settings. We will describe some of the design strategies and give some preliminary analysis of an interaction database collected at the Robotville exhibition at the London Science Museum.

  • 63.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Animated Faces for Robotic Heads: Gaze and Beyond (2011). In: Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issues / [ed] Anna Esposito, Alessandro Vinciarelli, Klára Vicsi, Catherine Pelachaud and Anton Nijholt, Springer Berlin/Heidelberg, 2011, p. 19-35. Conference paper (Refereed)
    Abstract [en]

    We introduce an approach to using animated faces for robotics where a static physical object is used as a projection surface for an animation. The talking head is projected onto a 3D physical head model. In this chapter we discuss the different benefits this approach adds over mechanical heads. After that, we investigate a phenomenon commonly referred to as the Mona Lisa gaze effect. This effect results from the use of 2D surfaces to display 3D images and causes the gaze of a portrait to seemingly follow the observer no matter where it is viewed from. The experiment investigates the perception of gaze direction by observers. The analysis shows that the 3D model eliminates the effect, and provides an accurate perception of gaze direction. We discuss at the end the different requirements of gaze in interactive systems, and explore the different settings these findings give access to.

  • 64.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Auditory visual prominence: From intelligibility to behavior (2009). In: Journal on Multimodal User Interfaces, ISSN 1783-7677, E-ISSN 1783-8738, Vol. 3, no 4, p. 299-309. Article in journal (Refereed)
    Abstract [en]

    Auditory prominence is defined as when an acoustic segment is made salient in its context. Prominence is one of the prosodic functions that has been shown to be strongly correlated with facial movements. In this work, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted: speech quality is acoustically degraded, the fundamental frequency is removed from the signal, and the speech is then presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raise gestures, which are synchronized with the auditory prominence. The experiment shows that presenting prominence as facial gestures significantly increases speech intelligibility compared to when these gestures are randomly added to speech. We also present a follow-up study examining the perception of the behavior of the talking heads when gestures are added over pitch accents. Using eye-gaze tracking technology and questionnaires with 10 moderately hearing-impaired subjects, the results of the gaze data show that users look at the face in a similar fashion to when they look at a natural face when gestures are coupled with pitch accents, as opposed to when the face carries no gestures. From the questionnaires, the results also show that these gestures significantly increase the naturalness and the understanding of the talking head.

  • 65.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Mirning, Nicole
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tscheligi, Manfred
    Furhat goes to Robotville: a large-scale multiparty human-robot interaction data collection in a public space (2012). In: Proc of LREC Workshop on Multimodal Corpora, Istanbul, Turkey, 2012. Conference paper (Refereed)
    Abstract [en]

    In the four days of the Robotville exhibition at the London Science Museum, UK, during which the back-projected head Furhat in a situated spoken dialogue system was seen by almost 8 000 visitors, we collected a database of 10 000 utterances spoken to Furhat in situated interaction. The data collection is an example of a particular kind of corpus collection of human-machine dialogues in public spaces that has several interesting and specific characteristics, both with respect to the technical details of the collection and with respect to the resulting corpus contents. In this paper, we take the Furhat data collection as a starting point for a discussion of the motives for this type of data collection, its technical peculiarities and prerequisites, and the characteristics of the resulting corpus.

  • 66.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    SynFace Phone Recognizer for Swedish Wideband and Narrowband Speech (2008). In: Proceedings of The Second Swedish Language Technology Conference (SLTC), Stockholm, Sweden, 2008, p. 3-6. Conference paper (Other academic)
    Abstract [en]

    In this paper, we present new results and comparisons of the real-time lip-synchronized talking head SynFace on different Swedish databases and bandwidths. The work involves training SynFace on narrow-band telephone speech from the Swedish SpeechDat, and on the narrow-band and wide-band Speecon corpus. Auditory perceptual tests are being established for SynFace as an audio-visual hearing support for the hearing impaired. Preliminary results show high recognition accuracy compared to other languages.

  • 67.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Spontaneous spoken dialogues with the Furhat human-like robot head (2014). In: HRI '14 Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, Bielefeld, Germany, 2014, p. 326. Conference paper (Refereed)
    Abstract [en]

    We will show in this demonstrator an advanced multimodal and multiparty spoken conversational system using Furhat, a robot head based on projected facial animation. Furhat is an anthropomorphic robot head that utilizes facial animation for physical robot heads using back-projection. In the system, multimodality is enabled using speech and rich visual input signals such as multi-person real-time face tracking and microphone tracking. The demonstrator will showcase a system that is able to carry out social dialogue with multiple interlocutors simultaneously, with rich output signals such as eye and head coordination, lip-synchronized speech synthesis, and non-verbal facial gestures used to regulate fluent and expressive multiparty conversations. The dialogue design is performed using the IrisTK [4] dialogue authoring toolkit developed at KTH. The system will also be able to act as a moderator in a quiz game, showing different strategies for regulating spoken situated interactions.

  • 68.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    The Furhat Social Companion Talking Head (2013). In: Interspeech 2013 - Show and Tell, 2013, p. 747-749. Conference paper (Refereed)
    Abstract [en]

    In this demonstrator we present the Furhat robot head. Furhat is a highly human-like robot head in terms of dynamics, thanks to its use of back-projected facial animation. Furhat also takes advantage of a complex and advanced dialogue toolkit designed to facilitate rich and fluent multimodal multiparty human-machine situated and spoken dialogue. The demonstrator will present a social dialogue system with Furhat that allows for several simultaneous interlocutors, takes advantage of several verbal and nonverbal input signals such as speech input, real-time multi-face tracking, and facial analysis, and communicates with its users in a mixed-initiative dialogue, using state-of-the-art speech synthesis with rich prosody, lip-animated facial synthesis, eye and head movements, and gestures.

  • 69.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Öster, Anne-Marie
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    van Son, Nic
    Viataal, Nijmegen, The Netherlands.
    Ormel, Ellen
    Viataal, Nijmegen, The Netherlands.
    Herzke, Tobias
    HörTech gGmbH, Germany.
    Studies on Using the SynFace Talking Head for the Hearing Impaired (2009). In: Proceedings of Fonetik'09: The XXIIth Swedish Phonetics Conference, June 10-12, 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, p. 140-143. Conference paper (Other academic)
    Abstract [en]

    SynFace is a lip-synchronized talking agent which is optimized as a visual reading support for the hearing impaired. In this paper we present the large-scale hearing-impaired user studies carried out for three languages in the Hearing at Home project. The user tests focus on measuring the gain in Speech Reception Threshold in Noise and the effort scaling when SynFace is used by hearing-impaired people, where groups of hearing-impaired subjects with different impairment levels, from mild to severe and cochlear implants, are tested. Preliminary analysis of the results does not show a significant gain in SRT or in effort scaling, but given the large cross-subject variability in both tests, it is clear that many subjects benefit from SynFace, especially with speech in stereo babble.

  • 70.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Taming Mona Lisa: communicating gaze faithfully in 2D and 3D facial projections (2012). In: ACM Transactions on Interactive Intelligent Systems, ISSN 2160-6455, E-ISSN 2160-6463, Vol. 1, no 2, p. 25, article id 11. Article in journal (Refereed)
    Abstract [en]

    The perception of gaze plays a crucial role in human-human interaction. Gaze has been shown to matter for a number of aspects of communication and dialogue, especially for managing the flow of the dialogue and participant attention, for deictic referencing, and for the communication of attitude. When developing embodied conversational agents (ECAs) and talking heads, modeling and delivering accurate gaze targets is crucial. Traditionally, systems communicating through talking heads have been displayed to the human conversant using 2D displays, such as flat monitors. This approach introduces severe limitations for an accurate communication of gaze since 2D displays are associated with several powerful effects and illusions, most importantly the Mona Lisa gaze effect, where the gaze of the projected head appears to follow the observer regardless of viewing angle. We describe the Mona Lisa gaze effect and its consequences in the interaction loop, and propose a new approach for displaying talking heads using a 3D projection surface (a physical model of a human head) as an alternative to the traditional flat surface projection. We investigate and compare the accuracy of the perception of gaze direction and the Mona Lisa gaze effect in 2D and 3D projection surfaces in a five-subject gaze perception experiment. The experiment confirms that a 3D projection surface completely eliminates the Mona Lisa gaze effect and delivers very accurate gaze direction that is independent of the observer's viewing angle. Based on the data collected in this experiment, we rephrase the formulation of the Mona Lisa gaze effect. The data, when reinterpreted, confirms the predictions of the new model for both 2D and 3D projection surfaces. Finally, we discuss the requirements on different spatially interactive systems in terms of gaze direction, and propose new applications and experiments for interaction in human-ECA and human-robot settings made possible by this technology.

  • 71.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Analysis of gaze and speech patterns in three-party quiz game interaction (2013). In: Interspeech 2013, The International Speech Communication Association (ISCA), 2013, p. 1126-1130. Conference paper (Refereed)
    Abstract [en]

    In order to understand and model the dynamics between interaction phenomena such as gaze and speech in face-to-face multiparty interaction between humans, we need large quantities of reliable, objective data of such interactions. To date, this type of data is in short supply. We present a data collection setup using automated, objective techniques in which we capture the gaze and speech patterns of triads deeply engaged in a high-stakes quiz game. The resulting corpus consists of five one-hour recordings, and is unique in that it makes use of three state-of-the-art gaze trackers (one per subject) in combination with a state-of-the-art conical microphone array designed to capture roundtable meetings. Several video channels are also included. In this paper we present the obstacles we encountered and the possibilities afforded by a synchronised, reliable combination of large-scale multi-party speech and gaze data, and an overview of the first analyses of the data. Index Terms: multimodal corpus, multiparty dialogue, gaze patterns, multiparty gaze.

  • 72.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Effects of 2D and 3D Displays on Turn-taking Behavior in Multiparty Human-Computer Dialog (2011). In: SemDial 2011: Proceedings of the 15th Workshop on the Semantics and Pragmatics of Dialogue / [ed] Ron Artstein, Mark Core, David DeVault, Kallirroi Georgila, Elsi Kaiser, Amanda Stent, Los Angeles, CA, 2011, p. 192-193. Conference paper (Refereed)
    Abstract [en]

    The perception of gaze from an animated agent on a 2D display has been shown to suffer from the Mona Lisa effect, which means that exclusive mutual gaze cannot be established if there is more than one observer. In this study, we investigate this effect when it comes to turn-taking control in a multi-party human-computer dialog setting, where a 2D display is compared to a 3D projection. The results show that the 2D setting results in longer response times and lower turn-taking accuracy.

  • 73.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Turn-taking Control Using Gaze in Multiparty Human-Computer Dialogue: Effects of 2D and 3D Displays (2011). In: Proceedings of the International Conference on Audio-Visual Speech Processing 2011, Stockholm: KTH Royal Institute of Technology, 2011, p. 99-102. Conference paper (Refereed)
    Abstract [en]

    In a previous experiment we found that the perception of gaze from an animated agent on a two-dimensional display suffers from the Mona Lisa effect, which means that exclusive mutual gaze cannot be established if there is more than one observer. By using a three-dimensional projection surface, this effect can be eliminated. In this study, we investigate whether this difference also holds for the turn-taking behaviour of subjects interacting with the animated agent in a multi-party dialogue. We present a Wizard-of-Oz experiment where five subjects talk to an animated agent in a route direction dialogue. The results show that the subjects to some extent can infer the intended target of the agent’s questions, in spite of the Mona Lisa effect, but that the accuracy of gaze when it comes to selecting an addressee is still significantly lower in the 2D condition, as compared to the 3D condition. The response time is also significantly longer in the 2D condition, indicating that the inference of intended gaze may require additional cognitive effort.

  • 74.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Lip-reading: Furhat audio visual intelligibility of a back projected animated face (2012). In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Berlin/Heidelberg, 2012, p. 196-203. Conference paper (Refereed)
    Abstract [en]

    Back-projecting a computer animated face onto a three dimensional static physical model of a face is a promising technology that is gaining ground as a solution for building situated, flexible and human-like robot heads. In this paper, we first briefly describe Furhat, a back-projected robot head built for the purpose of multimodal multiparty human-machine interaction, and its benefits over virtual characters and robotic heads, and then motivate the need to investigate the contribution to speech intelligibility that Furhat's face offers. We present an audio-visual speech intelligibility experiment, in which 10 subjects listened to short sentences with a degraded speech signal. The experiment compares the gain in intelligibility between lip reading a face visualized on a 2D screen and a 3D back-projected face, and from different viewing angles. The results show that the audio-visual speech intelligibility holds when the avatar is projected onto a static face model (in the case of Furhat), and even, rather surprisingly, exceeds it. This means that despite the movement limitations that back-projected animated face models bring about, their audio-visual speech intelligibility is equal, or even higher, compared to the same models shown on flat displays. At the end of the paper we discuss several hypotheses on how to interpret the results, and motivate future investigations to better explore the characteristics of visual speech perception of 3D projected faces.

  • 75.
    Alaei, Sahel
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    An Automated Discharge Summary System Built for Multiple Clinical English Texts by Pre-trained DistilBART Model (2023). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    The discharge summary is an important document, summarizing a patient’s medical information during their hospital stay. It is crucial for communication between clinicians and primary care physicians. Creating a discharge summary is a necessary task. However, it is time-consuming for physicians. Using technology to automatically generate discharge summaries can be helpful for physicians and assist them in concentrating more on the patients than on writing clinical summarization notes and discharge summaries. This master’s thesis aims to contribute to the research of building a transformer-based model for an automated discharge summary with a pre-trained DistilBART language model. This study plans to answer this main research question: How effective is the pre-trained DistilBART language model in predicting an automated discharge summary for multiple clinical texts?

    The research strategy used in this study is experimental. The dataset is MIMIC-III. To evaluate the effectiveness of the model, ROUGE scores are selected. The result of this model is compared with the result of the baseline BART model, which is implemented on the same dataset in other recent research. This study regards multiple document summarization as the process of combining multiple inputs into a single input, which is then summarized. The findings indicate an improvement in ROUGE-2 and ROUGE-Lsum in the DistilBART model in comparison with the baseline BART model. However, one important limitation was the computational resource constraint. The study also provides ethical considerations and some recommendations for future works.
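    The ROUGE-2 and ROUGE-Lsum scores used for evaluation can be computed, for example, with the rouge_score package as sketched below; the two sentences are invented placeholders rather than MIMIC-III text, and the thesis may have used a different evaluation toolkit.

        from rouge_score import rouge_scorer

        reference = ("Patient admitted with pneumonia, treated with intravenous antibiotics, "
                     "and discharged home in stable condition.")
        prediction = "The patient was treated for pneumonia with antibiotics and discharged stable."

        scorer = rouge_scorer.RougeScorer(["rouge2", "rougeLsum"], use_stemmer=True)
        scores = scorer.score(reference, prediction)      # reference first, prediction second
        print("ROUGE-2 F1:", scores["rouge2"].fmeasure)
        print("ROUGE-Lsum F1:", scores["rougeLsum"].fmeasure)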

  • 76.
    Alagic, Adrian
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    EVALUATING THE SEMANTIC AND SYNTACTIC EFFECTS OF STANDARDIZED TEXTUAL LLM PROMPTS (2024). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    With the recent emergence of Large language models (LLMs) such as ChatGPT, a new technical environment has been introduced to an abundance of new users. As LLMs are known for being dependent on proper input to produce desired responses, different approaches have been proposed to thwart the impact of poorly structured input. By investigating the semantic and syntactic effects of applying known textual processing techniques such as lemmatization, stop word removal and tokenization, this thesis aims to find out whether such standardization could be one of these approaches. To this end, a grading system of four aspects drawing inspiration from human communication was created and used on input prompts of varying shots. The result suggests that standardization, while not always detrimental, potentially has a negative effect on the semantic and syntactic integrity of output. However, it underscores the possibility of utilizing tailored standardization processes to achieve a certain quality without incurring the negative effects.
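    The kind of textual standardization evaluated here (tokenization, stop-word removal, lemmatization) can be sketched with NLTK as below; the example prompt and the exact preprocessing order are assumptions, not the thesis's pipeline.

        import nltk
        from nltk.corpus import stopwords
        from nltk.stem import WordNetLemmatizer
        from nltk.tokenize import word_tokenize

        # One-time resource downloads (punkt_tab is only needed on newer NLTK versions).
        for pkg in ("punkt", "punkt_tab", "stopwords", "wordnet", "omw-1.4"):
            nltk.download(pkg, quiet=True)

        def standardize_prompt(prompt: str) -> str:
            """Tokenize, drop English stop words and lemmatize a prompt before sending it to an LLM."""
            lemmatizer = WordNetLemmatizer()
            stops = set(stopwords.words("english"))
            tokens = word_tokenize(prompt.lower())
            kept = [lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in stops]
            return " ".join(kept)

        print(standardize_prompt("Could you please explain how the caching layers are working?"))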

  • 77.
    Al-Azzawi, Sana
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Kovács, György
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Nilsson, Filip
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Adewumi, Tosin
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset (2023). In: 17th International Workshop on Semantic Evaluation, SemEval 2023: Proceedings of the Workshop, Association for Computational Linguistics, 2023, p. 1421-1427. Conference paper (Refereed)
  • 78.
    Albertsson, Sarah
    et al.
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. SICS East Swedish ICT AB, Linköping, Sweden.
    Rennes, Evelina
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. SICS East Swedish ICT AB, Linköping, Sweden.
    Jönsson, Arne
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Arts and Sciences. SICS East Swedish ICT AB, Linköping, Sweden.
    Similarity-Based Alignment of Monolingual Corpora for Text Simplification (2016). In: CL4LC 2016 - Computational Linguistics for Linguistic Complexity: Proceedings of the Workshop, The COLING 2016 Organizing Committee, 2016, p. 154-163. Conference paper (Refereed)
    Abstract [en]

    Comparable or parallel corpora are beneficial for many NLP tasks. The automatic collection of corpora enables large-scale resources, even for less-resourced languages, which in turn can be useful for deducing rules and patterns for text rewriting algorithms, a subtask of automatic text simplification. We present two methods for the alignment of Swedish easy-to-read text segments to text segments from a reference corpus. The first method (M1) was originally developed for the task of text reuse detection, measuring sentence similarity by a modified version of a TF-IDF vector space model. A second method (M2), also accounting for part-of-speech tags, was developed, and the methods were compared. For evaluation, a crowdsourcing platform was built for human judgement data collection, and preliminary results showed that cosine similarity relates better to human ranks than the Dice coefficient. We also saw a tendency that including syntactic context in the TF-IDF vector space model is beneficial for this kind of paraphrase alignment task.
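    The contrast between TF-IDF cosine similarity and the Dice coefficient can be illustrated on a single sentence pair as below; the sentences are invented, and the paper's modified TF-IDF weighting and the POS-tag extension (M2) are not reproduced.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        easy = "the water is not clean enough to drink"
        reference = "the water quality does not meet drinking standards"

        tfidf = TfidfVectorizer().fit_transform([easy, reference])
        cos = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

        a, b = set(easy.split()), set(reference.split())
        dice = 2 * len(a & b) / (len(a) + len(b))          # Dice coefficient over word sets
        print(f"cosine={cos:.3f}  dice={dice:.3f}")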

  • 79.
    Albinsson, Felicia
    et al.
    Jönköping University, School of Education and Communication, HLK, Practice Based Educational Research, Preschool Education Research.
    Mattsson, Rebecca
    Jönköping University, School of Education and Communication, HLK, Practice Based Educational Research, Preschool Education Research.
    Att arbeta aktivt med språkutveckling i förskolan för att främja barn med annat modersmål: En kvalitativ intervjustudie om hur förskollärare beskriver arbetet med språkutveckling (2023). Independent thesis, Basic level (professional degree), 10 credits / 15 HE credits. Student thesis
    Abstract [sv]

    The subject of the study is language development, with a focus on children with another mother tongue. The aim of the study is to examine how preschool teachers describe their work to promote language development in children who have Swedish as a second language. The research questions of the study are: how do the preschool teachers describe their work with children's language development? Which methods can be used to promote Swedish as a second language, according to the preschool teachers? The theoretical starting point for the study has been the sociocultural perspective. To answer the aim and the research questions, six preschool teachers were interviewed using qualitative methods. The results show that the preschool teachers work actively to support children with another mother tongue, and various working approaches and methods emerged. The approaches were primarily TAKK and picture support. Polyglutt is another tool mentioned by several preschool teachers, used to promote both the mother tongue and the second language. The results show that the preschool teachers encourage guardians to speak the mother tongue with the children in order to move their language development forward, as the mother tongue is an important asset in preschool activities. The mother tongue is a contributing factor in the learning of a second language, and none of the preschool teachers regard this learning as negative. The importance of guardians showing interest and curiosity in this work emerges clearly in both the literature and the interviews.

  • 80.
    Aleksandrova, Anastasiia
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. U.
    Exploring Language Descriptions through Vector Space Models (2024). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The abundance of natural languages and the complexities involved in describing their structures pose significant challenges for modern linguists, not only in documentation but also in the systematic organization of knowledge. Computational linguistics tools hold promise in comprehending the "big picture", provided existing grammars are digitized and made available for analysis using state-of-the-art language models. Extensive efforts have been made by an international team of linguists to compile such a knowledge base, resulting in the DReaM corpus – a comprehensive dataset comprising tens of thousands of digital books containing multilingual language descriptions. However, there remains a lack of tools that facilitate understanding of concise language structures and uncovering overlooked topics and dialects. This thesis represents a small step towards elucidating the broader picture by utilizing a subset of the DReaM corpus as a vector space capable of capturing genetic ties among described languages. To achieve this, we explore several encoding algorithms in conjunction with various segmentation strategies and vector summarization approaches for generating both monolingual and cross-lingual feature representations of selected grammars in English and Russian. Our newly proposed sentence-facets TF-IDF model shows promise in unsupervised generation of monolingual representations, conveying sufficient signal to differentiate historical linguistic relations among 484 languages from 26 language families based on their descriptions. However, the construction of a cross-lingual vector space necessitates further exploration of advanced technologies.
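    The core idea of turning language descriptions into vectors and grouping related languages can be illustrated with a toy TF-IDF plus hierarchical clustering sketch; the four snippets are invented stand-ins for digitized grammars, and the thesis's sentence-facets TF-IDF model and cross-lingual setup are considerably more elaborate.

        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import pdist
        from sklearn.feature_extraction.text import TfidfVectorizer

        docs = {
            "lang_A": "Nouns inflect for case and number; verbs agree with the subject.",
            "lang_B": "Nouns show case and number inflection; subject-verb agreement is rich.",
            "lang_C": "The language is tonal and isolating, with serial verb constructions.",
            "lang_D": "Tone distinguishes lexical items; verbs occur in serial constructions.",
        }

        X = TfidfVectorizer().fit_transform(docs.values()).toarray()
        Z = linkage(pdist(X, metric="cosine"), method="average")  # agglomerative clustering
        families = fcluster(Z, t=2, criterion="maxclust")         # cut the tree into two groups
        print(dict(zip(docs, families)))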

  • 81.
    Alemu Argaw, Atelach
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Asker, Lars
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Cöster, Rickard
    SICS.
    Karlgren, Jussi
    SICS.
    Sahlgren, Magnus
    SICS.
    Dictionary-based Amharic-French information retrieval (2006). In: Accessing multilingual information repositories: 6th workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, revised selected papers / [ed] Carol Peters, Fredric C. Gey, Julio Gonzalo, Henning Müller, Gareth J. F. Jones, Michael Kluck, Bernardo Magnini, Maarten de Rijke, Berlin: Springer Berlin/Heidelberg, 2006, p. 83-92. Conference paper (Other academic)
    Abstract [en]

    We present four approaches to the Amharic - French bilingual track at CLEF 2005. All experiments use a dictionary-based approach to translate the Amharic queries into French bags-of-words, but while one approach uses word sense discrimination on the translated side of the queries, the other one includes all senses of a translated word in the query for searching. We used two search engines: the SICS experimental engine and Lucene, hence four runs with the two approaches. Non-content-bearing words were removed both before and after the dictionary lookup. TF/IDF values supplemented by a heuristic function were used to remove the stop words from the Amharic queries, and two French stopword lists were used to remove them from the French translations. In our experiments, we found that the SICS search engine performs better than Lucene and that using the word sense discriminated keywords produces a slightly better result than the full set of non-discriminated keywords.
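    The dictionary-based query translation step can be sketched as below; the dictionary entries and stop-word lists are invented placeholders (the Amharic script is not reproduced), and the heuristic TF/IDF stop-word removal and word sense discrimination are simplified to a keep-all-senses flag.

        # Toy Amharic-to-French dictionary lookup producing a bag-of-words query.
        AM_FR_DICT = {
            "am_word_1": ["information"],
            "am_word_2": ["recherche", "enquête"],   # ambiguous entry: several senses
        }
        AM_STOPWORDS = {"am_stop_1"}
        FR_STOPWORDS = {"de", "la", "le"}

        def translate_query(amharic_tokens, keep_all_senses=True):
            """Translate Amharic tokens into a French bag of words, dropping stop words."""
            bag = []
            for tok in amharic_tokens:
                if tok in AM_STOPWORDS:
                    continue
                senses = AM_FR_DICT.get(tok, [])
                bag.extend(senses if keep_all_senses else senses[:1])
            return [w for w in bag if w not in FR_STOPWORDS]

        print(translate_query(["am_word_1", "am_stop_1", "am_word_2"]))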

  • 82.
    Alemu, Atelach
    et al.
    Hulth, Anette
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    General-Purpose Text Categorization Applied to the Medical Domain (2007). Report (Other academic)
    Abstract [en]

    This paper presents work where a general-purpose text categorization method was applied to categorize medical free-texts. The purpose of the experiments was to examine how such a method performs without any domain-specific knowledge, hand-crafting or tuning. Additionally, we compare the results from the general-purpose method with results from runs in which a medical thesaurus as well as automatically extracted keywords were used when building the classifiers. We show that standard text categorization techniques using stemmed unigrams as the basis for learning can be applied directly to categorize medical reports, yielding an F-measure of 83.9, and outperforming the more sophisticated methods.
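    A standard text categorization pipeline over stemmed unigrams, of the general-purpose kind described here, can be sketched with scikit-learn and NLTK as below; the tiny example reports, the category labels and the choice of a linear SVM learner are stand-ins, not the paper's actual setup.

        from nltk.stem.porter import PorterStemmer
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.metrics import f1_score
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        stemmer = PorterStemmer()

        def stemmed_unigrams(text):
            """Lowercase, split on whitespace and stem each token."""
            return [stemmer.stem(tok) for tok in text.lower().split()]

        texts = ["patient reports chest pain", "fracture of the left wrist",
                 "chest pain radiating to the arm", "wrist fracture after a fall"]
        labels = ["cardiology", "orthopedics", "cardiology", "orthopedics"]

        clf = make_pipeline(CountVectorizer(analyzer=stemmed_unigrams), LinearSVC())
        clf.fit(texts, labels)
        print(clf.predict(["sudden chest pain"]),
              f1_score(labels, clf.predict(texts), average="macro"))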

  • 83.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions (2014). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 28, no 2, p. 607-618. Article in journal (Refereed)
    Abstract [en]

    In this paper we study the production and perception of speech in diverse conditions for the purposes of accurate, flexible and highly intelligible talking face animation. We recorded audio, video and facial motion capture data of a talker uttering a set of 180 short sentences, under three conditions: normal speech (in quiet), Lombard speech (in noise), and whispering. We then produced an animated 3D avatar with similar shape and appearance as the original talker and used an error minimization procedure to drive the animated version of the talker in a way that matched the original performance as closely as possible. In a perceptual intelligibility study with degraded audio we then compared the animated talker against the real talker and the audio alone, in terms of audio-visual word recognition rate across the three different production conditions. We found that the visual intelligibility of the animated talker was on par with the real talker for the Lombard and whisper conditions. In addition we created two incongruent conditions where normal speech audio was paired with animated Lombard speech or whispering. When compared to the congruent normal speech condition, Lombard animation yields a significant increase in intelligibility, despite the AV-incongruence. In a separate evaluation, we gathered subjective opinions on the different animations, and found that some degree of incongruence was generally accepted.

  • 84.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Can Anybody Read Me? Motion Capture Recordings for an Adaptable Visual Speech Synthesizer (2012). In: Proceedings of The Listening Talker, Edinburgh, UK, 2012, p. 52. Conference paper (Refereed)
  • 85.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Towards Fully Automated Motion Capture of Signs -- Development and Evaluation of a Key Word Signing Avatar2015In: ACM Transactions on Accessible Computing, ISSN 1936-7228, Vol. 7, no 2, p. 7:1-7:17Article in journal (Refereed)
    Abstract [en]

    Motion capture of signs provides unique challenges in the field of multimodal data collection. The dense packaging of visual information requires high fidelity and high bandwidth of the captured data. Even though marker-based optical motion capture provides many desirable features such as high accuracy, global fitting, and the ability to record body and face simultaneously, it is not widely used to record finger motion, especially not for articulated and syntactic motion such as signs. Instead, most signing avatar projects use costly instrumented gloves, which require long calibration procedures. In this article, we evaluate the data quality obtained from optical motion capture of isolated signs from Swedish sign language with a large number of low-cost cameras. We also present a novel dual-sensor approach to combine the data with low-cost, five-sensor instrumented gloves to provide a recording method with low manual postprocessing. Finally, we evaluate the collected data and the dual-sensor approach as transferred to a highly stylized avatar. The application of the avatar is a game-based environment for training Key Word Signing (KWS) as augmented and alternative communication (AAC), intended for children with communication disabilities.
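    The abstract does not specify how the optical and glove streams are combined, so the sketch below shows only one plausible, simplified combination rule: per finger joint, prefer the optically derived angle when marker tracking confidence is high and fall back to the glove reading otherwise. All names and thresholds are hypothetical.

```python
# Hedged sketch of one possible dual-sensor combination (not the paper's actual
# method): per finger joint, use the optical estimate when tracking confidence
# is high, otherwise fall back to the instrumented-glove reading.
import numpy as np

def fuse_finger_angles(optical_angles, optical_confidence, glove_angles, threshold=0.8):
    """All arrays have shape (n_frames, n_joints); angles in radians."""
    use_optical = optical_confidence >= threshold
    return np.where(use_optical, optical_angles, glove_angles)

# Hypothetical data: 2 frames, 3 joints.
optical = np.array([[0.10, 0.50, 0.90], [0.20, 0.60, 1.00]])
conf    = np.array([[0.90, 0.30, 0.95], [0.20, 0.90, 0.90]])
glove   = np.array([[0.15, 0.55, 0.85], [0.25, 0.65, 0.95]])
print(fuse_finger_angles(optical, conf, glove))
```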

  • 86.
    Alexanderson, Simon
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Henter, Gustav Eje
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Kucherenko, Taras
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL.
    Beskow, Jonas
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows2020In: Computer graphics forum (Print), ISSN 0167-7055, E-ISSN 1467-8659, Vol. 39, no 2, p. 487-496Article in journal (Refereed)
    Abstract [en]

    Automatic synthesis of realistic gestures promises to transform the fields of animation, avatars and communicative agents. In off-line applications, novel tools can alter the role of an animator to that of a director, who provides only high-level input for the desired animation; a learned network then translates these instructions into an appropriate sequence of body poses. In interactive scenarios, systems for generating natural animations on the fly are key to achieving believable and relatable characters. In this paper we address some of the core issues towards these ends. By adapting a deep learning-based motion synthesis method called MoGlow, we propose a new generative model for generating state-of-the-art realistic speech-driven gesticulation. Owing to the probabilistic nature of the approach, our model can produce a battery of different, yet plausible, gestures given the same input speech signal. Just as with human speakers, this gives a rich natural variation of motion. We additionally demonstrate the ability to exert directorial control over the output style, such as gesture level, speed, symmetry and spatial extent. Such control can be leveraged to convey a desired character personality or mood. We achieve all this without any manual annotation of the data. User studies evaluating upper-body gesticulation confirm that the generated motions are natural and match the input speech well. Our method scores above all prior systems and baselines on these measures, and comes close to the ratings of the original recorded motions. We furthermore find that we can accurately control gesticulation styles without unnecessarily compromising perceived naturalness. Finally, we also demonstrate an application of the same method to full-body gesticulation, including the synthesis of stepping motion and stance.
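    As a rough illustration of the kind of building block a speech-conditioned normalising flow uses, here is a toy conditional affine coupling layer in PyTorch. It is not the authors' MoGlow-based architecture; the dimensions, the small conditioning network, and the tanh-bounded scales are assumptions made only to keep the example self-contained and invertible.

```python
# Minimal sketch of one speech-conditioned affine coupling layer, the basic
# building block of a conditional normalising flow. Illustrative toy only.
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, pose_dim, cond_dim, hidden=128):
        super().__init__()
        self.half = pose_dim // 2
        # Small net predicts log-scale and shift for the second half of the pose,
        # conditioned on the first half plus the speech/control features.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (pose_dim - self.half)),
        )

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([x1, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)              # keep scales well-behaved
        z2 = x2 * torch.exp(log_s) + t
        log_det = log_s.sum(dim=-1)            # contribution to the log-likelihood
        return torch.cat([x1, z2], dim=-1), log_det

    def inverse(self, z, cond):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(torch.cat([z1, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x2 = (z2 - t) * torch.exp(-log_s)
        return torch.cat([z1, x2], dim=-1)

# Hypothetical dimensions: 45-D upper-body pose, 27-D speech feature window.
layer = ConditionalAffineCoupling(pose_dim=45, cond_dim=27)
x, cond = torch.randn(8, 45), torch.randn(8, 27)
z, log_det = layer(x, cond)
x_rec = layer.inverse(z, cond)
print(torch.allclose(x, x_rec, atol=1e-5))     # the transform is invertible
```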

  • 87.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Aspects of co-occurring syllables and head nods in spontaneous dialogue2013In: Proceedings of 12th International Conference on Auditory-Visual Speech Processing (AVSP2013), The International Society for Computers and Their Applications (ISCA) , 2013, p. 169-172Conference paper (Refereed)
    Abstract [en]

    This paper reports on the extraction and analysis of head nods taken from motion capture data of spontaneous dialogue in Swedish. The head nods were extracted automatically and then manually classified in terms of gestures having a beat function or multifunctional gestures. Prosodic features were extracted from syllables co-occurring with the beat gestures. While the peak rotation of the nod is on average aligned with the stressed syllable, the results show considerable variation in fine temporal synchronization. The syllables co-occurring with the gestures generally show greater intensity, higher F0, and greater F0 range when compared to the mean across the entire dialogue. A functional analysis shows that the majority of the syllables belong to words bearing a focal accent.
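    The prosodic comparison described above (intensity, F0, and F0 range of syllables co-occurring with nod peaks versus the dialogue mean) can be approximated with off-the-shelf tools. The sketch below uses librosa; the file name, syllable timings, and pitch range are hypothetical, and the paper's own feature extraction may differ.

```python
# Hedged sketch: mean intensity, mean F0, and F0 range for syllables that
# co-occur with head-nod peaks. File name and timings are hypothetical.
import numpy as np
import librosa

y, sr = librosa.load("dialogue.wav", sr=None)           # hypothetical recording
f0, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
f0_times = librosa.times_like(f0, sr=sr)
rms = librosa.feature.rms(y=y)[0]
rms_times = librosa.times_like(rms, sr=sr)

def syllable_features(start, end):
    """Mean intensity (dB), mean F0 (Hz) and F0 range (Hz) inside one syllable."""
    rms_seg = rms[(rms_times >= start) & (rms_times < end)]
    intensity_db = 20 * np.log10(np.mean(rms_seg) + 1e-10)
    f0_seg = f0[(f0_times >= start) & (f0_times < end)]
    f0_seg = f0_seg[~np.isnan(f0_seg)]                   # voiced frames only
    if f0_seg.size == 0:
        return intensity_db, np.nan, np.nan
    return intensity_db, float(np.mean(f0_seg)), float(np.ptp(f0_seg))

# Hypothetical syllable intervals (seconds) aligned with detected nod peaks.
nod_syllables = [(1.20, 1.45), (3.80, 4.05)]
for start, end in nod_syllables:
    print(syllable_features(start, end))
```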

  • 88.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Extracting and analysing co-speech head gestures from motion-capture data2013In: Proceedings of Fonetik 2013 / [ed] Eklund, Robert, Linköping University Electronic Press, 2013, p. 1-4Conference paper (Refereed)
  • 89.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Extracting and analyzing head movements accompanying spontaneous dialogue2013In: Conference Proceedings TiGeR 2013: Tilburg Gesture Research Meeting, 2013Conference paper (Refereed)
    Abstract [en]

    This paper reports on a method developed for extracting and analyzing head gestures taken from motion capture data of spontaneous dialogue in Swedish. Candidate head gestures with beat function were extracted automatically and then manually classified using a 3D player which displays time-synced audio and 3D point data of the motion capture markers together with animated characters. Prosodic features were extracted from syllables co-occurring with a subset of the classified gestures. The beat gestures show considerable variation in temporal synchronization with the syllables, while the syllables generally show greater intensity, higher F0, and greater F0 range when compared to the mean across the entire dialogue. Additional features for further analysis and automatic classification of the head gestures are discussed.
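    One simple way to automate the candidate-extraction step described above is peak detection on the head pitch angle derived from the motion capture data. The sketch below uses SciPy; the smoothing window, amplitude threshold, and minimum gap are illustrative assumptions, not the thresholds used in the paper.

```python
# Minimal sketch of automatic nod-candidate extraction from a per-frame head
# pitch angle (degrees) sampled at a fixed frame rate. Thresholds are illustrative.
import numpy as np
from scipy.signal import find_peaks

def detect_nod_candidates(pitch_deg, fps=100, min_amplitude=3.0, min_gap_s=0.3):
    """Return frame indices of downward pitch peaks that look like nods."""
    # Smooth with a short moving average to suppress marker jitter.
    kernel = np.ones(5) / 5
    smooth = np.convolve(pitch_deg, kernel, mode="same")
    # A nod shows up as a local maximum in pitch deviation from the baseline.
    baseline = np.median(smooth)
    peaks, _ = find_peaks(
        smooth - baseline,
        height=min_amplitude,            # at least a few degrees of rotation
        distance=int(min_gap_s * fps),   # nods separated by at least min_gap_s
    )
    return peaks

# Hypothetical signal: 5 seconds at 100 fps with two synthetic nod-like bumps.
t = np.linspace(0, 5, 500)
pitch = 2 * np.sin(2 * np.pi * 0.2 * t)                    # slow head drift
for centre in (1.5, 3.2):
    pitch += 8 * np.exp(-((t - centre) ** 2) / 0.01)
print(detect_nod_candidates(pitch) / 100.0)                # peak times in seconds
```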

  • 90.
    Alfalahi, Alyaa
    et al.
    Stockholm University.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science. Gavagai AB, Sweden.
    Ahlblom, Rickard
    Stockholm University.
    Baskalayci, Roza
    Stockholm University.
    Henriksson, Aron
    Stockholm University.
    Asker, Lars
    Stockholm University.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Expanding a Dictionary of Marker Words for Uncertainty and Negation Using Distributional Semantics2015In: EMNLP 2015 - 6th International Workshop on Health Text Mining and Information Analysis, LOUHI 2015 - Proceedings of the Workshop: Short Paper Track / [ed] Cyril Grouin, Thierry Hamon, Aurélie Névéol, and Pierre Zweigenbaum, Association for Computational Linguistics (ACL) , 2015, p. 90-96Conference paper (Refereed)
    Abstract [en]

    Approaches to determining the factuality of diagnoses and findings in clinical text tend to rely on dictionaries of marker words for uncertainty and negation. Here, a method for semi-automatically expanding a dictionary of marker words using distributional semantics is presented and evaluated. It is shown that ranking candidates for inclusion according to their proximity to cluster centroids of semantically similar seed words is more successful than ranking them according to proximity to each individual seed word. 
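    The ranking idea in the abstract (candidates scored by proximity to cluster centroids of the seed words) can be sketched directly. In the snippet below the word vectors are random stand-ins for real distributional-semantic vectors, and the seed/candidate lists and number of clusters are hypothetical.

```python
# Hedged sketch: cluster the seed words' distributional vectors, then rank
# expansion candidates by cosine similarity to the nearest cluster centroid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
dim = 50
seed_words = ["possible", "probably", "suspected", "no", "denies", "without"]
candidates = ["likely", "unlikely", "maybe", "absence", "ruled", "fever"]

seed_vecs = rng.normal(size=(len(seed_words), dim))   # placeholder vectors
cand_vecs = rng.normal(size=(len(candidates), dim))

# Two clusters as a stand-in for e.g. "uncertainty" and "negation" seed groups.
centroids = KMeans(n_clusters=2, n_init=10, random_state=0).fit(seed_vecs).cluster_centers_

# Score each candidate by similarity to its closest centroid, then rank.
scores = cosine_similarity(cand_vecs, centroids).max(axis=1)
for word, score in sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True):
    print(f"{word}\t{score:.3f}")
```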

  • 91.
    Alissandrakis, Aris
    et al.
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Reski, Nico
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Laitinen, Mikko
    University of Eastern Finland, Finland.
    Tyrkkö, Jukka
    Linnaeus University, Faculty of Arts and Humanities, Department of Languages.
    Levin, Magnus
    Linnaeus University, Faculty of Arts and Humanities, Department of Languages.
    Lundberg, Jonas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Visualizing dynamic text corpora using Virtual Reality2018In: ICAME 39 : Tampere, 30 May – 3 June, 2018: Corpus Linguistics and Changing Society : Book of Abstracts, Tampere: University of Tampere , 2018, p. 205-205Conference paper (Refereed)
    Abstract [en]

    In recent years, data visualization has become a major area in Digital Humanities research, and the same holds true in linguistics. The rapidly increasing size of corpora, the emergence of dynamic real-time streams, and the availability of complex and enriched metadata have made it increasingly important to facilitate new and innovative approaches to presenting and exploring primary data. This demonstration showcases the uses of Virtual Reality (VR) in the visualization of geospatial linguistic data using data from the Nordic Tweet Stream (NTS) project (see Laitinen et al. 2017). The NTS data for this demonstration comprises a full year of geotagged tweets (12,443,696 tweets from 273,648 user accounts) posted within the Nordic region (Denmark, Finland, Iceland, Norway, and Sweden). The dataset includes over 50 metadata parameters in addition to the tweets themselves.

    We demonstrate the potential of using VR to efficiently find meaningful patterns in vast streams of data. The VR environment allows an easy overview of any of the features (textual or metadata) in a text corpus. Our focus will be on the language identification data, which provides a previously unexplored perspective into the use of English and other non-indigenous languages in the Nordic countries alongside the native languages of the region.

    Our VR prototype utilizes the HTC Vive headset for a room-scale VR scenario, and it is being developed using the Unity3D game development engine. Each node in the VR space is displayed as a stacked cuboid, the equivalent of a bar chart in a three-dimensional space, summarizing all tweets at one geographic location for a given point in time (see: https://tinyurl.com/nts-vr). Each stacked cuboid represents information of the three most frequently used languages, appropriately color coded, enabling the user to get an overview of the language distribution at each location. The VR prototype further encourages users to move between different locations and inspect points of interest in more detail (overall location-related information, a detailed list of all languages detected, the most frequently used hashtags). An underlying map outlines country borders and facilitates orientation. In addition to spatial movement through the Nordic areas, the VR system provides an interface to explore the Twitter data based on time (days, weeks, months, or time of predefined special events), which enables users to explore data over time (see: https://tinyurl.com/nts-vr-time).

    In addition to demonstrating how the VR methods aid data visualization and exploration, we will also briefly discuss the pedagogical implications of using VR to showcase linguistic diversity.
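    Each stacked cuboid in the prototype summarizes the three most frequent languages per location and time bin. The aggregation itself is straightforward; the sketch below shows one way to compute it with pandas, using invented column names and a tiny inline dataset rather than the NTS corpus.

```python
# Illustrative sketch of the aggregation behind each stacked cuboid: the three
# most frequent tweet languages per location and time bin (hypothetical data).
import pandas as pd

tweets = pd.DataFrame({
    "location": ["Stockholm", "Stockholm", "Stockholm", "Helsinki", "Helsinki"],
    "day":      ["2017-03-01"] * 5,
    "lang":     ["sv", "en", "sv", "fi", "en"],
})

top3 = (
    tweets.groupby(["location", "day"])["lang"]
    .value_counts()                  # tweets per language within each bin
    .groupby(level=[0, 1])
    .head(3)                         # keep the three most frequent languages
    .rename("count")
    .reset_index()
)
print(top3)
```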

  • 92.
    Alissandrakis, Aris
    et al.
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Reski, Nico
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Laitinen, Mikko
    University of Eastern Finland, Finland.
    Tyrkkö, Jukka
    Linnaeus University, Faculty of Arts and Humanities, Department of Languages.
    Lundberg, Jonas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Levin, Magnus
    Linnaeus University, Faculty of Arts and Humanities, Department of Languages.
    Visualizing rich corpus data using virtual reality2019In: Studies in Variation, Contacts and Change in English, E-ISSN 1797-4453, Vol. 20Article in journal (Refereed)
    Abstract [en]

    We demonstrate an approach that utilizes immersive virtual reality (VR) to explore and interact with corpus linguistics data. Our case study focuses on the language identification parameter in the Nordic Tweet Stream corpus, a dynamic corpus of Twitter data where each tweet originated within the Nordic countries. We demonstrate how VR can provide previously unexplored perspectives into the use of English and other non-indigenous languages in the Nordic countries alongside the native languages of the region and showcase its geospatial variation. We utilize a head-mounted display (HMD) for a room-scale VR scenario that allows 3D interaction by using hand gestures. In addition to spatial movement through the Nordic areas, the interface enables exploration of the Twitter data based on time (days, weeks, months, or time of predefined special events), making it particularly useful for diachronic investigations.

    In addition to demonstrating how the VR methods aid data visualization and exploration, we briefly discuss the pedagogical implications of using VR to showcase linguistic diversity. Our empirical results detail students’ reactions to working in this environment. The discussion part examines the benefits, prospects and limitations of using VR in visualizing corpus data.

  • 93. Allan, James
    et al.
    Aslam, Jay
    Azzopardi, Leif
    Belkin, Nick
    Borlund, Pia
    Bruza, Peter
    Callan, Jamie
    Carman, Mark
    Clarke, Charles L.A.
    Craswell, Nick
    Croft, W. Bruce
    Culpepper, J. Shane
    Diaz, Fernando
    Dumais, Susan
    Ferro, Nicola
    Geva, Shlomo
    Gonzalo, Julio
    Hawking, David
    Jarvelin, Kalervo
    Jones, Gareth
    Jones, Rosie
    Kamps, Jaap
    Kando, Noriko
    Kanoulas, Evangelos
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Kelly, Diane
    Lease, Matthew
    Lin, Jimmy
    Mizzaro, Stefano
    Moffat, Alistair
    Murdock, Vanessa
    Oard, Douglas W.
    Rijke, Maarten de
    Sakai, Tetsuya
    Sanderson, Mark
    Scholer, Falk
    Si, Luo
    Thom, James A.
    Thomas, Paul
    Trotman, Andrew
    Turpin, Andrew
    Vries, Arjen P. de
    Webber, William
    Zhang, Xiuzhen (Jenny)
    Zhang, Yi
    Frontiers, Challenges, and Opportunities for Information Retrieval – Report from SWIRL 2012, The Second Strategic Workshop on Information Retrieval in Lorne2012In: SIGIR Forum, ISSN 0163-5840, Vol. 46, no 1, p. 2-32Article in journal (Refereed)
    Abstract [en]

    During a three-day workshop in February 2012, 45 Information Retrieval researchers met to discuss long-range challenges and opportunities within the field. The result of the workshop is a diverse set of research directions, project ideas, and challenge areas. This report describes the workshop format, provides summaries of broad themes that emerged, includes brief descriptions of all the ideas, and provides detailed discussion of six proposals that were voted "most interesting" by the participants. Key themes include the need to: move beyond ranked lists of documents to support richer dialog and presentation, represent the context of search and searchers, provide richer support for information seeking, enable retrieval of a wide range of structured and unstructured content, and develop new evaluation methodologies.

  • 94.
    Allwood, Jens
    et al.
    University of Borås, School of Business and IT.
    Hammarström, Harald
    Hendrikse, Andries
    Ngcobo, Mtholeni N.
    Nomdebevana, Nozibele
    Pretorius, Laurette
    van der Merwe, Mac
    Work on Spoken (Multimodal) Language Corpora in South Africa2010Conference paper (Refereed)
    Abstract [en]

    This paper describes past, ongoing and planned work on the collection and transcription of spoken language samples for all the South African official languages and, as part of this, the training of researchers in corpus linguistic research skills. More specifically, the work has involved (and still involves) establishing an international corpus linguistic network linked to a network hub at a UNISA website, as well as the development of research tools, a corpus research guide, and a workbook for multimodal communication and spoken language corpus research. As an example of the work we are doing and hope to do more of in the future, we present a small pilot study of the influence of English and Afrikaans on the 100 most frequent words in spoken Xhosa as this is evidenced in the corpus of spoken interaction we have gathered so far. Other planned work, besides work on spoken language phenomena, involves comparison of spoken and written language and work on communicative body movements (gestures) and their relation to speech.
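    In the spirit of the pilot study mentioned above, a frequency list of the transcribed corpus can be computed and checked against English and Afrikaans word lists. The corpus text and word lists in the sketch below are tiny hypothetical stand-ins.

```python
# Small sketch: most frequent word forms in a transcribed corpus, flagging
# those that also appear in English or Afrikaans word lists (toy data only).
from collections import Counter

corpus_text = "molo ndiyaphila enkosi ok ja nee ndiyabulela ok molo"
english_words = {"ok", "yes", "no"}
afrikaans_words = {"ja", "nee", "baie"}

counts = Counter(corpus_text.lower().split())
for word, freq in counts.most_common(100):
    source = "English" if word in english_words else (
             "Afrikaans" if word in afrikaans_words else "")
    print(f"{word}\t{freq}\t{source}")
```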

  • 95.
    Al-Mashahedi, Ahmad
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Ljung, Oliver
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Robust Code Generation using Large Language Models: Guiding and Evaluating Large Language Models for Static Verification2024Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Background: Generative AI has achieved rapid and widespread acclaim in the short period since the introduction of recent models, which have opened up opportunities that were not possible before. Large Language Models (LLMs), a subset of generative AI, have become an essential part of code generation for software development. However, there is always a risk that the generated code does not fulfill the programmer's intent and contains faults or bugs that can go unnoticed. To that end, we propose that verification of generated code should increase its quality and trustworthiness.

    Objectives: This thesis aims to research generation of code that is both functionally correct and verifiable by implementing and evaluating four prompting approaches and a reinforcement learning solution to increase robustness within code generation, using unit-test and verification rewards.

    Methods: We used a Rapid Literature Review (RLR) and Design Science methodology to get a solid overview of the current state of robust code generation. From the RLR and related works, we evaluated the following four prompting approaches: Base prompt, Documentation prompting, In-context learning, and Documentation + In-context learning on the two datasets: MBPP and HumanEval. Moreover, we fine-tuned one model using Proximal Policy Optimization (PPO) for the novel task.

    Results: We measured the functional correctness and static verification success rates, amongst other metrics, for the four proposed approaches on eight model configurations, including the PPO fine-tuned LLM. Our results show that for the MBPP dataset, on average, In-context learning had the highest functional correctness at 29.4% pass@1, Documentation prompting had the highest verifiability at 8.48% verifiable@1, and In-context learning had the highest rate of functionally correct, verifiable code at 3.2% pass@1 & verifiable@1. Moreover, the PPO fine-tuned model showed an overall increase in performance across all approaches compared to the pre-trained base model.

    Conclusions: We found that In-context learning on the PPO fine-tuned model yielded the best overall results across most metrics compared to the other approaches. The PPO fine-tuned model with In-context learning resulted in 32.0% pass@1, 12.8% verifiable@1, and 5.0% pass@1 & verifiable@1. Documentation prompting was better for verifiable@1 on MBPP; however, it did not perform as well on the other metrics. Documentation prompting + In-context learning fell between Documentation prompting and In-context learning in performance, while Base prompt performed the worst overall. For future work, we envision several improvements to PPO training, including but not limited to training on Nagini documentation and utilizing expert iteration to create supervised fine-tuning datasets to improve the model iteratively.
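    For readers unfamiliar with the metrics above, pass@1, verifiable@1, and their conjunction can be computed from per-problem outcomes when one sample is generated per problem. The sketch below shows that computation on invented outcomes; the thesis' exact evaluation harness is not reproduced here.

```python
# Hedged sketch of pass@1, verifiable@1, and their conjunction, assuming one
# generated sample per benchmark problem (hypothetical outcomes below).
from dataclasses import dataclass

@dataclass
class Result:
    passed_unit_tests: bool   # functional correctness
    verified: bool            # static verification succeeded

def rates(results):
    n = len(results)
    pass_at_1 = sum(r.passed_unit_tests for r in results) / n
    verifiable_at_1 = sum(r.verified for r in results) / n
    both = sum(r.passed_unit_tests and r.verified for r in results) / n
    return pass_at_1, verifiable_at_1, both

results = [Result(True, False), Result(True, True), Result(False, False),
           Result(True, False), Result(False, False)]
print(["%.1f%%" % (100 * x) for x in rates(results)])
```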

  • 96. Almqvist, Ingrid
    et al.
    Sågvall Hein, Anna
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Defining ScaniaSwedish - A Controlled Language for Truck Maintenance1996In: Proceedings of the First International Workshop on Controlled Language Applications, Centre for Computational Linguistics. Katholieke Universiteit Leuven , 1996Conference paper (Refereed)
    Abstract [en]

    An approach to integrated multilingual document production is proposed. The basic idea of this approach is to use the analyzer of a modular, transfer-based machine translation system as the core of a language checker. The checker generates grammatical structures to be forwarded to the transfer and generation components for the various target languages. A precondition for such an approach is a controlled source language. The source language in focus in this presentation is ScaniaSwedish, to be defined via a standardization of the language presently used by Scania in their truck maintenance documents. Here we concentrate on the identification of the vocabulary of current ScaniaSwedish and present the results we have achieved so far. In parallel with the inventory of the vocabulary, the competence of the language checker is being developed.

    Almqvist, Ingrid
    et al.
    Sågvall Hein, Anna
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    A Language Checker of Controlled Language and its Integration in a Documentation and Translation Workflow2000In: Translating and the Computer 22: Proceedings of the Twenty-second international conference, 16-17 November, 2000, London, London: Aslib, 2000, Vol. 22Conference paper (Refereed)
  • 98. Alonso, Omar
    et al.
    Kamps, Jaap
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Report on the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR 11)2012In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 46, no 1, p. 56-64Article in journal (Refereed)
    Abstract [en]

    There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, and emerging robust NLP tools. These meaningful semantic annotations hold the promise of significantly enhancing information access by increasing the depth of analysis of today’s systems. Currently, we have only started to explore the possibilities and only begun to understand how these valuable semantic cues can be put to fruitful use. The workshop had an interactive format consisting of keynotes, boasters and posters, breakout groups and reports, and a final discussion, which was prolonged into the evening. There was a strong feeling that we made substantial progress. Specifically, each of the breakout groups contributed to our understanding of the way forward. First, annotations and use cases come in many different shapes and forms depending on the domain at hand, but at a higher level there are remarkable commonalities in annotation tools, indexing methods, user interfaces, and general methodology. Second, we gained insights into the "exploitation" aspects, leading to a clear separation between the low-level annotations giving context or meaning to small units of information (e.g., NLP, sentiments, entities), and annotations bringing out the structure inherent in the data (e.g., sources, data schemas, document genres). Third, the plan to enrich ClueWeb with various document-level (e.g., pagerank and spam scores, but also reading level) and lower-level (e.g., named entities or sentiments) annotations was embraced by the workshop as a concrete next step to promote research in semantic annotations.

  • 99.
    Altemark, Mikael
    Södertörn University College, School of Culture and Communication.
    Lexis, Discourse Prosodies and the Taking of Stance: A Corpus Study of the Meaning of ‘Self-proclaimed’2011Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    This study is concerned with the description of the semantic and pragmatic characteristics of the attributive adjective self-proclaimed, employing corpus-linguistic methodology to explore its meaning from user-based data. The initial query provided the material from which a lexical profile of the target word was constructed, systematically describing collocational data, semantic preferences, semantic associations and discourse prosodies. Qualitative analysis of sample concordances illustrated the role of the target word in expressing different kinds of meaning-bearing stances. The results demonstrate the importance of context and communicative functionality as constraints determining meaning, identifying the discourse prosodies of self-proclaimed as one of negation, accepted-positive, or accepted-negative. Further, the analysis of self-proclaimed as a stance marker suggests that linking the evaluative meanings of extended lexical units to the linguistic description of intersubjective stancetaking is a possibly fruitful avenue for research.
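    The collocational step in a study like this amounts to counting the words found within a fixed window around the node word. The sketch below illustrates that with a +/- 4-token window over a few invented concordance lines; the real study's corpus and window settings may differ.

```python
# Minimal sketch of collocate counting for the node word 'self-proclaimed'
# within a +/- 4-token window (toy concordance lines, invented for illustration).
from collections import Counter

corpus = [
    "the self-proclaimed expert gave yet another lecture",
    "a self-proclaimed king of the festival arrived late",
    "this self-proclaimed expert was quickly corrected",
]

node = "self-proclaimed"
window = 4
collocates = Counter()
for line in corpus:
    tokens = line.lower().split()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            collocates.update(left + right)

print(collocates.most_common(5))
```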

  • 100. Altmann, U.
    et al.
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Campbell, N.
    Conversational Involvement and Synchronous Nonverbal Behaviour2012In: Cognitive Behavioural Systems: COST 2102 International Training School, Dresden, Germany, February 21-26, 2011, Revised Selected Papers / [ed] Anna Esposito, Antonietta M. Esposito, Alessandro Vinciarelli, Rüdiger Hoffmann, Vincent C. Müller, Springer Berlin/Heidelberg, 2012, p. 343-352Conference paper (Refereed)
    Abstract [en]

    Measuring the quality of an interaction by means of low-level cues has been the topic of many studies in the last couple of years. In this study we propose a novel method for conversation quality assessment. We first test whether manual ratings of conversational involvement and automatic estimation of synchronisation of facial activity are correlated. We hypothesise that the higher the synchrony, the higher the involvement. We compare two different synchronisation measures. The first measure is defined as the similarity of facial activity at a given point in time. The second is based on dependence analyses between the facial activity time series of two interlocutors. We found that the dependence measure correlates more strongly with conversational involvement than the similarity measure.
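    To make the contrast between the two kinds of measure concrete, the sketch below computes (a) a frame-wise similarity score and (b) a dependence score based on lagged cross-correlation for two synthetic facial-activity signals. Both formulas are illustrative stand-ins; the exact measures used in the study may differ.

```python
# Hedged sketch contrasting a frame-wise similarity measure with a lag-based
# dependence measure for two interlocutors' facial activity (synthetic signals).
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=500)                                   # facial activity, speaker A
b = 0.6 * np.roll(a, 5) + 0.4 * rng.normal(size=500)       # speaker B, delayed echo of A

def similarity(a, b):
    """Mean frame-wise similarity: 1 minus a normalised absolute difference."""
    return 1.0 - np.mean(np.abs(a - b)) / (np.std(a) + np.std(b))

def dependence(a, b, max_lag=25):
    """Maximum absolute Pearson correlation over a range of time lags."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    corrs = [np.corrcoef(a, np.roll(b, lag))[0, 1] for lag in range(-max_lag, max_lag + 1)]
    return max(abs(c) for c in corrs)

print(round(similarity(a, b), 3), round(dependence(a, b), 3))
```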
