Digitala Vetenskapliga Arkivet

  • 1. AAl Abdulsalam, Abdulrahman
    et al.
    Velupillai, Sumithra
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. King's College, London.
    Meystre, Stephane
    UtahBMI at SemEval-2016 Task 12: Extracting Temporal Information from Clinical Text. 2016. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, 2016, p. 1256-1262. Conference paper (Refereed)
    Abstract [en]

    The 2016 Clinical TempEval continued the 2015 shared task on temporal information extraction with a new evaluation test set. Our team, UtahBMI, participated in all subtasks using machine learning approaches with the ClearTK (LIBLINEAR), CRF++ and CRFsuite packages. Our experiments show that CRF-based classifiers yield, in general, higher recall for multi-word spans, while SVM-based classifiers are better at predicting correct attributes of TIMEX3. In addition, we show that an ensemble-based approach for TIMEX3 could yield improved results. Our team achieved competitive results in each subtask, with F1 scores of 75.4% for TIMEX3, 89.2% for EVENT, 84.4% for event relations with document time (DocTimeRel), and 51.1% for narrative container (CONTAINS) relations.

  • 2.
    Allvin, Helen
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Carlsson, Elin
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Danielsson-Ojala, Riitta
    Daudaravičius, Vidas
    Hassel, Martin
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kokkinakis, Dimitrios
    Lundgrén-Laine, Heljä
    Nilsson, Gunnar H.
    Nytrø, Øystein
    Salanterä, Sanna
    Skeppstedt, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Suominen, Hanna
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Characteristics of Finnish and Swedish intensive care nursing narratives: a comparative analysis to support the development of clinical language technologies. 2011. In: Journal of Biomedical Semantics, E-ISSN 2041-1480, Vol. 2, no S1, p. 1-11. Article in journal (Refereed)
    Abstract [en]

    Background: Free text is helpful for entering information into electronic health records, but reusing it is a challenge. The need for language technology for processing Finnish and Swedish healthcare text is therefore evident; however, Finnish and Swedish are linguistically very dissimilar. In this paper we present a comparison of characteristics in Finnish and Swedish free-text nursing narratives from intensive care. This creates a framework for characterising and comparing clinical text and lays the groundwork for developing clinical language technologies. Methods: Our material included daily nursing narratives from one intensive care unit in Finland and one in Sweden. Inclusion criteria for patients were an inpatient period of at least five days and an age of at least 16 years. We performed a comparative analysis as part of a collaborative effort between Finnish- and Swedish-speaking healthcare and language technology professionals that included both qualitative and quantitative aspects. The qualitative analysis addressed the content and structure of three average-sized health records from each country. In the quantitative analysis 514 Finnish and 379 Swedish health records were studied using various language technology tools. Results: Although the two languages are not closely related, nursing narratives in Finland and Sweden had many properties in common. Both made use of specialised jargon and their content was very similar. However, many of these characteristics were challenging regarding development of language technology to support producing and using clinical documentation. Conclusions: The way Finnish and Swedish intensive care nursing was documented was not country or language dependent, but shared a common context, principles and structural features and even similar vocabulary elements. Technology solutions are therefore likely to be applicable to a wider range of natural languages, but they need linguistic tailoring.
    Availability: The Finnish and Swedish data can be found at: http://www.dsv.su.se/hexanord/data/

  • 3. Bittar, A.
    et al.
    Velupillai, Sumithra
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
    Roberts, A.
    Dutta, R.
    Text classification to inform suicide risk assessment in electronic health records. 2019. In: 17th World Congress on Medical and Health Informatics, MEDINFO 2019, IOS Press, 2019, Vol. 264, p. 40-44. Conference paper (Refereed)
    Abstract [en]

    Assessing a patient's risk of an impending suicide attempt has been hampered by limited information about dynamic factors that change rapidly in the days leading up to an attempt. The storage of patient data in electronic health records (EHRs) has facilitated population-level risk assessment studies using machine learning techniques. Until recently, most such work has used only structured EHR data and excluded the unstructured text of clinical notes. In this article, we describe our experiments on suicide risk assessment, modelling the problem as a classification task. Given the wealth of text data in mental health EHRs, we aimed to assess the impact of using this data in distinguishing periods prior to a suicide attempt from those not preceding such an attempt. We compare three different feature sets, one structured and two text-based, and show that inclusion of text features significantly improves classification accuracy in suicide risk assessment.

  • 4. Chapman, Wendy W.
    et al.
    Hilert, Dieter
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Skeppstedt, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Chapman, Brian
    Conway, Michael
    Tharp, Melissa
    Mowery, Danielle L.
    Deleger, Louise
    Extending the NegEx Lexicon for Multiple Languages. 2013. In: Proceedings of the 14th World Congress on Medical and Health Informatics / [ed] Christoph Ulrich Lehmann, Elske Ammenwerth, Christian Nøhr, IOS Press, 2013, p. 677-681. Conference paper (Refereed)
    Abstract [en]

    We translated an existing English negation lexicon (NegEx) to Swedish, French, and German and compared the lexicon on corpora from each language. We observed Zipf’s law for all languages, i.e., a few phrases occur a large number of times, and a large number of phrases occur fewer times. Negation triggers “no” and “not” were common for all languages; however, other triggers varied considerably. The lexicon is available in OWL and RDF format and can be extended to other languages. We discuss the challenges in translating negation triggers to other languages and issues in representing multilingual lexical knowledge.

  • 5.
    Dalianis, Hercules
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Henriksson, Aron
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Weegar, Rebecka
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    HEALTH BANK - A Workbench for Data Science Applications in Healthcare. 2015. In: Industry Track Workshop, CEUR Workshop Proceedings, 2015, Vol. 1381, p. 1-18. Conference paper (Refereed)
    Abstract [en]

    The enormous amounts of data that are generated in the healthcare process and stored in electronic health record (EHR) systems are an underutilized resource that, with the use of data science applications, can be exploited to improve healthcare. To foster the development and use of data science applications in healthcare, there is a fundamental need for access to EHR data, which is typically not readily available to researchers and developers. A relatively rare exception is the large EHR database, the Stockholm EPR Corpus, comprising data from more than two million patients, that has been made available to a limited group of researchers at Stockholm University. Here, we describe a number of data science applications that have been developed using this database, demonstrating the potential reuse of EHR data to support healthcare and public health activities, as well as facilitate medical research. However, in order to realize the full potential of this resource, it needs to be made available to a larger community of researchers, as well as to industry actors. To that end, we envision the provision of an infrastructure around this database called HEALTH BANK – the Swedish Health Record Research Bank. It will function both as a workbench for the development of data science applications and as a data exploration tool, allowing epidemiologists, pharmacologists and other medical researchers to generate and evaluate hypotheses. Aggregated data will be fed into a pipeline for open e-access, while non-aggregated data will be provided to researchers within an ethical permission framework. We believe that HEALTH BANK has the potential to promote a growing industry around the development of data science applications that will ultimately increase the efficiency and effectiveness of healthcare.

  • 6.
    Dalianis, Hercules
    et al.
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Nilsson, Gunnar
    Velupillai, Sumithra
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Is de-identification of electronic health records possible?: Or can we use health record corpora for research? 2009. In: Virtual healthcare interaction: Papers from AAAI fall symposium; [November 5 - 7, 2009, at the Westin Arlington Gateway in Arlington, Virginia USA], AAAI Press, 2009, p. 2-3. Conference paper (Refereed)
  • 7.
    Dalianis, Hercules
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields. 2010. In: Journal of Biomedical Semantics, ISSN 2041-1480, Vol. 1:6. Article in journal (Refereed)
    Abstract [en]

    Background

    In order to perform research on the information contained in Electronic Patient Records (EPRs), access to the data itself is needed. This is often very difficult due to confidentiality regulations. The data sets need to be fully de-identified before they can be distributed to researchers. De-identification is a difficult task where the definitions of annotation classes are not self-evident.

    Results

    We present work on the creation of two refined variants of a manually annotated gold standard for de-identification, one created automatically, and one created through discussions among the annotators. The data is a subset of the Stockholm EPR Corpus, a data set available within our research group. These variants are used for the training and evaluation of an automatic system based on the Conditional Random Fields algorithm. Evaluating with four-fold cross-validation on sets of around 4,000-6,000 annotation instances, we obtained very promising results for both gold standards: F-scores around 0.80 for a number of experiments, with higher results for certain annotation classes. Moreover, the system found 49 instances that were scored as false positives but were verified to be true positives missed by the annotators.

    Conclusions

    Our intention is to make this gold standard, the Stockholm EPR PHI Corpus, available to other research groups in the future. Although it is slightly more time-consuming to produce, we believe the manual consensus gold standard is the most valuable for further research. We also propose a set of annotation classes to be used for similar de-identification tasks.

  • 8.
    Dalianis, Hercules
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    How Certain are Clinical Assessments?: Annotating Swedish Clinical Text for (Un)certainties, Speculations and Negations. 2010. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation, LREC 2010 / [ed] Nicoletta Calzolari, 2010, p. 3071-3075. Conference paper (Other academic)
    Abstract [en]

    Clinical texts contain a large amount of information. Some of this information is embedded in contexts where, for example, a patient's status is reasoned about, which may lead to a considerable number of statements that indicate uncertainty and speculation. We believe that distinguishing such instances from factual statements will be very beneficial for automatic information extraction. We have annotated a subset of the Stockholm Electronic Patient Record Corpus for certain and uncertain expressions as well as speculative and negation keywords, with the purpose of creating a resource for the development of automatic detection of speculative language in Swedish clinical text. We have analyzed the results from the initial annotation trial by means of pairwise Inter-Annotator Agreement (IAA) measured with F-score. Our main findings are that IAA results for certain expressions and negations are very high, but for uncertain expressions and speculative keywords results are less encouraging. These instances need to be defined in more detail. With this annotation trial, we have created an important resource that can be used to further analyze the properties of speculative language in Swedish clinical text. Our intention is to release this subset to other research groups in the future after removing identifiable information.

  • 9. Downs, J.
    et al.
    Velupillai, Sumithra
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    George, G.
    Holden, R.
    Kikoler, M.
    Dean, H.
    Fernandes, A.
    Dutta, R.
    Detection of Suicidality in Adolescents with Autism Spectrum Disorders: Developing a Natural Language Processing Approach for Use in Electronic Health Records. 2017. In: AMIA Annual Symposium Proceedings, E-ISSN 1942-597X, Vol. 2017, p. 641-649. Article in journal (Refereed)
    Abstract [en]

    Over 15% of young people with autism spectrum disorders (ASD) will contemplate or attempt suicide during adolescence. Yet, there is limited evidence concerning risk factors for suicidality in childhood ASD. Electronic health records (EHRs) can be used to create retrospective clinical cohort data for large samples of children with ASD. However, systems to accurately extract suicidality-related concepts need to be developed so that putative models of suicide risk in ASD can be explored. We present a systematic approach to 1) adapt Natural Language Processing (NLP) solutions to screen with high sensitivity for reference to suicidal constructs in a large clinical ASD EHR corpus (230,465 documents), and 2) evaluate, within a screened subset of 500 patients, the performance of an NLP classification tool for positive and negated suicidal mentions within clinical text. When evaluated, the NLP classification tool showed high system performance for positive suicidality, with precision, recall, and F1 scores all > 0.85 at a document and patient level. The application therefore provides accurate output for epidemiological research into the factors contributing to the onset and recurrence of suicidality, and potential utility within clinical settings as an automated surveillance or risk prediction tool for specialist ASD services.

  • 10.
    Dutta, Rina
    et al.
    Kings Coll London, Sch Acad Psychiat, Dept Psychol Med, IoPPN, POB 84, 3rd Floor, East Wing, Room E3-07, London SE5 8AF, England; South London & Maudsley NHS Fdn Trust, London, England.
    Gkotsis, George
    Kings Coll London, Sch Acad Psychiat, Dept Psychol Med, IoPPN, POB 84, 3rd Floor, East Wing, Room E3-07, London SE5 8AF, England.
    Velupillai, Sumithra
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. Department of Psychological Medicine, School of Academic Psychiatry, King’s College London.
    Bakolis, Ioannis
    Kings Coll London, Sch Acad Psychiat, Dept Psychol Med, IoPPN, POB 84, 3rd Floor, East Wing, Room E3-07, London SE5 8AF, England.
    Stewart, Robert
    Kings Coll London, Sch Acad Psychiat, Dept Psychol Med, IoPPN, POB 84, 3rd Floor, East Wing, Room E3-07, London SE5 8AF, England; South London & Maudsley NHS Fdn Trust, London, England.
    Temporal and diurnal variation in social media posts to a suicide support forum. 2021. In: BMC Psychiatry, E-ISSN 1471-244X, Vol. 21, no 1, article id 259. Article in journal (Refereed)
    Abstract [en]

    Background: Rates of suicide attempts and deaths are highest on Mondays and these occur more frequently in the morning or early afternoon, suggesting weekly temporal and diurnal variation in suicidal behaviour. It is unknown whether there are similar time trends on social media, of posts relevant to suicide. We aimed to determine temporal and diurnal variation in posting patterns on the Reddit forum SuicideWatch, an online community for individuals who might be at risk of, or who know someone at risk of suicide. Methods: We used time series analysis to compare date and time stamps of 90,518 SuicideWatch posts from 1st December 2008 to 31st August 2015 to (i) 6,616,431 posts on the most commonly subscribed general subreddit, AskReddit and (ii) 66,934 of these AskReddit posts, which were posted by the SuicideWatch authors. Results: Mondays showed the highest proportion of posts on SuicideWatch. Clear diurnal variation was observed, with a peak in the early morning (2:00-5:00h), and a subsequent decrease to a trough in late morning/early afternoon (11:00-14:00h). Conversely, the highest volume of posts in the control data was between 20:00-23:00h. Conclusions: Posts on SuicideWatch occurred most frequently on Mondays: the day most associated with suicide risk. The early morning peak in SuicideWatch posts precedes the time of day during which suicide attempts and deaths most commonly occur. Further research of these weekly and diurnal rhythms should help target populations with support and suicide prevention interventions when needed most.

  • 11. Gkotsis, George
    et al.
    Oellrich, Anika
    Hubbard, Tim
    Dobson, Richard
    Liakata, Maria
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Dutta, Rina
    The language of mental health problems in social media. 2016. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, Association for Computational Linguistics, 2016, p. 63-73. Conference paper (Refereed)
    Abstract [en]

    Online social media, such as Reddit, has become an important resource to share personal experiences and communicate with others. Among other personal information, some social media users communicate about mental health problems they are experiencing, with the intention of getting advice, support or empathy from other users. Here, we investigate the language of Reddit posts specific to mental health, to define linguistic characteristics that could be helpful for further applications. The latter include attempting to identify posts that need urgent attention due to their nature, e.g. when someone announces their intentions of ending their life by suicide or harming others. Our results show that there are a variety of linguistic features that are discriminative across mental health user communities and that can be further exploited in subsequent classification tasks. Furthermore, while negative sentiment is almost uniformly expressed across the entire data set, we demonstrate that there are also condition-specific vocabularies used in social media to communicate about particular disorders. Source code and related materials are available from: https://github.com/gkotsis/reddit-mental-health.

  • 12. Gkotsis, George
    et al.
    Oellrich, Anika
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Liakata, Maria
    Hubbard, Tim J. P.
    Dobson, Richard J. B.
    Dutta, Rina
    Characterisation of mental health conditions in social media using Informed Deep Learning. 2017. In: Scientific Reports, E-ISSN 2045-2322, Vol. 7. Article in journal (Refereed)
    Abstract [en]

    The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.

  • 13. Gkotsis, George
    et al.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Oellrich, Anika
    Dean, Harry
    Liakata, Maria
    Dutta, Rina
    Don’t Let Notes Be Misunderstood: A Negation Detection Method for Assessing Risk of Suicide in Mental Health Records. 2016. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, Association for Computational Linguistics, 2016, p. 95-105. Conference paper (Refereed)
    Abstract [en]

    Mental Health Records (MHRs) contain free-text documentation about patients’ suicide and suicidality. In this paper, we address the problem of determining whether grammatical variants (inflections) of the word “suicide” are affirmed or negated. To achieve this, we populate and annotate a dataset with over 6,000 sentences originating from a large repository of MHRs. The resulting dataset has high Inter-Annotator Agreement (κ = 0.93). Furthermore, we develop and propose a negation detection method that leverages syntactic features of text. Using parse trees, we build a set of basic rules that rely on minimum domain knowledge and render the problem as binary classification (affirmed vs. negated). Since the overall goal is to identify patients who are expected to be at high risk of suicide, we focus on the evaluation of positive (affirmed) cases as determined by our classifier. Our negation detection approach yields a recall (sensitivity) value of 94.6% for the positive cases and an overall accuracy value of 91.9%. We believe that our approach can be integrated with other clinical Natural Language Processing tools in order to further advance information extraction capabilities.

  • 14.
    Grigonyte, Gintare
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Henriksson, Aron
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Swedification patterns of Latin and Greek affixes in clinical text. 2016. In: Nordic Journal of Linguistics, ISSN 0332-5865, E-ISSN 1502-4717, Vol. 39, no 1, p. 5-37. Article in journal (Refereed)
    Abstract [en]

    Swedish medical language is rich with Latin and Greek terminology which has undergone a Swedification since the 1980s. However, many original expressions are still used by clinical professionals. The goal of this study is to obtain precise quantitative measures of how the foreign terminology is manifested in Swedish clinical text. To this end, we explore the use of Latin and Greek affixes in Swedish medical texts in three genres: clinical text, scientific medical text and online medical information for laypersons. More specifically, we use frequency lists derived from tokenised Swedish medical corpora in the three domains, and extract word pairs belonging to types that display both the original and Swedified spellings. We describe six distinct patterns explaining the variation in the usage of Latin and Greek affixes in clinical text. The results show that to a large extent affixes in clinical text are Swedified and that prefixes are used more conservatively than suffixes.

  • 15.
    Grigonyté, Gintaré
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institutet, Sweden.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results. 2014. In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), Stroudsburg, USA: Association for Computational Linguistics, 2014, p. 74-83. Conference paper (Refereed)
    Abstract [en]

    This paper describes part of an ongoing effort to improve the readability of Swedish electronic health records (EHRs). An EHR contains systematic documentation of a single patient’s medical history across time, entered by healthcare professionals with the purpose of enabling safe and informed care. Linguistically, medical records exemplify a highly specialised domain, which can be superficially characterised as having telegraphic sentences involving displaced or missing words, abundant abbreviations, spelling variations including misspellings, and terminology. We report results on lexical simplification of Swedish EHRs, by which we mean detecting unknown, out-of-dictionary words and trying to resolve them as compounded known words, abbreviations or misspellings.

  • 16.
    Grigonyté, Gintaré
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institute, Sweden.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Spelling Variation of Latin and Greek words in Swedish Medical Text. 2014. Conference paper (Refereed)
  • 17.
    Isenius, Niklas
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Initial Results in the Development of SCAN: a Swedish Clinical Abbreviation Normalizer. 2012. In: CLEFeHealth 2012: The CLEF 2012 Workshop on Cross-Language Evaluation of Methods, Applications, and Resources for eHealth Document Analysis / [ed] Hanna Suominen, Canberra, Australia: NICTA, National ICT Australia and The Australian National University, 2012. Conference paper (Refereed)
    Abstract [en]

    Abbreviations are common in clinical documentation, as this type of text is written under time-pressure and serves mostly for internal communication. This study attempts to apply and extend existing rule-based algorithms that have been developed for English and Swedish abbreviation detection, in order to create an abbreviation detection algorithm for Swedish clinical texts that can identify and suggest definitions for abbreviations and acronyms. This can be used as a pre-processing step for further information extraction and text mining models, as well as for readability solutions.

    Through a literature review, a number of heuristics were defined for automatic abbreviation detection. These were used in the construction of the Swedish Clinical Abbreviation Normalizer (SCAN). The heuristics were: a) freely available external resources: a dictionary of general Swedish, a dictionary of medical terms and a dictionary of known Swedish medical abbreviations, b) maximum word lengths (from three to eight characters), and c) heuristics for handling common patterns such as hyphenation. For each token in the text, the algorithm checks whether it is a known word in one of the lexicons, and whether it fulfills the criteria for word length and the created heuristics. The final algorithm was evaluated on a set of 300 Swedish clinical notes from an emergency department at the Karolinska University Hospital, Stockholm. These notes were annotated for abbreviations, a total of 2,050 tokens. This set was annotated by a physician accustomed to reading and writing medical records.

    The algorithm was tested in different variants, where the word lists were modified, heuristics adapted to characteristics found in the texts, and different combinations of word lengths. The best performing version of the algorithm achieved an F-Measure score of 79%, with 76% recall and 81% precision, which is a considerable improvement over the baseline where each token was only matched against the word lists (51% F-measure, 87% recall, 36% precision). Not surprisingly, precision results are higher when the maximum word length is set to the lowest (three), and recall results higher when it is set to the highest (eight).
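    As a quick consistency check on the figures above, the balanced F-measure can be recomputed from the rounded precision and recall values. The helper name is an assumption, and the abstract does not state which F-measure variant was used; the harmonic mean (F1) is assumed here.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the abstract.
best = f_measure(0.81, 0.76)       # best-performing variant
baseline = f_measure(0.36, 0.87)   # word-list-only baseline
print(f"best: {best:.3f}, baseline: {baseline:.3f}")
```

    From the rounded inputs the best variant comes out at roughly 0.78 and the baseline at roughly 0.51. The small gap to the reported 79% is what one would expect if the published precision and recall were themselves rounded.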

    Algorithms for rule-based systems, mainly developed for English, can be successfully adapted for abbreviation detection in Swedish medical records. System performance relies heavily on the quality of the external resources, as well as on the created heuristics. In order to improve results, part-of-speech information and/or local context is needed for disambiguation. In the case of Swedish, compounding also needs to be handled.

  • 18. Ive, J.
    et al.
    Viani, N.
    Chandran, D.
    Bittar, A.
    Velupillai, Sumithra
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. King's College London, IoPPN, London, SE5 8AF, United Kingdom.
    KCL-Health-NLP@CLEF eHealth 2018 Task 1: ICD-10 coding of French and Italian death certificates with character-level convolutional neural networks. 2018. In: CEUR Workshop Proceedings, CEUR-WS, 2018, Vol. 2125. Conference paper (Refereed)
    Abstract [en]

    In this paper we describe the participation of the KCL-Health-NLP team in the CLEF eHealth 2018 lab, specifically Task 1: Multilingual Information Extraction-ICD10 coding. The task involves the automatic coding of causes of death in death certificates in French, Italian and Hungarian according to the ICD-10 taxonomy. Choosing to work on the two Romance languages, we treated the task as a sequence-to-sequence prediction problem. Our system has an encoder-decoder architecture, with convolutional neural networks based on character embeddings as encoders and recurrent neural network decoders. Our hypothesis was that a character-level representation would allow our model to generalise across two genealogically related languages. Results obtained by pre-training our Italian model on the French data set confirmed this intuition. We also explored the impact of character-level features extracted from dictionary-matched ICD codes. We obtained F-measures of 0.72/0.64 and 0.78 on the French aligned/raw and Italian raw internal test data, respectively. On the blind test set released by the task organisers, our top results were 0.65/0.52 and 0.69 F-measure, respectively.

  • 19. Kalyanam, Janani
    et al.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA. KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Conway, Mike
    Lanckriet, Gert
    From Event Detection to Storytelling on Microblogs. 2016. In: PROCEEDINGS OF THE 2016 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING ASONAM 2016, IEEE, 2016, p. 437-442. Conference paper (Refereed)
    Abstract [en]

    The problem of detecting events from content published on microblogs has garnered much interest in recent times. In this paper, we address the questions of what happens after the outbreak of an event in terms of how the event gradually progresses and attains each of its milestones, and how it eventually dissipates. We propose a model-based approach to capture the gradual unfolding of an event over time. This enables the model to automatically produce entire timeline trajectories of events from the time of their outbreak to their disappearance. We apply our model to the Twitter messages collected about Ebola during the 2014 outbreak and obtain the progression timelines of several events that occurred during the outbreak. We also compare our model to several existing topic modeling and event detection baselines in the literature to demonstrate its efficiency.

  • 20. Kelly, Liadh
    et al.
    Goeuriot, Lorraine
    Suominen, Hanna
    Schreck, Tobias
    Leroy, Gondy
    Mowery, Danielle
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Chapman, Wendy W.
    Martinez, David
    Zuccon, Guido
    Palotti, João
    Overview of the ShARe/CLEF eHealth Evaluation Lab 2014. 2014. In: Information Access Evaluation. Multilinguality, Multimodality, and Interaction: 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014. Proceedings / [ed] Evangelos Kanoulas, Cham: Springer, 2014, Vol. 8685, p. 172-191. Conference paper (Refereed)
  • 21.
    Kvist, Maria
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institutet, Sweden.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Professional Language in Swedish Radiology Reports – Characterization for Patient-Adapted Text Simplification. 2013. In: Scandinavian Conference on Health Informatics 2013 / [ed] Gustav Bellika et al., Linköping: Linköping University Electronic Press, 2013, p. 55-59. Conference paper (Refereed)
    Abstract [en]

    In health care, there is a need for patient adaptation of clinical text, so that patients can understand their own health records. As a basis for constructing automated text simplification tools, characterization of the clinical language is needed. We describe a corpus of 0.43 million radiology reports from a university hospital, characterize it quantitatively and perform a qualitative content analysis. The results show that a limited set of words and phrases recur in the reports and can be exchanged for more easy-to-read vocabulary. Semantic categories such as body parts, findings, procedures, and administrative information can be used in the simplification process. This study investigates the potentials and the pitfalls of simplifying medical Swedish into general Swedish for laymen.

  • 22.
    Kvist, Maria
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    SCAN: a Swedish Clinical Abbreviation Normalizer: further Development and Adaptation to Radiology. 2014. In: Information Access Evaluation. Multilinguality, Multimodality, and Interaction: 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014. Proceedings, Cham: Springer, 2014, p. 62-73. Conference paper (Refereed)
  • 23. Lövestam, Elin
    et al.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institutet, Sweden.
    Abbreviations in Swedish Clinical Text - use by three professions. 2014. In: e-Health – For Continuity of Care / [ed] Christian Lovis, Brigitte Séroussi, Arie Hasman, Louise Pape-Haugaard, Osman Saka, Stig Kjær Andersen, IOS Press, 2014, p. 720-724. Conference paper (Refereed)
    Abstract [en]

    A list of 266 abbreviations from dieticians' notes in patient records was used to extract the same abbreviations from patient records written by three professions: dieticians, nurses and physicians. A context analysis of 40 of the abbreviations showed that ambiguous meanings were common. Abbreviations used by dieticians were found to be used by other professions, but not always with the same meaning. This ambiguity of abbreviations might cause misunderstandings and put patient safety at risk.

  • 24. Mowery, Danielle L.
    et al.
    South, Brett R.
    Christensen, Lee
    Leng, Jianwei
    Peltonen, Laura-Maria
    Salantera, Sanna
    Suominen, Hanna
    Martinez, David
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Elhadad, Noemie
    Savova, Guergana
    Pradhan, Sameer
    Chapman, Wendy W.
    Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. 2016. In: Journal of Biomedical Semantics, E-ISSN 2041-1480, Vol. 7, article id 43. Article in journal (Refereed)
    Abstract [en]

    Background: The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referenced as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a more simplified, lay term. Methods: In this study, we evaluate 1) accuracy of participating systems' normalizing short forms compared to a majority sense baseline approach, 2) performance of participants' systems for short forms with variable majority sense distributions, and 3) report the accuracy of participating systems' normalizing shared normalized concepts between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. Results: The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second best performance. The performance of participating systems for normalizing short forms with two or more senses with low ambiguity (majority sense greater than 80 %) ranged from 52 to 78 % accuracy, with two or more senses with moderate ambiguity (majority sense between 50 and 80 %) ranged from 23 to 57 % accuracy, and with two or more senses with high ambiguity (majority sense less than 50 %) ranged from 2 to 45 % accuracy. With respect to the ShARe test set, 69 % of short form annotations contained common concept unique identifiers with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75 % accuracy. 
    Conclusion: Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
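    The majority sense baseline mentioned in the abstract can be illustrated with a short sketch. The function name and the toy training pairs are assumptions made for this example; the actual task normalized short forms to Unified Medical Language System concepts, which are replaced here by plain-text placeholder senses.

```python
from collections import Counter, defaultdict


def train_majority_sense(annotated_pairs):
    """Map each short form to the sense it was annotated with most often."""
    counts = defaultdict(Counter)
    for short_form, sense in annotated_pairs:
        counts[short_form][sense] += 1
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}


# Toy training data, invented for illustration only.
training = [
    ("pt", "patient"),
    ("pt", "patient"),
    ("pt", "physical therapy"),
    ("bp", "blood pressure"),
]
majority = train_majority_sense(training)
```

    Such a baseline always predicts the most frequent sense, so it is expected to do well on short forms with low ambiguity and poorly on highly ambiguous ones.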

  • 25. Mowery, Danielle L.
    et al.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Chapman, Wendy W.
    Medical diagnosis lost in translation – Analysis of uncertainty and negation expressions in English and Swedish clinical texts. 2012. In: BioNLP '12: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, Stroudsburg, USA: Association for Computational Linguistics, 2012, p. 56-64. Conference paper (Refereed)
    Abstract [en]

    In the English clinical and biomedical text domains, negation and certainty usage are two well-studied phenomena. However, few studies have made an in-depth characterization of uncertainties expressed in a clinical setting, and compared this between different annotation efforts. This preliminary, qualitative study attempts to 1) create a clinical uncertainty and negation taxonomy, 2) develop a translation map to convert annotation labels from an English schema into a Swedish schema, and 3) characterize and compare two data sets using this taxonomy. We define a clinical uncertainty and negation taxonomy and a translation map for converting annotation labels between two schemas and report observed similarities and differences between the two data sets.

  • 26. Mowery, Danielle
    et al.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    South, Brett R.
    Christensen, Lee
    Martinez, David
    Kelly, Liadh
    Goeuriot, Lorraine
    Elhadad, Noemie
    Pradhan, Sameer
    Savova, Guergana
    Chapman, Wendy W.
    Task 2: ShARe/CLEF eHealth Evaluation Lab 2014. 2014. In: CLEFeHealth eHealth Evaluation Lab 2014, WISU Verlag Aachen, 2014. Conference paper (Refereed)
  • 27. Mowery, Danielle
    et al.
    Wiebe, Janyce
    Ross, Mindy
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Meystre, Stephane
    Chapman, Wendy
    Generating Patient Problem Lists from the ShARe Corpus using SNOMED CT/SNOMED CT CORE Problem List. 2014. In: Proceedings of BioNLP 2014, Stroudsburg: Association for Computational Linguistics, 2014, p. 54-58. Conference paper (Refereed)
  • 28.
    Neveol, Aurelie
    et al.
    Univ Paris Saclay, CNRS, LIMSI, Rue John von Neumann, F-91405 Orsay, France..
    Dalianis, Hercules
    Stockholm Univ, DSV, Kista, Sweden..
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC). Kings Coll London, Inst Psychiat Psychol & Neurosci, London, England..
    Savova, Guergana
    Childrens Hosp Boston, Boston, MA USA.;Harvard Med Sch, Boston, MA USA..
    Zweigenbaum, Pierre
    Univ Paris Saclay, CNRS, LIMSI, Rue John von Neumann, F-91405 Orsay, France..
    Clinical Natural Language Processing in languages other than English: opportunities and challenges. 2018. In: Journal of Biomedical Semantics, E-ISSN 2041-1480, Vol. 9, article id 12. Article, review/survey (Refereed)
    Abstract [en]

    Background: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main Body: We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion: We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

  • 29.
    Rosell, Magnus
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Revealing Relations between Open and Closed Answers in Questionnaires through Text Clustering Evaluation. 2008. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), 2008, p. 1-7. Conference paper (Refereed)
    Abstract [en]

    Open answers in questionnaires contain valuable information that is very time-consuming to analyze manually. We present a method for hypothesis generation from questionnaires based on text clustering. Text clustering is used interactively on the open answers, and the user can explore the cluster contents. The exploration is guided by automatic evaluation of the clusters against a closed answer regarded as a categorization. This simplifies the process of selecting interesting clusters. The user formulates a hypothesis from the relation between the cluster content and the closed answer categorization. We have applied our method on an open answer regarding occupation compared to a closed answer on smoking habits. With no prior knowledge of smoking habits in different occupation groups we have generated the hypothesis that farmers smoke less than the average. The hypothesis is supported by several separate surveys. Closed answers are easy to analyze automatically but are restricted and may miss valuable aspects. Open answers, on the other hand, fully capture the dynamics and diversity of possible outcomes. With our method the process of analyzing open answers becomes feasible.

  • 30.
    Rosell, Magnus
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    The Impact of Phrases in Document Clustering for Swedish. 2005. In: Proceedings of the 15th NODALIDA conference, Joensuu 2005 / [ed] Werner, S., 2005, p. 173-179. Conference paper (Refereed)
    Abstract [en]

    We have investigated the impact of using phrases in the vector space model for clustering documents in Swedish in different ways. The investigation is carried out on two text sets from different domains: one set of newspaper articles and one set of medical papers. The use of phrases does not improve results relative to the ordinary use of words. The results differ significantly between the text types. This indicates that one could benefit from different text representations for different domains, although a fundamentally different approach probably would be needed.

  • 31. Samuelsson, Y.
    et al.
    Täckström, O.
    Velupillai, Sumithra
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Eklund, J.
    Fišel, M.
    Saers, M.
    Mixing and blending syntactic and semantic dependencies. 2008. In: CoNLL - Proc. Twelfth Conf. Comput. Nat. Lang. Learn., 2008, p. 248-252. Conference paper (Refereed)
    Abstract [en]

    Our system for the CoNLL 2008 shared task uses a set of individual parsers, a set of stand-alone semantic role labellers, and a joint system for parsing and semantic role labelling, all blended together. The system achieved a macro-averaged labelled F1-score of 79.79 (WSJ 80.92, Brown 70.49) for the overall task. The labelled attachment score for syntactic dependencies was 86.63 (WSJ 87.36, Brown 80.77) and the labelled F1-score for semantic dependencies was 72.94 (WSJ 74.47, Brown 60.18).

  • 32.
    Stewart, R.
    et al.
    United Kingdom.
    Jackson, R.
    United Kingdom.
    Patel, R.
    United Kingdom.
    Velupillai, Sumithra
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Gkotsis, G.
    United Kingdom.
    Hoyle, D.
    United Kingdom.
    Knowledge discovery for Deep Phenotyping serious mental illness from Electronic Mental Health records. 2018. In: F1000 Research, E-ISSN 2046-1402, Vol. 7, article id 210. Article in journal (Refereed)
    Abstract [en]

    Background: Deep Phenotyping is the precise and comprehensive analysis of phenotypic features in which the individual components of the phenotype are observed and described. In UK mental health clinical practice, most clinically relevant information is recorded as free text in the Electronic Health Record, and offers a granularity of information beyond what is expressed in most medical knowledge bases. The SNOMED CT nomenclature potentially offers the means to model such information at scale, yet given a sufficiently large body of clinical text collected over many years, it is difficult to identify the language that clinicians favour to express concepts. Methods: By utilising a large corpus of healthcare data, we sought to make use of semantic modelling and clustering techniques to represent the relationship between the clinical vocabulary of internationally recognised SMI symptoms and the preferred language used by clinicians within a care setting. We explore how such models can be used for discovering novel vocabulary relevant to the task of phenotyping Serious Mental Illness (SMI) with only a small amount of prior knowledge. Results: 20 403 terms were derived and curated via a two stage methodology. The list was reduced to 557 putative concepts based on eliminating redundant information content. These were then organised into 9 distinct categories pertaining to different aspects of psychiatric assessment. 235 concepts were found to be expressions of putative clinical significance. Of these, 53 were identified having novel synonymy with existing SNOMED CT concepts. 106 had no mapping to SNOMED CT. Conclusions: We demonstrate a scalable approach to discovering new concepts of SMI symptomatology based on real-world clinical observation. 
    Such approaches may offer the opportunity to consider broader manifestations of SMI symptomatology than is typically assessed via current diagnostic frameworks, and create the potential for enhancing nomenclatures such as SNOMED CT based on real-world expressions.

  • 33. Suominen, Hanna
    et al.
    Salanterä, Sanna
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Chapman, Wendy W.
    Savova, Guergana
    Elhadad, Noemie
    Pradhan, Sameer
    South, Brett R.
    Mowery, Danielle L.
    Jones, Gareth J.F.
    Leveling, Johannes
    Kelly, Liadh
    Goeuriot, Lorraine
    Martinez, David
    Zuccon, Guido
    Overview of the ShARe/CLEF eHealth Evaluation Lab 2013. 2013. In: Information Access Evaluation. Multilinguality, Multimodality, and Visualization: Proceedings / [ed] Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B., Springer Berlin/Heidelberg, 2013, p. 212-231. Conference paper (Refereed)
    Abstract [en]

    Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content, because of their medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab with an aim to support the continuum of care by developing methods and resources that make clinical reports in English easier to understand for patients, and which helps them in finding information related to their condition. This ShARe/CLEFeHealth2013 lab offered student mentoring and shared tasks: identification and normalisation of disorders (1a and 1b) and normalisation of abbreviations and acronyms (2) in clinical reports with respect to terminology standards in healthcare as well as information retrieval (3) to address questions patients may have when reading clinical reports. The focus on patients’ information needs as opposed to the specialised information needs of physicians and other healthcare workers was the main feature of the lab distinguishing it from previous shared tasks. De-identified clinical reports for the three tasks were from US intensive care and originated from the MIMIC II database. Other text documents for Task 3 were from the Internet and originated from the Khresmoi project. Task 1 annotations originated from the ShARe annotations. For Tasks 2 and 3, new annotations, queries, and relevance assessments were created. 64, 56, and 55 people registered their interest in Tasks 1, 2, and 3, respectively. 34 unique teams (3 members per team on average) participated with 22, 17, 5, and 9 teams in Tasks 1a, 1b, 2 and 3, respectively. The teams were from Australia, China, France, India, Ireland, Republic of Korea, Spain, UK, and USA. Some teams developed and used additional annotations, but this strategy contributed to the system performance only in Task 2. 
    The best systems achieved an F1 score of 0.75 in Task 1a; accuracies of 0.59 and 0.72 in Tasks 1b and 2; and Precision at 10 of 0.52 in Task 3. The results demonstrate the substantial community interest and capabilities of these systems in making clinical reports easier to understand for patients. The organisers have made data and tools available for future research and development.

  • 34.
    Svee, Eric-Oluf
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institute, Sweden.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Capturing and Representing Values for Requirements of Personal Health Records. 2013. In: PoEM Short Papers: Short Paper Proceedings of the 6th IFIP WG 8.1 Working Conference on the Practice of Enterprise Modeling (PoEM 2013) / [ed] Janis Grabis, Marite Kirikova, Jelena Zdravkovic, Janis Stirna, 2013, p. 166-175. Conference paper (Refereed)
    Abstract [en]

    Patients’ access to their medical records in the form of Personal Health Records (PHRs) is a central part of the ongoing shift in health policy, where patient empowerment is in focus. A survey was conducted to gauge the stakeholder requirements of patients in regards to functionality requests in PHRs. Models from goal-oriented requirements engineering were created to express the values and preferences held by patients in regards to PHRs from this survey. The present study concludes that patient values can be extracted from survey data, allowing the incorporation of values in the common workflow of requirements engineering without extensive reworking.

  • 35.
    Tanushi, Hideyuki
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Duneld, Martin
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska University Hospital, Sweden.
    Skeppstedt, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Negation Scope Delimitation in Clinical Text Using Three Approaches: NegEx, PyConTextNLP and SynNeg. 2013. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013) / [ed] Stephan Oepen, Kristin Hagen, Janne Bondi Johannessen, Linköping: Linköping University Electronic Press, 2013, p. 387-474. Conference paper (Refereed)
    Abstract [en]

    Negation detection is a key component in clinical information extraction systems, as health record text contains reasonings in which the physician excludes different diagnoses by negating them. Many systems for negation detection rely on negation cues (e.g. not), but only few studies have investigated if the syntactic structure of the sentences can be used for determining the scope of these cues. We have in this paper compared three different systems for negation detection in Swedish clinical text (NegEx, PyConTextNLP and SynNeg), which have different approaches for determining the scope of negation cues. NegEx uses the distance between the cue and the disease, PyConTextNLP relies on a list of conjunctions limiting the scope of a cue, and in SynNeg the boundaries of the sentence units, provided by a syntactic parser, limit the scope of the cues. The three systems produced similar results, detecting negation with an F-score of around 80%, but using a parser had advantages when handling longer, complex sentences or short sentences with contradictory statements.
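    The distance-based idea attributed to NegEx above can be sketched as a simple token window. This is an illustrative simplification rather than the NegEx algorithm itself: the function name, the cue list, the window size and the restriction to cues that precede the target are all assumptions, and English tokens are used only for readability.

```python
def negated_by_distance(tokens, target_index, cues=frozenset({"not", "no"}), window=5):
    """Return True if a negation cue occurs within `window` tokens before the target."""
    start = max(0, target_index - window)
    return any(tok.lower() in cues for tok in tokens[start:target_index])


tokens = "patient does not have pneumonia".split()
target = tokens.index("pneumonia")
is_negated = negated_by_distance(tokens, target)
```

    A fixed window like this ignores sentence structure, which is the limitation the abstract points to when it notes the advantage of a parser for longer, complex sentences.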

  • 36.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Automatic Classification of Factuality Levels: A Case Study on Swedish Diagnoses and the Impact of Local Context. 2011. In: The Fourth International Symposium on Languages in Biology and Medicine, Singapore, 2011. Conference paper (Refereed)
    Abstract [en]

    Clinicians express different levels of knowledge certainty when reasoning about a patient’s status. Automatic extraction of relevant information is crucial in the clinical setting, which means that factuality levels need to be distinguished. We present an automatic classifier using Conditional Random Fields, which is trained and tested on a Swedish clinical corpus annotated for factuality levels at a diagnosis statement level: the Stockholm EPR Diagnosis-Factuality Corpus. The classifier obtains promising results (best overall results are 0.699 average F-measure using all classes, 0.762 F-measure using merged classes), using simple local context features. Preceding context is more useful than posterior, although best results are obtained using a window size of +/-4. Lower levels of certainty are more problematic than higher levels, which was also the case for the human annotators in creating the corpus. A manual error analysis shows that conjunctions and other higher-level features are common sources of errors.

  • 37.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Shades of Certainty: Annotation and Classification of Swedish Medical Records. 2012. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Access to information is fundamental in health care. This thesis presents research on Swedish medical records with the overall goal of building intelligent information access tools that can aid health personnel, researchers and other professions in their daily work, and, ultimately, improve health care in general.

    The issue of ethics and identifiable information is addressed by creating an annotated gold standard corpus and porting an existing de-identification system to Swedish from English. The aim is to move towards making textual resources available to researchers without risking exposure of patients’ confidential information. Results for the rule-based system are not encouraging, but results for the gold standard are fairly high.

    Affirmed, uncertain and negated information needs to be distinguished when building accurate information extraction tools. Annotation models are created, with the aim of building automated systems. One model distinguishes certain and uncertain sentences, and is applied on medical records from several clinical departments. In a second model, two polarities and three levels of certainty are applied on diagnostic statements from an emergency department. Overall results are promising. Differences are seen depending on clinical practice, annotation task and level of domain expertise among the annotators.

    Using annotated resources for automatic classification is studied. Encouraging overall results using local context information are obtained. The fine-grained certainty levels are used for building classifiers for real-world e-health scenarios.

    This thesis contributes two annotation models of certainty and one of identifiable information, applied on Swedish medical records. A deeper understanding of the language use linked to conveying certainty levels is gained. Three annotated resources that can be used for further research have been created, and implications for automated systems are presented.

  • 38.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Towards A Better Understanding of Uncertainties and Speculations in Swedish Clinical Text – Analysis of an Initial Annotation Trial. 2010. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, University of Antwerpen, 2010, p. 14-22. Conference paper (Other academic)
    Abstract [en]

    In view of the increasing need to facilitate processing the content of scientific papers, we present an annotation scheme for annotating full papers with zones of conceptualisation, reflecting the information structure and knowledge types which constitute a scientific investigation. The latter are the Core Scientific Concepts (CoreSCs) and include Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion. The CoreSC scheme has been used to annotate a corpus of 265 full papers in physical chemistry and biochemistry and we are currently automating the recognition of CoreSCs in papers. We discuss how the CoreSC scheme relates to other views of scientific papers and indeed how the former could be used to help identify negation and speculation in scientific texts.

  • 39.
    Velupillai, Sumithra
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Hassel, Martin
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Nilsson, Gunnar
    Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. 2009. In: International Journal of Medical Informatics, ISSN 1386-5056, E-ISSN 1872-8243, Vol. 78, no 12, p. e19-e26. Article in journal (Refereed)
    Abstract [en]

    Background

    Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive since the free text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI)-annotation trial for EPRs written in Swedish.

    Methods

    This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of an existing de-identification software written for American English to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure.

    Results

    This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting a de-identification software written for American English to Swedish directly was unfortunately non-trivial, yielding poor results.

    Conclusion

    Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions of identifiable information are needed, as well as further developments of both the tag sets and the annotation guidelines, in order to get a reliable gold standard. A completely new de-identification software needs to be developed.
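
    For readers unfamiliar with the measures reported above, a minimal sketch of precision, recall and F-measure over two annotation sets follows. It assumes exact-match scoring of (start, end, tag) spans and balanced F1; the matching criterion actually used in the study is not given in the abstract, and the tag names and offsets below are invented for illustration.

```python
from typing import Set, Tuple

# (start offset, end offset, tag); hypothetical representation of an annotation
Span = Tuple[int, int, str]


def precision_recall_f(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure (F1)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented annotations: the third predicted span has the right offsets but the wrong tag.
gold = {(0, 5, "Name"), (10, 14, "Age"), (20, 28, "Location"), (30, 34, "Name")}
pred = {(0, 5, "Name"), (10, 14, "Age"), (20, 28, "Name")}
p, r, f = precision_recall_f(gold, pred)
```

    The same function can be applied to two human annotators, treating one as the reference, which is one common way of reporting pairwise agreement as an F-measure.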

  • 40. Velupillai, Sumithra
    et al.
    Dalianis, Hercules
    Hassel, Martin
    Nilsson, Gunnar H.
    Developing a standard for de-identifying electronic patient records written in Swedish: Precision, recall and F-measure in a manual and computerized annotation trial. 2009. In: International Journal of Medical Informatics, ISSN 1386-5056, E-ISSN 1872-8243, Vol. 78, no 12, p. E19-E26. Article in journal (Refereed)
    Abstract [en]

    Background: Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive since the free text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI)-annotation trial for EPRs written in Swedish. Methods: This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of an existing de-identification software written for American English to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure. Results: This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting a de-identification software written for American English to Swedish directly was unfortunately non-trivial, yielding poor results. Conclusion: Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions of identifiable information are needed, as well as further developments of both the tag sets and the annotation guidelines, in order to get a reliable gold standard. A completely new de-identification software needs to be developed.

  • 41.
    Velupillai, Sumithra
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Factuality Levels of Diagnoses in Swedish Clinical Text. 2011. In: User Centred Networked Health Care - Proceedings of MIE 2011 / [ed] Anne Moen, Stig Kjær Andersen, Jos Aarts, Petter Hurlen, 2011, p. 559-563. Conference paper (Refereed)
    Abstract [en]

    Different levels of knowledge certainty, or factuality levels, are expressed in clinical health record documentation. This information is currently not fully exploited, as the subtleties expressed in natural language cannot easily be machine analyzed. Extracting relevant information from knowledge-intensive resources such as electronic health records can be used for improving health care in general by e.g. building automated information access systems. We present an annotation model of six factuality levels linked to diagnoses in Swedish clinical assessments from an emergency ward. Our main findings are that overall agreement is fairly high (0.7/0.58 F-measure, 0.73/0.6 Cohen's κ, Intra/Inter). These distinctions are important for knowledge models, since only approx. 50% of the diagnoses are affirmed with certainty. Moreover, our results indicate that there are patterns inherent in the diagnosis expressions themselves conveying factuality levels, showing that certainty is not only dependent on context cues.
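
    Cohen's κ, reported above alongside F-measure, corrects raw agreement for the agreement expected by chance. A minimal sketch of the standard two-annotator formula follows; the label names and annotations are invented (the paper uses six factuality levels), and nothing here reproduces the study's data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example with two hypothetical labels.
annotator_1 = ["certain", "certain", "possible", "possible", "certain", "possible"]
annotator_2 = ["certain", "possible", "possible", "possible", "certain", "certain"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

    Here the annotators agree on 4 of 6 items while chance agreement is 0.5, so κ is noticeably lower than the raw agreement rate.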

  • 42.
    Velupillai, Sumithra
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Duneld, Martin
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Henriksson, Aron
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Skeppstedt, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Louhi 2014: Special issue on health text mining and information analysis. 2015. Conference proceedings (editor) (Refereed)
  • 43.
    Velupillai, Sumithra
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Duneld, Martin
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Henriksson, Aron
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Skeppstedt, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Louhi 2014: Special issue on health text mining and information analysis: introduction. 2015. In: BMC Medical Informatics and Decision Making, E-ISSN 1472-6947, Vol. 2, no SI, p. 1-3. Article in journal (Refereed)
  • 44.
    Velupillai, Sumithra
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. King's College London, London, United Kingdom.
    Epstein, S.
    Bittar, A.
    Stephenson, T.
    Dutta, R.
    Downs, J.
    Identifying suicidal adolescents from mental health records using natural language processing. 2019. In: 17th World Congress on Medical and Health Informatics, MEDINFO 2019, IOS Press, 2019, Vol. 264, p. 413-417. Conference paper (Refereed)
    Abstract [en]

    Suicidal ideation is a risk factor for self-harm and completed suicide, and can be indicative of mental health issues. Adolescents are a particularly vulnerable group, but few studies have examined suicidal behaviour prevalence in large cohorts. Electronic Health Records (EHRs) are a rich source of secondary health care data that could be used to estimate prevalence. Most EHR documentation related to suicide risk is written in free text, thus requiring Natural Language Processing (NLP) approaches. We adapted and evaluated a simple lexicon- and rule-based NLP approach to identify suicidal adolescents from a large EHR database. We developed a comprehensive manually annotated EHR reference standard and assessed NLP performance at both document and patient level on data from 200 patients (~5000 documents). We achieved promising results (>80% F1 score at both document and patient level). Simple NLP approaches can be successfully used to identify patients who exhibit suicidal risk behaviour, and our proposed approach could be useful for other populations and settings.

  • 45.
    Velupillai, Sumithra
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Hadlaczky, Gergo
    Karolinska Inst, Natl Ctr Suicide Res & Prevent NASP, Dept Learning Informat Management & Eth LIME, Stockholm, Sweden.;Stockholm Hlth Care Serv SLSO, Natl Ctr Suicide Res & Prevent NASP, Ctr Hlth Econ Informat & Hlth Serv Res CHIS, Stockholm, Sweden..
    Baca-Garcia, Enrique
    IIS Jimenez Diaz Fdn, Dept Psychiat, Madrid, Spain.;Univ Autonoma Madrid, Dept Psychiat, Madrid, Spain.;Gen Hosp Villalba, Dept Psychiat, Madrid, Spain.;Carlos III Inst Hlth, CIBERSAM, Madrid, Spain.;Univ Hosp Rey Juan Carlos, Dept Psychiat, Mostoles, Spain.;Univ Hosp Infanta Elena, Dept Psychiat, Valdemoro, Spain.;Univ Catolica Maule, Dept Psychiat, Talca, Chile..
    Gorrell, Genevieve M.
    Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England..
    Werbeloff, Nomi
    UCL, Div Psychiat, London, England..
    Nguyen, Dong
    Alan Turing Inst, London, England.;Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland..
    Patel, Rashmi
    Kings Coll London, Inst Psychiat Psychol & Neurosci, London, England.;South London & Maudsley NHS Fdn Trust, London, England..
    Leightley, Daniel
    Kings Coll London, Inst Psychiat Psychol & Neurosci, London, England..
    Downs, Johnny
    Kings Coll London, Inst Psychiat Psychol & Neurosci, London, England.;South London & Maudsley NHS Fdn Trust, London, England..
    Hotopf, Matthew
    Kings Coll London, Inst Psychiat Psychol & Neurosci, London, England.;South London & Maudsley NHS Fdn Trust, London, England..
    Dutta, Rina
    Kings Coll London, Inst Psychiat Psychol & Neurosci, London, England.;South London & Maudsley NHS Fdn Trust, London, England..
    Risk Assessment Tools and Data-Driven Approaches for Predicting and Preventing Suicidal Behavior. 2019. In: Frontiers in Psychiatry, E-ISSN 1664-0640, Vol. 10, article id 36. Article in journal (Refereed)
    Abstract [en]

    Risk assessment of suicidal behavior is a time-consuming but notoriously inaccurate activity for mental health services globally. In the last 50 years a large number of tools have been designed for suicide risk assessment, and tested in a wide variety of populations, but studies show that these tools suffer from low positive predictive values. More recently, advances in research fields such as machine learning and natural language processing applied on large datasets have shown promising results for health care, and may enable an important shift in advancing precision medicine. In this conceptual review, we discuss established risk assessment tools and examples of novel data-driven approaches that have been used for identification of suicidal behavior and risk. We provide a perspective on the strengths and weaknesses of these applications to mental health-related data, and suggest research directions to enable improvement in clinical practice.

  • 46.
    Velupillai, Sumithra
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Hassel, Martin
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Dalianis, Hercules
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Finding the Parallel: Automatic Dictionary Construction and Identification of Parallel Text Pairs. 2010. In: Using Corpora in Contrastive and Translation Studies / [ed] Richard Xiao, Newcastle: Cambridge Scholars Publishing, 2010. Chapter in book (Other academic)
  • 47.
    Velupillai, Sumithra
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Ibrahim, Omran
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institute, Sweden.
    Functions for personal health records in Sweden – patient perspectives. 2013. In: Scandinavian Conference on Health Informatics 2013: Copenhagen, Denmark, August 20, 2013 / [ed] Gustav Bellika et al., Linköping: Linköping University Press, 2013, p. 95-95. Conference paper (Refereed)
    Abstract [en]

    As part of the ongoing shift in health policy, with focus on patient empowerment, the Swedish government prioritizes the patients’ access to their medical records. Different models for personal health records (PHR) are suggested.

    Studies have shown that patients have difficulty navigating and understanding the information in their records. Electronic health record systems are physician-oriented and do not include patient-oriented functions. One problem with medical records is that they contain a lot of data, usually kept as unstructured text in narrative form; this information overload needs to be structured and presented in a manner that patients understand. Furthermore, for the PHR to be a supporting tool for patients, there is a need to identify which key functions should be implemented. Usage of PHR is highly dependent on the information offered and on whether the available functions meet patient needs. In Sweden, little research has been conducted regarding PHR functions preferred by patients. This study addresses the research question “Which PHR functions are preferred by patients living in Sweden?”.

  • 48.
    Velupillai, Sumithra
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Fine-grained Certainty Level Annotations Used for Coarser-grained E-health Scenarios: Certainty Classification of Diagnostic Statements in Swedish Clinical Text. 2012. In: Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II / [ed] Alexander Gelbukh, Berlin/Heidelberg: Springer Berlin/Heidelberg, 2012, p. 450-461. Conference paper (Refereed)
    Abstract [en]

    An important task in information access methods is distinguishing factual information from speculative or negated information. Fine-grained certainty levels of diagnostic statements in Swedish clinical text are annotated in a corpus from a medical university hospital. The annotation model has two polarities (positive and negative) and three certainty levels. However, there are many e-health scenarios where such fine-grained certainty levels are not practical for information extraction. Instead, more coarse-grained groups are needed. We present three scenarios: adverse event surveillance, decision support alerts and automatic summaries and collapse the fine-grained certainty level classifications into coarser-grained groups. We build automatic classifiers for each scenario and analyze the results quantitatively. Annotation discrepancies are analyzed qualitatively through manual corpus analysis. Our main findings are that it is feasible to use a corpus of fine-grained certainty level annotations to build classifiers for coarser-grained real-world scenarios: 0.89, 0.91 and 0.8 F-score (overall average).

  • 49.
    Velupillai, Sumithra
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Mowery, D.
    Conway, M.
    Hurdle, J.
    Kious, B.
    Vocabulary development to support information extraction of substance abuse from psychiatry notes. 2016. In: BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing, Association for Computational Linguistics (ACL), 2016, p. 92-101. Conference paper (Refereed)
    Abstract [en]

    Extracting information from mental health records can be useful for large-scale clinical studies (e.g., to predict medication adherence or to understand medication effects) in this clinical specialty largely underserved by the Natural Language Processing (NLP) community. Vocabularies that contain medical terms for specific clinical use-cases, such as signs, symptoms, histories, social risk factors, are valuable resources for the development of NLP systems that aid clinicians in extracting information from text. Substance abuse is an important variable for many clinical use-cases, but, to our knowledge, there are no publicly available vocabularies that cover these types of terms. In this study, we apply and combine three methods for generating vocabularies related to substance abuse. We propose a simple and systematic method to generate highly relevant vocabularies and evaluate these vocabularies with respect to size and content, as well as coverage and relevance when applied to authentic psychiatric notes.

  • 50.
    Velupillai, Sumithra
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Mowery, Danielle
    Conway, Mike
    Hurdle, John
    Kious, Brent
    Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes. 2016. In: Proceedings of BioNLP 2016, Association for Computational Linguistics, 2016, p. 92-101. Conference paper (Refereed)
    Abstract [en]

    Extracting information from mental health records can be useful for large-scale clinical studies (e.g., to predict medication adherence or to understand medication effects) in this clinical specialty largely underserved by the Natural Language Processing (NLP) community. Vocabularies that contain medical terms for specific clinical use-cases, such as signs, symptoms, histories, social risk factors, are valuable resources for the development of NLP systems that aid clinicians in extracting information from text. Substance abuse is an important variable for many clinical use-cases, but, to our knowledge, there are no publicly available vocabularies that cover these types of terms. In this study, we apply and combine three methods for generating vocabularies related to substance abuse. We propose a simple and systematic method to generate highly relevant vocabularies and evaluate these vocabularies with respect to size and content, as well as coverage and relevance when applied to authentic psychiatric notes.
