Digitala Vetenskapliga Arkivet

Search results 1 - 50 of 141
  • 1.
    Abid, Nosheen
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Kovács, György
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Wedin, Jacob
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Paszkowsky, Nuria Agues
    Research Institutes of Sweden, Sweden.
    Shafait, Faisal
    Deep Learning Lab, National Center of Artificial Intelligence, National University of Sciences and Technology, Pakistan; School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    UCL: Unsupervised Curriculum Learning for Utility Pole Detection from Aerial Imagery (2022). In: Proceedings of the Digital Image Computing: Techniques and Applications (DICTA), IEEE, 2022. Conference paper (Refereed)
    Abstract [en]

    This paper introduces a machine learning-based approach for detecting electric poles, an essential part of power grid maintenance. With the increasing popularity of deep learning, several such approaches have been proposed for electric pole detection. However, most of them are supervised and require a large amount of labeled data, which is time-consuming and labor-intensive to produce. Unsupervised deep learning approaches have the potential to overcome this need for huge amounts of training data. This paper presents an unsupervised deep learning framework for utility pole detection. The framework combines a Convolutional Neural Network (CNN) and a clustering algorithm with a selection operation: the CNN extracts meaningful features from aerial imagery, the clustering algorithm generates pseudo labels for those features, and the selection operation filters out reliable samples with which the CNN is fine-tuned further. The fine-tuned model then replaces the initial CNN, improving the framework, and this process is repeated iteratively so that the model progressively learns the prominent patterns in the data. The presented framework is trained and tested on a small dataset of utility poles provided by “Mention Fuvex” (a Spanish company utilizing long-range drones for power line inspection). Our extensive experimentation demonstrates the progressive learning behavior of the proposed method and results in promising classification scores on the utility pole dataset, with a significance test yielding a p-value < 0.00005.

    Download full text (pdf)
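
The abstract above outlines an iterative loop of feature extraction, clustering into pseudo labels, reliable-sample selection, and fine-tuning. The following is a minimal, illustrative sketch of that kind of loop, not the authors' implementation: the tiny CNN, the closest-to-centre reliability criterion, and all names and sizes are assumptions.

```python
# Minimal sketch of an unsupervised-curriculum-learning-style loop (illustrative
# stand-in, not the paper's code): CNN features -> k-means pseudo labels ->
# keep the most reliable samples -> fine-tune the CNN -> repeat.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class TinyCNN(nn.Module):
    """Toy stand-in for the feature-extraction backbone."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        self.head = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.head(f), f                      # logits and feature vector

def ucl_iteration(model, images, keep_fraction=0.5, n_clusters=2):
    model.eval()
    with torch.no_grad():
        _, feats = model(images)
    feats = feats.numpy()
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)
    pseudo = torch.as_tensor(km.labels_, dtype=torch.long)
    # "Reliable" samples: those closest to their assigned cluster centre.
    dists = np.linalg.norm(feats - km.cluster_centers_[km.labels_], axis=1)
    keep = torch.from_numpy(dists <= np.quantile(dists, keep_fraction))
    # Fine-tune on the reliable subset using the pseudo labels.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    logits, _ = model(images[keep])
    loss = nn.functional.cross_entropy(logits, pseudo[keep])
    opt.zero_grad(); loss.backward(); opt.step()
    return model

if __name__ == "__main__":
    patches = torch.rand(64, 3, 32, 32)             # stand-in for aerial patches
    model = TinyCNN()
    for _ in range(3):                              # progressive refinement
        model = ucl_iteration(model, patches)
```
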
  • 2.
    Abid, Nosheen
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Deep Learning Lab, National Center of Artificial Intelligence, National University of Sciences and Technology, Pakistan; School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan.
    Malik, Muhammad Imran
    Deep Learning Lab, National Center of Artificial Intelligence, National University of Sciences and Technology, Pakistan; School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan.
    Shahzad, Muhammad
    Deep Learning Lab, National Center of Artificial Intelligence, National University of Sciences and Technology, Pakistan; School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan; Technical University of Munich (TUM), Munich, Germany.
    Shafait, Faisal
    Deep Learning Lab, National Center of Artificial Intelligence, National University of Sciences and Technology, Pakistan; School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan.
    Ali, Haider
    Engineering, TU, Kaiserslautern, Germany.
    Ghaffar, Muhammad Mohsin
    Johns Hopkins University, USA.
    Weis, Christian
    Johns Hopkins University, USA.
    Wehn, Norbert
    Johns Hopkins University, USA.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Burnt Forest Estimation from Sentinel-2 Imagery of Australia using Unsupervised Deep Learning (2021). In: Proceedings of the Digital Image Computing: Techniques and Applications (DICTA), IEEE, 2021, p. 74-81. Conference paper (Refereed)
    Abstract [en]

    Massive wildfires, not only in Australia but also worldwide, are burning millions of hectares of forests and green land, affecting the social, ecological, and economic situation. Widely used index-based threshold methods like the Normalized Burn Ratio (NBR) require a huge amount of data preprocessing and are specific to the data-capturing source. State-of-the-art deep learning models, on the other hand, are supervised and require domain experts' knowledge for labeling data in huge quantities. These limitations make existing models difficult to adapt to new variations in the data and capturing sources. In this work, we propose an unsupervised deep learning architecture that maps the burnt regions of forests by learning features progressively. The model considers small patches of satellite imagery and classifies them as burnt or not burnt. These small patches are concatenated into binary masks to segment out the burnt regions of the forests. The proposed system is composed of two modules: 1) a state-of-the-art deep learning architecture for feature extraction and 2) a clustering algorithm for generating pseudo labels to train the deep learning architecture. The proposed method is capable of learning features progressively in an unsupervised fashion from data with pseudo labels, reducing the exhausting effort of data labeling that requires expert knowledge. We used real-time Sentinel-2 data for training the model and mapping the burnt regions. The obtained F1 score of 0.87 demonstrates the effectiveness of the proposed model.

    Download attachment (pdf)
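
For contrast with the unsupervised approach above, the abstract also mentions the index-based baseline it aims to replace. Below is a hedged sketch of the Normalized Burn Ratio computed from Sentinel-2 reflectances; the band choice (B8 NIR, B12 SWIR) and the dNBR threshold are common conventions, not values taken from the paper.

```python
# Sketch of the index-based baseline mentioned in the abstract: NBR and a simple
# change (dNBR) threshold. Bands and threshold are conventional assumptions.
import numpy as np

def nbr(nir: np.ndarray, swir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalized Burn Ratio, e.g. Sentinel-2 (B8 - B12) / (B8 + B12)."""
    return (nir - swir) / (nir + swir + eps)

def burn_mask(nbr_pre: np.ndarray, nbr_post: np.ndarray, threshold: float = 0.27):
    """dNBR = pre-fire NBR minus post-fire NBR; high values suggest burnt pixels."""
    return (nbr_pre - nbr_post) > threshold
```
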
  • 3.
    Abid, Nosheen
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Noman, Md Kislu
    Centre for AI and ML, School of Science, Edith Cowan University, Joondalup, WA, Australia.
    Kovács, György
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Islam, Syed Mohammed Shamsul
    Centre for AI and ML, School of Science, Edith Cowan University, Joondalup, WA, Australia.
    Adewumi, Tosin
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. EISLAB Machine Learning, Luleå University of Technology, Luleå, Sweden.
    Lavery, Paul
    Centre for Marine Ecosystems Research, School of Sciences, Edith Cowan University, Joondalup, WA, Australia; Centro de Estudios Avanzados de Blanes, Consejo Superior de Investigaciones Cient´ ıficas, Blanes, Spain.
    Shafait, Faisal
    Deep Learning Lab, National Center of Artificial Intelligence, National University of Sciences and Technology, Islamabad, Pakistan.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Seagrass classification using unsupervised curriculum learning (UCL) (2024). In: Ecological Informatics, ISSN 1574-9541, E-ISSN 1878-0512, Vol. 83, article id 102804. Article in journal (Refereed)
    Abstract [en]

    Seagrass ecosystems are pivotal in marine environments, serving as crucial habitats for diverse marine species and contributing significantly to carbon sequestration. Accurate classification of seagrass species from underwater images is imperative for monitoring and preserving these ecosystems. This paper introduces Unsupervised Curriculum Learning (UCL) for seagrass classification using the DeepSeagrass dataset. UCL progressively learns from simpler to more complex examples, enhancing the model's ability to discern seagrass features in a curriculum-driven manner. Experiments employing state-of-the-art deep learning architectures, convolutional neural networks (CNNs), show that UCL achieved an overall precision of 90.12% and a recall of 89%, which significantly improves classification accuracy and robustness, outperforming some traditional supervised learning approaches like SimCLR and unsupervised approaches like Zero-shot CLIP. The UCL methodology involves four main steps: high-dimensional feature extraction, pseudo-label generation through clustering, reliable sample selection, and fine-tuning of the model. The iterative UCL framework refines the CNN's learning of underwater images, demonstrating superior accuracy, generalization, and adaptability to unseen seagrass and background samples of undersea images. The findings presented in this paper contribute to the advancement of seagrass classification techniques, providing valuable insights into the conservation and management of marine ecosystems. The code and dataset are made publicly available and can be accessed here: https://github.com/nabid69/Unsupervised-Curriculum-Learning—UCL.

     

    Download full text (pdf)
  • 4.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Brännvall, Rickard
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. RISE Research Institutes of Sweden.
    Abid, Nosheen
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Pahlavan, Maryam
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Sabah Sabry, Sana
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Småprat: DialoGPT for Natural Language Generation of Swedish Dialogue by Transfer Learning (2022). In: Proceedings of the Northern Lights Deep Learning Workshop 2022 / [ed] Sigurd Løkse, Benjamin Ricaud, Septentrio Academic Publishing, 2022, Vol. 3. Conference paper (Refereed)
    Abstract [en]

    Building open-domain conversational systems (or chatbots) that produce convincing responses is a recognized challenge. Recent state-of-the-art (SoTA) transformer-based models for the generation of natural language dialogue have demonstrated impressive performance in simulating human-like, single-turn conversations in English. This work investigates, through an empirical study, the potential for transfer learning of such models to the Swedish language. DialoGPT, an English-language pre-trained model, is adapted by training on three different Swedish conversational datasets obtained from publicly available sources: Reddit, Familjeliv and the GDC. Perplexity score (an automated intrinsic metric) and surveys by human evaluation were used to assess the performance of the fine-tuned models. We also compare the DialoGPT experiments with an attention-mechanism-based seq2seq baseline model trained on the GDC dataset. The results indicate that the capacity for transfer learning can be exploited with considerable success. Human evaluators asked to score the simulated dialogues judged over 57% of the chatbot responses to be human-like for the model trained on the largest (Swedish) dataset. The work agrees with the hypothesis that deep monolingual models learn some abstractions which generalize across languages. We contribute the code, datasets and model checkpoints and host the demos on the HuggingFace platform.
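
As a concrete illustration of the intrinsic metric mentioned in the abstract above, the sketch below computes perplexity for a causal language model with the HuggingFace transformers API. The checkpoint name and the Swedish prompt are placeholders, not the paper's fine-tuned models.

```python
# Hedged sketch: perplexity of a DialoGPT-style causal LM on one piece of text.
# Checkpoint and example text are illustrative placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_name: str, text: str) -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids the model returns the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

print(perplexity("microsoft/DialoGPT-medium", "Hej, hur mår du idag?"))
```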

  • 5.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Conversational Systems in Machine Learning from the Point of View of the Philosophy of Science—Using Alime Chat and Related Studies (2019). In: Philosophies, ISSN 2409-9287, Vol. 4, no 3, article id 41. Article in journal (Refereed)
    Abstract [en]

    This essay discusses current research efforts in conversational systems from the philosophy of science point of view and evaluates some conversational systems research activities from the standpoint of naturalism philosophical theory. Conversational systems or chatbots have advanced over the decades and now have become mainstream applications. They are software that users can communicate with, using natural language. Particular attention is given to the Alime Chat conversational system, already in industrial use, and the related research. The competitive nature of systems in production is a result of different researchers and developers trying to produce new conversational systems that can outperform previous or state-of-the-art systems. Different factors affect the quality of the conversational systems produced, and how one system is assessed as being better than another is a function of objectivity and of the relevant experimental results. This essay examines the research practices from, among others, Longino’s view on objectivity and Popper’s stand on falsification. Furthermore, the need for qualitative and large datasets is emphasized. This is in addition to the importance of the peer-review process in scientific publishing, as a means of developing, validating, or rejecting theories, claims, or methodologies in the research community. In conclusion, open data and open scientific discussion fora should become more prominent over the mere publication-focused trend.

  • 6.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Corpora Compared: The Case of the Swedish Gigaword & Wikipedia Corpora (2020). Conference paper (Refereed)
    Abstract [en]

    In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to other factors besides data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, broadness of covered domain and noise can play important roles. We evaluate embeddings based on two Swedish corpora: The Gigaword and Wikipedia, in analogy (intrinsic) tests and discover that the embeddings from the Wikipedia corpus generally outperform those from the Gigaword corpus, which is a bigger corpus. Downstream tests will be required to have a definite evaluation.

    Download full text (pdf)
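
The intrinsic analogy test referred to above can be run, for example, with gensim; the file names below are placeholders for the embeddings and the analogy set, not artifacts from the paper.

```python
# Hedged sketch of an intrinsic analogy evaluation with gensim.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("sv_wikipedia_vectors.vec", binary=False)
score, sections = kv.evaluate_word_analogies("swedish_analogies.txt")
print(f"Overall analogy accuracy: {score:.3f}")
```
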
  • 7.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Exploring Swedish & English fastText Embeddings (2022). In: Artificial Intelligence and Cognition 2022: Proceedings of the 8th International Workshop on Artificial Intelligence and Cognition / [ed] Hadi Banaee, Amy Loutfi, Alessandro Saffiotti, Antonio Lieto, 2022, Vol. 3400, p. 201-208. Conference paper (Refereed)
    Abstract [en]

    In this paper, we show that embeddings from relatively smaller corpora sometimes outperform those from larger corpora and we introduce a new Swedish analogy test set and make it publicly available. To achieve good performance in Natural Language Processing (NLP) downstream tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We utilize the fastText tool for our experiments. We evaluate both the Swedish and English embeddings that we created using intrinsic evaluation (including analogy & Spearman correlation) and compare them with 2 common, publicly available embeddings. Our English continuous Bag-of-Words (CBoW)-negative sampling embedding shows better performance compared to the publicly available GoogleNews version. We also describe the relationship between NLP and cognitive science. We contribute the embeddings for research or other useful purposes by publicly releasing them.

    Download full text (pdf)
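
A minimal sketch of training CBoW fastText embeddings with negative sampling and character n-grams, of the kind evaluated above, using gensim. The toy corpus and all hyper-parameter values are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: CBoW (sg=0) fastText embeddings with negative sampling and
# character n-grams. All hyper-parameter values are illustrative only.
from gensim.models import FastText

sentences = [["kungen", "bor", "i", "stockholm"],
             ["drottningen", "bor", "i", "stockholm"]]   # toy corpus
model = FastText(sentences, vector_size=100, window=5, sg=0,
                 negative=5, min_n=3, max_n=6, epochs=10, min_count=1)
print(model.wv.most_similar("kungen", topn=3))
```
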
  • 8.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Exploring Swedish & English fastText Embeddings for NER with the Transformer. Manuscript (preprint) (Other academic)
    Abstract [en]

    In this paper, our main contributions are showing that embeddings from relatively smaller corpora can outperform ones from far larger corpora, and presenting a new Swedish analogy test set. To achieve good network performance in natural language processing (NLP) downstream tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We show that, with the right set of hyper-parameters, good network performance can be reached even on smaller datasets. We evaluate the embeddings at the intrinsic and extrinsic levels, by deploying them with the Transformer in a named entity recognition (NER) task, and conduct significance tests. This is done for both Swedish and English. We obtain better performance in both languages on the downstream task with far smaller training data, compared to recently released common crawl versions; and character n-grams appear useful for Swedish, a morphologically rich language.

  • 9.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Vector Representations of Idioms in Conversational Systems (2022). In: Sci, E-ISSN 2413-4155, Vol. 4, no 4, article id 37. Article in journal (Refereed)
    Abstract [en]

    In this study, we demonstrate that an open-domain conversational system trained on idioms or figurative language generates more fitting responses to prompts containing idioms. Idioms are a part of everyday speech in many languages and across many cultures, but they pose a great challenge for many natural language processing (NLP) systems that involve tasks such as information retrieval (IR), machine translation (MT), and conversational artificial intelligence (AI). We utilized the Potential Idiomatic Expression (PIE)-English idiom corpus for the two tasks that we investigated: classification and conversation generation. We achieved a state-of-the-art (SoTA) result of a 98% macro F1 score on the classification task by using the SoTA T5 model. We experimented with three instances of the SoTA dialogue model—the Dialogue Generative Pre-trained Transformer (DialoGPT)—for conversation generation. Their performances were evaluated by using the automatic metric, perplexity, and a human evaluation. The results showed that the model trained on the idiom corpus generated more fitting responses to prompts containing idioms 71.9% of the time in comparison with a similar model that was not trained on the idiom corpus. We have contributed the model checkpoint/demo/code to the HuggingFace hub for public access.

  • 10.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Word2Vec: Optimal hyperparameters and their impact on natural language processing downstream tasks (2022). In: Open Computer Science, E-ISSN 2299-1093, Vol. 12, no 1, p. 134-141. Article in journal (Refereed)
    Abstract [en]

    Word2Vec is a prominent model for natural language processing tasks. Similar inspiration is found in the distributed embeddings (word-vectors) of recent state-of-the-art deep neural networks. However, the wrong combination of hyperparameters can produce embeddings of poor quality. The objective of this work is to show empirically that an optimal combination of Word2Vec hyperparameters exists and to evaluate various combinations. We compare them with the publicly released, original Word2Vec embedding. Both intrinsic and extrinsic (downstream) evaluations are carried out, including named entity recognition and sentiment analysis. Our main contributions include showing that the best model is usually task-specific, that high analogy scores do not necessarily correlate positively with F1 scores, and that performance is not dependent on data size alone. If ethical considerations to save time, energy, and the environment are made, then relatively smaller corpora may do just as well or even better in some cases. Increasing the dimension size of embeddings after a point leads to poor quality or performance. In addition, using a relatively small corpus, we obtain better WordSim scores, corresponding Spearman correlation, and better downstream performances (with significance tests) compared to the original model, which is trained on a 100 billion-word corpus.

  • 11.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks. Manuscript (preprint) (Other academic)
    Abstract [en]

    Word2Vec is a prominent model for natural language processing (NLP) tasks. Similar inspiration is found in distributed embeddings for new state-of-the-art (SotA) deep neural networks. However, the wrong combination of hyper-parameters can produce poor-quality vectors. The objective of this work is to show empirically that an optimal combination of hyper-parameters exists and to evaluate various combinations. We compare them with the released, pre-trained original word2vec model. Both intrinsic and extrinsic (downstream) evaluations, including named entity recognition (NER) and sentiment analysis (SA), were carried out. The downstream tasks reveal that the best model is usually task-specific, that high analogy scores do not necessarily correlate positively with F1 scores, and that the same applies to focusing on data alone. Increasing the vector dimension size after a point leads to poor quality or performance. If ethical considerations to save time, energy and the environment are made, then reasonably smaller corpora may do just as well or even better in some cases. In addition, using a small corpus, we obtain better human-assigned WordSim scores, corresponding Spearman correlation and better downstream performances (with significance tests) compared to the original model, trained on a 100 billion-word corpus.

  • 12.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Inner For-Loop for Speeding Up Blockchain Mining (2020). In: Open Computer Science, E-ISSN 2299-1093, Vol. 10, no 1, p. 42-47. Article in journal (Refereed)
    Abstract [en]

    In this paper, the authors propose to increase the efficiency of blockchain mining by using a population-based approach. Blockchain relies on solving difficult mathematical problems as proof-of-work within a network before blocks are added to the chain. The brute-force approach, advocated by some as the fastest algorithm for solving partial hash collisions and implemented in the Bitcoin blockchain, implies exhaustive, sequential search: the nonce (number) in the header is incremented by one, a double SHA-256 hash is taken at each instance, and the result is compared with a target value to ascertain whether it is lower than that target. This excessively consumes both time and power. The authors therefore suggest using an inner for-loop for the population-based approach. Comparison shows that it is a slightly faster approach than brute force, with an average speed advantage of about 1.67%, or 3,420 iterations per second, performing better 73% of the time. We also observed that the more particles deployed in total, the better the performance, up to a pivotal point. Furthermore, a recommendation is made for taming the excessive power use of networks like Bitcoin’s by using penalty by consensus.

    Download full text (pdf)
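
The brute-force proof-of-work search described in the abstract above can be illustrated as follows; the header bytes, nonce width, and target are toy assumptions and do not reproduce Bitcoin's real header layout or the paper's experiments.

```python
# Toy sketch of brute-force proof-of-work: increment a nonce, double-SHA-256,
# compare against a target. Not Bitcoin's actual 80-byte header format.
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, target: int, max_nonce: int = 10_000_000):
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(8, "little"))
        if int.from_bytes(digest, "big") < target:      # hash below target: solved
            return nonce, digest.hex()
    return None, None

nonce, digest = mine(b"toy-block-header", target=2**236)  # deliberately easy target
print(nonce, digest)
```
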
  • 13.
    Adewumi, Oluwatosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Sabry, Sana Sabah
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Abid, Nosheen
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    T5 for Hate Speech, Augmented Data, and Ensemble (2023). In: Sci, E-ISSN 2413-4155, Vol. 5, no 4, article id 37. Article in journal (Refereed)
    Abstract [en]

    We conduct relatively extensive investigations of automatic hate speech (HS) detection using different State-of-The-Art (SoTA) baselines across 11 subtasks spanning six different datasets. Our motivation is to determine which of the recent SoTA models is best for automatic hate speech detection and what advantage methods, such as data augmentation and ensemble, may have on the best model, if any. We carry out six cross-task investigations. We achieve new SoTA results on two subtasks—macro F1 scores of 91.73% and 53.21% for subtasks A and B of the HASOC 2020 dataset, surpassing previous SoTA scores of 51.52% and 26.52%, respectively. We achieve near-SoTA results on two others—macro F1 scores of 81.66% for subtask A of the OLID 2019 and 82.54% for subtask A of the HASOC 2021, in comparison to SoTA results of 82.9% and 83.05%, respectively. We perform error analysis and use two eXplainable Artificial Intelligence (XAI) algorithms (Integrated Gradient (IG) and SHapley Additive exPlanations (SHAP)) to reveal how two of the models (Bi-Directional Long Short-Term Memory Network (Bi-LSTM) and Text-to-Text-Transfer Transformer (T5)) make the predictions they do by using examples. Other contributions of this work are: (1) the introduction of a simple, novel mechanism for correcting Out-of-Class (OoC) predictions in T5, (2) a detailed description of the data augmentation methods, and (3) the revelation of the poor data annotations in the HASOC 2021 dataset by using several examples and XAI (buttressing the need for better quality control). We publicly release our model checkpoints and codes to foster transparency.

    Download full text (pdf)
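
The T5 model used above treats classification as text-to-text generation; the sketch below shows that generic pattern with the HuggingFace API. The prompt wording, label strings, and the base "t5-small" checkpoint are assumptions for illustration, not the authors' released fine-tuned models or their out-of-class correction mechanism.

```python
# Hedged sketch of text-to-text classification with T5: the model generates the
# label as text. Prompt format and labels are illustrative; a fine-tuned
# checkpoint would be needed for meaningful hate-speech predictions.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def classify(text: str) -> str:
    inputs = tok("classify hate speech: " + text, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=3)
    return tok.decode(out[0], skip_special_tokens=True)   # e.g. "HOF" / "NOT" after fine-tuning

print(classify("I really enjoyed the concert last night."))
```
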
  • 14.
    Adewumi, Tosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Masakhane.
    Adeyemi, Mofetoluwa
    Masakhane.
    Anuoluwapo, Aremu
    Masakhane.
    Peters, Bukola
    CIS.
    Buzaaba, Happy
    Masakhane.
    Samuel, Oyerinde
    Masakhane.
    Rufai, Amina Mardiyyah
    Masakhane.
    Ajibade, Benjamin
    Masakhane.
    Gwadabe, Tajudeen
    Masakhane.
    Koulibaly Traore, Mory Moussou
    Masakhane.
    Ajayi, Tunde Oluwaseyi
    Masakhane.
    Muhammad, Shamsuddeen
    Baruwa, Ahmed
    Masakhane.
    Owoicho, Paul
    Masakhane.
    Ogunremi, Tolulope
    Masakhane.
    Ngigi, Phylis
    Jomo Kenyatta University of Agriculture and Technology.
    Ahia, Orevaoghene
    Masakhane.
    Nasir, Ruqayya
    Masakhane.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages (2023). In: IJCNN 2023 - International Joint Conference on Neural Networks, Conference Proceedings, Institute of Electrical and Electronics Engineers Inc., 2023. Conference paper (Refereed)
    Abstract [en]

    Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. There are a total of 9,000 turns, each language having 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating & analyzing the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.

  • 15.
    Adewumi, Tosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Alkhaled, Lama
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Mokayed, Hamam
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    ML_LTU at SemEval-2022 Task 4: T5 Towards Identifying Patronizing and Condescending Language (2022). In: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) / [ed] Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan, Association for Computational Linguistics, 2022, p. 473-478. Conference paper (Refereed)
    Abstract [en]

    This paper describes the system used by the Machine Learning Group of LTU in subtask 1 of the SemEval-2022 Task 4: Patronizing and Condescending Language (PCL) Detection. Our system consists of finetuning a pretrained text-to-text transfer transformer (T5) and innovatively reducing its out-of-class predictions. The main contributions of this paper are 1) the description of the implementation details of the T5 model we used, 2) analysis of the successes & struggles of the model in this task, and 3) ablation studies beyond the official submission to ascertain the relative importance of data split. Our model achieves an F1 score of 0.5452 on the official test set.

  • 16.
    Adewumi, Tosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    State-of-the-Art in Open-Domain Conversational AI: A Survey (2022). In: Information, E-ISSN 2078-2489, Vol. 13, no 6, article id 298. Article, review/survey (Refereed)
    Abstract [en]

    We survey SoTA open-domain conversational AI models with the objective of presenting the prevailing challenges that still exist to spur future research. In addition, we provide statistics on the gender of conversational AI in order to guide the ethics discussion surrounding the issue. Open-domain conversational AI models are known to have several challenges, including bland, repetitive responses and performance degradation when prompted with figurative language, among others. First, we provide some background by discussing some topics of interest in conversational AI. We then discuss the method applied to the two investigations carried out that make up this study. The first investigation involves a search for recent SoTA open-domain conversational AI models, while the second involves a search for 100 conversational AI systems to assess their gender. Results of the survey show that progress has been made with recent SoTA conversational AI, but there are still persistent challenges that need to be solved, and the female gender is more common than the male for conversational AI. One main takeaway is that hybrid models of conversational AI offer more advantages than any single architecture. The key contributions of this survey are (1) the identification of prevailing challenges in SoTA open-domain conversational AI, (2) the rarely held discussion on open-domain conversational AI for low-resource languages, and (3) the discussion about the ethics surrounding the gender of conversational AI.

  • 17.
    Adewumi, Tosin P.
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    The Challenge of Diacritics in Yorùbá Embeddings (2020). In: ML4D 2020 Proceedings / [ed] Tejumade Afonja; Konstantin Klemmer; Aya Salama; Paula Rodriguez Diaz; Niveditha Kalavakonda; Oluwafemi Azeez, Neural Information Processing Systems Foundation, 2020, article id 2011.07605. Conference paper (Refereed)
    Abstract [en]

    The major contributions of this work include the empirical establishment of better performance for Yoruba embeddings from an undiacritized (normalized) dataset and the provision of new analogy sets for evaluation. The Yoruba language, being a tonal language, utilizes diacritics (tonal marks) in written form. We show that this affects embedding performance by creating embeddings from exactly the same Wikipedia dataset, but with the second one normalized to be undiacritized. We further compare average intrinsic performance with two other works (using an analogy test set & WordSim) and obtain the best performance in WordSim and the corresponding Spearman correlation.

    Download full text (pdf)
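
A minimal sketch of the kind of normalization the abstract describes, removing tonal diacritics via Unicode decomposition; this is a generic approach, not necessarily the authors' exact preprocessing.

```python
# Hedged sketch: produce an undiacritized (normalized) version of Yorùbá text by
# decomposing characters and dropping combining marks.
import unicodedata

def strip_diacritics(text: str) -> str:
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Yorùbá ọmọdé"))   # -> "Yoruba omode"
```
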
  • 18.
    Adewumi, Tosin P.
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Vector Representations of Idioms in Chatbots (2020). In: Proceedings: SAIS Workshop 2020, Chalmers University of Technology, 2020. Conference paper (Refereed)
    Abstract [en]

    Open-domain chatbots have advanced but still have many gaps. My PhD aims to solve a few of those gaps by creating vector representations of idioms (figures of speech) that will be beneficial to chatbots and natural language processing (NLP), generally. In the process, new, optimal fastText embeddings in Swedish and English have been created and the first Swedish analogy test set, larger than the Google original, for intrinsic evaluation of Swedish embeddings has also been produced. Major milestones have been attained and others are soon to follow. The deliverables of this project will give NLP researchers the opportunity to measure the quality of Swedish embeddings easily and advance state-of-the-art (SotA) in NLP.

    Download full text (pdf)
  • 19.
    Adewumi, Tosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Södergren, Isabella
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Digital Services and Systems.
    Alkhaled, Lama
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Sabry, Sana Sabah
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Bipol: Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets (2023). In: Proceedings of Recent Advances in Natural Language Processing / [ed] Galia Angelova, Maria Kunilovskaya and Ruslan Mitkov, Incoma Ltd., 2023, p. 1-10. Conference paper (Refereed)
    Abstract [en]

    We investigate five English NLP benchmark datasets (on the superGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Question (Boolq), CommitmentBank (CB), Winograd Schema Challenge (WSC), Winogender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful and it is known to be common in data, which ML models learn from. In order to mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to estimate and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large Swedish bias-labeled dataset (of 2 million samples), translated from the English version and train the SotA mT5 model on it. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We make the codes, model, and new dataset publicly available.

  • 20.
    Adewumi, Tosin
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Vadoodi, Roshanak
    Luleå University of Technology, Department of Civil, Environmental and Natural Resources Engineering, Geosciences and Environmental Engineering.
    Tripathy, Aparajita
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Nikolaidou, Konstantina
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms (2022). In: Proceedings of the 13th Language Resources and Evaluation Conference / [ed] Nicoletta Calzolari; Frédéric Béchet; Philippe Blache; Khalid Choukri; Christopher Cieri; Thierry Declerck; Sara Goggi; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Hélène Mazo; Jan Odijk; Stelios Piperidis, European Language Resources Association (ELRA), 2022, p. 689-696. Conference paper (Refereed)
    Abstract [en]

    We present a fairly large Potential Idiomatic Expression (PIE) dataset for Natural Language Processing (NLP) in English. The challenges that NLP systems face with tasks such as Machine Translation (MT), word sense disambiguation (WSD) and information retrieval make it imperative to have a labelled idioms dataset with classes, such as the one presented in this work. To the best of the authors’ knowledge, this is the first idioms corpus with classes of idioms beyond the literal and general idioms classification. In particular, the following classes are labelled in the dataset: metaphor, simile, euphemism, parallelism, personification, oxymoron, paradox, hyperbole, irony and literal. We obtain an overall inter-annotator agreement (IAA) score, between two independent annotators, of 88.89%. Many past efforts have been limited in corpus size and classes of samples, but this dataset contains over 20,100 samples with almost 1,200 cases of idioms (with their meanings) from 10 classes (or senses). The corpus may also be extended by researchers to meet specific needs. The corpus has part-of-speech (PoS) tags from the NLTK library. Classification experiments performed on the corpus to obtain a baseline and a comparison among three common models, including the state-of-the-art (SoTA) BERT model, give good results. We also make the corpus and the relevant codes for working with it for NLP tasks publicly available.
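
The abstract notes that the corpus carries part-of-speech tags produced with NLTK; a minimal sketch of that step is shown below. The example idiom is illustrative and not taken from the corpus.

```python
# Hedged sketch: NLTK tokenization and part-of-speech tagging of a sentence
# containing a potential idiomatic expression.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("He kicked the bucket last night.")
print(nltk.pos_tag(tokens))   # [('He', 'PRP'), ('kicked', 'VBD'), ...]
```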

  • 21.
    Agües Paszkowsky, Núria
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Research Institutes of Sweden, Unit for Data Center Systems and Applied Data Science, Sweden.
    Brännvall, Rickard
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Research Institutes of Sweden, Unit for Data Center Systems and Applied Data Science, Sweden.
    Carlstedt, Johan
    Research Institutes of Sweden, Unit for Data Center Systems and Applied Data Science, Sweden.
    Milz, Mathias
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Space Technology.
    Kovács, György
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Vegetation and Drought Trends in Sweden’s Mälardalen Region – Year-on-Year Comparison by Gaussian Process Regression (2020). In: 2020 Swedish Workshop on Data Science (SweDS), IEEE, 2020. Conference paper (Refereed)
    Abstract [en]

    This article describes analytical work carried out in a pilot project for the Swedish Space Data Lab (SSDL), which focused on monitoring drought in the Mälardalen region in central Sweden. The Normalized Difference Vegetation Index (NDVI) and the Moisture Stress Index (MSI) – commonly used to analyse drought – are estimated from Sentinel-2 satellite data and averaged over a selection of seven grassland areas of interest. To derive a complete time series over a season that interpolates over days with missing data, we use Gaussian Process Regression, a technique from multivariate Bayesian analysis. The analysis shows significant differences, at 95% confidence, for five out of seven areas when comparing the peak drought period in the dry year 2018 with the corresponding period in 2019. A cross-validation analysis indicates that the model parameter estimates are robust for the temporal covariance structure (while inconclusive for the spatial dimensions). There were no signs of over-fitting when comparing in-sample and out-of-sample RMSE.
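
As an illustration of the interpolation step described above, the sketch below fits a Gaussian Process to sparse, area-averaged NDVI observations and predicts a daily series with uncertainty. The kernel, its parameters, and the toy observations are assumptions, not the project's actual model.

```python
# Hedged sketch: Gaussian Process Regression over a sparse NDVI time series,
# interpolating days without satellite observations. Toy data and kernel choice.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

days = np.array([[5.0], [20.0], [47.0], [80.0], [110.0]])   # days with usable scenes
ndvi = np.array([0.31, 0.42, 0.55, 0.38, 0.33])              # area-averaged NDVI

gp = GaussianProcessRegressor(kernel=RBF(length_scale=20.0) + WhiteKernel(1e-3),
                              normalize_y=True)
gp.fit(days, ndvi)

grid = np.arange(0, 121, dtype=float).reshape(-1, 1)         # every day of the season
mean, std = gp.predict(grid, return_std=True)                # ~95% band: mean ± 1.96*std
```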

  • 22.
    Ahmad, Riaz
    et al.
    Afzal, Muhammad Zeshan
    Rashid, Sheikh Faisal
    Liwicki, Marcus
    DFKI Kaiserslautern, Germany .
    Breuel, Thomas
    Scale and Rotation Invariant OCR for Pashto Cursive Script using MDLSTM Network (2015). In: 13th International Conference on Document Analysis and Recognition, IEEE, 2015, p. 1101-1105. Conference paper (Refereed)
    Abstract [en]

    Optical Character Recognition (OCR) of cursive scripts like Pashto and Urdu is difficult due to the presence of complex ligatures and connected writing styles. In this paper, we evaluate and compare different approaches for the recognition of such complex ligatures. The approaches include the Hidden Markov Model (HMM), the Long Short-Term Memory (LSTM) network and the Scale Invariant Feature Transform (SIFT). The current state of the art in cursive script recognition assumes constant scale without any rotation, while real-world data contain rotation and scale variations. This research aims to evaluate the performance of sequence classifiers like HMM and LSTM and compare them with a descriptor-based classifier like SIFT. In addition, we assess the performance of these methods against scale and rotation variations in cursive script ligatures. Moreover, we introduce a database of 480,000 images containing 1,000 unique ligatures or sub-words of Pashto. In this database, each ligature has 40 scale and 12 rotation variations. The evaluation results show a significantly improved performance of LSTM over HMM and a traditional feature extraction technique such as SIFT.

    Download full text (pdf)
  • 23.
    Ahmad, Riaz
    et al.
    DFKI, Kaiserslautern, Germany.
    Afzal, Muhammad Zeshan
    DFKI, Kaiserslautern, Germany.
    Rashid, Sheikh Faisal
    DFKI, Kaiserslautern, Germany.
    Liwicki, Marcus
    University in Fribourg, Switzerland.
    Dengel, Andreas
    DFKI, Kaiserslautern, Germany.
    Breuel, Thomas
    TU, Kaiserslautern, Germany.
    Recognizable units in Pashto language for OCR (2015). In: 13th International Conference on Document Analysis and Recognition, IEEE, 2015, p. 1246-1250. Conference paper (Refereed)
    Abstract [en]

    Atomic segmentation of cursive scripts into constituent characters is one of the most challenging problems in pattern recognition. To avoid segmentation in cursive script, concrete shapes are considered as recognizable units. Therefore, the objective of this work is to find the alternative recognizable units in Pashto cursive script. These alternatives are ligatures and primary ligatures. However, we need sound statistical analysis to find the appropriate numbers of ligatures and primary ligatures in Pashto script. In this work, a corpus of 2,313,736 Pashto words is extracted from large-scale, diversified web sources, and a total of 19,268 unique ligatures have been identified in Pashto cursive script. Analysis shows that only 7,000 ligatures represent 91% of the overall corpus of unique Pashto words. Similarly, about 7,681 primary ligatures are also identified, which represent the basic shapes of all the ligatures.

    Download full text (pdf)
  • 24.
    Ahmad, Riaz
    et al.
    Shaheed Banazir Bhutto University, Sheringal, Pakistan.
    Naz, Saeeda
    Computer Science Department, GGPGC No.1 Abbottabad, Pakistan.
    Afzal, Muhammad
    Mindgarage, University of Kaiserslautern, Germany.
    Rashid, Sheikh
    Al Khwarizmi Institute of Computer Science, UET Lahore, Pakistan.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Dengel, Andreas
    German Research Center for Artificial Intelligence (DFKI) in Kaiserslautern, Germany.
    A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT (2020). In: The International Arab Journal of Information Technology, ISSN 1683-3198, Vol. 17, no 3, p. 299-305. Article in journal (Refereed)
    Abstract [en]

    This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. This paper contributes mainly in three aspects, i.e., (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step includes pruning of extra white spaces and de-skewing of skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflammation. The data augmentation combined with the deep learning approach achieves a promising improvement in results, gaining an 80.02% Character Recognition (CR) rate over the 75.08% baseline.
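
The abstract pairs a recurrent architecture with Connectionist Temporal Classification; the sketch below shows CTC training of a one-dimensional bidirectional LSTM text-line recognizer in PyTorch. It is a simplified stand-in for the paper's MDLSTM, and all shapes and sizes are assumptions.

```python
# Hedged sketch: bidirectional LSTM + CTC loss for text-line recognition
# (a simplified stand-in for MDLSTM; shapes and alphabet size are illustrative).
import torch
import torch.nn as nn

T, N, C = 100, 4, 60                 # time steps, batch size, alphabet + CTC blank
lstm = nn.LSTM(input_size=48, hidden_size=128, bidirectional=True)
head = nn.Linear(2 * 128, C)
ctc = nn.CTCLoss(blank=0)

features = torch.randn(T, N, 48)                 # per-column features of a text line
targets = torch.randint(1, C, (N, 25))           # target character indices (0 = blank)
log_probs = head(lstm(features)[0]).log_softmax(dim=-1)   # (T, N, C), as CTCLoss expects
loss = ctc(log_probs, targets,
           torch.full((N,), T), torch.full((N,), 25))     # input and target lengths
loss.backward()
```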

  • 25.
    Ahmed, Muhammad
    et al.
    Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; Mindgrage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany.
    Hashmi, Khurram Azeem
    Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; Mindgrage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany .
    Pagani, Alain
    German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Stricker, Didier
    Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany .
    Afzal, Muhammad Zeshan
    Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; Mindgrage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany.
    Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments (2021). In: Sensors, E-ISSN 1424-8220, Vol. 21, no 15. Article, review/survey (Refereed)
    Abstract [en]

    Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of the generic object detection system decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development in generic object detection algorithms, notable progress has been observed in the field of deep learning-based object detection in challenging environments. However, there is no consolidated reference to cover the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview, covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on the three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.

  • 26.
    Al-Azzawi, Sana
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Kovács, György
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Nilsson, Filip
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Adewumi, Tosin
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset2023In: 17th International Workshop on Semantic Evaluation, SemEval 2023: Proceedings of the Workshop, Association for Computational Linguistics, 2023, p. 1421-1427Conference paper (Refereed)
  • 27.
    Al-Azzawi, Sana Sabah Sabry
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Kovács, György
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Mokayed, Hamam
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Chronéer, Diana
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Digital Services and Systems.
    Liwicki, Foteini
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Innovative Education Approach Toward Active Distance Education: a Case Study in the Introduction to AI course2022In: Conference Proceedings. The Future of Education 2022, 2022Conference paper (Refereed)
    Abstract [en]

    In this paper, we first describe various synchronous and asynchronous methods for enhancing student engagement in large online courses. We showcase the implementation of these methods in the “Introduction to Artificial Intelligence (AI)” course at Luleå University of Technology, which has attracted around 500 students in each of its iterations (twice yearly, since 2019). We also show that these methods can be applied efficiently in terms of the teaching hours required. With the increase in digitization and student mobility, the demand for improved and personalized content delivery for distance education has also increased. This applies not only in the context of traditional undergraduate education, but also in the context of adult education and lifelong learning. This higher level of demand, however, introduces a challenge, especially as it is typically combined with a shortage of staff and the need for efficient education. This challenge is further amplified by the current pandemic situation, which has led to an even greater risk of student dropout. To mitigate this risk, as well as to meet the increased demand, we applied various methods for creating engaging interaction in our pedagogy based on Moore’s framework: learner-to-learner, learner-to-instructor, and learner-to-content engagement strategies. The main methods of this pedagogy are as follows: short, interactive videos; active discussions in topic-based forums; regular live sessions with group discussions; and the introduction of optional content at many points in the course, to address different target groups. In this paper, we show how we originally designed and continuously improved the course, without requiring more than 500 teaching hours per iteration (one hour per enrolled student), while we also managed to increase the successful completion rate of the participants by 10% and improve student engagement and feedback for the course by 50%. We intend to share a set of best practices applicable to many other e-learning courses in ICT.

  • 28.
    Alberti, M.
    et al.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Pondenkandath, V.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Wursch, M.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Ingold, R.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments2018In: Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR 2018, IEEE, 2018, p. 423-428, article id 8583798Conference paper (Refereed)
    Abstract [en]

    We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source and accessible as a Web Service through DIVAServices.
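    DeepDIVA's own interface is not reproduced here; the snippet below is only a generic sketch of the kind of bookkeeping such a framework automates (fixing random seeds and logging run metadata), with illustrative function names rather than DeepDIVA's actual API.

    ```python
    # Generic sketch of experiment bookkeeping for reproducibility: seed all
    # random number generators and record run metadata to a JSON file.
    # Function names and file layout are illustrative, not DeepDIVA's API.
    import json, random, time
    import numpy as np
    import torch

    def set_reproducible(seed=42):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    def log_run(params, out_file="run_metadata.json"):
        record = {"timestamp": time.time(), "params": params,
                  "torch_version": torch.__version__}
        with open(out_file, "w") as f:
            json.dump(record, f, indent=2)

    set_reproducible(1234)
    log_run({"lr": 1e-3, "model": "cnn_baseline", "epochs": 10})
    ```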

  • 29.
    Alberti, Michele
    et al.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland; V7 Ltd, London, United Kingdom.
    Botros, Angela
    ARTORG Center for Biomedical Engineering Research, University of Bern, Switzerland.
    Schütz, Narayan
    ARTORG Center for Biomedical Engineering Research, University of Bern, Switzerland.
    Ingold, Rolf
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Seuret, Mathias
    Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.
    Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks2021In: Proceedings of ICPR 2020: 25th International Conference on Pattern Recognition, IEEE, 2021, p. 8204-8211Conference paper (Refereed)
    Abstract [en]

    In this work, we introduce a new architectural component for neural networks (NNs): trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms. We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that spectral initialization leads to significantly faster convergence than randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open source.
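    As a rough illustration of the idea, the sketch below implements a trainable matrix transformation on square feature maps whose weights are initialized with an orthonormal DCT-II basis; the transform size and the per-channel application are illustrative choices, not the exact configurations studied in the paper.

    ```python
    # Minimal sketch of a trainable, spectrally initialized matrix
    # transformation applied per channel to square feature maps.
    import math
    import torch
    import torch.nn as nn

    def dct_matrix(n):
        """Orthonormal DCT-II basis as an n x n matrix."""
        k = torch.arange(n).float()
        basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
        basis[0] *= 1.0 / math.sqrt(2.0)
        return basis * math.sqrt(2.0 / n)

    class TrainableSpectralTransform(nn.Module):
        def __init__(self, size):
            super().__init__()
            self.weight = nn.Parameter(dct_matrix(size))  # trainable, DCT-initialized

        def forward(self, x):                  # x: (batch, channels, size, size)
            w = self.weight
            return w @ x @ w.t()               # separable 2D transform per channel

    feat = torch.randn(2, 16, 8, 8)
    out = TrainableSpectralTransform(8)(feat)  # same shape, spectrally transformed
    ```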

  • 30.
    Alberti, Michele
    et al.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Pondenkandath, Vinaychandran
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Vögtlin, Lars
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Würsch, Marcel
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland; Institute for Interactive Technologies (IIT), FHNW University of Applied Sciences and Arts Northwestern Switzerland, Switzerland.
    Ingold, Rolf
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Improving Reproducible Deep Learning Workflows with DeepDIVA2019In: Proceedings 6th Swiss Conference on Data Science: SDS2019, IEEE, 2019, p. 13-18Conference paper (Refereed)
    Abstract [en]

    The field of deep learning is experiencing a trend towards producing reproducible research. Nevertheless, it is still often a frustrating experience to reproduce scientific results. This is especially true in the machine learning community, where it is considered acceptable to have black boxes in one's experiments. We present DeepDIVA, a framework designed to facilitate easy experimentation and its reproduction. This framework allows researchers to share their experiments with others, while providing functionality for easy experimentation, such as boilerplate code, experiment management, hyper-parameter optimization, verification of data integrity, and visualization of data and results. Additionally, the code of DeepDIVA is well documented and supported by several tutorials that allow a new user to quickly become familiar with the framework.

  • 31.
    Alberti, Michele
    et al.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland.
    Pondenkandath, Vinaychandran
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland.
    Würsch, Marcel
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland.
    Bouillon, Manuel
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland.
    Seuret, Mathias
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland.
    Ingold, Rolf
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland.
    Are You Tampering with My Data?2019In: Computer Vision – ECCV 2018 Workshops: Proceedings, Part II / [ed] Laura Leal-Taixé & Stefan Roth, Springer, 2019, p. 296-312Conference paper (Refereed)
    Abstract [en]

    We propose a novel approach to adversarial attacks on neural networks (NNs), focusing on tampering with the data used for training instead of generating attacks on trained models. Our network-agnostic method creates a backdoor during training which can be exploited at test time to force a neural network to exhibit abnormal behaviour. We demonstrate on two widely used datasets (CIFAR-10 and SVHN) that a universal modification of just one pixel per image, applied to all images of a class in the training set, is enough to corrupt the training procedure of several state-of-the-art deep neural networks, causing the networks to misclassify any images to which the modification is applied. Our aim is to bring to the attention of the machine learning community the possibility that even learning-based methods trained by practitioners themselves on public datasets can be subject to attacks by a skillful adversary.
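    The following is an illustrative sketch of the kind of class-wide, single-pixel training-set modification described above, applied to a tensor of images; the pixel position, value, and target class are arbitrary example choices.

    ```python
    # Illustrative sketch: set one fixed pixel to a fixed value for every
    # training image of one class, producing a poisoned copy of the data.
    import torch

    def poison_one_pixel(images, labels, target_class, pixel=(0, 0), value=1.0):
        poisoned = images.clone()
        mask = labels == target_class
        poisoned[mask, :, pixel[0], pixel[1]] = value   # same pixel, all channels
        return poisoned

    # Dummy CIFAR-10-like batch: 100 RGB images of 32x32 with labels in 0..9
    images = torch.rand(100, 3, 32, 32)
    labels = torch.randint(0, 10, (100,))
    train_images = poison_one_pixel(images, labels, target_class=3)
    ```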

  • 32.
    Alberti, Michele
    et al.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Vögtlin, Lars
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Pondenkandath, Vinaychandran
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Seuret, Mathias
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland. Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
    Ingold, Rolf
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Labeling, Cutting, Grouping: An Efficient Text Line Segmentation Method for Medieval Manuscripts2019In: The 15th IAPR International Conference on Document Analysis and Recognition: ICDAR 2019, IEEE, 2019, p. 1200-1206Conference paper (Other academic)
    Abstract [en]

    This paper introduces a new method for text-line extraction that integrates deep-learning-based pre-classification with state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborated scripts. In this work, we propose a novel method which uses pixel-level semantic segmentation as an intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as a strong denoising pre-processing step before performing text-line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.
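    The sketch below illustrates only the general idea of building text lines on top of a clean pixel-level segmentation, using simple connected-component labeling; the actual labeling-cutting-grouping algorithm of the paper is more elaborate.

    ```python
    # Simplified sketch: extract text-line bounding boxes from a binary
    # segmentation mask via connected-component labeling.
    import numpy as np
    from scipy import ndimage

    def extract_line_boxes(text_mask):
        """text_mask: 2D boolean array where True marks text pixels."""
        labeled, num = ndimage.label(text_mask)
        boxes = []
        for region in ndimage.find_objects(labeled):
            if region is not None:
                ys, xs = region
                boxes.append((ys.start, xs.start, ys.stop, xs.stop))
        return sorted(boxes, key=lambda b: b[0])   # top-to-bottom reading order

    mask = np.zeros((100, 200), dtype=bool)
    mask[10:20, 5:180] = True     # fake line 1
    mask[40:52, 5:150] = True     # fake line 2
    print(extract_line_boxes(mask))
    ```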

  • 33.
    Alonso, Pedro
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Shridhar, Kumar
    Department of Computer Science, ETH Zürich, Zürich, Switzerland.
    Kleyko, Denis
    UC Berkeley, Berkeley, USA; Research Institutes of Sweden, Kista, Sweden.
    Osipov, Evgeny
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Computer Science.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing Enabled Embedding of n-gram Statistics2021In: 2021 International Joint Conference on Neural Networks (IJCNN) Proceedings, IEEE, 2021Conference paper (Refereed)
    Abstract [en]

    Recent advances in Deep Learning have led to a significant performance increase on several NLP tasks; however, the models are becoming increasingly computationally demanding. Therefore, this paper tackles the domain of computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of n-gram statistics of texts. The representations are formed using hyperdimensional computing enabled embedding. These representations then serve as features, which are used as input to standard classifiers. We investigate the applicability of the embedding on one large and three small standard datasets for classification tasks using nine classifiers. The embedding achieved on-par F1 scores while reducing time and memory requirements several-fold compared to conventional n-gram statistics; e.g., for one of the classifiers on a small dataset, the memory reduction was 6.18 times, while train and test speed-ups were 4.62 and 3.84 times, respectively. For many classifiers on the large dataset, memory reduction was ca. 100 times and train and test speed-ups were over 100 times. Importantly, using distributed representations formed via hyperdimensional computing decouples the dimensionality of the representation from the n-gram size, thus opening room for tradeoffs.
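    A minimal sketch of hyperdimensional-computing-style embedding of character n-gram statistics is shown below: each symbol is assigned a random bipolar hypervector, position within an n-gram is encoded by cyclic shifts, and all n-gram vectors are accumulated into one fixed-size representation. The dimensionality and n are illustrative, and the classifier stage is omitted.

    ```python
    # Sketch of hyperdimensional embedding of character n-gram statistics.
    import numpy as np

    rng = np.random.default_rng(0)
    DIM, N = 1000, 3
    item_memory = {}   # character -> random bipolar hypervector

    def char_vector(c):
        if c not in item_memory:
            item_memory[c] = rng.choice([-1, 1], size=DIM)
        return item_memory[c]

    def embed(text):
        acc = np.zeros(DIM)
        for i in range(len(text) - N + 1):
            gram = np.ones(DIM)
            for pos, ch in enumerate(text[i:i + N]):
                gram *= np.roll(char_vector(ch), pos)   # bind position-shifted symbols
            acc += gram
        return acc

    features = embed("hyperdimensional computing for nlp")
    print(features.shape)   # (1000,) -- fixed size regardless of n-gram count
    ```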

  • 34. Azawi, Mayce Al
    et al.
    Liwicki, Marcus
    MDAM Group DFKI, TU Kaiserslautern, D-67663 Kaiserslautern, Germany .
    Breuel, Thomas M
    Combination of Multiple Aligned Recognition Outputs using WFST and LSTM2015In: 13th International Conference on Document Analysis and Recognition, IEEE , 2015, p. 31-35Conference paper (Refereed)
    Abstract [en]

    The contribution of this paper is a new strategy for integrating the outputs of multiple, diverse recognizers. Such an integration can give higher performance and more accurate outputs than a single recognition system. The difficulty of aligning various Optical Character Recognition (OCR) results lies in finding the correspondence at character, word, line, and page level. These difficulties arise from segmentation and recognition errors produced by the OCRs. Therefore, alignment techniques are required to synchronize the outputs so that they can be compared. Most existing approaches fail when the same error occurs in multiple OCRs; if the correction does not appear in one of the OCRs, such approaches are unable to improve the results. We design a Line-to-Page alignment with edit rules using Weighted Finite-State Transducers (WFST). These edit rules are based on the edit operations insertion, deletion, and substitution. Accordingly, an approach is designed using Recurrent Neural Networks with Long Short-Term Memory (LSTM) to predict these types of errors. A Character-Epsilon alignment is designed to normalize the length of the strings for the LSTM alignment. The LSTM returns the best vote, especially when heuristic approaches are unable to decide among the various OCR engines. The LSTM predicts the correct characters even if the OCRs could not produce them in their outputs. The approaches are evaluated on OCR output from the UWIII and historical German Fraktur datasets, obtained from state-of-the-art OCR systems. The experiments show that the LSTM approach has the best performance, with an error rate of around 0.40%, while the other approaches lie between 1.26% and 2.31%.

  • 35.
    Barney Smith, Elisa H.
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Peng, Liangrui
    Tsinghua University, Beijing 100084, China.
    Marinai, Simone
    University of Florence, Florence, Italy.
    Editorial for special issue on “advanced topics in document analysis and recognition”2024In: International Journal on Document Analysis and Recognition, ISSN 1433-2833, E-ISSN 1433-2825, Vol. 27, no 3, p. 209-211Article in journal (Other academic)
  • 36.
    Belay, Birhanu
    et al.
    Bahir Dar Institute of Technology, Bahir Dar, Ethiopia.
    Habtegebrial, Tewodros
    Technical University of Kaiserslautern, Kaiserslautern, Germany.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Belay, Gebeyehu
    DFKI, Augmented Vision Department, Kaiserslautern, Germany.
    Stricker, Didier
    DFKI, Augmented Vision Department, Kaiserslautern, Germany.
    A Blended Attention-CTC Network Architecture for Amharic Text-image Recognition2021In: Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods (ICPRAM), SciTePress, 2021, p. 435-441Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a blended Attention-Connectionist Temporal Classification (CTC) network architecture for text-image recognition of a unique script, Amharic. Amharic is an indigenous Ethiopic script that uses 34 consonant characters, each with 7 vowel variants, and 50 labialized characters which are derived, with a small change, from the 34 consonant characters. The change involves modifying the structure of these characters by adding a straight line, or shortening and/or elongating one of their main legs, including the addition of small diacritics to the right, left, top or bottom of the character. Such small changes affect the orthographic identity of a character and result in shape similarity among characters, which makes them an interesting but challenging subject for OCR research. Motivated by the recent success of attention mechanisms in neural machine translation, we propose an attention-based CTC approach designed by blending an attention mechanism directly within the CTC network. The proposed model consists of an encoder module, an attention module, and a transcription module in a unified framework. The efficacy of the proposed model on the Amharic language shows that the attention mechanism allows learning powerful representations by integrating information from different time steps. Our method outperforms state-of-the-art methods and achieves character error rates of 1.04% and 0.93% on the ADOCR test datasets.
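    The snippet below is a loose sketch of blending an attention module into a CTC recognition pipeline (recurrent encoder, self-attention over its outputs added back residually, CTC-ready transcription layer). It only illustrates the general idea of attention inside a CTC network; the exact encoder, attention, and transcription modules of the paper differ, and all sizes are assumptions.

    ```python
    # Loose sketch of an attention module blended into a CTC pipeline.
    import torch
    import torch.nn as nn

    class AttentionCTC(nn.Module):
        def __init__(self, feat_dim=48, hidden=128, num_classes=300):
            super().__init__()
            self.encoder = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
            self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
            self.proj = nn.Linear(2 * hidden, num_classes)

        def forward(self, x):                      # x: (batch, time, feat_dim)
            enc, _ = self.encoder(x)
            ctx, _ = self.attn(enc, enc, enc)      # self-attention over encoder states
            blended = enc + ctx                    # residual blend of attention context
            return self.proj(blended).log_softmax(-1)

    model = AttentionCTC()
    logp = model(torch.rand(2, 150, 48)).permute(1, 0, 2)  # CTCLoss wants (time, batch, classes)
    ```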

  • 37.
    Belay, Birhanu
    et al.
    DFKI-German Research Center for Artificial Intelligence, University of Kaiserslautern, DE, Kaiserslautern, DE.
    Habtegebrial, Tewodros
    DFKI-German Research Center for Artificial Intelligence, University of Kaiserslautern, DE, Kaiserslautern, DE.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Belay, Gebeyehu
    Bahir Dar Institute of Technology, Ethiopia.
    Stricker, Didier
    DFKI-German Research Center for Artificial Intelligence, University of Kaiserslautern, DE, Kaiserslautern, DE.
    Factored Convolutional Neural Network for Amharic Character Image Recognition2019In: 2019 IEEE International Conference on Image Processing: Proceedings, IEEE, 2019, p. 2906-2910Conference paper (Other academic)
    Abstract [en]

    In this paper we propose a novel CNN-based approach for Amharic character image recognition. The proposed method is designed by leveraging the structure of Amharic graphemes. Amharic characters can be decomposed into a consonant and a vowel. As a result of this consonant-vowel combination structure, Amharic characters lie within a matrix structure called 'Fidel Gebeta'. The rows and columns of 'Fidel Gebeta' correspond to a character's consonant and vowel components, respectively. The proposed method has a CNN architecture with two classifiers that detect the row/consonant and column/vowel components of a character. The two classifiers share a common feature space before forking out at their last layers. The method achieves state-of-the-art results on a synthetically generated dataset, with 94.97% overall character recognition accuracy.
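    A minimal sketch of the factored-output idea is given below: a shared CNN feature extractor with two classification heads, one for the consonant (row of 'Fidel Gebeta') and one for the vowel (column). Layer sizes and input resolution are illustrative.

    ```python
    # Sketch of a factored CNN: shared features, separate row and column heads.
    import torch
    import torch.nn as nn

    class FactoredAmharicCNN(nn.Module):
        def __init__(self, num_rows=34, num_cols=7):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
            )
            self.row_head = nn.Linear(64, num_rows)   # consonant classifier
            self.col_head = nn.Linear(64, num_cols)   # vowel classifier

        def forward(self, x):                          # x: (batch, 1, H, W)
            shared = self.features(x)
            return self.row_head(shared), self.col_head(shared)

    model = FactoredAmharicCNN()
    row_logits, col_logits = model(torch.rand(8, 1, 32, 32))
    loss = nn.functional.cross_entropy(row_logits, torch.randint(0, 34, (8,))) \
         + nn.functional.cross_entropy(col_logits, torch.randint(0, 7, (8,)))
    ```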

  • 38.
    Belay, Birhanu
    et al.
    Dept. of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany. Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar, Ethiopia.
    Habtegebrial, Tewodros
    Dept. of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Belayl, Gebeyehu
    Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar, Ethiopia.
    Stricker, Didier
    DFKI-German Research Center for Artificial Intelligence, Kaiserslautern, Germany.
    Amharic Text Image Recognition: Database, Algorithm, and Analysis2019In: The 15th IAPR International Conference on Document Analysis and Recognition: ICDAR 2019, IEEE, 2019, p. 1268-1273Conference paper (Other academic)
    Abstract [en]

    This paper introduces a dataset for an exotic but very interesting script, Amharic. Amharic follows a unique syllabic writing system which uses 33 consonant characters with 7 vowel variants of each. Some labialized characters are derived by adding diacritical marks to consonants and/or removing parts of them. These associated diacritics on consonant characters are relatively small in size, which makes the derived (vowel and labialized) characters challenging to distinguish. In this paper we tackle the problem of Amharic text-line image recognition and propose a recurrent neural network based method to recognize Amharic text-line images. The proposed method uses Long Short-Term Memory (LSTM) networks together with CTC (Connectionist Temporal Classification). Furthermore, in order to overcome the lack of annotated data, we introduce a new dataset that contains 337,332 Amharic text-line images, which is made freely available at http://www.dfki.uni-kl.de/~belay/. The performance of the proposed Amharic OCR model is tested on both printed and synthetically generated datasets, and promising results are obtained.

  • 39.
    Belay, Birhanu
    et al.
    Department of Computer Science, University of Kaiserslautern, Germany; Faculty of Computing, Bahir Dar Institute of Technology, Ethiopia.
    Habtegebrial, Tewodros
    Department of Computer Science, University of Kaiserslautern, Germany.
    Meshesha, Million
    School of Information Science, Addis Ababa University, Ethiopia.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Belay, Gebeyehu
    Faculty of Computing, Bahir Dar Institute of Technology, Ethiopia.
    Stricker, Didier
    Department of Computer Science, University of Kaiserslautern, Germany; German Research Center for Artificial Intelligence, DFKI, Germany.
    Amharic OCR: An End-to-End Learning2020In: Applied Sciences, E-ISSN 2076-3417, Vol. 10, no 3, article id 1117Article in journal (Refereed)
    Abstract [en]

    In this paper, we introduce an end-to-end Amharic text-line image recognition approach based on recurrent neural networks. Amharic is an indigenous Ethiopic script which follows a unique syllabic writing system adopted from the ancient Geez script. This script uses 34 consonant characters with seven vowel variants of each (called basic characters) and other labialized characters derived by adding diacritical marks and/or removing parts of the basic characters. These associated diacritics on basic characters are relatively small in size, visually similar, and challenging to distinguish from the derived characters. Motivated by the recent success of end-to-end learning in pattern recognition, we propose a model which integrates a feature extractor, sequence learner, and transcriber in a unified module and is then trained in an end-to-end fashion. The experimental results on a printed and synthetic benchmark Amharic Optical Character Recognition (OCR) database called ADOCR demonstrate that the proposed model outperforms state-of-the-art methods by 6.98% and 1.05%, respectively.

  • 40.
    Brännvall, Rickard
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. RISE ICE - Research Institutes of Sweden, Sweden.
    Öhman, Johan
    Luleå University of Technology, Department of Engineering Sciences and Mathematics, Fluid and Experimental Mechanics.
    Kovács, György
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Cross-Encoded Meta Embedding towards Transfer Learning2020In: ESANN 2020 - Proceedings, ESANN , 2020, p. 631-636Conference paper (Refereed)
    Abstract [en]

    In this paper we generate word meta-embeddings from existing embeddings using cross-encoding. Previous approaches can only work with words that exist in every source embedding, while the architecture presented here drops this requirement. We demonstrate the method using two pre-trained embeddings, namely GloVe and FastText. Furthermore, we propose additional improvements to the training process of the meta-embedding. Results on six standard word-similarity tests show that the trained meta-embedding outperforms the original embeddings. Moreover, this performance can be further increased with the proposed improvements, resulting in performance competitive with previously reported results.
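    The sketch below illustrates one plausible reading of cross-encoding: each source embedding is encoded into a shared space and decoded into the other source, and the shared codes form the meta-embedding. Dimensions, the reconstruction loss, and the averaging step are assumptions for illustration, not the paper's exact training setup.

    ```python
    # Sketch of a cross-encoder producing a shared meta-embedding from two
    # source embeddings of the same word (e.g., GloVe and FastText vectors).
    import torch
    import torch.nn as nn

    class CrossEncoder(nn.Module):
        def __init__(self, dim_a=300, dim_b=300, meta_dim=256):
            super().__init__()
            self.enc_a = nn.Linear(dim_a, meta_dim)
            self.enc_b = nn.Linear(dim_b, meta_dim)
            self.dec_a = nn.Linear(meta_dim, dim_a)
            self.dec_b = nn.Linear(meta_dim, dim_b)

        def forward(self, emb_a, emb_b):
            za, zb = torch.tanh(self.enc_a(emb_a)), torch.tanh(self.enc_b(emb_b))
            # reconstruct each source from the other's encoding (cross direction)
            return self.dec_b(za), self.dec_a(zb), za, zb

    model = CrossEncoder()
    glove_vec, fasttext_vec = torch.rand(32, 300), torch.rand(32, 300)
    pred_b, pred_a, za, zb = model(glove_vec, fasttext_vec)
    loss = nn.functional.mse_loss(pred_b, fasttext_vec) + nn.functional.mse_loss(pred_a, glove_vec)
    meta_embedding = (za + zb) / 2   # one simple way to form the final meta-embedding
    ```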

  • 41.
    Byeon, Wonmin
    et al.
    University of Kaiserslautern, Kaiserslautern, Germany; German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany.
    Liwicki, Marcus
    University of Kaiserslautern, Kaiserslautern, Germany.
    Breuel, Thomas M
    University of Kaiserslautern, Kaiserslautern, Germany.
    Scene analysis by mid-level attribute learning using 2D LSTM networks and an application to web-image tagging2015In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 63, p. 23-29Article in journal (Refereed)
    Abstract [en]

    This paper describes an approach to scene analysis based on supervised training of 2D Long Short-Term Memory recurrent neural networks (LSTM networks). Unlike previous methods, our approach requires no manual construction of feature hierarchies or incorporation of other prior knowledge. Rather, like deep learning approaches using convolutional networks, our recognition networks are trained directly on raw pixel values. However, in contrast to convolutional neural networks, our approach uses 2D LSTM networks at all levels. Our networks yield per-pixel mid-level classifications of input images; since training data for such applications is not available in large numbers, we describe an approach to generating artificial training data and then evaluate the trained networks on real-world images. Our approach performed significantly better than other methods, including Convolutional Neural Networks (ConvNets), while using two orders of magnitude fewer parameters. We further report experiments on a recently published outdoor scene attribute dataset, enabling a fair comparison of scene attribute learning, where a significant performance improvement (ca. 21%) was obtained. Finally, our approach is successfully applied to a real-world application, automatic web-image tagging.

  • 42.
    Chhipa, Prakash Chandra
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Chopra, Muskaan
    CCET, Punjab University, Chandigarh, India.
    Mengi, Gopal
    CCET, Punjab University, Chandigarh, India.
    Gupta, Varun
    CCET, Punjab University, Chandigarh, India.
    Upadhyay, Richa
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Chippa, Meenakshi Subhash
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    De, Kanjar
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Saini, Rajkumar
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Uchida, Seiichi
    Human Interface Laboratory, Kyushu University, Fukuoka, Japan.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Functional Knowledge Transfer with Self-supervised Representation Learning2023In: 2023 IEEE International Conference on Image Processing: Proceedings, IEEE , 2023, p. 3339-3343Conference paper (Refereed)
  • 43.
    Chhipa, Prakash Chandra
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Rodahl Holmgren, Johan
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    De, Kanjar
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Video Coding Systems, Fraunhofer Heinrich-Hertz-Institut, Berlin, Germany.
    Saini, Rajkumar
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Can Self-Supervised Representation Learning Methods Withstand Distribution Shifts and Corruptions?2023In: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2023), Institute of Electrical and Electronics Engineers Inc. , 2023, p. 4469-4478Conference paper (Refereed)
  • 44.
    Chhipa, Prakash Chandra
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Upadhyay, Richa
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Grund Pihlgren, Gustav
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Saini, Rajkumar
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Uchida, Seiichi
    Human Interface Laboratory, Kyushu University, Fukuoka, Japan.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological Images2023In: Proceedings: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023), IEEE, 2023, p. 2716-2726Conference paper (Refereed)
    Abstract [en]

    This work presents a novel self-supervised pre-training method to learn efficient representations without labels on histopathology medical images by utilizing magnification factors. Other state-of-the-art works mainly focus on fully supervised learning approaches that rely heavily on human annotations. However, the scarcity of labeled and unlabeled data is a long-standing challenge in histopathology. Currently, representation learning without labels remains unexplored in the histopathology domain. The proposed method, Magnification Prior Contrastive Similarity (MPCS), enables self-supervised learning of representations without labels on the small-scale breast cancer dataset BreakHis by exploiting the magnification factor, inductive transfer, and reduced human prior. The proposed method matches fully supervised state-of-the-art performance in malignancy classification when only 20% of labels are used in fine-tuning, and outperforms previous works in fully supervised learning settings for three public breast cancer datasets, including BreakHis. Further, it provides initial support for the hypothesis that reducing human prior leads to efficient representation learning in self-supervision, which will need further investigation. The implementation of this work is available online on GitHub.
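    The core contrastive step under a magnification prior could look like the sketch below: two views of the same specimen at different magnifications are embedded and pulled together with an NT-Xent-style loss. The encoder, projection size, temperature, and the absence of further augmentations are simplifying assumptions.

    ```python
    # Sketch of contrastive pre-training on magnification pairs with an
    # NT-Xent-style loss; sizes and inputs are dummies.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision

    encoder = torchvision.models.resnet18()
    encoder.fc = nn.Linear(encoder.fc.in_features, 128)   # simplified projection head

    def nt_xent(z1, z2, temperature=0.5):
        z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, d)
        sim = z @ z.t() / temperature
        n = z1.size(0)
        mask = torch.eye(2 * n, dtype=torch.bool)
        sim = sim.masked_fill(mask, float("-inf"))          # exclude self-similarity
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)

    view_40x = torch.rand(8, 3, 224, 224)    # same specimens, lower magnification
    view_100x = torch.rand(8, 3, 224, 224)   # same specimens, higher magnification
    loss = nt_xent(encoder(view_40x), encoder(view_100x))
    loss.backward()
    ```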

  • 45.
    Chhipa, Prakash Chandra
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Upadhyay, Richa
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Saini, Rajkumar
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Lindqvist, Lars
    Optimation Advanced Measurements AB, Luleå, Sweden.
    Nordenskjold, Richard
    Optimation Advanced Measurements AB, Luleå, Sweden.
    Uchida, Seiichi
    Human Interface Laboratory, Kyushu University, Fukuoka, Japan.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Depth Contrast: Self-Supervised Pretraining on 3DPM Images for Mining Material ClassificationManuscript (preprint) (Other academic)
    Abstract [en]

    This work presents a novel self-supervised representation learning method to learn efficient representations without labels on images from a 3DPM sensor (3-Dimensional Particle Measurement; estimates the particle size distribution of material), utilizing RGB images and depth maps of mining material on the conveyor belt. Human annotations for material categories on sensor-generated data are scarce and cost-intensive. Currently, representation learning without human annotations remains unexplored for mining materials and does not leverage sensor-generated data. The proposed method, Depth Contrast, enables self-supervised learning of representations without labels on the 3DPM dataset by exploiting depth maps and inductive transfer. The proposed method outperforms ImageNet transfer learning on material classification in fully supervised learning settings, achieving an F1 score of 0.73. Further, the proposed method yields an F1 score of 0.65, an 11% improvement over ImageNet transfer learning, in a semi-supervised setting when only 20% of labels are used in fine-tuning. Finally, the proposed method shows improved performance generalization under linear evaluation. The implementation of the proposed method is available on GitHub.

  • 46.
    Chhipa, Prakash Chandra
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Upadhyay, Richa
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Saini, Rajkumar
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Lindqvist, Lars
    Optimation Advanced Measurements AB, Luleå, Sweden.
    Nordenskjold, Richard
    Optimation Advanced Measurements AB, Luleå, Sweden.
    Uchida, Seiichi
    Human Interface Laboratory, Kyushu University, Fukuoka, Japan.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Depth Contrast: Self-supervised Pretraining on 3DPM Images for Mining Material Classification2022In: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VI / [ed] Avidan, S.; Brostow, B.; Cissé, M.; Farinella, G.M.; Hassner, H., Springer Nature, 2022, Vol. VI, p. 212-227Conference paper (Refereed)
  • 47.
    Chintalapati, Bharadwaj
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Space Technology. Airbus Defence and Space GmbH, Friedrichshafen,Claude-Dornier Strasse, 88090 Immenstaad am Bodensee, Germany.
    Precht, Arthur
    Airbus Defence and Space GmbH, Friedrichshafen,Claude-Dornier Strasse, 88090 Immenstaad am Bodensee, Germany.
    Hanra, Sougata
    Airbus Defence and Space GmbH, Friedrichshafen,Claude-Dornier Strasse, 88090 Immenstaad am Bodensee, Germany.
    Laufer, Rene
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Space Technology.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Eickhoff, Jens
    Airbus Defence and Space GmbH, Friedrichshafen,Claude-Dornier Strasse, 88090 Immenstaad am Bodensee, Germany; University of Stuttgart, Postfach 10 60 37, 70049 Stuttgart, Germany.
    Opportunities and challenges of on-board AI-based image recognition for small satellite Earth observation missions2024In: Advances in Space Research, ISSN 0273-1177, E-ISSN 1879-1948Article in journal (Refereed)
    Abstract [en]

    The satellite industry is rapidly growing. The number of newly launched small satellites has increased significantly, complemented by the rapid pace of development of image recognition algorithms. Convolutional neural networks (CNNs), in particular, have achieved state-of-the-art performance in computer vision related applications. Combining both, i.e., running an AI algorithm on board the satellite to observe and recognize natural disasters directly from orbit, is an important opportunity. This paper presents notable challenges that are generally involved in an Earth Observation small satellite mission and further challenges posed by combining it with AI-based image recognition on board the satellite. This study discusses an approach that is feasible mainly for a fleet of small satellites.

  • 48.
    Chopra, Muskaan
    et al.
    Chandigarh College of Engineering and Technology, Punjab University, Chandigarh, India.
    Chhipa, Prakash Chandra
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Mengi, Gopal
    Chandigarh College of Engineering and Technology, Punjab University, Chandigarh, India.
    Gupta, Varun
    Chandigarh College of Engineering and Technology, Punjab University, Chandigarh, India.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Domain Adaptable Self-supervised Representation Learning on Remote Sensing Satellite Imagery2023In: IJCNN 2023 - International Joint Conference on Neural Networks, Conference Proceedings, Institute of Electrical and Electronics Engineers Inc. , 2023Conference paper (Refereed)
    Abstract [en]

    This work presents a novel domain adaptation paradigm for studying contrastive self-supervised representation learning and knowledge transfer using remote sensing satellite data. Major state-of-the-art efforts in the remote sensing visual domain primarily focus on fully supervised learning approaches that rely entirely on human annotations. On the other hand, human annotations in remote sensing satellite imagery are always limited in quantity due to high costs and required domain expertise, making transfer learning a viable alternative. The proposed approach investigates in depth the knowledge transfer of self-supervised representations across distinct source and target data distributions in the remote sensing data domain. In this arrangement, self-supervised contrastive-learning-based pretraining is performed on the source dataset, and downstream tasks are performed on the target datasets in a round-robin fashion. Experiments are conducted on three publicly available datasets, UC Merced Landuse (UCMD), SIRI-WHU, and MLRSNet, for different downstream classification tasks versus label efficiency. In self-supervised knowledge transfer, the proposed approach achieves state-of-the-art performance under label-efficient settings and outperforms a fully supervised setting. A more in-depth qualitative examination reveals consistent evidence for explainable representation learning. The source code and trained models are published on GitHub.

  • 49.
    Clavien, Gil
    et al.
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Alberti, Michele
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Pondenkandath, Vinaychandran
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Ingold, Rolf
    Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. Document Image and Voice Analysis Group (DIVA), University of Fribourg, Switzerland.
    DNNViz: Training Evolution Visualization for Deep Neural Network2019In: Proceedings 6th Swiss Conference on Data Science: SDS2019, IEEE, 2019, p. 19-24Conference paper (Refereed)
    Abstract [en]

    In this paper, we present novel visualization strategies for inspecting, displaying, browsing, comparing, and visualizing deep neural networks (DNNs) and their internal state during training. Despite their broad use across many fields of application, deep learning techniques are still often referred to as "black boxes". Trying to get a better understanding of these models and how they work is a thriving field of research. To this end, we contribute a visualization mechanism designed explicitly to enable simple and efficient introspection of deep neural networks. The mechanism processes, computes, and displays neuron activations during the training of a deep neural network. We furthermore demonstrate the usefulness of this visualization technique through different use cases: class similarity detection, hints for network pruning, and adversarial attack detection. We implemented this mechanism in an open source tool called DNNViz, which is integrated into DeepDIVA, a highly functional PyTorch framework for reproducible experiments.
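    The kind of introspection such a tool builds on can be sketched with PyTorch forward hooks that record layer activations during training for later processing and display; the model and storage scheme below are illustrative, not DNNViz's actual implementation.

    ```python
    # Sketch: record layer activations with forward hooks for later visualization.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach().cpu()   # snapshot for visualization
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            module.register_forward_hook(make_hook(name))

    model(torch.rand(4, 10))
    print({k: v.shape for k, v in activations.items()})
    ```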

  • 50.
    Dengel, Ric
    et al.
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Space Technology.
    Honvault, Christophe
    European Space Agency, European Space Research and Technology Centre, Netherlands.
    Mansilla, Luis
    European Space Agency, European Space Research and Technology Centre, Netherlands.
    Magnin, Diane
    European Space Agency, European Space Research and Technology Centre, Netherlands.
    Marques, Hugo
    European Space Agency, European Space Research and Technology Centre, Netherlands.
    Steenari, David
    European Space Agency, European Space Research and Technology Centre, Netherlands.
    Foerster, Kyra
    European Space Agency, European Space Research and Technology Centre, Netherlands.
    Liwicki, Marcus
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
    Laufer, Rene
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Space Technology.
    Hardware Accelerated Machine Learning on Embedded Systems for Space Applications2021In: IAC 2021 Congress Proceedings, 72nd International Astronautical Congress (IAC), Dubai, United Arab Emirates, International Astronautical Federation, IAF , 2021, article id 66177Conference paper (Refereed)
    Abstract [en]

    As spacecraft missions continue to increase in complexity, system operation and the amount of gathered data demand more complex systems than ever before. Currently, mission capabilities are constrained by link bandwidth as well as on-board processing capacity, and depend on a high number of commands and complex ground station systems to allow spacecraft operations. Thus, efficient use of bandwidth and computing capacity, together with increased autonomous capabilities, is of utmost importance. Artificial intelligence, with its vast range of application scenarios, allows these challenges and more to be tackled in spacecraft design. In particular, the flexibility of neural networks as a machine learning technology provides many possibilities. For example, neural networks can be used for object detection and classification tasks. Unfortunately, the execution of current machine learning algorithms consumes a large amount of power and memory resources, and qualified deployment remains challenging, which limits their possible applications in space systems. Thus, an increase in efficiency is a major enabling factor for these technologies. Optimising the algorithm for System-on-Chip platforms allows it to benefit from the best of both a generic processor and hardware acceleration, which should enable broader application of these technologies with a minimal increase in power consumption. Additionally, COTS embedded systems are commonly used in NewSpace applications due to the possibility of adding external or software fault mitigation. For deployment of machine learning on such devices, a CNN model was optimised on a workstation. The neural network was then deployed with Xilinx's Vitis AI onto different embedded systems that include a powerful generic processor as well as the hardware programming capabilities of an FPGA. This result was evaluated based on relevant performance and efficiency parameters, and a summary is given in this paper. Additionally, a different approach was developed which creates, with a high-level synthesis tool, the hardware description language of an accelerated-linear-algebra-optimized network. The implementation of this tool was started, and the proof of concept is presented. Furthermore, existing challenges with the auto-generated code are outlined, and future steps to automate and improve the entire workflow are presented. This paper aims to contribute to increasing the efficiency and applicability of artificial intelligence in space. Specifically, the performance of machine learning algorithms is evaluated on FPGAs, which are commonly used for the execution of complex algorithms in space.
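    As a small illustration of a common preparatory step in such deployment pipelines, the sketch below exports a trained CNN to ONNX before vendor-specific compilation and quantization (which toolchains such as Vitis AI handle with their own tools, not shown here); the model and input shape are dummies.

    ```python
    # Sketch: export a CNN to ONNX as an intermediate format for downstream
    # embedded/FPGA toolchains; model and shapes are placeholders.
    import torch
    import torchvision

    model = torchvision.models.mobilenet_v2().eval()
    dummy_input = torch.rand(1, 3, 224, 224)          # single RGB frame
    torch.onnx.export(model, dummy_input, "cnn_for_fpga.onnx",
                      input_names=["image"], output_names=["logits"],
                      opset_version=13)
    ```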
