Communication with customers is a major part of customer experience as well as a great source of data mining. More businesses are engaging with consumers via text messages. Before 2020, 39% of businesses already use some form of text messaging to communicate with their consumers. Many more were expected to adopt the technology after 2020[1]. Email response rates are merely 8%, compared to a response rate of 45% for text messaging[2]. A significant portion of this communication involves customer enquiries or support messages sent in both directions.
According to estimates, more than 80% of today’s data is stored in an unstructured format (suchas text, image, audio, or video) [3], with a significant portion of it being stated in ambiguous natural language. When analyzing such data, qualitative data analysis techniques are usually employed. In order to facilitate the automated examination of huge corpora of textual material, researchers have turned to natural language processing techniques[4].
Under the light of shared statistics above, Billogram[5] has decided that support messages between creditors and recipients can be mined for predictive maintenance purposes, such as early identification of an outlier like a bug, defect, or wrongly built feature. As one sentence goal definition, Billogram is looking for an answer to ”why are people reaching out to begin with?” This thesis project discusses implementing unsupervised clustering of support messages by benefiting from natural language processing methods as well as performance metrics of results to answer Billogram’s question. The research also contains intent recognition of clustered messages in two different ways, one automatic and one semi-manual, the results have been discussed and compared.
LDA and manual intent assignment approach of the first research has 100 topics and a 0.293 coherence score. On the other hand, the second approach produced 158 clusters with UMAP and HDBSCAN while intent recognition was automatic. Creating clusters will help identifying issues which can be subjects of increased focus, automation, or even down-prioritizing. Therefore, this research lands in the predictive maintenance[9] area. This study, which will get better over time with more iterations in the company, also contains the preliminary work for ”labeling” or ”describing”clusters and their intents.