Stellenbosch University
Dept Statistics Seminar: Zoë-Mae Adams (SU) - Embedded word MCA biplots for sentiment visualisation
Start: 12/04/2024, 13:00
End: 12/04/2024, 14:00
Contact: Elizna Huysamen
Location: Van der Sterr building, 2nd Floor, Room 2048

Text data is unstructured by nature. Text mining, which draws on data mining, machine learning, and natural language processing, transforms it into a structured and understandable state. The process involves extracting usable insights from the text, which can strengthen market positions for businesses or reinforce existing perceptions. To gain a deeper understanding of text data, it can be classified into relevant categories. However, in exploratory stages where there may not be sufficient labels for predictions, evaluating the results of different text classification techniques may be challenging. A more effective approach to examining and understanding text data is therefore to summarize its content visually. Multivariate visualizations, particularly biplots, are employed to account for associations and summarize text content. The multiple correspondence analysis (MCA) biplot is suited to visualizing multivariate categorical data, making it ideal for representing text data in which various elements and characteristics are expressed as categorical variables. We previously proposed the interactive embedded word MCA (EW-MCA) biplot as a tool for summarizing unstructured text data. It displays text observations and the category levels of categorical variables in a space where proximity represents association, with added functionality to view the raw text. When the categorical variables are sentiment-related, the interactive EW-MCA biplot serves as a tool for sentiment visualization. Given the limited resources focused on sentiment visualization, this research aims to add to them by providing a novel sentiment visualization tool for interpreting and summarizing text data involving sentiments.