Podcast Data Visualization

Khadija Anam
Apr 11, 2022

Updated: Jan 20, 2025

Podcasts have become a popular medium for learning and exploring new ideas, and the conversation between Naval Ravikant and Joe Rogan on entrepreneurship, personal development, wealth creation, and mindfulness is no exception.

So I thought, "Hey, why not turn this into a data viz project?" And here are the results:

Data Source - Extracted the transcript of the podcast from transcribe.app using RStudio.
Data Processing & Data Visualization -

Word Cloud

Removed all the stop words
Made a tibble of the words along with their frequencies
Turned the table into a word cloud
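
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the original script: it assumes the tidytext, dplyr, and wordcloud packages, and the `transcript` tibble holds made-up placeholder lines standing in for the real transcript text.

```r
library(dplyr)
library(tidytext)
library(wordcloud)

# Hypothetical stand-in for the podcast transcript (placeholder text,
# not actual quotes from the episode)
transcript <- tibble(
  line = 1:3,
  text = c(
    "Wealth is assets that earn while you sleep",
    "Read what you love until you love to read",
    "Wealth and happiness are skills you can learn"
  )
)

# Split the text into one word per row, drop stop words,
# and count how often each remaining word occurs
word_counts <- transcript %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

# Draw the word cloud; more frequent words are drawn larger
wordcloud(
  words = word_counts$word,
  freq = word_counts$n,
  min.freq = 1,
  max.words = 100,
  random.order = FALSE
)
```

Here `word_counts` is the frequency tibble: one row per word, with the count in column `n`. The stop word list is the `stop_words` dataset shipped with tidytext; a different list may have been used for the original plot.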

Topic Modelling

Created a document-term matrix (DTM) from the preprocessed word tibble. This matrix records how often each word occurs in each document (i.e., podcast episode).
Chose the number of topics to extract from the text data using the elbow method or trial and error.
Built the topic model using the LDA() function in the topicmodels package, which takes the DTM and the number of topics as input.
Evaluated the performance of the topic model by calculating the coherence score, which measures the degree of semantic similarity between words in the same topic.
Extracted and interpreted the topics from the model using the terms() function, which returns the most likely terms for each topic.
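
A rough sketch of this workflow is shown below. It is an illustration under assumptions rather than the original code: the `segments` tibble is made-up placeholder text (the transcript is assumed to be split into several chunks so that the DTM has more than one document), `cast_dtm()` comes from tidytext, and the seed and the candidate values of k are arbitrary. The topicmodels package has no built-in coherence score, so the sketch uses its `perplexity()` function as a stand-in for comparing different numbers of topics.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical transcript chunks (placeholder text), one row per "document"
segments <- tibble(
  document = c("seg1", "seg2", "seg3", "seg4"),
  text = c(
    "wealth business equity leverage capital investing money",
    "meditation mind peace happiness desire calm breathing",
    "business startup founders capital equity product money",
    "happiness peace mind meditation habits calm desire"
  )
)

# Count words per document after removing stop words
term_counts <- segments %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(document, word)

# Cast the tidy counts into a document-term matrix
dtm <- cast_dtm(term_counts, document, word, n)

# Fit an LDA model with k topics (k = 2 is an arbitrary choice here)
lda_model <- LDA(dtm, k = 2, control = list(seed = 1234))

# Most likely terms for each topic, one column per topic
top_terms <- terms(lda_model, 5)
print(top_terms)

# Compare candidate numbers of topics by perplexity (lower is better);
# plotting these values against k gives an elbow-style curve
candidate_k <- 2:3
perplexities <- sapply(candidate_k, function(k) {
  fit <- LDA(dtm, k = k, control = list(seed = 1234))
  perplexity(fit, newdata = dtm)
})
print(data.frame(k = candidate_k, perplexity = perplexities))
```

On a corpus this small the topics are not meaningful; the point is only the shape of the pipeline. With a real transcript, perplexity would normally be computed on held-out chunks rather than on the training data.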
