Podcast Data Visualization
- Khadija Anam
- Apr 11, 2022
- 1 min read
Updated: Jan 20, 2025
Podcasts have become a popular medium for learning and exploring new ideas, and the conversation between Naval Ravikant and Joe Rogan on entrepreneurship, personal development, wealth creation, and mindfulness is no exception.
So I thought, "Hey, why not turn this into a data viz project?" And here are the results:
Data Source - Extracted the podcast transcript from transcribe.app using RStudio.
Data Processing & Data Visualization -
Word Cloud
Removed all the stop words
Made a tibble of the words along with their frequencies
Turned the tibble into a word cloud
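
The three steps above can be sketched in R roughly as follows. This is a minimal sketch, not the exact code behind the project: it assumes the dplyr, tidytext, and wordcloud packages, and the `transcript` tibble holds made-up placeholder sentences standing in for the real transcript.

```r
library(dplyr)
library(tidytext)
library(wordcloud)

# Placeholder text standing in for the podcast transcript (an assumption,
# not the real data).
transcript <- tibble(
  line = 1:3,
  text = c(
    "Wealth is about building assets that earn while you sleep.",
    "Reading builds knowledge, and knowledge builds wealth.",
    "Happiness is a skill you can learn, like reading."
  )
)

# Split into one lower-cased word per row, drop stop words,
# then count how often each remaining word occurs.
word_counts <- transcript %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

# Draw the cloud; min.freq = 1 keeps every word of this tiny sample.
set.seed(42)
wordcloud(words = word_counts$word, freq = word_counts$n, min.freq = 1)
```

On a full transcript, raising `min.freq` or setting `max.words` keeps the cloud readable.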

Topic Modelling
Created a document-term matrix (DTM): Built a DTM from the tibble above using the preprocessed text. Each cell of this matrix holds the number of times a word occurs in a document (i.e., a podcast episode).
Chose the number of topics to extract from the text data using the elbow method or trial and error.
Built the topic model using the LDA() function in the topicmodels package, which takes the DTM and the number of topics as input.
Evaluated the performance of the topic model by calculating the coherence score, which measures the degree of semantic similarity between words in the same topic.
Extracted and interpreted the topics from the model using the terms() function, which returns the most probable words for each topic.
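
Here is a rough sketch of how the topic modelling steps fit together in R. It rests on several assumptions: the word counts are invented, the transcript is treated as four hypothetical chunks so that LDA has more than one document to work with, and perplexity is swapped in as a simple way to compare topic counts (it is not the coherence score mentioned above). `cast_dtm()` comes from tidytext, while `LDA()`, `perplexity()`, and `terms()` come from topicmodels.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Invented word counts; each "chunk" is a hypothetical slice of the transcript.
chunk_counts <- tibble(
  document = rep(c("chunk1", "chunk2", "chunk3", "chunk4"), each = 3),
  word = c("wealth", "assets", "equity",
           "wealth", "leverage", "assets",
           "happiness", "meditation", "peace",
           "meditation", "happiness", "mind"),
  n = as.integer(c(5, 3, 2, 4, 3, 2, 5, 3, 2, 4, 3, 2))
)

# Document-term matrix: one row per document, one column per word.
dtm <- chunk_counts %>% cast_dtm(document, word, n)

# Compare a few candidate topic counts by perplexity (lower is better).
candidate_k <- 2:3
perplexities <- sapply(candidate_k, function(k) {
  perplexity(LDA(dtm, k = k, control = list(seed = 1234)))
})

# Fit the model with the chosen number of topics (2 here, by assumption).
lda_model <- LDA(dtm, k = 2, control = list(seed = 1234))

# Top 3 most probable words per topic, one column per topic.
top_terms <- terms(lda_model, 3)
top_terms
```

With data this small the topics are only illustrative; on a real transcript, the choice of chunk size and of `k` would both need some experimentation.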



