COVID-19 Discussion Trends
This project was my entry into the Alberta Innovates COVID-19 Data Science Hackathon. It won a prize for best individual effort!
The project explored how discussion of the pandemic in Alberta evolved from January to May 2020, using roughly 500k comments from local subreddits. An unsupervised text classification model was used to determine whether each comment was relevant to a set of pandemic-related topics.
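To make the idea of unsupervised relevance scoring concrete, here is a minimal sketch using only the Python standard library. It is not the model from this project (that one is built with gensim and lives in the notebook); it swaps in a much simpler technique, cosine similarity between a comment's bag of words and a list of seed words per topic. The topic names, seed words, threshold, and function names are all hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical topics and seed words (assumptions, not the project's real topics).
TOPIC_SEEDS = {
    "testing": ["test", "testing", "swab", "positive", "negative"],
    "economy": ["job", "jobs", "layoff", "business", "rent"],
    "masks": ["mask", "masks", "n95", "face", "covering"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevant_topics(comment, threshold=0.2):
    """Return {topic: score} for topics whose similarity meets the threshold.

    No labelled training data is involved: relevance comes only from
    overlap with the seed words. The threshold value is an arbitrary assumption.
    """
    vec = Counter(tokenize(comment))
    scores = {
        topic: cosine(vec, Counter(seeds)) for topic, seeds in TOPIC_SEEDS.items()
    }
    return {topic: s for topic, s in scores.items() if s >= threshold}


def monthly_topic_counts(comments):
    """Count relevant comments per topic per month.

    `comments` is an iterable of (month, text) pairs, e.g. ("2020-03", "...").
    """
    counts = {}
    for month, text in comments:
        for topic in relevant_topics(text):
            counts.setdefault(month, Counter())[topic] += 1
    return counts
```

Calling `monthly_topic_counts` on a list of `(month, text)` pairs gives per-month topic counts, which is the kind of table a trend chart over January to May could be drawn from. Seed-word matching is crude compared with an embedding- or topic-model-based approach, so treat this only as an illustration of the overall shape of the pipeline.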
Check out the notebook on nbviewer and the source code on GitHub.
Tech
- Data processing: dask, numpy, pandas, spacy
- Modeling: gensim
- Visualization: altair
Acknowledgements
- Reddit data sourced from pushshift.io
- Thanks to Alberta Innovates for hosting the Hackathon