COVID-19 Discussion Trends


This project was my entry into the Alberta Innovates COVID-19 Data Science Hackathon. It won a prize for best individual effort!

The project explored how Albertan discussion of the pandemic evolved from January to May 2020, using ~500k comments from local subreddits. An unsupervised text classification model flagged whether each comment was relevant to a set of pandemic-related topics.
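To give a rough feel for the idea, here is a minimal sketch of flagging comments by topic and tracking each topic's share of discussion by month. It is not the model used in the project: it swaps in a simple cosine similarity between bag-of-words counts and a list of seed words, using only the standard library. The topic names, seed words, threshold, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical topics and seed words, chosen only for illustration.
TOPIC_SEEDS = {
    "masks": ["mask", "masks", "face", "covering"],
    "testing": ["test", "testing", "swab", "results"],
}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevant_topics(comment, threshold=0.1):
    """Return the topics whose seed words are similar enough to the comment.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    vec = Counter(tokenize(comment))
    return {
        topic
        for topic, seeds in TOPIC_SEEDS.items()
        if cosine(vec, Counter(seeds)) >= threshold
    }


def monthly_share(comments, topic):
    """Fraction of comments per month flagged as relevant to `topic`.

    `comments` is an iterable of (month, text) pairs, month as 'YYYY-MM'.
    """
    totals, hits = Counter(), Counter()
    for month, text in comments:
        totals[month] += 1
        if topic in relevant_topics(text):
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Made-up sample comments, not taken from the dataset.
SAMPLE = [
    ("2020-01", "Nice weather in Calgary today"),
    ("2020-04", "Where can I buy a mask in Edmonton"),
    ("2020-04", "Flames game tonight"),
]
print(monthly_share(SAMPLE, "masks"))
```

No labelled data is needed here, which is the sense in which this kind of approach is unsupervised. See the notebook for how the project actually did it.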

Check out the notebook on nbviewer and the source code on GitHub.

[Figure: C19 Reddit Analysis Chart]

Tech

  • Data processing: dask, numpy, pandas, spacy
  • Modeling: gensim
  • Visualization: altair

Acknowledgements

  • Reddit data sourced from pushshift.io
  • Thanks to Alberta Innovates for hosting the Hackathon