4/27/2022 - New Project: Scraping r/nosleep
My current project-in-progress combines text-mining methods with a dynamic, continually updating design. I want
to look at the nosleep subreddit (aka the contemporary internet version of the gothic novel) and try to track changes
in it over time. This serves as a small-scale sample of how a genre may change over time; the theme of nosleep in general
has not changed - it has always been horror short stories - yet as more posts appear in the subreddit, people will be
influenced by those previous posts and tropes will begin to appear. Just as the gothic genre has its iconic stories,
such as Sleepy Hollow, so too does nosleep, such as Ted's Caving Story. You might be wondering "why
this particular subreddit?" r/nosleep is a relatively popular and relatively old subreddit that still has new posts added
to it daily; it's also in the horror genre, which is somewhat similar to the gothic genre.
Naturally, a computational approach lends itself to this project nicely, mostly due to the number of posts that would have
to be read in order to track such changes over such a long period of time (the subreddit was established in 2010). These changes can be quantified
with different computational methods to track basic things like word count and word frequency. The other advantage of a computational
approach with quantifiable variables is that I can continuously update this data as new posts in the subreddit appear. For
example, I imagine having a larger analysis of the change over time from r/nosleep's inception until whenever I finish this
project. However, it would be pretty cool to get daily updates automatically to see how r/nosleep has performed over the
last 24 hours. So I plan to pair my overall analysis of the entire subreddit with a section of my webpage that shows daily updates.
I may even add a few similar subreddits for comparison.
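
As a rough sketch of what "word count and word frequency over time" could look like in practice, here is a small
Python example that groups posts by year and reports the average word count and the most common words for each year.
It assumes each post is a dict with a `created_utc` timestamp and a `selftext` body; those field names, the helper
names, and the sample posts are all illustrative assumptions, not the project's actual code or data.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Very simple tokenizer: runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())


def yearly_stats(posts, top_n=3):
    """Return mean word count and most common words for each posting year.

    `posts` is assumed to be a list of dicts with `created_utc` (Unix
    timestamp) and `selftext` (post body) keys; this shape is hypothetical.
    """
    tokens_by_year = defaultdict(list)
    posts_per_year = Counter()
    for post in posts:
        year = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).year
        tokens_by_year[year].extend(tokenize(post["selftext"]))
        posts_per_year[year] += 1

    stats = {}
    for year, tokens in sorted(tokens_by_year.items()):
        stats[year] = {
            "mean_word_count": len(tokens) / posts_per_year[year],
            "top_words": Counter(tokens).most_common(top_n),
        }
    return stats


# Made-up sample posts, only for demonstration.
sample_posts = [
    {"created_utc": 1293840000, "selftext": "The cave was dark. The cave was cold."},
    {"created_utc": 1293926400, "selftext": "I heard the noise again."},
    {"created_utc": 1640995200, "selftext": "Rules for the night shift: never open the door."},
]

for year, row in yearly_stats(sample_posts).items():
    print(year, row["mean_word_count"], row["top_words"])
```

A real analysis would probably want to drop stop words like "the" before counting, since otherwise they dominate every
year's list and hide the tropes that actually change.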
Another appeal for this project is the contemporary nature (what could be more contemporary than daily updates?) of the texts
that I'll be working with. Previously, I did research on Early American Gothics, which is pretty much as old as you can get
in American Literature (some texts technically even predated the United States!). To look at the oldest set of texts in
a genre and then compare it to the youngest texts seems like an interesting process. Of course, r/nosleep is only
tenuously related to Early American Gothics since these stories are short, don't have to go through any kind of publishing
process, and, most notably, could have been written anywhere, since Reddit exists on the internet. That being said, the
term 'contemporary' has to acknowledge technology, since it has such a strong effect on how books are published, and on
who can publish them, in 2022 compared to the 1700s.
Lastly, I also want to use this opportunity to master some new skills. I've done basic text pre-processing and word frequency
analyses in previous projects, but I haven't done any semantic similarity or topic modelling before. I also haven't done any
projects that continually update, hence my desire for daily updates. Nor have I done any projects which combine the different
programming languages that I know: Python (for pulling the Reddit data), R (for data analysis and graphing), and JavaScript/HTML/CSS
(for loading all of this info into a webpage).
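
One possible way to connect the Python and R halves is a plain CSV hand-off: Python keeps only the posts from the last
24 hours and writes one row per post, and R reads that file for analysis and graphing. The sketch below assumes posts
are dicts with `id`, `created_utc`, `title`, and `selftext` keys; the function names, column names, and sample data
are hypothetical, and the step that actually pulls posts from Reddit is left out.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def recent_posts(posts, now, hours=24):
    """Keep only posts created within the last `hours` hours before `now`."""
    cutoff = (now - timedelta(hours=hours)).timestamp()
    return [p for p in posts if cutoff <= p["created_utc"] <= now.timestamp()]


def write_posts_csv(posts, handle):
    """Write one row per post to an open text file handle."""
    fields = ["id", "created_utc", "title", "word_count"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    for p in posts:
        writer.writerow({
            "id": p["id"],
            "created_utc": p["created_utc"],
            "title": p["title"],
            # Rough word count: split the body on whitespace.
            "word_count": len(p["selftext"].split()),
        })


# Made-up posts and a fixed "now" so the example is reproducible.
now = datetime(2022, 4, 27, 12, 0, tzinfo=timezone.utc)
sample = [
    {"id": "a", "title": "Old story", "selftext": "This one is old",
     "created_utc": datetime(2022, 4, 20, 9, 0, tzinfo=timezone.utc).timestamp()},
    {"id": "b", "title": "New story", "selftext": "I heard it again",
     "created_utc": datetime(2022, 4, 27, 3, 0, tzinfo=timezone.utc).timestamp()},
]

buffer = io.StringIO()  # stands in for a real file on disk
write_posts_csv(recent_posts(sample, now), buffer)
print(buffer.getvalue())
```

Swapping the `StringIO` buffer for a real file would give R something to load with `read.csv`, and running the script
once a day (for example, from a scheduler) would cover the daily update.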