Axel Delano Fabiano Bax

4/27/2022 - New Project: Scraping r/nosleep

My current project-in-progress combines text-mining methods with a dynamic, continually updating design. I want to look at the nosleep subreddit (aka the contemporary internet version of the gothic novel) and try to track changes in it over time. This serves as a small-scale sample of how a genre may change over time; the theme of nosleep in general has not changed - it has always been horror short stories - yet as more posts appear in the subreddit, people will be influenced by those previous posts and tropes will begin to appear. Just as iconic stories emerge in the gothic genre, such as Sleepy Hollow, so too do they emerge in nosleep, such as Ted's Caving Story. You might be wondering "why this particular subreddit?" r/nosleep is a relatively popular and relatively old subreddit that still has new posts added to it daily; it's also in the horror genre, which is somewhat similar to the gothic genre.

Naturally, a computational approach lends itself to this project nicely, mostly due to the number of posts that would have to be read in order to track such changes over such a long period of time (the subreddit was established in 2010). These changes can be quantified with different computational methods to track basic things like word count and word frequency. The other advantage of a computational approach with quantifiable variables is that I can continuously update this data as new posts in the subreddit appear. For example, I imagine having a larger analysis of the change over time from r/nosleep's inception until whenever I finish this project. However, it would be pretty cool to get daily updates automatically to see how r/nosleep has performed over the last 24 hours. So I plan to pair my overall analysis of the entire subreddit with a section of my webpage that shows daily updates. I may even add a few other similar subreddits to compare against one another.
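
To make "word count and word frequency" a little more concrete, here is a rough sketch of what that kind of measurement could look like in Python. Everything in it is an assumption for illustration: the function names (tokenize, summarize), the very simple regex tokenizer, and the two made-up sample posts are placeholders, not the actual pipeline or real r/nosleep data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a basic regex tokenizer is good enough for a first pass;
    a real pipeline would likely handle punctuation and stop words better.
    """
    return re.findall(r"[a-z']+", text.lower())


def summarize(posts):
    """Return the post count, mean word count, and pooled word frequencies."""
    frequencies = Counter()
    lengths = []
    for body in posts:
        tokens = tokenize(body)
        lengths.append(len(tokens))
        frequencies.update(tokens)
    mean_word_count = sum(lengths) / len(lengths) if lengths else 0.0
    return {
        "post_count": len(posts),
        "mean_word_count": mean_word_count,
        "frequencies": frequencies,
    }


# Hypothetical sample posts standing in for scraped story text.
sample_posts = [
    "The door creaked. The door was never locked.",
    "I heard the door again last night.",
]

summary = summarize(sample_posts)
print(summary["mean_word_count"])
print(summary["frequencies"].most_common(3))
```

Running the same summary on each day's new posts, and again on the full archive, would give the two views described above: a long-run trend and a 24-hour snapshot.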

Another appeal of this project is the contemporary nature (what could be more contemporary than daily updates?) of the texts that I'll be working with. Previously, I did research on Early American Gothics, which are pretty much as old as you can get in American Literature (some texts technically even predated the United States!). To look at the oldest set of texts in a genre and then compare it to the youngest texts seems like an interesting process. Of course, r/nosleep is only tenuously related to Early American Gothics since these stories are short, don't have to go through any kind of publishing process, and, most notably, could have been written anywhere since Reddit exists on the internet. That being said, the term 'contemporary' has to acknowledge technology since it has such a strong effect on how stories are published in 2022 compared to the 1700s, and on who can publish them.

Lastly, I also want to use this opportunity to master some new skills. I've done basic text pre-processing and word frequency analyses in previous projects, but I haven't done any semantic similarity or topic modelling before. I also haven't done any projects that continually update, hence my desire for daily updates. Nor have I done any projects that combine the different programming languages I know: Python (for pulling the Reddit data), R (for data analysis and graphing), and JavaScript/HTML/CSS (for loading all of this info into a webpage).