Chapter 5 Results

5.1 Stock Data and Information

5.1.1 Stock Candlestick Plot

The most representative WSB stocks are GME and AMC. They are also the two most widely discussed stocks in the r/wallstreetbets subreddit, as shown in the Data Transformation chapter. We therefore choose these two stocks and show how their price and trading volume changed over the past year.

The graph is interactive: we can select the period we want to focus on by dragging on the bottom graph, which lets us zoom in and see the graph clearly. We can also choose a specific period, e.g. the past 1 year or the past 6 months, by pressing the corresponding buttons above the graph, and we can easily return to the main graph by pressing the “RESET” button.

## [1] "GME"
## [1] "AMC"

From the two graphs above, we can clearly see two waves in the WSB stocks, indicated by sudden spikes in both stock price and trading volume. The waves occurred around January to February and May to June respectively. Both graphs confirm our previous claims.

The gray and red lines in the price graph are the Bollinger Bands, which lie 1 standard deviation above and below the moving average of the price and therefore capture the volatility of the stock. Bollinger Bands are generally used to check whether the price is relatively high or low compared to its history. Here, at the beginning of both waves, the prices of GME and AMC break through the upper Bollinger Band, which suggests that the prices of these two stocks were extremely high compared to the historical data. However, the Bollinger Bands also adjust quickly to accommodate the sudden price change, so the prices of both stocks fall back within the bands after a short period of time.
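The band construction described above can be sketched as follows. This is a minimal illustration, not the code behind the report's charts: the function names, the 5-day window and the price list are hypothetical, and the band width k = 1 follows the description in the text.

```python
from statistics import mean, stdev

def bollinger_bands(closes, window=20, k=1.0):
    """Return (middle, upper, lower) lists; entries are None until a full window exists."""
    middle, upper, lower = [], [], []
    for i in range(len(closes)):
        if i + 1 < window:
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = closes[i + 1 - window : i + 1]
        m = mean(chunk)
        s = stdev(chunk)  # sample standard deviation of the window
        middle.append(m)
        upper.append(m + k * s)
        lower.append(m - k * s)
    return middle, upper, lower

def breakouts_above(closes, upper):
    """Indices of days whose close is above the upper band."""
    return [i for i, (c, u) in enumerate(zip(closes, upper))
            if u is not None and c > u]

# Hypothetical closing prices with a sudden jump on day 6 (not real GME/AMC data).
closes = [10, 11, 10, 11, 10, 11, 30, 32, 31, 30]
middle, upper, lower = bollinger_bands(closes, window=5, k=1.0)
spikes = breakouts_above(closes, upper)
print(spikes)  # [6, 7]
```

In this toy series the price is above the upper band only on the first two days of the jump. After that the window has absorbed the new price level and the band widens, which mirrors the "fall back within the bands" behaviour noted above.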

5.1.2 Stock Risk Assessment

While the previous two graphs focus on price and volume, another very important aspect of a stock is risk, which can be measured by the standard deviation. Here, we explore how the risk of the two stocks, GME and AMC, changes before and after the two waves. To do so, we calculate the moving standard deviation of the two stocks over time.

We use the closing price to calculate the moving standard deviation, since it is the most representative of the 5 prices. A month has about 4 weeks and the stock market only opens on weekdays, so there are about 4 * 5 = 20 days with stock data every month. We therefore set the moving window to 20 days, which is roughly 1 month.

We also take the data from the start of last year, so that we can compare the moving standard deviation before and after the two waves and see whether the risk of GME and AMC differs.
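A minimal sketch of the 20-day moving standard deviation, assuming the sample standard deviation is used and that the first 19 days are left undefined. The function name and the price series are hypothetical.

```python
from statistics import stdev

def moving_std(values, window=20):
    """Sample standard deviation over a trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stdev(values[i + 1 - window : i + 1]))
    return out

# Hypothetical closes: 20 flat days followed by a sharp move (not real data).
closes = [20.0] * 20 + [40.0, 80.0, 60.0]
risk = moving_std(closes, window=20)
print(risk[19], risk[20] > 0)  # 0.0 True
```

The measure is zero while the price is flat and rises as soon as the sharp move enters the window.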

## [1] "GME"

## [1] "AMC"

The line charts and calendar heat maps for GME and AMC lead to similar observations:

  1. Around the Jan/Feb and May/Jun periods, the risk of both GME and AMC changes sharply.

  2. GME carries higher risk than AMC: GME’s moving standard deviation reached more than 100, while the highest moving standard deviation of AMC is about 13.

  3. The risk of GME is highest in the first wave, during the Jan/Feb period, while the risk of AMC is highest in the second wave, during the May/Jun period. This is consistent with the general understanding that GME was the leading stock in the 1st wave and AMC the leading stock in the 2nd wave.

5.1.4 Time Series of the Top 10 Stocks

Now, we want to study how the prices of the top 10 stocks moved over the past year. Again, since the closing price is the most representative of the 5 prices, we use it to calculate the daily return over time. The daily stock return is calculated as: (current closing price - previous closing price) / previous closing price * 100%.
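The daily return formula can be written as a short function. This is only a sketch with made-up prices. The function name is hypothetical.

```python
def daily_returns(closes):
    """Percentage change of each closing price relative to the previous one."""
    return [(cur - prev) / prev * 100.0 for prev, cur in zip(closes, closes[1:])]

# Hypothetical closing prices (not real data).
closes = [5.0, 20.0, 10.0]
returns = daily_returns(closes)
print(returns)  # [300.0, -50.0]
```

Because the return is relative to the previous close, the same absolute move gives a smaller percentage at a higher price level: a rise of 15 from a close of 5 is a 300% return, while a rise of 15 from a close of 50 is only 30%.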

From the above graph, we can make the following observations:

  1. AMC has the highest daily return, at the end of January 2021, with an increase of around 300% compared to the previous day's price.

  2. The prices of GME and AMC start to fluctuate strongly from mid-January, which marks the beginning of the 1st WSB wave. During this period, they started to get public attention.

  3. On 27th Jan, the 1st wave of WSB reaches its peak: GME, AMC, BB, NOK and NAKD all achieved their highest daily return on the same day.

  4. Generally, the daily returns in the 1st WSB wave are larger than those in the 2nd WSB wave. This is because the stock prices had already increased a lot in the first wave, so in terms of return (%), it is very hard to see changes as big as those in the 1st wave.

  5. The stocks take part in different waves. PLTR is only active in the 1st wave, while CLOV, SND and RETA are only active in the 2nd wave. GME, AMC, BB, NOK and NAKD are active in both waves. The price of MAR, however, does not change much throughout the two waves.

5.1.5 Correlations between top 10 mentioned stocks

We would like to study the correlation between the top 10 most popular stocks, and we extend the study period to before the WSB waves to see how the WSB event affects the correlation. We would also like to compare the correlation in the two WSB waves to see whether there is any difference. Hence, we need 3 separate graphs to study the correlation in different periods: 2020, Jan/Feb 2021 and May/Jun/Jul 2021. We removed CLOV from the 2020 study because it only went public in 2021, so there is no stock data for CLOV in 2020.

In this report, if the correlation coefficient is < 0.4, we consider that there is no correlation; if it is between 0.4 and 0.7, the correlation is weak; and if it is > 0.7, the correlation is strong.
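The thresholds above can be expressed as a small classifier. The sketch below assumes the Pearson correlation coefficient and assumes that the thresholds apply to its absolute value, so that negative correlations are graded in the same way. Neither assumption is stated explicitly in the report, and the function names and sample series are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def classify(r):
    """Label a coefficient using the report's 0.4 and 0.7 thresholds (on |r|, an assumption)."""
    strength = abs(r)
    if strength > 0.7:
        return "strong"
    if strength >= 0.4:
        return "weak"
    return "none"

# Hypothetical price series (not real stock data).
a = [1, 2, 3, 4, 5]
b = [2, 3, 1, 5, 4]
c = [3, 1, 5, 2, 4]
labels = [classify(pearson(a, b)), classify(pearson(a, c))]
print(labels)  # ['weak', 'none']
```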

From the above 3 graphs, we can make the following observations:

  1. In 2020, only 2 stock pairs show weak correlation, namely (MAR,AMC) and (RETA,AMC). We can therefore conclude that there was generally no correlation among these stocks before the WSB waves.

  2. During Jan and Feb 2021, the 1st wave of WSB, many more stock pairs show weak correlation, and 2 stock pairs even show strong correlation, namely (PLTR,BB) and (NOK,AMC). This indicates that the 1st WSB wave influenced the prices of most of the stocks in our list.

  3. Also during the 1st WSB wave, the stocks are mainly positively correlated, as there are more green circles in the 2nd graph above. This is because redditors generally buy different WSB stocks at the same time and also sell them at the same time. The one exception is NAKD, which has weak negative correlation with some other stocks.

  4. During May, Jun and Jul 2021, the 2nd wave of WSB, there are still 2 stock pairs with strong correlation, namely (NAKD,BB) and (NAKD,AMC). The correlation among the stocks during the 2nd WSB wave is therefore stronger than in 2020, and the 2nd WSB wave also influenced the stock prices.

  5. By comparing the 2nd and 3rd graphs, we can see that the correlation among the stock prices is generally higher in the 1st WSB wave than in the 2nd WSB wave. Also, in the 2nd WSB wave, NAKD has a positive correlation coefficient with most of the other stocks, in contrast to being an outlier in the 1st wave.

5.2 r/wallstreetbets Reddit Post Data

In this section, we mainly explore the posts in the r/wallstreetbets subreddit. We would like to explore how this event affects the r/wallstreetbets subreddit and how the posts affect the stock price. We also conduct sentiment analysis on the posts to get a general picture of this large volume of posts.

5.2.1 r/wallstreetbets metrics

We would like to assess how this event affects the r/wallstreetbets subreddit from 3 aspects: subscriber growth, daily active users and post counts. Subscriber growth shows how many new users the event attracted to the subreddit. A daily active user is defined as a user who has created at least 1 post on that day, so this metric shows how many people are actively contributing to the community and the event. Post counts directly reflect the popularity of the subreddit.

Also, since we want to compare the popularity of the subreddit before and after the event in order to show how big the effect of the event is, we choose to display the data for the past 2 years.
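The definitions of daily active users and post counts can be sketched as a simple aggregation. The input format of (date, author) pairs, the function name and the sample records are hypothetical and are not taken from the report's data.

```python
from collections import defaultdict

def daily_metrics(posts):
    """Map each date to (post count, number of distinct authors with at least 1 post)."""
    counts = defaultdict(int)
    authors = defaultdict(set)
    for day, author in posts:
        counts[day] += 1
        authors[day].add(author)
    return {day: (counts[day], len(authors[day])) for day in sorted(counts)}

# Hypothetical post records (not real Reddit data).
posts = [
    ("2021-01-27", "user_a"),
    ("2021-01-27", "user_a"),
    ("2021-01-27", "user_b"),
    ("2021-01-28", "user_c"),
]
metrics = daily_metrics(posts)
print(metrics)  # {'2021-01-27': (3, 2), '2021-01-28': (1, 1)}
```

A user who posts several times on one day adds to the post count each time but is counted only once as an active user.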

From the above 3 graphs, we have the following observations:

  1. Very few users joined the r/wallstreetbets subreddit before and after the 1st wave of WSB. However, a huge number of new users joined the subreddit daily at the end of January, peaking on 27th Jan, 2021 with 4443 new joiners.

  2. Even though there was a 2nd WSB wave, it attracted few new joiners to the subreddit, which suggests that the same group of people who participated in the 1st WSB wave initiated and participated in the 2nd WSB wave.

  3. The daily active users spiked on 28th Jan, 2021, when 117426 distinct users created at least 1 post. During the 1st and 2nd WSB waves, the daily active users increased a lot compared to the periods before and after the waves.

  4. The daily active users and the post counts spiked 1 day after the peak in new joiners, which suggests that the new joiners were very active at the beginning and gradually lost interest in the topic over time.

  5. The effect of the WSB wave lasted for quite some time: only around early March, 2021 did the daily active users return to the normal level.

  6. Many more posts were created during the 1st WSB wave than during the 2nd WSB wave, so the 1st WSB wave attracted more public attention than the 2nd.

  7. From the post count graph, we can see 2 sub-waves in both the 1st and the 2nd WSB wave, indicating that redditors followed the event continuously and that it was not a one-time thing.

  8. The post count of GME is much higher than that of AMC during the 1st wave, so GME is the primary stock in the 1st wave.

  9. While GME and AMC were widely discussed during that period, the posts still discussed many other stocks and topics.

5.3 What’s the impact of r/wallstreetbets?

5.3.1 Reddit metrics vs. stock price and volume

Now, we want to examine how the post counts in the r/wallstreetbets subreddit affect the stock price as well as the stock trading volume. Since GME and AMC are the 2 most representative stocks in the 2 WSB waves, we choose these 2 stocks as the objects of study.

From the graphs above, we have the following observations:

  1. Generally, there is no relationship between the post count and the daily price change for either AMC or GME.

  2. Only on some dates in the 1st and 2nd WSB waves, e.g. 2021-01-27, when the post counts are huge, is the price change also very high.

  3. GME generally has a higher daily price change than AMC. The maximum daily change of GME is around 200%, while the maximum daily change of AMC is only around 30%.

  4. The association between post counts and volume is tighter than the association between post counts and price change.

  5. There is a weak positive correlation between post counts and volume: the bigger the post count, the higher the volume.

  6. AMC generally has much higher volume than GME. This is because the price of AMC is much lower than that of GME, so people can afford to buy more AMC shares.

5.3.2 Sentiment analysis on Reddit posts regarding GME

Since there are 2 WSB waves, we study the posts in these 2 periods separately to see whether the sentiment changes. We also manually assign the 10 different sentiments to 2 categories, positive and negative. The positive sentiments include positive, trust, anticipation, joy and surprise, while the negative sentiments include negative, anger, fear, sadness and disgust.

From the graphs above, we have the following observations:

  1. The sentiments of the posts are dominated by positive sentiments, e.g. positive, trust and anticipation. The 2 most popular sentiments among all posts in both the 1st and 2nd WSB waves are positive and trust.

  2. The overall sentiment of posts in the 1st WSB wave is slightly more negative than in the 2nd WSB wave. The percentage of posts with negative sentiment is higher in the 1st WSB wave, and the top 6 sentiments in the 1st WSB wave include 3 negative sentiments, compared with only 2 in the 2nd WSB wave.

  3. Many more posts were created during the 1st WSB wave than during the 2nd WSB wave, since a higher word count means that more posts were created.

In the next graph, we explore whether there is any relationship between the sentiment of the posts and the stock price, using GME as a case study. We also introduce a new parameter, called positive percentage, which is calculated on a daily basis as total positive word count / total word count.

From the r/wallstreetbets daily post count graph, we know that the post counts are minimal outside the 2 wave periods, so there is not enough data to draw any reliable conclusion from such a small number of posts. As a result, we did not collect Reddit post data for March and April, which leads to a discontinuity in the data.

The line for the positive sentiment percentage (colored in blue) is disconnected in Mar and Apr because no Reddit post data was collected for those two months. Here are some observations based on the graph above:
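A minimal sketch of the positive percentage for a single day, using the positive and negative groupings defined earlier in this section. It assumes that "total word count" means the total number of sentiment-tagged words across both categories, which the report does not spell out. The function name and the counts are hypothetical.

```python
POSITIVE = {"positive", "trust", "anticipation", "joy", "surprise"}
NEGATIVE = {"negative", "anger", "fear", "sadness", "disgust"}

def positive_percentage(sentiment_counts):
    """Share (in %) of sentiment-tagged words that fall in the positive category."""
    pos = sum(n for s, n in sentiment_counts.items() if s in POSITIVE)
    neg = sum(n for s, n in sentiment_counts.items() if s in NEGATIVE)
    total = pos + neg
    if total == 0:
        return None  # no sentiment words on this day, so the value is undefined
    return pos / total * 100.0

# Hypothetical word counts for one day (not real data).
day_counts = {"positive": 30, "trust": 20, "fear": 25, "anger": 25}
pct = positive_percentage(day_counts)
print(pct)  # 50.0
```

Returning None for days without sentiment words is one way to represent a day with no data, as opposed to a day with 0% positive sentiment.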

  1. In the 1st WSB wave, the positive sentiment percentage is relatively stable, ranging from 35% to 70%, while in the 2nd WSB wave it swings widely, from 0% to 100%. This may be due to the different post counts and numbers of active users in the two periods. Post counts and active users were much higher during the 1st WSB wave, so more people expressed their opinions and sentiment was spread relatively evenly between positive and negative. The positive sentiment percentage therefore stays close to 50%, meaning that about half of the sentiment is positive and the other half negative. In contrast, the 2nd wave had fewer active users, and most of them were original users rather than new joiners. Their posts are likely to share similar sentiment because they have similar interests and views, so a positive mood in one person spreads easily to others, and vice versa. This may explain why the positive sentiment percentage swings so much in the 2nd WSB wave.

  2. There is some negative correlation between the daily positive sentiment percentage and the stock price movement. From the end of Jan, 2021 to early Feb, 2021, which is the peak of the 1st WSB wave, we can see the negative correlation with some delay: the price dipped on 28th Jan, 2021 and the positive sentiment percentage peaked on 29th Jan, 2021. This is reasonable, since a price drop on the previous day means that there were more shorts than longs, so more people advocate buying GME to bring the price back up on the next day. As a result, more posts are identified as having positive sentiment. However, many other factors may affect the direction of the stock price, e.g. dividends, earnings per share and cash flow, so the correlation between these 2 variables holds only to some extent.

5.3.3 General Public’s interests on Google

We believe that the whole event, i.e. the r/wallstreetbets subreddit, the short squeeze and the associated stocks, is relatively new to the public, so people are very likely to search for related keywords on Google to understand it. In this section, we therefore use Google search data to explore the general public's attention towards this event.

5.3.3.1 Interest over time

On Google, people can search the web in general. Those who already know the basic information about the event, follow it closely and are only interested in the most recent news are more likely to search in Google News. We therefore also zoom one level down into Google News to understand the topics that these close followers are interested in.

We identified some keywords from the word cloud based on the size of the words. However, some of them are words commonly searched in daily life, e.g. stock, market and buy, so we need to exclude them from the list. We therefore choose GME, AMC, wallStreetBet, reddit and robinhood as the keywords. The source data uses different word forms, e.g. the Google data source uses GameStop to represent GME, so we choose the closest word forms, which are AMC, GameStop, r/WallStreetBets, Reddit and Robinhood respectively.

Also, since we want to compare the public interest in these keywords before and after the event in order to show how big the effect of the event is, we choose to display the data for the period 2020-11-15 to 2021-11-15.

In the above 2 graphs, the x-axis is the date. The score on the y-axis represents search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means that there was not enough data for this term.
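To make the 0 to 100 scale concrete, the sketch below rescales a series so that its highest point becomes 100. This is a simplified illustration of the "relative to the highest point" idea only. It is not Google's actual procedure, and the function name and the raw counts are hypothetical.

```python
def relative_interest(raw_counts):
    """Rescale a series so that its maximum maps to 100 (simplified assumption)."""
    peak = max(raw_counts)
    if peak == 0:
        return [0] * len(raw_counts)
    return [round(c / peak * 100) for c in raw_counts]

# Hypothetical raw search volumes over four periods (not real data).
scores = relative_interest([20, 200, 100, 0])
print(scores)  # [10, 100, 50, 0]
```

A score of 50 in the third period means that the term was half as popular as at its peak in the second period.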

From the above 2 graphs, we have the following observations:

  1. This event attracted huge public attention around the end of Jan, 2021 and the start of Feb, 2021.

  2. GME attracted the most attention during the 1st wave of WSB, while AMC attracted the most attention during the 2nd wave of WSB.

  3. Generally, the 1st wave of WSB attracted more public attention than the 2nd wave of WSB.

  4. For professionals and close followers, the news covers GME and Robinhood related topics more, since these followers are the ones using Robinhood to trade GME. The general public, in contrast, cares more about the individual stocks, e.g. GME and AMC.

  5. Reddit always has relatively high search interest, which indicates that Reddit has a constant base of active users. For r/WallStreetBets, in contrast, only very few people searched for it before and after the event. The number of active users in this subreddit is therefore not very high, which coincides with our previous conclusion from the daily active user graph.