The current forecast suggests the button will end on the 23rd. On a positive note, the forecasts have grown shorter with each update: the first forecast was +15 days, then +12, +13, and +8. This current one is +7. I am still hoping to get two consecutive forecasts that point to the same day, but I suspect the trends are changing over time faster than my technique can track.
The grey confidence interval suggests that the timer will probably never stay above 30 seconds for a full 10 minute period again. If you want a badge indicating less than 30 seconds, you will always be able to find such a moment by waiting, at most, 10 minutes, even during peak hours. Very few people will have to wait that long.
Each cycle or wave in the graph is approximately one day, representing the daily cycle of activity on the button: high in the afternoons, low in the late nights and early mornings. There is also a slight weekly cycle, but it is not easy to notice in the plot.
The button’s values have partially stabilized, with a fairly persistent defense around the red barrier, but there is still some noticeable decay between the 4000 and 6000 period marks. I have continued to use an indicator for the lowest observed badge color to help soften the impact of the early periods, when it was impossible to get a low badge color due to the frequency of clicking. We are now in a period where it is demonstrably possible to get red, with patience and discipline: we have observed red badges occur. Using the lowest observed badge color as a variable allows us to separate this current period from earlier ones, where the data was less descriptive of the current state.
Out of the grey collection of possible futures highlighted, it looks like the button is declining steadily, and the general future looks rather grim. The upper line of the grey 75% confidence interval is below 30 seconds, suggesting that the timer will never again be kept above the halfway mark (30 seconds) for a full 10 minutes. I note that the existence of a good forecast means the red guard can simply pay extra close attention to the period in which they think the button will end, so this forecast might actually extend the life of the button. Maybe.
First, I downloaded the second-by-second data at about 5/16 at 12:00pm CST from here. To ease the computational load and reduce unwanted noise in the forecast, the 4+ million data points were aggregated from seconds into intervals of ten minutes each. I examine only the lowest value of the timer, since the topic of interest is when the timer hits zero. (This strikes me as somewhat ad hoc: the distributions of minimums are likely non-normal, since they would come from an extreme value distribution.) Below is a plot of the ten minute minimums for the button. Each cycle is about a day, and there appears to be a weekly cycle that is very slight.
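For readers who want to reproduce the aggregation step, a minimal sketch in pandas follows. The index, column name, and toy timer values here are invented for illustration (the real dump has roughly 4 million rows); only the resample-to-minimum idea is the actual technique described above.

```python
import numpy as np
import pandas as pd

# Hypothetical second-by-second log of the timer: one reading per second.
# The sawtooth values here are stand-ins, not the real button data.
idx = pd.date_range("2015-04-01", periods=3600, freq="s")
timer = pd.Series(60 - (np.arange(3600) % 60), index=idx, name="timer")

# Collapse seconds into 10-minute intervals, keeping only the minimum,
# since the question of interest is when the timer reaches zero.
ten_min_mins = timer.resample("10min").min()
```

One hour of second-level data collapses to six observations this way, which is what makes the 4+ million raw rows tractable for estimation.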
I exclude any period where the timer was not well recorded for technical reasons, which has helped return the forecast to normal after the “great button failure”. I am much more confident in this current forecast as a result. New to this forecast, I have also added a dummy indicator for the lowest badge observed. It began as purple, and then slowly slid to red. We are in a post-red period, but when the button began, we had only seen purple. The structure of the model ought to reflect that. This significant set of variables suggests that the button’s lowest observed value in a 10 minute period is sinking at an accelerated pace compared to the early stages of the button.
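A sketch of how the lowest-observed-badge dummy could be built: take the running minimum of the 10-minute minimums, bucket it into badge colors, and expand into indicator columns. The toy series and the 10-second color bands are assumptions for illustration (reddit's actual cutoffs differ slightly); the running-minimum-to-dummy idea is the technique described above.

```python
import pandas as pd

# Hypothetical sequence of 10-minute minimums (seconds left on the timer).
mins = pd.Series([55, 48, 35, 22, 9, 4, 14, 3])

# Approximate badge bands: each 10-second band maps to one color.
colors = ["red", "orange", "yellow", "green", "blue", "purple"]
band = (mins.cummin() // 10).clip(upper=5).astype(int)
lowest_badge = band.map(lambda b: colors[b])

# One indicator column per regime, so the model can distinguish the
# early all-purple periods from the post-red period.
badge_dummies = pd.get_dummies(lowest_badge)
```

Because the running minimum never rises, the dummy walks one way from purple down to red and then stays there, exactly mirroring the "post-red period" framing above.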
Then, I estimate the process using ARIMA(1,1,1) with weekly, daily, and hourly Fourier cyclical components. I include one pair of sin() and cos() waves at each frequency to catch any cyclical trends in weeks, days, or hours. This is roughly the same technique I used to predict the next note in Beethoven’s Symphony, which worked with 95+% accuracy. Fourier terms tend to fit very well, and in fact, I am often shocked by how effective they are when used correctly.
Below I show how well the past data fits our model. This type of close fit is typical when ARIMA sequences are applied correctly, and only shows that I do seem to fit the past reasonably well. I check this plot to ensure that my forecast does not predict impossible values and stays between 0 and 60 seconds over the past data.
The fit appears to be very good, better than prior weeks, suggesting my model has improved now that I have included the lowest observed badge. There are few periods where the forecast is very off base. (I am not sure why the last line spikes up so much; I would like to take a careful look at the code to see what is going on. That spike is not part of ARIMA and is therefore a problem within my forecast, likely involving the very last period.)
Below, I show the errors of the forecast above. At this scale it is clear there are a few days where my model misjudges the fit. I am unsurprised by this, given how many observations I have, but I am disappointed that some intervals are mispredicted by 20 seconds or more. This is the cost of estimation, perhaps.
On to more technical details. My process looks at its own previous values to make a forecast, so I need to make sure that my sequence is not missing critical associations. Let us see how well past values are associated with the current one. Big lines mean big associations. We call plots of these correlations the ACF and PACF. I plot them below. They suggest our fit is relatively well done. (They fall mostly within the blue lines for most steps; the first bar of the ACF is excluded, because the current value is 100% correlated with itself.) For those steps that fall outside the blue lines in the PACF: I doubt the sequence truly has 25 lags or leads, and such models are not quickly estimable on a home computer anyway, so I am going to reject them as a possibility. Adding too many terms and over-fitting the data would be equally unwise.
I avoid looking at the Augmented Dickey-Fuller test, because I am looking at minimums and therefore have concerns about the normality of the errors, but I have considered it.
Commentary on Other Prediction Types and Styles:
Some are attempting to use the remaining number of greys. I am not currently convinced that this approach is good. I note that the count of remaining greys appears to be largely insignificant in predicting the next lowest value of the button. (I have tried to include it in a variety of ways, including natural logs, and it did not influence the prediction.) I conclude from this that the number of greys is largely irrelevant. I suspect that a portion of the greys are predisposed to click, and that this proportion of "click eventually" vs. "never click" matters more than the total number of greys, but I suspect this proportion fluctuates dramatically from minute to minute, and I cannot isolate the true proportion without serious changes to my technique.
Some are attempting to predict the button failure by a clicks-per-minute approach, which I am intrigued by, but which I have not investigated closely.
I note that I have some reservations about the asymptotic validity of my estimators. I am investigating these currently.
To see how my forecast changes, and in the interest of full disclosure, I will keep tabs on my past estimates and note how additional data improves or worsens them.
Current Update (5/16) – May 23rd. Previous updates have all shrunk the distance to button failure: +14 days, then +13, +13, and +8. This current one is +7, within a week.
Updated Badge Technique (5/11) – May 19th. New technique added: used the lowest observed badge color to help separate the pattern of the early periods (the purple, blue, and orange periods) from the late pattern (the post-red period).
Revisiting the Forecast (5/3) – May 16th.
Update: After Button Failure (4/27) – May 9th.