While watching a "ReactJS Basics" course the other day, I noticed that the further I went, the fewer pageviews each lesson had. But is that "dropout rate" the same for all courses? Here is my little research.
The procedure we are going to follow is called survival analysis. It's very well known in medicine but can be applied in many other industries as well.
"The fraction of patients living for a certain amount of time after treatment." (Wikipedia)
Let's think of a brighter example than cancer patients. It could be the percentage of laptops from the same batch that still work, or the fraction of people who continue using your product after the first week, the second week, and so on.
In our research, we'll look at how many of the people who started an online course continue with it and watch new lessons.
We are going to work with YouTube data since it's open and there are plenty of interesting courses. But there is a very important assumption. For a precise survival analysis we'd need to know the timestamp and user id for every view of every video in the course. Since YouTube gives us only the number of pageviews per video, let's assume that all pageviews are unique and that no users joined the course partway through (we'd exclude them in a real analysis).
So I've picked some courses on programming, chess, guitar, drawing, and fitness:
I don't think there's much use in sharing Ruby basics like looping through an array of ids and dumping the downloaded data to CSV; you can check all the code yourself. Maybe pay attention to how it's organized. It's very much inspired by Jupyter Notebooks in Python: a progression of steps that should be run one by one.
At this point, we have pageviews for each video in the selected courses. Before we start processing the data, let's just plot it as it is. I think this step is very useful in any research you do.
Gnuplot is pretty much a standard, so let's go with it:
As you can see, not every curve slopes downward: there are spikes in the middle with millions of pageviews. I guess it's very specific to YouTube, where some videos go viral; just look at the title "How to Achieve Checkmate in 2 Moves". So we'll remove these points. We'd also remove pageviews from users who didn't start from the beginning, but our assumption is that there are no such pageviews.
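One simple way to drop such spikes, sketched here under the assumption stated above that everybody starts from lesson one, so a lesson should never have more views than the previous lesson we kept. The helper name `drop_spikes`, the `max_ratio` knob, and the numbers are illustrative only; the actual rule used for the plots may differ.

```ruby
# Drop "viral" points: keep a lesson only if its pageviews do not exceed
# the pageviews of the last kept lesson (times an optional tolerance).
# `points` is an array of [lesson_number, pageviews] pairs in course order.
def drop_spikes(points, max_ratio: 1.0)
  kept = []
  points.each do |lesson, views|
    kept << [lesson, views] if kept.empty? || views <= kept.last[1] * max_ratio
  end
  kept
end

# Made-up numbers: lesson 3 went viral.
points = [[1, 10_000], [2, 6_000], [3, 2_500_000], [4, 4_000], [5, 3_000]]
drop_spikes(points).each { |lesson, views| puts "#{lesson},#{views}" }
```

With the default `max_ratio` of 1.0 this also drops small bumps, not just viral videos; raising it (for example to 1.2) keeps lessons that are only slightly more popular than the previous one.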
For the final plot, we will calculate the share of "survived" users for every lesson:
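Under the assumptions above (unique pageviews, everybody starts at lesson one), the surviving share for a lesson is just its pageviews divided by the pageviews of the first lesson. A small sketch with made-up numbers; the helper name `survival_curve` is mine, not from the repository:

```ruby
# Share of first-lesson viewers still watching at each lesson.
# `views` holds pageviews per lesson in course order, spikes already removed.
def survival_curve(views)
  baseline = views.first.to_f
  views.map { |v| (v / baseline).round(3) }
end

# Made-up pageviews for a five-lesson course.
views = [12_000, 7_500, 5_100, 4_300, 3_900]

survival_curve(views).each_with_index do |share, i|
  puts format("lesson %2d: %5.1f%%", i + 1, share * 100)
end
```

The first lesson is always 100% by construction, so curves for courses of very different popularity can be compared on the same plot.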
As you can see, it's really hard to survive singing and drawing classes online. At the top, we have chess and React JS: both are easy but bring you a lot of fun!
P.S. All the code for this post is available on GitHub.