Model customer behavior, visualize A/B test results, predict user metrics…all using a simple Markovian framework!

final product
In this article, I aim to introduce you (regardless of your technical ability) to Markov chains and use them to simulate customer behavior.
This isn’t going to be a traditional “how to” tutorial with code snippets every two lines. My primary aim in writing this is to provide you with a conceptual framework that you can flexibly use, so you do not necessarily have to code along to learn something new. Technical details will pop up here and there, but I will provide as much intuition for them as possible.
Data Processing
For this analysis I will be using Credit Sesame’s analytics data, which I was provided with during a datathon. You can use any user data provided it spans your time frame of interest (for example a week’s, month’s, or year’s worth of data). It should follow a structure similar to the one below.

example data
Examples of actions include “clicked offer/ad”, “clicked subscribe”, and so on. Columns can also hold other metrics such as page views or revenue. Include any column you think will be useful for what you plan on modeling; in my case, that is user engagement.
Customer Segmentation
Select a particular day in your dataset and get the new-user data for that day. I am modeling how new users behave within 30 days of using Credit Sesame’s website.

new user data on a particular day
Next, we segment our customers into different categories or states. There are many ways you can do this.
1. Apply a scoring function: Give each user a score indicating their overall engagement. You could assign higher weights to actions you think signal higher engagement, such as session length.
You can then divide the distribution into 3 segments (Inactive, Active, and Very Active) based on your heuristics.

Distribution of the applied score function
2. Apply an unsupervised algorithm such as k-means: You may use clustering algorithms such as k-means to cluster similarly engaged customers. Each cluster would have their own distinct properties which are hopefully the ones you wish to model. You could even apply the algorithm on your previously calculated score function (univariate data) to make it even simpler.

K-means segments visualized on the score function

After segmentation on first-day data. 1 = inactive user, 2 = active user, 3 = very active user.
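The scoring-function approach above can be sketched in a few lines. The column names, weights, and quantile cutoffs below are illustrative assumptions, not the actual Credit Sesame data or heuristics:

```python
import numpy as np
import pandas as pd

# Hypothetical first-day user data; columns and distributions are made up.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "user_id": range(1000),
    "sessions": rng.poisson(3, 1000),
    "session_length": rng.exponential(5.0, 1000),
    "actions": rng.poisson(10, 1000),
})

# Weighted engagement score; weights are heuristic assumptions,
# with session length weighted more heavily.
df["score"] = (
    1.0 * df["sessions"]
    + 2.0 * df["session_length"]
    + 0.5 * df["actions"]
)

# Split the score distribution into 3 segments at chosen quantiles:
# 1 = inactive, 2 = active, 3 = very active.
cuts = df["score"].quantile([0.5, 0.85]).values
df["segment"] = np.searchsorted(cuts, df["score"]) + 1

print(df["segment"].value_counts().sort_index())
```

The quantile cutoffs are one possible heuristic; a k-means fit on the univariate score would pick the boundaries for you instead.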
After segmenting the first-day data, pick a time frame. I chose a month because I believe Credit Sesame has a lot of returning users, and the magnitude of that can be captured with a month’s worth of data. After 30 days, users will have had the opportunity to shift between segments: very active users may become inactive, moderately active users may become very active, and so on.
Apply segmentation to this post-30-day data. Make sure you account for the time frame (for example, average your engagement score for the 30 days).

Segmentation applied to the 30-day data
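Accounting for the time frame can look like the following sketch, which assumes a hypothetical long-format table holding one engagement score per user per day; the score is averaged over the 30 days before re-applying the segmentation cutoffs:

```python
import numpy as np
import pandas as pd

# Hypothetical daily engagement scores over the 30-day window
# (column names and distributions are illustrative).
rng = np.random.default_rng(1)
daily = pd.DataFrame({
    "user_id": np.repeat(np.arange(200), 30),
    "day": np.tile(np.arange(30), 200),
    "score": rng.exponential(5.0, 200 * 30),
})

# Average each user's score over the 30 days to put it on the
# same scale as the first-day score.
monthly = daily.groupby("user_id")["score"].mean()

# In practice, reuse the cutoffs from the day-1 segmentation;
# here they are recomputed only because this data is synthetic.
cuts = monthly.quantile([0.5, 0.85]).values
segment_30d = np.searchsorted(cuts, monthly.values) + 1

print(pd.Series(segment_30d).value_counts().sort_index())
```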
Let’s visualize the results:

As expected, the number of inactive users rose over the 30 days, while the numbers of active and very active users decreased.
On to applying the Markov framework.
Markov Chains Groundwork
Markov chains are simply mathematical systems that model state-to-state movement using certain probabilistic rules and fixed assumptions.
To put it more simply, when you have a system with fixed states (or segments), and agents/users who can move between those states with a certain fixed probability, you can model it using a Markov chain.
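To make that concrete, here is a toy simulation of a single user drifting between the three segments month by month. The transition probabilities are made-up numbers for illustration, not the article’s estimates:

```python
import numpy as np

# Toy transition matrix: column j holds the probabilities of moving
# from segment j to each segment, so every column sums to 1.
# Rows/columns: 0 = inactive, 1 = active, 2 = very active.
P = np.array([
    [0.89, 0.60, 0.30],   # -> inactive
    [0.08, 0.30, 0.40],   # -> active
    [0.03, 0.10, 0.30],   # -> very active
])

rng = np.random.default_rng(42)
state = 2  # start as a very active user
trajectory = [state]
for month in range(6):
    # Draw the next segment using the fixed probabilities
    # in the current segment's column.
    state = rng.choice(3, p=P[:, state])
    trajectory.append(state)

print(trajectory)  # the user's segment over 6 months
```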
But let us first see if our system satisfies the assumptions of a Markov model:
ASSUMPTION 1: There is a finite set of states. In our system there are only 3 segments customers can move in and out of.
ASSUMPTION 2: The probabilities of moving between states are fixed. I admit, this is a strong assumption. While my system does take into account hundreds of thousands of user data points, it is reasonable to believe that the variance of the probabilities across different 30-day time frames shouldn’t be too large. But even with a lot of data, as we will see later in the article, we have to be cautious.
ASSUMPTION 3: State accessibility. Users in any segment can move to any other segment without external restriction.
ASSUMPTION 4: Non-cyclic. The segment-to-segment movement is in no way ‘automatic’ in our system, so this assumption is satisfied.
Our system holds up well against most assumptions of the Markov chain. This gives us some confidence in our model estimates, which we will get to once we build the model.
Constructing the Markov Chain
There are three parts to a Markov chain, and they are best represented as a matrix-vector multiplication. If you are completely new to linear algebra, I would recommend going through this link before proceeding with the article.

N represents the number of segments
The initial state in our system is a 3x1 vector that represents the number of users in each segment on day 1. The end state is also a 3x1 vector, showing the number of users in each segment after the first month; it is obtained by multiplying the transition probability matrix with the initial state vector. The transition probability matrix is a 3x3 matrix that holds the fixed probabilities of moving to and from the different customer segments.
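In code, one step of the chain is a single matrix-vector product. The numbers below are illustrative, not the article’s fitted values:

```python
import numpy as np

# Column-stochastic transition matrix: P[i, j] is the probability
# of moving from segment j to segment i, so each column sums to 1.
P = np.array([
    [0.89, 0.60, 0.30],
    [0.08, 0.30, 0.40],
    [0.03, 0.10, 0.30],
])
assert np.allclose(P.sum(axis=0), 1.0)

# Initial state: users per segment on day 1 (inactive, active, very active).
init = np.array([5000.0, 3000.0, 1000.0])

# End state after one 30-day step.
end = P @ init
print(end)  # expected users per segment after one month
```

Note that the column-sum property guarantees the total number of users is conserved: `end` sums to the same 9000 users as `init`.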
So how do we calculate these fixed probabilities?

From our recorded segment movements. We look at how users from each segment on day 1 moved to the various segments after 30 days and calculate the probabilities accordingly (they are simply the observed proportions).
The 0.89 in the picture is the probability that someone in segment 1 on day 1 stays in the same segment after 30 days, i.e. the probability that an inactive user remains inactive. Note that the probabilities in each column must add up to 1. We repeat this process for all segments and build the final transition matrix:

Deconstructing the transition matrix for a c
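Estimating the matrix from recorded movements amounts to a cross-tabulation of each user’s day-1 segment against their day-30 segment, normalized so each day-1 column sums to 1. The segment labels below are randomly generated stand-ins for the real paired labels:

```python
import numpy as np
import pandas as pd

# Hypothetical paired labels: each user's segment on day 1 and after 30 days.
rng = np.random.default_rng(3)
day1 = rng.choice([1, 2, 3], size=1000, p=[0.50, 0.35, 0.15])
day30 = rng.choice([1, 2, 3], size=1000, p=[0.60, 0.30, 0.10])

# Count movements, then normalize each day-1 column so it sums to 1.
# Entry (i, j) is the proportion of day-1 segment-j users who
# ended up in segment i after 30 days.
P = pd.crosstab(
    index=pd.Series(day30, name="day30_segment"),
    columns=pd.Series(day1, name="day1_segment"),
    normalize="columns",
)
print(P.round(2))
```

(Because these labels are drawn independently, the resulting matrix here carries no real signal; with actual paired user data the diagonal would typically dominate, as in the 0.89 above.)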