top of page

Create Your First Project

Start adding your projects to your portfolio. Click on "Manage Projects" to get started

Statistics: Churn Modeling 2

Descriptive statistics summarize and describe relatively basic but essential features of a dataset (Jansen, 2023). Descriptive statistics can offer a snapshot of characteristics in the dataset and provide a better understanding of the raw data. They can also simplify data by reducing large amounts of data into a single number. The Customer Churn program used a number of different descriptive statistics to try to provide the user with an understanding of who was leaving the bank, and why.
The first statistic used was variance. Variance is a statistical measurement of the spread between numbers in a data set. It is calculated by taking the differences between each number in the data set and the mean, squaring the differences to make them positive, and then dividing the sum of the squares by the number of values (Hayes, 2024). The first thing this program did was provide the variance for the independent variables credit score, age, tenure, balance, and estimated salary.
Next we see a bar chart of the total number of customers and the total number of customers that exited the bank. This numbers is then split into geographic region and we can tell by the bar chart that France had the most customers, but also the highest percent leaving the bank. The next IV we examine is age. It appears that the majority of customers leaving the bank are in their mid-thirties. However, based on the data so far, we do not know if this is a meaningful number because we do not know the age breakdown of all the customers. Perhaps most of the bank’s customers are young adults and that is why we see such a huge spike in that age group leaving the bank.
We then take a look at how long the customers have been with the bank and try to determine if tenure is a relevant factor in determining who is likely to leave the bank. We see that most people do not leave in the first year, or after ten. Years 1-9 are steady, so tenure does not appear to be a factor.
When we look at the number of products each customer has we start to see meaningful numbers. The more products a customer has, the more likely they are to stay with the bank. If a customer only has one product then they are 3-4x more likely to leave than if they have two. This is valuable information to banks that can lead to changes in the way they price products. For example: open a checking account and the maintenance fee on a savings account will be waived.
It’s not just the number of accounts that matter, it is the type of account. Next we look at credit cards and we see that customers who have a credit card are far less likely to leave the bank than those who do not. If this were my project, I would look at the differences between fixed and revolving credit types, but we do not have that data.
Finally, we examine estimated salary. Does a person’s salary play a role in whether or not they will stay? The data suggests that those on the lower income scale may be more likely to leave. The next logical question would be ‘why?’. I would like to know if people in this income bracket have more overdraft fees and whether it is really overdraft fees that is the driving force, or income.
Logistic regression is not sensitive to the magnitude of variables, so it is not necessary to standardize the data. However, due to the high variance of several IV’s there would need to be standardization if an SVM was used. The variance would be too noisy for an SVM.

References
Hayes, A. (2024, September 11). What Is Variance in Statistics? Definition, Formula, and Example. Investopedia. https://www.investopedia.com/terms/v/variance.asp
Jaadi, Z. (2023, August 4). When and Why to Standardize Your DataA simple guide on when it is necessary to standardize your data. built in. https://builtin.com/data-science/when-and-why-standardize-your-data
Jansen, D. (2023, October). Quant Analysis 101: Descriptive Statistics. Grad Coach. https://gradcoach.com/descriptive-statistics/#:~:text=For%20example%2C%20a%20descriptive%20statistic,talk%20is%20called%20the%20mean).

bottom of page