- Python Data Analysis Cookbook
- Ivan Idris
- 2025-04-04 19:55:25
Correlating a binary and a continuous variable with the point biserial correlation
The point-biserial correlation correlates a binary variable Y and a continuous variable X. The coefficient is calculated as follows:

The subscripts in (3.21) correspond to the two groups of the binary variable. M1 is the mean of X for the values belonging to group 1 of Y, and M0 is the mean of X for the values belonging to group 0 of Y.
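For reference, the coefficient is commonly written in the form sketched below. This is the standard textbook definition and is assumed here, not copied from (3.21). The symbols are notation introduced for this sketch: $M_1$ and $M_0$ are the means of X in group 1 and group 0 of Y, $n_1$ and $n_0$ are the group sizes, $n = n_1 + n_0$, $\bar{x}$ is the overall mean of X, and $s_n$ is the population standard deviation of X.

$$
r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}},
\qquad
s_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$

With this definition, the result equals the Pearson correlation between X and Y when Y is coded as 0 and 1.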
In this recipe, the binary variable we will use is rain or no rain. We will correlate this variable with temperature.
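As a quick sanity check of the definition, the sketch below computes the coefficient by hand and compares it with scipy.stats.pointbiserialr() and with the ordinary Pearson correlation. The data is synthetic and purely illustrative (it is not the weather data used in this recipe). The names temp, rain, m1, m0, n1, n0 and s_n are assumptions made for this example, and the formula is the standard textbook form.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (hypothetical, not the recipe's weather set)
rng = np.random.default_rng(42)
n = 500
temp = rng.normal(10.0, 5.0, n)                 # continuous variable X
rain = temp + rng.normal(0.0, 5.0, n) > 12.0    # binary variable Y (boolean)

# Group means and group sizes
m1 = temp[rain].mean()    # mean of X where Y is True (group 1)
m0 = temp[~rain].mean()   # mean of X where Y is False (group 0)
n1 = int(rain.sum())
n0 = n - n1

# Population standard deviation of X (ddof=0 is the NumPy default)
s_n = temp.std()

# Standard textbook form of the point-biserial coefficient
r_manual = (m1 - m0) / s_n * np.sqrt(n1 * n0 / n ** 2)

# The same quantity via SciPy, and as a plain Pearson correlation on 0/1 data
rain_01 = rain.astype(float)
r_scipy = stats.pointbiserialr(rain_01, temp)[0]
r_pearson = np.corrcoef(rain_01, temp)[0, 1]

print(r_manual, r_scipy, r_pearson)
```

The three printed values should agree up to floating-point rounding, which is why the point-biserial correlation can be treated as a special case of the Pearson correlation.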
How to do it...
We will calculate the correlation with the scipy.stats.pointbiserialr() function. We will also compute the rolling correlation over a 2-year window with the np.roll() function. The steps are as follows:
- The imports are as follows:
    import dautil as dl
    from scipy import stats
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from IPython.display import HTML
- Load the data and correlate the two relevant arrays:
    df = dl.data.Weather.load().dropna()
    df['RAIN'] = df['RAIN'] > 0
    stats_corr = stats.pointbiserialr(df['RAIN'].values, df['TEMP'].values)
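- Compute the 2-year rolling correlation as follows:
    N = 2 * 365
    corrs = []
    for i in range(len(df.index) - N):
        # Shift left by i so that the first N elements are rows i to i + N - 1
        x = np.roll(df['RAIN'].values, -i)[:N]
        y = np.roll(df['TEMP'].values, -i)[:N]
        corrs.append(stats.pointbiserialr(x, y)[0])
    # Label each window with the row that follows it, then average per year.
    # Current pandas needs the explicit .mean() after resample().
    corrs = pd.DataFrame(corrs,
                         index=df.index[N:],
                         columns=['Correlation']).resample('A').mean()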
- Plot the results with the following code:
    plt.plot(corrs.index.values, corrs.values)
    plt.hlines(stats_corr[0], corrs.index.values[0], corrs.index.values[-1],
               label='Correlation using the whole data set')
    plt.title('Rolling Point-biserial Correlation of Rain and Temperature with a 2 Year Window')
    plt.xlabel('Year')
    plt.ylabel('Correlation')
    plt.legend(loc='best')
    HTML(dl.report.HTMLBuilder().watermark())
Refer to the following screenshot for the end result (see the correlating_pointbiserial.ipynb file in this book's code bundle):

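As an alternative to the explicit loop, the rolling coefficient can also be sketched with pandas alone. This is a different technique from the np.roll() approach in the steps above: it relies on the fact that the point-biserial correlation equals the Pearson correlation on 0/1 data, so Series.rolling(...).corr() can be used. The data below is synthetic, and the names n_days, window, temp and rain are hypothetical values chosen for illustration (the window is much shorter than the recipe's 2 years).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic daily data (hypothetical, not the recipe's weather set)
rng = np.random.default_rng(0)
n_days, window = 400, 60
index = pd.date_range('2000-01-01', periods=n_days, freq='D')
temp = pd.Series(rng.normal(10.0, 5.0, n_days), index=index)
# Rain indicator coded as 0.0 / 1.0, more likely on warmer days
rain = pd.Series((temp + rng.normal(0.0, 5.0, n_days) > 12.0).astype(float),
                 index=index)

# Rolling Pearson correlation on 0/1 data, which is the rolling
# point-biserial correlation; each value is labeled by the window's last day
rolling_corr = rain.rolling(window).corr(temp).dropna()

# Cross-check against an explicit loop over plain slices
loop_corr = np.array([
    stats.pointbiserialr(rain.values[i:i + window],
                         temp.values[i:i + window])[0]
    for i in range(n_days - window + 1)
])
max_abs_diff = float(np.max(np.abs(rolling_corr.values - loop_corr)))
print(max_abs_diff)
```

The printed difference should be at the level of floating-point rounding. Note that this version labels each window by its last day, whereas the steps above label it by the row that follows the window.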
See also
- The relevant SciPy documentation at 2015).