# Marking Statistically Significant Values using Pandas

Wed 21 May 2014 by Eoin Travers

While writing a results section, I had some data collated using the Pandas library in Python. I wanted to display the mean for a number of groups, and show whether that mean was significantly different from chance (.5) in each case.

Calculating the means and running the binomial test is simple. I'll demonstrate with a data set from UCLA. The details aren't important, but I'm going to look at the average of admit, grouped by rank.

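import pandas as pd
from scipy import stats

# Data courtesy of http://www.ats.ucla.edu/stat/r/dae/logit.htm
# Assumes `data` is already loaded as a DataFrame with `admit` and `rank` columns
data_grouped = data.groupby('rank')  # Grouped values
data_means = data_grouped.mean()  # Mean values

# Number of values in the first group (assuming all groups to be equal)
N = data_grouped.size().iloc[0]
# Run a binomial test for each group
# m*N = Mean accuracy * Number of trials = Total Accuracy
# .5 = Chance.
# Note: newer SciPy releases replace binom_test with stats.binomtest(k, n, p).pvalue
data_means['p'] = [stats.binom_test(m*N, N, .5) for m in data_means.admit]
print(data_means[['admit', 'p']].round(3))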

      admit      p
rank
1     0.541  0.609
2     0.358  0.020
3     0.231  0.000
4     0.179  0.000

[4 rows x 2 columns]
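The code above assumes every group has the same number of trials. If that doesn't hold for your data, one option is to count successes and trials per group and test each group against its own N. This is only a minimal sketch: the toy data and the column names `group` and `correct` are made up for illustration, and it assumes a SciPy version that provides `stats.binomtest` (which takes an integer number of successes).

```python
import pandas as pd
from scipy import stats

# Toy data (made up for illustration): 1 = success, 0 = failure
toy = pd.DataFrame({
    'group':   ['a'] * 10 + ['b'] * 20,
    'correct': [1] * 9 + [0] * 1 + [1] * 10 + [0] * 10,
})

# Successes (k) and trials (n) per group, so unequal group sizes are handled
summary = toy.groupby('group')['correct'].agg(k='sum', n='count')
summary['mean'] = summary['k'] / summary['n']

# Two-sided exact binomial test against chance (.5) for each group
summary['p'] = [
    stats.binomtest(int(k), int(n), 0.5).pvalue
    for k, n in zip(summary['k'], summary['n'])
]
print(summary.round(3))
```

Here group a (9 successes out of 10) comes out at p = 22/1024, or about .021, while group b (10 out of 20) sits exactly at chance and gets p = 1.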

To output this the way you would expect in a publication, I used the following function, which takes a Pandas dataframe, a list of value column names, a list of p value column names, and the number of decimal places to round the output to. The column arguments are lists, rather than single names, so you can mark several value/p value pairs in one call.

import numpy as np

def mark_sig(df, val_cols, p_cols, round_to=3):
    df = df.copy()  # Don't modify the original data
    mapper = {1: '', .1: ' .', .05: ' *', .01: ' **', .001: ' ***', .0001: ' ***'}
    possible_p = [.0001, .001, .01, .05, .1, 1]
    for val_col, p_col in zip(val_cols, p_cols):
        # For each value/p value pairing...
        labels = []
        for val, p_val in zip(df[val_col], df[p_col]):
            # For every row, start from the rounded value...
            label = str(np.round(val, round_to))
            for p in possible_p:
                # Check if the p value is below any of those on the list
                if p_val < p:
                    # If so, add the appropriate asterisks
                    label += mapper[p]
                    break
            labels.append(label)
        # Replace the whole column at once (avoids chained assignment)
        df[val_col] = labels
    print_me = val_cols  # Only print the value columns
    print(df[print_me])
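The same marking can also be done without an explicit loop. What follows is an alternative sketch, not the function above: it uses `pd.cut` to bin the p values, and the helper name `sig_markers` and the toy table are hypothetical. The thresholds mirror the ones in the mapper, with `right=False` so that a p value of exactly .05 is not starred.

```python
import pandas as pd

def sig_markers(p_values):
    """Return a marker string for each p value (hypothetical helper)."""
    bins = [-float('inf'), .001, .01, .05, .1, float('inf')]
    labels = [' ***', ' **', ' *', ' .', '']
    # right=False makes each bin [low, high), matching the strict `p < threshold` test
    return pd.cut(p_values, bins=bins, labels=labels, right=False).astype(str)

# Toy values, made up for illustration
table = pd.DataFrame({'score': [0.62, 0.55, 0.48], 'p': [0.0004, 0.03, 0.4]})

# Round the values, turn them into strings, and append the markers
formatted = table['score'].round(3).astype(str) + sig_markers(table['p'])
print(formatted)
```

With these toy values the three rows come out as `0.62 ***`, `0.55 *`, and `0.48`.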