Welcome Spring 2025 Students!

Walkthrough#

Learning Objectives#

At the end of this learning activity you will be able to:

  • Practice summarize observations by sample using groupby.

  • Measure the uncertainty of the estimate of the mean.

  • Distinguish when to use parametric and non-parametric estimates of error.

  • Practice merging two dataframes.

This week we will start looking at the imaging data we discussed with Dr. Gaskill. In this experiment, they used pH responsive beads that flouresce when in the low pH environment of the phagasome. With this technology, they exposed cells to different levels of dopamine and measured the uptake of these beads. They did this using a high content imager which automates the process of scanning a plate, detecting cell boundaries, and spots of flourescing beads.

This imager returns a giant spreadsheet where each cell is a row and the columns are the cell area, bead count, and intensity. This dataset of a single 96-well plate has over 315,000 cells measured across 60 samples of 20 conditions performed in triplicate.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
cell_level_data = pd.read_csv('pHrodo_DMEM.csv')
cell_level_data.head()
Well Field Cell_Number Top Left XCentroid YCentroid ObjectAreaCh1 ObjectTotalIntenCh1 ObjectAvgIntenCh1 ObjectVarIntenCh1 SpotCountCh2 SpotTotalAreaCh2 SpotAvgAreaCh2 SpotTotalIntenCh2 SpotAvgIntenCh2 TotalIntenCh2 AvgIntenCh2
0 B2 1 1 127 634 703.418228 197.382766 14889 40514018 2721.070455 1428.303950 5 122 24.400000 113038 926.540984 1151084 77.311035
1 B2 1 2 203 53 119.211656 272.750579 14670 36697977 2501.566258 1270.956156 0 0 0.000000 0 0.000000 531102 36.203272
2 B2 1 3 477 595 664.088627 538.774772 14341 38322709 2672.248030 1543.100214 16 382 23.875000 823276 2155.172775 3011914 210.021198
3 B2 1 4 488 302 389.107857 581.589321 25302 59364634 2346.242748 939.256416 21 324 15.428571 259496 800.913580 1265323 50.008814
4 B2 1 5 713 717 808.196306 790.656465 22414 57729816 2575.614170 1724.895666 0 0 0.000000 0 0.000000 590502 26.345231

Sumarize by sample#

Q1: How many cells are in each well?#

# Use `groupby` to count the number of cells per well

cells_per_well = cell_level_data.groupby('Well')['Cell_Number'].count() # SOLUTION
cells_per_well.describe()
count     60.000000
mean     525.966667
std      133.318145
min      257.000000
25%      418.250000
50%      528.000000
75%      641.000000
max      794.000000
Name: Cell_Number, dtype: float64
grader.check("q1_cells_per_well")
cells_per_well.plot(kind='box')
<Axes: >
../../_images/b979d078f716f721a5d16224facdca1c7bbb36f080d5acdd06f062ce1540c8a8.png

The count ranges from 257 to 794 with an average of 525 cells per well.

Measuring phagocytosis#

Each cell can take up 0 or more pH beads. Our biological question is whether dopamine changes the amount of beads that are taken up by the cells.

sns.histplot(data = cell_level_data,
             x = 'SpotCountCh2',
             bins = np.arange(0, 100),
             stat = 'percent')
<Axes: xlabel='SpotCountCh2', ylabel='Percent'>
../../_images/92eafa323a5c19483393e505ed9ccf9b9e824775d3ed8c1cff55231972efef20.png

From our graph, we can see that most cells took up 0 beads and then about 10% took up 1, ~5% took up two, etc.

We hypothesize that dopamine treatment will increase the average number of beads taken up by cells.

# Visually
sns.barplot(data = cell_level_data,
            y = 'Well',
            x = 'SpotCountCh2')
<Axes: xlabel='SpotCountCh2', ylabel='Well'>
../../_images/5cdd476da6626d5e3a3e4221d482de1e545f7f2596ff9d9e78e276aa256770f6.png

The length of the bars indicates the average number of spots per cell while the black hashes indicate the 95% CI of that estimate.

# Numerically
well_level_data = cell_level_data.groupby('Well')['SpotCountCh2'].agg(['mean', 'sem', 'count'])
well_level_data.head()
mean sem count
Well
B10 4.138889 0.290394 612
B11 2.692828 0.200705 739
B2 4.599343 0.326940 609
B3 4.391667 0.312008 720
B4 3.464491 0.301991 521

Decoding samples#

Up to now we’ve been treating all of our without knowing which treatment they came from. Now that we’ve collapsed our data to a single representative number for each sample, we can merge with our plate map.

# Load in plate map
plate_map = pd.read_csv('plate_map.csv')

# Treat concentration as a category instead of a number
plate_map['pHrodo_conc_ug'] = pd.Categorical(plate_map['pHrodo_conc_ug'])

plate_map.head()
well pHrodo_conc_ug DA_Tx replicate
0 B2 5.0 veh Rep1
1 C2 5.0 veh Rep2
2 D2 5.0 veh Rep3
3 B3 5.0 DA06 Rep1
4 C3 5.0 DA06 Rep2

This function helps visualize how the plate is layed out.

def fancy_pivot(df):
    import re
    # Extract row letters and column numbers from the 'well' column
    df['row'] = df['well'].apply(lambda x: re.match(r'([A-H])', x).group(1))
    df['col'] = df['well'].apply(lambda x: int(re.match(r'[A-H]([0-9]{1,2})', x).group(1)))

    # Concatenate all other columns as 'V1-V2-V3' format
    value_columns = [col for col in df.columns if col not in ['well', 'row', 'col']]
    df['values'] = df[value_columns].astype(str).agg('-'.join, axis=1)

    # Create pivot table with aggfunc as 'first'
    pivot_table = df.pivot_table(index='row', columns='col', values='values', aggfunc='first')

    return pivot_table


fancy_pivot(plate_map)
col 2 3 4 5 6 7 8 9 10 11
row
B 5.0-veh-Rep1 5.0-DA06-Rep1 5.0-DA07-Rep1 5.0-DA08-Rep1 5.0-DA09-Rep1 5.0-DA10-Rep1 5.0-DA11-Rep1 5.0-DA12-Rep1 5.0-DA13-Rep1 5.0-DA14-Rep1
C 5.0-veh-Rep2 5.0-DA06-Rep2 5.0-DA07-Rep2 5.0-DA08-Rep2 5.0-DA09-Rep2 5.0-DA10-Rep2 5.0-DA11-Rep2 5.0-DA12-Rep2 5.0-DA13-Rep2 5.0-DA14-Rep2
D 5.0-veh-Rep3 5.0-DA06-Rep3 5.0-DA07-Rep3 5.0-DA08-Rep3 5.0-DA09-Rep3 5.0-DA10-Rep3 5.0-DA11-Rep3 5.0-DA12-Rep3 5.0-DA13-Rep3 5.0-DA14-Rep3
E 7.5-veh-Rep1 7.5-DA06-Rep1 7.5-DA07-Rep1 7.5-DA08-Rep1 7.5-DA09-Rep1 7.5-DA10-Rep1 7.5-DA11-Rep1 7.5-DA12-Rep1 7.5-DA13-Rep1 7.5-DA14-Rep1
F 7.5-veh-Rep2 7.5-DA06-Rep2 7.5-DA07-Rep2 7.5-DA08-Rep2 7.5-DA09-Rep2 7.5-DA10-Rep2 7.5-DA11-Rep2 7.5-DA12-Rep2 7.5-DA13-Rep2 7.5-DA14-Rep2
G 7.5-veh-Rep3 7.5-DA06-Rep3 7.5-DA07-Rep3 7.5-DA08-Rep3 7.5-DA09-Rep3 7.5-DA10-Rep3 7.5-DA11-Rep3 7.5-DA12-Rep3 7.5-DA13-Rep3 7.5-DA14-Rep3

Merge the plate map with the well level aggregates.

sample_level_data = pd.merge(plate_map, well_level_data,
                             left_on = 'well', right_index = True)
sample_level_data.head()
well pHrodo_conc_ug DA_Tx replicate row col values mean sem count
0 B2 5.0 veh Rep1 B 2 5.0-veh-Rep1 4.599343 0.326940 609
1 C2 5.0 veh Rep2 C 2 5.0-veh-Rep2 4.737265 0.426620 373
2 D2 5.0 veh Rep3 D 2 5.0-veh-Rep3 3.623116 0.350467 398
3 B3 5.0 DA06 Rep1 B 3 5.0-DA06-Rep1 4.391667 0.312008 720
4 C3 5.0 DA06 Rep2 C 3 5.0-DA06-Rep2 5.070039 0.538362 257

Now we can visualize the well level aggregates by the treatment condition.

ax = sns.barplot(data = sample_level_data,
            hue = 'pHrodo_conc_ug',
            x = 'DA_Tx',
            y = 'mean', errorbar=('se', 2), alpha=0.5)

sns.stripplot(data = sample_level_data,
            hue = 'pHrodo_conc_ug',
            x = 'DA_Tx',
            y = 'mean', dodge=True, legend=False, ax=ax)

ax.set_ylabel('mean(bead count)')
Text(0, 0.5, 'mean(bead count)')
../../_images/56fcc41c41d6c206e1272e2fe1b00259f0c70e6e3b7f57df56f9bcd01b12696c.png

Q2: Describe the graph#

Points: 5

# Which experimental condition (pHrodo_conc_ug) had less noise in the measurement?
# Answer 5.0 or 7.5
q2a = 5.0 # SOLUTION

q2a reasoning: your answer here

# Does this graph show evidence that dopamine increases the amount of beads phagocytosed?
# Anwser 'yes' or 'no'
q2b = 'no' # SOLUTION

q2b reasoning: your answer here

grader.check("q2_graph")

In the next few weeks well cover strategies to quantify our hypothesis using techniques like ANOVAs and multiple regression.