Walkthrough

Learning Objectives

At the end of this learning activity you will be able to:

Practice summarizing observations by sample using groupby.
Measure the uncertainty of the estimate of the mean.
Distinguish when to use parametric and non-parametric estimates of error.
Practice merging two dataframes.

This week we will start looking at the imaging data we discussed with Dr. Gaskill. In this experiment, they used pH-responsive beads that fluoresce in the low-pH environment of the phagosome. With this technology, they exposed cells to different levels of dopamine and measured the uptake of these beads. They did this using a high-content imager, which automates scanning a plate and detecting both cell boundaries and spots of fluorescing beads.

This imager returns a large spreadsheet in which each imaged cell is a row and the columns include its area, bead (spot) count, and intensity measurements. This dataset from a single 96-well plate contains over 31,000 cells measured across 60 samples: 20 conditions, each performed in triplicate.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

cell_level_data = pd.read_csv('pHrodo_DMEM.csv')
cell_level_data.head()

	Well	Field	Cell_Number	Top	Left	XCentroid	YCentroid	ObjectAreaCh1	ObjectTotalIntenCh1	ObjectAvgIntenCh1	ObjectVarIntenCh1	SpotCountCh2	SpotTotalAreaCh2	SpotAvgAreaCh2	SpotTotalIntenCh2	SpotAvgIntenCh2	TotalIntenCh2	AvgIntenCh2
0	B2	1	1	127	634	703.418228	197.382766	14889	40514018	2721.070455	1428.303950	5	122	24.400000	113038	926.540984	1151084	77.311035
1	B2	1	2	203	53	119.211656	272.750579	14670	36697977	2501.566258	1270.956156	0	0	0.000000	0	0.000000	531102	36.203272
2	B2	1	3	477	595	664.088627	538.774772	14341	38322709	2672.248030	1543.100214	16	382	23.875000	823276	2155.172775	3011914	210.021198
3	B2	1	4	488	302	389.107857	581.589321	25302	59364634	2346.242748	939.256416	21	324	15.428571	259496	800.913580	1265323	50.008814
4	B2	1	5	713	717	808.196306	790.656465	22414	57729816	2575.614170	1724.895666	0	0	0.000000	0	0.000000	590502	26.345231

Summarize by sample

Q1: How many cells are in each well?

# Use `groupby` to count the number of cells per well

cells_per_well = cell_level_data.groupby('Well')['Cell_Number'].count() # SOLUTION
cells_per_well.describe()

count     60.000000
mean     525.966667
std      133.318145
min      257.000000
25%      418.250000
50%      528.000000
75%      641.000000
max      794.000000
Name: Cell_Number, dtype: float64

grader.check("q1_cells_per_well")

cells_per_well.plot(kind='box')

[Figure: box plot of the number of cells per well]

The count ranges from 257 to 794 cells per well, with an average of about 526.
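A related detail: groupby(...).count() counts the non-missing values in the selected column, while groupby(...).size() counts rows. The sketch below uses a small hypothetical DataFrame (toy, not the real pHrodo data) with one deliberately missing Cell_Number to show where the two can disagree; if Cell_Number has no missing values, both give the same cells-per-well numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cell_level_data: one row per cell
toy = pd.DataFrame({
    'Well': ['B2', 'B2', 'B2', 'C2', 'C2'],
    'Cell_Number': [1, 2, 3, 1, np.nan],  # one missing value on purpose
})

# .count() skips missing values in the selected column
per_well_count = toy.groupby('Well')['Cell_Number'].count()

# .size() counts every row in each group, missing values included
per_well_size = toy.groupby('Well').size()

print(per_well_count)  # B2 -> 3, C2 -> 1
print(per_well_size)   # B2 -> 3, C2 -> 2
```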

Measuring phagocytosis

Each cell can take up zero or more pH-sensitive beads. Our biological question is whether dopamine changes the number of beads taken up by the cells.

sns.histplot(data = cell_level_data,
             x = 'SpotCountCh2',
             bins = np.arange(0, 100),
             stat = 'percent')

[Figure: histogram of SpotCountCh2 (x-axis) against percent of cells (y-axis)]

From the histogram, we can see that most cells took up 0 beads, about 10% took up 1, about 5% took up 2, and so on.
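The percentages read off the histogram can also be computed directly with value_counts(normalize=True). This is a minimal sketch on a made-up series of bead counts (spot_counts is hypothetical); on the real data the same call would be applied to cell_level_data['SpotCountCh2'].

```python
import pandas as pd

# Hypothetical bead counts for ten cells (stand-in for SpotCountCh2)
spot_counts = pd.Series([0, 0, 0, 0, 0, 1, 1, 2, 5, 12])

# Fraction of cells at each bead count, converted to percent and ordered by count
pct_by_count = spot_counts.value_counts(normalize=True).sort_index() * 100
print(pct_by_count)
```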

We hypothesize that dopamine treatment will increase the average number of beads taken up by cells.

# Visually
sns.barplot(data = cell_level_data,
            y = 'Well',
            x = 'SpotCountCh2')

[Figure: horizontal bar plot of mean SpotCountCh2 (x-axis) for each Well (y-axis)]

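The length of each bar is the average number of spots per cell in that well, and the black lines (error bars) show the 95% confidence interval of that estimate. By default, seaborn computes this interval by bootstrapping, which is a non-parametric method.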
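To make the parametric versus non-parametric distinction concrete, here is a sketch of a percentile bootstrap for the mean next to the normal-approximation interval (mean ± 1.96 SEM). The bead counts are simulated from a skewed negative binomial distribution purely for illustration, and bootstrap_ci is a hypothetical helper; seaborn's own resampling settings (number of resamples, random seed) may differ.

```python
import numpy as np

# Simulated, skewed bead counts for one hypothetical well (not the real data):
# a negative binomial with n=1, p=0.2 has mean n*(1-p)/p = 4 and many zeros
rng = np.random.default_rng(0)
spots = rng.negative_binomial(n=1, p=0.2, size=500)


def bootstrap_ci(values, rng, n_boot=2000, level=95):
    """Percentile bootstrap CI for the mean (non-parametric)."""
    # Resample the cells with replacement and record each resample's mean
    boot_means = np.array([
        rng.choice(values, size=len(values), replace=True).mean()
        for _ in range(n_boot)
    ])
    # Keep the central `level` percent of the bootstrap means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high


boot_low, boot_high = bootstrap_ci(spots, rng)

# Parametric interval: mean +/- 1.96 SEM, assuming the sampling
# distribution of the mean is approximately normal
sem = spots.std(ddof=1) / np.sqrt(len(spots))
param_low = spots.mean() - 1.96 * sem
param_high = spots.mean() + 1.96 * sem

print(f"bootstrap 95% CI:  ({boot_low:.2f}, {boot_high:.2f})")
print(f"parametric 95% CI: ({param_low:.2f}, {param_high:.2f})")
```

With hundreds of cells per well, the two intervals usually come out similar because the sampling distribution of the mean is close to normal; the bootstrap matters more when samples are small or heavily skewed.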

# Numerically
well_level_data = cell_level_data.groupby('Well')['SpotCountCh2'].agg(['mean', 'sem', 'count'])
well_level_data.head()

	mean	sem	count
Well
B10	4.138889	0.290394	612
B11	2.692828	0.200705	739
B2	4.599343	0.326940	609
B3	4.391667	0.312008	720
B4	3.464491	0.301991	521
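The sem column is a parametric estimate of uncertainty: pandas computes it as the sample standard deviation (ddof=1) divided by the square root of the number of cells. A quick check on hypothetical data for two wells (toy is made up, not taken from pHrodo_DMEM.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical cell-level bead counts for two wells
toy = pd.DataFrame({
    'Well': ['B2'] * 4 + ['C2'] * 3,
    'SpotCountCh2': [0, 2, 5, 9, 1, 1, 4],
})

summary = toy.groupby('Well')['SpotCountCh2'].agg(['mean', 'sem', 'count'])

# Recompute the SEM by hand: sample standard deviation (ddof=1) / sqrt(n)
std = toy.groupby('Well')['SpotCountCh2'].std(ddof=1)
summary['sem_by_hand'] = std / np.sqrt(summary['count'])

print(summary)
```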

Decoding samples

Up to now, we've been treating all of our wells without knowing which treatment they came from. Now that we've collapsed our data to a single representative number for each sample, we can merge it with our plate map.

# Load in plate map
plate_map = pd.read_csv('plate_map.csv')

# Treat concentration as a category instead of a number
plate_map['pHrodo_conc_ug'] = pd.Categorical(plate_map['pHrodo_conc_ug'])

plate_map.head()

	well	pHrodo_conc_ug	DA_Tx	replicate
0	B2	5.0	veh	Rep1
1	C2	5.0	veh	Rep2
2	D2	5.0	veh	Rep3
3	B3	5.0	DA06	Rep1
4	C3	5.0	DA06	Rep2

This function helps visualize how the plate is laid out.

import re


def fancy_pivot(df):
    # Note: this adds 'row', 'col', and 'values' columns to df in place,
    # which is why they also appear in plate_map and in sample_level_data.
    # Extract the row letter (A-H) and column number from the 'well' column
    df['row'] = df['well'].apply(lambda x: re.match(r'([A-H])', x).group(1))
    df['col'] = df['well'].apply(lambda x: int(re.match(r'[A-H]([0-9]{1,2})', x).group(1)))

    # Concatenate all other columns into a single 'V1-V2-V3' label
    value_columns = [col for col in df.columns if col not in ['well', 'row', 'col']]
    df['values'] = df[value_columns].astype(str).agg('-'.join, axis=1)

    # Each well has one label, so aggfunc='first' just places it in its grid cell
    pivot_table = df.pivot_table(index='row', columns='col', values='values', aggfunc='first')

    return pivot_table


fancy_pivot(plate_map)

col	2	3	4	5	6	7	8	9	10	11
row
B	5.0-veh-Rep1	5.0-DA06-Rep1	5.0-DA07-Rep1	5.0-DA08-Rep1	5.0-DA09-Rep1	5.0-DA10-Rep1	5.0-DA11-Rep1	5.0-DA12-Rep1	5.0-DA13-Rep1	5.0-DA14-Rep1
C	5.0-veh-Rep2	5.0-DA06-Rep2	5.0-DA07-Rep2	5.0-DA08-Rep2	5.0-DA09-Rep2	5.0-DA10-Rep2	5.0-DA11-Rep2	5.0-DA12-Rep2	5.0-DA13-Rep2	5.0-DA14-Rep2
D	5.0-veh-Rep3	5.0-DA06-Rep3	5.0-DA07-Rep3	5.0-DA08-Rep3	5.0-DA09-Rep3	5.0-DA10-Rep3	5.0-DA11-Rep3	5.0-DA12-Rep3	5.0-DA13-Rep3	5.0-DA14-Rep3
E	7.5-veh-Rep1	7.5-DA06-Rep1	7.5-DA07-Rep1	7.5-DA08-Rep1	7.5-DA09-Rep1	7.5-DA10-Rep1	7.5-DA11-Rep1	7.5-DA12-Rep1	7.5-DA13-Rep1	7.5-DA14-Rep1
F	7.5-veh-Rep2	7.5-DA06-Rep2	7.5-DA07-Rep2	7.5-DA08-Rep2	7.5-DA09-Rep2	7.5-DA10-Rep2	7.5-DA11-Rep2	7.5-DA12-Rep2	7.5-DA13-Rep2	7.5-DA14-Rep2
G	7.5-veh-Rep3	7.5-DA06-Rep3	7.5-DA07-Rep3	7.5-DA08-Rep3	7.5-DA09-Rep3	7.5-DA10-Rep3	7.5-DA11-Rep3	7.5-DA12-Rep3	7.5-DA13-Rep3	7.5-DA14-Rep3

Merge the plate map with the well level aggregates.

sample_level_data = pd.merge(plate_map, well_level_data,
                             left_on = 'well', right_index = True)
sample_level_data.head()

	well	pHrodo_conc_ug	DA_Tx	replicate	row	col	values	mean	sem	count
0	B2	5.0	veh	Rep1	B	2	5.0-veh-Rep1	4.599343	0.326940	609
1	C2	5.0	veh	Rep2	C	2	5.0-veh-Rep2	4.737265	0.426620	373
2	D2	5.0	veh	Rep3	D	2	5.0-veh-Rep3	3.623116	0.350467	398
3	B3	5.0	DA06	Rep1	B	3	5.0-DA06-Rep1	4.391667	0.312008	720
4	C3	5.0	DA06	Rep2	C	3	5.0-DA06-Rep2	5.070039	0.538362	257
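pd.merge defaults to an inner join, so any well present in only one table is silently dropped. A minimal sketch, using hypothetical stand-ins for plate_map and well_level_data (plate and well_stats, with made-up values), of how the validate and indicator parameters can make such mismatches visible:

```python
import pandas as pd

# Hypothetical stand-ins: D2 is in the plate map but has no imaging summary
plate = pd.DataFrame({'well': ['B2', 'C2', 'D2'], 'DA_Tx': ['veh', 'veh', 'veh']})
well_stats = pd.DataFrame({'mean': [4.0, 4.5]},
                          index=pd.Index(['B2', 'C2'], name='Well'))

merged = pd.merge(
    plate, well_stats,
    left_on='well', right_index=True,
    how='left',             # keep every plate-map well, even without imaging data
    validate='one_to_one',  # raise MergeError if a well appears twice on either side
    indicator=True,         # add a '_merge' column: 'both', 'left_only', or 'right_only'
)

# Wells that did not find a partner in well_stats
missing = merged.loc[merged['_merge'] != 'both', 'well'].tolist()
print(missing)
```

Here D2 is reported as missing; with the default how='inner' it would simply vanish from the result.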

Now we can visualize the well level aggregates by the treatment condition.

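# Bars show the mean of the replicate wells; error bars span +/- 2 standard errors
ax = sns.barplot(data = sample_level_data,
                 hue = 'pHrodo_conc_ug',
                 x = 'DA_Tx',
                 y = 'mean', errorbar=('se', 2), alpha=0.5)

# Overlay the individual well means, dodged to line up with their bars
sns.stripplot(data = sample_level_data,
              hue = 'pHrodo_conc_ug',
              x = 'DA_Tx',
              y = 'mean', dodge=True, legend=False, ax=ax)

ax.set_ylabel('mean(bead count)')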

[Figure: bar plot of mean(bead count) by DA_Tx, grouped by pHrodo_conc_ug, with individual well means overlaid as points]
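With errorbar=('se', 2), each bar is the mean of the replicate well means and the error bar spans ± 2 standard errors across those replicates. The sketch below computes the same summary with groupby on made-up well means (wells is hypothetical); it assumes seaborn's 'se' error bars use the standard error with ddof=1, as pandas does.

```python
import pandas as pd

# Hypothetical well-level means: 2 DA treatments x 3 replicates at one pHrodo concentration
wells = pd.DataFrame({
    'pHrodo_conc_ug': [5.0] * 6,
    'DA_Tx': ['veh'] * 3 + ['DA06'] * 3,
    'mean': [4.0, 4.5, 3.5, 4.2, 5.0, 3.8],
})

# Bar height = mean of the replicate means; error bar = +/- 2 SEM across replicates
by_condition = wells.groupby(['pHrodo_conc_ug', 'DA_Tx'])['mean'].agg(['mean', 'sem', 'count'])
by_condition['lower'] = by_condition['mean'] - 2 * by_condition['sem']
by_condition['upper'] = by_condition['mean'] + 2 * by_condition['sem']
print(by_condition)
```

Because there are only three replicates per condition, these error bars describe well-to-well variability rather than the cell-to-cell spread within a well.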

Q2: Describe the graph

Points: 5

# Which experimental condition (pHrodo_conc_ug) had less noise in the measurement?
# Answer 5.0 or 7.5
q2a = 5.0 # SOLUTION

q2a reasoning: your answer here

# Does this graph show evidence that dopamine increases the amount of beads phagocytosed?
# Answer 'yes' or 'no'
q2b = 'no' # SOLUTION

q2b reasoning: your answer here

grader.check("q2_graph")

In the next few weeks, we'll cover strategies for testing our hypothesis quantitatively, using techniques like ANOVA and multiple regression.