Walkthrough

Walkthrough#

Introduction#

An emerging area of biomedical research over the past decade as been the human microbiome. This field studies the commensal bacteria that inhabit our bodies and how they influence our health. These can be found everywhere from our digestive system, to our skin, to our ears, and to every part of our body. Often times hundreds of different bacterial species can be isolated from a single body site of a single individual. Disease can be caused or exacerbated by an imbalance in these species.

This week, we will explore the data generated by researchers here at Drexel. From a collection of 12 patients they measured the microbiome of of 12 body sites of the inner ear. Some of these patients had inner ear infections (otitis media) and different disease outcomes. This week, we will use Python to generate pivot-tables and bar-plots to understand whether the microbiome is impacted by disease outcome.

Learning Objectives#

At the end of this learning activity you will be able to:

Practice using query to extract data from a larger table.
Calculate summary values across a a pd.DataFrame using methods like sum, mean, and max.
Utilize pd.DataFrame.groupby to aggregate and transform data by group.
Use pd.merge to combine data held in two different tables.
Employ pd.pivot_table and pd.melt to reshape and summarize data.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

data = pd.read_csv('microbiome_phylum_data.tsv', sep = '\t')
data

	Patient	Location	CollectionType	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria	num_otu	Predominant
0	3062	Nasal Vestibule	Swab	2516	44	14987	0	16	Firmicutes
1	3094	Nasal Vestibule	Swab	103	0	1397	0	15	Firmicutes
2	3095	Nasal Vestibule	Swab	1474	0	5510	29	21	Firmicutes
3	3115	Nasal Vestibule	Swab	0	0	5480	0	2	Firmicutes
4	3116	Nasal Vestibule	Swab	2	0	2324	1	4	Firmicutes
...	...	...	...	...	...	...	...	...	...
103	3094	Sphenoid Tissue	Biopsy	1540	0	784	13	14	Actinobacteria
104	3095	Sphenoid Tissue	Biopsy	670	0	703	0	14	Firmicutes
105	3116	Sphenoid Tissue	Biopsy	309	9	6709	495	17	Firmicutes
106	3117	Sphenoid Tissue	Biopsy	1812	0	1954	129	18	Firmicutes
107	3119	Sphenoid Tissue	Biopsy	4183	0	988	13	23	Actinobacteria

108 rows × 9 columns

Exploring a single patient#

First, we’ll explore the distribution of bacteria of a single individual across body site.

Q1: Extract the information for patient 3116#

pat_3116 = data.query('Patient == 3116')  # SOLUTION

pat_3116.head()

	Patient	Location	CollectionType	Actinobacteria	Firmicutes	Proteobacteria	num_otu	Predominant
4	3116	Nasal Vestibule	Swab	2	2324	1	4	Firmicutes
14	3116	Head of Inferior Turbinate Tissue	Biopsy	19	6261	21	5	Firmicutes
25	3116	Middle Meatus	Swab	5	988	2	10	Firmicutes
36	3116	Uncinate Process Tissue	Biopsy	17	7249	79	12	Firmicutes
42	3116	Maxillary Sinus	Swab	34	5748	190	11	Firmicutes

print(f'pat_3116 is an `DataFrame`: {isinstance(pat_3116, pd.DataFrame)}')

pat_3116 is an `DataFrame`: True

print(f'len(pat_3116) = {len(pat_3116)}')

len(pat_3116) = 12

# Checking for specific rows.
print(f'pat_3116["Actinobacteria"].sum() = {pat_3116["Actinobacteria"].sum()}')

pat_3116["Actinobacteria"].sum() = 1145

Q2: Calculate the average count across regions for each phylum for patient 3116.#

q2_Actinobacteria_mean = pat_3116['Actinobacteria'].mean() # SOLUTION
q2_Bacteroidetes_mean = pat_3116['Bacteroidetes'].mean() # SOLUTION
q2_Firmicutes_mean = pat_3116['Firmicutes'].mean() # SOLUTION
q2_Proteobacteria_mean = pat_3116['Proteobacteria'].mean() # SOLUTION

print(f'q2_Actinobacteria_mean = {q2_Actinobacteria_mean:0.2f}')

q2_Actinobacteria_mean = 95.42

print(f'q2_Bacteroidetes_mean = {q2_Bacteroidetes_mean:0.2f}')

q2_Bacteroidetes_mean = 1.25

print(f'q2_Firmicutes_mean = {q2_Firmicutes_mean:0.2f}')

q2_Firmicutes_mean = 5135.50

print(f'q2_Proteobacteria_mean = {q2_Proteobacteria_mean:0.2f}')

q2_Proteobacteria_mean = 299.75

Summarizing by grouping#

Now that we’ve looked at the summary values for a single individual, how would we look at this for each individual? Copy-pasting that over and over is unsustainable, DataFrames have useful methods for dealing with this problem.

All of these fall into the same basic strategy.

Split - Apply - Combine.

# Split

grouped_patients = data.groupby('Patient')

# Apply - Combine

# Capitalizing constants is useful if you will re-use them often.
PHYLUM_COLS = ['Actinobacteria', 'Bacteroidetes',
                'Firmicutes', 'Proteobacteria']

mean_vals = grouped_patients[PHYLUM_COLS].mean()
mean_vals

	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria
Patient
3062	727.571429	7.714286	4513.714286	2.000000
3094	1356.900000	0.100000	2384.300000	79.100000
3095	1447.111111	0.111111	2753.666667	47.777778
3115	49.875000	1.375000	4266.875000	22.000000
3116	95.416667	1.250000	5135.500000	299.750000
3117	1164.333333	0.000000	1859.777778	69.000000
3118	1329.125000	3.125000	3271.125000	224.375000
3119	2692.727273	0.000000	1437.636364	5.727273
3120	1679.333333	0.000000	1435.555556	18.888889
3121	1205.428571	18.571429	3486.142857	29.714286
3123	1852.142857	12.285714	2536.428571	129.285714
3124	1603.545455	0.000000	1998.545455	39.272727

# This is commonly done in a single "sentence"

mean_vals = data.groupby('Patient')[PHYLUM_COLS].mean()

Q3: Calculate the average counts of each phylum by body site.#

q3_mean_phylum_site = data.groupby('Location')[PHYLUM_COLS].mean() # SOLUTION

q3_mean_phylum_site

	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria
Location
Ethmoid Culture (Deep to Ethmoid Bulla)	1483.750000	3.416667	2757.416667	127.000000
Ethmoid Tissue (Deep to Ethmoid Bulla)	1449.444444	0.777778	1715.555556	92.111111
Head of Inferior Turbinate Tissue	317.000000	0.636364	2165.727273	24.727273
Maxillary Sinus	1691.000000	0.000000	3002.750000	122.500000
Maxillary Sinus Tissue	976.666667	0.000000	2966.000000	77.000000
Middle Meatus	1158.363636	0.181818	3131.363636	43.000000
Nasal Vestibule	1161.400000	12.500000	4373.000000	37.900000
Sphenoethmoidal Recess Tissue	930.888889	4.000000	2019.666667	175.777778
Sphenoid	1974.750000	5.416667	3821.000000	69.666667
Sphenoid Tissue	1702.800000	1.800000	2227.600000	130.000000
Superior Meatus	1704.416667	1.166667	3547.250000	127.416667
Uncinate Process Tissue	868.900000	1.700000	2259.600000	40.700000

print(f'q3_mean_phylum_site is a `DataFrame`: {isinstance(q3_mean_phylum_site, pd.DataFrame)}')

q3_mean_phylum_site is a `DataFrame`: True

print(f'q3_mean_phylum_site.index = {list(q3_mean_phylum_site.index)}')

q3_mean_phylum_site.index = ['Ethmoid Culture (Deep to Ethmoid Bulla)', 'Ethmoid Tissue (Deep to Ethmoid Bulla)', 'Head of Inferior Turbinate Tissue', 'Maxillary Sinus', 'Maxillary Sinus Tissue', 'Middle Meatus', 'Nasal Vestibule', 'Sphenoethmoidal Recess Tissue', 'Sphenoid', 'Sphenoid Tissue', 'Superior Meatus', 'Uncinate Process Tissue']

print(f'q3_mean_phylum_site.columns = {list(q3_mean_phylum_site.columns)}')

q3_mean_phylum_site.columns = ['Actinobacteria', 'Bacteroidetes', 'Firmicutes', 'Proteobacteria']

print(f'q3_mean_phylum_site.sum().sum() = {q3_mean_phylum_site.sum().sum():0.2f}')

q3_mean_phylum_site.sum().sum() = 50505.71

There are a number of different built-in summary functions like this.

data.groupby('Patient')[PHYLUM_COLS].median()

	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria
Patient
3062	461.0	2.0	2714.0	0.0
3094	1000.0	0.0	1467.5	41.5
3095	1223.0	0.0	1244.0	15.0
3115	15.0	0.0	2846.5	1.5
3116	37.0	0.0	6004.5	183.5
3117	1108.0	0.0	1800.0	66.0
3118	1023.5	0.0	3343.0	152.5
3119	2605.0	0.0	988.0	3.0
3120	468.0	0.0	1301.0	27.0
3121	911.0	12.0	3686.0	13.0
3123	1010.0	0.0	1207.0	63.0
3124	1602.0	0.0	2133.0	19.0

data.groupby('Patient')[PHYLUM_COLS].count()

	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria
Patient
3062	7	7	7	7
3094	10	10	10	10
3095	9	9	9	9
3115	8	8	8	8
3116	12	12	12	12
3117	9	9	9	9
3118	8	8	8	8
3119	11	11	11	11
3120	9	9	9	9
3121	7	7	7	7
3123	7	7	7	7
3124	11	11	11	11

data.groupby('Patient')[PHYLUM_COLS].max()

	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria
Patient
3062	2516	44	14987	6
3094	4604	1	5559	406
3095	3926	1	10097	166
3115	211	11	13948	108
3116	340	9	9796	1139
3117	1812	0	3671	129
3118	3709	12	5253	771
3119	7514	0	4372	18
3120	4195	0	2655	36
3121	2816	63	6135	78
3123	4186	81	5885	323
3124	4138	0	3979	183

You can see an extensive list of available summary functions at the Pandas Documentation

If there isn’t a function that does what you want, you can also make your own.

Here is a simple one that scales the data to a unit-norm.

def unit_norm(values):
    "Given a series, return a scaled version"

    mu = values.mean()
    std = values.std()

    return (values-mu)/std

unit_normed_data = data.groupby('Patient', as_index=False)[PHYLUM_COLS].transform(unit_norm)

unit_normed_data

	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria
0	2.158109	2.247876	2.140374	-0.755929
1	-0.901854	-0.316228	-0.515722	-0.660942
2	0.025250	-0.333333	0.862714	-0.303077
3	-0.694809	-0.353553	0.287822	-0.549657
4	-0.790041	-0.441315	-0.985677	-0.838082
...	...	...	...	...
103	0.131693	-0.316228	-0.835926	-0.552316
104	-0.729756	-0.333333	-0.641845	-0.771142
105	1.806312	2.736155	0.551650	0.547734
106	1.315743	NaN	0.101533	1.390656
107	0.700800	NaN	-0.331381	1.106277

108 rows × 4 columns

Notice I used the transform method here instead of a common name.

When applying custom functions to groups of data there are three different methods depending on your final output shape:

.aggregate() or .agg() - Each group of data produces a single summary number. Commonly used to summarize groups.
.transform() - The output will have the same number (and order) of rows as the input. Commonly used for normalizations.
.apply() - Everything else.

Merging data#

Now we come to a common problem, our sample information is in a different file.

sample_info = pd.read_csv('sample_info.csv')
sample_info.head()

	PID	severe_disease	disease_type
0	3062	False	persistent
1	3094	False	persistent
2	3095	False	persistent
3	3115	True	typical
4	3116	True	typical

Now that we have two DataFrames with a common key we can use pd.merge.

merged_info = pd.merge(data, sample_info,
                       left_on = 'Patient', # The column of the key in biome_data
                       right_on = 'PID', # The column of the key in sample_info
                       how = 'inner') # Keep only those in both

merged_info.head()

	Patient	Location	CollectionType	Actinobacteria	Bacteroidetes	Firmicutes	Proteobacteria	num_otu	Predominant	PID	severe_disease	disease_type
0	3062	Nasal Vestibule	Swab	2516	44	14987	0	16	Firmicutes	3062	False	persistent
1	3094	Nasal Vestibule	Swab	103	0	1397	0	15	Firmicutes	3094	False	persistent
2	3095	Nasal Vestibule	Swab	1474	0	5510	29	21	Firmicutes	3095	False	persistent
3	3115	Nasal Vestibule	Swab	0	0	5480	0	2	Firmicutes	3115	True	typical
4	3116	Nasal Vestibule	Swab	2	0	2324	1	4	Firmicutes	3116	True	typical

Q4: Calculate the average counts of each phylum by `severe_disease`.#

q4_severe_means = merged_info.groupby('severe_disease')[PHYLUM_COLS].mean() # SOLUTION

print(f'q4_severe_means is a `DataFrame`: {isinstance(q4_severe_means, pd.DataFrame)}')

q4_severe_means is a `DataFrame`: True

print(f'q4_severe_means.index = {list(q4_severe_means.index)}')

q4_severe_means.index = [False, True]

print(f'q4_severe_means.columns = {list(q4_severe_means.columns)}')

q4_severe_means.columns = ['Actinobacteria', 'Bacteroidetes', 'Firmicutes', 'Proteobacteria']

print(f'q4_severe_means.sum().sum() = {q4_severe_means.sum().sum():0.2f}')

q4_severe_means.sum().sum() = 8544.57

We can also do more advanced things like this:

merged_info.groupby(['Location', 'severe_disease'])[PHYLUM_COLS].aggregate(['mean', 'std'])

		Actinobacteria		Bacteroidetes		Firmicutes		Proteobacteria
		mean	std	mean	std	mean	std	mean	std
Location	severe_disease
Ethmoid Culture (Deep to Ethmoid Bulla)	False	1382.714286	875.245435	4.142857	10.106575	2053.142857	1314.670989	31.571429	32.444091
Ethmoid Culture (Deep to Ethmoid Bulla)	True	1625.200000	1698.904706	2.400000	5.366563	3743.400000	3394.971988	260.600000	492.952127
Ethmoid Tissue (Deep to Ethmoid Bulla)	False	1689.400000	1065.430570	0.000000	0.000000	1232.200000	722.258403	91.400000	64.103822
Ethmoid Tissue (Deep to Ethmoid Bulla)	True	1149.500000	1066.299676	1.750000	3.500000	2319.750000	2767.632728	93.000000	125.078642
Head of Inferior Turbinate Tissue	False	394.833333	282.110912	1.166667	2.401388	1123.333333	343.322395	19.500000	26.883080
Head of Inferior Turbinate Tissue	True	223.600000	219.127360	0.000000	0.000000	3416.600000	2021.523510	31.000000	42.023803
Maxillary Sinus	False	2539.500000	2260.620379	0.000000	0.000000	2999.000000	1224.708945	150.000000	193.747258
Maxillary Sinus	True	842.500000	1143.391665	0.000000	0.000000	3006.500000	3877.066481	95.000000	134.350288
Maxillary Sinus Tissue	False	265.000000	NaN	0.000000	NaN	448.000000	NaN	17.000000	NaN
Maxillary Sinus Tissue	True	1332.500000	1827.871029	0.000000	0.000000	4225.000000	5392.396313	107.000000	147.078210
Middle Meatus	False	1171.000000	747.548527	0.333333	0.816497	2916.166667	940.973627	19.666667	18.062853
Middle Meatus	True	1143.200000	1343.776097	0.000000	0.000000	3389.600000	1624.755459	71.000000	97.739450
Nasal Vestibule	False	1598.000000	1496.139811	17.857143	32.313199	4987.000000	4857.272383	52.714286	119.600326
Nasal Vestibule	True	142.666667	245.375902	0.000000	0.000000	2940.333333	2294.448154	3.333333	4.932883
Sphenoethmoidal Recess Tissue	False	1286.000000	1863.761385	6.200000	11.278298	1723.600000	2203.662020	95.000000	174.608133
Sphenoethmoidal Recess Tissue	True	487.000000	427.834080	1.250000	2.500000	2389.750000	1640.061660	276.750000	382.792698
Sphenoid	False	1953.428571	1362.865958	9.142857	23.751692	3441.000000	2253.827337	35.142857	36.503098
Sphenoid	True	2004.600000	3145.778966	0.200000	0.447214	4353.000000	5425.135298	118.000000	124.715677
Sphenoid Tissue	False	1105.000000	615.182900	0.000000	0.000000	743.500000	57.275649	6.500000	9.192388
Sphenoid Tissue	True	2101.333333	1953.139609	3.000000	5.196152	3217.000000	3062.488694	212.333333	251.573714
Superior Meatus	False	2235.000000	1402.061102	2.000000	4.472136	4252.285714	2796.542014	75.000000	69.622793
Superior Meatus	True	961.600000	847.056846	0.000000	0.000000	2560.200000	1857.766186	200.800000	326.720370
Uncinate Process Tissue	False	766.400000	718.500383	0.000000	0.000000	1440.200000	291.232210	18.400000	17.700282
Uncinate Process Tissue	True	971.400000	1580.324745	3.400000	4.979960	3079.000000	2892.901658	63.000000	60.930288

Here I’ve broken things down by body-site and disease status and calculated both a mean and standard deviation. In future lectures we will explore how to quantify this with a significance test. For now, we’ll leave it as a visual comparison.

Pivoting & Melting Dataframes#

This is a process of reshaping, and optionally summarizing, your data as you convert it between wide and long format. These techniques are often required for generating different types of plots.

These are best shown by example.

Pivoting#

long -> wide

pd.pivot_table(merged_info,
               index = 'Patient',
               columns = 'Location',
               values = 'Firmicutes',
               aggfunc = 'mean')

Location	Ethmoid Culture (Deep to Ethmoid Bulla)	Ethmoid Tissue (Deep to Ethmoid Bulla)	Head of Inferior Turbinate Tissue	Maxillary Sinus	Maxillary Sinus Tissue	Middle Meatus	Nasal Vestibule	Sphenoethmoidal Recess Tissue	Sphenoid	Sphenoid Tissue	Superior Meatus	Uncinate Process Tissue
Patient
3062	2714.0	NaN	1746.0	NaN	NaN	1822.0	14987.0	NaN	6125.0	NaN	2992.0	1210.0
3094	385.0	664.0	1089.0	NaN	NaN	4316.0	1397.0	5559.0	4648.0	784.0	3463.0	1538.0
3095	2300.0	651.0	760.0	NaN	NaN	2999.0	5510.0	NaN	519.0	703.0	10097.0	1244.0
3115	2168.0	NaN	4640.0	NaN	NaN	3525.0	5480.0	1861.0	13948.0	NaN	982.0	1531.0
3116	9796.0	6357.0	6261.0	5748.0	8038.0	988.0	2324.0	3673.0	617.0	6709.0	3866.0	7249.0
3117	1800.0	672.0	1499.0	NaN	NaN	2810.0	1017.0	NaN	2082.0	1954.0	3671.0	1233.0
3118	2459.0	380.0	2962.0	NaN	NaN	5253.0	NaN	3724.0	2322.0	NaN	4144.0	4925.0
3119	2494.0	1870.0	1721.0	265.0	412.0	4372.0	NaN	301.0	2796.0	988.0	138.0	457.0
3120	905.0	1614.0	1136.0	NaN	NaN	2342.0	515.0	415.0	2655.0	NaN	2037.0	1301.0
3121	3060.0	NaN	863.0	NaN	NaN	3686.0	4295.0	1624.0	6135.0	NaN	4740.0	NaN
3123	1029.0	914.0	NaN	3865.0	NaN	NaN	5885.0	358.0	1207.0	NaN	4497.0	NaN
3124	3979.0	2318.0	1146.0	2133.0	448.0	2332.0	2320.0	662.0	2798.0	NaN	1940.0	1908.0

This took our “long” data format in which each row represented the observation at a different site of a different person and converted it into a “wide” data format such that each row is a patient and each column is a number of Firmicutes at a location. NaNs represent missing information.

We also had to “give up” some information for this transformation … this is only Firmicutes.

One can do this to include more information:

pd.pivot_table(merged_info,
               index = 'Patient',
               columns = 'Location',
               values = ['Actinobacteria', 'Firmicutes'],
               aggfunc = 'mean')

	Actinobacteria										...	Firmicutes
Location	Ethmoid Culture (Deep to Ethmoid Bulla)	Ethmoid Tissue (Deep to Ethmoid Bulla)	Head of Inferior Turbinate Tissue	Maxillary Sinus	Maxillary Sinus Tissue	Middle Meatus	Nasal Vestibule	Sphenoethmoidal Recess Tissue	Sphenoid	Sphenoid Tissue	...	Head of Inferior Turbinate Tissue	Maxillary Sinus	Maxillary Sinus Tissue	Middle Meatus	Nasal Vestibule	Sphenoethmoidal Recess Tissue	Sphenoid	Sphenoid Tissue	Superior Meatus	Uncinate Process Tissue
Patient
3062	461.0	NaN	232.0	NaN	NaN	292.0	2516.0	NaN	623.0	NaN	...	1746.0	NaN	NaN	1822.0	14987.0	NaN	6125.0	NaN	2992.0	1210.0
3094	243.0	638.0	282.0	NaN	NaN	1486.0	103.0	4604.0	2690.0	1540.0	...	1089.0	NaN	NaN	4316.0	1397.0	5559.0	4648.0	784.0	3463.0	1538.0
3095	1652.0	1038.0	444.0	NaN	NaN	2010.0	1474.0	NaN	587.0	670.0	...	760.0	NaN	NaN	2999.0	5510.0	NaN	519.0	703.0	10097.0	1244.0
3115	80.0	NaN	1.0	NaN	NaN	13.0	0.0	211.0	17.0	NaN	...	4640.0	NaN	NaN	3525.0	5480.0	1861.0	13948.0	NaN	982.0	1531.0
3116	340.0	61.0	19.0	34.0	40.0	5.0	2.0	181.0	32.0	309.0	...	6261.0	5748.0	8038.0	988.0	2324.0	3673.0	617.0	6709.0	3866.0	7249.0
3117	1546.0	1108.0	521.0	NaN	NaN	965.0	426.0	NaN	1537.0	1812.0	...	1499.0	NaN	NaN	2810.0	1017.0	NaN	2082.0	1954.0	3671.0	1233.0
3118	1802.0	824.0	332.0	NaN	NaN	1465.0	NaN	454.0	923.0	NaN	...	2962.0	NaN	NaN	5253.0	NaN	3724.0	2322.0	NaN	4144.0	4925.0
3119	4358.0	2605.0	245.0	1651.0	2625.0	3268.0	NaN	1102.0	7514.0	4183.0	...	1721.0	265.0	412.0	4372.0	NaN	301.0	2796.0	988.0	138.0	457.0
3120	2740.0	3265.0	68.0	NaN	NaN	405.0	107.0	468.0	4195.0	NaN	...	1136.0	NaN	NaN	2342.0	515.0	415.0	2655.0	NaN	2037.0	1301.0
3121	1971.0	NaN	452.0	NaN	NaN	911.0	582.0	399.0	2816.0	NaN	...	863.0	NaN	NaN	3686.0	4295.0	1624.0	6135.0	NaN	4740.0	NaN
3123	1010.0	2263.0	NaN	941.0	NaN	NaN	4186.0	226.0	861.0	NaN	...	NaN	3865.0	NaN	NaN	5885.0	358.0	1207.0	NaN	4497.0	NaN
3124	1602.0	1243.0	891.0	4138.0	265.0	1922.0	2218.0	733.0	1902.0	NaN	...	1146.0	2133.0	448.0	2332.0	2320.0	662.0	2798.0	NaN	1940.0	1908.0

12 rows × 24 columns

But that is usually not a great idea.

Melting#

wide -> long

Our data is part long and part wide (like most real datasets). In some plotting instances we may want to make it “longer” by having each bacteria be a diffent row instead of a different column.

pd.melt(merged_info,
        id_vars = ['Patient', 'Location'], # The things you want preserved in each row
        value_vars = PHYLUM_COLS, # The columns you want to melt
        var_name = 'Phylum', # The name of the column that will have the value_var name
        value_name = 'Counts') # The name of the column that will have the value

	Patient	Location	Phylum	Counts
0	3062	Nasal Vestibule	Actinobacteria	2516
1	3094	Nasal Vestibule	Actinobacteria	103
2	3095	Nasal Vestibule	Actinobacteria	1474
3	3115	Nasal Vestibule	Actinobacteria	0
4	3116	Nasal Vestibule	Actinobacteria	2
...	...	...	...	...
427	3094	Sphenoid Tissue	Proteobacteria	13
428	3095	Sphenoid Tissue	Proteobacteria	0
429	3116	Sphenoid Tissue	Proteobacteria	495
430	3117	Sphenoid Tissue	Proteobacteria	129
431	3119	Sphenoid Tissue	Proteobacteria	13

432 rows × 4 columns

We’ll explore these in more detail as we move into plotting next week. This is just a taste.

Submission#

Submit this assignment through BBLearn.