{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "6d3a4392-aebd-45cd-91b3-3278bc7a4aa3", "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "# Setting up the Colab environment. DO NOT EDIT!\n", "import os\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "try:\n", " import otter\n", "\n", "except ImportError:\n", " ! pip install -q otter-grader==4.0.0\n", " import otter\n", "\n", "if not os.path.exists('walkthrough-tests'):\n", " zip_files = [f for f in os.listdir() if f.endswith('.zip')]\n", " assert len(zip_files)>0, 'Could not find any zip files!'\n", " assert len(zip_files)==1, 'Found multiple zip files!'\n", " ! unzip {zip_files[0]}\n", "\n", "grader = otter.Notebook(colab=True,\n", " tests_dir = 'walkthrough-tests')" ] }, { "cell_type": "markdown", "id": "c7cc7c25-8b0a-4171-a3c5-01e6f7b11bed", "metadata": {}, "source": [ "# Walkthrough" ] }, { "cell_type": "markdown", "id": "d1c2775e", "metadata": {}, "source": [ "## Introduction\n", "\n", "An emerging area of biomedical research over the past decade as been the human microbiome.\n", "This field studies the commensal bacteria that inhabit our bodies and how they influence our health.\n", "These can be found everywhere from our digestive system, to our skin, to our ears, and to every part of our body.\n", "Often times hundreds of different bacterial species can be isolated from a single body site of a single individual.\n", "Disease can be caused or exacerbated by an imbalance in these species.\n", "\n", "This week, we will explore the data generated by researchers here at Drexel.\n", "From a collection of 12 patients they measured the microbiome of of 12 body sites in their nasal passages.\n", "Some of these patients had inner ear infections (_otitis media_) and different disease outcomes.\n", "This week, we will use Python to generate pivot-tables and bar-plots to understand whether the microbiome is impacted by disease outcome." ] }, { "cell_type": "markdown", "id": "74d6867d-860d-4645-b796-f86feed17846", "metadata": { "tags": [] }, "source": [ "## Learning Objectives\n", "At the end of this learning activity you will be able to:\n", " - Practice using `query` to extract data from a larger table.\n", " - Calculate summary values across a a `pd.DataFrame` using methods like sum, mean, and max.\n", " - Utilize `pd.DataFrame.groupby` to aggregate and transform data by group.\n", " - Use `pd.merge` to combine data held in two different tables.\n", " - Employ `pd.pivot_table` and `pd.melt` to reshape and summarize data." ] }, { "cell_type": "code", "execution_count": 2, "id": "66b1764e", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 3, "id": "8c8c34f2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PatientLocationCollectionTypeActinobacteriaBacteroidetesFirmicutesProteobacterianum_otuPredominant
03062Nasal VestibuleSwab25164414987016Firmicutes
13094Nasal VestibuleSwab10301397015Firmicutes
23095Nasal VestibuleSwab1474055102921Firmicutes
33115Nasal VestibuleSwab00548002Firmicutes
43116Nasal VestibuleSwab20232414Firmicutes
..............................
1033094Sphenoid TissueBiopsy154007841314Actinobacteria
1043095Sphenoid TissueBiopsy6700703014Firmicutes
1053116Sphenoid TissueBiopsy3099670949517Firmicutes
1063117Sphenoid TissueBiopsy18120195412918Firmicutes
1073119Sphenoid TissueBiopsy418309881323Actinobacteria
\n", "

108 rows × 9 columns

\n", "
" ], "text/plain": [ " Patient Location CollectionType Actinobacteria Bacteroidetes \\\n", "0 3062 Nasal Vestibule Swab 2516 44 \n", "1 3094 Nasal Vestibule Swab 103 0 \n", "2 3095 Nasal Vestibule Swab 1474 0 \n", "3 3115 Nasal Vestibule Swab 0 0 \n", "4 3116 Nasal Vestibule Swab 2 0 \n", ".. ... ... ... ... ... \n", "103 3094 Sphenoid Tissue Biopsy 1540 0 \n", "104 3095 Sphenoid Tissue Biopsy 670 0 \n", "105 3116 Sphenoid Tissue Biopsy 309 9 \n", "106 3117 Sphenoid Tissue Biopsy 1812 0 \n", "107 3119 Sphenoid Tissue Biopsy 4183 0 \n", "\n", " Firmicutes Proteobacteria num_otu Predominant \n", "0 14987 0 16 Firmicutes \n", "1 1397 0 15 Firmicutes \n", "2 5510 29 21 Firmicutes \n", "3 5480 0 2 Firmicutes \n", "4 2324 1 4 Firmicutes \n", ".. ... ... ... ... \n", "103 784 13 14 Actinobacteria \n", "104 703 0 14 Firmicutes \n", "105 6709 495 17 Firmicutes \n", "106 1954 129 18 Firmicutes \n", "107 988 13 23 Actinobacteria \n", "\n", "[108 rows x 9 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('microbiome_phylum_data.csv', sep = '\\t')\n", "data" ] }, { "cell_type": "markdown", "id": "9e466dd9", "metadata": {}, "source": [ "## Exploring a single patient\n", "\n", "First, we'll explore the distribution of bacteria of a single individual across body site." ] }, { "cell_type": "markdown", "id": "ac13caa4", "metadata": { "deletable": false, "editable": false }, "source": [ "### Q1: Extract the information for patient 3116\n" ] }, { "cell_type": "markdown", "id": "c70ea765-e9b4-44c6-8c9d-c8f1cfe21d99", "metadata": { "deletable": false, "editable": false, "tags": [ "remove_cell" ] }, "source": [ "| | |\n", "| --------------|----|\n", "| Points | 2 |\n", "| Public Checks | 3 |\n", "\n", "_Points:_ 2" ] }, { "cell_type": "code", "execution_count": 4, "id": "1f4466d0", "metadata": { "tags": [ "otter_assign_solution_cell" ] }, "outputs": [], "source": [ "pat_3116 = data.query('Patient == 3116') # SOLUTION" ] }, { "cell_type": "code", "execution_count": 5, "id": "af95ee49-d151-4800-9246-f6ef0981d292", "metadata": { "tags": [ "otter_assign_solution_cell" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PatientLocationCollectionTypeActinobacteriaBacteroidetesFirmicutesProteobacterianum_otuPredominant
43116Nasal VestibuleSwab20232414Firmicutes
143116Head of Inferior Turbinate TissueBiopsy1906261215Firmicutes
253116Middle MeatusSwab50988210Firmicutes
363116Uncinate Process TissueBiopsy17072497912Firmicutes
423116Maxillary SinusSwab340574819011Firmicutes
\n", "
" ], "text/plain": [ " Patient Location CollectionType Actinobacteria \\\n", "4 3116 Nasal Vestibule Swab 2 \n", "14 3116 Head of Inferior Turbinate Tissue Biopsy 19 \n", "25 3116 Middle Meatus Swab 5 \n", "36 3116 Uncinate Process Tissue Biopsy 17 \n", "42 3116 Maxillary Sinus Swab 34 \n", "\n", " Bacteroidetes Firmicutes Proteobacteria num_otu Predominant \n", "4 0 2324 1 4 Firmicutes \n", "14 0 6261 21 5 Firmicutes \n", "25 0 988 2 10 Firmicutes \n", "36 0 7249 79 12 Firmicutes \n", "42 0 5748 190 11 Firmicutes " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pat_3116.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "2909261d", "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "grader.check(\"q1_extract_single\")" ] }, { "cell_type": "markdown", "id": "d7b926b8-f0a8-4eda-8e6c-b93ce9e72660", "metadata": { "deletable": false, "editable": false, "tags": [] }, "source": [ "### Q2: Calculate the average count across regions for each phylum for patient 3116.\n" ] }, { "cell_type": "markdown", "id": "efcae78b-abf2-4971-9622-aab1e54f858c", "metadata": { "deletable": false, "editable": false, "tags": [ "remove_cell" ] }, "source": [ "| | |\n", "| --------------|----|\n", "| Points | 2 |\n", "| Public Checks | 4 |\n", "\n", "_Points:_ 2" ] }, { "cell_type": "code", "execution_count": 9, "id": "31910d19-b1c3-41aa-8ade-a84ee3359e9a", "metadata": { "tags": [ "otter_assign_solution_cell" ] }, "outputs": [], "source": [ "q2_Actinobacteria_mean = pat_3116['Actinobacteria'].mean() # SOLUTION\n", "q2_Bacteroidetes_mean = pat_3116['Bacteroidetes'].mean() # SOLUTION\n", "q2_Firmicutes_mean = pat_3116['Firmicutes'].mean() # SOLUTION\n", "q2_Proteobacteria_mean = pat_3116['Proteobacteria'].mean() # SOLUTION" ] }, { "cell_type": "code", "execution_count": null, "id": "9363e8bc", "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "grader.check(\"q2_summary_vals\")" ] }, { "cell_type": "markdown", "id": "b0daed00-5841-4e60-89f5-b656d65afa43", "metadata": {}, "source": [ "## Summarizing by grouping" ] }, { "cell_type": "markdown", "id": "c0a540be-0128-461f-8b71-a1bacf8b6110", "metadata": {}, "source": [ "Now that we've looked at the summary values for a single individual, how would we look at this for each individual?\n", "Copy-pasting that over and over is unsustainable, `DataFrame`s have useful methods for dealing with this problem." ] }, { "cell_type": "markdown", "id": "c40ddc71-0189-43f0-a529-64b743d82d25", "metadata": {}, "source": [ "All of these fall into the same basic strategy.\n", "\n", "**Split** - **Apply** - **Combine**." ] }, { "cell_type": "code", "execution_count": 14, "id": "25da0ae3-16d9-4899-8b99-0b243114db86", "metadata": {}, "outputs": [], "source": [ "# Split\n", "\n", "grouped_patients = data.groupby('Patient')" ] }, { "cell_type": "code", "execution_count": 15, "id": "35e62761-4e0a-4f68-90f9-807fc41dd6ee", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ActinobacteriaBacteroidetesFirmicutesProteobacteria
Patient
3062727.5714297.7142864513.7142862.000000
30941356.9000000.1000002384.30000079.100000
30951447.1111110.1111112753.66666747.777778
311549.8750001.3750004266.87500022.000000
311695.4166671.2500005135.500000299.750000
31171164.3333330.0000001859.77777869.000000
31181329.1250003.1250003271.125000224.375000
31192692.7272730.0000001437.6363645.727273
31201679.3333330.0000001435.55555618.888889
31211205.42857118.5714293486.14285729.714286
31231852.14285712.2857142536.428571129.285714
31241603.5454550.0000001998.54545539.272727
\n", "
" ], "text/plain": [ " Actinobacteria Bacteroidetes Firmicutes Proteobacteria\n", "Patient \n", "3062 727.571429 7.714286 4513.714286 2.000000\n", "3094 1356.900000 0.100000 2384.300000 79.100000\n", "3095 1447.111111 0.111111 2753.666667 47.777778\n", "3115 49.875000 1.375000 4266.875000 22.000000\n", "3116 95.416667 1.250000 5135.500000 299.750000\n", "3117 1164.333333 0.000000 1859.777778 69.000000\n", "3118 1329.125000 3.125000 3271.125000 224.375000\n", "3119 2692.727273 0.000000 1437.636364 5.727273\n", "3120 1679.333333 0.000000 1435.555556 18.888889\n", "3121 1205.428571 18.571429 3486.142857 29.714286\n", "3123 1852.142857 12.285714 2536.428571 129.285714\n", "3124 1603.545455 0.000000 1998.545455 39.272727" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Apply - Combine\n", "\n", "# Capitalizing constants is useful if you will re-use them often.\n", "PHYLUM_COLS = ['Actinobacteria', 'Bacteroidetes',\n", " 'Firmicutes', 'Proteobacteria']\n", "\n", "mean_vals = grouped_patients[PHYLUM_COLS].mean()\n", "mean_vals" ] }, { "cell_type": "code", "execution_count": 16, "id": "98050ce0-ea97-43ec-93be-9b5f33911545", "metadata": {}, "outputs": [], "source": [ "# This is commonly done in a single \"sentence\"\n", "\n", "mean_vals = data.groupby('Patient')[PHYLUM_COLS].mean()" ] }, { "cell_type": "markdown", "id": "b16807c5-3b87-4bbf-bc31-373f64e6fa9b", "metadata": { "deletable": false, "editable": false, "tags": [] }, "source": [ "### Q3: Calculate the average counts of each phylum by body site." ] }, { "cell_type": "markdown", "id": "8c8e7017-6eb5-403b-9149-5d321c3732ab", "metadata": { "deletable": false, "editable": false, "tags": [ "remove_cell" ] }, "source": [ "| | |\n", "| --------------|----|\n", "| Points | 2 |\n", "| Public Checks | 4 |\n", "\n", "_Points:_ 2" ] }, { "cell_type": "code", "execution_count": 17, "id": "583def59-a4d9-4e25-9882-affc9e5ca7ab", "metadata": { "tags": [ "otter_assign_solution_cell" ] }, "outputs": [], "source": [ "q3_mean_phylum_site = data.groupby('Location')[PHYLUM_COLS].mean() # SOLUTION" ] }, { "cell_type": "code", "execution_count": 18, "id": "401b5cec-9571-4762-9a48-b68e6e764ff4", "metadata": { "tags": [ "otter_assign_solution_cell" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ActinobacteriaBacteroidetesFirmicutesProteobacteria
Location
Ethmoid Culture (Deep to Ethmoid Bulla)1483.7500003.4166672757.416667127.000000
Ethmoid Tissue (Deep to Ethmoid Bulla)1449.4444440.7777781715.55555692.111111
Head of Inferior Turbinate Tissue317.0000000.6363642165.72727324.727273
Maxillary Sinus1691.0000000.0000003002.750000122.500000
Maxillary Sinus Tissue976.6666670.0000002966.00000077.000000
Middle Meatus1158.3636360.1818183131.36363643.000000
Nasal Vestibule1161.40000012.5000004373.00000037.900000
Sphenoethmoidal Recess Tissue930.8888894.0000002019.666667175.777778
Sphenoid1974.7500005.4166673821.00000069.666667
Sphenoid Tissue1702.8000001.8000002227.600000130.000000
Superior Meatus1704.4166671.1666673547.250000127.416667
Uncinate Process Tissue868.9000001.7000002259.60000040.700000
\n", "
" ], "text/plain": [ " Actinobacteria Bacteroidetes \\\n", "Location \n", "Ethmoid Culture (Deep to Ethmoid Bulla) 1483.750000 3.416667 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) 1449.444444 0.777778 \n", "Head of Inferior Turbinate Tissue 317.000000 0.636364 \n", "Maxillary Sinus 1691.000000 0.000000 \n", "Maxillary Sinus Tissue 976.666667 0.000000 \n", "Middle Meatus 1158.363636 0.181818 \n", "Nasal Vestibule 1161.400000 12.500000 \n", "Sphenoethmoidal Recess Tissue 930.888889 4.000000 \n", "Sphenoid 1974.750000 5.416667 \n", "Sphenoid Tissue 1702.800000 1.800000 \n", "Superior Meatus 1704.416667 1.166667 \n", "Uncinate Process Tissue 868.900000 1.700000 \n", "\n", " Firmicutes Proteobacteria \n", "Location \n", "Ethmoid Culture (Deep to Ethmoid Bulla) 2757.416667 127.000000 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) 1715.555556 92.111111 \n", "Head of Inferior Turbinate Tissue 2165.727273 24.727273 \n", "Maxillary Sinus 3002.750000 122.500000 \n", "Maxillary Sinus Tissue 2966.000000 77.000000 \n", "Middle Meatus 3131.363636 43.000000 \n", "Nasal Vestibule 4373.000000 37.900000 \n", "Sphenoethmoidal Recess Tissue 2019.666667 175.777778 \n", "Sphenoid 3821.000000 69.666667 \n", "Sphenoid Tissue 2227.600000 130.000000 \n", "Superior Meatus 3547.250000 127.416667 \n", "Uncinate Process Tissue 2259.600000 40.700000 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q3_mean_phylum_site" ] }, { "cell_type": "code", "execution_count": null, "id": "17b02781", "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "grader.check(\"q3_mean_by_site\")" ] }, { "cell_type": "markdown", "id": "e9fe9c09-66d0-4697-9225-c3eb239d1be8", "metadata": {}, "source": [ "There are a number of different built-in summary functions like this." ] }, { "cell_type": "code", "execution_count": 23, "id": "af3e10e0-7e69-4f03-bde6-ebc532efcf01", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ActinobacteriaBacteroidetesFirmicutesProteobacteria
Patient
3062461.02.02714.00.0
30941000.00.01467.541.5
30951223.00.01244.015.0
311515.00.02846.51.5
311637.00.06004.5183.5
31171108.00.01800.066.0
31181023.50.03343.0152.5
31192605.00.0988.03.0
3120468.00.01301.027.0
3121911.012.03686.013.0
31231010.00.01207.063.0
31241602.00.02133.019.0
\n", "
" ], "text/plain": [ " Actinobacteria Bacteroidetes Firmicutes Proteobacteria\n", "Patient \n", "3062 461.0 2.0 2714.0 0.0\n", "3094 1000.0 0.0 1467.5 41.5\n", "3095 1223.0 0.0 1244.0 15.0\n", "3115 15.0 0.0 2846.5 1.5\n", "3116 37.0 0.0 6004.5 183.5\n", "3117 1108.0 0.0 1800.0 66.0\n", "3118 1023.5 0.0 3343.0 152.5\n", "3119 2605.0 0.0 988.0 3.0\n", "3120 468.0 0.0 1301.0 27.0\n", "3121 911.0 12.0 3686.0 13.0\n", "3123 1010.0 0.0 1207.0 63.0\n", "3124 1602.0 0.0 2133.0 19.0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.groupby('Patient')[PHYLUM_COLS].median()" ] }, { "cell_type": "code", "execution_count": 24, "id": "cd3b9eaf-e5b8-46b0-9bd4-c1d4991c41c6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ActinobacteriaBacteroidetesFirmicutesProteobacteria
Patient
30627777
309410101010
30959999
31158888
311612121212
31179999
31188888
311911111111
31209999
31217777
31237777
312411111111
\n", "
" ], "text/plain": [ " Actinobacteria Bacteroidetes Firmicutes Proteobacteria\n", "Patient \n", "3062 7 7 7 7\n", "3094 10 10 10 10\n", "3095 9 9 9 9\n", "3115 8 8 8 8\n", "3116 12 12 12 12\n", "3117 9 9 9 9\n", "3118 8 8 8 8\n", "3119 11 11 11 11\n", "3120 9 9 9 9\n", "3121 7 7 7 7\n", "3123 7 7 7 7\n", "3124 11 11 11 11" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.groupby('Patient')[PHYLUM_COLS].count()" ] }, { "cell_type": "code", "execution_count": 25, "id": "da9fc5c7-0bc5-4fcf-aaf8-a4e1294ddd47", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ActinobacteriaBacteroidetesFirmicutesProteobacteria
Patient
3062251644149876
3094460415559406
30953926110097166
31152111113948108
3116340997961139
3117181203671129
31183709125253771
311975140437218
312041950265536
3121281663613578
31234186815885323
3124413803979183
\n", "
" ], "text/plain": [ " Actinobacteria Bacteroidetes Firmicutes Proteobacteria\n", "Patient \n", "3062 2516 44 14987 6\n", "3094 4604 1 5559 406\n", "3095 3926 1 10097 166\n", "3115 211 11 13948 108\n", "3116 340 9 9796 1139\n", "3117 1812 0 3671 129\n", "3118 3709 12 5253 771\n", "3119 7514 0 4372 18\n", "3120 4195 0 2655 36\n", "3121 2816 63 6135 78\n", "3123 4186 81 5885 323\n", "3124 4138 0 3979 183" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.groupby('Patient')[PHYLUM_COLS].max()" ] }, { "cell_type": "markdown", "id": "4818cb4f-1a2e-41dd-bd1f-378562dc5334", "metadata": {}, "source": [ "You can see an extensive list of available summary functions at the [Pandas Documentation](https://pandas.pydata.org/docs/reference/groupby.html#dataframegroupby-computations-descriptive-stats)" ] }, { "cell_type": "markdown", "id": "b1aae059-de5f-493f-b721-05755f85df31", "metadata": {}, "source": [ "If there isn't a function that does what you want, you can also make your own.\n", "\n", "Here is a simple one that scales the data to a _unit-norm_." ] }, { "cell_type": "code", "execution_count": 26, "id": "2b895bfe-a501-4a2b-aa2f-6cde9aa808bc", "metadata": {}, "outputs": [], "source": [ "def unit_norm(values):\n", " \"Given a series, return a scaled version\"\n", "\n", " mu = values.mean()\n", " std = values.std()\n", "\n", " return (values-mu)/std\n", "\n", "unit_normed_data = data.groupby('Patient', as_index=False)[PHYLUM_COLS].transform(unit_norm)" ] }, { "cell_type": "code", "execution_count": 27, "id": "cb03965f-cea0-4163-be47-f2aadc66041b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ActinobacteriaBacteroidetesFirmicutesProteobacteria
02.1581092.2478762.140374-0.755929
1-0.901854-0.316228-0.515722-0.660942
20.025250-0.3333330.862714-0.303077
3-0.694809-0.3535530.287822-0.549657
4-0.790041-0.441315-0.985677-0.838082
...............
1030.131693-0.316228-0.835926-0.552316
104-0.729756-0.333333-0.641845-0.771142
1051.8063122.7361550.5516500.547734
1061.315743NaN0.1015331.390656
1070.700800NaN-0.3313811.106277
\n", "

108 rows × 4 columns

\n", "
" ], "text/plain": [ " Actinobacteria Bacteroidetes Firmicutes Proteobacteria\n", "0 2.158109 2.247876 2.140374 -0.755929\n", "1 -0.901854 -0.316228 -0.515722 -0.660942\n", "2 0.025250 -0.333333 0.862714 -0.303077\n", "3 -0.694809 -0.353553 0.287822 -0.549657\n", "4 -0.790041 -0.441315 -0.985677 -0.838082\n", ".. ... ... ... ...\n", "103 0.131693 -0.316228 -0.835926 -0.552316\n", "104 -0.729756 -0.333333 -0.641845 -0.771142\n", "105 1.806312 2.736155 0.551650 0.547734\n", "106 1.315743 NaN 0.101533 1.390656\n", "107 0.700800 NaN -0.331381 1.106277\n", "\n", "[108 rows x 4 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unit_normed_data" ] }, { "cell_type": "markdown", "id": "dd5f9c60-0117-48d7-a9f7-9fea10b7bab7", "metadata": {}, "source": [ "Notice I used the `transform` method here instead of a common name.\n", "\n", "When applying custom functions to groups of data there are three different methods depending on your final output shape:\n", "\n", "* `.aggregate()` or `.agg()` - Each group of data produces a single summary number. Commonly used to summarize groups.\n", "* `.transform()` - The output will have the same number (and order) of rows as the input. Commonly used for normalizations.\n", "* `.apply()` - Everything else." ] }, { "cell_type": "markdown", "id": "4c6af90d-3cda-4b11-b1ee-2fdfd22d0deb", "metadata": {}, "source": [ "## Merging data" ] }, { "cell_type": "markdown", "id": "8426fa25-eb2a-4df6-a8e4-b3e77e608a8c", "metadata": {}, "source": [ "Now we come to a common problem, our sample information is in a different file." ] }, { "cell_type": "code", "execution_count": 28, "id": "28e4b9a2-92ec-4593-a3a5-fc13acbec918", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PIDsevere_diseasedisease_type
03062Falsepersistent
13094Falsepersistent
23095Falsepersistent
33115Truetypical
43116Truetypical
\n", "
" ], "text/plain": [ " PID severe_disease disease_type\n", "0 3062 False persistent\n", "1 3094 False persistent\n", "2 3095 False persistent\n", "3 3115 True typical\n", "4 3116 True typical" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_info = pd.read_csv('sample_info.csv')\n", "sample_info.head()" ] }, { "cell_type": "markdown", "id": "c80f1c6d-1239-47a3-85c1-1a5780a877de", "metadata": {}, "source": [ "Now that we have two `DataFrame`s with a common key we can use `pd.merge`." ] }, { "cell_type": "code", "execution_count": 29, "id": "e7d3d076-7ee7-4a27-a01f-2ad2a514c80b", "metadata": {}, "outputs": [], "source": [ "merged_info = pd.merge(data, sample_info,\n", " left_on = 'Patient', # The column of the key in biome_data\n", " right_on = 'PID', # The column of the key in sample_info\n", " how = 'inner') # Keep only those in both" ] }, { "cell_type": "code", "execution_count": 30, "id": "f27c0943-d872-4f19-9297-79da523f3ea1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PatientLocationCollectionTypeActinobacteriaBacteroidetesFirmicutesProteobacterianum_otuPredominantPIDsevere_diseasedisease_type
03062Nasal VestibuleSwab25164414987016Firmicutes3062Falsepersistent
13094Nasal VestibuleSwab10301397015Firmicutes3094Falsepersistent
23095Nasal VestibuleSwab1474055102921Firmicutes3095Falsepersistent
33115Nasal VestibuleSwab00548002Firmicutes3115Truetypical
43116Nasal VestibuleSwab20232414Firmicutes3116Truetypical
\n", "
" ], "text/plain": [ " Patient Location CollectionType Actinobacteria Bacteroidetes \\\n", "0 3062 Nasal Vestibule Swab 2516 44 \n", "1 3094 Nasal Vestibule Swab 103 0 \n", "2 3095 Nasal Vestibule Swab 1474 0 \n", "3 3115 Nasal Vestibule Swab 0 0 \n", "4 3116 Nasal Vestibule Swab 2 0 \n", "\n", " Firmicutes Proteobacteria num_otu Predominant PID severe_disease \\\n", "0 14987 0 16 Firmicutes 3062 False \n", "1 1397 0 15 Firmicutes 3094 False \n", "2 5510 29 21 Firmicutes 3095 False \n", "3 5480 0 2 Firmicutes 3115 True \n", "4 2324 1 4 Firmicutes 3116 True \n", "\n", " disease_type \n", "0 persistent \n", "1 persistent \n", "2 persistent \n", "3 typical \n", "4 typical " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_info.head()" ] }, { "cell_type": "markdown", "id": "f1115bb7-2c0b-4459-924f-ad6a03ab5070", "metadata": { "deletable": false, "editable": false, "tags": [] }, "source": [ "### Q4: Calculate the average counts of each phylum by `severe_disease`." ] }, { "cell_type": "markdown", "id": "c19fe0c8-5b08-4ed5-8351-116f3f4aa6ff", "metadata": { "deletable": false, "editable": false, "tags": [ "remove_cell" ] }, "source": [ "| | |\n", "| --------------|----|\n", "| Points | 2 |\n", "| Public Checks | 4 |\n", "\n", "_Points:_ 2" ] }, { "cell_type": "code", "execution_count": 31, "id": "68d4d352-8e48-4056-b530-d32c08f3723d", "metadata": { "tags": [ "otter_assign_solution_cell" ] }, "outputs": [], "source": [ "q4_severe_means = merged_info.groupby('severe_disease')[PHYLUM_COLS].mean() # SOLUTION" ] }, { "cell_type": "code", "execution_count": null, "id": "5753eb93", "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "grader.check(\"q4_servere\")" ] }, { "cell_type": "markdown", "id": "f73cb52b-2fc8-495e-bfeb-d3e276f77b2b", "metadata": {}, "source": [ "We can also do more advanced things like this:" ] }, { "cell_type": "code", "execution_count": 36, "id": "e0575e0d-abb5-4aeb-9564-984eab815de7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ActinobacteriaBacteroidetesFirmicutesProteobacteria
meanstdmeanstdmeanstdmeanstd
Locationsevere_disease
Ethmoid Culture (Deep to Ethmoid Bulla)False1382.714286875.2454354.14285710.1065752053.1428571314.67098931.57142932.444091
True1625.2000001698.9047062.4000005.3665633743.4000003394.971988260.600000492.952127
Ethmoid Tissue (Deep to Ethmoid Bulla)False1689.4000001065.4305700.0000000.0000001232.200000722.25840391.40000064.103822
True1149.5000001066.2996761.7500003.5000002319.7500002767.63272893.000000125.078642
Head of Inferior Turbinate TissueFalse394.833333282.1109121.1666672.4013881123.333333343.32239519.50000026.883080
True223.600000219.1273600.0000000.0000003416.6000002021.52351031.00000042.023803
Maxillary SinusFalse2539.5000002260.6203790.0000000.0000002999.0000001224.708945150.000000193.747258
True842.5000001143.3916650.0000000.0000003006.5000003877.06648195.000000134.350288
Maxillary Sinus TissueFalse265.000000NaN0.000000NaN448.000000NaN17.000000NaN
True1332.5000001827.8710290.0000000.0000004225.0000005392.396313107.000000147.078210
Middle MeatusFalse1171.000000747.5485270.3333330.8164972916.166667940.97362719.66666718.062853
True1143.2000001343.7760970.0000000.0000003389.6000001624.75545971.00000097.739450
Nasal VestibuleFalse1598.0000001496.13981117.85714332.3131994987.0000004857.27238352.714286119.600326
True142.666667245.3759020.0000000.0000002940.3333332294.4481543.3333334.932883
Sphenoethmoidal Recess TissueFalse1286.0000001863.7613856.20000011.2782981723.6000002203.66202095.000000174.608133
True487.000000427.8340801.2500002.5000002389.7500001640.061660276.750000382.792698
SphenoidFalse1953.4285711362.8659589.14285723.7516923441.0000002253.82733735.14285736.503098
True2004.6000003145.7789660.2000000.4472144353.0000005425.135298118.000000124.715677
Sphenoid TissueFalse1105.000000615.1829000.0000000.000000743.50000057.2756496.5000009.192388
True2101.3333331953.1396093.0000005.1961523217.0000003062.488694212.333333251.573714
Superior MeatusFalse2235.0000001402.0611022.0000004.4721364252.2857142796.54201475.00000069.622793
True961.600000847.0568460.0000000.0000002560.2000001857.766186200.800000326.720370
Uncinate Process TissueFalse766.400000718.5003830.0000000.0000001440.200000291.23221018.40000017.700282
True971.4000001580.3247453.4000004.9799603079.0000002892.90165863.00000060.930288
\n", "
" ], "text/plain": [ " Actinobacteria \\\n", " mean \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 1382.714286 \n", " True 1625.200000 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 1689.400000 \n", " True 1149.500000 \n", "Head of Inferior Turbinate Tissue False 394.833333 \n", " True 223.600000 \n", "Maxillary Sinus False 2539.500000 \n", " True 842.500000 \n", "Maxillary Sinus Tissue False 265.000000 \n", " True 1332.500000 \n", "Middle Meatus False 1171.000000 \n", " True 1143.200000 \n", "Nasal Vestibule False 1598.000000 \n", " True 142.666667 \n", "Sphenoethmoidal Recess Tissue False 1286.000000 \n", " True 487.000000 \n", "Sphenoid False 1953.428571 \n", " True 2004.600000 \n", "Sphenoid Tissue False 1105.000000 \n", " True 2101.333333 \n", "Superior Meatus False 2235.000000 \n", " True 961.600000 \n", "Uncinate Process Tissue False 766.400000 \n", " True 971.400000 \n", "\n", " \\\n", " std \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 875.245435 \n", " True 1698.904706 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 1065.430570 \n", " True 1066.299676 \n", "Head of Inferior Turbinate Tissue False 282.110912 \n", " True 219.127360 \n", "Maxillary Sinus False 2260.620379 \n", " True 1143.391665 \n", "Maxillary Sinus Tissue False NaN \n", " True 1827.871029 \n", "Middle Meatus False 747.548527 \n", " True 1343.776097 \n", "Nasal Vestibule False 1496.139811 \n", " True 245.375902 \n", "Sphenoethmoidal Recess Tissue False 1863.761385 \n", " True 427.834080 \n", "Sphenoid False 1362.865958 \n", " True 3145.778966 \n", "Sphenoid Tissue False 615.182900 \n", " True 1953.139609 \n", "Superior Meatus False 1402.061102 \n", " True 847.056846 \n", "Uncinate Process Tissue False 718.500383 \n", " True 1580.324745 \n", "\n", " Bacteroidetes \\\n", " mean \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 4.142857 \n", " True 2.400000 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 0.000000 \n", " True 1.750000 \n", "Head of Inferior Turbinate Tissue False 1.166667 \n", " True 0.000000 \n", "Maxillary Sinus False 0.000000 \n", " True 0.000000 \n", "Maxillary Sinus Tissue False 0.000000 \n", " True 0.000000 \n", "Middle Meatus False 0.333333 \n", " True 0.000000 \n", "Nasal Vestibule False 17.857143 \n", " True 0.000000 \n", "Sphenoethmoidal Recess Tissue False 6.200000 \n", " True 1.250000 \n", "Sphenoid False 9.142857 \n", " True 0.200000 \n", "Sphenoid Tissue False 0.000000 \n", " True 3.000000 \n", "Superior Meatus False 2.000000 \n", " True 0.000000 \n", "Uncinate Process Tissue False 0.000000 \n", " True 3.400000 \n", "\n", " \\\n", " std \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 10.106575 \n", " True 5.366563 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 0.000000 \n", " True 3.500000 \n", "Head of Inferior Turbinate Tissue False 2.401388 \n", " True 0.000000 \n", "Maxillary Sinus False 0.000000 \n", " True 0.000000 \n", "Maxillary Sinus Tissue False NaN \n", " True 0.000000 \n", "Middle Meatus False 0.816497 \n", " True 0.000000 \n", "Nasal Vestibule False 32.313199 \n", " True 0.000000 \n", "Sphenoethmoidal Recess Tissue False 11.278298 \n", " True 2.500000 \n", "Sphenoid False 23.751692 \n", " True 0.447214 \n", "Sphenoid Tissue False 0.000000 \n", " True 5.196152 \n", "Superior Meatus False 4.472136 \n", " True 0.000000 \n", "Uncinate Process Tissue False 0.000000 \n", " True 4.979960 \n", "\n", " Firmicutes \\\n", " mean \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 2053.142857 \n", " True 3743.400000 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 1232.200000 \n", " True 2319.750000 \n", "Head of Inferior Turbinate Tissue False 1123.333333 \n", " True 3416.600000 \n", "Maxillary Sinus False 2999.000000 \n", " True 3006.500000 \n", "Maxillary Sinus Tissue False 448.000000 \n", " True 4225.000000 \n", "Middle Meatus False 2916.166667 \n", " True 3389.600000 \n", "Nasal Vestibule False 4987.000000 \n", " True 2940.333333 \n", "Sphenoethmoidal Recess Tissue False 1723.600000 \n", " True 2389.750000 \n", "Sphenoid False 3441.000000 \n", " True 4353.000000 \n", "Sphenoid Tissue False 743.500000 \n", " True 3217.000000 \n", "Superior Meatus False 4252.285714 \n", " True 2560.200000 \n", "Uncinate Process Tissue False 1440.200000 \n", " True 3079.000000 \n", "\n", " \\\n", " std \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 1314.670989 \n", " True 3394.971988 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 722.258403 \n", " True 2767.632728 \n", "Head of Inferior Turbinate Tissue False 343.322395 \n", " True 2021.523510 \n", "Maxillary Sinus False 1224.708945 \n", " True 3877.066481 \n", "Maxillary Sinus Tissue False NaN \n", " True 5392.396313 \n", "Middle Meatus False 940.973627 \n", " True 1624.755459 \n", "Nasal Vestibule False 4857.272383 \n", " True 2294.448154 \n", "Sphenoethmoidal Recess Tissue False 2203.662020 \n", " True 1640.061660 \n", "Sphenoid False 2253.827337 \n", " True 5425.135298 \n", "Sphenoid Tissue False 57.275649 \n", " True 3062.488694 \n", "Superior Meatus False 2796.542014 \n", " True 1857.766186 \n", "Uncinate Process Tissue False 291.232210 \n", " True 2892.901658 \n", "\n", " Proteobacteria \\\n", " mean \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 31.571429 \n", " True 260.600000 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 91.400000 \n", " True 93.000000 \n", "Head of Inferior Turbinate Tissue False 19.500000 \n", " True 31.000000 \n", "Maxillary Sinus False 150.000000 \n", " True 95.000000 \n", "Maxillary Sinus Tissue False 17.000000 \n", " True 107.000000 \n", "Middle Meatus False 19.666667 \n", " True 71.000000 \n", "Nasal Vestibule False 52.714286 \n", " True 3.333333 \n", "Sphenoethmoidal Recess Tissue False 95.000000 \n", " True 276.750000 \n", "Sphenoid False 35.142857 \n", " True 118.000000 \n", "Sphenoid Tissue False 6.500000 \n", " True 212.333333 \n", "Superior Meatus False 75.000000 \n", " True 200.800000 \n", "Uncinate Process Tissue False 18.400000 \n", " True 63.000000 \n", "\n", " \n", " std \n", "Location severe_disease \n", "Ethmoid Culture (Deep to Ethmoid Bulla) False 32.444091 \n", " True 492.952127 \n", "Ethmoid Tissue (Deep to Ethmoid Bulla) False 64.103822 \n", " True 125.078642 \n", "Head of Inferior Turbinate Tissue False 26.883080 \n", " True 42.023803 \n", "Maxillary Sinus False 193.747258 \n", " True 134.350288 \n", "Maxillary Sinus Tissue False NaN \n", " True 147.078210 \n", "Middle Meatus False 18.062853 \n", " True 97.739450 \n", "Nasal Vestibule False 119.600326 \n", " True 4.932883 \n", "Sphenoethmoidal Recess Tissue False 174.608133 \n", " True 382.792698 \n", "Sphenoid False 36.503098 \n", " True 124.715677 \n", "Sphenoid Tissue False 9.192388 \n", " True 251.573714 \n", "Superior Meatus False 69.622793 \n", " True 326.720370 \n", "Uncinate Process Tissue False 17.700282 \n", " True 60.930288 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_info.groupby(['Location', 'severe_disease'])[PHYLUM_COLS].aggregate(['mean', 'std'])" ] }, { "cell_type": "markdown", "id": "84c7fdf9-63ab-49e9-a5da-315c6e43d4ad", "metadata": {}, "source": [ "Here I've broken things down by body-site and disease status and calculated both a mean and standard deviation.\n", "In future lectures we will explore how to quantify this with a significance test.\n", "For now, we'll leave it as a visual comparison." ] }, { "cell_type": "markdown", "id": "0a0eb1a1-059a-4ea3-9c4f-919843039953", "metadata": {}, "source": [ "## Pivoting & Melting Dataframes" ] }, { "cell_type": "markdown", "id": "b43776b4-b213-4ab3-8ad9-cf45b28ff32b", "metadata": {}, "source": [ "This is a process of reshaping, and optionally summarizing, your data as you convert it between `wide` and `long` format.\n", "These techniques are often required for generating different types of plots.\n", "\n", "These are best shown by example." ] }, { "cell_type": "markdown", "id": "4ca25ca9-7923-4537-91f4-75b4e8caf6a1", "metadata": {}, "source": [ "### Pivoting\n", "\n", "`long` -> `wide`" ] }, { "cell_type": "code", "execution_count": 37, "id": "c9208e7f-e76c-43a1-9e4b-719899647973", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
LocationEthmoid Culture (Deep to Ethmoid Bulla)Ethmoid Tissue (Deep to Ethmoid Bulla)Head of Inferior Turbinate TissueMaxillary SinusMaxillary Sinus TissueMiddle MeatusNasal VestibuleSphenoethmoidal Recess TissueSphenoidSphenoid TissueSuperior MeatusUncinate Process Tissue
Patient
30622714.0NaN1746.0NaNNaN1822.014987.0NaN6125.0NaN2992.01210.0
3094385.0664.01089.0NaNNaN4316.01397.05559.04648.0784.03463.01538.0
30952300.0651.0760.0NaNNaN2999.05510.0NaN519.0703.010097.01244.0
31152168.0NaN4640.0NaNNaN3525.05480.01861.013948.0NaN982.01531.0
31169796.06357.06261.05748.08038.0988.02324.03673.0617.06709.03866.07249.0
31171800.0672.01499.0NaNNaN2810.01017.0NaN2082.01954.03671.01233.0
31182459.0380.02962.0NaNNaN5253.0NaN3724.02322.0NaN4144.04925.0
31192494.01870.01721.0265.0412.04372.0NaN301.02796.0988.0138.0457.0
3120905.01614.01136.0NaNNaN2342.0515.0415.02655.0NaN2037.01301.0
31213060.0NaN863.0NaNNaN3686.04295.01624.06135.0NaN4740.0NaN
31231029.0914.0NaN3865.0NaNNaN5885.0358.01207.0NaN4497.0NaN
31243979.02318.01146.02133.0448.02332.02320.0662.02798.0NaN1940.01908.0
\n", "
" ], "text/plain": [ "Location Ethmoid Culture (Deep to Ethmoid Bulla) \\\n", "Patient \n", "3062 2714.0 \n", "3094 385.0 \n", "3095 2300.0 \n", "3115 2168.0 \n", "3116 9796.0 \n", "3117 1800.0 \n", "3118 2459.0 \n", "3119 2494.0 \n", "3120 905.0 \n", "3121 3060.0 \n", "3123 1029.0 \n", "3124 3979.0 \n", "\n", "Location Ethmoid Tissue (Deep to Ethmoid Bulla) \\\n", "Patient \n", "3062 NaN \n", "3094 664.0 \n", "3095 651.0 \n", "3115 NaN \n", "3116 6357.0 \n", "3117 672.0 \n", "3118 380.0 \n", "3119 1870.0 \n", "3120 1614.0 \n", "3121 NaN \n", "3123 914.0 \n", "3124 2318.0 \n", "\n", "Location Head of Inferior Turbinate Tissue Maxillary Sinus \\\n", "Patient \n", "3062 1746.0 NaN \n", "3094 1089.0 NaN \n", "3095 760.0 NaN \n", "3115 4640.0 NaN \n", "3116 6261.0 5748.0 \n", "3117 1499.0 NaN \n", "3118 2962.0 NaN \n", "3119 1721.0 265.0 \n", "3120 1136.0 NaN \n", "3121 863.0 NaN \n", "3123 NaN 3865.0 \n", "3124 1146.0 2133.0 \n", "\n", "Location Maxillary Sinus Tissue Middle Meatus Nasal Vestibule \\\n", "Patient \n", "3062 NaN 1822.0 14987.0 \n", "3094 NaN 4316.0 1397.0 \n", "3095 NaN 2999.0 5510.0 \n", "3115 NaN 3525.0 5480.0 \n", "3116 8038.0 988.0 2324.0 \n", "3117 NaN 2810.0 1017.0 \n", "3118 NaN 5253.0 NaN \n", "3119 412.0 4372.0 NaN \n", "3120 NaN 2342.0 515.0 \n", "3121 NaN 3686.0 4295.0 \n", "3123 NaN NaN 5885.0 \n", "3124 448.0 2332.0 2320.0 \n", "\n", "Location Sphenoethmoidal Recess Tissue Sphenoid Sphenoid Tissue \\\n", "Patient \n", "3062 NaN 6125.0 NaN \n", "3094 5559.0 4648.0 784.0 \n", "3095 NaN 519.0 703.0 \n", "3115 1861.0 13948.0 NaN \n", "3116 3673.0 617.0 6709.0 \n", "3117 NaN 2082.0 1954.0 \n", "3118 3724.0 2322.0 NaN \n", "3119 301.0 2796.0 988.0 \n", "3120 415.0 2655.0 NaN \n", "3121 1624.0 6135.0 NaN \n", "3123 358.0 1207.0 NaN \n", "3124 662.0 2798.0 NaN \n", "\n", "Location Superior Meatus Uncinate Process Tissue \n", "Patient \n", "3062 2992.0 1210.0 \n", "3094 3463.0 1538.0 \n", "3095 10097.0 1244.0 \n", "3115 982.0 1531.0 \n", "3116 3866.0 7249.0 \n", "3117 3671.0 1233.0 \n", "3118 4144.0 4925.0 \n", "3119 138.0 457.0 \n", "3120 2037.0 1301.0 \n", "3121 4740.0 NaN \n", "3123 4497.0 NaN \n", "3124 1940.0 1908.0 " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(merged_info,\n", " index = 'Patient',\n", " columns = 'Location',\n", " values = 'Firmicutes',\n", " aggfunc = 'mean')" ] }, { "cell_type": "markdown", "id": "a61770b4-e85b-4a85-a201-4439249574bd", "metadata": {}, "source": [ "This took our \"long\" data format in which each row represented the observation at a different site of a different person\n", "and converted it into a \"wide\" data format such that each row is a patient and each column is a number of Firmicutes at a location.\n", "`NaNs` represent missing information.\n", "\n", "We also had to \"give up\" some information for this transformation ... this is only Firmicutes.\n", "\n", "One _can_ do this to include more information:" ] }, { "cell_type": "code", "execution_count": 38, "id": "a2feb15c-4310-4c1e-82fe-1bd7bd6b986a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Actinobacteria...Firmicutes
LocationEthmoid Culture (Deep to Ethmoid Bulla)Ethmoid Tissue (Deep to Ethmoid Bulla)Head of Inferior Turbinate TissueMaxillary SinusMaxillary Sinus TissueMiddle MeatusNasal VestibuleSphenoethmoidal Recess TissueSphenoidSphenoid Tissue...Head of Inferior Turbinate TissueMaxillary SinusMaxillary Sinus TissueMiddle MeatusNasal VestibuleSphenoethmoidal Recess TissueSphenoidSphenoid TissueSuperior MeatusUncinate Process Tissue
Patient
3062461.0NaN232.0NaNNaN292.02516.0NaN623.0NaN...1746.0NaNNaN1822.014987.0NaN6125.0NaN2992.01210.0
3094243.0638.0282.0NaNNaN1486.0103.04604.02690.01540.0...1089.0NaNNaN4316.01397.05559.04648.0784.03463.01538.0
30951652.01038.0444.0NaNNaN2010.01474.0NaN587.0670.0...760.0NaNNaN2999.05510.0NaN519.0703.010097.01244.0
311580.0NaN1.0NaNNaN13.00.0211.017.0NaN...4640.0NaNNaN3525.05480.01861.013948.0NaN982.01531.0
3116340.061.019.034.040.05.02.0181.032.0309.0...6261.05748.08038.0988.02324.03673.0617.06709.03866.07249.0
31171546.01108.0521.0NaNNaN965.0426.0NaN1537.01812.0...1499.0NaNNaN2810.01017.0NaN2082.01954.03671.01233.0
31181802.0824.0332.0NaNNaN1465.0NaN454.0923.0NaN...2962.0NaNNaN5253.0NaN3724.02322.0NaN4144.04925.0
31194358.02605.0245.01651.02625.03268.0NaN1102.07514.04183.0...1721.0265.0412.04372.0NaN301.02796.0988.0138.0457.0
31202740.03265.068.0NaNNaN405.0107.0468.04195.0NaN...1136.0NaNNaN2342.0515.0415.02655.0NaN2037.01301.0
31211971.0NaN452.0NaNNaN911.0582.0399.02816.0NaN...863.0NaNNaN3686.04295.01624.06135.0NaN4740.0NaN
31231010.02263.0NaN941.0NaNNaN4186.0226.0861.0NaN...NaN3865.0NaNNaN5885.0358.01207.0NaN4497.0NaN
31241602.01243.0891.04138.0265.01922.02218.0733.01902.0NaN...1146.02133.0448.02332.02320.0662.02798.0NaN1940.01908.0
\n", "

12 rows × 24 columns

\n", "
" ], "text/plain": [ " Actinobacteria \\\n", "Location Ethmoid Culture (Deep to Ethmoid Bulla) \n", "Patient \n", "3062 461.0 \n", "3094 243.0 \n", "3095 1652.0 \n", "3115 80.0 \n", "3116 340.0 \n", "3117 1546.0 \n", "3118 1802.0 \n", "3119 4358.0 \n", "3120 2740.0 \n", "3121 1971.0 \n", "3123 1010.0 \n", "3124 1602.0 \n", "\n", " \\\n", "Location Ethmoid Tissue (Deep to Ethmoid Bulla) \n", "Patient \n", "3062 NaN \n", "3094 638.0 \n", "3095 1038.0 \n", "3115 NaN \n", "3116 61.0 \n", "3117 1108.0 \n", "3118 824.0 \n", "3119 2605.0 \n", "3120 3265.0 \n", "3121 NaN \n", "3123 2263.0 \n", "3124 1243.0 \n", "\n", " \\\n", "Location Head of Inferior Turbinate Tissue Maxillary Sinus \n", "Patient \n", "3062 232.0 NaN \n", "3094 282.0 NaN \n", "3095 444.0 NaN \n", "3115 1.0 NaN \n", "3116 19.0 34.0 \n", "3117 521.0 NaN \n", "3118 332.0 NaN \n", "3119 245.0 1651.0 \n", "3120 68.0 NaN \n", "3121 452.0 NaN \n", "3123 NaN 941.0 \n", "3124 891.0 4138.0 \n", "\n", " \\\n", "Location Maxillary Sinus Tissue Middle Meatus Nasal Vestibule \n", "Patient \n", "3062 NaN 292.0 2516.0 \n", "3094 NaN 1486.0 103.0 \n", "3095 NaN 2010.0 1474.0 \n", "3115 NaN 13.0 0.0 \n", "3116 40.0 5.0 2.0 \n", "3117 NaN 965.0 426.0 \n", "3118 NaN 1465.0 NaN \n", "3119 2625.0 3268.0 NaN \n", "3120 NaN 405.0 107.0 \n", "3121 NaN 911.0 582.0 \n", "3123 NaN NaN 4186.0 \n", "3124 265.0 1922.0 2218.0 \n", "\n", " ... \\\n", "Location Sphenoethmoidal Recess Tissue Sphenoid Sphenoid Tissue ... \n", "Patient ... \n", "3062 NaN 623.0 NaN ... \n", "3094 4604.0 2690.0 1540.0 ... \n", "3095 NaN 587.0 670.0 ... \n", "3115 211.0 17.0 NaN ... \n", "3116 181.0 32.0 309.0 ... \n", "3117 NaN 1537.0 1812.0 ... \n", "3118 454.0 923.0 NaN ... \n", "3119 1102.0 7514.0 4183.0 ... \n", "3120 468.0 4195.0 NaN ... \n", "3121 399.0 2816.0 NaN ... \n", "3123 226.0 861.0 NaN ... \n", "3124 733.0 1902.0 NaN ... \n", "\n", " Firmicutes \\\n", "Location Head of Inferior Turbinate Tissue Maxillary Sinus \n", "Patient \n", "3062 1746.0 NaN \n", "3094 1089.0 NaN \n", "3095 760.0 NaN \n", "3115 4640.0 NaN \n", "3116 6261.0 5748.0 \n", "3117 1499.0 NaN \n", "3118 2962.0 NaN \n", "3119 1721.0 265.0 \n", "3120 1136.0 NaN \n", "3121 863.0 NaN \n", "3123 NaN 3865.0 \n", "3124 1146.0 2133.0 \n", "\n", " \\\n", "Location Maxillary Sinus Tissue Middle Meatus Nasal Vestibule \n", "Patient \n", "3062 NaN 1822.0 14987.0 \n", "3094 NaN 4316.0 1397.0 \n", "3095 NaN 2999.0 5510.0 \n", "3115 NaN 3525.0 5480.0 \n", "3116 8038.0 988.0 2324.0 \n", "3117 NaN 2810.0 1017.0 \n", "3118 NaN 5253.0 NaN \n", "3119 412.0 4372.0 NaN \n", "3120 NaN 2342.0 515.0 \n", "3121 NaN 3686.0 4295.0 \n", "3123 NaN NaN 5885.0 \n", "3124 448.0 2332.0 2320.0 \n", "\n", " \\\n", "Location Sphenoethmoidal Recess Tissue Sphenoid Sphenoid Tissue \n", "Patient \n", "3062 NaN 6125.0 NaN \n", "3094 5559.0 4648.0 784.0 \n", "3095 NaN 519.0 703.0 \n", "3115 1861.0 13948.0 NaN \n", "3116 3673.0 617.0 6709.0 \n", "3117 NaN 2082.0 1954.0 \n", "3118 3724.0 2322.0 NaN \n", "3119 301.0 2796.0 988.0 \n", "3120 415.0 2655.0 NaN \n", "3121 1624.0 6135.0 NaN \n", "3123 358.0 1207.0 NaN \n", "3124 662.0 2798.0 NaN \n", "\n", " \n", "Location Superior Meatus Uncinate Process Tissue \n", "Patient \n", "3062 2992.0 1210.0 \n", "3094 3463.0 1538.0 \n", "3095 10097.0 1244.0 \n", "3115 982.0 1531.0 \n", "3116 3866.0 7249.0 \n", "3117 3671.0 1233.0 \n", "3118 4144.0 4925.0 \n", "3119 138.0 457.0 \n", "3120 2037.0 1301.0 \n", "3121 4740.0 NaN \n", "3123 4497.0 NaN \n", "3124 1940.0 1908.0 \n", "\n", "[12 rows x 24 columns]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(merged_info,\n", " index = 'Patient',\n", " columns = 'Location',\n", " values = ['Actinobacteria', 'Firmicutes'],\n", " aggfunc = 'mean')" ] }, { "cell_type": "markdown", "id": "b8628793-0b10-471e-8a7f-762262f5c246", "metadata": {}, "source": [ "But that is usually not a great idea." ] }, { "cell_type": "markdown", "id": "c88742c1-0bb8-40f0-a9c4-7d32adb12bd0", "metadata": {}, "source": [ "### Melting\n", "\n", "`wide` -> `long`" ] }, { "cell_type": "markdown", "id": "e567e498-ef44-417c-9df2-7a32b23c8e55", "metadata": {}, "source": [ "Our data is part `long` and part `wide` (like most real datasets).\n", "In some plotting instances we may want to make it \"longer\" by having each bacteria be a diffent row instead of a different column." ] }, { "cell_type": "code", "execution_count": 39, "id": "3fd87ac3-eb9b-4cf3-8040-8e71c54cb2e6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PatientLocationPhylumCounts
03062Nasal VestibuleActinobacteria2516
13094Nasal VestibuleActinobacteria103
23095Nasal VestibuleActinobacteria1474
33115Nasal VestibuleActinobacteria0
43116Nasal VestibuleActinobacteria2
...............
4273094Sphenoid TissueProteobacteria13
4283095Sphenoid TissueProteobacteria0
4293116Sphenoid TissueProteobacteria495
4303117Sphenoid TissueProteobacteria129
4313119Sphenoid TissueProteobacteria13
\n", "

432 rows × 4 columns

\n", "
" ], "text/plain": [ " Patient Location Phylum Counts\n", "0 3062 Nasal Vestibule Actinobacteria 2516\n", "1 3094 Nasal Vestibule Actinobacteria 103\n", "2 3095 Nasal Vestibule Actinobacteria 1474\n", "3 3115 Nasal Vestibule Actinobacteria 0\n", "4 3116 Nasal Vestibule Actinobacteria 2\n", ".. ... ... ... ...\n", "427 3094 Sphenoid Tissue Proteobacteria 13\n", "428 3095 Sphenoid Tissue Proteobacteria 0\n", "429 3116 Sphenoid Tissue Proteobacteria 495\n", "430 3117 Sphenoid Tissue Proteobacteria 129\n", "431 3119 Sphenoid Tissue Proteobacteria 13\n", "\n", "[432 rows x 4 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.melt(merged_info,\n", " id_vars = ['Patient', 'Location'], # The things you want preserved in each row\n", " value_vars = PHYLUM_COLS, # The columns you want to melt\n", " var_name = 'Phylum', # The name of the column that will have the value_var name\n", " value_name = 'Counts') # The name of the column that will have the value" ] }, { "cell_type": "markdown", "id": "d3bfc3fc-8dca-4ac4-b461-9abc33d6550f", "metadata": {}, "source": [ "We'll explore these in more detail as we move into plotting next week.\n", "This is just a taste." ] }, { "cell_type": "markdown", "id": "76698471", "metadata": {}, "source": [ "---------------------------------------------" ] }, { "cell_type": "markdown", "id": "c8be12ff-12a6-4cef-8cdc-c4ff0962bd4d", "metadata": { "tags": [ "remove_cell" ] }, "source": [ "## Submission\n", "\n", "You do not need to submit this walkthrough notebook.\n", "Simply complete the quiz." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "otter": { "assignment_name": "Module04_walkthrough" } }, "nbformat": 4, "nbformat_minor": 5 }