Lab#
Learning Objectives#
At the end of this learning activity you will be able to:
Practice creating statistical figures to answer biological questions.
Practice writing figure legends for statistical figures.
Practice writing descriptive reasonings about a figure.
Note: It is difficult to automatically grade figures as there are many “correct” answers. So, most questions will accept any figure or axis and then ask you to answer a question that should be obvious from a properly generated figure. For all questions, assume a 95% confidence interval.
Use this lab as an opportunity to explore the different plot-types that Seaborn can make. Check out https://seaborn.pydata.org/examples/index.html for ideas.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('cytokine_data.csv')
data.head()
Explore the effect of cocaine use on mcp1#
Q1: Do cocaine users have a higher level of expression of mcp1?#
Generate a plot which displays the spread of mcp1 measurements for each user, split by cocaine use.
This can be a boxplot, stripplot, violin plot, or one of a number of others.
Use a markdown cell to write a figure legend. At a minimum:
Single declarative sentence stating the conclusion of the figure.
Each axis must be described.
Every color, line, and dot must be described.
Error bars must be described.
Then answer:
Do cocaine users or non-users have higher levels of mcp1?
Use a markdown cell to justify your answer using your figure.
Checked variables:
q1_ax - A matplotlib Axes object with a plot showing mcp1 distribution by cocaine use
Should be a boxplot, violin plot, or similar showing spread of data
Markdown cell with figure caption and answer
Hint
Use seaborn's boxplot or violinplot with x='cocaine_use' and y='mcp1'. Create a figure caption describing what you see. Compare the distributions between cocaine users and non-users. See Module 6 walkthrough for data visualization examples.
Points | 5
Public Checks | 2
Hidden Tests | 1
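The hint above can be sketched as follows. This is only an illustration on made-up data, not the lab's answer: the `demo` DataFrame, its group labels (`'non-user'`, `'user'`), and the `demo_ax` name are assumptions introduced for the example, and both groups are drawn from the same distribution so that the sketch does not suggest a result.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up stand-in data; the real lab reads cytokine_data.csv instead.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "cocaine_use": np.repeat(["non-user", "user"], 30),  # hypothetical labels
    "mcp1": rng.normal(loc=100, scale=20, size=60),       # same distribution for both groups
})

fig, ax = plt.subplots()
# Axes-level seaborn functions draw on the given Axes and return it.
demo_ax = sns.boxplot(data=demo, x="cocaine_use", y="mcp1", ax=ax)
demo_ax.set_xlabel("Cocaine use")
demo_ax.set_ylabel("mcp1 (made-up units)")
```

Swapping `sns.boxplot` for `sns.violinplot` or `sns.stripplot` keeps the same arguments, which makes it easy to compare how each plot type shows the spread.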
# Make your plot here
q1_plot = ...
# Create a markdown cell after this to write a figure legend.
# Do cocaine users or non-users have higher levels of mcp1?
# Answer 'users', 'non-users', or 'same'
q1_higher_level = ...
# Use the cell below to write your justification given the figure you presented.
grader.check("q1_cocaine_use_spread")
Q2: Do cocaine users or non-users have a higher average level of mcp1?#
Generate a plot which displays the confidence of the mean of mcp1 expression across cocaine use.
Then, write a figure legend that at a minimum contains:
Single declarative sentence stating the conclusion of the figure.
Each axis must be described.
Every color, line, and dot must be described.
Error bars must be described.
Then, use that figure to answer whether cocaine users or non-users have a higher average level of mcp1.
Include a markdown cell that justifies your answer given the figure you presented.
Checked variables:
q2_ax - A matplotlib Axes object showing mean mcp1 with confidence intervals by cocaine use
Should show error bars (e.g., using barplot with error bars or pointplot)
Markdown cell with figure caption and answer
Hint
Use seaborn's barplot (shows mean + CI by default) or pointplot with x='cocaine_use' and y='mcp1'. Write a caption explaining the means and confidence intervals. Determine which group has higher average. See Module 6 walkthrough for examples with confidence intervals.
Points | 5
Public Checks | 2
Hidden Tests | 1
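A minimal sketch of the barplot approach from the hint, again on made-up data. The `demo` DataFrame, its group labels, and the names `demo_ax` and `group_means` are assumptions for illustration only. The sketch relies on seaborn's default error bars, which are a bootstrapped 95% confidence interval of the mean, so no extra argument is passed.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up stand-in data; the real lab reads cytokine_data.csv instead.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "cocaine_use": np.repeat(["non-user", "user"], 40),  # hypothetical labels
    "mcp1": rng.normal(loc=100, scale=20, size=80),       # same distribution for both groups
})

fig, ax = plt.subplots()
# Bar height is the group mean; the error bar is seaborn's default
# bootstrapped 95% confidence interval of that mean.
demo_ax = sns.barplot(data=demo, x="cocaine_use", y="mcp1", ax=ax)
demo_ax.set_xlabel("Cocaine use")
demo_ax.set_ylabel("Mean mcp1 (made-up units)")

# The plotted bar heights can be cross-checked numerically.
group_means = demo.groupby("cocaine_use")["mcp1"].mean()
```

When reading such a figure, heavily overlapping intervals suggest the means are not clearly different, while well-separated intervals suggest that they are.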
# Generate a plot which displays the confidence of the mean of mcp1 expression across cocaine use
q2_plot = ...
# Do cocaine users or non-users have a higher average level of mcp1
# Answer 'users', 'non-users', or 'same'
q2_higher_mean = ...
# Make a cell below this to explain your reasoning based on the figure.
grader.check("q2_cocaine_use_mean")
Q3: Does sex impact the effect of cocaine use on the average level of mcp1 expression?#
Generate a plot which displays the confidence of the mean of mcp1 expression across cocaine use and split by sex.
Then use a markdown cell to write a figure caption that, at a minimum, includes:
A single declarative sentence stating the conclusion of the figure.
Each axis must be described.
Every color, line, and dot must be described.
Error bars must be described.
Then answer the question: Does sex modulate the impact of cocaine use on mcp1 expression? Create a markdown cell afterwards that describes your answer based on the figure you created.
Checked variables:
q3_ax - A matplotlib Axes object showing mean mcp1 by cocaine use, split by sex
Should show confidence intervals
Should use color or grouping to separate by sex
Markdown cell with figure caption and answer
Hint
Use seaborn's barplot or pointplot with x='cocaine_use', y='mcp1', hue='sex'. This will show if the cocaine effect differs between males and females. Write a caption explaining the interaction. See Module 6 walkthrough for grouped plots.
Points | 5
Public Checks | 2
Hidden Tests | 1
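The grouped version of the hint might look like the sketch below. It uses made-up data: the `demo` DataFrame, the labels in `cocaine_use` and `sex`, and the `demo_ax` name are all assumptions for illustration. `dodge=True` is used here only to offset the two sexes slightly so their error bars do not sit on top of each other.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up stand-in data; the real lab reads cytokine_data.csv instead.
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "cocaine_use": np.repeat(["non-user", "user"], 40),  # hypothetical labels
    "sex": np.tile(["female", "male"], 40),               # hypothetical labels
    "mcp1": rng.normal(loc=100, scale=20, size=80),       # same distribution everywhere
})

fig, ax = plt.subplots()
# One point per (cocaine_use, sex) cell at the cell mean, with the default
# bootstrapped 95% confidence interval; hue colors the points by sex.
demo_ax = sns.pointplot(
    data=demo, x="cocaine_use", y="mcp1", hue="sex", dodge=True, ax=ax
)
demo_ax.set_xlabel("Cocaine use")
demo_ax.set_ylabel("Mean mcp1 (made-up units)")
```

In a plot like this, roughly parallel lines for the two sexes suggest that sex does not change the effect of cocaine use, while clearly diverging or crossing lines suggest an interaction.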
# Generate a plot which displays the confidence of the mean of mcp1 expression across cocaine use, split by sex
q3_plot = ...
# Is it 'likely' or 'unlikely' that gender has an impact on the effect of cocaine use on mcp1?
q3_gender_impact = ...
grader.check("q3_cocaine_use_gender_mean")
Q4: Is there a correlation between infection length and mcp1 expression?#
Generate a plot which displays the relationship between years infected and mcp1 expression.
Then use a markdown cell to write a figure caption that, at a minimum, includes:
A single declarative sentence stating the conclusion of the figure.
Each axis must be described.
Every color, line, and dot must be described.
Error bars must be described.
Then answer the question: Is there a correlation between infection length and mcp1 expression? Create a markdown cell afterwards that describes your answer based on the figure you created.
Checked variables:
q4_ax - A matplotlib Axes object with a scatterplot showing years_infected vs mcp1
Should show the relationship between continuous variables
Consider adding a regression line
Markdown cell with figure caption and answer
Hint
Use seaborn's regplot or scatterplot with x='years_infected' and y='mcp1'. A regression line helps visualize correlation. Write a caption describing the relationship. See Module 6 walkthrough for correlation plots.
Points | 5
Public Checks | 2
Hidden Tests | 1
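As a rough sketch of the regplot suggestion, the example below draws a scatter with a fitted line on made-up data. The `demo` DataFrame and the names `demo_ax` and `pearson_r` are assumptions for illustration, and the two columns are generated independently so the sketch does not hint at the lab's answer. The Pearson coefficient is included only as an optional numeric companion to the figure.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up stand-in data; the real lab reads cytokine_data.csv instead.
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "years_infected": rng.uniform(0, 20, size=50),
    "mcp1": rng.normal(loc=100, scale=20, size=50),  # generated independently of years_infected
})

fig, ax = plt.subplots()
# regplot draws the points, a fitted linear regression line, and a shaded
# 95% confidence band around that line (seaborn's default).
demo_ax = sns.regplot(data=demo, x="years_infected", y="mcp1", ax=ax)
demo_ax.set_xlabel("Years infected")
demo_ax.set_ylabel("mcp1 (made-up units)")

# Pearson correlation coefficient between the two columns.
pearson_r = demo["years_infected"].corr(demo["mcp1"])
```

A band that comfortably contains a flat line is consistent with no correlation; a band that clearly slopes up or down points toward one.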
# Generate a plot which displays the relationship between years_infected and mcp1 expression
q4_plot = ...
# Is there a correlation between infection length and mcp1 expression? 'yes' or 'no'
q4_infection_length_corr = ...
grader.check("q4_infection_length")
Q5: Does cocaine use impact the relationship between infection length and mcp1 expression?#
Generate a plot which displays the impact of cocaine use on the relationship between years infected and mcp1 expression.
Then use a markdown cell to write a figure caption that, at a minimum, includes:
A single declarative sentence stating the conclusion of the figure.
Each axis must be described.
Every color, line, and dot must be described.
Error bars must be described.
Then answer the question: Does cocaine use impact the relationship between infection length and mcp1 expression?
Create a markdown cell afterwards that describes your answer based on the figure you created.
Checked variables:
q5_ax - A matplotlib Axes object showing years_infected vs mcp1, split by cocaine use
Should use color or separate lines to show different groups
Consider showing regression lines for each group
Markdown cell with figure caption and answer
Hint
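Use seaborn's lmplot with hue='cocaine_use' to show separate regression lines for each group. Note that lmplot returns a FacetGrid rather than an Axes, and regplot has no hue parameter, so to stay on a single Axes you can call regplot once per group with the same ax. Compare the slopes and patterns. Write a caption explaining if cocaine use changes the relationship. See Module 6 walkthrough for grouped regression plots.
Points | 5
Public Checks | 2
Hidden Tests | 1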
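One way to get a separate regression line per group on a single Axes is to call `regplot` once per group, as sketched below on made-up data. The `demo` DataFrame, its group labels, and the loop variable names are assumptions for illustration, and both groups share the same (flat) underlying relationship so the sketch does not suggest a result.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up stand-in data; the real lab reads cytokine_data.csv instead.
rng = np.random.default_rng(4)
demo = pd.DataFrame({
    "cocaine_use": np.repeat(["non-user", "user"], 40),  # hypothetical labels
    "years_infected": rng.uniform(0, 20, size=80),
    "mcp1": rng.normal(loc=100, scale=20, size=80),       # independent of both other columns
})

fig, ax = plt.subplots()
# regplot has no hue parameter, so draw each group separately on the same
# Axes; matplotlib's color cycle gives each call its own color.
for label, group in demo.groupby("cocaine_use"):
    sns.regplot(data=group, x="years_infected", y="mcp1", ax=ax, label=label)

ax.set_xlabel("Years infected")
ax.set_ylabel("mcp1 (made-up units)")
ax.legend(title="Cocaine use")
```

If the two fitted lines have similar slopes and their confidence bands overlap along the whole range, the figure gives little evidence that the groups differ in how mcp1 changes with infection length.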
# Generate a plot which displays the relationship between years_infected and mcp1 expression, split by cocaine use
q5_plot = ...
# Does cocaine use impact the rate of mcp1 increase with infection length? 'yes' or 'no'
q5_infection_length_cocaine_slope = ...
grader.check("q5_infection_length_cocaine")
Submission#
Check:
That all tables and graphs are rendered properly.
Code completes without errors by using Restart & Run All.
All checks pass.
Then save the notebook and use File -> Download -> Download .ipynb. Upload this file to BBLearn.