Walkthrough#
Introduction#
After completing this learning activity, you will be able to:
Launch Google Colab.
Create text using Markdown.
Execute code cells.
Check your answers with the Otter grading system.
Submit assignments to BBLearn.
Why Python#
Throughout this course we will cover many biological questions. We will frame each of them as a statistical visualization that captures the biological question, and then rigorously validate it through statistical methods.
These tasks can be accomplished by hand, with spreadsheet software like Excel, or with statistical software like PRISM. However, as biological datasets get larger and the questions get more complex, these tools become unwieldy. Python, on the other hand, is a general-purpose programming language that can be used to build almost anything.
The recent explosion of research using Python in the biological and data science fields has spawned dozens of freely available software packages that let you take full advantage of the newest methods. Learning how to “program” will allow you to stay abreast of emerging techniques and technologies.
Why Google Colab#
I’ve been teaching “coding” in some form or another since 2007. There’s one critical hurdle everyone faces (including young me); I call it the Step 0 Hurdle.
How do you get Python installed?
Before you can even start learning, you need to have it. This used to be an arduous process that could take a skilled system administrator most of a day. There has been excellent progress on fixing these issues [footnote-link-to-anaconda-setup-instructions]. Then there is the trouble of installing all of the tools we’ll be using in the course. This hurdle is not insurmountable (by week 4 of this class all of you could do it), but it poses an incredible initial difficulty when you are not familiar with the concepts or decisions required by the installation process.
Google Colab, on the other hand, is a free, browser-based interface built on the Jupyter Notebook [footnote-link-to-jupyterlab-anaconda-instructions]. The system comes “batteries included” with a large collection of tools for data science, visualization, statistics, and biology already installed. By using this system, we will all have the same environment, thereby bypassing these initial stumbling blocks.
Coding expectations#
This is a statistics course, not a programming course. We will not cover topics like conditionals, loops, classes, or any complex algorithms or data structures. Anything that requires these concepts will already be included in the provided skeleton code.
Instead, we will use a small collection of tools that abstract away the complexity of the analysis and provide a simple interface. Coming into the course I do not expect any previous coding experience. This course will teach you all of the Python syntax that you need and provide multiple examples.
I’ve taught hundreds of people basic data analysis. I can teach you too.
Quick introduction on cells and blocks#
A notebook is composed of a series of cells.
We’ll use two basic flavors:
Markdown
Code
Cell types can be changed using the dropdown menu at the top.
Markdown#
Statistical analysis isn’t all about math, code, and figures. Text is just as important. Jupyter/Colab use a plain-text syntax called Markdown.
Here are some brief examples:
**Bold text** –> Bold text
*Italicized text* –> Italicized text
***Bold & italicized text*** –> Bold & Italicized text
```coding block``` –> coding block
[A hyperlink](https://www.google.com/) –> A hyperlink
Some of these formatting options can be found at the top-left corner of the text block. A complete list of Markdown syntax can be found here.
Markdown can also be used to create tables and lists. For example, typing the text below:
|Some|Table|
|---|---|
|X|Y|
|Z|A|
creates the following table:
|Some|Table|
|---|---|
|X|Y|
|Z|A|
Typing out this text:
1. A
2. Numbered
3. List
* Bulleted
* Sublist
creates the following list:
1. A
2. Numbered
3. List
   * Bulleted
   * Sublist
A bulleted list can be created by typing in * instead of numbers.
Edit the next cell (by double clicking) to include your name.
Try me#
Put your name here:
Click on the next cell and hit shift+return to execute it.
print('Hello World')
Hello World
A notebook’s true power comes from the fact that it is an easy interface for interacting with a Python kernel. We do this through code cells.
The Inferential Thinking Textbook summarized programming perfectly.
In data science, the purpose of writing a program is to instruct a computer to carry out the steps of an analysis. Computers cannot study the world on their own. People must describe precisely what steps the computer should take in order to collect and analyze data, and those steps are expressed through programs.
https://inferentialthinking.com/chapters/03/programming-in-python.html
Code cells allow us to provide these programs to the computer.
The print('Hello World') expression above printed the phrase “Hello World” to the output, which Jupyter/Colab then placed underneath the cell. Also, notice that the number next to the cell changed. This is the execution count, and it tells you how many times ANY cell has been run. Try executing the next one.
5+4
9
Notice that the result of the last line is displayed on the screen. This is a particular feature of Jupyter Notebooks.
If you need to see multiple results, use print:
print('first', 5+4, 'second', 12+1)
first 9 second 13
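As a minimal sketch of the same idea, the results can first be saved and then printed one per line. The names first and second are hypothetical and only used for this illustration.

```python
# Save each result under a (hypothetical) name first.
first = 5 + 4
second = 12 + 1

# Each print call writes its own line of output,
# so both results are shown even though neither is the last line's value.
print('first', first)
print('second', second)
```

Calling print once per result is a matter of taste; a single print call with several arguments, as in the cell above, works just as well.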
We can also use things called variables to hold numbers (and other things) for use later. Try executing the next one.
height = 1.9 # meters
weight = 86 # kg
bmi = weight/height**2
Notice how it didn’t output anything? That is because the last line is an assignment: instead of displaying a value, it saved the result into the variable bmi. Type bmi into the next cell to see the answer.
print('BMI:', bmi)
BMI: 23.822714681440445
The nice thing about notebooks is their interactive nature. Go back and change the height and weight variables, execute both cells, and see how the result changes.
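For instance, here is a sketch of what that experiment might look like with a different, made-up pair of values. The numbers below are hypothetical, and the built-in round function is used only to shorten the printed result.

```python
# Hypothetical values, chosen only for illustration.
height = 1.75 # meters
weight = 70 # kg

# ** (power) is evaluated before / (division),
# so this is weight divided by height squared.
bmi = weight/height**2

# round(value, 1) keeps one digit after the decimal point.
print('BMI:', round(bmi, 1))
```

Writing weight/(height**2) with explicit parentheses gives the same value and can make the order of operations easier to read.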
Otter Grader#
In order to rigorously grade the assignments, this course uses the Otter Grader.
This is a tool developed by educators at UC Berkeley.
It works by providing a tool, the grader, which I preload with checks for each question.
This will help ensure that you are moving through the code correctly.
I’ll also use it to give suggestions for common mistakes.
First, we need to do some initializing. These cells would normally be the first in the notebook; I’ve moved them down here to better describe them.
This first cell installs the otter-grader tool (if it is not already available) and unzips the file that you downloaded from BBLearn. You only need to run this once, but it is safe to run it twice.
# Setting up the Colab environment. DO NOT EDIT!
import os
import warnings
warnings.filterwarnings("ignore")
try:
    import otter
except ImportError:
    ! pip install -q otter-grader==4.0.0
    import otter
if not os.path.exists('walkthrough-tests'):
    zip_files = [f for f in os.listdir() if f.endswith('.zip')]
    assert len(zip_files)>0, 'Could not find any zip files!'
    assert len(zip_files)==1, 'Found multiple zip files!'
    ! unzip {zip_files[0]}
grader = otter.Notebook(colab=True,
                        tests_dir = 'walkthrough-tests')
Now that we have the grader loaded, let’s take it for a spin with a simple problem.
Calculate an aerobic target heart rate#
Consider a generally healthy 34-year-old woman who is looking to exercise in her ideal target heart rate zone. Per the Mayo Clinic:
How to determine your target heart rate zone
Use an online calculator to determine your desired target heart rate zone. Or, here’s a simple way to do the math yourself. If you’re aiming for a target heart rate in the vigorous range of 70% to 85%, you can use the heart rate reserve (HRR) method to calculate it like this:
Subtract your age from 220 to get your maximum heart rate.
Calculate your resting heart rate by counting how many times your heart beats per minute when you are at rest, such as first thing in the morning. It’s usually somewhere between 60 and 100 beats per minute for the average adult.
Calculate your heart rate reserve (HRR) by subtracting your resting heart rate from your maximum heart rate.
Multiply your HRR by 0.7 (70%). Add your resting heart rate to this number.
Multiply your HRR by 0.85 (85%). Add your resting heart rate to this number.
These two numbers are your average target heart rate zone for vigorous exercise intensity when using the HRR to calculate your heart rate. Your heart rate during vigorous exercise should generally be between these two numbers.
Our subject estimates her resting heart rate at 60 bpm.
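Before working on the questions, it may help to see the Mayo Clinic steps written out as code. The sketch below uses a made-up person (age 40, resting heart rate 70 bpm), not the subject above, and all of the example_ variable names are hypothetical.

```python
# Hypothetical example person (NOT the subject described above).
example_age = 40
example_resting_heart_rate = 70  # beats per minute

# Step 1: subtract age from 220 to get the maximum heart rate.
example_max_heart_rate = 220 - example_age

# Step 2: subtract the resting heart rate to get the heart rate reserve (HRR).
example_hrr = example_max_heart_rate - example_resting_heart_rate

# Steps 3 and 4: scale the HRR by 70% and 85%, then add the resting rate back.
example_lower_zone = example_hrr * 0.7 + example_resting_heart_rate
example_upper_zone = example_hrr * 0.85 + example_resting_heart_rate

print('Target zone:', example_lower_zone, 'to', example_upper_zone)
```

Saving each intermediate step in its own variable makes it easy to check the arithmetic one line at a time.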
Q1: Using the information above, calculate the subject’s heart rate reserve.#
You can see that your solution should finish the expression:
heart_rate_reserve = ...
You can take as many lines as you need to solve the problem (and in the future, there will be many lines needed).
That being said, the answer MUST be in the variable heart_rate_reserve for it to count.
The checks will help to ensure not only that you’ve saved it correctly, but also that it is the right type.
resting_heart_rate = 60
age = 34
heart_rate_reserve = 220 - age - resting_heart_rate # SOLUTION
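The “right type” mentioned above can be inspected with Python’s built-in type function. This is only an illustration with hypothetical variable names; it is not necessarily how the grader performs its checks.

```python
# Hypothetical values, for illustration only.
whole_number = 100        # an integer (int)
decimal_number = 100.0    # a floating-point number (float)
text_value = '100'        # a string (str): text, not a number

# type(...) reports what kind of value a variable holds.
print(type(whole_number))
print(type(decimal_number))
print(type(text_value))
```

A value stored as text looks the same when printed but behaves differently in arithmetic, which is one reason the type of an answer can matter.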
Q2: Using the information above, calculate the upper limit of the subject’s target heart rate zone.#
upper_target_zone = heart_rate_reserve*0.85 + resting_heart_rate # SOLUTION
Submissions#
Lastly, how do you submit your assignment?
At the very end of every assignment, there will be a check_all function. Use Restart & Run All to validate that your code still works (we’ll talk more about this in the future). Then, use File -> Download -> Download .ipynb. That file is your notebook. You can open it in something like Notepad (not Word) and you’ll see your code, plus some extra formatting.
This .ipynb file is what you’ll upload to BBLearn.
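The “extra formatting” is there because an .ipynb file is stored as JSON text. As a rough sketch, the snippet below parses a tiny, hand-written stand-in for such a file; the contents of notebook_text are hypothetical and far simpler than a real notebook.

```python
import json

# A tiny, hand-written stand-in for the text inside an .ipynb file (hypothetical).
notebook_text = '''
{
  "cells": [
    {"cell_type": "markdown", "source": ["Put your name here:"]},
    {"cell_type": "code", "source": ["print('Hello World')"]}
  ]
}
'''

# json.loads turns the text into ordinary Python dictionaries and lists.
notebook = json.loads(notebook_text)

# Each entry in "cells" records its flavor (markdown or code) and its text.
first_cell = notebook["cells"][0]
second_cell = notebook["cells"][1]
print(first_cell["cell_type"], first_cell["source"])
print(second_cell["cell_type"], second_cell["source"])
```

You will not need to read these files by hand in this course; the point is only that the download is plain text, which is why Notepad can open it.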