Walkthrough#

Introduction#

After completing this learning activity, you will be able to:

Launch Google Colab.
Create text using Markdown.
Execute code cells.
Check your answers with the Otter grading system.
Submit assignments to BBLearn.

Why Python#

Throughout this course we will cover many biological questions. We will frame each of them as a statistical visualization that captures the biological question, then rigorously validate it through statistical methods.

These tasks can be accomplished by hand, with spreadsheet software like Excel, or with statistical software like PRISM. However, as biological datasets get larger and the questions get more complex, these tools become unwieldy. Python, on the other hand, is a general-purpose programming language that can be used to build almost anything.

The recent explosion of research using Python in the biological and data science fields has spawned dozens of freely available software packages that let you take full advantage of the newest methods. Learning how to “program” will allow you to stay abreast of emerging techniques and technologies.

Why Google Colab#

I’ve been teaching “coding” in some form or another since 2007. There’s one critical hurdle everyone faces (including young me); I call it the Step 0 Hurdle.

How do you get Python installed?

Before you can even start learning, you need to have it. This used to be an arduous process that could take a skilled system administrator most of a day. There has been excellent progress on fixing these issues [footnote-link-to-anaconda-setup-instructions]. Then there is the trouble of installing all of the tools we’ll be using in the course. This hurdle is not insurmountable; by week 4 of this class all of you could clear it. But it poses an incredible initial difficulty when you are not familiar with the concepts or decisions required by the installation process.

Google Colab, on the other hand, is a free, browser-based interface built on the Jupyter Notebook [footnote-link-to-jupyterlab-anaconda-instructions]. The system comes “batteries included” with a large collection of tools for data science, visualization, statistics, and biology already installed. By using this system, we will all have the same environment, thereby bypassing these initial stumbling blocks.

Coding expectations#

This is not a programming course; this is a statistics course. We will not cover topics like conditionals, loops, classes, or any complex algorithms or data structures. Anything that requires these concepts will already be included in the skeleton code of each assignment.

Instead, we will use a small collection of tools that abstract away the complexity of the analysis and provide a simple interface. Coming into the course I do not expect any previous coding experience. This course will teach you all of the Python syntax that you need and provide multiple examples.

I’ve taught hundreds of people basic data analysis. I can teach you too.

Quick introduction on cells and blocks#

A notebook is made up of a series of cells.

We’ll use two basic flavors:

Markdown
Code

Cell types can be changed using the dropdown menu at the top.

Markdown#

Statistical analysis isn’t all about math, code, and figures. Text is just as important. Jupyter/Colab use a plain-text syntax called Markdown.

Here are some brief examples:

**Bold text** –> Bold text

*Italicized text* –> Italicized text

***Bold & italicized text*** –> Bold & Italicized text

```coding block``` –> coding block

[A hyperlink](https://www.google.com/) –> A hyperlink

Some of these modifications can be found at the top left-hand corner of the text block. A complete list of Markdown syntax can be found here.

Markdown can also be used to create tables and lists. For example, typing the text below:

|Some|Table|
|---|---|
|X|Y|
|Z|A|

creates the following table:

Some	Table
X	Y
Z	A

Typing out this text:

1. A
2. Numbered
3. List
  * Bulleted
  * Sublist

creates the following list:

1. A
2. Numbered
3. List
   * Bulleted
   * Sublist

A bulleted list can be created by typing in * instead of numbers.

Edit the next cell (by double clicking) to include your name.

Try me#

Put your name here:

Click on the next cell and hit shift+return to execute it.

print('Hello World')

Hello World

A notebook’s true power comes from the fact that it is an easy interface for interacting with a Python kernel. We do this through code cells.

The Inferential Thinking Textbook summarized programming perfectly.

In data science, the purpose of writing a program is to instruct a computer to carry out the steps of an analysis. Computers cannot study the world on their own. People must describe precisely what steps the computer should take in order to collect and analyze data, and those steps are expressed through programs.

https://inferentialthinking.com/chapters/03/programming-in-python.html

Code cells allow us to provide these programs to the computer.

The print expression you executed earlier printed the phrase “Hello World” to the output, which Jupyter/Colab then put underneath the cell. Also, notice that the number next to the cell changed. This is the execution count, and it tells you how many times ANY cell has been run. Now try executing the next cell.

5+4

Notice that the result of the last line is displayed below the cell. This is a particular feature of Jupyter Notebooks.

If you need to see multiple results, use print:

print('first', 5+4, 'second', 12+1)

first 9 second 13

We can also use things called variables to hold numbers (and other things) for use later. Try executing the next one.

height = 1.9 # meters
weight = 86 # kg

bmi = weight/height**2

Notice how it didn’t output anything? That is because the last line is an assignment: instead of displaying the value, it saved it into the variable bmi. Type bmi into the next cell to see the answer.

print('BMI:', bmi)

BMI: 23.822714681440445

The nice thing about notebooks is their interactive nature. Go back and change the height and weight variables, execute both cells, and see how the result changes.
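As a small sketch of that kind of experiment, the cell below repeats the calculation with different values. The height and weight here are made-up example numbers, and the use of the built-in round function to shorten the result is an addition that the cells above do not use.

```python
# Hypothetical example values; substitute your own measurements.
height = 1.7  # meters
weight = 65   # kg

# Same formula as before: weight divided by height squared.
bmi = weight / height**2

# round(value, 1) keeps one decimal place, which is easier to read.
print('BMI:', round(bmi, 1))
```

With these example values the cell prints BMI: 22.5.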

Otter Grader#

To rigorously grade the assignments, this course uses the Otter Grader, a tool developed by educators at UC Berkeley. It works by providing a tool, the grader, which I preload with checks for each question. This will help ensure that you are moving through the code correctly. I’ll also use it to give suggestions for common mistakes.

First, we need to do some initializing. These cells would normally be the first in the notebook; I’ve moved them down here to better describe them.

This first cell installs the otter-grader tool if it is not already available, and then unzips the file that you downloaded from BBLearn. You only need to run it once, but it should handle being run twice gracefully.

# Setting up the Colab environment. DO NOT EDIT!
import os
import warnings
warnings.filterwarnings("ignore")

try:
    import otter

except ImportError:
    ! pip install -q otter-grader==4.0.0
    import otter

if not os.path.exists('walkthrough-tests'):
    zip_files = [f for f in os.listdir() if f.endswith('.zip')]
    assert len(zip_files)>0, 'Could not find any zip files!'
    assert len(zip_files)==1, 'Found multiple zip files!'
    ! unzip {zip_files[0]}

grader = otter.Notebook(colab=True,
                        tests_dir = 'walkthrough-tests')

Now that we have the grader loaded, let’s take it for a spin with a simple problem.

Calculate an aerobic target heart rate#

Consider a generally healthy 34-year-old woman who is looking to exercise in her ideal target heart rate zone. Per the Mayo Clinic:

How to determine your target heart rate zone

Use an online calculator to determine your desired target heart rate zone. Or, here’s a simple way to do the math yourself. If you’re aiming for a target heart rate in the vigorous range of 70% to 85%, you can use the heart rate reserve (HRR) method to calculate it like this:

Subtract your age from 220 to get your maximum heart rate.
Calculate your resting heart rate by counting how many times your heart beats per minute when you are at rest, such as first thing in the morning. It’s usually somewhere between 60 and 100 beats per minute for the average adult.
Calculate your heart rate reserve (HRR) by subtracting your resting heart rate from your maximum heart rate.
Multiply your HRR by 0.7 (70%). Add your resting heart rate to this number.
Multiply your HRR by 0.85 (85%). Add your resting heart rate to this number.

These two numbers are your average target heart rate zone for vigorous exercise intensity when using the HRR to calculate your heart rate. Your heart rate during vigorous exercise should generally be between these two numbers.

Our subject estimates her resting heart rate at 60 bpm.

Q1: Using the information above, calculate the subject’s heart rate reserve.#

You can see that your solution should finish the expression:

heart_rate_reserve = ...

You can take as many lines as you need to solve the problem (and in the future, many lines will be needed). That being said, the answer MUST be in the variable heart_rate_reserve for it to count. The checks will help ensure not only that you’ve saved it in the right place, but also that it is the right type.

resting_heart_rate = 60
age = 34

heart_rate_reserve = 220 - age - resting_heart_rate # SOLUTION
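To give a rough idea of what "saved right" and "right type" mean in practice, here is a minimal sketch written as plain assert statements. These are hypothetical checks for illustration only; the actual Otter tests shipped with this notebook may look quite different.

```python
# Values taken from the problem statement above.
resting_heart_rate = 60
age = 34
heart_rate_reserve = 220 - age - resting_heart_rate

# Hypothetical checks, written as plain assert statements.
# The real Otter tests for this notebook may differ.
assert isinstance(heart_rate_reserve, (int, float)), 'heart_rate_reserve should be a number'
assert 0 < heart_rate_reserve < 220, 'heart_rate_reserve should be between 0 and 220'

print('Heart rate reserve:', heart_rate_reserve)
```

If a check fails, Python stops and shows the message after the comma. If every check passes, the cell prints Heart rate reserve: 126.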

Q2: Using the information above, calculate the upper limit of the subject’s target heart rate zone.#

upper_target_zone = heart_rate_reserve*0.85 + resting_heart_rate # SOLUTION
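For completeness, the whole Mayo Clinic recipe can be sketched in a single cell, including the 70% lower limit that the questions above do not ask for. The names max_heart_rate and lower_target_zone are introduced here for illustration and are not part of the graded assignment.

```python
# Values taken from the problem statement above.
resting_heart_rate = 60  # beats per minute
age = 34

# Step 1: maximum heart rate is 220 minus age.
max_heart_rate = 220 - age

# Step 2: heart rate reserve is maximum minus resting heart rate.
heart_rate_reserve = max_heart_rate - resting_heart_rate

# Steps 3 and 4: scale the reserve by 70% and 85%, then add the resting rate back.
lower_target_zone = heart_rate_reserve * 0.70 + resting_heart_rate
upper_target_zone = heart_rate_reserve * 0.85 + resting_heart_rate

print('Target zone:', round(lower_target_zone, 1), 'to', round(upper_target_zone, 1), 'bpm')
```

For this subject the zone works out to roughly 148.2 to 167.1 beats per minute.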

Submissions#

Lastly, how do you submit your assignment?

At the very end of every assignment, there will be a check_all function. Use Restart & Run All to validate that your code still works (we’ll talk more about this in the future). Then use File -> Download -> Download .ipynb. That file is your notebook. You can open it in a plain-text editor like Notepad (not Word), and you’ll see your code plus some extra formatting.

This .ipynb file is what you’ll upload to BBLearn.
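If you are curious about that "extra formatting": an .ipynb file is plain text in the JSON format. The sketch below builds a tiny, hand-written notebook with a single code cell and prints the text you would see in Notepad. It is illustrative only. A real notebook saved by Colab carries more metadata, and this example uses dictionaries and lists, which you will not need to write yourself in this course.

```python
import json

# A hand-built, minimal notebook with one code cell (illustrative only;
# real notebooks saved by Colab contain extra metadata).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello World')"],
        }
    ],
}

# This text is roughly what appears when an .ipynb file is opened in Notepad.
notebook_text = json.dumps(notebook, indent=1)
print(notebook_text)

# Reading the text back recovers the code stored in the first cell.
loaded = json.loads(notebook_text)
print(''.join(loaded["cells"][0]["source"]))
```

The last line prints the code from the cell, print('Hello World'), showing that your code is stored inside the file alongside the formatting.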