Lecture 05: A bit about Jupyter

Workflow and debugging
Programming is more than writing code
Design patterns
Debugging
VSCode
Modules
Git
Summary

A few pointers about working in JupyterLab

Basically, there are 2 kinds of cells: code (press y) and markdown (press m)
Run a cell: Ctrl + Enter
Delete a cell: x
Create a cell above current cell: a
Create a cell below current cell: b
Write inline latex in markdown cell, eg. an $\alpha$ : $\alpha$
Write separate equation line latex in markdown cell: use 2 dollar signs : $$\alpha = \beta$$
Create titles in markdown cell: # for top level section, ## for second level, etc.

See the icon on the left bar for many more commands and hotkeys.

Let's quickly look at some variables through the Variable Inspector, also found at the bottom of that menu.

[ ]

# An example for the Variable Inspector:
x = {"first":True, "second":False}
items = ["first", "second"]
y = [k in x and x[k] for k in items]

1. Workflow and debugging

[3]

#from IPython.display import YouTubeVideo
#YouTubeVideo('ABx55cEop-o', width=800, height=300)

You will learn how to structure and comment your code and document it for later use. You will learn how to debug your code using print, assert and try/except statements. You will learn how to write modules and run scripts from a terminal in VSCode and how to share your code with others through Git.

[14]

import math
import numpy as np
from IPython.display import Image

2. Programming is more than writing code

You seldom write some code, run it, get the right results, and then never use it again.

Firstly: You make errors (bugs) when you code.
Secondly: You need to share your code with colleagues and your future self.

Transparent macro- and microstructure is important:

For preventing errors.
For finding errors.
For making your code interpretable for others and your future-self.

No code is self-explanatory - even though it might seem so when you write it.

Cleaning, commenting and documenting code takes time, but is a crucial aspect of good programming.

In scientific programming, a transparent program structure and good documentation are also cornerstones in securing replicability.

2.1 Structure

Macro structure (wrt. folders and files):

One folder for each project with ALL required files.
End goal: 1 file (notebook) to run it all. Very important!
Module files (.py): Define functions, classes, etc. Perhaps different modules for different kind of tasks (solving, simulating, plotting).
Notebook files (.ipynb): Call functions, classes etc. and explain and present the results.
Larger projects: Sub-folders for data, figures, etc. (not relevant now).

Workflow:

Notebooks (.ipynb): Work with them in JupyterLab.
Modules (.py): Work with them preferably in VSCode (but JupyterLab will also do).

Microstructure:
How do you structure your code and format it like a pro?
The answer is the official PEP8 guideline.
It tells you how to format, comment and structure Python code, which makes your code readable to others.

Note: A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.

Recommendations:

Code layout:
- Indentation: Four spaces
- Line length: Max of 79 characters (wrap line + indent properly)
- Strings: Use single or double quote (be consistent)
- White space:
  - After comma: x = [1, 2, 3] (not required)
  - Around assignment: x = y
  - After colon: if x == 2: print(x)
  - Around operators with lowest priority in a calculation: c = (a+b) * (a-b) or z = x*x + y*y
Naming conventions: Short, but also precise
- Modules: Lower case with potential underscores (e.g. numecon or num_econ)
- Classes: Camel case (e.g. ConsumerClass)
- Variables, functions and methods: Lower case with potential underscores
Ordered section comments: Break your code into sections
- Give each section a name and a place in the ordering
- Level 1: a, b, c etc.
- Level 2: i, ii, iii, iv etc.
- Level 3: o, oo, ooo, oooo etc.
Line comments: Small additional hints
- Again, short and precise
- Avoid just explaining what the code does (must provide additional information)
Docstrings: Should be written for all functions, methods and classes (see how below).

More on names:

Name functions after their intended use. Verbs can be handy for such naming. (But is the function doing many things? Not a good sign, see design patterns below)
Help yourself in debugging and name variables in a searchable way (unless they are super local). You cannot search for the name i in a bunch of code files.
Normally avoid using any special characters.
Unused variables and non-public methods should start with a _

Two different perspectives on comments:

The comments explain to humans what the code does. (~ you'll write the code first, then comments)
The code makes the computer do what the comments say. (~ you'll write the comments first, then code)

Example of well formatted code:

[ ]

import math

# a. name for section
alpha = 1
beta = 2
x = [-3, -2, -1, 1, 2, 3]

# b. name for section
def my_function(x,alpha,beta):
    """ explain what the function does (docstring)
    
    Args:
    
        x (float): explanation
        alpha (float): explanation
        beta (float): explanation
        
    Returns:
    
        y (float): explanation
    
    """
    
    y = x**2 
    return y

# c. name for section
for i in range(len(x)):
    
    # i. name for sub-section
    y = my_function(x[i],alpha,beta)
    
    # ii. name for sub-section
    cond = y > 0 # non-positive not allowed due to log (line comment)
    
    # iii. name for sub-section
    if cond:
        print(math.log(y))

Try: Write my_function( and press Shift+Tab
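Outside JupyterLab, the same docstring can be read programmatically. A minimal sketch, where `my_power` is a hypothetical stand-in for any documented function and is not part of the lecture code:

```python
import inspect


def my_power(x, alpha=2.0):
    """Raise x to the power alpha.

    Args:

        x (float): base
        alpha (float): exponent

    Returns:

        y (float): x raised to the power alpha

    """
    return x**alpha


# the docstring and the signature are stored on the function object
doc = inspect.getdoc(my_power)           # docstring with indentation cleaned up
sig = str(inspect.signature(my_power))   # argument names and default values

print(sig)
print(doc)
```

The built-in `help(my_power)` prints the same information in one go.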

Recommendation: Try to think about which sections and sub-sections you need beforehand. You can even write the section comments before you write the code!

3. Design patterns

When thinking about how to organize your functions and objects, there are a few commandments that will serve you well:

DRY: Do not Repeat Yourself. A specific line of code must only appear once in your script. Get rid of code repetitions by looping or by creating functions for the lines that are being repeated. Code repetition induces bugs when you change code.
1 job: A function has 1 job only. That is, it should only try to accomplish one well-defined task. Sub-tasks within the main task are delegated to other functions.
No side effects: A corollary of 1 job. If a function returns $x$ , then it should not also produce lasting changes to $y$ if $y$ lives outside the local scope of the function.
1 screen fits all: A handy rule-of-thumb is that the body of code (not including doc strings) in a function should fit into your screen in a readable way. One should not be too religious about this principle, though. But if you have one long function after the other, you have probably violated the commandments above.
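The "no side effects" commandment is easy to break by accident when a function receives a list. A minimal sketch, where the function names and the tax example are hypothetical and chosen only for illustration:

```python
def add_tax_in_place(prices, rate):
    """Side effect: changes the list that the caller passed in."""
    for i in range(len(prices)):
        prices[i] = prices[i] * (1 + rate)
    return prices


def add_tax(prices, rate):
    """No side effect: builds and returns a new list."""
    return [p * (1 + rate) for p in prices]


# a. without side effects the input survives the call
prices = [100.0, 200.0]
taxed = add_tax(prices, 0.25)
print(prices)
print(taxed)

# b. with side effects the caller's list is overwritten
prices_b = [100.0, 200.0]
add_tax_in_place(prices_b, 0.25)
print(prices_b)
```

Running the in-place version twice applies the tax twice, which is exactly the kind of bug that is hard to spot in a long notebook.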

More on design patterns:

You can check out Google's Python style guide to catch a quick glimpse of how they organize their work.
One of the bibles on design patterns is edited by the famous Uncle Bob. The code examples are not based on Python, but the logic and insights still apply. It's actually fun and not-that-hard to read - and full of wisdom!

4. Debugging

Why is a programming error called a bug?

General advice:

Code is always partly a black box: Print and plot results to convince yourself (and others) that your results are sensible.
Errors are typically something very, very simple, so look for that first.
If Python raises an error, first try to locate the line where the error occurs.
Your code can often run, but give you unexpected behavior.
Include if, print and assert statements to catch errors.

Most of the time spent programming is debugging!! Even when the final code is simple, it can take a lot of trial-and-error to get there.

Assertions: Whenever you know something about your variables (e.g. that they should be positive), you should assert this. If the assertion does not hold, Python raises an error (an AssertionError).
Exceptions: When code fails, it generates ('raises') an exception.

[7]

x = -2
y = x**2
assert y > 0, f'x = {x}, y = {y}'

Task: Make the above assertion fail.
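A failing assertion raises an AssertionError that carries the message written after the comma. A minimal sketch of how to catch it and read the message, where `log_utility` is a hypothetical function that is not part of the lecture code:

```python
import math


def log_utility(c):
    """Return log(c); consumption c must be strictly positive."""
    assert c > 0, f'c = {c} is not strictly positive'
    return math.log(c)


try:
    log_utility(-1.0)
except AssertionError as err:
    message = str(err)  # the text after the comma in the assert statement
    print(f'assertion failed: {message}')
```

Note that assert statements are skipped when Python is run with the -O flag, so checks that must always run are better written with if and raise.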

4.1 Example

See official Introduction to errors and exceptions and RealPython on exceptions.

Consider the following code:

[9]

a = 0.8
xlist = [-1,2,3]

def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        y += z
    return y

myfun(xlist,a)

(3.340308817497993+0.5877852522924732j)

Problem: Our result is a complex number. We did not expect that. Why does this problem arise?

Find the error with print:

[10]

def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        print(f'x = {x} -> {z}') # temp
        y += z
    return y

myfun(xlist,a)

x = -1 -> (-0.8090169943749473+0.5877852522924732j)
x = 2 -> 1.7411011265922482
x = 3 -> 2.4082246852806923

(3.340308817497993+0.5877852522924732j)
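The print output points at x = -1. In Python 3, raising a negative number to a non-integer power with ** returns a complex number instead of raising an error. A small sketch of that behavior, using math.pow as a stricter alternative for comparison:

```python
import math

# a. the built-in power operator silently switches to complex numbers
z = (-1) ** 0.8
print(type(z).__name__)

# b. math.pow only works with real numbers and raises an error instead
try:
    math.pow(-1, 0.8)
except ValueError as err:
    pow_error = str(err)
    print(f'math.pow raised ValueError: {pow_error}')
```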

Solution with an assert:

[17]

def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        assert np.isreal(z), f'z is not real for x = {x}, but {z}'
        y += z
    return y

try:
    myfun(xlist,a)
except AssertionError:
    print('assertion failed')

assertion failed

[ ]

# Running the function to generate error message
myfun(xlist,a)

Solution with if and raise exception:

[21]

def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        if not np.isreal(z):
            print(f'z is not real for x = {x}, but {z}')
            raise ValueError('Negative input number') # an exception will be raised here
        y += z
    return y

try:
    myfun(xlist,a)
except ValueError:
    # we end up down here because the ValueError was raised
    print('assertion failed')

z is not real for x = -1, but (-0.8090169943749473+0.5877852522924732j)
assertion failed

[ ]

# Running the function to generate error message
myfun(xlist,a)

Note: You could also decide that the function should return e.g. $ -\infty $ when experiencing a complex number.

[ ]

def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        if not np.isreal(z):
            return -np.inf
        y += z
    return y

myfun(xlist,a)

4.2 Numpy warnings

Here we see an example of a warning issued by numpy. Notice the term RuntimeWarning. Unlike an error, a warning does not stop the code: the cell still runs and returns a result.

Run time is the moment a function is executed by the computer - so a run time error is a kind of error that cannot be detected before the program actually runs.
In the present case: Jupyter/Python simply cannot detect that the -1 in xlist will conflict with the log function before the cell is run.
However, you may see that the VS Code Intellisense informs you that perhaps some syntax is wrong or that a variable has not been defined before it is put in use.

[23]

import numpy as np
xlist = [-1,2,3]
def f(xlist):
    y = np.empty(len(xlist))
    for i,x in enumerate(xlist):
        y[i] = np.log(x)
    return y

f(xlist)

/var/folders/vb/59m5ytss5h77h6fgdj5bl_7h0000gp/T/ipykernel_53240/2369909365.py:6: RuntimeWarning: invalid value encountered in log
  y[i] = np.log(x)

array([       nan, 0.69314718, 1.09861229])

You can ignore all warnings:

[ ]

def f(xlist):
    y = np.empty(len(xlist))
    for i,x in enumerate(xlist):
        with np.errstate(all='ignore'):
            y[i] = np.log(x)
    return y

f(xlist)

Better: Decide what the code should do.

[24]

def f(xlist):
    y = np.empty(len(xlist))
    for i,x in enumerate(xlist):
        if x <= 0:
            y[i] = -np.inf
        else:
            y[i] = np.log(x)
    return y

f(xlist)

array([      -inf, 0.69314718, 1.09861229])
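The opposite of ignoring warnings is also possible: numpy can be told to raise an exception instead, so that the problem cannot go unnoticed. A minimal sketch using np.errstate, where the function name `log_all` is hypothetical:

```python
import numpy as np


def log_all(xlist):
    """Return the log of each element; stop at the first invalid input."""
    y = np.empty(len(xlist))
    for i, x in enumerate(xlist):
        # inside this block numpy raises FloatingPointError instead of warning
        with np.errstate(invalid='raise', divide='raise'):
            y[i] = np.log(x)
    return y


try:
    log_all([-1, 2, 3])
except FloatingPointError as err:
    caught = str(err)
    print(f'caught: {caught}')
```

Here invalid covers the log of a negative number and divide covers the log of zero.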

4.3 Scope bugs

Global variables are dangerous:

[25]

# a. define a function to multiply a variable by 5
a = 5
def f(x):
    return a*x

# many lines of code
# many lines of code
# many lines of code

# z. setup the input and call f
y = np.array([3,3])
a = np.mean(y)
b = np.mean(f(y))

print(b)

9.0

Question: What is the error?

Conclusion: Never use global variables, they can give poisonous side effects. Use a positional or a keyword argument instead.
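In the example, the global a is overwritten with np.mean(y) = 3.0 before f is called, so f multiplies by 3 instead of 5. A minimal sketch of the fix with a keyword argument (the default value 5 is taken from the example above):

```python
import numpy as np


def f(x, a=5):
    """Multiply x by a; a is an argument, not a global variable."""
    return a * x


y = np.array([3, 3])
a = np.mean(y)      # this global a no longer affects f
b = np.mean(f(y))

print(b)
```

Inside f, the name a now refers to the argument, so the result is 15.0 no matter what the global a contains.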

Useful tool I: The variable inspector.

Install: See here
Open it: Right-click and choose "Open Variable Inspector"

Useful tool II: The console.

Install: Done automatically
Open it: Right-click and choose "New Console for Notebook"

4.4 Index bugs

[26]

# a. setup
N = 10
x = np.linspace(1.3,8.2,N)
y = 9.2

# b. count all entries in x below y
i = 0
try:
    while x[i] < y:
        i += 1
except:
    print('error found')

error found

Task: Solve the problem.
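The error arises because all entries in x are below y, so the loop keeps going until i = N, and x[N] does not exist. One possible solution, sketched in two variants (there are other valid ways to solve the task):

```python
import numpy as np

# a. setup
N = 10
x = np.linspace(1.3, 8.2, N)
y = 9.2

# b. loop version: check the index before using it
i = 0
while i < N and x[i] < y:
    i += 1

# c. vectorized version: count the True entries of the comparison
count = int(np.sum(x < y))

print(i, count)
```

The loop version stops at the first entry that is not below y, so the two variants agree here because x is increasing.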

5. VSCode

Central benefits of VSCode:

Good editor (easy to move across and within files)
Linting (find errors before you run the code)
Run scripts
Interactive sessions
Integrated git (to share your code online) (see below)
Debugging (not today)

Example: We go through this guide together.

6. Modules

Long notebooks can be very hard to read. Code is structured better in modules saved in .py files.

Open VSCode
Locate the folder with your notebook
Create mymodule.py
In the notebook: import mymodule
All functions in mymodule.py are now available in the notebook with the prefix mymodule.

Important: If you change the code of your own module, e.g. mymodule, after it has already been imported into Jupyter, then simply running the import mymodule statement again will not import your changes. Python sees that mymodule is already imported, and thus does nothing.

Solution: Use the %load_ext autoreload magic with %autoreload 2. Then your modules are automatically reloaded each time you run a cell. Without it, the module is never reloaded.

[ ]

%load_ext autoreload
%autoreload 2

[30]

import mymodule as mm

[31]

try:
    mm.myfun(2)
except:
    print('error found')

error found

Another solution is to use importlib:

[ ]

import importlib
import mymodule
# something happens..
importlib.reload(mymodule)
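The difference between importing again and reloading can be checked in a small experiment. The sketch below writes a throwaway module to a temporary folder; the module name `demo_reload_module` and its contents are hypothetical and only serve the demonstration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # keep the demo free of cached bytecode files

with tempfile.TemporaryDirectory() as folder:

    # a. create a small module file and import it
    module_path = Path(folder) / 'demo_reload_module.py'
    module_path.write_text('def myfun(x):\n    return x + 1\n')
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    import demo_reload_module
    before = demo_reload_module.myfun(2)

    # b. change the file and import again: the old code is still used
    module_path.write_text('def myfun(x):\n    return x * 100\n')
    import demo_reload_module
    after_import = demo_reload_module.myfun(2)

    # c. reload explicitly: the change is picked up
    importlib.reload(demo_reload_module)
    after_reload = demo_reload_module.myfun(2)

    sys.path.remove(folder)

sys.dont_write_bytecode = old_flag
print(before, after_import, after_reload)
```

Note that importlib.reload only reloads the module you pass to it, not the modules that it imports in turn.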

Extra: Locating modules

You may wonder: how does Python actually find modules on my computer?
And: what if I made 2 modules with the same name, or created a module with a name that was already present in Anaconda?

Simple answer: when loading modules, Python browses through a predetermined list of folders on your computer, given by sys.path.
The first time it encounters a module with the name of the module to be loaded, it brings that one up. If you have 2 modules with the same name, the one in the folder that comes first on the list is therefore chosen.

You can inspect the folders where Python searches for modules on your computer by running the cell below. Note that the cell sorts the folders alphabetically for readability; the actual search order is the order of sys.path itself. In the sorted output, your current working directory is second on the list.

[33]

import sys
sp = sorted(sys.path)
for p in sp:
    print(p)


/Users/jzk870/.ipython
/Users/jzk870/Dropbox/Work/Undervisning/NumericalMethods/Course2022/lectures-2022/05
/Users/jzk870/opt/anaconda3/lib/python3.9
/Users/jzk870/opt/anaconda3/lib/python3.9/lib-dynload
/Users/jzk870/opt/anaconda3/lib/python3.9/site-packages
/Users/jzk870/opt/anaconda3/lib/python3.9/site-packages/IPython/extensions
/Users/jzk870/opt/anaconda3/lib/python3.9/site-packages/aeosa
/Users/jzk870/opt/anaconda3/lib/python3.9/site-packages/locket-0.2.1-py3.9.egg
/Users/jzk870/opt/anaconda3/lib/python39.zip
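To check which file a given module name actually resolves to, you can ask the import system directly. A minimal sketch using importlib.util.find_spec, with the standard library module json as an arbitrary example and a made-up name for a module that does not exist:

```python
import importlib.util

# a. a module that exists: origin is the file that would be loaded
spec = importlib.util.find_spec('json')
print(spec.origin)

# b. a module that cannot be found on sys.path: the result is None
missing = importlib.util.find_spec('a_module_name_that_does_not_exist')
print(missing)
```

For a module that is already imported, the attribute `__file__` (e.g. `json.__file__`) gives the same path.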

7. Git

The purpose of git is to allow you to easily share your code with collaborators and track the changes each of you make.

We go through this guide together.

Note: You will be given repositories named github.com/NumEconCopenhagen/projects-2022-YOURGROUPNAME

Essential Git terms:

Local: your computer.
Remotes: the code on GitHub and on other computers.
Branch: a branch of code is a separate track or copy of the code base on which you can develop new stuff. There is normally a structure of a main branch that holds the current working version of code and then several testing branches where new stuff is developed. After development, those branches are merged into the main branch.
.gitignore: a file that specifies which files and file types git should not track, so that they are left out when changes are sent back and forth.
.git: a hidden folder in all git repositories. This folder contains the files (objects, refs and HEAD) that hold the whole history of changes to code so far. Delete .git, and your code folder is no longer a working repository. Now it's just regular code.

Essential Git commands:

Fetch: the process of downloading information about any changes to code made outside the local repository on your computer. Does not happen automatically! Fetching does not change your own files or branches; it only updates your local record of what has happened on the remote repo(s).
Stage: before you can commit your own changes to code, you need to decide which changes specifically to include. Mostly, you will just stage all, that is, include all changes you have made.
Commit: the process of recording your staged changes as a new snapshot in the history of your local repository. A commit does not reach the remote repo until you push.
Merge: when you let changes to code from remotes (or from other branches) get woven into your own code.
Push: after committing, you ask the remote to take the commits you made. The remote will not automatically accept the push if you do not have write access to the remote repo.
Pull: fetching and merging with the remote.
Sync: pulling and then pushing to the remote. It's a function special to VS Code.

See the file helpful_git_commands.txt for code to do the above commands.

Recommendation for Inaugural Project: if you are new to git, just work as you normally would and then commit and push the final product to your repo when you are done.

8. Summary

This lecture: We have discussed

Structuring and commenting on code
Debugging (try-except, assert, warnings)
Writing and running Python in VSCode
Git (version control)

Note: The deadline for handing in the inaugural project is the 27th of March.