Appendix Lecture A: Other programming languages

Today we talk about various programming languages. If you have learned one programming language, it is easy to learn the next.

Different kinds of programming languages:

  1. Low-level, compiled (C/C++, Fortran): You are in full control, but need to specify types, allocate memory and clean up after yourself
  2. High-level, interpreted (MATLAB, Python, Julia, R): Types are inferred, memory is allocated automatically, and there is automatic garbage collection
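Illustration: A minimal sketch of the difference, using Python's ctypes module to mimic fixed-width C types (the variable names are only for illustration). A Python integer grows as needed, while a 32-bit C integer has a fixed size and silently wraps around.

```python
import ctypes as ct

# high-level: the type is inferred and the integer grows as needed
a = 2**31 - 1          # largest value of a signed 32-bit integer
b = a + 1              # still an ordinary Python int
print(type(b).__name__, b)

# low-level: the type (and thus the size in memory) is chosen by you
c = ct.c_int32(a)      # explicit 32-bit signed integer, as in C
d = ct.c_int32(a + 1)  # ctypes does no overflow checking, so the value wraps around
print(c.value, d.value)
```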

Others:

  1. Wolfram Mathematica: A mathematical programming language. The inspiration for sympy.
  2. STATA: For many economists still the preferred statistical program, because it is so good at panel data and provides standard errors for many of the commonly used estimators.

Note: Data cleaning and structuring is increasingly done in R or Python, and STATA is then only used for estimation.

Comparison: We solve the same Simulated Minimum Distance (SMD) problem in MATLAB, Python and Julia.

Observations:

  1. Any language can typically be used to solve a task. But some have a comparative advantage.
  2. If the syntax of a language irritates you, you will write worse code.
  3. A community in your field around a language is important.
  4. No language is the best at everything.

Comparisons:

1. High-level programming languages

1.1 MATLAB

The godfather of high-level scientific programming. The main source of inspiration for numpy and Julia.

The good things:

  1. Full scientific programming language
  2. Especially good at optimization and (sparse) matrix algebra
  3. Well-developed interface (IDE) and debugger
  4. Integration with C++ through mex functions

The bad things:

  1. Not open source and costly outside of academia
  2. Not always easy to parallelize natively
  3. Not a complete programming language
  4. Not in JupyterLab

Download: Available in the Absalon software library.

Example: SMD_MATLAB.mlx

More:

  1. Mini-course in MATLAB: See the folder \MATLAB_course
  2. NumPy for Matlab users

1.2 Python

The Swiss Army knife of programming languages.

The good things:

  1. All-round programming language
  2. Full scientific programming (numpy+scipy)
  3. Good at statistics (in particular data handling and machine learning)
  4. Just-in-time (jit) compilation available (numba)
  5. Easy to integrate with C++ (ctypes, cffi)

The bad things:

  1. Messy package system at times
  2. Sometimes hard to jit-compile and parallelize

Example: SMD_Python.ipynb

1.3 Julia

The newcomer of scientific programming languages.

The good things:

  1. All-round programming language
  2. Automatic just-in-time compilation with native parallelization - almost as fast as C++
  3. Focused on scientific computing and high performance computing

The bad things:

  1. Young language, with smallish, but growing, community
  2. Sometimes hard to ensure that the just-in-time compilation works efficiently

Example: SMD_Julia.ipynb

Julia community:

For introductory material on Julia for economists, see https://lectures.quantecon.org/jl/.

1.4 R

The statistician's favorite programming language.

The good things:

  1. Great package system
  2. The best statistical packages
  3. Well-developed interface (IDE) (RStudio)
  4. Easy to integrate with C++ (Rcpp)

The bad things:

  1. Not designed to be a scientific programming language
  2. Not a complete programming language

Download: https://www.rstudio.com/

2. Low-level programming languages

2.1 Fortran

What I have nightmares about...

In the old days, it was a bit faster than C++. This is no longer true.

2.2 C/C++

The fastest you can get. A very powerful tool, but hard to learn, and impossible to master.

[1]
import numpy as np
import ctypes as ct
import callcpp # local library
[2]
import psutil
CPUs = psutil.cpu_count()
CPUs_list = set(np.sort([1,2,4,*np.arange(8,CPUs+1,4)])) 
print(f'this computer has {CPUs} CPUs')
this computer has 8 CPUs

2.3 Calling C++ from Python

Note I: This section can only be run on a Windows computer with the free Microsoft Visual Studio 2017 Community Edition (download here) installed.

Note II: Learning C++ is somewhat hard. These tutorials are helpful.

Python contains multiple ways of calling functions written in C++. Here I use ctypes.

C++ file: example.cpp in the current folder.

Step 1: Compile C++ to a .dll file

[3]
callcpp.compile_cpp('example') # compiles example.cpp
cpp files compiled

Details: Under the hood, this writes a file called compile.bat and runs it in a terminal.
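Illustration: A minimal sketch of what such a helper could look like. This is an assumption and not the actual callcpp implementation: the function name write_compile_bat, the Visual Studio path and the compiler flags are illustrative and may need to be adjusted to the local installation.

```python
import os
import tempfile

# assumed default location of the Visual Studio 2017 Community setup script
VCVARS = (r'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community'
          r'\VC\Auxiliary\Build\vcvarsall.bat')

def write_compile_bat(filename, folder, vcvars=VCVARS):
    """ write a batch file that compiles filename.cpp to a .dll (hypothetical helper) """

    lines = [
        f'call "{vcvars}" x64',                      # set up the 64-bit compiler environment
        f'cl /LD /EHsc /O2 /openmp {filename}.cpp',  # /LD: build a .dll, /openmp: enable OpenMP
    ]
    path = os.path.join(folder, 'compile.bat')
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    return path

# write the file to a temporary folder and show its content (nothing is compiled here)
with tempfile.TemporaryDirectory() as folder:
    path = write_compile_bat('example', folder)
    with open(path) as f:
        content = f.read()
print(content)
```

On Windows, the batch file could then be executed from Python, e.g. with subprocess.run('compile.bat', shell=True, cwd=folder).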

Step 2: Link to .dll file

[4]
# funcs (list): list of functions with elements (functionname,[argtype1,argtype2,etc.])
funcs = [('myfun_cpp',[ct.POINTER(ct.c_double),ct.POINTER(ct.c_double),ct.POINTER(ct.c_double),
                       ct.c_long,ct.c_long,ct.c_long])]

# ct.POINTER(ct.c_double): pointer to a double
# ct.c_long: integer

cppfile = callcpp.link_cpp('example',funcs)
cpp files loaded
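Illustration: The argument types matter because ctypes uses them to convert and check the Python objects before the call. The sketch below shows this without a compiled .dll: a Python function wrapped with ct.CFUNCTYPE stands in for a C++ function. The stand-in add_cpp and its signature are made up for illustration and are not part of example.cpp.

```python
import ctypes as ct

# C-style signature: void add_cpp(double *x, double *y, long n)
proto = ct.CFUNCTYPE(None, ct.POINTER(ct.c_double), ct.POINTER(ct.c_double), ct.c_long)

def _add(p_x, p_y, n):
    # p_x and p_y arrive as pointers, so the result is written directly into the memory of y
    for i in range(n):
        p_y[i] = p_x[i] + 1.0

add_cpp = proto(_add) # stand-in for a function loaded from a .dll

x = (ct.c_double*3)(1.0, 2.0, 3.0) # C array of 3 doubles
y = (ct.c_double*3)()              # output array, initialized to zeros

add_cpp(x, y, 3) # the arrays are passed as pointers to their first element
print(list(y))

try:
    add_cpp(x, y, 'three') # wrong type for the c_long argument
except ct.ArgumentError as e:
    print('rejected:', e)
```

Because the output is passed as a pointer, the function returns nothing and the caller reads the result from y afterwards. This is the same pattern as for the output array in Step 3.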

Step 3: Call function

[5]
def myfun_numpy_vec(x1,x2):
    y = np.empty((1,x1.size))
    I = x1 < 0.5
    y[I] = np.sum(np.exp(x2*x1[I]),axis=0)
    y[~I] = np.sum(np.log(x2*x1[~I]),axis=0)
    return y

# setup
x1 = np.random.uniform(size=10**6)
x2 = np.random.uniform(size=int(100*CPUs/8)) # adjust the size of the problem (np.int is removed in newer NumPy versions)
x1_np = x1.reshape((1,x1.size))
x2_np = x2.reshape((x2.size,1))

# timing
%timeit myfun_numpy_vec(x1_np,x2_np)
2.51 s ± 842 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
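For reference: Written as explicit loops, the vectorized function above does the following. example.cpp is not shown here, so it is only an assumption that the C++ function uses a loop structure like this one, with the outer loop split across threads. The name myfun_loops is made up for illustration.

```python
import math

def myfun_loops(x1, x2):
    """ loop version of myfun_numpy_vec for lists of floats (illustrative sketch) """
    y = [0.0]*len(x1)
    for i in range(len(x1)):      # outer loop: each i can be computed independently
        for j in range(len(x2)):  # inner loop: sum over the elements of x2
            if x1[i] < 0.5:
                y[i] += math.exp(x2[j]*x1[i])
            else:
                y[i] += math.log(x2[j]*x1[i])
    return y

# small example with one element in each branch
y = myfun_loops([0.25, 0.75], [0.5, 1.0])
print(y)
```

In pure Python these loops are very slow, which is why the computation is either vectorized with numpy or moved to a compiled language.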
[6]
def myfun_cpp(x1,x2,threads):
    y = np.empty(x1.size)
    p_x1 = np.ctypeslib.as_ctypes(x1) # pointer to x1
    p_x2 = np.ctypeslib.as_ctypes(x2) # pointer to x2
    p_y = np.ctypeslib.as_ctypes(y) # pointer to y
    cppfile.myfun_cpp(p_x1,p_x2,p_y,x1.size,x2.size,threads)
    return y

assert np.allclose(myfun_numpy_vec(x1_np,x2_np),myfun_cpp(x1,x2,1))
for threads in CPUs_list:
    print(f'threads = {threads}')
    %timeit myfun_cpp(x1,x2,threads)
    print('')
threads = 8
271 ms ± 7.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

threads = 1
1.05 s ± 60.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

threads = 2
668 ms ± 52.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

threads = 4
388 ms ± 9.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
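Illustration: The timings above imply the following speed-ups relative to one thread. A small sketch of the arithmetic, with the mean timings copied by hand from the output above (they will differ across computers).

```python
# mean run time in seconds by number of threads (copied from the output above)
timings = {1: 1.05, 2: 0.668, 4: 0.388, 8: 0.271}

speedup = {}
efficiency = {}
for threads, secs in sorted(timings.items()):
    speedup[threads] = timings[1]/secs             # relative to one thread
    efficiency[threads] = speedup[threads]/threads # 1.0 would be perfect scaling
    print(f'threads = {threads}: speed-up = {speedup[threads]:.2f}, efficiency = {efficiency[threads]:.2f}')
```

With 8 threads the speed-up is just below 4, so doubling the number of threads gives clearly less than a doubling of the speed in this example.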

Observation: Compare with results in lecture 12. Numba is roughly as fast as C++ here (I get different results across different computers). In larger problems, C++ is usually faster, and while Numba is limited in terms of which Python and Numpy features it supports, everything can be coded in C++.

Step 4: Delink .dll file

[7]
callcpp.delink_cpp(cppfile,'example')
cpp files delinked

More information: See the folder "Numba and C++" in the ConsumptionSavingNotebooks repository. It includes an explanation of how to use the NLopt optimizers in C++.