Appendix Lecture A: Other programming languages
Today we talk about various programming languages: If you have learned one programming language, it is easy to learn the next.
Different kinds of programming languages:
- Low-level, compiled (C/C++, Fortran): You are in full control, but need to specify types, allocate memory and clean up after yourself
- High-level, interpreted (MATLAB, Python, Julia, R): Types are inferred, memory is allocated automatically, and there is automatic garbage collection
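As a small illustration of the high-level side, the Python sketch below rebinds a name to values of different types without any declarations, and shows an object being reclaimed once nothing refers to it. The class name `Workspace` is hypothetical. Immediate reclamation relies on reference counting in CPython, so `gc.collect()` is called to be safe on other interpreters.

```python
import gc
import weakref

# No type declarations: the type belongs to the value, not the name
x = 3                              # an int
x_type_before = type(x).__name__
x = 3 / 2                          # the same name now refers to a float
x_type_after = type(x).__name__

class Workspace:
    """Hypothetical container standing in for a large array."""
    def __init__(self, n):
        self.data = [0.0] * n      # memory is allocated for us

w = Workspace(10**5)
ref = weakref.ref(w)               # weak reference: does not keep the object alive
del w                              # drop the only strong reference
gc.collect()                       # force a collection
is_freed = ref() is None           # True once the object has been reclaimed

print(x_type_before, x_type_after, is_freed)
```

In a low-level language the type of `x` would have to be declared, and the memory behind `Workspace` would have to be released explicitly.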
Others:
- Wolfram Mathematica: A mathematical programming language. The inspiration for sympy.
- STATA: For many economists still the preferred statistical program, because it is so good at panel data and provides standard errors for a lot of the commonly used estimators.
Note: Data cleaning and structuring is increasingly done in R or Python, and STATA is then only used for estimation.
Comparison: We solve the same Simulated Minimum Distance (SMD) problem in MATLAB, Python and Julia.
Observations:
- Any language can typically be used to solve a task. But some have a comparative advantage.
- If the syntax of a language irritates you, you will write worse code.
- A community in your field around a language is important.
- No language is the best at everything.
Comparisons:
- Coleman et al. (2020): MATLAB, Python and Julia: What to choose in economics?
- Fernández-Villaverde and Valencia (2019): A Practical Guide to Parallelization in Economics
1.1 MATLAB
The godfather of high-level scientific programming. The main source of inspiration for numpy and Julia.
The good things:
- Full scientific programming language
- Especially good at optimization and (sparse) matrix algebra
- Well-developed interface (IDE) and debugger
- Integration with C++ through mex functions
The bad things:
- Not open source and costly outside of academia
- Not always easy to parallelize natively
- Not a complete programming language
- Not in JupyterLab
Download: Available in the Absalon software library.
Example: SMD_MATLAB.mlx
More:
- Mini-course in MATLAB: See the folder \MATLAB_course
- NumPy for Matlab users
1.2 Python
The Swiss Army knife of programming languages.
The good things:
- All-round programming language
- Full scientific programming (numpy+scipy)
- Good at statistics (in particular data handling and machine learning)
- Just-in-time (jit) compilation available (numba)
- Easy to integrate with C++ (ctypes, cffi)
The bad things:
- Messy package system at times
- Sometimes hard to jit-compile and parallelize
Example: SMD_Python.ipynb
1.3 Julia
The newcomer of scientific programming languages.
The good things:
- All-round programming language
- Automatic just-in-time compilation with native parallelization, almost as fast as C++
- Focused on scientific computing and high performance computing
The bad things:
- Young language, with smallish, but growing, community
- Sometimes hard to ensure that the just-in-time compilation works efficiently
Example: SMD_Julia.ipynb
For introductory material on Julia for economists, see https://lectures.quantecon.org/jl/.
1.4 R
The statistician's favorite programming language.
The good things:
- Great package system
- The best statistical packages
- Well-developed interface (IDE) (RStudio)
- Easy to integrate with C++ (Rcpp)
The bad things:
- Not designed to be a scientific programming language
- Not a complete programming language
Download: https://www.rstudio.com/
2.1 Fortran
What I have nightmares about...
In the old days, it was a bit faster than C++. This is no longer true.
2.2 C/C++
The fastest you can get. A very powerful tool, but hard to learn and impossible to master.
import numpy as np
import ctypes as ct
import callcpp # local library
import psutil
CPUs = psutil.cpu_count()
CPUs_list = set(np.sort([1,2,4,*np.arange(8,CPUs+1,4)]))
print(f'this computer has {CPUs} CPUs')
this computer has 8 CPUs
2.3 Calling C++ from Python
Note I: This section can only be run on a Windows computer with the free Microsoft Visual Studio 2017 Community Edition (download here) installed.
Note II: Learning C++ is somewhat hard. These tutorials are helpful.
Python offers multiple ways of calling functions written in C++. Here I use ctypes.
C++ file: example.cpp in the current folder.
Step 1: Compile C++ to a .dll file
callcpp.compile_cpp('example') # compiles example.cpp
cpp files compiled
Details: Under the hood, this writes a file called compile.bat and runs it in a terminal.
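The contents that callcpp writes to the batch file are not shown here. The following is only a sketch of what such a helper could look like. It assumes the default install location of Visual Studio 2017 Community, and it assumes the MSVC flags `/LD` (build a DLL), `/EHsc`, `/Ox` and `/openmp` are wanted. The function name `write_compile_bat` is hypothetical and not part of callcpp.

```python
import os
import tempfile

# Assumption: default install location of Visual Studio 2017 Community
VCVARS = (r'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community'
          r'\VC\Auxiliary\Build\vcvarsall.bat')

def write_compile_bat(filename, folder, vcvars=VCVARS):
    """Hypothetical helper: write a batch file that compiles filename.cpp to a .dll."""
    lines = [
        f'call "{vcvars}" x64',                      # set up the 64-bit MSVC environment
        f'cl /LD /EHsc /Ox /openmp {filename}.cpp',  # /LD: build a DLL (assumed flags)
    ]
    path = os.path.join(folder, 'compile.bat')
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    return path

# Write the file to a temporary folder and show its contents
with tempfile.TemporaryDirectory() as tmp:
    bat_path = write_compile_bat('example', tmp)
    with open(bat_path) as f:
        bat_text = f.read()
print(bat_text)

# On Windows the file would then be run in a terminal, for example with
# subprocess.run(bat_path, shell=True, check=True, cwd=folder)
```

The sketch only writes and prints the batch file, so it runs on any operating system. The compilation step itself needs Windows and the compiler.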
Step 2: Link to .dll file
# funcs (list): list of functions with elements (functionname,[argtype1,argtype2,etc.])
funcs = [('myfun_cpp',[ct.POINTER(ct.c_double),ct.POINTER(ct.c_double),ct.POINTER(ct.c_double),
ct.c_long,ct.c_long,ct.c_long])]
# ct.POINTER(ct.c_double): pointer to a double
# ct.c_long: integer (C long)
cppfile = callcpp.link_cpp('example',funcs)
cpp files loaded
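To make the argument types above concrete, here is a small sketch that uses only ctypes itself, so no compiled library is needed. It builds a C array of doubles, obtains a `ct.POINTER(ct.c_double)` to it, and reads and writes through the pointer the way a C++ function taking a `double*` and a `long` length would. The variable names are illustrative.

```python
import ctypes as ct

# A C array of three doubles (contiguous memory, like a 1D float64 numpy array)
n = ct.c_long(3)                               # length passed alongside the pointer
arr = (ct.c_double * n.value)(1.0, 2.0, 3.0)

# Pointer to the first element: this is what a double* argument receives
p = ct.cast(arr, ct.POINTER(ct.c_double))

# Reading and writing through the pointer touches the original memory
first = p[0]
for i in range(n.value):
    p[i] = 2.0 * p[i]

doubled = list(arr)                            # the array itself has changed
print(first, doubled)
```

No copy is made when a pointer is passed, which is why a C++ function can fill a preallocated output array in place.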
Step 3: Call function
def myfun_numpy_vec(x1,x2):
    y = np.empty((1,x1.size))
    I = x1 < 0.5
    y[I] = np.sum(np.exp(x2*x1[I]),axis=0)
    y[~I] = np.sum(np.log(x2*x1[~I]),axis=0)
    return y
# setup
x1 = np.random.uniform(size=10**6)
x2 = np.random.uniform(size=int(100*CPUs/8)) # adjust the size of the problem (np.int was removed in NumPy 1.24)
x1_np = x1.reshape((1,x1.size))
x2_np = x2.reshape((x2.size,1))
# timing
%timeit myfun_numpy_vec(x1_np,x2_np)
2.51 s ± 842 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
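For reference, the vectorized function above can also be written as an explicit double loop: for each element `x1[i]`, sum `exp(x2[j]*x1[i])` over `j` if `x1[i] < 0.5`, and sum `log(x2[j]*x1[i])` over `j` otherwise. The sketch below uses only the standard library and small hypothetical inputs. The name `myfun_loop` is not from the lecture material.

```python
import math

def myfun_loop(x1, x2):
    """Loop version of myfun_numpy_vec (hypothetical name, works on plain lists)."""
    y = [0.0] * len(x1)
    for i, x1_i in enumerate(x1):
        for x2_j in x2:
            if x1_i < 0.5:
                y[i] += math.exp(x2_j * x1_i)
            else:
                y[i] += math.log(x2_j * x1_i)
    return y

# Small hypothetical inputs, chosen so the result can be checked by hand:
# x1 = 0.0 gives exp(0) + exp(0), and x1 = 1.0 gives log(1) + log(e)
x1_small = [0.0, 1.0]
x2_small = [1.0, math.e]
y_small = myfun_loop(x1_small, x2_small)
print(y_small)
```

A loop like this is slow in pure Python, which is the motivation for vectorizing with numpy or moving the loop to compiled code.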
def myfun_cpp(x1,x2,threads):
    y = np.empty(x1.size)
    p_x1 = np.ctypeslib.as_ctypes(x1) # pointer to x1
    p_x2 = np.ctypeslib.as_ctypes(x2) # pointer to x2
    p_y = np.ctypeslib.as_ctypes(y) # pointer to y
    cppfile.myfun_cpp(p_x1,p_x2,p_y,x1.size,x2.size,threads)
    return y
assert np.allclose(myfun_numpy_vec(x1_np,x2_np),myfun_cpp(x1,x2,1))
for threads in CPUs_list:
    print(f'threads = {threads}')
    %timeit myfun_cpp(x1,x2,threads)
    print('')
threads = 8
271 ms ± 7.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
threads = 1
1.05 s ± 60.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
threads = 2
668 ms ± 52.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
threads = 4
388 ms ± 9.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
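The scaling in the timings above can be summarized as speedup and parallel efficiency relative to one thread. The sketch below only does that arithmetic on the mean run times reported above. Nothing is re-measured.

```python
# Mean run times in milliseconds, copied from the timings reported above
timings_ms = {1: 1050.0, 2: 668.0, 4: 388.0, 8: 271.0}

base = timings_ms[1]
speedup = {}
efficiency = {}
for threads, t in sorted(timings_ms.items()):
    speedup[threads] = base / t                       # times faster than one thread
    efficiency[threads] = speedup[threads] / threads  # 1.0 would be perfect scaling
    print(f'threads = {threads}: speedup = {speedup[threads]:.2f}, '
          f'efficiency = {efficiency[threads]:.2f}')
```

With these numbers, eight threads give a speedup of a bit under 4, so efficiency falls to roughly one half.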
Observation: Compare with results in lecture 12. Numba is roughly as fast as C++ here (I get different results across different computers). In larger problems, C++ is usually faster, and while Numba is limited in terms of which Python and Numpy features it supports, everything can be coded in C++.
Step 4: Delink .dll file
callcpp.delink_cpp(cppfile,'example')
cpp files delinked
More information: See the folder "Numba and C++" in the ConsumptionSavingNotebooks repository. It includes an explanation of how to use the NLopt optimizers in C++.