Today's topics:

  • Solving systems of equations
    • Linear systems
    • Non-linear systems

  • Newton's method for root finding
  • Bisection for root finding
  • Symbolic math in Python

Lecture 10: Solving equations

You will learn about working with matrices and linear algebra (scipy.linalg), including solving systems of linear equations. You will learn to find roots of linear and non-linear equations both numerically (scipy.optimize) and symbolically (sympy).

Note: The algorithms written here are meant to be illustrative. The scipy implementations are always both the fastest and the safest choice.

Links:

  1. scipy.linalg: overview + tutorial
  2. sympy: overview + tutorial
  3. scipy.optimize: overview + tutorial
[ ]
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import ipywidgets as widgets
import time
from scipy import linalg
from scipy import optimize
import sympy as sm
from IPython.display import display

# local module for linear algebra
%load_ext autoreload
%autoreload 2
import numecon_linalg

1. Systems of linear equations

A lot of economic models can be expressed as systems of equations. The question is how to solve them. As a very simple example, think of the market equilibrium when you have a supply curve and a demand curve.

Consider a market, where suppliers have a supply curve

$$q = 5 + \frac{1}{3}p$$

and consumers have a demand curve

$$q = 10 - 2p$$

This gives rise to a linear system of equations

$$\begin{aligned} q - \frac{1}{3}p &= 5 \\ q + 2p &= 10 \end{aligned}$$

We can put this into matrix notation $Ax = b$ by

$$\begin{bmatrix} 1 & 2 \\ 1 & -1/3 \end{bmatrix} \cdot \begin{bmatrix} q \\ p \end{bmatrix} = \begin{bmatrix} 10 \\ 5 \end{bmatrix}$$

Solving for the equilibrium $x = [q, p]$ means solving for $x$ in $Ax = b$. If $A$ can be inverted, the solution is $x = A^{-1}b$.
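
As a check, this 2×2 system is small enough to invert by hand. With $\det(A) = 1 \cdot (-1/3) - 2 \cdot 1 = -7/3$,

$$A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} -1/3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1/7 & 6/7 \\ 3/7 & -3/7 \end{bmatrix}$$

so that $x = A^{-1}b = [40/7,\, 15/7] \approx [5.71,\, 2.14]$, which the code below should reproduce.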

[ ]
A = np.array([[1.0, 2.0],[1.0, -1/3]])
b = np.array([10.0, 5.0])

Ainv = linalg.inv(A)
sol = Ainv @ b
print(sol)

1.1 Matrix equations

More generally, we consider matrix equations with $n$ equations and $n$ unknowns:

$$Ax = b \Leftrightarrow \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

where $A$ is a square parameter matrix, $b$ is a parameter vector, and $x$ is the vector of unknowns.

A specific example could be:

$$Ax = b \Leftrightarrow \begin{bmatrix} 3 & 2 & 0 \\ 1 & -1 & 0 \\ 0 & 5 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ -1 \end{bmatrix}$$

How to solve this?

[ ]
A = np.array([[3.0, 2.0, 0.0], [1.0, -1.0, 0], [0.0, 5.0, 1.0]])
b = np.array([2.0, 4.0, -1.0])

We could just guess..

[ ]
Ax = A@[2,-1,9] # @ is matrix multiplication
print('A@x: ',Ax)

if np.allclose(Ax,b): 
    print('solution found')
else:
    print('solution not found')

Various matrix operations:

[ ]
A.T # transpose
[ ]
np.diag(A) # diagonal
[ ]
np.tril(A) # lower triangular matrix
[ ]
np.triu(A) # upper triangular matrix
[ ]
B = A.copy()
np.fill_diagonal(B,0) # fill diagonal with zeros
print(B)
[ ]
linalg.inv(A) # inverse
[ ]
linalg.eigvals(A) # eigen values

1.2 Direct solution with Gauss-Jordan elimination

Consider the column stacked matrix:

$$X = [A \,|\, b] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{bmatrix}$$

Find the row reduced echelon form by performing row operations, i.e.

  1. Multiply row with constant
  2. Swap rows
  3. Add one row to another row,

until the $A$ part of the matrix is the identity matrix.

Manually:

[ ]
# a. stack
X = np.column_stack((A,b))

print('stacked:\n',X)

# b. row operations
X[0,:] += 2*X[1,:]
X[0,:] /= 5.0
X[1,:] -= X[0,:]
X[1,:] *= -1
X[2,:] -= 5*X[1,:]
print('\nrow reduced echelon form:\n',X)

# c. print result (the last column in X in row reduced echelon form)
print('\nsolution \n', X[:,-1])

General function:

[ ]
Y = np.column_stack((A,b))
numecon_linalg.gauss_jordan(Y)
print('solution',Y[:,-1])

which can also be used to find the inverse if we stack with the identity matrix instead,

[ ]
# a. construct stacked matrix
Z = np.hstack((A,np.eye(3)))
print('stacked:\n',Z)

# b. apply gauss jordan elimination
numecon_linalg.gauss_jordan(Z)

# b. find inverse
inv_Z = Z[:,3:] # last 3 columns of Z in row reduced echelon form
print('inverse:\n',inv_Z)

assert np.allclose(Z[:,3:]@A,np.eye(3))
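
For reference, a minimal sketch of what a Gauss-Jordan routine could look like is given below. It skips pivoting, so it assumes no zero pivots are encountered; the numecon_linalg implementation may differ.

[ ]
def gauss_jordan_sketch(X):
    """ row reduce the augmented matrix X in-place (sketch: no pivoting) """
    n = X.shape[0]
    for i in range(n):
        X[i,:] /= X[i,i] # normalize the pivot row
        for j in range(n):
            if j != i:
                X[j,:] -= X[j,i]*X[i,:] # eliminate column i from the other rows

W = np.column_stack((A,b))
gauss_jordan_sketch(W)
print('solution',W[:,-1])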

1.3 Iterative Gauss-Seidel (+)

We can always decompose $A$ into additive lower and upper triangular matrices,

$$A = L + U = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} + \begin{bmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$

such that

$$Ax = b \Leftrightarrow \underbrace{Lx}_{\text{LHS}} = \underbrace{b - Ux}_{\text{RHS}}$$

The idea and beauty of the algorithm is that we go from an identity, $Ax = b$, to an iteration on $x$. This is because the $x$ on the LHS above is not the same $x$ as on the RHS. It is an update! And if we keep making updates, we will eventually get the solution. See how below.

Algorithm: gauss_seidel()

  1. Choose tolerance $\epsilon > 0$ and set $n = 1$. Define the initial guess on $x$ denoted $x_0$.
  2. From $A$, get $L$ and $U$ as the lower and upper part.
  3. Set $\tilde{x} = x_0$.
  4. Given $\tilde{x}$, calculate $y_n = (b - U\tilde{x})$.
  5. Given $y_n$, solve for $x_n$ in the equation $Lx_n = y_n$.
  6. If $|x_n - \tilde{x}|_{\infty} < \epsilon$ stop.
    Else, set $\tilde{x} = x_n$ and $n = n + 1$ and return to step 4.

Why is this smart? Because it relies on solving a system of equations, $Lx_n = y_n$, where $L$ is lower triangular. A system with a lower triangular matrix is much easier to solve, because we can use forward substitution.

Consider the equation

$$Lx = y \Leftrightarrow \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

Note: Solving directly by forward substitution:

$$x_1 = \frac{y_1}{a_{11}}$$

Using $x_1$ one can find $x_2$:

$$x_2 = \frac{y_2 - a_{21} x_1}{a_{22}}$$

$$x_3 = \frac{y_3 - a_{31} x_1 - a_{32} x_2}{a_{33}}$$

etc.
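
A minimal sketch of forward substitution is shown below (assuming a non-singular lower triangular matrix; numecon_linalg has its own version which is used later):

[ ]
def forward_substitution_sketch(L,y):
    """ solve Lx = y for a lower triangular L by forward substitution """
    n = len(y)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (y[i] - L[i,:i]@x[:i])/L[i,i] # subtract known terms and divide by the diagonal
    return x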

Apply Gauss-Seidel:

[ ]
x0 = np.array([1,1,1])
x =  numecon_linalg.gauss_seidel(A,b,x0)
print('solution',x)

Note: Convergence is not ensured unless the matrix is diagonally dominant or symmetric and positive definite.

[ ]
x =  numecon_linalg.gauss_seidel(A,b,x0,do_print=True)
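
For reference, a minimal sketch of the iteration described above (the numecon_linalg implementation may differ in details such as printing and the stopping rule):

[ ]
def gauss_seidel_sketch(A,b,x0,tol=1e-8,max_iter=500):
    """ solve Ax = b with Gauss-Seidel iterations (sketch) """
    L = np.tril(A) # lower triangular part (incl. diagonal)
    U = A - L # strictly upper triangular part
    x = x0.astype(float)
    for _ in range(max_iter):
        y = b - U@x # right-hand side given the current guess
        x_new = linalg.solve_triangular(L,y,lower=True) # forward substitution
        if np.max(np.abs(x_new-x)) < tol:
            return x_new
        x = x_new
    return x

print('solution',gauss_seidel_sketch(A,b,x0))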

1.4 Scipy functions

Option 1: Use .solve() (scipy chooses what happens).

[ ]
x1 = linalg.solve(A, b)
print(x1)
assert np.allclose(A@x1,b)

Option 2: Compute .inv() first and then solve.

[ ]
Ainv = linalg.inv(A)
x2 = Ainv@b
print(x2)

Note: Computing the inverse is normally not a good idea due to numerical stability.

Option 3: Compute LU decomposition and then solve.

[ ]
LU,piv = linalg.lu_factor(A) # decomposition (factorization)
x3 = linalg.lu_solve((LU,piv),b)
print(x3)

Detail: piv contains information on a numerically stable reordering.

1.5 Comparisons

  1. linalg.solve() is the best choice for solving once.
  2. linalg.lu_solve() is the best choice when solving for multiple $b$'s with a fixed $A$ (the LU decomposition only needs to be done once, see the example below).
  3. Gauss-Seidel is an alternative when e.g. only an approximate solution is needed.
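
For example, the factorization can be reused across several right-hand sides (the second b vector below is just made up for illustration):

[ ]
LU,piv = linalg.lu_factor(A) # factorize once
for b_k in [b, np.array([1.0,0.0,3.0])]:
    x_k = linalg.lu_solve((LU,piv),b_k) # cheap solve for each right-hand side
    print(x_k)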

1.6 Details on LU factorization (+)

When $A$ is regular (invertible), we can decompose it into a lower unit triangular matrix, $L$, and an upper triangular matrix, $U$:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = L \cdot U = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & 1 \end{bmatrix} \cdot \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{bmatrix}$$

By starting with the top row of the upper part and the first column of the lower part, we can work our way through using the definition of a matrix product. Note that we actively use the fact that there are 1s on the diagonal of the lower matrix.

$$\begin{aligned} \text{if } i = 1: &\quad u_{1j} = a_{1j} \quad (\textit{top row is identical to that of } A) \\ \text{if } j = 1: &\quad l_{i1} = \frac{a_{i1}}{u_{11}} \\ \text{else:} &\quad u_{ij} = a_{ij} - \sum_{k=1}^{i-1} u_{kj} l_{ik} \\ &\quad l_{ij} = \frac{1}{u_{jj}} \Big( a_{ij} - \sum_{k=1}^{j-1} u_{kj} l_{ik} \Big) \end{aligned}$$

You can therefore get $L$ and $U$ by the following steps (see the sketch below):

  • First obtain row 1 of $U$. It is equal to row 1 of $A$.
  • Then get column 1 of $L$.
  • This will allow you to get $u_{2,2}$.
  • Based on $u_{1,2}, u_{2,1}, u_{2,2}, l_{1,1}, l_{1,2}$ you can get $l_{3,2}$.
  • Keep working out subsequent $u_{ij}$ and $l_{ij}$ based on the above formulas.
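
A compact sketch of this recursion is given below (Doolittle decomposition without pivoting, so it assumes no zero pivots arise; the numecon_linalg version used below may differ):

[ ]
def lu_decomposition_sketch(A):
    """ decompose A into a unit lower triangular L and an upper triangular U (no pivoting) """
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n,n))
    for i in range(n):
        for j in range(i,n):
            U[i,j] = A[i,j] - L[i,:i]@U[:i,j] # row i of U
        for j in range(i+1,n):
            L[j,i] = (A[j,i] - L[j,:i]@U[:i,i])/U[i,i] # column i of L
    return L,U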

The factorization implies that the equation system can be written

$$Ax = L(Ux) = b$$

Algorithm: lu_solve()
Three steps and you are done:

  1. Perform the LU decomposition (factorization).
  2. Solve for $y$ in $Ly = b$ (using forward substitution, since we know that $y = Ux$, see above).
  3. Solve for $x$ in $Ux = y$ (using backward substitution, going from bottom to top).

[ ]
L,U = numecon_linalg.lu_decomposition(A) # step 1
y = numecon_linalg.solve_with_forward_substitution(L,b) # step 2
x = numecon_linalg.solve_with_backward_substitution(U,y) # step 3
print('A:\n',A)
print('L:\n',L)
print('\nU:\n',U)
print('\nsolution:',x)
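
For completeness, a matching sketch of backward substitution (assuming a non-singular upper triangular matrix):

[ ]
def backward_substitution_sketch(U,y):
    """ solve Ux = y for an upper triangular U, working from the last row up """
    n = len(y)
    x = np.zeros(n)
    for i in range(n-1,-1,-1):
        x[i] = (y[i] - U[i,i+1:]@x[i+1:])/U[i,i] # subtract known terms and divide by the diagonal
    return x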

Relation to scipy:

  1. Scipy uses pivoting to improve numerical stability.
  2. The scipy implementation is much better than the one shown here.

1.7 Sparse matrices (+)

Sparse matrix:
You may sometimes deal with a matrix that is really large, but most of the entries are 0s. Not uncommon in econometrics or large systems of equations.

In that case, you can save a lot of memory by removing all the 0s from memory and let Python only worry about the non-zero numbers. That of course requires a whole new set of routines for matrix operations, since the elements are no longer contiguous.

It can be worth it if about 70% of your matrix is just 0s.

scipy.sparse.linalg allows you to solve systems of equations with sparse matrices.

Documentation: basics + linear algebra

Create a sparse matrix, where most elements are on the diagonal:

[ ]
from scipy import sparse
import scipy.sparse.linalg

S = sparse.lil_matrix((1000, 1000)) # 1000x1000 matrix with zeroes
S.setdiag(np.random.rand(1000)) # some values on the diagonal
S[200, :100] = np.random.rand(100) # some values in a row
S[200:210, 100:200] = S[200, :100] # and the same value in some other rows

Create a plot of the values in the matrix:

[ ]
S_np = S.toarray() # conversion to numpy
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.matshow(S_np,cmap=plt.cm.binary);

Solve it in four different ways:

  1. Like it was not sparse
  2. Using the sparsity
  3. Using the sparsity + explicit factorization
  4. Iterative solver (similar to Gauss-Seidel)
[ ]
# just use some random numbers for right hand side of equation
k = np.random.rand(1000) 

# a. solve 
t0 = time.time()
x = linalg.solve(S_np,k)
print(f'{"solve":12s}: {time.time()-t0:.5f} secs')

# b. solve using sparse matrix algebra in spsolve
t0 = time.time()
x_alt = sparse.linalg.spsolve(S.tocsr(), k)
print(f'{"spsolve":12s}: {time.time()-t0:.5f} secs')
assert np.allclose(x,x_alt)
      
# c. solve with explicit factorization
t0 = time.time()
S_solver = sparse.linalg.factorized(S.tocsc())
x_alt = S_solver(k)
print(f'{"factorized":12s}: {time.time()-t0:.5f} secs')
assert np.allclose(x,x_alt)
      
# d. solve with iterative solver (bicgstab)
t0 = time.time()
x_alt,_info = sparse.linalg.bicgstab(S,k,x0=1.001*x,tol=10**(-8))
print(f'{"bicgstab":12s}: {time.time()-t0:.5f} secs')
assert np.allclose(x,x_alt),x-x_alt

Conclusion:

  1. Using the sparsity can be very important.
  2. Iterative solvers can be very very slow.

2. Non-linear equations - one dimensional

2.1 Introduction

In economics, we really like setting first order conditions (FOCs) to 0. We do it every time we want to solve for optimal behavior.

Considering the FOCs as separate equations, finding optimal behavior is the same as finding the root of the FOCs.

Therefore, we often want to solve non-linear equations of the form

$$f(x) = 0, \quad x \in \mathbb{R}$$

There are two ways of going about it:

  • Derivative based methods: more robust and take fewer steps.
  • Derivative free methods: less numerically stable, but do not require a gradient, so often easier to implement.

Using derivative based methods is generally preferable. BUT: formulating the gradient of a very complex function can be impossible, and then derivative free methods are the way out.

A simple example of a function for our root finding:

$$f(x) = -x^3 + 2x^2 + 4x + 30$$

2.2 Derivative based methods

Newton methods:
Basic assumption: you know the function value and derivatives at a point $x_0$.

We now want to know what point $x^*$ gives a root of the function!

Importantly, since we are looking for a root, we know that $f(x^*) = 0$.

We use a first-order Taylor approximation of the function around $x_0$ to search for the root of $f$ at a new point $x_k$:

$$f(x_k) \approx f(x_0) + f^{\prime}(x_0)(x_k - x_0)$$

implying that if $x_k$ is the root, we have

$$f(x_k) = 0 \Leftrightarrow x_k = x_0 - \frac{f(x_0)}{f^{\prime}(x_0)}$$

This is called Newton's method.

You can think of it as an operator on $x$ with respect to $f$ used to find the nearest root of $f$.

Let's call the operator $\mathcal{N}_f$. If our current guess of a root of $f$ is $x_k$, we can get a new guess $x_{k+1}$ by applying $\mathcal{N}_f(x_k)$:

  • $x_1 = \mathcal{N}_f(x_0) = x_0 - \frac{f(x_0)}{f^{\prime}(x_0)}$
  • $x_2 = \mathcal{N}_f(x_1)$
  • $x_3 = \mathcal{N}_f(x_2)$
  • ...

We have found a root when $|f(x_k)| < \epsilon$, which implies that consecutive guesses also become very close: $|x_{k+1} - x_k| < \epsilon'$.

An alternative is Halley's method (see derivation), which uses

$$x_k = x_0 - \frac{f(x_0)}{f^{\prime}(x_0)} \Big[ 1 - \frac{f(x_0)}{f^{\prime}(x_0)}\frac{f^{\prime\prime}(x_0)}{2f^{\prime}(x_0)} \Big]^{-1} := \mathcal{H}_f(x_0)$$

making use of information from the second derivative. Note that if the second derivative is close to 0, Halley's method collapses into Newton's.

We denote this operator by $\mathcal{H}_f(x_k)$.

Algorithm: find_root()

  1. Choose tolerance $\epsilon > 0$, a guess on $x_0$, and set $k = 0$.
  2. Calculate $f(x_k)$ and $f^{\prime}(x_k)$. Also calculate $f^{\prime\prime}(x_k)$ when using Halley's method.
  3. If $|f(x_k)| < \epsilon$ then stop.
  4. Calculate the new candidate $x_{k+1} = \mathcal{N}_f(x_k)$ when using Newton's method.
    Otherwise, calculate $x_{k+1} = \mathcal{H}_f(x_k)$ when using Halley's formula.
  5. Set $k = k + 1$ and return to step 2.
[ ]
def find_root(x0,f,df,d2f=None,method='newton',max_iter=500,tol=1e-8,full_info=False):
    """ find root
        
    Args:
    
        x0 (float): initial value
        f (callable): function
        df (callable): derivative
        d2f (callable): second derivative
        method (str): newton or halley
        max_iter (int): maximum number of iterations
        tol (float): tolerance
        full_info (bool): controls information returned
        
    Returns:
    
        x (float/ndarray): root (if full_info, all x tried)
        i (int): number of iterations used
        fx (ndarray): function values used (if full_info) 
        fpx (ndarray): derivative values used (if full_info)
        fppx (ndarray): second derivative values used (if full_info)
        
    """
    
    # initialize
    xs = []
    fxs = []
    dfxs = []
    d2fxs = []
    
    # iterate
    x = x0    
    i = 0    
    while True:
        
        # step 2: evaluate function and derivatives
        fx = f(x)
        dfx = df(x)
        if method == 'halley':
            d2fx = d2f(x)
        
        # step 3: check convergence
        if abs(fx) < tol or i >= max_iter:
            break
            
        # step 4: update x
        if method == 'newton':
            x_k = x - fx/dfx
        elif method == 'halley':
            a = fx/dfx
            b = a*d2fx/(2*dfx)
            x_k = x - a/(1-b)
        
        # step 5: increment counter
        i += 1
        
        # step 6: store history
        xs.append(x)
        fxs.append(fx)
        dfxs.append(dfx)
        if method == 'halley':
            d2fxs.append(d2fx)
        
        # step 7: apply new guess for x
        x = x_k
        
    # return
    if full_info:
        return np.array(xs),i,np.array(fxs),np.array(dfxs),np.array(d2fxs)
    else:
        return x,i

Note: The cell below contains a function for plotting the convergence.

[ ]
def plot_find_root(x0,f,fp,fpp=None,method='newton',xmin=-8,xmax=8,xn=100, vline = False):
    
    # a. find root and return all information 
    x,max_iter,fx,fpx,fppx = find_root(x0,f,df=fp,d2f=fpp,method=method,full_info=True)
    
    # b. compute function on grid
    xvec = np.linspace(xmin,xmax,xn)
    fxvec = f(xvec)
    
    # c. figure
    def _figure(i):
        
        # i. approximation
        if method == 'newton':
            fapprox = fx[i] + fpx[i]*(xvec-x[i])
        elif method == 'halley':
            fapprox = fx[i] + fpx[i]*(xvec-x[i]) + fppx[i]/2*(xvec-x[i])**2  
            
        # ii. figure
        fig = plt.figure()
        ax = fig.add_subplot(1,1,1)
        
        ax.plot(xvec,fxvec,label='function') # on grid
        ax.plot(x[i],0,'o',color='blue',mfc='none',label='$x_{k}$')# now
        ax.plot(x[i],fx[i],'o',color='black',label='$f(x_k)$') # now       
        ax.plot(xvec,fapprox,label='approximation') # approximation
        
        if vline:
            ax.axvline(x[i+1],ls='--',lw=1,color='black') # cross zero
        
        ax.axvline(0,ls='-',lw=1,color='black') # cross zero
        ax.axhline(0, ls='-',lw=1,color='black')
        #ax.plot(x[i+1],fx[i+1],'o',color='black',mfc='none',label='next')# next
        ax.plot(x[i+1],0,'o',color='green',mfc='none',label='$x_{k+1}$')# next
            
        ax.legend(loc='lower right',facecolor='white',frameon=True)
        ax.set_ylim([fxvec[0],fxvec[-1]])
    
    widgets.interact(_figure,
        i=widgets.IntSlider(description="iterations", min=0, max=max_iter-2, step=1, value=0)
    );

Illustrating root finding by Newton's method

[ ]
f = lambda x: -x**3 + 2*(x**2) + 4*x + 30 
df = lambda x: -3*(x**2) + 4*x + 4
df2 = lambda x: -6*x + 4 
x0 = -5

plot_find_root(x0,f,df)

2.3 Numerical derivative

Sometimes, you might not have the analytical derivative. Then, you can instead use the numerical derivative.

Numerical derivative
Define Δ\Delta to be a small number, then we approximate the derivative by

$$\frac{df}{dx} \approx \frac{f(x+\Delta) - f(x)}{\Delta}$$
[ ]
# a. function
#f = lambda x: 10*x**3 - x**2 -1

# b. numerical derivative (forward)
Δ = 1e-8
fp_approx = lambda x: (f(x+Δ)-f(x))/Δ

# b. find root
x0 = -1.5
x,i = find_root(x0,f,fp_approx,method='newton')
print(f'iterations: {i}, root: {x}, f(x) = {f(x)}')

Question: What happens if you increase the stepsize?

2.4 Another example

[ ]
g = lambda x: np.sin(x)
gp = lambda x: np.cos(x)
gpp = lambda x: -np.sin(x)

x0 = 4.0
plot_find_root(x0,g,gp,gpp,method='newton')

Question: Is the initial value important?

2.5 Derivative free methods: Bisection

Bisection is, like Newton's, an important root finding algorithm that you will find versions of in industry grade software like Matlab. It is not hard to get the idea behind!

Look at the graph below. If $f(a) \times f(b) < 0$, then there must be a root in between $a$ and $b$. Bisection uses just this idea iteratively, together with the idea of using midpoints as in the phone book search from L09.

[ ]
xs = np.linspace(0, 6, 100)
ys = f(xs)

fig = plt.figure()
ax = fig.add_subplot(1,1,1)

ax.plot(xs,ys) # on grid
ax.axhline(0, ls='-',lw=1,color='black')
ax.plot(xs[50],ys[50],'o',color='blue',label='$f(a)$') # now
ax.plot(xs[80],ys[80],'o',color='green',label='$f(b)$') # now 
ax.legend(loc='lower left',facecolor='white',frameon=True);
ax.set_ylim(-60,50);
ax.set_xlim(1,6);
ax.grid(False) # turn off the grid from the style sheet

Algorithm: bisection()

  1. Set $a_0 = a$ and $b_0 = b$ where $f(a)$ and $f(b)$ have opposite signs, $f(a_0)f(b_0) < 0$.
  2. Compute $f(m_0)$ where $m_0 = (a_0 + b_0)/2$ is the midpoint.
  3. Determine the next sub-interval $[a_1, b_1]$:
  • If $f(a_0)f(m_0) < 0$ (different signs), then $a_1 = a_0$ and $b_1 = m_0$ (i.e. focus on the range $[a_0, m_0]$).
  • If $f(m_0)f(b_0) < 0$ (different signs), then $a_1 = m_0$ and $b_1 = b_0$ (i.e. focus on the range $[m_0, b_0]$).
  4. Repeat step 2 and step 3 until $|f(m_n)| < \epsilon$.
[ ]
def bisection(f,a,b,max_iter=500,tol=1e-6,full_info=False):
    """ bisection
    
    Solve equation f(x) = 0 for a <= x <= b.
    
    Args:
    
        f (callable): function
        a (float): left bound
        b (float): right bound
        max_iter (int): maximum number of iterations
        tol (float): tolerance on solution
        full_info (bool): controls information returned
        
    Returns:
    
        m (float/ndarray): root (if full_info, all x tried)
        i (int): number of iterations used
        a (ndarray): left bounds used
        b (ndarray): right bounds used
        fm (ndarray): function values at midpoints
        
    """
    
    # test inputs
    if f(a)*f(b) >= 0:
        print("bisection method fails.")
        return None
    
    # step 1: initialize
    a_l = []
    b_l = []
    m_l = []
    fm_l = []
    
    # step 2-4: main
    i = 0
    while i < max_iter:
        
        # step 2: midpoint and associated value
        m = (a+b)/2
        fm = f(m)
        
        # substep: update the lists of history 
        a_l.append(a)
        b_l.append(b)
        m_l.append(m)
        fm_l.append(fm)
        
        # step 3: determine sub-interval
        if abs(fm) < tol:
            break        
        elif f(a)*fm < 0:
            b = m
        elif f(b)*fm < 0:
            a = m
        else:
            print("bisection method fails.")
            return None
        
        i += 1
        
    if full_info:
        # Returned lists are converted to np.arrays for good measure
        return np.array(m_l), i, np.array(a_l), np.array(b_l), np.array(fm_l)
    else:
        return m,i

Same result as before, but there is a trade-off: more iterations in exchange for not evaluating derivatives.

[ ]
m,i = bisection(f,2,7)
print(i,m,f(m))

Note: The cell below contains a function for plotting the convergence.

[ ]
def plot_bisection(f,a,b,xmin=-8,xmax=8,xn=100):
    
    # a. find root and return all information 
    res = bisection(f,a,b,full_info=True)
    if res is None:
        return
    else:
        m,max_iter,a,b,fm = res
    
    # b. compute function on grid
    xvec = np.linspace(xmin,xmax,xn)
    fxvec = f(xvec)
    
    # c. figure
    def _figure(i):
        
        # ii. figure
        fig = plt.figure()
        ax = fig.add_subplot(1,1,1)
        
        ax.plot(xvec,fxvec) # on grid
        ax.axhline(y=0, color='black')
        ax.plot(m[i],fm[i],'o',color='black',label='current') # mid
        ax.plot([a[i],b[i]],[fm[i],fm[i]],'--',color='green',label='range') # range
        ax.axvline(a[i],ls='--',color='green')
        ax.axvline(b[i],ls='--',color='green')        
        
        ax.legend(loc='lower right',facecolor='white',frameon=True)
        ax.set_ylim([fxvec[0],fxvec[-1]])
    
    widgets.interact(_figure,
        i=widgets.IntSlider(description="iterations", min=0, max=max_iter-1, step=1, value=0)
    );

plot_bisection(f,-8,8)

Quiz 4

Note: Bisection is not good at the final convergence steps. This is generally true for methods not using derivatives.

2.6 Scipy

Scipy, naturally, has better implementations of the above algorithms.

You will most likely want to go for these when doing your own model solution. Made by professionals...

Newton:

[ ]
result = optimize.root_scalar(f,x0=-4,fprime=df,method='newton')
print(result)

Halley:

[ ]
result = optimize.root_scalar(f,x0=-4,fprime=df,fprime2=df2,method='halley')
print(result)

Bisect:

[ ]
result = optimize.root_scalar(f,bracket=[-8,7],method='bisect')
print(result)

The best choice is the more advanced Brent method:

[ ]
result = optimize.root_scalar(f,bracket=[-8,7],method='brentq')
print(result)

3. Solving non-linear equations (multi-dimensional)

3.1 Introduction

We consider solving non-linear equations on the form,

$$f(\boldsymbol{x}) = f(x_1, x_2, \dots, x_k) = \boldsymbol{0}, \quad \boldsymbol{x} \in \mathbb{R}^k$$

A specific example is:

$$h(\boldsymbol{x}) = h(x_1, x_2) = \begin{bmatrix} h_1(x_1,x_2) \\ h_2(x_1,x_2) \end{bmatrix} = \begin{bmatrix} x_1 + 0.5(x_1 - x_2)^3 - 1 \\ x_2 + 0.5(x_2 - x_1)^3 \end{bmatrix} \in \mathbb{R}^2$$

where the Jacobian is

$$\nabla h(\boldsymbol{x}) = \begin{bmatrix} \frac{\partial h_1}{\partial x_1} & \frac{\partial h_1}{\partial x_2} \\ \frac{\partial h_2}{\partial x_1} & \frac{\partial h_2}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 1 + 1.5(x_1 - x_2)^2 & -1.5(x_1 - x_2)^2 \\ -1.5(x_2 - x_1)^2 & 1 + 1.5(x_2 - x_1)^2 \end{bmatrix}$$
[ ]
def h(x):
    y = np.zeros(2)
    y[0] = x[0]+0.5*(x[0]-x[1])**3-1.0
    y[1] = x[1]+0.5*(x[1]-x[0])**3
    return y

def hp(x):
    y = np.zeros((2,2))
    y[0,0] = 1+1.5*(x[0]-x[1])**2
    y[0,1] = -1.5*(x[0]-x[1])**2
    y[1,0] = -1.5*(x[1]-x[0])**2
    y[1,1] = 1+1.5*(x[1]-x[0])**2
    return y

3.2 Newton's method

Solving a multidimensional system of equations follows the exact same strategy as finding the root of a single equation - except we are now working with the Jacobian instead of a single derivative.

Same as Newton's method in one dimension, but with the following update step:

$$\boldsymbol{x}_{n+1} = \boldsymbol{x}_n - [\nabla h(\boldsymbol{x}_n)]^{-1} h(\boldsymbol{x}_n)$$

Notice something important here: $[\nabla h(\boldsymbol{x}_n)]^{-1}$ is a matrix inverse! As you saw above, these are costly to compute.

[ ]
def find_root_multidim(x0,f,fp,max_iter=500,tol=1e-8):
    """ find root
        
    Args:
    
        x0 (ndarray): initial value
        f (callable): function
        fp (callable): derivative
        max_iter (int): maximum number of iterations
        tol (float): tolerance
        
    Returns:
    
        x (ndarray): root
        i (int): number of iterations used
        
    """
    
    # initialize
    x = x0
    i = 0
    
    # iterate
    while i < max_iter:
        
        # step 2: function and derivatives
        fx = f(x)
        fpx = fp(x)
        
        # step 3: check convergence
        if max(abs(fx)) < tol:
            break
            
        # step 4: update x - here is where you want to use a sensible algorithm for inversion
        fpx_inv = linalg.inv(fpx)        
        x = x - fpx_inv@fx
        
        # step 5: increment counter
        i += 1
        
    return x,i
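
As the comment in step 4 hints, one would normally avoid forming the inverse explicitly and instead solve a linear system for the Newton step. A minimal sketch of that alternative (not necessarily what scipy does internally):

[ ]
def newton_step(fpx,fx):
    """ compute the Newton step by solving a linear system (avoids the explicit inverse) """
    return linalg.solve(fpx,fx)

# in the loop above, step 4 could then read: x = x - newton_step(fpx,fx)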

Test algorithm:

[ ]
x0 = np.array([0,0])
x,i = find_root_multidim(x0,h,hp)
print(i,x,h(x))

3.3 Using Scipy

You should use professionally implemented routines for optimizing your models!
There exist a lot of efficient algorithms for finding roots in multiple dimensions. The default scipy choice is something called hybr.

With the Jacobian:

[ ]
result = optimize.root(h,x0,jac=hp)
print(result)
print('\nx =',result.x,', h(x) =',h(result.x))

Without the Jacobian: (numerical derivative)

[ ]
result = optimize.root(h,x0)
print(result)
print('\nx =',result.x,', h(x) =',h(result.x))

4. Solving equations by symbolic math

Just like your old TI calculator and Mathematica, Python has a module for solving equations symbolically, which also means solving them exactly. No numerical errors!

4.1 Solve consumer problem

Consider solving the following problem:

$$\max_{x_1,x_2} x_1^{\alpha} x_2^{\beta} \text{ s.t. } p_1 x_1 + p_2 x_2 = I$$

Define all symbols:

[ ]
x1 = sm.symbols('x_1') # x1 is a Python variable representing the symbol x_1
x2 = sm.symbols('x_2')
alpha = sm.symbols('alpha')
beta = sm.symbols('beta')
p1 = sm.symbols('p_1')
p2 = sm.symbols('p_2')
I = sm.symbols('I')

print('x1 is of type: ', type(x1))

Define objective and budget constraint:

[ ]
# Write out the equation as if it was regular code
objective = x1**alpha*x2**beta
objective
[ ]
# Define the budget constraint as an equality
budget_constraint = sm.Eq(p1*x1+p2*x2,I)
budget_constraint

Solve in four steps:

  1. Isolate $x_2$ from the budget constraint
  2. Substitute in $x_2$
  3. Take the derivative wrt. $x_1$
  4. Solve the FOC for $x_1$

Step 1: Isolate

[ ]
# Isolate x2 on LHS
x2_from_con = sm.solve(budget_constraint, x2)
x2_from_con[0]

Step 2: Substitute

[ ]
objective_subs = objective.subs(x2, x2_from_con[0])
objective_subs

Step 3: Take the derivative

[ ]
foc = sm.diff(objective_subs, x1)
foc

Step 4: Solve the FOC

[ ]
sol = sm.solve(sm.Eq(foc,0), x1)
sol[0]

An alternative is sm.solveset(), which will be the default in the future, but it is still a bit immature in my view.

Task: Solve the consumer problem with quasi-linear preferences,

$$\max_{x_1,x_2} \sqrt{x_1} + \gamma x_2 \text{ s.t. } p_1 x_1 + p_2 x_2 = I$$

[ ]
# write your code here        
[ ]
gamma = sm.symbols('gamma')
objective_alt = sm.sqrt(x1) + gamma*x2
objective_alt_subs = objective_alt.subs(x2,x2_from_con[0])
foc_alt = sm.diff(objective_alt_subs,x1)
sol_alt = sm.solve(foc_alt,x1)
sol_alt[0]

4.2 Use solution

LaTeX: Print in LaTeX format:

[ ]
print(sm.latex(sol[0]))

Turn solution into Python function

Sympy can do a fantastic trick!

Once you have the solution of your equation, this can be turned into a Python function. Thus you can use the solution on arrays. It's called lambdification (think "lambda functions").

[ ]
# Simple example. 1st element of lambdify: a tuple of symbols to be used. 2nd element: the expression used on the symbols.  
x = sm.symbols('x')
x_square = sm.lambdify(args = (x), expr = x**2)
x_square(12)
[ ]
# Create a function out of the solution by providing the "expression" you want (ie the solution) and the inputs to the expression in a tuple. 
sol_func = sm.lambdify(args = (p1, I, alpha, beta), expr = sol[0])

# Run solution. DO NOT overwrite the SYMBOLS (I,alpha,beta) with numeric data
p1_vec = np.array([1.2,3,5,9])
I_val = 10
alpha_val = 0.5
beta_val = 0.5

# Run solution function with vector of prices
demand_p1 = sol_func(p1_vec, I_val, alpha_val, beta_val)

for d in demand_p1:
    print(f'demand: {d:1.3f}')

Analyzing properties of the solution (expression)

Is demand always positive?

Give the computer the information we have, i.e. that $p_1$, $p_2$, $\alpha$, $\beta$, and $I$ are all strictly positive:

[ ]
for var in [p1,p2,alpha,beta,I]:
    sm.assumptions.assume.global_assumptions.add(sm.Q.positive(var)) # var is always positive
sm.assumptions.assume.global_assumptions    

Ask the computer a question:

[ ]
answer = sm.ask(sm.Q.positive(sol[0]))
print(answer)

We need the assumption that $p_1 > 0$:

[ ]
sm.assumptions.assume.global_assumptions.remove(sm.Q.positive(p1))
answer = sm.ask(sm.Q.positive(sol[0]))
print(answer)

To clear all assumptions we can use:

[ ]
sm.assumptions.assume.global_assumptions.clear()

4.3 More features of symbolic math (mixed goodies)

[ ]
x = sm.symbols('x')

Derivatives: Higher order derivatives are also available

[ ]
sm.Derivative('x**4',x,x, evaluate=True)

Alternatively,

[ ]
sm.diff('x**4',x,x)
[ ]
expr = sm.Derivative('x**4',x,x)
expr.doit()
[ ]
# With two variables
y = sm.symbols('y')
sm.diff('(x**2)*log(y)*exp(y)',x,y)

Integrals:

[ ]
sm.Integral(sm.exp(-x), (x, 0, sm.oo))
[ ]
sm.integrate(sm.exp(-x), (x, 0, sm.oo))

Limits:

[ ]
c = sm.symbols('c')
rho = sm.symbols('rho')

# Write up the definition of your limit
sm.Limit((c**(1-rho)-1)/(1-rho),rho,1)
[ ]
# Evaluate the limit
sm.limit((c**(1-rho)-1)/(1-rho),rho,1)

Integers:

[ ]
X = sm.Integer(7)/sm.Integer(3)
Y = sm.Integer(3)/sm.Integer(8)
display(X)
display(Y)
Z = 3
(X*Y)**Z

Simplify:

[ ]
expr = sm.sin(x)**2 + sm.cos(x)**2
display(expr)
[ ]
sm.simplify(expr)

Solve multiple equations at once:

[ ]
x = sm.symbols('x')
y = sm.symbols('y')
Eq1 = sm.Eq(x**2+y-2,0)
Eq2 = sm.Eq(y**2-4,0)
display(Eq1)
display(Eq2)
[ ]
# Solve the system
sol = sm.solve([Eq1,Eq2],[x,y])

# print all solutions
for xy in sol:
    print(f'(x,y) = ({xy[0]},{xy[1]})')

4.4 Solving matrix equations symbolically (+)

$$Ax = b$$

Construct a symbolic matrix:

[ ]
A_sm = numecon_linalg.construct_sympy_matrix(['11','12','21','22','32','33']) # somewhat complicated function
A_sm
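
The helper itself is not shown here, but a minimal sketch of what it might do is to place symbols $a_{ij}$ at the listed positions and zeros elsewhere (the actual numecon_linalg implementation may differ):

[ ]
def construct_sympy_matrix_sketch(positions,shape=(3,3)):
    """ build a sympy matrix with symbols a_ij at the given positions and zeros elsewhere """
    M = sm.zeros(*shape)
    for pos in positions:
        i,j = int(pos[0])-1,int(pos[1])-1
        M[i,j] = sm.symbols(f'a_{pos}')
    return M

construct_sympy_matrix_sketch(['11','12','21','22','32','33'])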

Find the inverse symbolically:

[ ]
A_sm_inv = A_sm.inv()
A_sm_inv

Fill in the numeric values:

[ ]
A_inv_num = numecon_linalg.fill_sympy_matrix(A_sm_inv,A) # somewhat complicated function
x = A_inv_num@b
print('solution:',x)

Note: The inverse multiplied by the determinant looks nicer...

[ ]
A_sm_det = A_sm.det()
A_sm_det
[ ]
A_sm_inv_raw = sm.simplify(A_sm_inv*A_sm_det)
A_sm_inv_raw

4.5 Newton's method and symbolic math

[ ]
# Another use case of our symbolic math
x = sm.symbols('x')
func = -x**3 + 2*x**2 + 4*x + 30
dfunc = sm.diff(func, x)
d2func = sm.diff(dfunc, x)

display(func)
display(dfunc)
display(d2func)
[ ]
# Lambdify
f = sm.lambdify((x), func)
df = sm.lambdify((x), dfunc)
d2f = sm.lambdify((x), d2func)
[ ]
x, i = find_root(-2,f,df,method='newton', full_info=False)
print(f'Iterations: {i}, root = {x}')

Notice how the flat region tricks both Newton's and Halley's methods.
In particular, Halley's method does better if it is started to the right of the root rather than to the left.

[ ]
plot_find_root(8,f,df,method='newton')
[ ]
x,i = find_root(-5,f,df,d2f,method='halley')
print(i,x,f(x))
[ ]
plot_find_root(-2,f,df,d2f,method='halley', vline=True)

Sympy can actually tell us that there are many solutions to an equation:

[ ]
x = sm.symbols('x')
sm.solveset(sm.sin(x), x)

5. Summary

This lecture:

  1. Solving matrix equations (directly, decomposition, iterative)
  2. Symbolic solutions (substitution, derivative, solution)
  3. Root-finding (one dimension, multiple dimensions, Newton's method, bisection)

Your work: Play around with the code in this notebook before solving the problem set. Especially, try out the various scipy functions used.

Next lecture: Numerical optimization.