Posted on January 15, 2016
In a previous post we numerically solved the heat equation with an explicit finite difference formulation, one of the simpler 2D problems in numerical analysis.
In this post we are going to speed up that computation with some vectorization. Results show we can more than double the speed of our code with some simple changes (but also that we can make it dangerously slow if implemented incorrectly).
The code examples below are in MATLAB, but you can take advantage of vectorization with other interpreted languages such as Python as well. You can read more about vectorization of MATLAB code here.
Step 1: Initialize VariablesNot much changes in this section except that we will replace nx and ny with np and use a square mesh for simplicity.
%Initialize Variables dt=.005; %time step nts=500; %number of time steps alpha=.01; %thermal diffusivity l=1; %reference length plate np=50; %number of points (x and y) dx=l/(np-1); dy=dx; %discretized distance %Temperature and derivatives stored in 2D arrays u = zeros(np); uxx = zeros(np); uyy = zeros(np);Step 2: Apply Boundary Conditions
In the following sections we will use snippets outlined in yellow to show the original code, green to show standard vectorization, and red to show sliced vectorization. In the original code (below) we looped through the two edges to apply the boundary conditions
%Boundary Conditions, original loop for i=1:np u(1,i)=1; end for i=1:np u(i,np)=1; end
This is a one-time computation so there isn’t significant time saved by changing this, but regardless we can do better by both simplifying and speeding up our code.
%Boundary Conditions, % Vectorization without slicing u(1,:)=u(1,:)+1; u(:,np)=u(:,np)+1; u(1,np)=1;
We had to correct the value of at the coordinate $(1,np)$ because are adding to it twice. So why didn’t we just take a slice of the array, like 2:np? Here is what it looks like with sliced vectorization
%Boundary Conditions, vectorization with slicing part=2:np; u(1,:)=u(1,:)+1; u(part,np)=u(part,np)+1;
This looks a little nicer, but actually has a significant impact on run time. Results of a benchmark comparison with 10,000,000 test cases is shown in the table below
Original loop | 13.46 s |
Vectorization with slicing | 33.29 s |
Vectorization without slicing | 10.29 s |
So while sliced vectorization looks nice and doesn't require a correction, it more than triples execution time.
Step 3: Time LoopThe more important changes come with the code inside of the time loop, so let's shift our focus there. Here is the original code.
% Original Loop for t=1:nts %Calc the Spatial 2nd Derivative, Centered in Space for i=2:np-1 for j=2:np-1 uxx(i,j)=(u(i+1,j)-2*u(i,j)+u(i-1,j))/(dx)^2; uyy(i,j)=(u(i,j+1)-2*u(i,j)+u(i,j-1))/(dy)^2; end end %Update Temperature, Forwards in Time for i=2:np-1 for j=2:np-1 u(i,j)=u(i,j)+alpha*(uxx(i,j)+uyy(i,j))*dt; end end end
We can simplify this code greatly with vectorization. The code below requires corrections, but improves run time drastically.
% Vectorization without slicing for t=1:nts %Calc the Spatial 2nd Derivative, Centered in Space for i=2:np-1 uxx(i,:)=(u(i+1,:)-2*u(i,:)+u(i-1,:))/(dx)^2; uyy(:,i)=(u(:,i+1)-2*u(:,i)+u(:,i-1))/(dy)^2; end uxx(2,1)=0; %correction uyy(np,np-1)=0; %correction % Note, we do not need to correct other BC points since the uxx,uyy=0 - it would be safer to tho %Update Temperature, Forwards in Time u=u+alpha*(uxx+uyy)*dt; end
Note that we had to once again make a correction after the vectorized calculations. Here is what it looks like with sliced vectorization that wouldn't require corrections
% Vectorization with slicing part=2:np-1; for t=1:nts %Calc the Spatial 2nd Derivative, Centered in Space for i=2:np-1 uxx(i,part)=(u(i+1,part)-2*u(i,part)+u(i-1,part))/(dx)^2; uyy(part,i)=(u(part,i+1)-2*u(part,i)+u(part,i-1))/(dy)^2; end %Update Temperature, Forwards in Time u=u+alpha*(uxx+uyy)*dt; end
Once again this looks much nicer but as we see below, the slices are once again much slower.
The comparison of all three methods run 30 times with 5000 time steps each are shown in the following table.
Original loop | 25.48 s |
Vectorization with slicing | 84.31 s |
Vectorization without slicing | 10.68 s |
With properly implemented vectorization we can drastically speed up our code, but as the results show it must be done carefully and tested to ensure both accuracy (by making the proper corrections) and speed.