Speeding up Code with Vectorization


Posted on January 15, 2016


In a previous post we numerically solved the heat equation with an explicit finite difference formulation, one of the simpler 2D problems in numerical analysis.

In this post we are going to speed up that computation with some vectorization. Results show we can more than double the speed of our code with some simple changes (but also that we can make it dangerously slow if implemented incorrectly).

The code examples below are in MATLAB, but you can take advantage of vectorization with other interpreted languages such as Python as well. You can read more about vectorization of MATLAB code here.

Step 1: Initialize Variables

Not much changes in this section except that we will replace nx and ny with np and use a square mesh for simplicity.

%Initialize Variables
dt=.005;                 %time step
nts=500;                 %number of time steps
alpha=.01;               %thermal diffusivity
l=1;                     %reference length plate
np=50;                   %number of points (x and y)
dx=l/(np-1); dy=dx;      %discretized distance

%Temperature and derivatives stored in 2D arrays
u = zeros(np); uxx = zeros(np); uyy = zeros(np);

Step 2: Apply Boundary Conditions

In the following sections we will use snippets outlined in yellow to show the original code, green to show standard vectorization, and red to show sliced vectorization. In the original code (below) we looped through the two edges to apply the boundary conditions

%Boundary Conditions, original loop
for i=1:np
   u(1,i)=1;
end

for i=1:np
   u(i,np)=1;
end

This is a one-time computation so there isn’t significant time saved by changing this, but regardless we can do better by both simplifying and speeding up our code.

%Boundary Conditions, % Vectorization without slicing
u(1,:)=u(1,:)+1;
u(:,np)=u(:,np)+1;
u(1,np)=1;

We had to correct the value of at the coordinate $(1,np)$ because are adding to it twice. So why didn’t we just take a slice of the array, like 2:np? Here is what it looks like with sliced vectorization

%Boundary Conditions, vectorization with slicing
part=2:np;
u(1,:)=u(1,:)+1;
u(part,np)=u(part,np)+1;

This looks a little nicer, but actually has a significant impact on run time. Results of a benchmark comparison with 10,000,000 test cases is shown in the table below

Original loop 13.46 s
Vectorization with slicing 33.29 s
Vectorization without slicing 10.29 s

So while sliced vectorization looks nice and doesn't require a correction, it more than triples execution time.

Step 3: Time Loop

The more important changes come with the code inside of the time loop, so let's shift our focus there. Here is the original code.

% Original Loop
for t=1:nts

   %Calc the Spatial 2nd Derivative, Centered in Space
   for i=2:np-1
      for j=2:np-1
         uxx(i,j)=(u(i+1,j)-2*u(i,j)+u(i-1,j))/(dx)^2;
         uyy(i,j)=(u(i,j+1)-2*u(i,j)+u(i,j-1))/(dy)^2;
      end
   end

   %Update Temperature, Forwards in Time
   for i=2:np-1
      for j=2:np-1
         u(i,j)=u(i,j)+alpha*(uxx(i,j)+uyy(i,j))*dt;
      end
   end
end

We can simplify this code greatly with vectorization. The code below requires corrections, but improves run time drastically.

% Vectorization without slicing
for t=1:nts

   %Calc the Spatial 2nd Derivative, Centered in Space
    for i=2:np-1
        uxx(i,:)=(u(i+1,:)-2*u(i,:)+u(i-1,:))/(dx)^2;
        uyy(:,i)=(u(:,i+1)-2*u(:,i)+u(:,i-1))/(dy)^2;
    end
    uxx(2,1)=0; %correction
    uyy(np,np-1)=0; %correction
    % Note, we do not need to correct other BC points since the uxx,uyy=0 - it would be safer to tho
    
    %Update Temperature, Forwards in Time
    u=u+alpha*(uxx+uyy)*dt;
end

Note that we had to once again make a correction after the vectorized calculations. Here is what it looks like with sliced vectorization that wouldn't require corrections

% Vectorization with slicing
part=2:np-1;
for t=1:nts

    %Calc the Spatial 2nd Derivative, Centered in Space
    for i=2:np-1
        uxx(i,part)=(u(i+1,part)-2*u(i,part)+u(i-1,part))/(dx)^2;
        uyy(part,i)=(u(part,i+1)-2*u(part,i)+u(part,i-1))/(dy)^2;
    end
        
    %Update Temperature, Forwards in Time
    u=u+alpha*(uxx+uyy)*dt;
end

Once again this looks much nicer but as we see below, the slices are once again much slower.

The comparison of all three methods run 30 times with 5000 time steps each are shown in the following table.

Original loop 25.48 s
Vectorization with slicing 84.31 s
Vectorization without slicing 10.68 s

With properly implemented vectorization we can drastically speed up our code, but as the results show it must be done carefully and tested to ensure both accuracy (by making the proper corrections) and speed.