The power of NumPy

Nikos DetsikasNovember 6, 2016Image Processing, OpenCV, Python developmentNo comments

After writing the post on the remap function, I thought of doing the same in Python. That is, to use the remap function to make a waving flag.

It did not take much to rewrite the code in Python. You can find all python versions here.

However, simply translating the C++ code into python, revealed a big problem. The whole thing became unbelievably slow. I mean, ridiculously slow!

The frame rate (it is not a real frame rate, but I like to call it that way since we are doing an animation here) dropped to something like 1 frame every few seconds.

So, what is the problem here? Let’s get a closer look. Here is the python code for reference

#!/Users/me/anaconda2/envs/opencv/bin/python
import cv2
import sys
import numpy as np
import time

def rippleEffect (inputImage, outputImage, mapX, mapY):
    cv2.remap(inputImage, mapX, mapY, cv2.INTER_LINEAR, outputImage, cv2.BORDER_REPLICATE)

if len(sys.argv)!=2:
    print "Usage "+sys.argv[0]+" <filename>"
    sys.exit()

# Read input image
inputImage = cv2.imread(sys.argv[1])
cv2.imshow("Input image", inputImage)
cv2.waitKey(0)

# Create output image
outputImage = np.zeros((inputImage.shape[0], inputImage.shape[1],
                        inputImage.shape[2]), inputImage.dtype)

# Ripple paramters
A = 10.0                        # Ripple magnitute
f = np.pi*2/inputImage.shape[0]     # Ripple frequency
p = 3.0*np.pi/4.0               # Ripple phase
direction = (1.0/5.0)*np.pi/2.0 # Ripple direction

mapX = np.zeros((inputImage.shape[0], inputImage.shape[1]), dtype=np.float32)
mapY = np.zeros((inputImage.shape[0], inputImage.shape[1]), dtype=np.float32)

while(1):
    for m in range(100):
        p = m*5.0*np.pi/100 + np.pi
        begin = time.clock()
        for j in range(inputImage.shape[0]):
            for i in range(inputImage.shape[1]):
                mapX[j,i] = i+A*(i/inputImage.shape[1])*np.sin(f*j+p)*np.cos(direction)
                mapY[j,i] = j+A*np.sin(f*i+p)*np.sin(direction)
        end = time.clock()
        print "Duration of map filling: "+str((end-begin)*1000)+"ms"
        begin2 = time.clock()
        rippleEffect(inputImage, outputImage, mapX, mapY)
        end2 = time.clock()
        print "Duration of ripple application: "+str((end2-begin2)*1000)+"ms"
        cv2.imshow("Output image", outputImage)
        cv2.waitKey(1)

Here are the average time measurements:

map filling: 2.5s
remap function: 1.49ms

The respective C++ measurements are:

map filling: 4.89ms
remap function: 1.02ms

The above measurements are not the most accurate measurements the world has ever seen. However, they are accurate enough to compare the methods across different languages.

It makes sense for the remap function to cost the same, in both C++ and python, as all OpenCV algorithms are implemented in C++ irrespective of the language used (you can have a look here for an explanation).

However, the map filling is about 500 times slower in Python.

And here is where NumPy comes into the picture.

As the introduction of that page says: “Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined.”

Translation: if you are using for loops to access arrays of data, you have to reconsider.

Here is how the above code would have been with NumPy instead of the for loops

#!/Users/me/anaconda2/envs/opencv/bin/python
import cv2
import sys
import numpy as np
import time

def rippleEffect (inputImage, outputImage, mapX, mapY):
    cv2.remap(inputImage, mapX.astype(np.float32), mapY.astype(np.float32), cv2.INTER_LINEAR, outputImage, cv2.BORDER_REPLICATE)

if len(sys.argv)!=2:
    print "Usage "+sys.argv[0]+" <filename>"
    sys.exit()

# Read input image
inputImage = cv2.imread(sys.argv[1])
cv2.imshow("Input image", inputImage)
cv2.waitKey(0)

# Create output image
outputImage = np.zeros((inputImage.shape[0], inputImage.shape[1],
                        inputImage.shape[2]), inputImage.dtype)

# Ripple parameters
A = 10.0                        # Ripple magnitute
omega = np.pi*2/inputImage.shape[0]     # Ripple frequency
p = 3.0*np.pi/4.0               # Ripple phase
direction = (1.0/5.0)*np.pi/2.0 # Ripple direction
rippleFrames = 100

# Integer ranges used to construct the mapping matrices
jrange = np.arange(inputImage.shape[0])
irange = np.arange(inputImage.shape[1])
while(1):
    for m in range(rippleFrames):
        p = m*2.0*np.pi/rippleFrames + np.pi
        begin = time.clock()

        #mapX matrix construction
        mapX1 = np.tile(A*np.cos(direction)*irange/inputImage.shape[1] , (inputImage.shape[0],1))
        mapX2 = np.tile((np.sin(omega*jrange+p)), (inputImage.shape[1],1)).transpose()
        mapX = np.multiply(mapX1, mapX2) + np.tile(irange, (inputImage.shape[0],1))
        
        #mapY matrix construction
        mapY1 = np.tile(A*np.sin(omega*irange+p)*np.sin(direction), (inputImage.shape[0],1))
        mapY2 = np.tile(jrange, (inputImage.shape[1],1)).transpose()
        mapY = mapY1 + mapY2

        end = time.clock()
        print "Duration of map filling: "+str((end-begin)*1000)+"ms"
        
        begin2 = time.clock()
        rippleEffect(inputImage, outputImage, mapX, mapY)
        end2 = time.clock()
        print "Duration of ripple application: "+str((end2-begin2)*1000)+"ms"
        
        cv2.imshow("Output image", outputImage)
        cv2.waitKey(1)

Of course, the external for loop, the one that is looping over the animation frames, cannot go away.

What is important here is lines 38-46. Within these lines, we create the two maps using NumPy functions, instead of looping over their elements.

The above process is also called vectorization or array programming.

Care to see the result? Here it is

map filling: 2.32ms
remap function: 1.49ms

Unbelievable! I would say that it is even faster than C++. I guess it is because NumPy is really optimized for matrix operations and arranges data in an efficient manner. Even if it implements some of its algorithms in C or C++, I am pretty sure it does extra optimizations on top. In any case, looking it up on the web will provide with a lot of info on the subject.

If you are not familiar with the NumPy functions I have used, here is a list of them with references to their documentation:

zeros
pi (constant, the same as in SciPy)
float32 (data type)
arange
tile
cos
sin
multiply
transpose
astype

Explaining how I turned the for loops into NumPy code is beyond the scope of this post. It is easy to observe that whenever we have to face a similar problem, one where data can be stored, processed and accessed in multi-dimensional containers, it is natural for us to think in for loops. Our mind processes it this way. However, with python, not only is it a great exercise but it is an absolute necessity, to turn those loops into NumPy matrix operations.

My implementation above may not be the optimal. Anyone who has a better solution is very welcome to post it in the comments section. I would be glad to time it and post the results.

Finally, it turns out that one post leads to another! While doing all these, I came across the convertMaps OpenCV function, which seems to go hand in hand with remap. I am doing a future post on that as well, as I could not find any examples out there.

Thanks for reading!

The power of NumPy

Nikos Detsikas

Sign up to our newsletter!

Search

Recent Posts

Share this:

Nikos Detsikas

Sign up to our newsletter!

Search

Recent Posts