Getting Started with Dolphin
Manipulating dolphin.dtype:

dolphin.dtype is a class that represents the data type of a dolphin.darray object. It is similar to numpy's dtype and acts as a bridge between numpy types and CUDA types. It currently supports the following operations:
Creating dolphin.dtype:

There are several ways to create a dolphin.dtype object:

```python
import dolphin as dp
import numpy as np

d = dp.dtype.float32
print(d)  # float

# Create a dtype from a numpy dtype
d = dp.dtype.from_numpy_dtype(np.float32)
```
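The bridge between numpy types and CUDA types can be pictured as a simple lookup table. The sketch below is purely illustrative: the exact CUDA type names Dolphin uses internally are an assumption here, except for float32, which prints as float in the example above.

```python
import numpy as np

# Hypothetical numpy-to-CUDA type-name table (illustration only).
# Only float32 -> "float" is confirmed by the print() output above;
# the other entries are standard CUDA C names and are assumptions.
NUMPY_TO_CUDA = {
    np.dtype(np.float32): "float",
    np.dtype(np.float64): "double",
    np.dtype(np.int32): "int",
    np.dtype(np.uint8): "unsigned char",
}

def cuda_type_name(np_dtype) -> str:
    """Return the CUDA C type name for a numpy dtype (sketch)."""
    return NUMPY_TO_CUDA[np.dtype(np_dtype)]

print(cuda_type_name(np.float32))  # float
```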
Manipulating dolphin.darray:

Creating dolphin.darray:

There are several ways to create a dolphin.darray object:

```python
import dolphin as dp
import numpy as np

# Create a darray from a numpy array
a = np.arange(10).astype(np.float32)
d = dp.darray(array=a)

# Create a zero-filled darray
d = dp.zeros(shape=(10,), dtype=dp.float32)

# Create an empty darray
d = dp.empty(shape=(10,), dtype=dp.float32)
# or
d = dp.darray(shape=(10,), dtype=dp.float32)

# Create a zero-filled darray like another
d = dp.zeros_like(d)

# Create an empty darray like another
d = dp.empty_like(d)

# Create a one-filled darray
d = dp.ones(shape=(10,), dtype=dp.float32)

# Create a one-filled darray like another
d = dp.ones_like(d)
```
Numpy-Dolphin interoperability:

You can convert a dolphin.darray object to a numpy array using the method dolphin.darray.to_numpy(). You can also convert a numpy array to a dolphin.darray object using the function dolphin.from_numpy().

```python
import dolphin as dp
import numpy as np

# numpy to darray using the dolphin constructor
a = np.arange(10).astype(np.float32)
d = dp.darray(array=a)

# Convert a darray to a numpy array
a = d.to_numpy()

# Convert a numpy array to a darray.
# The numpy array and the darray need to
# have the same dtype and shape.
d = dp.from_numpy(a)
```
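The shape-and-dtype constraint can be understood as a copy into a pre-allocated device buffer: a mismatch would force a reallocation, which the pre-allocated buffer exists to avoid. Here is a minimal CPU-side sketch of that contract, using a toy DeviceBuffer class (a hypothetical stand-in, not Dolphin's actual implementation):

```python
import numpy as np

class DeviceBuffer:
    """Toy stand-in for a pre-allocated darray (illustration only)."""
    def __init__(self, shape, dtype):
        self.data = np.empty(shape, dtype=dtype)

    def from_numpy(self, array: np.ndarray) -> None:
        # The source must match the buffer exactly: a different
        # shape or dtype would require reallocating device memory.
        if array.shape != self.data.shape or array.dtype != self.data.dtype:
            raise ValueError("array and buffer must share shape and dtype")
        np.copyto(self.data, array)

buf = DeviceBuffer((10,), np.float32)
buf.from_numpy(np.arange(10, dtype=np.float32))  # OK: shapes and dtypes match
```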
Transpose dolphin.darray:

Transposing a dolphin.darray object is easy and works like numpy. You can use the method dolphin.darray.transpose(), the shortcut dolphin.darray.T, or the function dolphin.transpose().

```python
import dolphin as dp

d = dp.darray(shape=(4, 3, 2), dtype=dp.float32)
print(d.shape)  # (4, 3, 2)

t = d.transpose(1, 0, 2)
print(t.shape)  # (3, 4, 2)

# You can also use the shortcut, which reverses the axes
t = d.T
print(t.shape)  # (2, 3, 4)

# Or dp.transpose
t = dp.transpose(src=d, axes=(2, 1, 0))
```
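Since the semantics follow numpy, the same axis permutations can be checked with plain numpy arrays:

```python
import numpy as np

a = np.empty((4, 3, 2), dtype=np.float32)

# transpose(1, 0, 2): swap the first two axes, keep the last
t = a.transpose(1, 0, 2)
print(t.shape)  # (3, 4, 2)

# .T reverses all axes, i.e. transpose(2, 1, 0)
r = a.T
print(r.shape)  # (2, 3, 4)
```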
Cast dolphin.darray:

As numpy implements the astype operation, Dolphin implements it as well, via the method dolphin.darray.astype(). Take a look at dolphin.dtype to see the supported types.

```python
import dolphin as dp

d = dp.darray(shape=(4, 3, 2), dtype=dp.float32)
print(d.dtype)  # float32

d = d.astype(dp.int32)
print(d.dtype)  # int32
```
Indexing dolphin.darray:

Indexing a dolphin.darray object is easy and works like numpy.

```python
import dolphin as dp
import numpy as np

n = np.random.rand(10, 10).astype(np.float32)
d = dp.darray(array=n)

d_1 = d[0:5, 0:5]
d_2 = d[5:10, 5:10]
```

Indexing works in both read and write mode:

```python
import dolphin as dp
import numpy as np

d = dp.zeros((4, 4))
d[0:2, 0:2] = 10
d[2:4, 2:4] = 20
print(d)
# array([[10., 10.,  0.,  0.],
#        [10., 10.,  0.,  0.],
#        [ 0.,  0., 20., 20.],
#        [ 0.,  0., 20., 20.]])
```
Operations with dolphin.darray:

Dolphin implements several operations with dolphin.darray objects:

```python
import dolphin as dp

d = dp.zeros((4, 4))
z = dp.ones((4, 4))

# Addition
d = d + z
d += 5

# Subtraction
d = d - z
d -= 5

# Multiplication
d = d * z
d *= 5

# Division
d = d / z
d /= 5
```
Manipulating dolphin.dimage:

As dolphin.dimage is a subclass of dolphin.darray, you can use all the methods and functions of dolphin.darray. On top of that, dolphin.dimage implements several methods and functions to manipulate images, as well as image-specific attributes.

Creating dolphin.dimage:

Creating a dolphin.dimage object is easy and works like dolphin.darray. The difference is the argument channel_format, which specifies the channel format of the image. It has to be a dolphin.dimage_channel_format value and defaults to dolphin.dimage_channel_format.DOLPHIN_BGR.

```python
import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)
# or
d = dp.dimage(array=image, channel_format=dp.dimage_channel_format.DOLPHIN_BGR)
```
Resizing dolphin.dimage:

With Dolphin, you can resize a dolphin.dimage object using two methods: dolphin.dimage.resize() and dolphin.dimage.resize_padding(). The first resizes the image without padding. The second resizes the image with padding, computed so that the aspect ratio of the image is preserved.

```python
import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)

# Resize without padding
a = d.resize((100, 100))
print(a.shape)  # (100, 100, 3)

# Resize with padding
b = d.resize_padding((100, 100), padding_value=0)
print(b.shape)  # (100, 100, 3)
```
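The padding geometry behind an aspect-ratio-preserving resize can be computed on the CPU as follows. This is a sketch of the usual letterbox arithmetic, and an assumption about Dolphin's internals; it illustrates the scale ratio and per-side padding that resize_padding has to derive (the helper letterbox_geometry is hypothetical).

```python
def letterbox_geometry(src_h, src_w, dst_h, dst_w):
    """Return (scale ratio, (dw, dh)): the ratio applied to the source
    and the per-side padding that centers it in the destination,
    preserving aspect ratio (sketch of the usual letterbox math)."""
    r = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * r), round(src_w * r)
    # The remaining space is split evenly between the two sides.
    dh = (dst_h - new_h) / 2
    dw = (dst_w - new_w) / 2
    return r, (dw, dh)

# A 1280x720 frame letterboxed into 640x640:
r, (dw, dh) = letterbox_geometry(720, 1280, 640, 640)
print(r, dw, dh)  # 0.5 0.0 140.0
```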
Normalizing dolphin.dimage:

With Dolphin, you can normalize a dolphin.dimage object using the method dolphin.dimage.normalize(). The normalization modes are defined by the Enum class dolphin.dimage_normalize_type; by default, the mode is dolphin.dimage_normalize_type.DOLPHIN_255. This method is optimized.

```python
import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)

# image / 255
a = d.normalize(dp.DOLPHIN_255)

# image / 127.5 - 1
b = d.normalize(dp.DOLPHIN_TF)

# (image - mean) / std
c = d.normalize(dp.DOLPHIN_MEAN_STD, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
```
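For reference, the three modes correspond to the following arithmetic, shown here in plain numpy (whether DOLPHIN_MEAN_STD scales the image to [0, 1] before subtracting the mean is an assumption based on the 0.5 mean/std values above):

```python
import numpy as np

image = np.array([[0, 127.5, 255]], dtype=np.float32)

# DOLPHIN_255: image / 255 -> values in [0, 1]
a = image / 255.0

# DOLPHIN_TF: image / 127.5 - 1 -> values in [-1, 1]
b = image / 127.5 - 1.0

# DOLPHIN_MEAN_STD: (image - mean) / std per channel; the prior
# scaling to [0, 1] is assumed, consistent with mean=std=0.5
mean, std = 0.5, 0.5
c = (image / 255.0 - mean) / std

print(a)  # [[0.  0.5 1. ]]
print(b)  # [[-1.  0.  1.]]
```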
Change channel format of dolphin.dimage:

The equivalent of cv2.cvtColor is dolphin.dimage.cvtColor(), which converts a dolphin.dimage object from one channel format to another. The channel formats are defined by the Enum class dolphin.dimage_channel_format.

```python
import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)

a = dp.dimage.cvtColor(d, dp.dimage_channel_format.DOLPHIN_GRAY_SCALE)  # BGR to GRAY
b = dp.dimage.cvtColor(d, dp.dimage_channel_format.DOLPHIN_RGB)         # BGR to RGB
```
Manipulating dolphin.Engine:

Creating dolphin.Engine:

dolphin.Engine is a TensorRT-based object. It is used to create, manage and run TensorRT engines. To create a dolphin.Engine object, you need to specify the path to an ONNX model or to a TensorRT engine. You can also specify several other arguments in order to customize the engine build.

```python
import dolphin as dp

# Create an engine from an onnx model
engine = dp.Engine(onnx_file_path="your_model.onnx")

# Create an engine from a TensorRT engine
engine = dp.Engine(engine_path="your_engine.trt")

# Create an engine from an onnx model and specify other arguments
engine = dp.Engine(onnx_file_path="your_model.onnx",
                   engine_path="your_engine.trt",
                   mode="fp16",
                   explicit_batch=True,
                   direct_io=False)
```
Running a dolphin.Engine:

Once a dolphin.Engine is created, you can run it using the method dolphin.Engine.infer(). This method takes a dictionary as argument that defines the inputs of the engine: the keys are the names of the engine inputs and the values are dolphin.darray objects. The method returns a dictionary with the outputs of the engine, or None (see below); the keys are the names of the engine outputs and the values are dolphin.darray objects.

dolphin.Engine internally uses dolphin.CudaTrtBuffers to efficiently buffer the inputs of the engine. The purpose is to avoid memory copies between host and device and to rather perform device-to-device copies, which are faster. By default, calling dolphin.Engine.infer() is batch-blocking: the method will not run the engine while the buffer is not full, which lets the user fill the buffer progressively. You can still force inference with the argument force_infer=True.

Here are some use cases of dolphin.Engine.infer().

```python
import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt")  # batch size = 1
input_dict = {
    "image": dp.zeros(shape=(224, 224, 3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict)  # The buffer is full, the engine is run
print(output)  # {"output": darray(shape=(1000,), dtype=float32)}
```

In case you want to use a batch size greater than 1:

```python
import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt")  # batch size = 2
input_dict = {
    "image": dp.zeros(shape=(224, 224, 3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict)  # batch-blocking: the buffer is not full yet
print(output)  # None

output = engine.infer(inputs=input_dict)  # The buffer is full, the engine is run
print(output)  # {"output": darray(shape=(2,1000), dtype=float32)}
```

Or you can force the inference:

```python
import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt")  # batch size = 2
input_dict = {
    "image": dp.zeros(shape=(224, 224, 3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict, force_infer=True)  # run even though the buffer is not full
print(output)  # {"output": darray(shape=(2,1000), dtype=float32)}
```

You can also feed a whole batch at once:

```python
import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt")  # batch size = 16
input_dict = {
    "image": dp.zeros(shape=(16, 224, 224, 3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict)  # The buffer is full, the engine is run
print(output)  # {"output": darray(shape=(16,1000), dtype=float32)}
```
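The batch-blocking behaviour described above can be pictured with a toy buffer: infer() returns None until batch_size inputs have accumulated, unless force_infer is set. This is an illustrative model of the semantics only (ToyEngine is hypothetical, not Dolphin's implementation):

```python
import numpy as np

class ToyEngine:
    """Minimal model of batch-blocking inference (illustration only)."""
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []

    def infer(self, inputs, force_infer=False):
        self.buffer.append(inputs["image"])
        if len(self.buffer) < self.batch_size and not force_infer:
            return None  # batch-blocking: buffer not full yet
        batch = np.stack(self.buffer)
        self.buffer.clear()
        # A real engine would run TensorRT here; we just return
        # a zero output with the accumulated batch dimension.
        return {"output": np.zeros((batch.shape[0], 1000), dtype=np.float32)}

engine = ToyEngine(batch_size=2)
first = engine.infer({"image": np.zeros((224, 224, 3))})
print(first)  # None
second = engine.infer({"image": np.zeros((224, 224, 3))})
print(second["output"].shape)  # (2, 1000)
```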
Full example

You can go to the examples folder to see a full example of how to use the library. Here, we will go step by step through Yolov7 inference using Dolphin.

1. Preprocessing

Most of the time, we underestimate the latency of preprocessing and try to find ways to accelerate the inference part, which would make sense if the bottleneck were indeed the inference time. In reality, in real-time applications, it often happens that your fps are drastically lower than expected because of pre/post-processing. In this example, Yolov7 needs images to be resized with the dp.dimage.resize_padding() method, in order to keep the original aspect ratio of the image, and to be normalized.

A good practice is to resize your image first, before doing any further processing, in order to limit the amount of data processed at a time. Keep in mind that it is much better to pre-allocate the dp.darray and dp.dimage objects, in order not to perform memory allocations in the core of your application. This is what we will be doing here.
```python
import cv2
import dolphin as dp

stream = cv2.VideoCapture("your_video.mp4")

# We need to know the size of the frame
width = int(stream.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(stream.get(cv2.CAP_PROP_FRAME_HEIGHT))

# As OpenCV reads HWC uint8 images, we allocate the
# corresponding dp.dimage
d_frame = dp.dimage(shape=(height, width, 3), dtype=dp.uint8)

# Yolov7 processes CHW images directly, so we have to
# transpose the array: we pre-allocate where the
# transposed, reordered data will be stored
transposed_frame = dp.dimage(shape=(3, height, width),
                             dtype=dp.uint8)

# We also pre-allocate the resized image;
# (640, 640) is the size Yolov7 works with
resized_frame = dp.dimage(shape=(3, 640, 640),
                          dtype=dp.uint8)

# Once the image is correctly formatted (3x640x640 uint8),
# we need to normalize it to 0 <= image <= 1. To do so, we
# use the dp.DOLPHIN_255 flag, which writes float32 data
inference_frame = dp.dimage(shape=(3, 640, 640),
                            dtype=dp.float32)
```
2. Inference

For a 1080p stream, we have thus pre-allocated roughly 18 MB, speeding up the preprocessing by avoiding on-the-fly allocations. Let us now go through the inference part.

```python
# We now instantiate our AI model as a TensorRT engine
engine = dp.Engine("your_model.onnx",
                   "your_model.engine",
                   mode="fp16",
                   verbosity=True)

while True:
    # We read the next frame from the video stream
    ret, frame = stream.read()
    if not ret:
        break

    # We copy the OpenCV frame onto the GPU
    d_frame.from_numpy(frame)

    # We process the frame
    # 1. We transpose the frame from HWC to CHW
    transposed_frame = d_frame.transpose(2, 0, 1)

    # 2. We perform the padding resize
    _, r, dwdh = dp.resize_padding(src=transposed_frame,
                                   shape=(640, 640),
                                   dst=resized_frame)

    # 3. We swap the channels to turn our BGR image into RGB
    dp.cvtColor(src=resized_frame,
                color_format=dp.DOLPHIN_RGB,
                dst=resized_frame)

    # 4. We normalize the frame as described above
    dp.normalize(src=resized_frame,
                 dst=inference_frame,
                 normalize_type=dp.DOLPHIN_255)

    # 5. We finally run our model
    output = engine.infer({"images": inference_frame})
```