Welcome to Dolphin’s documentation!
Documentation version 0.0.10
Dolphin
A python package for GPU-accelerated data processing for TensorRT inference. Dolphin notably provides a set of functions to manipulate GPU arrays (dolphin.darray) and images (dolphin.dimage) and TensorRT functions (dolphin.Engine).
Official documentation :
https://dolphin-python.readthedocs.io/en/latest/
This package relies heavily on PyCUDA, the Python bindings for CUDA, available at https://github.com/inducer/pycuda and documented at https://documen.tician.de/pycuda/, and on TensorRT, available at https://developer.nvidia.com/tensorrt.
Installation
Dolphin is available on PyPI. You can install it using pip:
pip install dolphin-python
and import it in your Python code using import dolphin.
Core Concepts
Dolphin is a tool for CUDA-accelerated computing in a deep learning context. It is designed to be used as a library. Its purpose is to gather the most common optimisation techniques for deep learning inference and make them available in a simple and easy to use interface.
Overview
Dolphin provides a set of classes implementing CUDA-accelerated operations.
The base class dolphin.darray is an object manipulating a CUDA array. It provides a set of numpy-like methods to perform operations on the array.
The dolphin.dimage part of the library is dedicated to image processing. It provides a set of methods to manipulate images, focusing on the operations that gain the most speed compared to common CPU implementations.
Simplicity
Dolphin is meant to provide a simple, easy-to-use interface for deep learning inference applications by centralizing the most common optimisation techniques in a single library.
Disclaimer
This library is currently under development. The API might not be stable yet. Some features might be missing, some might be broken, and some might not be optimized yet. You are very welcome to contribute to this project. Be kind, be constructive, be open.
Getting Started with Dolphin
Manipulating dolphin.dtype:
dolphin.dtype is a class that represents the data type of a dolphin.darray object. It is similar to numpy’s dtype. It is used as a bridge between numpy types and CUDA types. It currently supports the following operations:
Creating dolphin.dtype:
There are several ways to create a dolphin.dtype object:
import dolphin as dp
import numpy as np
d = dp.dtype.float32
print(d) # float
# Create a dtype from a numpy dtype
d = dp.dtype.from_numpy_dtype(np.float32)
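To picture what a bridge between numpy types and CUDA types involves, here is a minimal, library-independent sketch. The ScalarType enum, its members and its helper are hypothetical and are not Dolphin's implementation; they only show the idea of pairing a numpy type name with the C type name used in a CUDA kernel and with an item size.

```python
from enum import Enum


class ScalarType(Enum):
    """Hypothetical pairing of (numpy name, CUDA C name, size in bytes)."""

    UINT8 = ("uint8", "uint8_t", 1)
    INT32 = ("int32", "int32_t", 4)
    FLOAT32 = ("float32", "float", 4)
    FLOAT64 = ("float64", "double", 8)

    @property
    def numpy_name(self) -> str:
        return self.value[0]

    @property
    def cuda_name(self) -> str:
        return self.value[1]

    @property
    def itemsize(self) -> int:
        return self.value[2]

    @classmethod
    def from_numpy_name(cls, name: str) -> "ScalarType":
        # Look up the member whose numpy name matches
        for member in cls:
            if member.numpy_name == name:
                return member
        raise ValueError(f"unsupported type: {name}")


t = ScalarType.from_numpy_name("float32")
print(t.cuda_name, t.itemsize)  # float 4
```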
Manipulating dolphin.darray:
Creating dolphin.darray:
There are several ways to create a dolphin.darray object:
import dolphin as dp
import numpy as np
# Create a darray from a numpy array
a = np.arange(10).astype(np.float32)
d = dp.darray(array=a)
# Create a zero-filled darray
d = dp.zeros(shape=(10,), dtype=dp.float32)
# Create an empty darray
d = dp.empty(shape=(10,), dtype=dp.float32)
# or
d = dp.darray(shape=(10,), dtype=dp.float32)
# Create a zeros darray like another
d = dp.zeros_like(d)
# Create an empty darray like another
d = dp.empty_like(d)
# Create a ones darray
d = dp.ones(shape=(10,), dtype=dp.float32)
# Create a ones darray like another
d = dp.ones_like(d)
Numpy-Dolphin interoperability:
You can convert a dolphin.darray object to a numpy array using the method dolphin.darray.to_numpy(). You can also convert a numpy array to a dolphin.darray object using the function dolphin.from_numpy().
import dolphin as dp
import numpy as np
# numpy to darray using dolphin constructor
a = np.arange(10).astype(np.float32)
d = dp.darray(array=a)
# Convert a darray to a numpy array
a = d.to_numpy()
# Convert a numpy array to a darray
# numpy array and darray need to
# have the same dtype and shape.
d = dp.from_numpy(a)
Transpose dolphin.darray:
Transposing a dolphin.darray object is easy and works like numpy. You can use the method dolphin.darray.transpose(), the shortcut dolphin.darray.T or the function dolphin.transpose().
import dolphin as dp
d = dp.darray(shape=(4, 3, 2), dtype=dp.float32)
print(d.shape) # (4, 3, 2)
t = d.transpose(1, 0, 2)
print(t.shape) # (3, 4, 2)
# You can also use the shortcut, which reverses the order of the axes
t = d.T
print(t.shape) # (2, 3, 4)
# Or dp.transpose
t = dp.transpose(src=d, axes=(2, 1, 0))
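The effect of a transpose on the shape can be checked without a GPU. The sketch below is plain Python and does not call Dolphin; permute_shape is a hypothetical helper that only reorders a shape tuple the way the axes argument describes, and reverses it when no axes are given, as the T shortcut does.

```python
from typing import Tuple


def permute_shape(shape: Tuple[int, ...],
                  axes: Tuple[int, ...] = ()) -> Tuple[int, ...]:
    """Return the shape obtained by reordering `shape` according to `axes`."""
    if not axes:
        # No axes given: reverse the order of the dimensions
        axes = tuple(reversed(range(len(shape))))
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of the dimensions")
    return tuple(shape[axis] for axis in axes)


print(permute_shape((4, 3, 2), (1, 0, 2)))  # (3, 4, 2)
print(permute_shape((4, 3, 2)))             # (2, 3, 4)
```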
Cast dolphin.darray:
Like numpy, Dolphin implements the astype operation, through the method dolphin.darray.astype(). Take a look at dolphin.dtype to see the supported types.
import dolphin as dp
d = dp.darray(shape=(4, 3, 2), dtype=dp.float32)
print(d.dtype) # float32
d = d.astype(dp.int32)
print(d.dtype) # int32
Indexing dolphin.darray:
Indexing a dolphin.darray object is easy and works like numpy.
import dolphin as dp
import numpy as np
n = np.random.rand(10, 10).astype(np.float32)
d = dp.darray(array=n)
d_1 = d[0:5, 0:5]
d_2 = d[5:10, 5:10]
Indexing works in read and write mode :
import dolphin as dp
import numpy as np
d = dp.zeros((4, 4))
d[0:2, 0:2] = 10
d[2:4, 2:4] = 20
print(d)
# array([[10., 10., 0., 0.],
# [10., 10., 0., 0.],
# [ 0., 0., 20., 20.],
# [ 0., 0., 20., 20.]])
Operations with dolphin.darray:
Dolphin implements several operations with dolphin.darray objects:
import dolphin as dp
d = dp.zeros((4, 4))
z = dp.ones((4, 4))
# Addition
d = d + z
d += 5
# Subtraction
d = d - z
d -= 5
# Multiplication
d = d * z
d *= 5
# Division
d = d / z
d /= 5
Manipulating dolphin.dimage:
As dolphin.dimage is a subclass of dolphin.darray, you can use all the methods and functions of dolphin.darray. On top of that, dolphin.dimage implements several methods and functions to manipulate images as well as image-specific attributes.
Creating dolphin.dimage:
Creating a dolphin.dimage object is easy and works like dolphin.darray. The difference is the channel format argument (channel_format in the example below), which specifies the channel format of the image. It has to be a dolphin.dimage_channel_format; the default is dolphin.dimage_channel_format.DOLPHIN_BGR.
import dolphin as dp
import cv2
image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)
# or
d = dp.dimage(array=image, channel_format=dp.dimage_channel_format.DOLPHIN_BGR)
Resizing dolphin.dimage:
With Dolphin, you can resize a dolphin.dimage object using two methods: dolphin.dimage.resize() and dolphin.dimage.resize_padding(). The first one resizes the image without padding. The second one resizes the image with padding, which is computed so that the aspect ratio of the image is kept.
import dolphin as dp
import cv2
image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)
# Resize without padding
a = d.resize((100, 100))
print(a.shape) # (100, 100, 3)
# Resize with padding
b = d.resize_padding((100, 100), padding_value=0)
print(b.shape) # (100, 100, 3)
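The geometry behind an aspect-ratio-preserving resize can be sketched in plain Python. This is only an illustration of the usual approach (one scale factor for both axes, the remainder filled with the padding value); padded_resize_geometry is a hypothetical helper, and how Dolphin rounds sizes or places the padding is not stated here.

```python
from typing import Tuple


def padded_resize_geometry(src_size: Tuple[int, int],
                           dst_size: Tuple[int, int]
                           ) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Return (ratio, resized size, total padding), sizes as (width, height)."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    # A single ratio for both axes keeps the aspect ratio
    ratio = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * ratio), round(src_h * ratio)
    # Whatever is left in the destination is filled with the padding value
    pad_w, pad_h = dst_w - new_w, dst_h - new_h
    return ratio, (new_w, new_h), (pad_w, pad_h)


ratio, new_size, padding = padded_resize_geometry((1920, 1080), (640, 640))
print(new_size, padding)  # (640, 360) (0, 280)
```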
Normalization dolphin.dimage:
With Dolphin, you can normalize a dolphin.dimage object using the method dolphin.dimage.normalize(). The normalization modes are defined by the Enum class dolphin.dimage_normalize_type. By default, the mode is dolphin.dimage_normalize_type.DOLPHIN_255. This method is optimized.
import dolphin as dp
import cv2
image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)
# image/255
a = d.normalize(dp.DOLPHIN_255)
# image/127.5 - 1
b = d.normalize(dp.DOLPHIN_TF)
# (image - mean) / std
c = d.normalize(dp.DOLPHIN_MEAN_STD, mean=[0.5,0.5,0.5], std=[0.5,0.5,0.5])
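The three formulas in the comments above can be written out for a single pixel in plain Python. The function names below are hypothetical and nothing here calls Dolphin. In particular, whether Dolphin rescales the image to [0, 1] before subtracting the mean is not stated, so the last function simply applies (value - mean) / std to whatever values it receives.

```python
from typing import List, Sequence


def normalize_255(pixel: Sequence[float]) -> List[float]:
    """image / 255: maps [0, 255] to [0, 1]."""
    return [value / 255.0 for value in pixel]


def normalize_tf(pixel: Sequence[float]) -> List[float]:
    """image / 127.5 - 1: maps [0, 255] to [-1, 1]."""
    return [value / 127.5 - 1.0 for value in pixel]


def normalize_mean_std(pixel: Sequence[float],
                       mean: Sequence[float],
                       std: Sequence[float]) -> List[float]:
    """(image - mean) / std, channel by channel."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


print(normalize_255([0, 51, 255]))     # [0.0, 0.2, 1.0]
print(normalize_tf([0, 127.5, 255]))   # [-1.0, 0.0, 1.0]
print(normalize_mean_std([0.5, 1.0, 0.0], [0.5] * 3, [0.5] * 3))  # [0.0, 1.0, -1.0]
```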
Change channel format dolphin.dimage:
The equivalent of cv2.cvtColor is dolphin.dimage.cvtColor(), which converts a dolphin.dimage object from one channel format to another. The channel formats are defined by the Enum class dolphin.dimage_channel_format.
import dolphin as dp
import cv2
image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)
a = dp.dimage.cvtColor(d, dp.dimage_channel_format.DOLPHIN_GRAY_SCALE) # BGR to GRAY
b = dp.dimage.cvtColor(d, dp.dimage_channel_format.DOLPHIN_RGB) # BGR to RGB
Manipulating dolphin.Engine:
Creating dolphin.Engine:
dolphin.Engine is a TensorRT-based object. It is used to create, manage and run TensorRT engines. To create a dolphin.Engine object, you need to specify the path to an onnx model or a TensorRT engine. You can also specify other arguments in order to customize the engine that is built.
import dolphin as dp
# Create an engine from an onnx model
engine = dp.Engine(onnx_file_path="your_model.onnx")
# Create an engine from a TensorRT engine
engine = dp.Engine(engine_path="your_engine.trt")
# Create an engine from an onnx model and specify different arguments
engine = dp.Engine(onnx_file_path="your_model.onnx",
engine_path="your_engine.trt",
mode="fp16",
explicit_batch=True,
direct_io=False)
Running a dolphin.Engine:
Once a dolphin.Engine is created, you can run it using the method dolphin.Engine.infer(). This method takes a dictionary as argument, which defines the inputs of the engine. The keys of the dictionary are the names of the inputs of the engine. The values of the dictionary are dolphin.darray objects. The method returns a dictionary with the outputs of the engine or None (see below). The keys of the dictionary are the names of the outputs of the engine. The values of the dictionary are dolphin.darray objects.
dolphin.Engine internally uses dolphin.CudaTrtBuffers to buffer the inputs of the engine efficiently. The purpose is to avoid memory copies between host and device and to perform device-to-device copies instead, which are faster. By default, calling dolphin.Engine.infer() is batch-blocking, meaning that the method does not run the engine until the buffer is full, which lets the user fill the buffer automatically over successive calls. You can still force inference with the argument force_infer=True.
Here are some use cases of dolphin.Engine.infer().
import dolphin as dp
engine = dp.Engine(engine_path="your_engine.trt") # batch size = 1
input_dict = {
"image": dp.zeros(shape=(224,224,3), dtype=dp.float32)
}
output = engine.infer(inputs=input_dict) # The buffer is full, the engine is inferred
print(output) # {"output": darray(shape=(1000,), dtype=float32)}
In case you want to use a batch size greater than 1.
import dolphin as dp
engine = dp.Engine(engine_path="your_engine.trt") # batch size = 2
input_dict = {
"image": dp.zeros(shape=(224,224,3), dtype=dp.float32)
}
output = engine.infer(inputs=input_dict) # batch-blocking
print(output) # None
output = engine.infer(inputs=input_dict) # The buffer is full, the engine is inferred
print(output) # {"output": darray(shape=(2,1000), dtype=float32)}
# or you can force infer
import dolphin as dp
engine = dp.Engine(engine_path="your_engine.trt") # batch size = 2
input_dict = {
"image": dp.zeros(shape=(224,224,3), dtype=dp.float32)
}
output = engine.infer(inputs=input_dict, force_infer=True) # inference is forced although the buffer is not full
print(output) # {"output": darray(shape=(2,1000), dtype=float32)}
You can also use batched inferences.
import dolphin as dp
engine = dp.Engine(engine_path="your_engine.trt") # batch size = 16
input_dict = {
"image": dp.zeros(shape=(16, 224, 224, 3), dtype=dp.float32)
}
output = engine.infer(inputs=input_dict) # The buffer is full, the engine is inferred
print(output) # {"output": darray(shape=(16,1000), dtype=float32)}
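The batch-blocking behaviour used in these examples can be pictured with a small pure-Python stand-in. BatchBuffer is hypothetical and is not how dolphin.Engine or dolphin.CudaTrtBuffers are implemented; it only mimics the control flow: inputs accumulate, None is returned while the batch is incomplete, and force_infer bypasses the wait.

```python
from typing import Any, List, Optional


class BatchBuffer:
    """Hypothetical stand-in for a batch-blocking input buffer."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.items: List[Any] = []

    def infer(self, item: Any, force_infer: bool = False) -> Optional[List[Any]]:
        self.items.append(item)
        if len(self.items) < self.batch_size and not force_infer:
            return None  # batch-blocking: wait for more inputs
        # The buffer is full (or inference is forced): hand over the batch
        # and start a new one. A real engine would run the model here.
        batch, self.items = self.items, []
        return batch


buffer = BatchBuffer(batch_size=2)
print(buffer.infer("frame 0"))                    # None
print(buffer.infer("frame 1"))                    # ['frame 0', 'frame 1']
print(buffer.infer("frame 2", force_infer=True))  # ['frame 2']
```

Unlike this toy, the forced inference in the example above still returns a batch-sized output, since a TensorRT engine runs on its full buffer.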
Full example
You can go to the examples folder to see a full example of how to use the library. Here, we will go step by step through Yolov7 inference using Dolphin.
1. Preprocessing
Most of the time, we underestimate the latency of preprocessing and try to find ways to accelerate the inference part, which would make sense if the bottleneck were indeed the inference time. In real-time applications, however, the frame rate often falls well below expectations because of pre- and post-processing.
In this example, Yolov7 needs images to be resized using the dp.dimage.resize_padding() method in order to keep the original aspect ratio of the image, and to be normalized.
A good practice is to resize your image before doing any further processing in order to limit the amount of data processed at a time.
Keep in mind that it is much better to pre-allocate the dp.darray and dp.dimage objects so that no memory allocation happens during the core of your application. This is what we will be doing here.
import cv2
import pycuda.driver as cuda
import dolphin as dp
stream = cv2.VideoCapture("your_video.mp4")
# CUDA stream shared by the pre-allocated images. The constructor
# expects a pycuda.driver.Stream, not the OpenCV capture. This
# assumes that a CUDA context is already active at this point.
cuda_stream = cuda.Stream()
# We need to know the size of the frame
width = int(stream.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
# As OpenCV reads HWC uint8_t images, we allocate the
# corresponding dp.dimage
d_frame = dp.dimage(shape=(height, width, 3), dtype=dp.uint8)
# Yolov7 processes CHW images directly, so we have to
# transpose the array and pre-allocate the image where
# the transposed data will be stored
transposed_frame = dp.dimage(shape=(3, height, width),
                             dtype=dp.uint8,
                             stream=cuda_stream)
# We also pre-allocate the resized image;
# (640, 640) is the size Yolov7 works with
resized_frame = dp.dimage(shape=(3, 640, 640),
                          dtype=dp.uint8,
                          stream=cuda_stream)
# Once the image is correctly formatted, meaning
# 3x640x640 uint8, we need to normalize it so that
# 0 <= image <= 1. To do so, we use the dp.DOLPHIN_255
# flag, which writes float32 data
inference_frame = dp.dimage(shape=(3, 640, 640),
                            dtype=dp.float32,
                            stream=cuda_stream)
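The amount of memory taken by the four buffers pre-allocated above depends on the video resolution. The sketch below adds them up in plain Python; the 1920x1080 frame size is an assumption made for the illustration, and buffer_nbytes is a hypothetical helper. With that frame size the total comes to roughly 18 MB.

```python
from math import prod
from typing import Tuple


def buffer_nbytes(shape: Tuple[int, ...], itemsize: int) -> int:
    """Number of bytes of a dense array of the given shape and item size."""
    return prod(shape) * itemsize


height, width = 1080, 1920  # assumed frame size, not given in the text

buffers = {
    "d_frame": buffer_nbytes((height, width, 3), 1),           # uint8
    "transposed_frame": buffer_nbytes((3, height, width), 1),  # uint8
    "resized_frame": buffer_nbytes((3, 640, 640), 1),          # uint8
    "inference_frame": buffer_nbytes((3, 640, 640), 4),        # float32
}
total = sum(buffers.values())
print(total)                  # 18585600
print(round(total / 1e6, 1))  # 18.6
```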
2. Inference
We have thus pre-allocated about 18MB to speed up the preprocessing by avoiding on-the-fly allocations. Let us now go through the inference part.
# We now instantiate our AI model as a TensorRT engine
engine = dp.Engine("your_model.onnx",
"your_model.engine",
mode="fp16",
verbosity=True)
while True:
    # We read the next frame and stop at the end of the video
    ret, frame = stream.read()
    if not ret:
        break
    # We copy the OpenCV frame onto the GPU
    d_frame.from_numpy(frame)
    # We process the frame
    # 1. We transpose the frame from HWC to CHW. d_frame is left
    # untouched so that it can receive the next frame
    transposed_frame = d_frame.transpose(2, 0, 1)
    # 2. We perform padding resize
    _, r, dwdh = dp.resize_padding(src=transposed_frame,
                                   shape=(640, 640),
                                   dst=resized_frame)
    # 3. We do channel swapping in order to transform
    # our BGR image into RGB
    dp.cvtColor(src=resized_frame,
                color_format=dp.DOLPHIN_RGB,
                dst=resized_frame)
    # 4. We normalize the frame as described just above
    dp.normalize(src=resized_frame,
                 dst=inference_frame,
                 normalize_type=dp.DOLPHIN_255)
    # 5. We finally infer our model
    output = engine.infer({"images": inference_frame})
Dolphin darray
- class dolphin.darray(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.float32, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[pycuda.driver.DeviceAllocation] = None)[source]
Bases:
object
This class implements a generic numpy style array that can be used with the dolphin library. It implements common features available with numpy arrays such as astype, transpose, copy…
darray is made with the same philosophy as numpy.ndarray. The usability is really close to numpy arrays. However, darray is meant to be much more performant than numpy.ndarray since it is GPU accelerated.
- Parameters:
shape (Tuple[int, ...], optional) – Shape of the darray, defaults to None
dtype (dolphin.dtype, optional) – dtype of the darray, defaults to dtype.float32
stream (cuda.Stream, optional) – CUDA stream to use, defaults to None
array (numpy.ndarray, optional) – numpy array to copy, defaults to None
strides (Tuple[int, ...], optional) – strides of the darray, defaults to None
allocation (cuda.DeviceAllocation, optional) – CUDA allocation to use, defaults to None
allocation_size (int, optional) – Size of the allocation, defaults to None
- __init__(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.float32, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[pycuda.driver.DeviceAllocation] = None) None [source]
- static broadcastable(shape_1: Tuple[int, ...], shape_2: Tuple[int, ...]) bool [source]
Checks if two shapes are broadcastable.
- Parameters:
shape_1 (Tuple[int, ...]) – First shape
shape_2 (Tuple[int, ...]) – Second shape
- Returns:
True if the shapes are broadcastable, False otherwise
- Return type:
bool
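As an illustration of what such a check usually looks like, here is a plain-Python sketch of the numpy broadcasting rule: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. That Dolphin applies exactly this rule is an assumption; the function below is a stand-alone example and not the library's code.

```python
from itertools import zip_longest
from typing import Tuple


def broadcastable(shape_1: Tuple[int, ...], shape_2: Tuple[int, ...]) -> bool:
    """Numpy-style check on trailing dimensions."""
    # Missing leading dimensions are treated as 1
    pairs = zip_longest(reversed(shape_1), reversed(shape_2), fillvalue=1)
    return all(a == b or a == 1 or b == 1 for a, b in pairs)


print(broadcastable((4, 3, 2), (3, 2)))  # True
print(broadcastable((4, 3, 2), (3, 1)))  # True
print(broadcastable((4, 3, 2), (4, 3)))  # False
```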
- static compute_strides(shape: Tuple[int, ...]) Tuple[int, ...] [source]
Computes the strides of an array from its shape. The stride of a dimension is the number of elements to skip to reach the next element along that dimension. Strides are expressed in elements, not bytes.
- Parameters:
shape (Tuple[int, ...]) – shape of the ndarray
- Returns:
Strides
- Return type:
Tuple[int, …]
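A minimal plain-Python sketch of element strides follows. It assumes a row-major (C order) layout, in which the last dimension is contiguous; it is an illustration of the concept, not the library's source.

```python
from typing import Tuple


def compute_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Row-major strides, counted in elements rather than bytes."""
    strides = []
    step = 1
    # Walk the dimensions from last to first: the last one is contiguous
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


strides = compute_strides((4, 3, 2))
print(strides)  # (6, 2, 1)

# The flat position of element [1, 2, 1] is the dot product index . strides
flat = sum(i * s for i, s in zip((1, 2, 1), strides))
print(flat)  # 11
```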
- property shape_allocation: pycuda.driver.DeviceAllocation
Property to access the cuda allocation of the shape.
- Returns:
The cuda allocation of the shape
- Return type:
cuda.DeviceAllocation
- property strides_allocation: pycuda.driver.DeviceAllocation
Property to access the cuda allocation of the strides.
- Returns:
The cuda allocation of the strides
- Return type:
cuda.DeviceAllocation
- property ndim: uint32
Computes the number of dimensions of the array.
- Returns:
Number of dimensions of the array
- Return type:
numpy.uint32
- property strides: Tuple[int, ...]
Property to access the strides of the array.
- Returns:
Strides of the array
- Return type:
Tuple[int, …]
- property size: uint32
Property to access the size of the array. Size is defined as the number of elements in the array.
- Returns:
The size of the array, in terms of number of elements
- Return type:
numpy.uint32
- property dtype: dtype
Property to access the dolphin.dtype of the array
- Returns:
dolphin.dtype of the array
- Return type:
dolphin.dtype
- property shape: Tuple[int, ...]
Property to access the shape of the array
- Returns:
Shape of the array
- Return type:
Tuple[int, …]
- property nbytes: int
Property to access the number of bytes of the array
- Returns:
Number of bytes of the array
- Return type:
int
- property T: darray
Performs a transpose operation on the darray. This transpose reverses the order of the axes:
a = darray(shape=(2, 3, 4))
a.T.shape # (4, 3, 2)
Also, please note that this function is not efficient as it performs a copy of the darray.
- Returns:
Transposed darray
- Return type:
darray
- property stream: pycuda.driver.Stream
Property to access (Read/Write) the cuda stream of the array
- Returns:
Stream used by the darray
- Return type:
cuda.Stream
- property allocation: pycuda.driver.DeviceAllocation
Property to access (Read/Write) the cuda allocation of the array
- Returns:
The cuda allocation of the array
- Return type:
cuda.DeviceAllocation
- property np: ndarray
Alias for to_numpy()
- Returns:
numpy.ndarray of the darray
- Return type:
numpy.ndarray
- to_numpy() ndarray [source]
Converts the darray to a numpy.ndarray. Note that a copy from the device to the host is performed.
- Returns:
numpy.ndarray of the darray
- Return type:
numpy.ndarray
- from_numpy(array: ndarray) None [source]
Copies a numpy array into the allocation of the darray. If the array does not have the same shape or dtype as the darray, an error is raised.
- Parameters:
array (numpy.ndarray) – Numpy array to copy into the darray
- astype(dtype: dtype, dst: Optional[darray] = None) darray [source]
Converts the darray to a different dtype. Note that a copy from device to device is performed.
- Parameters:
dtype (dolphin.dtype) – dtype to convert to
dst (darray, optional) – darray to write the result of the operation, defaults to None
- Raises:
ValueError – In case the dst shape or dtype doesn’t match
- Returns:
darray with the new dtype
- Return type:
darray
- transpose(*axes: int) darray [source]
Transposes the darray according to the axes.
- Parameters:
axes (Tuple[int]) – Axes to permute
- Returns:
Transposed darray
- Return type:
darray
- copy() darray [source]
Returns a copy of the current darray. Note that a copy from device to device is performed.
- Returns:
Copy of the array with another cuda allocation
- Return type:
darray
- __getitem__(index: Union[int, slice, tuple]) darray [source]
Returns a view of the darray with the given index.
- Parameters:
index (Union[int, slice, tuple]) – Index to use
- Raises:
IndexError – Too many indexes. The number of indexes must be less than the number of dimensions of the array.
IndexError – Axes must be in the range of the number of dimensions of the array.
- Returns:
View of the darray
- Return type:
darray
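A view shares the memory of the original array, so it can be described by three values only: an offset into the buffer, a shape and strides. The plain-Python sketch below computes them for integer and slice indexes. It is an assumption-laden illustration (slice_view is a hypothetical helper and strides are counted in elements); Dolphin's internal representation of views is not described here.

```python
from typing import Tuple, Union

Index = Union[int, slice]


def slice_view(shape: Tuple[int, ...],
               strides: Tuple[int, ...],
               index: Tuple[Index, ...]
               ) -> Tuple[int, Tuple[int, ...], Tuple[int, ...]]:
    """Return (offset, shape, strides) of the view selected by `index`."""
    if len(index) > len(shape):
        raise IndexError("too many indexes")
    offset = 0
    view_shape, view_strides = [], []
    for axis, (dim, stride) in enumerate(zip(shape, strides)):
        # Dimensions without an index are taken whole
        key = index[axis] if axis < len(index) else slice(None)
        if isinstance(key, slice):
            start, stop, step = key.indices(dim)
            offset += start * stride
            view_shape.append(len(range(start, stop, step)))
            view_strides.append(stride * step)
        else:
            if key < 0:
                key += dim
            if not 0 <= key < dim:
                raise IndexError("index out of range")
            offset += key * stride  # an integer index drops the dimension
    return offset, tuple(view_shape), tuple(view_strides)


print(slice_view((10, 10), (10, 1), (slice(5, 10), slice(5, 10))))
# (55, (5, 5), (10, 1))
```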
- __setitem__(index: Union[int, slice, tuple], other: Union[int, float, number, darray]) None [source]
Sets the value of the darray with the given index.
- __str__() str [source]
Returns the string representation of the numpy array. Note that a copy from the device to the host is performed.
- Returns:
String representation of the numpy array
- Return type:
str
- __repr__() str [source]
Returns the representation of the numpy array. Note that a copy from the device to the host is performed.
- Returns:
Representation of the numpy array
- Return type:
str
- flatten(dst: Optional[darray] = None) darray [source]
Returns a flattened view of the darray. Order = C.
- fill(value: Union[int, float, number]) darray [source]
Fills the darray with the given value.
- Parameters:
value (Union[int, float, numpy.number]) – Value to fill the array with
- Returns:
Filled darray
- Return type:
darray
- add(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Efficient addition of a darray with another object. Can be a darray or a scalar. If dst is None, normal __add__ is called. This method is much more efficient than the __add__ method because __add__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).
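The benefit of passing dst can be illustrated without a GPU. In the plain-Python toy below, Buffer is a hypothetical class whose constructor stands in for a costly device allocation; it counts allocations to show that the operator form allocates on every call while the dst form reuses one pre-allocated result. It is a sketch of the design idea, not of Dolphin's code.

```python
from typing import List, Optional


class Buffer:
    """Toy array that counts how many buffers get allocated."""

    allocations = 0

    def __init__(self, size: int) -> None:
        Buffer.allocations += 1  # stands in for a costly device allocation
        self.data: List[float] = [0.0] * size

    def add(self, other: float, dst: Optional["Buffer"] = None) -> "Buffer":
        if dst is None:
            dst = Buffer(len(self.data))  # no destination given: allocate one
        for i, value in enumerate(self.data):
            dst.data[i] = value + other
        return dst

    def __add__(self, other: float) -> "Buffer":
        return self.add(other)


a = Buffer(4)
out = Buffer(4)
for _ in range(100):
    a.add(1.0, dst=out)    # reuses `out`: no new allocation
print(Buffer.allocations)  # 2
for _ in range(100):
    b = a + 1.0            # allocates a new result every time
print(Buffer.allocations)  # 102
```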
- __add__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient addition of a darray with another object. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to add
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
A copy of the darray where the result is written
- Return type:
darray
- __radd__(other: Union[int, float, number, darray]) darray
Non-Efficient addition of a darray with another object. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to add
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
A copy of the darray where the result is written
- Return type:
darray
- __iadd__(other: Union[int, float, number, darray]) darray [source]
Implements the += operator. Like __add__, this method is not efficient because it implies a copy of the array and therefore a call to cuda.memalloc, which is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to add
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
darray
- substract(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Efficient subtraction of a darray with another object.
- sub(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray
Efficient subtraction of a darray with another object.
- __sub__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient subtraction of a darray with another object. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
A copy of the darray where the result is written
- Return type:
darray
- reversed_substract(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Efficient reverse subtraction of an object with darray. It is efficient if dst is provided because it does not invoke cuda.memalloc. If dst is not provided, normal __rsub__ is called.
- __rsub__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient subtraction of another object with darray. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
A copy of the darray where the result is written
- Return type:
darray
- __isub__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient -= operation. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
A copy of the darray where the result is written
- Return type:
darray
- multiply(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Efficient multiplication of a darray with another object. Can be a darray or a scalar. If dst is None, normal __mul__ is called. This method is much more efficient than the __mul__ method because __mul__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).
- Parameters:
dst (darray) – darray where to write the result
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply
- Raises:
ValueError – If the size, dtype or shape of dst is not matching
- Returns:
darray where the result is written. Can be dst or self
- Return type:
darray
- mul(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray
Efficient multiplication of a darray with another object. Can be a darray or a scalar. If dst is None, normal __mul__ is called. This method is much more efficient than the __mul__ method because __mul__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).
- Parameters:
dst (darray) – darray where to write the result
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply
- Raises:
ValueError – If the size, dtype or shape of dst is not matching
- Returns:
darray where the result is written. Can be dst or self
- Return type:
darray
- __mul__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient multiplication of a darray with another object. This multiplication is element-wise multiplication, not matrix multiplication. For matrix multiplication please refer to matmul. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
darray
- __rmul__(other: Union[int, float, number, darray]) darray
Non-Efficient multiplication of a darray with another object. This multiplication is element-wise multiplication, not matrix multiplication. For matrix multiplication please refer to matmul. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
darray
- __imul__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient multiplication of a darray with another object. This multiplication is element-wise multiplication, not matrix multiplication. For matrix multiplication please refer to matmul. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
darray
- divide(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Efficient division of a darray with another object. Can be a darray or a scalar. If dst is None, normal __div__ is called. This method is much more efficient than the __div__ method because __div__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).
- div(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray
Efficient division of a darray with another object. Can be a darray or a scalar. If dst is None, normal __div__ is called. This method is much more efficient than the __div__ method because __div__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).
- __div__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient division of a darray with another object. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
darray
- reversed_divide(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Efficient reverse division of another object by a darray.
The other object can be a darray or a scalar. If dst is None, normal __rdiv__ is called. This method is much more efficient than the __rdiv__ method because __rdiv__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).
- rdiv(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray
Efficient reverse division of another object by a darray.
The other object can be a darray or a scalar. If dst is None, normal __rdiv__ is called. This method is much more efficient than the __rdiv__ method because __rdiv__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).
- __rdiv__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient reverse division of an object by darray. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
darray
- __idiv__(other: Union[int, float, number, darray]) darray [source]
Non-Efficient division of a darray with another object. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
- __truediv__(other: Union[int, float, number, darray]) darray
Inefficient element-wise division of a darray by another object. This operation is inefficient because it copies the array, which requires a call to cuda.mem_alloc. That allocation is very time consuming (up to 95% of the total latency is spent in cuda.mem_alloc alone).
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
- __itruediv__(other: Union[int, float, number, darray]) darray
Inefficient element-wise division of a darray by another object. This operation is inefficient because it copies the array, which requires a call to cuda.mem_alloc. That allocation is very time consuming (up to 95% of the total latency is spent in cuda.mem_alloc alone).
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
- __rtruediv__(other: Union[int, float, number, darray]) darray
Inefficient element-wise reverse division of an object by a darray. This operation is inefficient because it copies the array, which requires a call to cuda.mem_alloc. That allocation is very time consuming (up to 95% of the total latency is spent in cuda.mem_alloc alone).
- Parameters:
other (Union[int, float, numpy.number, 'darray']) – scalar or darray to be divided by this darray
- Raises:
ValueError – If other is not a scalar or a darray
- Returns:
The darray where the result is written
- Return type:
- __len__() int [source]
Returns the length of the first dimension of the array.
- Returns:
Size of the first dimension of the array
- Return type:
int
- __abs__() darray [source]
Returns the absolute value of the array.
- Returns:
Absolute value of the array
- Return type:
- absolute(dst: Optional[darray] = None) darray [source]
Returns the absolute value of the array.
- Returns:
Absolute value of the array
- Return type:
- __module__ = 'dolphin.core.darray'
Built-in functions
- dolphin.zeros(shape: Tuple[int, ...], dtype: dtype = dtype.float32) darray [source]
Returns a darray for a given shape and dtype filled with zeros.
This function is a creation function, thus, it does not take an optional destination darray as argument.
- Parameters:
shape (Tuple[int, ...]) – Shape of the array
dtype (dolphin.dtype) – Type of the array
- Returns:
darray filled with zeros
- Return type:
- dolphin.zeros_like(other: Union[darray, array]) darray [source]
Returns a darray filled with zeros with the same shape and dtype as another darray.
This function is a creation function, thus, it does not take an optional destination darray as argument.
- dolphin.empty(shape: Tuple[int, ...], dtype: dtype = dtype.float32) darray [source]
Returns a darray of a given shape and dtype without initializing entries.
This function is a creation function, thus, it does not take an optional destination darray as argument.
- Parameters:
shape (Tuple[int, ...]) – Shape of the array
dtype (dolphin.dtype) – Type of the array
- Returns:
darray with uninitialized (arbitrary) values
- Return type:
- dolphin.empty_like(other: Union[darray, array]) darray [source]
Returns a darray without initializing entries with the same shape and dtype as another darray.
This function is a creation function, thus, it does not take an optional destination darray as argument.
- dolphin.ones(shape: Tuple[int, ...], dtype: dtype = dtype.float32) darray [source]
Returns a darray for a given shape and dtype filled with ones.
This function is a creation function, thus, it does not take an optional destination darray as argument.
- Parameters:
shape (Tuple[int, ...]) – Shape of the array
dtype (dolphin.dtype) – Type of the array
- Returns:
darray filled with ones
- Return type:
- dolphin.ones_like(other: Union[darray, array]) darray [source]
Returns a darray filled with ones with the same shape and dtype as another darray.
This function is a creation function, thus, it does not take an optional destination darray as argument.
- dolphin.transpose(src: darray, axes: Tuple[int, ...]) darray [source]
Returns a darray with the axes transposed.
- dolphin.add(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Returns the addition of a darray and an object (scalar or darray). It works as follows:
result = src + other
- dolphin.multiply(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Returns the multiplication of a darray by an object (scalar or darray). It works as follows:
result = src * other
- dolphin.divide(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Returns the division of a darray by an object (scalar or darray). It works as follows:
result = src / other
- dolphin.reversed_divide(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Returns the division of an object (scalar or darray) by a darray. It works as follows:
result = other / src
- dolphin.substract(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray [source]
Returns the subtraction of an object (scalar or darray) from a darray. It works as follows:
result = src - other
Dolphin dimage
- class dolphin.dimage(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.uint8, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, channel_format: Optional[dimage_channel_format] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[int] = None)[source]
Bases:
darray
This class inherits from darray in order to provide image processing functionalities.
dimage is made with the same philosophy as darray but with additional GPU-accelerated functionalities for image processing. It supports all the methods defined in darray but has some specific attributes in order to better handle images.
We tried to partly follow the same philosophy as OpenCV in order to make the transition easier.
Important: dimage is assumed to follow the (height, width) order, as defined in OpenCV.
- Parameters:
shape (Tuple[int, ...], optional) – Shape of the darray, defaults to None
dtype (dolphin.dtype, optional) – dtype of the darray, defaults to dolphin.dtype.uint8
stream (cuda.Stream, optional) – CUDA stream to use, defaults to None
array (numpy.ndarray, optional) – numpy array to copy, defaults to None
channel_format (dimage_channel_format, optional) – Channel format of the image, defaults to None
strides (Tuple[int, ...], optional) – strides of the darray, defaults to None
allocation (cuda.DeviceAllocation, optional) – CUDA allocation to use, defaults to None
allocation_size (int, optional) – Size of the allocation, defaults to None
- __init__(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.uint8, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, channel_format: Optional[dimage_channel_format] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[int] = None) None [source]
- property image_channel_format: dimage_channel_format
Returns the image channel format.
- Returns:
The image channel format
- Return type:
- property image_dim_format: dimage_dim_format
Returns the image dimension format.
- Returns:
The image dimension format
- Return type:
- property height: uint16
Returns the height of the image.
- Returns:
The height of the image
- Return type:
numpy.uint16
- property width: uint16
Returns the width of the image.
- Returns:
The width of the image
- Return type:
numpy.uint16
- property channel: uint8
Returns the number of channels of the image.
- Returns:
The number of channels of the image
- Return type:
numpy.uint8
- astype(dtype: dtype, dst: Optional[dimage] = None) dimage [source]
Convert the image to a different type
This function converts the image to a different type. To use it efficiently, the destination image must be preallocated and passed as an argument to the function:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
src.astype(dolphin.dtype.float32, dst)
- Parameters:
dtype (dolphin.dtype) – Dtype to convert the darray to
dst (dimage) – Destination image
- resize_padding(shape: Tuple[int, ...], dst: Optional[dimage] = None, padding_value: Union[int, float] = 127) Tuple[dimage, float, Tuple[int, int]] [source]
Resize the image with padding
This function resizes the image to a new shape with padding. It means that the image is resized to the new shape and the remaining pixels are filled with the padding value (127 by default).
If for instance the image is resized from (50, 100) to (200, 200), the aspect ratio of the image is preserved and the image is resized to (100, 200). The remaining pixels are filled with the padding value. In this scenario, the padding would appear on the left and right side of the image, with a width of 50 pixels.
In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.uint8)
src.resize_padding((200, 200), dst)
If aspect ratio does not matter, the function resize() can be used.
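The geometry of the (50, 100) to (200, 200) example above can be checked with a few lines of arithmetic. This is a minimal sketch, not Dolphin's implementation: padded_resize_geometry is a hypothetical helper, sizes are assumed to be given as (width, height), and the padding is assumed to be split evenly between both sides.

```python
from typing import Tuple


def padded_resize_geometry(
    src_size: Tuple[int, int],
    dst_size: Tuple[int, int],
) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Return (scale, resized (w, h), padding per side (x, y)).

    Assumption: sizes are (width, height) and padding is centred.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    # The smaller ratio is the largest scale that still fits both dimensions.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = int(round(src_w * scale))
    new_h = int(round(src_h * scale))
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)


scale, resized, padding = padded_resize_geometry((50, 100), (200, 200))
```

With these inputs the scale is 2, the resized content is (100, 200) and 50 pixels of padding remain on the left and on the right, which matches the description above.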
- resize(shape: Tuple[int, ...], dst: Optional[dimage] = None) dimage [source]
Resize the image
This function performs a naive resize of the image. For now, the only supported resize type is dolphin.dimage_resize_type.DOLPHIN_NEAREST. In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
src.resize((200, 200), dst)
The aspect ratio of the resized image might differ from the original, as the new shape is not necessarily a multiple of the original shape.
If the aspect ratio of the original image matters to you, use resize_padding() instead:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
src.resize_padding((200, 200), dst)
- Parameters:
shape (Tuple[int, ...]) – The new shape of the image
dst (dimage) – The destination image
- __module__ = 'dolphin.core.dimage'
- crop_and_resize(coordinates: darray, size: Tuple[int, int], dst: Optional[darray] = None) darray [source]
This method efficiently crops and resizes images. A typical use case is processing a batch of images that are crops of a bigger image. The coordinates of the crops would, for instance, be the output of a detection model.
In terms of usability, the coordinates are expected to be in the following format: [[x1, y1, x2, y2], …] where x1, y1, x2, y2 are the absolute coordinates of the top left and bottom right corners of the crop in the original image.
Coordinates would thus be a dolphin.darray of shape (n, 4) where n is the number of crops, n >= 1.
Also, for faster execution, the destination darray is expected to be preallocated and passed as an argument to the function.
- Parameters:
coordinates (dolphin.darray) – darray of coordinates
size (Tuple[int, int]) – Tuple of the desired shape (width, height)
dst (dolphin.darray, optional) – Destination darray for faster execution, defaults to None
- Returns:
darray of cropped and resized images
- Return type:
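To make the coordinate format concrete, here is a minimal CPU-only sketch of cropping and nearest-neighbour resizing on a nested list. It is not Dolphin's kernel: crop_and_resize_nearest is a hypothetical helper, the image is a single-channel list of rows, x2 and y2 are assumed to be exclusive, and source pixels are selected by integer floor.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def crop_and_resize_nearest(image: Image,
                            boxes: Sequence[Sequence[int]],
                            size: Tuple[int, int]) -> List[Image]:
    """Crop each [x1, y1, x2, y2] box and resize it to size = (width, height)."""
    width, height = size
    crops = []
    for x1, y1, x2, y2 in boxes:
        crop_w, crop_h = x2 - x1, y2 - y1
        crop = [
            [
                # Map each destination pixel back to its nearest source pixel.
                image[y1 + (row * crop_h) // height][x1 + (col * crop_w) // width]
                for col in range(width)
            ]
            for row in range(height)
        ]
        crops.append(crop)
    return crops


# 4x4 test image where each pixel value encodes its position (y * 4 + x).
image = [[y * 4 + x for x in range(4)] for y in range(4)]
boxes = [[0, 0, 2, 2], [2, 2, 4, 4]]          # shape (n, 4) with n = 2
batch = crop_and_resize_nearest(image, boxes, (4, 4))
```

The result is a batch of n crops, each with the requested size, which mirrors the (n, 4) coordinates layout described above.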
- crop_and_resize_padding(coordinates: darray, size: Tuple[int, int], padding: Union[int, float] = 127, dst: Optional[darray] = None) darray [source]
This method efficiently crops and resizes images using padded resize. A typical use case is processing a batch of images that are crops of a bigger image. The coordinates of the crops would, for instance, be the output of a detection model.
In terms of usability, the coordinates are expected to be in the following format: [[x1, y1, x2, y2], …] where x1, y1, x2, y2 are the absolute coordinates of the top left and bottom right corners of the crop in the original image.
Coordinates would thus be a dolphin.darray of shape (n, 4) where n is the number of crops, n >= 1.
Also, for faster execution, the destination darray is expected to be preallocated and passed as an argument to the function.
- Parameters:
coordinates (dolphin.darray) – darray of coordinates
size (Tuple[int, int]) – Tuple of the desired shape (width, height)
padding (int or float, optional) – The padding value, defaults to 127
dst (dolphin.darray, optional) – Destination darray for faster execution, defaults to None
- Returns:
darray of cropped and resized images
- Return type:
- normalize(normalize_type: dimage_normalize_type = dimage_normalize_type.DOLPHIN_255, mean: Optional[List[Union[int, float]]] = None, std: Optional[List[Union[int, float]]] = None, dtype: dtype = dtype.float32, dst: Optional[dimage] = None) dimage [source]
Normalize the image
This function efficiently normalizes an image using one of several normalization types.
The mean and std values must be passed as a list of values if you want to normalize the image using the dolphin.dimage_normalize_type.DOLPHIN_MEAN_STD normalization type. To use this function efficiently, the destination image must be preallocated and passed as an argument to the function:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
src.normalize(dimage_normalize_type.DOLPHIN_MEAN_STD, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], dst=dst)
or:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
src.normalize(dimage_normalize_type.DOLPHIN_255, dst=dst)
- Parameters:
normalize_type (dimage_normalize_type) – The type of normalization
mean (List[Union[int, float]]) – The mean values
std (List[Union[int, float]]) – The std values
dst (dimage) – The destination image
- cvtColor(color_format: dimage_channel_format, dst: Optional[dimage] = None) dimage [source]
Transforms the image to the specified color format.
This function transforms the image to the specified color format. The supported color formats are:
- dolphin.dimage_channel_format.DOLPHIN_RGB
- dolphin.dimage_channel_format.DOLPHIN_BGR
- dolphin.dimage_channel_format.DOLPHIN_GRAY_SCALE
- Parameters:
color_format (dimage_channel_format) – The color format of the output image
dst (dimage) – The destination image
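As an illustration of what these conversions mean per pixel, here is a small CPU-only sketch. It is not Dolphin's kernel: swap_red_blue and to_gray are hypothetical helpers, and the grayscale weights are the ITU-R BT.601 luma coefficients used by OpenCV, which is an assumption about Dolphin's behaviour rather than something stated in this documentation.

```python
from typing import List, Sequence


def swap_red_blue(pixel: Sequence[int]) -> List[int]:
    """RGB -> BGR and BGR -> RGB are the same operation: reverse the channels."""
    return list(reversed(pixel))


def to_gray(pixel_rgb: Sequence[int]) -> int:
    """Weighted sum of the channels (assumed BT.601 weights, as in OpenCV)."""
    red, green, blue = pixel_rgb
    return int(round(0.299 * red + 0.587 * green + 0.114 * blue))


bgr = swap_red_blue([10, 20, 30])
gray_white = to_gray((255, 255, 255))
gray_red = to_gray((255, 0, 0))
```

Applying swap_red_blue twice returns the original pixel, which is why a single routine covers both RGB to BGR and BGR to RGB.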
- transpose(*axes: int) dimage [source]
Transpose the image.
This function transposes the image. The axes are specified as a sequence of axis numbers.
To be used efficiently, the destination image must be provided and must have the same shape as the source image:
src = dimage(shape=(2, 3, 4), dtype=np.float32)
dst = dimage(shape=(4, 3, 2), dtype=np.float32)
src.transpose(2, 1, 0)
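The shape of the transposed image follows from the axes argument. The sketch below assumes the numpy convention (output dimension i takes the size of input dimension axes[i]); transposed_shape is a hypothetical helper used only to show the arithmetic.

```python
from typing import Tuple


def transposed_shape(shape: Tuple[int, ...], axes: Tuple[int, ...]) -> Tuple[int, ...]:
    """Shape obtained after permuting the dimensions according to axes."""
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of the dimensions")
    return tuple(shape[axis] for axis in axes)


reversed_axes = transposed_shape((2, 3, 4), (2, 1, 0))
hwc_to_chw = transposed_shape((480, 640, 3), (2, 0, 1))
```

The second call shows the common HWC to CHW permutation used before feeding an image to an inference engine.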
dimage Enumerations
- class dolphin.dimage_dim_format(value)[source]
Bases:
Enum
Image dimension format.
- DOLPHIN_CHW: int = 0
- DOLPHIN_HWC: int = 1
- DOLPHIN_HW: int = 2
- class dolphin.dimage_channel_format(value)[source]
Bases:
Enum
Image channel format.
- DOLPHIN_RGB: int = 0
- DOLPHIN_BGR: int = 1
- DOLPHIN_GRAY_SCALE: int = 2
- class dolphin.dimage_resize_type(value)[source]
Bases:
Enum
Image resize type.
DOLPHIN_NEAREST stands for nearest neighbor interpolation resize. Its equivalent in opencv is INTER_NEAREST:
cv2.resize(src, (width, height), interpolation=cv2.INTER_NEAREST)
DOLPHIN_PADDED stands for padded nearest neighbor interpolation resize. Its equivalent can be found here.
- DOLPHIN_NEAREST: int = 0
- DOLPHIN_PADDED: int = 1
- class dolphin.dimage_normalize_type(value)[source]
Bases:
Enum
Image normalize type.
DOLPHIN_MEAN_STD stands for normalization by mean and std, that is:
image = (image - mean) / std
DOLPHIN_255 stands for normalization by 255, that is:
image = image / 255.0
DOLPHIN_TF stands for tensorflow normalization, that is:
image = image / 127.5 - 1.0
- DOLPHIN_MEAN_STD: int = 0
- DOLPHIN_255: int = 1
- DOLPHIN_TF: int = 2
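The three formulas above can be written out per pixel as follows. This is a CPU-only sketch with hypothetical function names, operating on one pixel given as a list of channel values; it only restates the documented formulas and says nothing about how Dolphin implements them on the GPU.

```python
from typing import List, Sequence


def normalize_mean_std(pixel: Sequence[float],
                       mean: Sequence[float],
                       std: Sequence[float]) -> List[float]:
    """DOLPHIN_MEAN_STD: (value - mean) / std, channel by channel."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


def normalize_255(pixel: Sequence[float]) -> List[float]:
    """DOLPHIN_255: maps [0, 255] to [0, 1]."""
    return [value / 255.0 for value in pixel]


def normalize_tf(pixel: Sequence[float]) -> List[float]:
    """DOLPHIN_TF: maps [0, 255] to [-1, 1]."""
    return [value / 127.5 - 1.0 for value in pixel]


unit_range = normalize_255([0, 255])
signed_range = normalize_tf([0, 255])
centered = normalize_mean_std([0.5, 1.0, 0.0], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
```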
Built-in functions
- dolphin.resize_padding(src: dimage, shape: Tuple[int, ...], padding_value: Union[int, float] = 127, dst: Optional[dimage] = None) Tuple[dimage, float, Tuple[float, float]] [source]
Resize the image with padding
This function resizes the image to a new shape with padding. It means that the image is resized to the new shape and the remaining pixels are filled with the padding value (127 by default).
If for instance the image is resized from (50, 100) to (200, 200), the aspect ratio of the image is preserved and the image is resized to (100, 200). The remaining pixels are filled with the padding value. In this scenario, the padding would appear on the left and right side of the image, with a width of 50 pixels.
In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.uint8)
dolphin.resize_padding(src, (200, 200), dst=dst)
If aspect ratio does not matter, the function resize() can be used.
- Parameters:
shape (Tuple[int, ...]) – The new shape of the image
dst (dimage) – The destination image
padding_value (int or float) – The padding value
- dolphin.resize(src: dimage, shape: Tuple[int, ...], dst: Optional[dimage] = None) dimage [source]
Resize the image
This function performs a naive resize of the image. For now, the only supported resize type is dolphin.dimage_resize_type.DOLPHIN_NEAREST. In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
dolphin.resize(src, (200, 200), dst)
The aspect ratio of the resized image might differ from the original, as the new shape is not necessarily a multiple of the original shape.
If the aspect ratio of the original image matters to you, use resize_padding() instead. Note that dst is passed by keyword, because the third positional argument of dolphin.resize_padding is padding_value:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
dolphin.resize_padding(src, (200, 200), dst=dst)
- Parameters:
shape (Tuple[int, ...]) – The new shape of the image
dst (dimage) – The destination image
- dolphin.normalize(src: dimage, normalize_type: dimage_normalize_type, mean: Optional[List[Union[int, float]]] = None, std: Optional[List[Union[int, float]]] = None, dtype: Optional[dtype] = None, dst: Optional[dimage] = None) None [source]
Normalize the image
This function is a wrapper for the normalize function of the dimage class. The mean and std values must be passed as a list of values if you want to normalize the image using the dolphin.dimage_normalize_type.DOLPHIN_MEAN_STD normalization type. To use this function efficiently, the destination image must be preallocated and passed as an argument to the function:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
normalize(src, dimage_normalize_type.DOLPHIN_MEAN_STD, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], dst=dst)
or:
src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
normalize(src, dimage_normalize_type.DOLPHIN_255, dst=dst)
- Parameters:
src (dimage) – Source image
normalize_type (dimage_normalize_type) – The type of normalization
mean (List[Union[int, float]]) – The mean values
std (List[Union[int, float]]) – The std values
dst (dimage) – The destination image
- dolphin.cvtColor(src: dimage, color_format: dimage_channel_format, dst: Optional[dimage] = None) None [source]
Transforms the image to the specified color format.
This function is a wrapper for the cvtColor function of the dimage class.
This function transforms the image to the specified color format. The supported color formats are:
- dolphin.dimage_channel_format.DOLPHIN_RGB
- dolphin.dimage_channel_format.DOLPHIN_BGR
- dolphin.dimage_channel_format.DOLPHIN_GRAY_SCALE
- Parameters:
color_format (dimage_channel_format) – The color format of the output image
dst (dimage) – The destination image
Dolphin Bufferizer
- class dolphin.Bufferizer(shape: tuple, buffer_size: int, dtype: dtype, stream: Optional[pycuda.driver.Stream] = None, flush_hook: Optional[callable] = None, allocate_hook: Optional[callable] = None, append_one_hook: Optional[callable] = None, append_multiple_hook: Optional[callable] = None, buffer_full_hook: Optional[callable] = None)[source]
Bases:
object
Bufferizer is a class that makes it easy to bufferize data on the GPU. Its purpose is to seamlessly handle batched data and to avoid unnecessary memory allocations by reusing the same memory buffer and favouring copy operations.
Through its append methods, Bufferizer can append data to the buffer. The data can be either one darray at a time, a list of darray objects or a single batched darray.
In addition to bufferizing data, the class also allows to trigger hooks at different moments of its lifecycle.
flush_hook : callable triggered when buffer is flushed
allocate_hook : callable triggered when buffer is allocated
append_one_hook : callable triggered when buffer has a new element appended
append_multiple_hook : callable triggered when buffer has new elements appended
buffer_full_hook : callable triggered when the buffer is full after calling any append
- Parameters:
shape (tuple) – shape of element to bufferize
buffer_size (int) – size of the buffer
dtype (dolphin.dtype) – dtype of the element to bufferize
stream (cuda.Stream, optional) – stream to use for the buffer, defaults to None
flush_hook (callable, optional) – callable triggered when buffer is flushed, defaults to None
allocate_hook (callable, optional) – callable triggered when buffer is allocated, not triggered by the first allocation, defaults to None
append_one_hook (callable, optional) – callable triggered when buffer has a new element appended, defaults to None
append_multiple_hook (callable, optional) – callable triggered when buffer has new elements appended, defaults to None
buffer_full_hook (callable, optional) – callable triggered when the buffer is full after calling any append, defaults to None
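The hook lifecycle described above can be modelled on the CPU with a few lines of Python. This is a toy, hypothetical stand-in (ToyBufferizer is not part of Dolphin) that stores elements in a list. Several details are assumptions made for the sketch: hooks receive the bufferizer as their only argument, appending to a full buffer raises an error, and flush also resets the element count.

```python
from typing import Any, Callable, List, Optional

Hook = Optional[Callable[["ToyBufferizer"], None]]


class ToyBufferizer:
    """CPU-only model of a fixed-size buffer that fires lifecycle hooks."""

    def __init__(self, buffer_size: int, append_one_hook: Hook = None,
                 buffer_full_hook: Hook = None, flush_hook: Hook = None) -> None:
        self._buffer: List[Any] = [0] * buffer_size  # allocated once, then reused
        self._count = 0
        self._append_one_hook = append_one_hook
        self._buffer_full_hook = buffer_full_hook
        self._flush_hook = flush_hook

    @property
    def full(self) -> bool:
        return self._count == len(self._buffer)

    def append_one(self, element: Any) -> None:
        if self.full:
            raise BufferError("buffer is full")  # assumption: overflow is an error
        self._buffer[self._count] = element      # copy into the existing storage
        self._count += 1
        if self._append_one_hook is not None:
            self._append_one_hook(self)
        if self.full and self._buffer_full_hook is not None:
            self._buffer_full_hook(self)

    def flush(self, value: Any = 0) -> None:
        for index in range(len(self._buffer)):
            self._buffer[index] = value
        self._count = 0                          # assumption: flush empties the buffer
        if self._flush_hook is not None:
            self._flush_hook(self)


events: List[str] = []
buffer = ToyBufferizer(
    2,
    append_one_hook=lambda b: events.append("append"),
    buffer_full_hook=lambda b: events.append("full"),
    flush_hook=lambda b: events.append("flush"),
)
buffer.append_one("frame 0")
buffer.append_one("frame 1")   # fills the buffer, so the full hook fires
buffer.flush()
```

A typical use of buffer_full_hook in this pattern would be to launch inference on the batch and then flush the buffer.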
- __init__(shape: tuple, buffer_size: int, dtype: dtype, stream: Optional[pycuda.driver.Stream] = None, flush_hook: Optional[callable] = None, allocate_hook: Optional[callable] = None, append_one_hook: Optional[callable] = None, append_multiple_hook: Optional[callable] = None, buffer_full_hook: Optional[callable] = None)[source]
Constructor of the Bufferizer class.
- allocate() None [source]
Method to allocate the buffer on the GPU. This method is called automatically when the class is instantiated.
Once the buffer is allocated, its size cannot be changed, and the allocation is initialized to 0.
Also, this method triggers the allocate_hook if it is not None.
- append(element: Union[darray, List[darray]]) None [source]
General purpose append method. You can provide either a single darray, a batched darray or a list of darray and the method will handle it.
For more details about the handling of each case, see the append_one and append_multiple methods.
- Parameters:
element (Union[dolphin.darray, List[dolphin.darray]]) – The element to append to the buffer.
- append_one(element: darray) None [source]
Method to append one element to the buffer. The element is copied and appended to the buffer. element must be a darray of the same shape and dtype as the bufferizer. The size of the buffer is increased by one.
Appending one element triggers the append_one_hook if it is not None. Once the buffer is full, the buffer_full_hook is triggered.
- Parameters:
element (dolphin.darray) – The element to append to the buffer.
- append_multiple(element: Union[darray, List[darray]]) None [source]
Method used to append multiple darray to the buffer at once. element is either a batched darray, which must have the same shape and dtype as the bufferizer with the exception of the first (batch) dimension, or a list of darray of the same shape and dtype as the bufferizer.
The size of the buffer is increased by the number of elements in element.
It is assumed that the number of elements in element is defined as the first dimension of its shape. For instance:
element.shape = (batch_size, *self._shape)
Appending multiple elements triggers the append_multiple_hook if it is not None. Once the buffer is full, the buffer_full_hook is triggered if it is not None.
- Parameters:
element (Union[dolphin.darray, List[dolphin.darray]]) – The elements to append to the buffer.
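The relation element.shape = (batch_size, *self._shape) gives a simple way to count how many elements an append adds. The helper below is hypothetical and works on plain shape tuples only; it is a sketch of the rule stated above, not of Dolphin's internal checks.

```python
from typing import Tuple


def appended_count(element_shape: Tuple[int, ...],
                   item_shape: Tuple[int, ...]) -> int:
    """Number of buffer slots consumed by an element of the given shape."""
    if element_shape == item_shape:
        return 1                      # a single, non-batched element
    if element_shape[1:] == item_shape:
        return element_shape[0]       # batched: the first dimension is the batch size
    raise ValueError("element shape is incompatible with the buffer")


single = appended_count((3, 224, 224), (3, 224, 224))
batched = appended_count((8, 3, 224, 224), (3, 224, 224))
```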
- flush(value: Any = 0) None [source]
Set the buffer to a given value. Useful in order to get rid of any residual data in the buffer.
Calling this method triggers the flush_hook if it is not None.
- Parameters:
value (Any, optional) – Value to set the buffer to, defaults to 0
- flush_hook(hook: callable)[source]
Method to set the flush hook. This hook is called when the flush method is called.
- Parameters:
hook (callable) – Callable function called each time the flush method is called.
- allocate_hook(hook: callable)[source]
Method to set the allocate hook. This hook is called when the allocate method is called.
- Parameters:
hook (callable) – Callable function called each time the allocate method is called.
- append_one_hook(hook: callable)[source]
Method to set the append one hook.
- Parameters:
hook (callable) – Callable function called each time the append_one method is called.
- append_multiple_hook(hook: callable)[source]
Method to set the append multiple hook.
- Parameters:
hook (callable) – Callable function called each time the append_multiple method is called.
- buffer_full_hook(hook: callable)[source]
Method to set the buffer full hook.
- Parameters:
hook (callable) – Callable function called each time the buffer is full.
- property allocation: pycuda.driver.DeviceAllocation
Property in order to get the allocation of the buffer.
- Returns:
Allocation of the buffer.
- Return type:
cuda.DeviceAllocation
- property darray: darray
Property in order to convert a bufferizer to a darray.
Important note : The darray returned by this property is not a copy of the bufferizer. It is a view of the bufferizer. Any change to the darray will be reflected in the bufferizer and vice-versa.
- Returns:
darray of bufferizer
- Return type:
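The difference between a view and a copy can be illustrated with Python's built-in memoryview, which behaves analogously on host memory. This is only an analogy for the semantics described above and does not involve Dolphin or the GPU.

```python
# Host-side storage standing in for the bufferizer's allocation.
storage = bytearray(4)

# A view shares the same memory, a copy does not.
view = memoryview(storage)
snapshot = bytes(storage)

view[0] = 255      # writing through the view changes the storage
storage[1] = 7     # writing to the storage is visible through the view
```

After these writes, storage and view both hold the new values while snapshot still contains the original zeros.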
- property element_nbytes: int
Property in order to get the number of bytes of a single element in the buffer.
- Returns:
Number of bytes of a single element in the buffer.
- Return type:
int
- property nbytes: int
Property in order to get the number of bytes of the buffer.
- Returns:
Number of bytes of the buffer.
- Return type:
int
- __module__ = 'dolphin.core.bufferizer'
- property full: bool
Property in order to know if the buffer is full.
- Returns:
True if the buffer is full, False otherwise.
- Return type:
bool
- property shape: tuple
Property in order to get the shape of the buffer.
- Returns:
Shape of the buffer.
- Return type:
tuple
- property element_shape: tuple
Property in order to get the shape of a single element in the buffer.
- Returns:
Shape of a single element in the buffer.
- Return type:
tuple
Dolphin dtype
- class dolphin.dtype(value)[source]
Bases:
Enum
Dolphin data types.
In order to manage data types, Dolphin binds its own types to the numpy data types as well as to the CUDA data types. To do so, each element of the Enum class is a tuple containing the numpy data type (numpy.dtype) and the CUDA data type (str).
- uint8 = (<class 'numpy.uint8'>, 'uint8_t')
- uint16 = (<class 'numpy.uint16'>, 'uint16_t')
- uint32 = (<class 'numpy.uint32'>, 'uint32_t')
- int8 = (<class 'numpy.int8'>, 'int8_t')
- int16 = (<class 'numpy.int16'>, 'int16_t')
- int32 = (<class 'numpy.int32'>, 'int32_t')
- float32 = (<class 'numpy.float32'>, 'float')
- float64 = (<class 'numpy.float64'>, 'double')
- __call__(value: Union[int, float, number]) dtype [source]
In order to use the data type as a function to cast a value into a particular type, we need to implement the __call__ method.
Example:
a = dtype.uint8
a(4)  # -> numpy.uint8(4)
- Parameters:
value (Union[int, float, numpy.number]) – The value to cast to the data type
- Returns:
The numpy casted number passed as value
- Return type:
numpy.dtype
- property numpy_dtype: dtype
Since Dolphin data types are tuples, we need to access the first element which is the numpy data type.
- Returns:
The equivalent numpy data type of Dolphin data type
- Return type:
numpy.dtype
- property cuda_dtype: str
Since Dolphin data types are tuples, we need to access the second element, which is the CUDA data type. These CUDA data types are also standard C types.
- Returns:
The equivalent CUDA data type of Dolphin data type
- Return type:
str
- property itemsize: int
Returns the size of the data type in bytes. Uses the numpy data type to get the size :
@property
def itemsize(self) -> int:
    return self.numpy_dtype.itemsize
- static from_numpy_dtype(numpy_dtype: dtype) dtype [source]
Returns the equivalent Dolphin data type from the numpy data type.
- Parameters:
numpy_dtype (numpy.dtype) – The numpy data type
- Returns:
The equivalent Dolphin data type
- Return type:
- __getitem__(key: Union[str, int]) Union[dtype, str] [source]
In order to dynamically access the numpy and CUDA data types, we also need to implement the __getitem__ method. If key is an integer, it must be 0 or 1 and the corresponding tuple element is returned. If key is a string, it must be ‘numpy_dtype’ or ‘cuda_dtype’ and the corresponding data type is returned.
Usage:
a = dtype.uint8
a[0]  # numpy.uint8
a[1]  # 'uint8_t'
a["numpy_dtype"]  # numpy.uint8
a["cuda_dtype"]  # 'uint8_t'
- Parameters:
key (Union[str, int]) – ‘numpy_dtype’ or ‘cuda_dtype’ or a int 0 or 1
- Raises:
KeyError – If the key is not valid as described above
- Returns:
The numpy or CUDA data type
- Return type:
Union[numpy.dtype, str]
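The tuple-valued Enum pattern described in this section can be reproduced with the standard library alone. The sketch below is a hypothetical analogue, not Dolphin's dtype: it pairs ctypes types (instead of numpy types) with C type names, and the 'host_type' key replaces 'numpy_dtype'.

```python
import ctypes
from enum import Enum
from typing import Any, Union


class ToyDType(Enum):
    """Each member is a (host type, C type name) tuple."""

    uint8 = (ctypes.c_uint8, "uint8_t")
    int32 = (ctypes.c_int32, "int32_t")
    float32 = (ctypes.c_float, "float")

    def __call__(self, value: Union[int, float]) -> Any:
        # Calling a member casts the value to the host type.
        return self.value[0](value)

    @property
    def cuda_dtype(self) -> str:
        return self.value[1]

    @property
    def itemsize(self) -> int:
        return ctypes.sizeof(self.value[0])

    def __getitem__(self, key: Union[str, int]) -> Any:
        if key in (0, "host_type"):
            return self.value[0]
        if key in (1, "cuda_dtype"):
            return self.value[1]
        raise KeyError(key)


casted = ToyDType.uint8(4)
c_name = ToyDType.float32["cuda_dtype"]
size = ToyDType.int32.itemsize
```

Keeping both representations in one member means the same object can be used to cast host values and to generate the type name used in CUDA kernel source.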
- __module__ = 'dolphin.core.dtype'
Dolphin CudaBase
CudaBase
- class dolphin.CudaBase[source]
Bases:
object
This class is mainly used to access device information, such as the maximum number of threads per block, the size of the grid, etc. It is a base class used by many other classes. Device information is stored in class attributes so that it is loaded only once, which speeds up execution.
- device: pycuda._driver.Device
Used device
- max_threads_per_block: int
Maximum number of threads per block. Usually, it is 1024
- max_grid_dim_x: int
Maximum number of blocks per grid on the x dimension
- max_grid_dim_y: int
Maximum number of blocks per grid on the y dimension
- max_grid_dim_z: int
Maximum number of blocks per grid on the z dimension
- warp_size: int
Warp size
- multiprocessor_count: int
Number of multiprocessors (MP)
- threads_blocks_per_mp: int
Number of threads per MP
- static GET_BLOCK_GRID_1D(n: int) Tuple[Tuple[int, int, int], Tuple[int, int]] [source]
In order to perform memory coalescing on 1D iterations, we need to efficiently compute the block and grid sizes.
- Parameters:
n (int) – Number of elements to process
- Returns:
block, grid
- Return type:
Tuple[Tuple[int, int, int], Tuple[int, int]]
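A conventional way to derive a 1D launch configuration is to use as many threads per block as allowed and enough blocks to cover all elements. The sketch below follows that convention; it is an assumption about what such a helper usually does, not a description of Dolphin's implementation. block_grid_1d is a hypothetical name and 1024 is assumed as the thread limit.

```python
from typing import Tuple

MAX_THREADS_PER_BLOCK = 1024  # assumed limit, typical on current devices


def block_grid_1d(
    n: int, max_threads: int = MAX_THREADS_PER_BLOCK
) -> Tuple[Tuple[int, int, int], Tuple[int, int]]:
    """Return (block, grid) covering n elements with one thread per element."""
    if n < 1:
        raise ValueError("n must be at least 1")
    threads = min(n, max_threads)
    # Ceiling division: the last block may be only partially used.
    blocks = (n + threads - 1) // threads
    return (threads, 1, 1), (blocks, 1)


small = block_grid_1d(1000)
large = block_grid_1d(5000)
```

Since the last block may contain more threads than remaining elements, a kernel launched with such a configuration normally checks its global index against n before accessing memory.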
- static GET_BLOCK_X_Y(Z: int) Tuple[int, int, int] [source]
Get the block size for a given Z. The block size is calculated using the following formula: (max(ceil(sqrt(MAX_THREADS_PER_BLOCKS/Z)),1), max(ceil(sqrt(MAX_THREADS_PER_BLOCKS/Z)),1), Z)
It is useful to quickly compute the block size that suits self._max_threads_per_block for a given Z which can be channels, depth, batch size, etc.
- Parameters:
Z (int) – Size of the third dimension
- Returns:
Optimal block size that ensures block[0]*block[1]*block[2] <= self._max_threads_per_block
- Return type:
tuple
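The formula quoted above can be transcribed literally into plain Python to see what it produces for a few values of Z. This is only a sketch of the written formula, not the library code: the function name block_x_y and the default limit of 1024 threads are assumptions.

```python
import math
from typing import Tuple


def block_x_y(z: int, max_threads: int = 1024) -> Tuple[int, int, int]:
    """Literal transcription of the documented formula (hypothetical helper)."""
    side = max(math.ceil(math.sqrt(max_threads / z)), 1)
    return (side, side, z)


square = block_x_y(1)   # (32, 32, 1): 1024 threads
rgba = block_x_y(4)     # (16, 16, 4): 1024 threads
rgb = block_x_y(3)      # (19, 19, 3): 1083 threads
rgb_threads = rgb[0] * rgb[1] * rgb[2]
```

Note that with rounding up, Z = 3 gives 1083 threads, which is above 1024. The library's actual rounding may differ from this literal transcription, so it is worth checking the product of the returned block before relying on it.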
- static GET_GRID_SIZE(size: tuple, block: tuple) Tuple[int, int] [source]
Get the grid size for a given size and block size. The grid size is calculated using the following formula: (max(ceil(sqrt(size/block[0])),1), max(ceil(sqrt(size/block[1])),1))
This function should be used when the width and height of the image are the same or can be swapped.
- Parameters:
size (tuple) – Total size of data
block (tuple) – Current value of the block size
- Returns:
Grid size
- Return type:
tuple
- static GET_GRID_SIZE_HW(size: tuple, block: tuple) Tuple[int, int] [source]
Get the grid size for a given size and block size. The grid size is calculated using the following formula: (max(ceil(sqrt(size[0]/block[0])),1), max(ceil(sqrt(size[1]/block[1])),1))
This function should be used when the width and height of the image are different and matter.
- Parameters:
size (tuple) – Height and width of the image
block (tuple) – Current value of the block size
- Returns:
Grid size
- Return type:
tuple
Inference Engine
- class dolphin.Engine(onnx_file_path: Optional[str] = None, engine_path: Optional[str] = None, mode: str = 'fp16', optimisation_profile: Optional[Tuple[int, ...]] = None, verbosity: bool = False, explicit_batch: bool = False, direct_io: bool = False, stdout: object = sys.stdout, stderr: object = sys.stderr, calib_cache: Optional[str] = None)[source]
Bases:
EEngine, IEngine
Class to manage TensorRT engines. It can read an engine from a file or create one from an ONNX file.
This class uses the CudaTrtBuffers class to manage batched buffers. Find more details in the documentation of infer().
TensorRT official GitHub repository: https://github.com/NVIDIA/TensorRT
TensorRT official documentation: https://developer.nvidia.com/tensorrt
- Parameters:
onnx_file_path (str, optional) – Path to the onnx file to use, defaults to None
engine_path (str, optional) – Path to the engine to read or to save, defaults to None
mode (str, optional) – Can be fp32, fp16 or int8, defaults to fp16
optimisation_profile (Tuple[int, ...], optional) – Tuple defining the optimisation profile to use, defaults to None
verbosity (bool, optional) – Boolean to activate verbose mode, defaults to False
explicit_batch (bool, optional) – Sets explicit_batch flag, defaults to False
direct_io (bool, optional) – Sets direct_io flag, defaults to False
stdout (object, optional) – Out stream to write standard output, defaults to sys.stdout
stderr (object, optional) – Out stream to write errors output, defaults to sys.stderr
calib_cache (str, optional) – Where to write or read calibration cache file, defaults to None
- Raises:
ValueError – If neither an engine nor an ONNX file is provided
- allocate_buffers() CudaTrtBuffers [source]
Creates buffers for the engine. For now, only implicit dimensions are supported, meaning that dynamic shapes are not supported yet.
- Returns:
Buffers allocated for the engine
- Return type:
CudaTrtBuffers
- load_engine(engine_file_path: str) ICudaEngine [source]
Loads a TensorRT engine from a file, using TensorRT.Runtime.
- Parameters:
engine_file_path (str) – Path to the engine file
- Returns:
TensorRT engine
- Return type:
trt.ICudaEngine
- is_dynamic(onnx_file_path: str) bool [source]
Returns True if the model is dynamic, False otherwise. By dynamic we mean that the model has at least one input with a dynamic dimension.
- Parameters:
onnx_file_path (str) – Path to the onnx file
- Returns:
True if the model is dynamic, False otherwise
- Return type:
bool
- create_context() IExecutionContext [source]
Creates a TensorRT execution context from the engine.
- Parameters:
tensorrt_engine (trt.ICudaEngine) – TensorRT engine
- Returns:
TensorRT execution context
- Return type:
trt.IExecutionContext
- build_engine(onnx_file_path: Optional[str] = None, engine_file_path: Optional[str] = None, mode: str = 'fp16', max_workspace_size: int = 30) ICudaEngine [source]
Builds a TensorRT engine from an ONNX file. If the ONNX model is detected as dynamic, a dynamic engine is built; otherwise a static engine is built.
- Parameters:
onnx_file_path (str, optional) – Path to an onnx file, defaults to None
engine_file_path (str, optional) – Path to the engine file to save, defaults to None
mode (str, optional) – Datatype mode fp32, fp16 or int8, defaults to “fp16”
max_workspace_size (int, optional) – Maximum workspace size to use, defaults to 30
- Returns:
TensorRT engine
- Return type:
trt.ICudaEngine
- do_inference(stream: Optional[pycuda.driver.Stream] = None) None [source]
Executes the inference on the engine. This function assumes that the buffers are already filled with the input data.
- Parameters:
stream (cuda.Stream, optional) – Cuda Stream, defaults to None
- infer(inputs: Dict[str, darray], batched_input: bool = False, force_infer: bool = False, stream: Optional[pycuda.driver.Stream] = None) Optional[Dict[str, Bufferizer]] [source]
Method to call to perform inference on the engine. This method will automatically fill the buffers with the input data and execute the inference if the buffers are full. You can still force the inference by setting force_infer to True.
The inputs argument expects a dictionary of dolphin.darray objects or a dictionary of lists of dolphin.darray objects. The keys of the dictionary must match the names of the inputs of the model.
- Parameters:
inputs (Dict[str, darray]) – Dictionary of inputs
batched_input (bool, optional) – Consider input as batched, defaults to False
force_infer (bool, optional) – Run the inference even if the buffers are not full, defaults to False
stream (cuda.Stream, optional) – Cuda stream to use, defaults to None
- Returns:
Output of the model
- Return type:
Union[Dict[str, dolphin.Bufferizer], None]
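The "fill the buffers, then infer once they are full" behaviour described for infer() can be pictured with a small CPU-only toy. Everything in this sketch is hypothetical (the ToyBatchingEngine class, the plain floats used instead of dolphin.darray, and the "model" that simply doubles its inputs); it only illustrates why the return type is optional.

```python
from typing import Dict, List, Optional


class ToyBatchingEngine:
    """Hypothetical stand-in that mimics the fill-then-infer behaviour."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._pending: Dict[str, List[float]] = {}

    def infer(
        self, inputs: Dict[str, float], force_infer: bool = False
    ) -> Optional[Dict[str, List[float]]]:
        # Append each named input to its buffer.
        for name, value in inputs.items():
            self._pending.setdefault(name, []).append(value)
        full = any(len(v) >= self.batch_size for v in self._pending.values())
        if not (full or force_infer):
            return None  # still buffering, nothing to return yet
        # Toy "model": double every buffered value, then empty the buffers.
        outputs = {name: [2 * v for v in vals] for name, vals in self._pending.items()}
        self._pending = {}
        return outputs


engine = ToyBatchingEngine(batch_size=3)
results = [engine.infer({"input": float(i)}) for i in range(3)]
forced = engine.infer({"input": 10.0}, force_infer=True)
```

Here the first two calls return None because the buffer is not full, the third call returns a batch of three results, and the forced call returns a batch containing a single result.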
- property output: Dict[str, Bufferizer]
Returns the output of the dolphin.CudaTrtBuffers of the engine.
- Returns:
Output bufferizer of the engine.
- Return type:
Dict[str, dolphin.Bufferizer]
- property input_shape: Dict[str, tuple]
Returns the shape of the inputs of the engine.
- Returns:
Shape of the inputs
- Return type:
dict
- property input_dtype: Dict[str, dtype]
Returns the datatype of the inputs of the engine.
- Returns:
Datatype of the inputs
- Return type:
Dict[str, dolphin.dtype]
- property output_shape: Dict[str, tuple]
Returns the shape of the outputs of the engine.
- Returns:
Shape of the outputs
- Return type:
dict
TensorRT CUDA Buffers
- class dolphin.CudaTrtBuffers(stream: Optional[pycuda.driver.Stream] = None)[source]
Bases:
object
To be used with the darray class. This class manages the darray objects used by the engine, both for inputs and outputs.
To ease the use of the darray objects, this class can be understood as a dict in order to name inputs and outputs.
Note that the names of inputs and outputs have to match the names of the inputs and outputs of the engine.
The constructor of this class takes an optional cuda.Stream as an argument.
- allocate_input(name: str, shape: tuple, buffer_size: int, dtype: object, buffer_full_hook: Optional[callable] = None, flush_hook: Optional[callable] = None, allocate_hook: Optional[callable] = None, append_one_hook: Optional[callable] = None, append_multiple_hook: Optional[callable] = None) None [source]
Method to allocate an input buffer. This method creates a dolphin.Bufferizer and adds it to the inputs dict with the given name.
- Parameters:
name (str) – Name of the input.
shape (tuple) – Shape of a single element in the buffer.
buffer_size (int) – Size of the buffer.
dtype (object) – Dtype of the buffer.
buffer_full_hook (callable) – callable function called each time the buffer is full.
flush_hook (callable) – callable function called each time the buffer is flushed.
allocate_hook (callable) – callable function called each time the allocate method is called.
append_one_hook (callable) – callable function called each time the append_one method is called.
append_multiple_hook (callable) – callable function called each time the append_multiple method is called.
- allocate_output(name: str, shape: tuple, dtype: dtype) None [source]
Method to allocate an output buffer. Unlike the inputs, the outputs are not dolphin.Bufferizer objects; they are darray objects.
- Parameters:
name (str) – Name of the output.
shape (tuple) – Shape of a single element in the buffer.
dtype (dolphin.dtype) – Dtype of the buffer.
- flush(value: Any = 0) None [source]
Method to flush all the input buffers. Note that this method will trigger the flush_hook of each dolphin.Bufferizer.
- Parameters:
value (int, optional) – value to initialize the inputs with, defaults to 0
- append_one_input(name: str, data: darray)[source]
Method to append a single element to the input buffer. Note that this method will trigger the append_one_hook of the dolphin.Bufferizer. It will also trigger the buffer_full_hook if the buffer is full.
- Parameters:
name (str) – Name of the input.
data (dolphin.darray) – Data to append.
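To make the order in which hooks fire easier to picture, here is a minimal CPU-only sketch of a buffer with hooks. It is not dolphin.Bufferizer: the ToyBufferizer class, the zero-argument hook signature and the exact call order (append_one_hook before buffer_full_hook) are assumptions made for illustration.

```python
from typing import Callable, List, Optional

Hook = Optional[Callable[[], None]]


class ToyBufferizer:
    """Hypothetical fixed-size buffer that calls user-provided hooks."""

    def __init__(
        self,
        buffer_size: int,
        buffer_full_hook: Hook = None,
        flush_hook: Hook = None,
        append_one_hook: Hook = None,
    ) -> None:
        self.buffer_size = buffer_size
        self.buffer_full_hook = buffer_full_hook
        self.flush_hook = flush_hook
        self.append_one_hook = append_one_hook
        self._items: List[object] = []

    def append_one(self, item: object) -> None:
        if len(self._items) >= self.buffer_size:
            raise BufferError("buffer is already full")
        self._items.append(item)
        if self.append_one_hook is not None:
            self.append_one_hook()
        # The full hook fires as soon as the last free slot is taken.
        if len(self._items) == self.buffer_size and self.buffer_full_hook is not None:
            self.buffer_full_hook()

    def flush(self) -> None:
        self._items.clear()
        if self.flush_hook is not None:
            self.flush_hook()


events: List[str] = []
buf = ToyBufferizer(
    buffer_size=2,
    buffer_full_hook=lambda: events.append("full"),
    flush_hook=lambda: events.append("flush"),
    append_one_hook=lambda: events.append("append_one"),
)
buf.append_one("frame 0")
buf.append_one("frame 1")
buf.flush()
```

After these calls, events holds the sequence append_one, append_one, full, flush. A typical use of such a full hook would be to trigger an inference as soon as a batch is complete.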
- append_multiple_input(name: str, data: darray)[source]
Method to append multiple elements to the input buffer. Note that this method will trigger the append_multiple_hook of the dolphin.Bufferizer. It will also trigger the buffer_full_hook if the buffer is full.
- Parameters:
name (str) – Name of the input.
data (dolphin.darray) – Data to append.
- property input_shape: Dict[str, tuple]
Property to get the shape of the inputs. Returns a dict with the name of the input as key and the shape as value.
- Returns:
Shape of the inputs.
- Return type:
Dict[str, tuple]
- property input_dtype: Dict[str, dtype]
Property to get the dtype of the inputs. Returns a dict with the name of the input as key and the dtype as value.
- Returns:
Dtype of the inputs.
- Return type:
Dict[str, dolphin.dtype]
- property output_shape: Dict[str, tuple]
Property to get the shape of the outputs. Returns a dict with the name of the output as key and the shape as value.
- Returns:
Shape of the outputs.
- Return type:
Dict[str, tuple]
- property output_dtype
Property to get the dtype of the outputs. Returns a dict with the name of the output as key and the dtype as value.
- Returns:
Dtype of the outputs.
- Return type:
Dict[str, dolphin.dtype]
- property full: bool
Property to check if the buffer is full. Returns True if at least one of the input buffers is full.
- Returns:
True if at least one of the input buffers is full.
- Return type:
bool
- property output: Dict[str, darray]
Property to get the output of the buffer. Returns a dict with the name of the output as key and the output as value.
- Returns:
Output of the buffer.
- Return type:
Dict[str, dolphin.darray]
- property input_bindings: List[darray]
Property to get the input bindings. ‘input bindings’ refers here to the list of input allocations
- Returns:
Input bindings.
- Return type:
List[dolphin.darray]
- property output_bindings: List[int]
Property to get the output bindings. ‘output bindings’ refers here to the list of output allocations
- Returns:
Output bindings.
- Return type:
List[int]
- property bindings
Property to get the bindings.
- Returns:
Bindings.
- Return type:
List[int]
Advanced Utilisation of Dolphin
Dolphin is based on CUDA and therefore it has some limitations that you should be aware of. For instance, memory allocation, memory copies and kernel launches all take time. Depending on the complexity of the computation and the amount of data you want to process, this overhead may or may not be offset by the speed-up over CPU-only operations.
There is a general rule that you should always keep in mind when using Dolphin: the more data there is and the more complex the computation, the more efficient Dolphin is. There are still ways to speed up the execution of Dolphin functions. We will go through a few of them in this section.
Memory Management
You will have noticed that some Dolphin functions have an optional parameter, usually called dst. This parameter is used to specify the destination of the result of the function. As mentioned in the introduction of this section, memory allocation and memory copies take time. If you have already allocated memory to store a result, you can use it as the destination and save that time. Memory allocation can represent up to 95% of the total execution time of a function, so it is not negligible.
Let’s take an example: we want to perform an addition between two matrices A and B. In the first code snippet, we run a naive addition; in the second one, we use the dst parameter.
Naive approach:
import dolphin as dp
import time
N_ITER = int(1e3)
a = dp.zeros((100, 100))
b = dp.ones((100, 100))
t1 = time.time()
for i in range(N_ITER):
    # A new result array is allocated at every iteration
    c = a + b
print(f"Naive approach: {1000*(time.time() - t1)/N_ITER}ms/iter")
Optimized approach:
import dolphin as dp
import time
N_ITER = int(1e3)
a = dp.zeros((100, 100))
b = dp.ones((100, 100))
# Allocate the destination once, outside the loop
c = dp.zeros((100, 100))
t1 = time.time()
for i in range(N_ITER):
    # The result is written into the existing array c
    dp.add(src=a, other=b, dst=c)
print(f"Optimized approach: {1000*(time.time() - t1)/N_ITER}ms/iter")
When your application is based on a loop or the consecutive execution of several functions, you should always try to use the dst parameter to save time. It can really be a game changer in some cases.
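For readers more familiar with NumPy, the same preallocation pattern exists on the CPU through the out argument of ufuncs. The sketch below is only an analogy using NumPy, not Dolphin code; it assumes NumPy is installed and says nothing about Dolphin's actual timings.

```python
import numpy as np

a = np.zeros((100, 100), dtype=np.float32)
b = np.ones((100, 100), dtype=np.float32)

# Naive: the addition allocates a fresh result array.
naive = a + b

# Preallocated: the result is written into an existing array.
c = np.empty_like(a)
result = np.add(a, b, out=c)

# np.add returns the very array passed as `out`, so no new array is created.
same_object = result is c
```

In both libraries the idea is the same: allocate the destination once, then reuse it at every iteration.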
Usage of allocations
By default, Dolphin allocates CUDA memory and operates on it. Any modification made to
a view of an array will impact the array itself. For instance, dolphin.darray.__getitem__()
returns a view of the current array, so any in-place modification of its values will modify
the array, exactly like NumPy does:
import dolphin as dp
a = dp.zeros(shape=(2, 2), dtype=dp.float32)
a[:,0].fill(1)
print(a)
# array([[1., 0.],
# [1., 0.]], dtype=float32)
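For comparison, here is the equivalent behaviour in NumPy itself, which the paragraph above refers to. This sketch uses only NumPy (assumed to be installed) and also shows how an explicit copy breaks the link with the original array.

```python
import numpy as np

a = np.zeros((2, 2), dtype=np.float32)

# Basic slicing returns a view: it shares memory with `a`.
col = a[:, 0]
col.fill(1)
shares = np.shares_memory(a, col)

# An explicit copy owns its data, so filling it leaves `a` untouched.
detached = a[:, 0].copy()
detached.fill(5)

print(a)
```

After this code runs, the first column of a contains ones (written through the view), while the values written to detached do not appear in a.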
Usage of Cuda Stream
Dolphin is based on CUDA and therefore it is possible to use CUDA streams.
To create a CUDA stream, you can use the dolphin.Stream() function.
I recommend reading this pdf about CUDA streams and concurrency.
To use CUDA streams with Dolphin, you can use the stream parameter of functions and constructors.
import dolphin as dp
stream = dp.Stream()
a = dp.zeros((100, 100), stream=stream)
b = dp.ones((100, 100), stream=stream)
c = dp.add(src=a, other=b)
# Wait for the stream to finish
stream.synchronize()
# Do something with c
With dolphin.Engine.infer(), you can provide a stream as an argument.
import dolphin as dp
stream = dp.Stream()
a = dp.zeros((100, 100), stream=stream)
b = dp.ones((100, 100), stream=stream)
c = dp.add(src=a, other=b)
# Wait for the stream to finish
stream.synchronize()
# Do something with c
# Run the inference on the stream
# (engine is assumed to be an existing dp.Engine whose model input is named "input")
output = engine.infer(inputs={"input":c}, stream=stream)
# Wait for the stream to finish
stream.synchronize()
# Do something with the output