Welcome to Dolphin’s documentation!

Documentation version 0.0.10

Dolphin

A Python package for GPU-accelerated data processing for TensorRT inference. Dolphin notably provides a set of functions to manipulate GPU arrays (dolphin.darray) and images (dolphin.dimage), as well as TensorRT functions (dolphin.Engine).

Official documentation :

https://dolphin-python.readthedocs.io/en/latest/

This package relies heavily on the CUDA Python bindings PyCUDA, available at https://github.com/inducer/pycuda and documented at https://documen.tician.de/pycuda/, and on TensorRT, available at https://developer.nvidia.com/tensorrt.

Installation

Dolphin is available on PyPI. You can install it using pip: pip install dolphin-python

and import it in your python code using import dolphin.

Core Concepts

Dolphin is a tool for CUDA-accelerated computing in a deep learning context. It is designed to be used as a library. Its purpose is to gather the most common optimisation techniques for deep learning inference and make them available in a simple and easy to use interface.

Overview

Dolphin provides a set of classes implementing CUDA-accelerated operations. The base class dolphin.darray is an object manipulating a CUDA array. It provides a set of numpy like methods to perform operations on the array.

dolphin.dimage is the part of the library dedicated to image processing. It provides the image manipulation methods that yield the largest speed-up compared to common CPU implementations.

Simplicity

Dolphin is meant to provide a simple, easy-to-use interface for deep learning inference applications by centralizing the most common optimisation techniques in a single library.

Disclaimer

This library is currently under development. The API might not be stable yet. Some features might be missing, some might be broken, and some might not be optimized yet. You are very welcome to contribute to this project. Be kind, be constructive, be open.

Getting Started with Dolphin

Manipulating dolphin.dtype :

dolphin.dtype is a class that represents the data type of a dolphin.darray object. It is similar to numpy’s dtype and acts as a bridge between numpy types and CUDA types. It currently supports the following operations :

Creating dolphin.dtype :

There are several ways to create a dolphin.dtype object :

import dolphin as dp
import numpy as np

d = dp.dtype.float32
print(d)  # float

# Create a dtype from a numpy dtype
d = dp.dtype.from_numpy_dtype(np.float32)

Manipulating dolphin.darray :

Creating dolphin.darray :

There are several ways to create a dolphin.darray object :

import dolphin as dp
import numpy as np

# Create a darray from a numpy array
a = np.arange(10).astype(np.float32)
d = dp.darray(array=a)

# Create a zero-filled darray
d = dp.zeros(shape=(10,), dtype=dp.float32)

# Create an empty darray
d = dp.empty(shape=(10,), dtype=dp.float32)

# or
d = dp.darray(shape=(10,), dtype=dp.float32)

# Create a zeros darray like another
d = dp.zeros_like(d)

# Create an empty darray like another
d = dp.empty_like(d)

# Create a ones darray
d = dp.ones(shape=(10,), dtype=dp.float32)

# Create a ones darray like another
d = dp.ones_like(d)

Numpy-Dolphin interoperability :

You can convert a dolphin.darray object to a numpy array using the method dolphin.darray.to_numpy(). You can also convert a numpy array to a dolphin.darray object using the function dolphin.from_numpy().

import dolphin as dp
import numpy as np

# numpy to darray using dolphin constructor
a = np.arange(10).astype(np.float32)
d = dp.darray(array=a)

# Convert a darray to a numpy array
a = d.to_numpy()

# Convert a numpy array to a darray
# numpy array and darray need to
# have the same dtype and shape.
d = dp.from_numpy(a)

Transpose dolphin.darray :

Transposing a dolphin.darray object is easy and works like numpy. You can use the method dolphin.darray.transpose(), the shortcut dolphin.darray.T or the function dolphin.transpose().

import dolphin as dp

d = dp.darray(shape=(4, 3, 2), dtype=dp.float32)
print(d.shape)  # (4, 3, 2)

t = d.transpose(1, 0, 2)
print(t.shape)  # (3, 4, 2)

# You can also use the shortcut, which reverses the axes
t = d.T
print(t.shape)  # (2, 3, 4)

# Or dp.transpose
t = dp.transpose(src=d, axes=(2, 1, 0))
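
To make the effect of the axes argument concrete, here is a minimal pure-Python sketch of how a permutation maps an input shape to an output shape. It assumes numpy-style semantics, as the text above states; transposed_shape is a hypothetical helper written for illustration, not part of Dolphin.

```python
def transposed_shape(shape, axes=None):
    """Shape obtained after permuting `shape` with `axes` (numpy semantics)."""
    if axes is None:
        # No axes given: reverse the order of the axes, like the T shortcut
        axes = tuple(reversed(range(len(shape))))
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of the dimensions")
    return tuple(shape[axis] for axis in axes)


print(transposed_shape((4, 3, 2), (1, 0, 2)))  # (3, 4, 2)
print(transposed_shape((4, 3, 2)))             # (2, 3, 4)
```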

Cast dolphin.darray :

Like numpy, Dolphin implements the astype operation through the method dolphin.darray.astype(). Take a look at dolphin.dtype to see the supported types.

import dolphin as dp

d = dp.darray(shape=(4, 3, 2), dtype=dp.float32)
print(d.dtype)  # float32

d = d.astype(dp.int32)
print(d.dtype)  # int32

Indexing dolphin.darray :

Indexing a dolphin.darray object is easy and works like numpy.

import dolphin as dp
import numpy as np

n = np.random.rand(10, 10).astype(np.float32)
d = dp.darray(array=n)

d_1 = d[0:5, 0:5]
d_2 = d[5:10, 5:10]

Indexing works in read and write mode :

import dolphin as dp
import numpy as np

d = dp.zeros((4, 4))

d[0:2, 0:2] = 10
d[2:4, 2:4] = 20

print(d)
#  array([[10., 10.,  0.,  0.],
#         [10., 10.,  0.,  0.],
#         [ 0.,  0., 20., 20.],
#         [ 0.,  0., 20., 20.]])

Operations with dolphin.darray :

Dolphin implements several operations with dolphin.darray objects :

import dolphin as dp

d = dp.zeros((4, 4))
z = dp.ones((4, 4))

# Addition
d = d + z
d += 5

# Subtraction
d = d - z
d -= 5

# Multiplication
d = d * z
d *= 5

# Division
d = d / z
d /= 5

Manipulating dolphin.dimage :

As dolphin.dimage is a subclass of dolphin.darray, you can use all the methods and functions of dolphin.darray. On top of that, dolphin.dimage implements several methods and functions to manipulate images as well as image specific attributes.

Creating dolphin.dimage :

Creating a dolphin.dimage object is easy and works like dolphin.darray. The difference is the additional argument channel_format, which specifies the channel format of the image. It has to be a dolphin.dimage_channel_format value; the default is dolphin.dimage_channel_format.DOLPHIN_BGR.

import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)

# or
d = dp.dimage(array=image, channel_format=dp.dimage_channel_format.DOLPHIN_BGR)

Resizing dolphin.dimage :

With Dolphin, you can resize a dolphin.dimage object using two methods: dolphin.dimage.resize() and dolphin.dimage.resize_padding(). The first one resizes the image without padding. The second one resizes the image while keeping its aspect ratio and pads the remaining area.

import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)

# Resize without padding
a = d.resize((100, 100))
print(a.shape)  # (100, 100, 3)

# Resize with padding
b = d.resize_padding((100, 100), padding_value=0)
print(b.shape)  # (100, 100, 3)
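
The arithmetic behind an aspect-preserving resize can be sketched in pure Python. This is an illustration under assumptions: the padding is split evenly on both sides (the usual "letterbox" convention), which the text above does not specify for Dolphin, and resize_padding_params is a hypothetical helper, not a Dolphin function.

```python
def resize_padding_params(src_hw, dst_hw):
    """Return (scale, resized_hw, (pad_w, pad_h)) for an aspect-preserving resize."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    # The most constrained dimension decides the scale factor
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * scale), round(src_w * scale)
    # Assumption: the leftover area is padded evenly on both sides
    pad_w = (dst_w - new_w) / 2
    pad_h = (dst_h - new_h) / 2
    return scale, (new_h, new_w), (pad_w, pad_h)


scale, resized, (pad_w, pad_h) = resize_padding_params((1080, 1920), (640, 640))
print(resized, pad_w, pad_h)  # (360, 640) 0.0 140.0
```

The scale factor and the padding offsets are typically what is needed later to map detections back to the original image coordinates.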

Normalization dolphin.dimage :

With Dolphin, you can normalize a dolphin.dimage object using the method dolphin.dimage.normalize(). The available normalization modes are defined by the Enum class dolphin.dimage_normalize_type. By default, the mode is dolphin.dimage_normalize_type.DOLPHIN_255. This method is optimized.

import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)

# image/255
a = d.normalize(dp.DOLPHIN_255)

# image/127.5 - 1
b = d.normalize(dp.DOLPHIN_TF)

# (image - mean) / std
c = d.normalize(dp.DOLPHIN_MEAN_STD, mean=[0.5,0.5,0.5], std=[0.5,0.5,0.5])
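
For reference, the three normalization formulas from the comments above can be written out per pixel value in plain Python. This is only a sketch of the arithmetic: the function names are hypothetical, and the mean/std variant assumes the conventional (x - mean) / std form applied to the values as given.

```python
def normalize_255(x):
    """Scale a pixel value from [0, 255] to [0, 1]."""
    return x / 255.0


def normalize_tf(x):
    """Scale a pixel value from [0, 255] to [-1, 1]."""
    return x / 127.5 - 1.0


def normalize_mean_std(x, mean, std):
    """Center and scale a value (assumed form: (x - mean) / std)."""
    return (x - mean) / std


print(normalize_255(255))                  # 1.0
print(normalize_tf(0), normalize_tf(255))  # -1.0 1.0
print(normalize_mean_std(1.0, 0.5, 0.5))   # 1.0
```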

Change channel format dolphin.dimage :

The equivalent of cv2.cvtColor is dolphin.dimage.cvtColor() which converts a dolphin.dimage object from one channel format to another. The channel formats are defined by the Enum class dolphin.dimage_channel_format.

import dolphin as dp
import cv2

image = cv2.imread("your_image.jpg")
d = dp.dimage(array=image)

a = dp.dimage.cvtColor(d, dp.dimage_channel_format.DOLPHIN_GRAY_SCALE) # BGR to GRAY
b = dp.dimage.cvtColor(d, dp.dimage_channel_format.DOLPHIN_RGB) # BGR to RGB
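
As an illustration of what these two conversions do to a single pixel, here is a pure-Python sketch. The grayscale weights are the ones OpenCV documents for its BGR to gray conversion; whether Dolphin uses exactly the same weights is an assumption, and both helper names are hypothetical.

```python
def bgr_to_rgb(pixel):
    """Swap the blue and red channels of a (B, G, R) pixel."""
    b, g, r = pixel
    return (r, g, b)


def bgr_to_gray(pixel):
    """Weighted luminance of a (B, G, R) pixel (OpenCV weights, assumed here)."""
    b, g, r = pixel
    return round(0.114 * b + 0.587 * g + 0.299 * r)


print(bgr_to_rgb((255, 0, 0)))   # (0, 0, 255)
print(bgr_to_gray((255, 0, 0)))  # 29
```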

Manipulating dolphin.Engine :

Creating dolphin.Engine :

dolphin.Engine is a TensorRT based object. It is used to create, manage and run TensorRT engines. To create a dolphin.Engine object, you need to specify the path to an onnx model or a TensorRT engine. You can also pass other arguments to customize how the engine is built.

import dolphin as dp

# Create an engine from an onnx model
engine = dp.Engine(onnx_file_path="your_model.onnx")

# Create an engine from a TensorRT engine
engine = dp.Engine(engine_path="your_engine.trt")

# Create an engine from an onnx model and specify different arguments
engine = dp.Engine(onnx_file_path="your_model.onnx",
                   engine_path="your_engine.trt",
                   mode="fp16",
                   explicit_batch=True,
                   direct_io=False)

Running a dolphin.Engine :

Once a dolphin.Engine is created, you can run it using the method dolphin.Engine.infer(). This method takes a dictionary that defines the inputs of the engine: the keys are the names of the engine inputs and the values are dolphin.darray objects. The method returns either None (see below) or a dictionary of outputs, whose keys are the names of the engine outputs and whose values are dolphin.darray objects.

Internally, dolphin.Engine uses dolphin.CudaTrtBuffers to buffer the inputs of the engine efficiently. The purpose is to avoid memory copies between host and device and to rely on device-to-device copies instead, which are faster. By default, calling dolphin.Engine.infer() is batch-blocking: the engine is not run as long as the buffer is not full, which lets the buffer fill up over successive calls. You can still force the inference with the argument force_infer=True.
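
The batch-blocking behaviour can be pictured with a small CPU-only toy. This is not Dolphin's implementation: BatchBuffer and run_model are hypothetical names, and plain Python lists stand in for the device buffers.

```python
class BatchBuffer:
    """Toy stand-in for a batch-blocking input buffer (not Dolphin's code)."""

    def __init__(self, batch_size, run_model):
        self.batch_size = batch_size
        self.run_model = run_model
        self.items = []

    def infer(self, item, force_infer=False):
        self.items.append(item)
        if len(self.items) < self.batch_size and not force_infer:
            return None  # batch-blocking: wait for more inputs
        # The buffer is full (or inference is forced): run and reset
        batch, self.items = self.items, []
        return self.run_model(batch)


buffer = BatchBuffer(batch_size=2, run_model=lambda batch: [x * 2 for x in batch])
print(buffer.infer(1))                    # None
print(buffer.infer(2))                    # [2, 4]
print(buffer.infer(3, force_infer=True))  # [6]
```

Note that this toy only returns results for the items actually buffered when inference is forced, whereas the Dolphin examples in this section show a forced inference returning an output of the full batch size.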

Here are some use cases of dolphin.Engine.infer().

import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt") # batch size = 1

input_dict = {
    "image": dp.zeros(shape=(224,224,3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict) # The buffer is full, the engine is inferred

print(output) # {"output": darray(shape=(1000,), dtype=float32)}

In case you want to use a batch size greater than 1.

import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt") # batch size = 2


input_dict = {
    "image": dp.zeros(shape=(224,224,3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict) # batch-blocking

print(output) # None

output = engine.infer(inputs=input_dict) # The buffer is full, the engine is inferred

print(output) # {"output": darray(shape=(2,1000), dtype=float32)}

# or you can force infer

import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt") # batch size = 2


input_dict = {
    "image": dp.zeros(shape=(224,224,3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict, force_infer=True) # The buffer is not full, but the inference is forced

print(output) # {"output": darray(shape=(2,1000), dtype=float32)}

You can also use batched inferences.

import dolphin as dp

engine = dp.Engine(engine_path="your_engine.trt") # batch size = 16


input_dict = {
    "image": dp.zeros(shape=(16, 224, 224, 3), dtype=dp.float32)
}

output = engine.infer(inputs=input_dict) # The buffer is full, the engine is inferred

print(output) # {"output": darray(shape=(16,1000), dtype=float32)}

Full example

You can go to the examples folder to see a full example of how to use the library. Here, we will go step by step through Yolov7 inference using Dolphin.

1. Preprocessing

Most of the time, we underestimate the latency of preprocessing and try to find ways to accelerate the inference part, which would make a lot of sense if the bottleneck were indeed the inference time. In reality, in real-time applications, the frame rate is often drastically lower than expected because of pre- and post-processing. In this example, Yolov7 needs images to be resized with the dp.dimage.resize_padding() method, in order to keep the original aspect ratio of the image, and then normalized. A good practice is to resize your image before any further processing in order to limit the amount of data processed at a time.

Keep in mind that it is much better to pre-allocate the dp.darray and dp.dimage in order not to perform memory allocation during the core of your application. This is what we will be doing here.

import cv2
import dolphin as dp
# Assumption: a CUDA context is active once dolphin is imported,
# so a PyCUDA stream can be created here
import pycuda.driver as cuda

stream = cv2.VideoCapture("your_video.mp4")

# CUDA stream shared by the pre-allocated images, not to be
# confused with the OpenCV video stream above
cuda_stream = cuda.Stream()

# We need to know the size of the frame
width = int(stream.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(stream.get(cv2.CAP_PROP_FRAME_HEIGHT))

# As OpenCV reads HWC uint8_t images, we allocate the
# corresponding dp.dimage
d_frame = dp.dimage(shape=(height, width, 3), dtype=dp.uint8)

# Yolov7 is processing directly CHW images, we thus have to
# transpose the array, meaning, pre-allocate where we will
# store the transposed reordered data
transposed_frame = dp.dimage(shape=(3, height, width),
                             dtype=dp.uint8,
                             stream=cuda_stream)

# We also pre-allocate the resized image,
# (640, 640) being the size Yolov7 works with
resized_frame = dp.dimage(shape=(3, 640, 640),
                          dtype=dp.uint8,
                          stream=cuda_stream)

# Once the image is correctly formatted, meaning :
# 3x640x640 uint8, we need to normalize the image
# between 0<=image<=1. To do so, we need to use
# dp.DOLPHIN_255 flag which will write float32
# data
inference_frame = dp.dimage(shape=(3, 640, 640),
                            dtype=dp.float32,
                            stream=cuda_stream)

2. Inference

We have thus pre-allocated about 18 MB (the figure for a 1920x1080 video) to speed up the preprocessing by avoiding on-the-fly allocations. Let us now go through the inference part.
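
The size of the pre-allocated buffers can be checked with a few lines of arithmetic. The sketch below assumes a 1920x1080 input video, which the text does not state explicitly; nbytes is a hypothetical helper, not the darray property.

```python
def nbytes(shape, itemsize):
    """Number of bytes needed for an array of the given shape and item size."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


height, width = 1080, 1920  # assumed frame size
buffers = {
    "d_frame": nbytes((height, width, 3), 1),           # uint8
    "transposed_frame": nbytes((3, height, width), 1),  # uint8
    "resized_frame": nbytes((3, 640, 640), 1),          # uint8
    "inference_frame": nbytes((3, 640, 640), 4),        # float32
}
total = sum(buffers.values())
print(total, round(total / 1e6, 1))  # 18585600 18.6
```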

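# We now instantiate our AI model as a TensorRT engine
engine = dp.Engine("your_model.onnx",
                   "your_model.engine",
                   mode="fp16",
                   verbosity=True)

while True:
    # We read the next frame and stop at the end of the video
    ret, frame = stream.read()
    if not ret:
        break

    # We copy the OpenCV frame onto the GPU
    d_frame.from_numpy(frame)

    # We process the frame
    # 1. We transpose the frame (HWC to CHW). d_frame keeps its
    # HWC shape so that it can receive the next frame
    transposed_frame = d_frame.transpose(2, 0, 1)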

    # 2. We perform padding resize
    _, r, dwdh = dp.resize_padding(src=transposed_frame,
                                   shape=(640, 640),
                                   dst=resized_frame)

    # 3. We do channel swapping in order to transform
    # our BGR image into RGB
    dp.cvtColor(src=resized_frame,
                color_format=dp.DOLPHIN_RGB,
                dst=resized_frame)

    # 4. We normalize the frame as described just above
    dp.normalize(src=resized_frame,
                 dst=inference_frame,
                 normalize_type=dp.DOLPHIN_255)

    # 5. We finally infer our model
    output = engine.infer({"images": inference_frame})

Dolphin darray

class dolphin.darray(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.float32, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[pycuda.driver.DeviceAllocation] = None)[source]

Bases: object

This class implements a generic numpy style array that can be used with the dolphin library. It implements common features available with numpy arrays such as astype, transpose and copy.

darray is made with the same philosophy as numpy.ndarray. The usability is really close to numpy arrays. However, darray is meant to be much more performant than numpy.ndarray since it is GPU accelerated.

Parameters:
  • shape (Tuple[int, ...], optional) – Shape of the darray, defaults to None

  • dtype (dolphin.dtype, optional) – dtype of the darray, defaults to dtype.float32

  • stream (cuda.Stream, optional) – CUDA stream to use, defaults to None

  • array (numpy.ndarray, optional) – numpy array to copy, defaults to None

  • strides (Tuple[int, ...], optional) – strides of the darray, defaults to None

  • allocation (cuda.DeviceAllocation, optional) – CUDA allocation to use, defaults to None

  • allocation_size (int, optional) – Size of the allocation, defaults to None

__init__(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.float32, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[pycuda.driver.DeviceAllocation] = None) None[source]
static broadcastable(shape_1: Tuple[int, ...], shape_2: Tuple[int, ...]) bool[source]

Checks if two shapes are broadcastable.

Parameters:
  • shape_1 (Tuple[int, ...]) – First shape

  • shape_2 (Tuple[int, ...]) – Second shape

Returns:

True if the shapes are broadcastable, False otherwise

Return type:

bool
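
A pure-Python sketch of such a check is given below. It assumes the numpy broadcasting rules (dimensions are compared from the last one backwards and must be equal or equal to 1), which the description above does not spell out; it is not Dolphin's actual implementation.

```python
from itertools import zip_longest


def broadcastable(shape_1, shape_2):
    """True if the two shapes are compatible under numpy broadcasting rules."""
    # Compare trailing dimensions first; missing dimensions count as 1
    pairs = zip_longest(reversed(shape_1), reversed(shape_2), fillvalue=1)
    for dim_1, dim_2 in pairs:
        if dim_1 != dim_2 and dim_1 != 1 and dim_2 != 1:
            return False
    return True


print(broadcastable((4, 3, 2), (3, 2)))  # True
print(broadcastable((4, 3, 2), (4, 3)))  # False
```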

static compute_strides(shape: Tuple[int, ...]) Tuple[int, ...][source]

Computes the strides of an array from the shape. The stride of a dimension is the number of elements to skip to reach the next element along that dimension. Strides are expressed in elements, not bytes.

Parameters:

shape (Tuple[int, ...]) – shape of the ndarray

Returns:

Strides

Return type:

Tuple[int, …]
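
As an illustration, element strides for a contiguous array can be computed as follows. The sketch assumes a row-major (C order) layout, consistent with the C order mentioned for flatten; it is not Dolphin's actual code.

```python
def compute_strides(shape):
    """Element strides of a contiguous row-major (C order) array."""
    strides = []
    step = 1
    # The last dimension is contiguous, so walk the shape backwards
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


print(compute_strides((4, 3, 2)))  # (6, 2, 1)
```

With these strides, the element at index (i, j, k) sits at flat position i * 6 + j * 2 + k.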

property shape_allocation: pycuda.driver.DeviceAllocation

Property to access the cuda allocation of the shape.

Returns:

The cuda allocation of the shape

Return type:

cuda.DeviceAllocation

property strides_allocation: pycuda.driver.DeviceAllocation

Property to access the cuda allocation of the strides.

Returns:

The cuda allocation of the strides

Return type:

cuda.DeviceAllocation

property ndim: uint32

Computes the number of dimensions of the array.

Returns:

Number of dimensions of the array

Return type:

numpy.uint32

property strides: Tuple[int, ...]

Property to access the strides of the array.

Returns:

Strides of the array

Return type:

Tuple[int, …]

property size: uint32

Property to access the size of the array. Size is defined as the number of elements in the array.

Returns:

The size of the array, in terms of number of elements

Return type:

numpy.uint32

property dtype: dtype

Property to access the dolphin.dtype of the array

Returns:

dolphin.dtype of the array

Return type:

dolphin.dtype

property shape: Tuple[int, ...]

Property to access the shape of the array

Returns:

Shape of the array

Return type:

Tuple[int, …]

property nbytes: int

Property to access the number of bytes of the array

Returns:

Number of bytes of the array

Return type:

int

property T: darray

Performs a transpose operation on the darray. This transpose reverses the order of the axes:

a = darray(shape=(2, 3, 4))
a.T.shape  # (4, 3, 2)

Also, please note that this function is not efficient as it performs a copy of the darray.

Returns:

Transposed darray

Return type:

darray

property stream: pycuda.driver.Stream

Property to access (Read/Write) the cuda stream of the array

Returns:

Stream used by the darray

Return type:

cuda.Stream

property allocation: pycuda.driver.DeviceAllocation

Property to access (Read/Write) the cuda allocation of the array

Returns:

The cuda allocation of the array

Return type:

cuda.DeviceAllocation

property np: ndarray

Alias for to_numpy()

Returns:

numpy.ndarray of the darray

Return type:

numpy.ndarray

to_numpy() ndarray[source]

Converts the darray to a numpy.ndarray. Note that a copy from the device to the host is performed.

Returns:

numpy.ndarray of the darray

Return type:

numpy.ndarray

from_numpy(array: ndarray) None[source]

Writes a numpy array into the allocation of the darray. If the array does not have the same shape and dtype as the darray, an error is raised.

Parameters:

array (numpy.ndarray) – Numpy array to copy into the darray

astype(dtype: dtype, dst: Optional[darray] = None) darray[source]

Converts the darray to a different dtype. Note that a copy from device to device is performed.

Parameters:
  • dtype (dolphin.dtype) – dtype to convert to

  • dst (darray, optional) – darray to write the result of the operation, defaults to None

Raises:

ValueError – In case the dst shape or dtype doesn’t match

Returns:

darray with the new dtype

Return type:

darray

transpose(*axes: int) darray[source]

Transposes the darray according to the axes.

Parameters:

axes (Tuple[int]) – Axes to permute

Returns:

Transposed darray

Return type:

darray

copy() darray[source]

Returns a copy of the current darray. Note that a copy from device to device is performed.

Returns:

Copy of the array with another cuda allocation

Return type:

darray

__getitem__(index: Union[int, slice, tuple]) darray[source]

Returns a view of the darray with the given index.

Parameters:

index (Union[int, slice, tuple]) – Index to use

Raises:
  • IndexError – Too many indexes. The number of indexes must be less than the number of dimensions of the array.

  • IndexError – Axes must be in the range of the number of dimensions of the array.

Returns:

View of the darray

Return type:

darray
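
One common way to describe a view without copying data is an offset plus a new shape and new strides over the same allocation. The sketch below shows that bookkeeping in pure Python; whether Dolphin computes its views exactly this way is an assumption, and view_params is a hypothetical helper.

```python
def view_params(shape, strides, index):
    """Return (offset, shape, strides) of the view selected by `index`."""
    if not isinstance(index, tuple):
        index = (index,)
    if len(index) > len(shape):
        raise IndexError("too many indexes")
    offset, new_shape, new_strides = 0, [], []
    for axis, (dim, stride) in enumerate(zip(shape, strides)):
        # Dimensions without an explicit index are taken entirely
        item = index[axis] if axis < len(index) else slice(None)
        if isinstance(item, slice):
            start, stop, step = item.indices(dim)
            offset += start * stride
            new_shape.append(len(range(start, stop, step)))
            new_strides.append(stride * step)
        else:
            if not -dim <= item < dim:
                raise IndexError("index out of range")
            # An integer index removes the dimension from the view
            offset += (item % dim) * stride
    return offset, tuple(new_shape), tuple(new_strides)


print(view_params((10, 10), (10, 1), (slice(5, 10), slice(5, 10))))
# (55, (5, 5), (10, 1))
```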

__setitem__(index: Union[int, slice, tuple], other: Union[int, float, number, darray]) None[source]

Sets the value of the darray with the given index.

Parameters:
  • index (Union[int, slice, tuple]) – Index to use

  • other (Union[int, float, numpy.number, darray]) – Value to set

Returns:

View of the darray

Return type:

darray

__str__() str[source]

Returns the string representation of the numpy array. Note that a copy from the device to the host is performed.

Returns:

String representation of the numpy array

Return type:

str

__repr__() str[source]

Returns the representation of the numpy array. Note that a copy from the device to the host is performed.

Returns:

Representation of the numpy array

Return type:

str

flatten(dst: Optional[darray] = None) darray[source]

Returns a flattened view of the darray. Order = C.

Parameters:

dst (darray) – Destination darray

Returns:

Flattened view of the darray

Return type:

darray

fill(value: Union[int, float, number]) darray[source]

Fills the darray with the given value.

Parameters:

value (Union[int, float, numpy.number]) – Value to fill the array with

Returns:

Filled darray

Return type:

darray

add(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Efficient addition of a darray with another object. Can be a darray or a scalar. If dst is None, normal __add__ is called. This method is much more efficient than the __add__ method because __add__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).

Parameters:
  • dst (darray) – darray where to write the result

  • other (Union[int, float, numpy.number, 'darray']) – darray or scalar to add

Raises:

ValueError – If the size, dtype or shape of dst is not matching

Returns:

darray where the result is written. Can be dst or self

Return type:

darray
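
The dst pattern described here (write into a pre-allocated destination instead of allocating a new result) can be illustrated on the CPU with the standard array module. This is only an analogy under assumptions: the add function below is a hypothetical stand-in that handles a scalar operand, not Dolphin's GPU implementation.

```python
from array import array


def add(src, other, dst=None):
    """Add the scalar `other` to `src`, writing into `dst` when provided."""
    if dst is None:
        # Fallback path: a fresh, zero-filled buffer is allocated on every call
        dst = array(src.typecode, bytes(src.itemsize * len(src)))
    elif dst.typecode != src.typecode or len(dst) != len(src):
        raise ValueError("dst must match the size and type of src")
    for i, value in enumerate(src):
        dst[i] = value + other
    return dst


src = array("f", [1.0, 2.0, 3.0])
out = array("f", [0.0, 0.0, 0.0])  # allocated once, reused across calls
result = add(src, 5.0, dst=out)
print(result is out, list(out))  # True [6.0, 7.0, 8.0]
```

Reusing the same destination across iterations is what keeps allocation out of the main loop of an application.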

__add__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient addition of a darray with another object. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to add

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

__radd__(other: Union[int, float, number, darray]) darray

Non-Efficient addition of a darray with another object. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to add

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

__iadd__(other: Union[int, float, number, darray]) darray[source]

Implements the += operator. Like __add__, this method is not efficient because it implies a copy of the array and thus a call to cuda.memalloc, which is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to add

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

substract(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Efficient subtraction of another object from a darray.

Parameters:
  • other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract

  • dst (darray) – darray where the result is written

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

sub(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray

Efficient subtraction of another object from a darray.

Parameters:
  • other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract

  • dst (darray) – darray where the result is written

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

__sub__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient subtraction of another object from a darray. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

reversed_substract(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Efficient reverse subtraction of a darray from another object. It is efficient if dst is provided because it does not invoke cuda.memalloc. If dst is not provided, normal __rsub__ is called.

Parameters:
  • other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract from

  • dst (darray) – darray where the result is written

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

__rsub__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient subtraction of a darray from another object. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract from

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

__isub__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient -= operation. It is not efficient because it implies a copy of the array where the result is written. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to subtract

Raises:

ValueError – If other is not a scalar or a darray

Returns:

A copy of the darray where the result is written

Return type:

darray

multiply(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Efficient multiplication of a darray with another object. Can be a darray or a scalar. If dst is None, normal __mul__ is called. This method is much more efficient than the __mul__ method because __mul__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).

Parameters:
  • dst (darray) – darray where to write the result

  • other (Union[int, float, numpy.number, 'darray']) – darray or scalar to multiply

Raises:

ValueError – If the size, dtype or shape of dst is not matching

Returns:

darray where the result is written. Can be dst or self

Return type:

darray

mul(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray

Efficient multiplication of a darray with another object. Can be a darray or a scalar. If dst is None, normal __mul__ is called. This method is much more efficient than the __mul__ method because __mul__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).

Parameters:
  • dst (darray) – darray where to write the result

  • other (Union[int, float, numpy.number, 'darray']) – darray or scalar to multiply

Raises:

ValueError – If the size, dtype or shape of dst is not matching

Returns:

darray where the result is written. Can be dst or self

Return type:

darray

__mul__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient multiplication of a darray with another object. This multiplication is element-wise multiplication, not matrix multiplication. For matrix multiplication please refer to matmul. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__rmul__(other: Union[int, float, number, darray]) darray

Non-Efficient multiplication of a darray with another object. This multiplication is element-wise multiplication, not matrix multiplication. For matrix multiplication please refer to matmul. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__imul__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient multiplication of a darray with another object. This multiplication is element-wise multiplication, not matrix multiplication. For matrix multiplication please refer to matmul. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to multiply

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

divide(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Efficient division of a darray with another object. Can be a darray or a scalar. If dst is None, normal __div__ is called. This method is much more efficient than the __div__ method because __div__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).

Parameters:
  • other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by

  • dst (darray) – darray where to write the result

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

div(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray

Efficient division of a darray with another object. Can be a darray or a scalar. If dst is None, normal __div__ is called. This method is much more efficient than the __div__ method because __div__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).

Parameters:
  • other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by

  • dst (darray) – darray where to write the result

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__div__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient division of a darray with another object. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

reversed_divide(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Efficient reverse division of an object by a darray (other / self).

other can be a darray or a scalar. If dst is None, normal __rdiv__ is called. When dst is provided, this method is much more efficient than the __rdiv__ method because __rdiv__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).

Parameters:
  • other (Union[int, float, numpy.number, 'darray']) – scalar or darray to be divided by this darray

  • dst (darray) – darray where to write the result

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray
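
As a rough CPU-side illustration of the semantics only (this is not Dolphin's CUDA implementation), the sketch below computes other / src element-wise on plain Python lists and writes into a preallocated dst when one is given. The name reversed_divide_reference and the list-based representation are assumptions made for this example.

```python
from numbers import Number
from typing import List, Optional, Union


def reversed_divide_reference(
    src: List[float],
    other: Union[Number, List[float]],
    dst: Optional[List[float]] = None,
) -> List[float]:
    """Element-wise other / src, written into dst when it is provided."""
    if isinstance(other, Number):
        # Broadcast the scalar to every element of src.
        numerators = [other] * len(src)
    elif isinstance(other, list):
        if len(other) != len(src):
            raise ValueError("other must have the same length as src")
        numerators = other
    else:
        raise ValueError("other must be a scalar or a list")

    if dst is None:
        # Mirrors the slow path: a new result buffer is allocated.
        dst = [0.0] * len(src)
    elif len(dst) != len(src):
        raise ValueError("dst must have the same length as src")

    for i, (num, den) in enumerate(zip(numerators, src)):
        dst[i] = num / den
    return dst


src = [1.0, 2.0, 4.0]
out = [0.0, 0.0, 0.0]
result = reversed_divide_reference(src, 8.0, dst=out)
```

Passing dst reuses the existing buffer, which is the pattern the efficient methods of darray rely on to avoid a new allocation on every call.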

rdiv(other: Union[int, float, number, darray], dst: Optional[darray] = None) darray

Efficient reverse division of an object by a darray (other / self).

other can be a darray or a scalar. If dst is None, normal __rdiv__ is called. When dst is provided, this method is much more efficient than the __rdiv__ method because __rdiv__ implies a copy of the array. cuda.memalloc is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only).

Parameters:
  • other (Union[int, float, numpy.number, 'darray']) – scalar or darray to be divided by this darray

  • dst (darray) – darray where to write the result

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__rdiv__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient reverse division of an object by darray. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to be divided by this darray

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__idiv__(other: Union[int, float, number, darray]) darray[source]

Non-Efficient division of a darray with another object. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__truediv__(other: Union[int, float, number, darray]) darray

Non-Efficient division of a darray with another object. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__itruediv__(other: Union[int, float, number, darray]) darray

Non-Efficient division of a darray with another object. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to divide by

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__rtruediv__(other: Union[int, float, number, darray]) darray

Non-Efficient reverse division of an object by darray. This division is element-wise. This operation is not efficient because it implies a copy of the array using cuda.memalloc. This is really time consuming (up to 95% of the total latency is spent in cuda.memalloc only)

Parameters:

other (Union[int, float, numpy.number, 'darray']) – scalar or darray to be divided by this darray

Raises:

ValueError – If other is not a scalar or a darray

Returns:

The darray where the result is written

Return type:

darray

__len__() int[source]

Returns the length of the first dimension of the array.

Returns:

Size of the first dimension of the array

Return type:

int

__abs__() darray[source]

Returns the absolute value of the array.

Returns:

Absolute value of the array

Return type:

darray

absolute(dst: Optional[darray] = None) darray[source]

Returns the absolute value of the array.

Returns:

Absolute value of the array

Return type:

darray

__module__ = 'dolphin.core.darray'

Built-in functions

dolphin.zeros(shape: Tuple[int, ...], dtype: dtype = dtype.float32) darray[source]

Returns a darray for a given shape and dtype filled with zeros.

This function is a creation function, thus, it does not take an optional destination darray as argument.

Parameters:
  • shape (Tuple[int, ...]) – Shape of the array

  • dtype (dolphin.dtype) – Type of the array

Returns:

darray filled with zeros

Return type:

darray

dolphin.zeros_like(other: Union[darray, array]) darray[source]

Returns a darray filled with zeros with the same shape and dtype as another darray.

This function is a creation function, thus, it does not take an optional destination darray as argument.

Parameters:

other (darray) – darray to copy the shape and type from

Returns:

darray filled with zeros

Return type:

darray

dolphin.empty(shape: Tuple[int, ...], dtype: dtype = dtype.float32) darray[source]

Returns a darray of a given shape and dtype without initializing entries.

This function is a creation function, thus, it does not take an optional destination darray as argument.

Parameters:
  • shape (Tuple[int, ...]) – Shape of the array

  • dtype (dolphin.dtype) – Type of the array

Returns:

darray with uninitialized (arbitrary) values

Return type:

darray

dolphin.empty_like(other: Union[darray, array]) darray[source]

Returns a darray without initializing entries with the same shape and dtype as another darray.

This function is a creation function, thus, it does not take an optional destination darray as argument.

Parameters:

other (darray) – darray to copy the shape and type from

Returns:

darray with uninitialized (arbitrary) values

Return type:

darray

dolphin.ones(shape: Tuple[int, ...], dtype: dtype = dtype.float32) darray[source]

Returns a darray for a given shape and dtype filled with ones.

This function is a creation function, thus, it does not take an optional destination darray as argument.

Parameters:
  • shape (Tuple[int, ...]) – Shape of the array

  • dtype (dolphin.dtype) – Type of the array

Returns:

darray filled with ones

Return type:

darray

dolphin.ones_like(other: Union[darray, array]) darray[source]

Returns a darray filled with ones with the same shape and dtype as another darray.

This function is a creation function, thus, it does not take an optional destination darray as argument.

Parameters:

other (darray) – darray to copy the shape and type from

Returns:

darray filled with ones

Return type:

darray

dolphin.transpose(src: darray, axes: Tuple[int, ...]) darray[source]

Returns a darray with the axes transposed.

Parameters:
  • axes (Tuple[int, ...]) – Axes to transpose

  • src (darray) – darray to transpose

Returns:

Transposed darray

Return type:

darray
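
To make the effect of axes concrete, here is a minimal sketch of how the resulting shape can be derived. It assumes that axes follows the numpy.transpose convention (axis i of the result is axis axes[i] of the source); the helper transposed_shape is hypothetical and not part of Dolphin.

```python
from typing import Tuple


def transposed_shape(shape: Tuple[int, ...], axes: Tuple[int, ...]) -> Tuple[int, ...]:
    """Shape obtained when axis i of the result is axis axes[i] of the source."""
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of range(len(shape))")
    return tuple(shape[axis] for axis in axes)


# Height-width-channel layout turned into channel-height-width.
hwc_shape = (480, 640, 3)
chw_shape = transposed_shape(hwc_shape, (2, 0, 1))
```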

dolphin.add(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Returns the sum of a darray and another darray or scalar. It is computed as:

result = src + other
Parameters:
  • src (darray) – First darray

  • other (Union[int, float, numpy.number, 'darray']) – Second darray or scalar

Returns:

Addition of the two darrays

Return type:

darray

dolphin.multiply(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Returns the element-wise product of a darray and another darray or scalar. It is computed as:

result = src * other
Parameters:
  • src (darray) – First darray

  • other (Union[int, float, numpy.number, 'darray']) – Second darray or scalar

Returns:

Multiplication of the two darrays

Return type:

darray

dolphin.divide(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Returns the division of a darray by an object. It is computed as:

result = src / other
Parameters:
  • src (darray) – First darray

  • other (Union[int, float, numpy.number, 'darray']) – Second darray or scalar

Returns:

Division of the two darrays

Return type:

darray

dolphin.reversed_divide(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Returns the division of an object by a darray. It is computed as:

result = other / src
Parameters:
  • src (darray) – First darray

  • other (Union[int, float, numpy.number, 'darray']) – Second darray or scalar

Returns:

Division of the two darrays

Return type:

darray

dolphin.substract(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Returns the subtraction of a darray or scalar from a darray. It is computed as:

result = src - other
Parameters:
  • src (darray) – First darray

  • other (Union[int, float, numpy.number, 'darray']) – Second darray or scalar

Returns:

Subtraction of the two darrays

Return type:

darray

dolphin.reversed_substract(src: darray, other: Union[int, float, number, darray], dst: Optional[darray] = None) darray[source]

Returns the subtraction of a darray from an object. It is computed as:

result = other - src
Parameters:
  • src (darray) – First darray

  • other (Union[int, float, numpy.number, 'darray']) – Second darray or scalar

Returns:

Subtraction of the two darrays

Return type:

darray

dolphin.absolute(array: darray, dst: Optional[darray] = None) darray[source]

Returns the absolute value of a darray.

Parameters:

array (darray) – darray to take the absolute value of

Returns:

Absolute value of the darray

Return type:

darray

Dolphin dimage

class dolphin.dimage(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.uint8, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, channel_format: Optional[dimage_channel_format] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[int] = None)[source]

Bases: darray

This class inherits from darray in order to provide image processing functionalities.

dimage is built with the same philosophy as darray but adds GPU-accelerated image processing functionalities. It supports all the methods defined in darray and has some specific attributes in order to better handle images.

We tried to partly follow the same philosophy as OpenCV in order to make the transition easier.

Important: dimage assumes the (height, width) order, as defined in OpenCV.

Parameters:
  • shape (Tuple[int, ...], optional) – Shape of the darray, defaults to None

  • dtype (dolphin.dtype, optional) – dtype of the darray, defaults to dtype.uint8

  • stream (cuda.Stream, optional) – CUDA stream to use, defaults to None

  • array (numpy.ndarray, optional) – numpy array to copy, defaults to None

  • channel_format (dimage_channel_format, optional) – Channel format of the image, defaults to None

  • strides (Tuple[int, ...], optional) – strides of the darray, defaults to None

  • allocation (cuda.DeviceAllocation, optional) – CUDA allocation to use, defaults to None

  • allocation_size (int, optional) – Size of the allocation, defaults to None

__init__(shape: Optional[Tuple[int, ...]] = None, dtype: dtype = dtype.uint8, stream: Optional[pycuda.driver.Stream] = None, array: Optional[ndarray] = None, channel_format: Optional[dimage_channel_format] = None, strides: Optional[Tuple[int, ...]] = None, allocation: Optional[pycuda.driver.DeviceAllocation] = None, allocation_size: Optional[int] = None) None[source]

copy() dimage[source]

Returns a copy of the image.

Returns:

The copy of the image

Return type:

dimage

property image_channel_format: dimage_channel_format

Returns the image channel format.

Returns:

The image channel format

Return type:

dimage_channel_format

property image_dim_format: dimage_dim_format

Returns the image dimension format.

Returns:

The image dimension format

Return type:

dimage_dim_format

property height: uint16

Returns the height of the image.

Returns:

The height of the image

Return type:

numpy.uint16

property width: uint16

Returns the width of the image.

Returns:

The width of the image

Return type:

numpy.uint16

property channel: uint8

Returns the number of channels of the image.

Returns:

The number of channels of the image

Return type:

numpy.uint8

astype(dtype: dtype, dst: Optional[dimage] = None) dimage[source]

Convert the image to a different type

This function converts the image to a different type. To use it efficiently, the destination image must be preallocated and passed as an argument to the function:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
src.astype(dolphin.dtype.float32, dst)
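Parameters:
  • dtype (dolphin.dtype) – The type to convert the image to

  • dst (dimage, optional) – The destination image, defaults to None

resize_padding(shape: Tuple[int, ...], dst: Optional[dimage] = None, padding_value: Union[int, float] = 127) Tuple[dimage, float, Tuple[int, int]][source]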

Resize the image with padding

This function resizes the image to a new shape with padding. It means that the image is resized to the new shape and the remaining pixels are filled with the padding value (127 by default).

For instance, if the image is resized from (50, 100) to (200, 200), the aspect ratio of the image is preserved and the image is resized to (100, 200). The remaining pixels are filled with the padding value. In this scenario, the padding appears on the left and right sides of the image, with a width of 50 pixels on each side.

In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.uint8)
src.resize_padding((200, 200), dst)

If aspect ratio does not matter, the function resize() can be used.

Parameters:
  • shape (Tuple[int, ...]) – The new shape of the image

  • dst (dimage) – The destination image

  • padding_value (int or float) – The padding value

Returns:

The resized image and the offset parameters of the image

Return type:

Tuple[dimage, float, Tuple[int, int]]
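
The arithmetic behind the example above can be sketched as follows. This is a hedged reconstruction, not Dolphin's kernel: it assumes sizes are read as (width, height), that the resized image is centered in the destination, and that sizes are truncated to integers. The helper padded_resize_geometry is hypothetical; the float and the pair of integers in the return type may correspond to the scale and offsets computed here, but the documentation does not confirm it.

```python
from typing import Tuple


def padded_resize_geometry(
    src_size: Tuple[int, int], dst_size: Tuple[int, int]
) -> Tuple[Tuple[int, int], float, Tuple[int, int]]:
    """Return (resized_size, scale, (offset_x, offset_y)); sizes are (width, height)."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    # The largest uniform scale that still fits inside the destination.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = int(src_w * scale), int(src_h * scale)
    # Center the resized image; the rest is filled with the padding value.
    offset_x = (dst_w - new_w) // 2
    offset_y = (dst_h - new_h) // 2
    return (new_w, new_h), scale, (offset_x, offset_y)


resized, scale, offsets = padded_resize_geometry((50, 100), (200, 200))
```

Under these assumptions the image is scaled by 2 to (100, 200) and shifted 50 pixels horizontally, which matches the 50-pixel bands described above.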

resize(shape: Tuple[int, ...], dst: Optional[dimage] = None) dimage[source]

Resize the image

This function performs a naive resize of the image. The resize type is for now only dolphin.dimage_resize_type.DOLPHIN_NEAREST. In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
src.resize((200, 200), dst)

The returned resized image aspect ratio might change as the new shape is not necessarily a multiple of the original shape.

If the aspect ratio of the original image matters to you, use resize_padding() instead:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
src.resize_padding((200, 200), dst)
Parameters:
  • shape (Tuple[int, ...]) – The new shape of the image

  • dst (dimage) – The destination image
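
For readers unfamiliar with nearest-neighbor interpolation, the following CPU sketch shows the idea on a small 2D list. The index mapping used here (floor of the destination index times the size ratio) is a common convention and an assumption; Dolphin's CUDA kernel may round differently. resize_nearest is a hypothetical helper.

```python
from typing import List


def resize_nearest(image: List[List[int]], new_height: int, new_width: int) -> List[List[int]]:
    """Nearest-neighbor resize of a 2D image stored as rows of pixel values."""
    src_height, src_width = len(image), len(image[0])
    resized = []
    for y in range(new_height):
        # Map each destination row back to a source row.
        src_y = min(src_height - 1, y * src_height // new_height)
        row = []
        for x in range(new_width):
            # Same mapping for columns.
            src_x = min(src_width - 1, x * src_width // new_width)
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized


small = [[1, 2],
         [3, 4]]
big = resize_nearest(small, 4, 4)
```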

__module__ = 'dolphin.core.dimage'
crop_and_resize(coordinates: darray, size: Tuple[int, int], dst: Optional[darray] = None) darray[source]

This method efficiently crops and resizes images. A typical use case is to process a batch of images that are crops of a bigger image. The coordinates of the crops would, for instance, be the output of a detection model.

In terms of usability, the coordinates are expected to be in the following format: [[x1, y1, x2, y2], …] where x1, y1, x2, y2 are the absolute coordinates of the top left and bottom right corners of the crop in the original image.

Coordinates would thus be a dolphin.darray of shape (n, 4) where n is the number of crops, n >= 1.

Also, for faster execution, the destination darray is expected to be preallocated and passed as an argument to the function.

Parameters:
  • coordinates (dolphin.darray) – darray of coordinates

  • size (Tuple[int, int]) – Tuple of the desired shape (width, height)

  • dst (dolphin.darray, optional) – Destination darray for faster execution, defaults to None

Returns:

darray of cropped and resized images

Return type:

dolphin.darray
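
The coordinate format can be illustrated with a small CPU reference that crops each box and resizes it with nearest-neighbor lookup. It is a sketch under assumptions: x2 and y2 are treated as exclusive bounds, interpolation is nearest-neighbor, and the image is a single-channel 2D list. crop_and_resize_reference is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def crop_and_resize_reference(
    image: Image, coordinates: Sequence[Sequence[int]], size: Tuple[int, int]
) -> List[Image]:
    """Crop each [x1, y1, x2, y2] box and resize it to size = (width, height)."""
    out_w, out_h = size
    crops = []
    for x1, y1, x2, y2 in coordinates:
        box_w, box_h = x2 - x1, y2 - y1
        crop = [
            [
                # Nearest-neighbor lookup inside the box.
                image[y1 + (y * box_h) // out_h][x1 + (x * box_w) // out_w]
                for x in range(out_w)
            ]
            for y in range(out_h)
        ]
        crops.append(crop)
    return crops


# A 4x4 image whose pixel value encodes its position (10 * row + column).
image = [[10 * row + col for col in range(4)] for row in range(4)]
boxes = [[0, 0, 2, 2], [2, 2, 4, 4]]
batch = crop_and_resize_reference(image, boxes, (2, 2))
```

The result holds one resized crop per row of coordinates, which mirrors the (n, 4) input shape described above.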

crop_and_resize_padding(coordinates: darray, size: Tuple[int, int], padding: Union[int, float] = 127, dst: Optional[darray] = None) darray[source]

This method efficiently crops and resizes images using padded resize. A typical use case is to process a batch of images that are crops of a bigger image. The coordinates of the crops would, for instance, be the output of a detection model.

In terms of usability, the coordinates are expected to be in the following format: [[x1, y1, x2, y2], …] where x1, y1, x2, y2 are the absolute coordinates of the top left and bottom right corners of the crop in the original image.

Coordinates would thus be a dolphin.darray of shape (n, 4) where n is the number of crops, n >= 1.

Also, for faster execution, the destination darray is expected to be preallocated and passed as an argument to the function.

Parameters:
  • coordinates (dolphin.darray) – darray of coordinates

  • size (Tuple[int, int]) – Tuple of the desired shape (width, height)

  • padding (int or float) – The padding value, defaults to 127

  • dst (dolphin.darray, optional) – Destination darray for faster execution, defaults to None

Returns:

darray of cropped and resized images

Return type:

dolphin.darray

normalize(normalize_type: dimage_normalize_type = dimage_normalize_type.DOLPHIN_255, mean: Optional[List[Union[int, float]]] = None, std: Optional[List[Union[int, float]]] = None, dtype: dtype = dtype.float32, dst: Optional[dimage] = None) dimage[source]

Normalize the image

This function efficiently normalizes an image using one of several normalization types.

The mean and std values must be passed as a list of values if you want to normalize the image using the dolphin.dimage_normalize_type.DOLPHIN_MEAN_STD normalization type. To use this function efficiently, the destination image must be preallocated and passed as an argument to the function:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
src.normalize(dimage_normalize_type.DOLPHIN_MEAN_STD,
              mean=[0.5, 0.5, 0.5],
              std=[0.5, 0.5, 0.5], dst=dst)

or:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
src.normalize(dimage_normalize_type.DOLPHIN_255, dst=dst)
Parameters:
  • normalize_type (dimage_normalize_type) – The type of normalization

  • mean (List[Union[int, float]]) – The mean values

  • std (List[Union[int, float]]) – The std values

  • dst (dimage) – The destination image

cvtColor(color_format: dimage_channel_format, dst: Optional[dimage] = None) dimage[source]

Transforms the image to the specified color format.

This function transforms the image to the specified color format. The supported color formats are:

- dolphin.dimage_channel_format.DOLPHIN_RGB
- dolphin.dimage_channel_format.DOLPHIN_BGR
- dolphin.dimage_channel_format.DOLPHIN_GRAY_SCALE
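Parameters:
  • color_format (dimage_channel_format) – The color format to convert the image to

  • dst (dimage, optional) – The destination image, defaults to None

transpose(*axes: int) dimage[source]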

Transpose the image.

This function transposes the image. The axes are specified as a sequence of axis numbers.

To be used efficiently, the destination image must be provided and must have the same shape as the source image:

src = dimage(shape=(2, 3, 4), dtype=dolphin.dtype.float32)
dst = dimage(shape=(4, 3, 2), dtype=dolphin.dtype.float32)
src.transpose(2, 1, 0)
Parameters:

axes (*int) – The permutation of the axes

Returns:

The transposed image

Return type:

dimage

dimage Enumerations

class dolphin.dimage_dim_format(value)[source]

Bases: Enum

Image dimension format.

DOLPHIN_CHW: int = 0
DOLPHIN_HWC: int = 1
DOLPHIN_HW: int = 2

class dolphin.dimage_channel_format(value)[source]

Bases: Enum

Image channel format.

DOLPHIN_RGB: int = 0
DOLPHIN_BGR: int = 1
DOLPHIN_GRAY_SCALE: int = 2

class dolphin.dimage_resize_type(value)[source]

Bases: Enum

Image resize type.

DOLPHIN_NEAREST stands for nearest neighbor interpolation resize. Its equivalent in OpenCV is INTER_NEAREST:

cv2.resize(src, (width, height), interpolation=cv2.INTER_NEAREST)

DOLPHIN_PADDED stands for padded nearest neighbor interpolation resize. Its equivalent can be found here.

DOLPHIN_NEAREST: int = 0
DOLPHIN_PADDED: int = 1

class dolphin.dimage_normalize_type(value)[source]

Bases: Enum

Image normalize type.

DOLPHIN_MEAN_STD stands for normalization by mean and standard deviation, that is:

image = (image - mean) / std

DOLPHIN_255 stands for normalization by 255, that is:

image = image / 255.0

DOLPHIN_TF stands for TensorFlow normalization, that is:

image = image / 127.5 - 1.0

DOLPHIN_MEAN_STD: int = 0
DOLPHIN_255: int = 1
DOLPHIN_TF: int = 2
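
The three formulas above can be checked on a single pixel with a few lines of plain Python. This is only a CPU illustration; the function names are hypothetical, and applying mean and std per channel is an assumption based on the list-valued mean and std arguments of normalize.

```python
from typing import List, Sequence


def normalize_mean_std(
    pixel: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> List[float]:
    """DOLPHIN_MEAN_STD: (image - mean) / std, applied per channel."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


def normalize_255(pixel: Sequence[float]) -> List[float]:
    """DOLPHIN_255: image / 255.0, which maps [0, 255] to [0, 1]."""
    return [value / 255.0 for value in pixel]


def normalize_tf(pixel: Sequence[float]) -> List[float]:
    """DOLPHIN_TF: image / 127.5 - 1.0, which maps [0, 255] to [-1, 1]."""
    return [value / 127.5 - 1.0 for value in pixel]


pixel = [0.0, 127.5, 255.0]
scaled_255 = normalize_255(pixel)
scaled_tf = normalize_tf(pixel)
scaled_mean_std = normalize_mean_std([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
```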

Built-in functions

dolphin.resize_padding(src: dimage, shape: Tuple[int, ...], padding_value: Union[int, float] = 127, dst: Optional[dimage] = None) Tuple[dimage, float, Tuple[float, float]][source]

Resize the image with padding

This function resizes the image to a new shape with padding. It means that the image is resized to the new shape and the remaining pixels are filled with the padding value (127 by default).

For instance, if the image is resized from (50, 100) to (200, 200), the aspect ratio of the image is preserved and the image is resized to (100, 200). The remaining pixels are filled with the padding value. In this scenario, the padding appears on the left and right sides of the image, with a width of 50 pixels on each side.

In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.uint8)
dolphin.resize_padding(src, (200, 200), dst=dst)

If aspect ratio does not matter, the function resize() can be used.

Parameters:
  • shape (Tuple[int, ...]) – The new shape of the image

  • dst (dimage) – The destination image

  • padding_value (int or float) – The padding value

dolphin.resize(src: dimage, shape: Tuple[int, ...], dst: Optional[dimage] = None) dimage[source]

Resize the image

This function performs a naive resize of the image. The resize type is for now only dolphin.dimage_resize_type.DOLPHIN_NEAREST. In order to use this function in an efficient way, the destination image must be preallocated and passed as an argument to the function:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
dolphin.resize(src, (200, 200), dst)

The returned resized image aspect ratio might change as the new shape is not necessarily a multiple of the original shape.

If the aspect ratio of the original image matters to you, use resize_padding() instead:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(200, 200, 3), dtype=dolphin.dtype.float32)
dolphin.resize_padding(src, (200, 200), dst=dst)
Parameters:
  • shape (Tuple[int, ...]) – The new shape of the image

  • dst (dimage) – The destination image

dolphin.normalize(src: dimage, normalize_type: dimage_normalize_type, mean: Optional[List[Union[int, float]]] = None, std: Optional[List[Union[int, float]]] = None, dtype: Optional[dtype] = None, dst: Optional[dimage] = None) None[source]

Normalize the image

This function is a wrapper for the normalize function of the dimage class. The mean and std values must be passed as a list of values if you want to normalize the image using the dolphin.dimage_normalize_type.DOLPHIN_MEAN_STD normalization type. To use this function efficiently, the destination image must be preallocated and passed as an argument to the function:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
normalize(src,
          dimage_normalize_type.DOLPHIN_MEAN_STD,
          mean=[0.5, 0.5, 0.5],
          std=[0.5, 0.5, 0.5], dst=dst)

or:

src = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.uint8)
dst = dimage(shape=(100, 100, 3), dtype=dolphin.dtype.float32)
normalize(src, dimage_normalize_type.DOLPHIN_255, dst=dst)
Parameters:
  • src (dimage) – Source image

  • normalize_type (dimage_normalize_type) – The type of normalization

  • mean (List[Union[int, float]]) – The mean values

  • std (List[Union[int, float]]) – The std values

  • dst (dimage) – The destination image

dolphin.cvtColor(src: dimage, color_format: dimage_channel_format, dst: Optional[dimage] = None) None[source]

Transforms the image to the specified color format.

This function is a wrapper for the cvtColor function of the dimage class.

This function transforms the image to the specified color format. The supported color formats are:

- dolphin.dimage_channel_format.DOLPHIN_RGB
- dolphin.dimage_channel_format.DOLPHIN_BGR
- dolphin.dimage_channel_format.DOLPHIN_GRAY_SCALE
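Parameters:
  • src (dimage) – Source image

  • color_format (dimage_channel_format) – The color format to convert the image to

  • dst (dimage, optional) – The destination image, defaults to None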

Dolphin Bufferizer

class dolphin.Bufferizer(shape: tuple, buffer_size: int, dtype: dtype, stream: Optional[pycuda.driver.Stream] = None, flush_hook: Optional[callable] = None, allocate_hook: Optional[callable] = None, append_one_hook: Optional[callable] = None, append_multiple_hook: Optional[callable] = None, buffer_full_hook: Optional[callable] = None)[source]

Bases: object

Bufferizer is a class that makes it easy to buffer data on the GPU. Its purpose is to handle batched data seamlessly and to avoid unnecessary memory allocations by reusing the same memory buffer and favouring copy operations.

Bufferizer can, through its append methods, append data to the buffer. The data can be one darray at a time, a list of darrays, or a single batched darray.

In addition to buffering data, the class can trigger hooks at different moments of its lifecycle.

  • flush_hook : callable triggered when buffer is flushed

  • allocate_hook : callable triggered when buffer is allocated

  • append_one_hook : callable triggered when buffer has a new element appended

  • append_multiple_hook : callable triggered when buffer has new elements appended

  • buffer_full_hook : callable triggered when the buffer is full after calling any append

Parameters:
  • shape (tuple) – shape of element to bufferize

  • buffer_size (int) – size of the buffer

  • dtype (dolphin.dtype) – dtype of the element to bufferize

  • stream (cuda.Stream, optional) – stream to use for the buffer, defaults to None

  • flush_hook (callable, optional) – callable triggered when buffer is flushed, defaults to None

  • allocate_hook (callable, optional) – callable triggered when buffer is allocated, not triggered by the first allocation, defaults to None

  • append_one_hook (callable, optional) – callable triggered when buffer has a new element appended, defaults to None

  • append_multiple_hook (callable, optional) – callable triggered when buffer has new elements appended, defaults to None

  • buffer_full_hook (callable, optional) – callable triggered when the buffer is full after calling any append, defaults to None
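
The lifecycle described above can be sketched on the CPU with plain Python lists. This is an illustration of the pattern, not the Dolphin implementation: the class ListBufferizer is hypothetical, hooks are assumed to receive the buffer itself, appending to a full buffer is assumed to raise, and flush is assumed to reset the element count.

```python
from typing import Callable, List, Optional, Sequence

Hook = Callable[["ListBufferizer"], None]


class ListBufferizer:
    """Fixed-capacity buffer with optional hooks, sketched on Python lists."""

    def __init__(
        self,
        buffer_size: int,
        append_one_hook: Optional[Hook] = None,
        buffer_full_hook: Optional[Hook] = None,
        flush_hook: Optional[Hook] = None,
    ) -> None:
        self._capacity = buffer_size
        # Storage is created once and reused; it is never resized.
        self._slots: List[object] = [0] * buffer_size
        self._count = 0
        self._append_one_hook = append_one_hook
        self._buffer_full_hook = buffer_full_hook
        self._flush_hook = flush_hook

    def __len__(self) -> int:
        return self._count

    @property
    def full(self) -> bool:
        return self._count == self._capacity

    def append_one(self, element: Sequence[float]) -> None:
        if self.full:
            raise BufferError("buffer is full")
        # Copy the element into the next free slot.
        self._slots[self._count] = list(element)
        self._count += 1
        if self._append_one_hook is not None:
            self._append_one_hook(self)
        if self.full and self._buffer_full_hook is not None:
            self._buffer_full_hook(self)

    def flush(self, value: object = 0) -> None:
        # Overwrite the storage in place; resetting the count is an assumption.
        self._slots[:] = [value] * self._capacity
        self._count = 0
        if self._flush_hook is not None:
            self._flush_hook(self)


events = []
buffer = ListBufferizer(
    buffer_size=2,
    append_one_hook=lambda b: events.append(("append", len(b))),
    buffer_full_hook=lambda b: events.append(("full", len(b))),
    flush_hook=lambda b: events.append(("flush", len(b))),
)
buffer.append_one([1.0, 2.0])
buffer.append_one([3.0, 4.0])
buffer.flush()
```

A typical use of buffer_full_hook would be to launch inference on the accumulated batch and then flush the buffer.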

__init__(shape: tuple, buffer_size: int, dtype: dtype, stream: Optional[pycuda.driver.Stream] = None, flush_hook: Optional[callable] = None, allocate_hook: Optional[callable] = None, append_one_hook: Optional[callable] = None, append_multiple_hook: Optional[callable] = None, buffer_full_hook: Optional[callable] = None)[source]

Constructor of the Bufferizer class.

allocate() None[source]

Method to allocate the buffer on the GPU. This method is called automatically when the class is instantiated.

Once the buffer is allocated, its size cannot be changed, and the allocation is initialized to 0.

Also, this method triggers the allocate_hook if it is not None.

append(element: Union[darray, List[darray]]) None[source]

General purpose append method. You can provide either a single darray, a batched darray or a list of darrays and the method will handle it.

For more details about the handling of each case, see the append_one and append_multiple methods.

Parameters:

element (Union[dolphin.darray, List[dolphin.darray]]) – The element to append to the buffer.

append_one(element: darray) None[source]

Method to append one element to the buffer. The element is copied and appended to the buffer. element must be a darray of the same shape and dtype as the bufferizer. The size of the buffer is increased by one.

Appending one element triggers the append_one_hook if it is not None. Once the buffer is full, the buffer_full_hook is triggered.

Parameters:

element (dolphin.darray) – The element to append to the buffer.

append_multiple(element: Union[darray, List[darray]]) None[source]

Method used to append multiple darrays to the buffer at once. element can be a batched darray, which must have the same shape and dtype as the bufferizer except for the first dimension, which can differ. Otherwise, element must be a list of darrays, each with the same shape and dtype as the bufferizer.

The size of the buffer is increased by the number of elements in element.

It is assumed that the number of elements in element is defined as the first dimension of its shape. For instance:

element.shape = (batch_size, *self._shape)

Appending multiple elements triggers the append_multiple_hook if it is not None. Once the buffer is full, the buffer_full_hook is triggered if it is not None.

Parameters:

element (Union[dolphin.darray, List[dolphin.darray]]) – The element to append to the buffer.
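
The shape rule above can be expressed as a small check. The helper batch_count is hypothetical and only illustrates how the number of appended elements could be derived from the shapes; it is not taken from the Dolphin source.

```python
from typing import Tuple


def batch_count(element_shape: Tuple[int, ...], incoming_shape: Tuple[int, ...]) -> int:
    """Number of elements an incoming array would add to the buffer."""
    if tuple(incoming_shape) == tuple(element_shape):
        # A single, non-batched element.
        return 1
    if tuple(incoming_shape[1:]) == tuple(element_shape):
        # Batched input: (batch_size, *element_shape).
        return incoming_shape[0]
    raise ValueError("shape does not match the buffer element shape")


single = batch_count((3, 224, 224), (3, 224, 224))
batched = batch_count((3, 224, 224), (8, 3, 224, 224))
```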

flush(value: Any = 0) None[source]

Set the buffer to a given value. Useful in order to get rid of any residual data in the buffer.

Calling this method triggers the flush_hook if it is not None.

Parameters:

value (Any, optional) – Value to set the buffer to, defaults to 0

flush_hook(hook: callable)[source]

Method to set the flush hook. This hook is called when the flush method is called.

Parameters:

hook (callable) – Callable function called each time the flush method is called.

allocate_hook(hook: callable)[source]

Method to set the allocate hook. This hook is called when the allocate method is called.

Parameters:

hook (callable) – Callable function called each time the allocate method is called.

append_one_hook(hook: callable)[source]

Method to set the append one hook.

Parameters:

hook (callable) – Callable function called each time the append_one method is called.

append_multiple_hook(hook: callable)[source]

Method to set the append multiple hook.

Parameters:

hook (callable) – Callable function called each time the append_multiple method is called.

buffer_full_hook(hook: callable)[source]

Method to set the buffer full hook.

Parameters:

hook (callable) – Callable function called each time the buffer is full.
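As an illustration of the five hook setters documented above, the sketch below registers one tracing callable per setter on any object that exposes them. attach_tracing_hooks is a hypothetical helper, not part of Dolphin. The arguments Dolphin passes to a hook are not documented in this section, so the hooks accept anything and ignore it.

```python
from typing import Any, Callable, List

# Names of the hook setters documented for the bufferizer.
HOOK_SETTERS = (
    "flush_hook",
    "allocate_hook",
    "append_one_hook",
    "append_multiple_hook",
    "buffer_full_hook",
)


def attach_tracing_hooks(bufferizer: Any,
                         sink: Callable[[str], None]) -> List[str]:
    """Register a hook on every setter; each hook reports its own name."""
    attached = []
    for setter_name in HOOK_SETTERS:
        # Assumption: hook arguments are unknown, so accept and ignore them.
        def hook(*args, _event=setter_name, **kwargs):
            sink(_event)

        getattr(bufferizer, setter_name)(hook)
        attached.append(setter_name)
    return attached
```

With a real bufferizer, passing print as sink would print the name of each event as it fires.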

property allocation: pycuda.driver.DeviceAllocation

Property in order to get the allocation of the buffer.

Returns:

Allocation of the buffer.

Return type:

cuda.DeviceAllocation

property darray: darray

Property in order to convert a bufferizer to a darray.

Important note : The darray returned by this property is not a copy of the bufferizer. It is a view of the bufferizer. Any change to the darray will be reflected in the bufferizer and vice-versa.

Returns:

darray of bufferizer

Return type:

dolphin.darray

property element_nbytes: int

Property in order to get the number of bytes of a single element in the buffer.

Returns:

Number of bytes of a single element in the buffer.

Return type:

int

property nbytes: int

Property in order to get the number of bytes of the buffer.

Returns:

Number of bytes of the buffer.

Return type:

int

__module__ = 'dolphin.core.bufferizer'
property full: bool

Property in order to know if the buffer is full.

Returns:

True if the buffer is full, False otherwise.

Return type:

bool

property shape: tuple

Property in order to get the shape of the buffer.

Returns:

Shape of the buffer.

Return type:

tuple

property element_shape: tuple

Property in order to get the shape of a single element in the buffer.

Returns:

Shape of a single element in the buffer.

Return type:

tuple
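The size properties above are related in the usual way for a contiguous buffer. The sketch below assumes that the buffer stores size elements back to back, so its shape is (size, *element_shape). This layout is an assumption, and the function names are illustrative rather than Dolphin's.

```python
import math
from typing import Tuple


def element_nbytes(element_shape: Tuple[int, ...], itemsize: int) -> int:
    """Bytes used by one element: number of values times bytes per value."""
    return math.prod(element_shape) * itemsize


def buffer_nbytes(size: int, element_shape: Tuple[int, ...],
                  itemsize: int) -> int:
    """Bytes used by a buffer holding `size` contiguous elements."""
    return size * element_nbytes(element_shape, itemsize)


# float32 images (4 bytes per value), 8 slots in the buffer
print(element_nbytes((3, 224, 224), 4))    # 602112
print(buffer_nbytes(8, (3, 224, 224), 4))  # 4816896
```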

property dtype: dtype

Property in order to get the dtype of the buffer.

Returns:

Dtype of the buffer.

Return type:

dolphin.dtype

__len__() int[source]

Method in order to get the number of elements in the buffer.

Returns:

Number of elements in the buffer.

Return type:

int

Dolphin dtype

class dolphin.dtype(value)[source]

Bases: Enum

Dolphin data types. In order to manage data types, Dolphin binds each of its types to the corresponding numpy data type as well as the CUDA data type. To do so, each element of the Enum class is a tuple containing the numpy data type (numpy.dtype) and the CUDA data type (str).

uint8 = (<class 'numpy.uint8'>, 'uint8_t')
uint16 = (<class 'numpy.uint16'>, 'uint16_t')
uint32 = (<class 'numpy.uint32'>, 'uint32_t')
int8 = (<class 'numpy.int8'>, 'int8_t')
int16 = (<class 'numpy.int16'>, 'int16_t')
int32 = (<class 'numpy.int32'>, 'int32_t')
float32 = (<class 'numpy.float32'>, 'float')
float64 = (<class 'numpy.float64'>, 'double')
__call__(value: Union[int, float, number]) dtype[source]

In order to use the data type as a function to cast a value into a particular type, we need to implement the __call__ method.

Example:

a = dtype.uint8
a(4) -> numpy.uint8(4)
Parameters:

value (Union[int, float, numpy.number]) – The value to cast to the data type

Returns:

The numpy casted number passed as value

Return type:

numpy.dtype

property numpy_dtype: dtype

Since Dolphin data types are tuples, we need to access the first element which is the numpy data type.

Returns:

The equivalent numpy data type of Dolphin data type

Return type:

numpy.dtype

property cuda_dtype: str

Since Dolphin data types are tuples, we need to access the second element, which is the CUDA data type. These are also standard C types.

Returns:

The equivalent CUDA data type of Dolphin data type

Return type:

str

property itemsize: int

Returns the size of the data type in bytes. Uses the numpy data type to get the size :

>>> @property
... def itemsize(self) -> int:
...     return self.numpy_dtype.itemsize
static from_numpy_dtype(numpy_dtype: dtype) dtype[source]

Returns the equivalent Dolphin data type from the numpy data type.

Parameters:

numpy_dtype (numpy.dtype) – The numpy data type

Returns:

The equivalent Dolphin data type

Return type:

dtype

__getitem__(key: Union[str, int]) Union[dtype, str][source]

In order to dynamically access the numpy and CUDA data types, we also need to implement the __getitem__ method. If key is an integer, it returns one of the tuple elements, as long as the key is either 0 or 1. If key is a string, it returns the numpy or CUDA data type, as long as the key is either ‘numpy_dtype’ or ‘cuda_dtype’.

Usage:

a = dtype.uint8
a[0]                  # numpy.uint8
a[1]                  # 'uint8_t'
a["numpy_dtype"]      # numpy.uint8
a["cuda_dtype"]       # 'uint8_t'
Parameters:

key (Union[str, int]) – ‘numpy_dtype’ or ‘cuda_dtype’ or an int 0 or 1

Raises:

KeyError – If the key is not valid as described above

Returns:

The numpy or CUDA data type

Return type:

Union[numpy.dtype, str]
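The pattern described above (an Enum whose members are tuples, with __call__ and __getitem__ defined on the members) can be sketched with the standard library alone. ToyDtype is a stand-in written for illustration and is not Dolphin's implementation. It pairs ctypes types with C type names instead of numpy types, and uses the key "host_dtype" where Dolphin uses "numpy_dtype".

```python
import ctypes
from enum import Enum


class ToyDtype(Enum):
    """Each member is a tuple: (host-side type, C/CUDA type name)."""

    uint8 = (ctypes.c_uint8, "uint8_t")
    int32 = (ctypes.c_int32, "int32_t")
    float32 = (ctypes.c_float, "float")

    def __call__(self, value):
        # Cast a Python number through the host-side type.
        return self.value[0](value).value

    def __getitem__(self, key):
        # Integer keys index the tuple, string keys name its fields.
        if key in (0, "host_dtype"):
            return self.value[0]
        if key in (1, "cuda_dtype"):
            return self.value[1]
        raise KeyError(key)

    @property
    def itemsize(self) -> int:
        # Size in bytes, taken from the host-side type.
        return ctypes.sizeof(self.value[0])


a = ToyDtype.uint8
print(a(4))             # 4
print(a[1])             # uint8_t
print(a["cuda_dtype"])  # uint8_t
print(a.itemsize)       # 1
```

Defining __call__ on the members does not interfere with the class-level lookup ToyDtype(value), which is handled by the Enum metaclass.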

__module__ = 'dolphin.core.dtype'

Dolphin CudaBase

CudaBase

class dolphin.CudaBase[source]

Bases: object

This class is mainly used to access device information, such as the maximum number of threads per block, the size of the grid, etc. It is a base class used by many other classes. Most of its data is stored in class attributes so that device information is loaded only once, which speeds up execution.

device: pycuda._driver.Device

Used device

max_threads_per_block: int

Maximum number of threads per block. Usually, it is 1024.

max_grid_dim_x: int

Maximum number of blocks per grid on the x dimension

max_grid_dim_y: int

Maximum number of blocks per grid on the y dimension

max_grid_dim_z: int

Maximum number of blocks per grid on the z dimension

warp_size: int

Warp size

multiprocessor_count: int

Number of multiprocessors (MP)

threads_blocks_per_mp: int

Number of threads per MP

static GET_BLOCK_GRID_1D(n: int) Tuple[Tuple[int, int, int], Tuple[int, int]][source]

In order to perform memory coalescing on 1D iterations, we need to efficiently compute the block and grid sizes.

Parameters:

n (int) – Number of elements to process

Returns:

block, grid

Return type:

Tuple[Tuple[int, int, int], Tuple[int, int]]
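The exact computation used by GET_BLOCK_GRID_1D is not given here. As a point of reference, the sketch below shows the conventional ceil-division scheme for a 1D launch, returning a 3-tuple block and a 2-tuple grid as in the documented return type. block_grid_1d is a hypothetical name, and the default of 1024 threads per block is the usual value quoted above, not a queried device limit.

```python
import math
from typing import Tuple


def block_grid_1d(n: int, max_threads_per_block: int = 1024
                  ) -> Tuple[Tuple[int, int, int], Tuple[int, int]]:
    """Conventional 1D launch configuration covering `n` elements."""
    # Use as many threads as needed, capped by the per-block limit.
    threads = max(1, min(n, max_threads_per_block))
    # Enough blocks so that threads * blocks >= n.
    blocks = max(1, math.ceil(n / threads))
    return (threads, 1, 1), (blocks, 1)


print(block_grid_1d(1000))  # ((1000, 1, 1), (1, 1))
print(block_grid_1d(5000))  # ((1024, 1, 1), (5, 1))
```

This sketch does not check the result against the maximum grid dimension of the device.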

static GET_BLOCK_X_Y(Z: int) Tuple[int, int, int][source]

Get the block size for a given Z. The block size is calculated using the following formula: (max(ceil(sqrt(MAX_THREADS_PER_BLOCKS/Z)),1), max(ceil(sqrt(MAX_THREADS_PER_BLOCKS/Z)),1), Z)

It is useful to quickly compute the block size that suits self._max_threads_per_block for a given Z which can be channels, depth, batch size, etc.

Parameters:

Z (int) – Size of the third dimension

Returns:

Optimal block size that ensures block[0]*block[1]*block[2] <= self._max_threads_per_block

Return type:

tuple
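To make the formula concrete, the sketch below evaluates it exactly as written, assuming 1024 threads per block. It also shows a variant that rounds down instead of up, for comparison. Both function names are illustrative, and which rounding Dolphin applies internally is not confirmed here. With rounding up, the product can exceed the limit when 1024/Z is not a perfect square, whereas rounding down always stays within it for Z up to 1024.

```python
import math
from typing import Tuple


def block_xy_ceil(z: int, max_threads: int = 1024) -> Tuple[int, int, int]:
    """Formula as written in the documentation (rounds the side up)."""
    side = max(math.ceil(math.sqrt(max_threads / z)), 1)
    return (side, side, z)


def block_xy_floor(z: int, max_threads: int = 1024) -> Tuple[int, int, int]:
    """Comparison variant that rounds the side down."""
    side = max(math.floor(math.sqrt(max_threads / z)), 1)
    return (side, side, z)


for z in (3, 4):
    up, down = block_xy_ceil(z), block_xy_floor(z)
    print(z, up, math.prod(up), down, math.prod(down))
# 3 (19, 19, 3) 1083 (18, 18, 3) 972
# 4 (16, 16, 4) 1024 (16, 16, 4) 1024
```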

static GET_GRID_SIZE(size: tuple, block: tuple) Tuple[int, int][source]

Get the grid size for a given size and block size. The grid size is calculated using the following formula: (max(ceil(sqrt(size/block[0])),1), max(ceil(sqrt(size/block[1])),1))

This function should be used when the width and height of the image are the same or can be swapped.

Parameters:
  • size (tuple) – Total size of data

  • block (tuple) – Current value of the block size

Returns:

Grid size

Return type:

tuple

static GET_GRID_SIZE_HW(size: tuple, block: tuple) Tuple[int, int][source]

Get the grid size for a given size and block size. The grid size is calculated using the following formula: (max(ceil(sqrt(size[0]/block[0])),1), max(ceil(sqrt(size[1]/block[1])),1))

This function should be used when the width and height of the image are different and matter.

Parameters:
  • size (tuple) – Height and width of the image

  • block (tuple) – Current value of the block size

Returns:

Grid size

Return type:

tuple

Inference Engine

class dolphin.Engine(onnx_file_path: ~typing.Optional[str] = None, engine_path: ~typing.Optional[str] = None, mode: str = 'fp16', optimisation_profile: ~typing.Optional[~typing.Tuple[int, ...]] = None, verbosity: bool = False, explicit_batch: bool = False, direct_io: bool = False, stdout: object = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, stderr: object = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, calib_cache: ~typing.Optional[str] = None)[source]

Bases: EEngine, IEngine

Class to manage TensorRT engines. It is able to read an engine from a file or to create one from an onnx file.

This class is using the CudaTrtBuffers class to manage batched buffers. Find more details on the documentation of infer().

TensorRT Github official Repository: https://github.com/NVIDIA/TensorRT

TensorRT official documentation: https://developer.nvidia.com/tensorrt

Parameters:
  • onnx_file_path (str, optional) – Path to the onnx file to use, defaults to None

  • engine_path (str, optional) – Path to the engine to read or to save, defaults to None

  • mode (str, optional) – Can be fp32, fp16 or int8, defaults to fp16

  • optimisation_profile (Tuple[int, ...], optional) – Tuple defining the optimisation profile to use, defaults to None

  • verbosity (bool, optional) – Boolean to activate verbose mode, defaults to False

  • explicit_batch (bool, optional) – Sets explicit_batch flag, defaults to False

  • direct_io (bool, optional) – Sets direct_io flag, defaults to False

  • stdout (object, optional) – Out stream to write standard output, defaults to sys.stdout

  • stderr (object, optional) – Out stream to write errors output, defaults to sys.stderr

  • calib_cache (str, optional) – Where to write or read calibration cache file, defaults to None

Raises:

ValueError – If no engine nor onnx file is provided

allocate_buffers() CudaTrtBuffers[source]

Creates buffers for the engine. For now, only implicit dimensions are supported, meaning that dynamic shapes are not supported yet.

Returns:

Buffers allocated for the engine

Return type:

dolphin.CudaTrtBuffers

load_engine(engine_file_path: str) ICudaEngine[source]

Loads a TensorRT engine from a file, using TensorRT.Runtime.

Parameters:

engine_file_path (str) – Path to the engine file

Returns:

TensorRT engine

Return type:

trt.ICudaEngine

is_dynamic(onnx_file_path: str) bool[source]

Returns True if the model is dynamic, False otherwise. By dynamic we mean that the model has at least one input with a dynamic dimension.

Parameters:

onnx_file_path (str) – Path to the onnx file

Returns:

True if the model is dynamic, False otherwise

Return type:

bool

create_context() IExecutionContext[source]

Creates a TensorRT execution context from the engine.

Parameters:

tensorrt_engine (trt.ICudaEngine) – TensorRT engine

Returns:

TensorRT execution context

Return type:

trt.IExecutionContext

build_engine(onnx_file_path: Optional[str] = None, engine_file_path: Optional[str] = None, mode: str = 'fp16', max_workspace_size: int = 30) ICudaEngine[source]

Builds a TensorRT engine from an onnx file. If the onnx model is detected as dynamic, then a dynamic engine is built, otherwise a static engine is built.

Parameters:
  • onnx_file_path (str, optional) – Path to an onnx file, defaults to None

  • engine_file_path (str, optional) – Path to the engine file to save, defaults to None

  • mode (str, optional) – Datatype mode fp32, fp16 or int8, defaults to “fp16”

  • max_workspace_size (int, optional) – maximum workspace size to use, defaults to 30

Returns:

TensorRT engine

Return type:

trt.ICudaEngine

do_inference(stream: Optional[pycuda.driver.Stream] = None) None[source]

Executes the inference on the engine. This function assumes that the buffers are already filled with the input data.

Parameters:

stream (cuda.Stream, optional) – Cuda Stream, defaults to None

infer(inputs: Dict[str, darray], batched_input: bool = False, force_infer: bool = False, stream: Optional[pycuda.driver.Stream] = None) Optional[Dict[str, Bufferizer]][source]

Method to call to perform inference on the engine. This method will automatically fill the buffers with the input data and execute the inference if the buffers are full. You can still force the inference by setting force_infer to True.

The inputs argument expects a dictionary of dolphin.darray objects or a dictionary of lists of dolphin.darray. The keys of the dictionary must match the names of the inputs of the model.

Parameters:
  • inputs (Dict[str, darray]) – Dictionary of inputs

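  • batched_input (bool, optional) – Consider input as batched, defaults to False

  • force_infer (bool, optional) – Force the inference even if the buffers are not full, defaults to False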

  • stream (cuda.Stream, optional) – Cuda stream to use, defaults to None

Returns:

Output of the model

Return type:

Union[Dict[str, dolphin.Bufferizer], None]
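Because infer() returns None until the buffers are full, a caller typically loops over incoming data and reacts only when an output is returned. The sketch below shows one way to structure that loop against any object exposing infer(inputs, force_infer=...). feed_engine and on_output are hypothetical names, and the input name "input" is an assumption that must match the model's actual input name.

```python
from typing import Any, Callable, Iterable


def feed_engine(engine: Any,
                frames: Iterable[Any],
                on_output: Callable[[Any], None],
                input_name: str = "input",
                force_last: bool = True) -> int:
    """Feed frames one by one; return how many times an output was produced."""
    frames = list(frames)
    runs = 0
    for index, frame in enumerate(frames):
        is_last = index == len(frames) - 1
        # Optionally force a run on the last frame so that a partially
        # filled buffer is still processed.
        out = engine.infer({input_name: frame},
                           force_infer=is_last and force_last)
        if out is not None:
            # Consume the output immediately rather than storing it.
            on_output(out)
            runs += 1
    return runs
```

Handing each output to a callback right away, instead of collecting outputs in a list, is a cautious choice in case the returned buffers are reused by later calls.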

property output: Dict[str, Bufferizer]

Returns the output of the dolphin.CudaTrtBuffers of the engine.

Returns:

Output bufferizer of the engine.

Return type:

Dict[str, dolphin.Bufferizer]

property input_shape: Dict[str, tuple]

Returns the shape of the inputs of the engine.

Returns:

Shape of the inputs

Return type:

dict

property input_dtype: Dict[str, dtype]

Returns the datatype of the inputs of the engine.

Returns:

Datatype of the inputs

Return type:

Dict[str, dolphin.dtype]

property output_shape: Dict[str, tuple]

Returns the shape of the outputs of the engine.

Returns:

Shape of the outputs

Return type:

dict

property output_dtype: Dict[str, dtype]

Returns the datatype of the outputs of the engine.

Returns:

Datatype of the outputs

Return type:

dict

TensorRT CUDA Buffers

class dolphin.CudaTrtBuffers(stream: Optional[pycuda.driver.Stream] = None)[source]

Bases: object

To be used with the darray class. This class actually manages the darray used by the engine, both for inputs and outputs.

To ease the use of the darray, this class can be understood as a dict in order to name inputs and outputs.

Note that the names of inputs and outputs have to match the names of the inputs and outputs of the engine.

The constructor of this class takes an optional cuda.Stream as an argument.

allocate_input(name: str, shape: tuple, buffer_size: int, dtype: object, buffer_full_hook: Optional[callable] = None, flush_hook: Optional[callable] = None, allocate_hook: Optional[callable] = None, append_one_hook: Optional[callable] = None, append_multiple_hook: Optional[callable] = None) None[source]

Method to allocate an input buffer. This method creates a dolphin.Bufferizer and adds it to the inputs dict with the given name.

Parameters:
  • name (str) – Name of the input.

  • shape (tuple) – Shape of a single element in the buffer.

  • buffer_size (int) – Size of the buffer.

  • dtype (object) – Dtype of the buffer.

  • buffer_full_hook (callable) – callable function called each time the buffer is full.

  • flush_hook (callable) – callable function called each time the buffer is flushed.

  • allocate_hook (callable) – callable function called each time the allocate method is called.

  • append_one_hook (callable) – callable function called each time the append_one method is called.

  • append_multiple_hook (callable) – callable function called each time the append_multiple method is called.

allocate_output(name: str, shape: tuple, dtype: dtype) None[source]

Method to allocate an output buffer. Unlike the inputs, the outputs are not dolphin.Bufferizer objects; they are darray.

Parameters:
  • name (str) – Name of the output.

  • shape (tuple) – Shape of a single element in the buffer.

  • dtype (dolphin.dtype) – Dtype of the buffer.
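As a usage illustration of the two allocation methods above, the sketch below allocates every input and output from plain name-to-shape dictionaries. allocate_from_spec is a hypothetical helper and not part of Dolphin. It relies only on the keyword arguments shown in the documented signatures, and it assumes a single dtype for all buffers to keep the example short.

```python
from typing import Any, Dict, Tuple

Shape = Tuple[int, ...]


def allocate_from_spec(buffers: Any,
                       inputs: Dict[str, Shape],
                       outputs: Dict[str, Shape],
                       buffer_size: int,
                       dtype: Any) -> None:
    """Allocate all named inputs and outputs on a buffers object."""
    for name, shape in inputs.items():
        # Inputs are buffered, so they also need a buffer size.
        buffers.allocate_input(name=name, shape=shape,
                               buffer_size=buffer_size, dtype=dtype)
    for name, shape in outputs.items():
        buffers.allocate_output(name=name, shape=shape, dtype=dtype)
```

The names used as dictionary keys have to match the input and output names of the engine.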

flush(value: Any = 0) None[source]

Method to flush all the input buffers. Note that this method will trigger the flush_hook of each dolphin.Bufferizer.

Parameters:

value (int, optional) – value to initialize the inputs with, defaults to 0

append_one_input(name: str, data: darray)[source]

Method to append a single element to the input buffer. Note that this method will trigger the append_one_hook of the dolphin.Bufferizer. It will also trigger the buffer_full_hook if the buffer is full.

Parameters:
  • name (str) – Name of the input.

  • data (dolphin.darray) – Data to append.

append_multiple_input(name: str, data: darray)[source]

Method to append multiple elements to the input buffer. Note that this method will trigger the append_multiple_hook of the dolphin.Bufferizer. It will also trigger the buffer_full_hook if the buffer is full.

Parameters:
  • name (str) – Name of the input.

  • data (dolphin.darray) – Data to append.

property input_shape: Dict[str, tuple]

Property to get the shape of the inputs. Returns a dict with the name of the input as key and the shape as value.

Returns:

Shape of the inputs.

Return type:

Dict[str, tuple]

property input_dtype: Dict[str, dtype]

Property to get the dtype of the inputs. Returns a dict with the name of the input as key and the dtype as value.

Returns:

Dtype of the inputs.

Return type:

Dict[str, dolphin.dtype]

property output_shape: Dict[str, tuple]

Property to get the shape of the outputs. Returns a dict with the name of the output as key and the shape as value.

Returns:

Shape of the outputs.

Return type:

Dict[str, tuple]

property output_dtype

Property to get the dtype of the outputs. Returns a dict with the name of the output as key and the dtype as value.

Returns:

Dtype of the outputs.

Return type:

Dict[str, dolphin.dtype]

property full: bool

Property to check if the buffer is full. Returns True if at least one of the input buffers is full.

Returns:

True if at least one of the input buffers is full.

Return type:

bool

property output: Dict[str, darray]

Property to get the output of the buffer. Returns a dict with the name of the output as key and the output as value.

Returns:

Output of the buffer.

Return type:

Dict[str, dolphin.darray]

property input_bindings: List[darray]

Property to get the input bindings. ‘Input bindings’ refers here to the list of input allocations.

Returns:

Input bindings.

Return type:

List[dolphin.darray]

property output_bindings: List[int]

Property to get the output bindings. ‘Output bindings’ refers here to the list of output allocations.

Returns:

Output bindings.

Return type:

List[int]

property bindings

Property to get the bindings.

Returns:

Bindings.

Return type:

List[int]

Advanced Utilisation of Dolphin

Dolphin is based on CUDA and therefore has some limitations that you should be aware of. For instance, memory allocation, memory copy, kernel launch, etc. all take time. Whether the total is shorter than the time needed by CPU-only operations depends on the complexity of the computation required and the amount of data you want to process.

There is a general rule that you should always keep in mind when using Dolphin: the more data there is and the more complex the computation, the more efficient Dolphin is. There are still ways to speed up the execution of Dolphin functions. We will go through a few of them in this section.

Memory Management

You will have noticed that some Dolphin functions have an optional parameter, usually called dst. This parameter is used to specify the destination of the result of the function. As mentioned in the introduction of this section, memory allocation and memory copy take time. If you have already allocated memory to store a result, you can use it as the destination and save time. Memory allocation can represent up to 95% of the total execution time of a function, which is not negligible.

Let’s take an example: we want to perform an addition between two matrices A and B. In the first code snippet, we run a naive addition. In the second one, we use the dst parameter.

Naive approach:

import dolphin as dp
import time

N_ITER = int(1e3)

a = dp.zeros((100, 100))
b = dp.ones((100, 100))

t1 = time.time()
for i in range(N_ITER):
    c = a + b
print(f"Naive approach: {1000*(time.time() - t1)/N_ITER}ms/iter")

Optimized approach:

import dolphin as dp
import time

N_ITER = int(1e3)

a = dp.zeros((100, 100))
b = dp.ones((100, 100))
c = dp.zeros((100, 100))

t1 = time.time()
for i in range(N_ITER):
    dp.add(src=a, other=b, dst=c)
print(f"Optimized approach: {1000*(time.time() - t1)/N_ITER}ms/iter")

When your application is based on a loop or the consecutive execution of several functions, you should always try to use the dst parameter to save time. It can really be a game changer in some cases.

Usage of allocations

Dolphin by default allocates CUDA memory and operates on it. Any modification made on a view of an array will impact the array itself. For instance, dolphin.darray.__getitem__() returns a view of the current array, so any in-place modification of its values will modify the array, exactly like Numpy does:

import dolphin as dp

a = dp.zeros(shape=(2, 2), dtype=dp.float32)
a[:,0].fill(1)

print(a)
# array([[1., 0.],
#        [1., 0.]], dtype=float32)

Usage of Cuda Stream

Dolphin is based on CUDA and therefore it is possible to use CUDA streams. To create a CUDA stream, use dolphin.Stream(). I recommend reading this pdf about CUDA streams and concurrency.

To use CUDA streams with Dolphin, you can use the stream parameter of functions and constructors.

import dolphin as dp

stream = dp.Stream()

a = dp.zeros((100, 100), stream=stream)
b = dp.ones((100, 100), stream=stream)

c = dp.add(src=a, other=b)

# Wait for the stream to finish
stream.synchronize()

# Do something with c

With dolphin.Engine.infer() you can provide a stream as an argument.

import dolphin as dp

stream = dp.Stream()

a = dp.zeros((100, 100), stream=stream)
b = dp.ones((100, 100), stream=stream)

c = dp.add(src=a, other=b)

# Wait for the stream to finish
stream.synchronize()

# Do something with c

# Run the inference on the stream (engine is an existing dp.Engine instance)
output = engine.infer(inputs={"input":c}, stream=stream)

# Wait for the stream to finish
stream.synchronize()

# Do something with the output