Inference Engine

class dolphin.Engine(onnx_file_path: ~typing.Optional[str] = None, engine_path: ~typing.Optional[str] = None, mode: str = 'fp16', optimisation_profile: ~typing.Optional[~typing.Tuple[int, ...]] = None, verbosity: bool = False, explicit_batch: bool = False, direct_io: bool = False, stdout: object = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, stderr: object = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, calib_cache: ~typing.Optional[str] = None)[source]

Bases: EEngine, IEngine

Class to manage TensorRT engines. It can read a serialized engine from a file or create one from an onnx file.

This class uses the CudaTrtBuffers class to manage batched buffers. See the documentation of infer() for more details.

TensorRT Github official Repository: https://github.com/NVIDIA/TensorRT

TensorRT official documentation: https://developer.nvidia.com/tensorrt

Parameters:
  • onnx_file_path (str, optional) – Path to the onnx file to use, defaults to None

  • engine_path (str, optional) – Path to the engine to read or to save, defaults to None

  • mode (str, optional) – Can be fp32, fp16 or int8, defaults to fp16

  • optimisation_profile (Tuple[int, ...], optional) – Tuple defining the optimisation profile to use, defaults to None

  • verbosity (bool, optional) – Boolean to activate verbose mode, defaults to False

  • explicit_batch (bool, optional) – Sets explicit_batch flag, defaults to False

  • direct_io (bool, optional) – Sets direct_io flag, defaults to False

  • stdout (object, optional) – Out stream to write standard output, defaults to sys.stdout

  • stderr (object, optional) – Out stream to write error output, defaults to sys.stderr

  • calib_cache (str, optional) – Where to write or read calibration cache file, defaults to None

Raises:

ValueError – If neither an engine file nor an onnx file is provided
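
Example (a minimal sketch; "model.onnx" and "model.engine" are placeholder file names, not files shipped with the library):

   import dolphin

   # Build an engine from an onnx file and serialize it to "model.engine"
   # so that it can be reloaded directly on the next run.
   engine = dolphin.Engine(onnx_file_path="model.onnx",
                           engine_path="model.engine",
                           mode="fp16")

   # Reload a previously serialized engine (no onnx file needed).
   engine = dolphin.Engine(engine_path="model.engine")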

allocate_buffers() CudaTrtBuffers[source]

Creates buffers for the engine. For now, only implicit dimensions are supported, meaning that dynamic shapes are not yet supported.

Returns:

The allocated buffers of the engine.

Return type:

dolphin.CudaTrtBuffers
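
Example (a minimal sketch, assuming engine was created as shown above):

   buffers = engine.allocate_buffers()  # dolphin.CudaTrtBuffers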

load_engine(engine_file_path: str) ICudaEngine[source]

Loads a TensorRT engine from a file using the TensorRT Runtime.

Parameters:

engine_file_path (str) – Path to the engine file

Returns:

TensorRT engine

Return type:

trt.ICudaEngine
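
Example (a minimal sketch; "model.engine" is a placeholder for an engine serialized with a compatible TensorRT version):

   trt_engine = engine.load_engine("model.engine")  # trt.ICudaEngine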

is_dynamic(onnx_file_path: str) bool[source]

Returns True if the model is dynamic, False otherwise. By dynamic we mean that the model has at least one input with a dynamic dimension.

Parameters:

onnx_file_path (str) – Path to the onnx file

Returns:

True if the model is dynamic, False otherwise

Return type:

bool
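
Example (a minimal sketch; "model.onnx" is a placeholder file name):

   if engine.is_dynamic("model.onnx"):
       print("The model has at least one dynamic input dimension")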

create_context() IExecutionContext[source]

Creates a TensorRT execution context from the engine held by this instance.

Returns:

TensorRT execution context

Return type:

trt.IExecutionContext
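
Example (a minimal sketch; the context is created from the engine already held by the instance):

   context = engine.create_context()  # trt.IExecutionContext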

build_engine(onnx_file_path: Optional[str] = None, engine_file_path: Optional[str] = None, mode: str = 'fp16', max_workspace_size: int = 30) ICudaEngine[source]

Builds a TensorRT engine from an onnx file. If the onnx model is detected as dynamic, a dynamic engine is built; otherwise, a static engine is built.

Parameters:
  • onnx_file_path (str, optional) – Path to an onnx file, defaults to None

  • engine_file_path (str, optional) – Path to the engine file to save, defaults to None

  • mode (str, optional) – Datatype mode, can be fp32, fp16 or int8, defaults to fp16

  • max_workspace_size (int, optional) – Maximum workspace size to use, defaults to 30

Returns:

TensorRT engine

Return type:

trt.ICudaEngine
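
Example (a minimal sketch; "model.onnx" and "model.engine" are placeholder file names):

   trt_engine = engine.build_engine(onnx_file_path="model.onnx",
                                    engine_file_path="model.engine",
                                    mode="fp16")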

do_inference(stream: Optional[pycuda.driver.Stream] = None) None[source]

Executes the inference on the engine. This function assumes that the buffers are already filled with the input data.

Parameters:

stream (cuda.Stream, optional) – Cuda Stream, defaults to None
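
Example (a minimal sketch, assuming an active CUDA context and that the engine's buffers have already been filled, for instance through infer()):

   import pycuda.driver as cuda

   stream = cuda.Stream()
   engine.do_inference(stream=stream)
   results = engine.output  # Dict[str, dolphin.Bufferizer]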

infer(inputs: Dict[str, darray], batched_input: bool = False, force_infer: bool = False, stream: Optional[pycuda.driver.Stream] = None) Optional[Dict[str, Bufferizer]][source]

Method to call to perform inference on the engine. This method automatically fills the buffers with the input data and executes the inference once the buffers are full. You can still force the inference by setting force_infer to True.

The inputs argument expects a dictionary of dolphin.darray objects, or a dictionary of lists of dolphin.darray. The keys of the dictionary must match the names of the inputs of the model.

Parameters:
  • inputs (Dict[str, darray]) – Dictionary of inputs

  • batched_input (bool, optional) – Consider input as batched, defaults to False

  • force_infer (bool, optional) – Force the inference to run even if the buffers are not full, defaults to False

  • stream (cuda.Stream, optional) – Cuda stream to use, defaults to None

Returns:

Output of the model

Return type:

Union[Dict[str, dolphin.Bufferizer], None]
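
Example (a minimal sketch; the dolphin.darray construction below is an assumption for illustration, refer to the darray documentation for the exact API):

   import numpy as np
   import dolphin

   name = next(iter(engine.input_shape))          # name of the first input
   dummy = np.zeros(engine.input_shape[name],
                    dtype=np.float32)             # placeholder input data
   x = dolphin.darray(array=dummy)                # assumed darray constructor

   # force_infer=True triggers the inference even if the batch is not full.
   outputs = engine.infer(inputs={name: x}, force_infer=True)
   if outputs is not None:
       # Maps each output name to its dolphin.Bufferizer.
       print(list(outputs.keys()))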

property output: Dict[str, Bufferizer]

Returns the output buffers of the engine's dolphin.CudaTrtBuffers.

Returns:

Output bufferizer of the engine.

Return type:

Dict[str, dolphin.Bufferizer]

property input_shape: Dict[str, tuple]

Returns the shape of the inputs of the engine.

Returns:

Shape of the inputs

Return type:

Dict[str, tuple]

property input_dtype: Dict[str, dtype]

Returns the datatype of the inputs of the engine.

Returns:

Datatype of the inputs

Return type:

Dict[str, dolphin.dtype]

property output_shape: Dict[str, tuple]

Returns the shape of the outputs of the engine.

Returns:

Shape of the outputs

Return type:

Dict[str, tuple]

property output_dtype: Dict[str, dtype]

Returns the datatype of the outputs of the engine.

Returns:

Datatype of the outputs

Return type:

Dict[str, dolphin.dtype]
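
Example (a minimal sketch, assuming engine was created as shown above; the printed values are purely illustrative):

   print(engine.input_shape)    # e.g. {"input": (3, 224, 224)}
   print(engine.input_dtype)    # e.g. {"input": dolphin.dtype.float32}
   print(engine.output_shape)   # e.g. {"output": (1000,)}
   print(engine.output_dtype)   # e.g. {"output": dolphin.dtype.float32}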