Inference Engine

class dolphin.Engine(onnx_file_path: ~typing.Optional[str] = None, engine_path: ~typing.Optional[str] = None, mode: str = 'fp16', optimisation_profile: ~typing.Optional[~typing.Tuple[int, ...]] = None, verbosity: bool = False, explicit_batch: bool = False, direct_io: bool = False, stdout: object = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, stderr: object = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, calib_cache: ~typing.Optional[str] = None)[source]

Bases: EEngine, IEngine

Class to manage TensorRT engines. It can read a serialized engine from a file or create one from an onnx file.

This class uses the CudaTrtBuffers class to manage batched buffers. See the documentation of infer() for more details.

TensorRT Github official Repository: https://github.com/NVIDIA/TensorRT

TensorRT official documentation: https://developer.nvidia.com/tensorrt

Parameters:
  • onnx_file_path (str, optional) – Path to the onnx file to use, defaults to None

  • engine_path (str, optional) – Path to the engine to read or to save, defaults to None

  • mode (str, optional) – Can be fp32, fp16 or int8, defaults to fp16

  • optimisation_profile (Tuple[int, ...], optional) – Tuple defining the optimisation profile to use, defaults to None

  • verbosity (bool, optional) – Boolean to activate verbose mode, defaults to False

  • explicit_batch (bool, optional) – Sets explicit_batch flag, defaults to False

  • direct_io (bool, optional) – Sets direct_io flag, defaults to False

  • stdout (object, optional) – Out stream to write standard output, defaults to sys.stdout

  • stderr (object, optional) – Out stream to write error output, defaults to sys.stderr

  • calib_cache (str, optional) – Where to write or read calibration cache file, defaults to None

Raises:

ValueError – If neither an engine file nor an onnx file is provided
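
Example (a minimal sketch; "model.onnx" and "model.engine" are placeholder file names, not files shipped with the library):

   import dolphin

   # Build an engine from an onnx file and serialize it to "model.engine"
   # so that it can be reloaded directly on the next run.
   engine = dolphin.Engine(onnx_file_path="model.onnx",
                           engine_path="model.engine",
                           mode="fp16")

   # Reload a previously serialized engine (no onnx file needed).
   engine = dolphin.Engine(engine_path="model.engine")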

allocate_buffers() CudaTrtBuffers[source]

Creates buffers for the engine. For now, only implicit dimensions are supported, meaning that dynamic shapes are not yet supported.

Returns:

The allocated buffers of the engine.

Return type:

dolphin.CudaTrtBuffers
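
Example (a minimal sketch, assuming engine was created as shown above):

   buffers = engine.allocate_buffers()  # dolphin.CudaTrtBuffers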

load_engine(engine_file_path: str) ICudaEngine[source]

Loads a TensorRT engine from a file using the TensorRT Runtime.

Parameters:

engine_file_path (str) – Path to the engine file

Returns:

TensorRT engine

Return type:

trt.ICudaEngine
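
Example (a minimal sketch; "model.engine" is a placeholder for an engine serialized with a compatible TensorRT version):

   trt_engine = engine.load_engine("model.engine")  # trt.ICudaEngine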

is_dynamic(onnx_file_path: str) bool[source]

Returns True if the model is dynamic, False otherwise. By dynamic we mean that the model has at least one input with a dynamic dimension.

Parameters:

onnx_file_path (str) – Path to the onnx file

Returns:

True if the model is dynamic, False otherwise

Return type:

bool
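
Example (a minimal sketch; "model.onnx" is a placeholder file name):

   if engine.is_dynamic("model.onnx"):
       print("The model has at least one dynamic input dimension")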

create_context() IExecutionContext[source]

Creates a TensorRT execution context from the engine held by this instance.

Returns:

TensorRT execution context

Return type:

trt.IExecutionContext
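
Example (a minimal sketch; the context is created from the engine already held by the instance):

   context = engine.create_context()  # trt.IExecutionContext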

build_engine(onnx_file_path: Optional[str] = None, engine_file_path: Optional[str] = None, mode: str = 'fp16', max_workspace_size: int = 30) ICudaEngine[source]

Builds a TensorRT engine from an onnx file. If the onnx model is detected as dynamic, a dynamic engine is built; otherwise, a static engine is built.

Parameters:
  • onnx_file_path (str, optional) – Path to an onnx file, defaults to None

  • engine_file_path (str, optional) – Path to the engine file to save, defaults to None

  • mode (str, optional) – Datatype mode, can be fp32, fp16 or int8, defaults to fp16

  • max_workspace_size (int, optional) – Maximum workspace size to use, defaults to 30

Returns:

TensorRT engine

Return type:

trt.ICudaEngine
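
Example (a minimal sketch; "model.onnx" and "model.engine" are placeholder file names):

   trt_engine = engine.build_engine(onnx_file_path="model.onnx",
                                    engine_file_path="model.engine",
                                    mode="fp16")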

do_inference(stream: Optional[pycuda.driver.Stream] = None) None[source]

Executes the inference on the engine. This function assumes that the buffers are already filled with the input data.

Parameters:

stream (cuda.Stream, optional) – Cuda Stream, defaults to None
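
Example (a minimal sketch, assuming an active CUDA context and that the engine's buffers have already been filled, for instance through infer()):

   import pycuda.driver as cuda

   stream = cuda.Stream()
   engine.do_inference(stream=stream)
   results = engine.output  # Dict[str, dolphin.Bufferizer]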

infer(inputs: Dict[str, darray], batched_input: bool = False, force_infer: bool = False, stream: Optional[pycuda.driver.Stream] = None) Optional[Dict[str, Bufferizer]][source]

Method to call to perform inference on the engine. This method automatically fills the buffers with the input data and executes the inference once the buffers are full. You can still force the inference by setting force_infer to True.

The inputs argument expects a dictionary of dolphin.darray objects, or a dictionary of lists of dolphin.darray. The keys of the dictionary must match the names of the inputs of the model.

Parameters:
  • inputs (Dict[str, darray]) – Dictionary of inputs

  • batched_input (bool, optional) – Consider input as batched, defaults to False

  • force_infer (bool, optional) – Force the inference to run even if the buffers are not full, defaults to False

  • stream (cuda.Stream, optional) – Cuda stream to use, defaults to None

Returns:

Output of the model

Return type:

Union[Dict[str, dolphin.Bufferizer], None]
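
Example (a minimal sketch; the dolphin.darray construction below is an assumption for illustration, refer to the darray documentation for the exact API):

   import numpy as np
   import dolphin

   name = next(iter(engine.input_shape))          # name of the first input
   dummy = np.zeros(engine.input_shape[name],
                    dtype=np.float32)             # placeholder input data
   x = dolphin.darray(array=dummy)                # assumed darray constructor

   # force_infer=True triggers the inference even if the batch is not full.
   outputs = engine.infer(inputs={name: x}, force_infer=True)
   if outputs is not None:
       # Maps each output name to its dolphin.Bufferizer.
       print(list(outputs.keys()))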

property output: Dict[str, Bufferizer]

Returns the output buffers of the engine's dolphin.CudaTrtBuffers.

Returns:

Output bufferizer of the engine.

Return type:

Dict[str, dolphin.Bufferizer]

property input_shape: Dict[str, tuple]

Returns the shape of the inputs of the engine.

Returns:

Shape of the inputs

Return type:

Dict[str, tuple]

property input_dtype: Dict[str, dtype]

Returns the datatype of the inputs of the engine.

Returns:

Datatype of the inputs

Return type:

Dict[str, dolphin.dtype]

property output_shape: Dict[str, tuple]

Returns the shape of the outputs of the engine.

Returns:

Shape of the outputs

Return type:

Dict[str, tuple]

property output_dtype: Dict[str, dtype]

Returns the datatype of the outputs of the engine.

Returns:

Datatype of the outputs

Return type:

Dict[str, dolphin.dtype]
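
Example (a minimal sketch, assuming engine was created as shown above; the printed values are purely illustrative):

   print(engine.input_shape)    # e.g. {"input": (3, 224, 224)}
   print(engine.input_dtype)    # e.g. {"input": dolphin.dtype.float32}
   print(engine.output_shape)   # e.g. {"output": (1000,)}
   print(engine.output_dtype)   # e.g. {"output": dolphin.dtype.float32}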