Inference Engine
- class dolphin.Engine(onnx_file_path: ~typing.Optional[str] = None, engine_path: ~typing.Optional[str] = None, mode: str = 'fp16', optimisation_profile: ~typing.Optional[~typing.Tuple[int, ...]] = None, verbosity: bool = False, explicit_batch: bool = False, direct_io: bool = False, stdout: object = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, stderr: object = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, calib_cache: ~typing.Optional[str] = None)[source]
Bases: EEngine, IEngine
Class to manage TensorRT engines. It can read an engine from a file or build one from an onnx file.
This class uses the CudaTrtBuffers class to manage batched buffers. Find more details in the documentation of infer(). A construction sketch is given after the parameter list below.
TensorRT official GitHub repository: https://github.com/NVIDIA/TensorRT
TensorRT official documentation: https://developer.nvidia.com/tensorrt
- Parameters:
onnx_file_path (str, optional) – Path to the onnx file to use, defaults to None
engine_path (str, optional) – Path to the engine to read or to save, defaults to None
mode (str, optional) – Can be fp32, fp16 or int8, defaults to fp16
optimisation_profile (Tuple[int, ...], optional) – Tuple defining the optimisation profile to use, defaults to None
verbosity (bool, optional) – Boolean to activate verbose mode, defaults to False
explicit_batch (bool, optional) – Sets explicit_batch flag, defaults to False
direct_io (bool, optional) – Sets direct_io flag, defaults to False
stdout (object, optional) – Output stream for standard output, defaults to sys.stdout
stderr (object, optional) – Output stream for error output, defaults to sys.stderr
calib_cache (str, optional) – Where to write or read calibration cache file, defaults to None
- Raises:
ValueError – If neither an engine nor an onnx file is provided
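A minimal construction sketch, assuming a local ONNX model; both file paths below are hypothetical placeholders, not files shipped with dolphin:

    import dolphin

    # Build an fp16 engine from a hypothetical ONNX file; "model.engine" is
    # where the serialized engine is written (or read from, if it exists).
    engine = dolphin.Engine(
        onnx_file_path="model.onnx",
        engine_path="model.engine",
        mode="fp16",
    )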
- allocate_buffers() CudaTrtBuffers [source]
Creates buffers for the engine. For now, only implicit dimensions are supported, meaning that dynamic shapes are not supported yet.
- Returns:
Buffers of the engine
- Return type:
CudaTrtBuffers
- load_engine(engine_file_path: str) ICudaEngine [source]
Loads a TensorRT engine from a file using trt.Runtime.
- Parameters:
engine_file_path (str) – Path to the engine file
- Returns:
TensorRT engine
- Return type:
trt.ICudaEngine
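A short sketch of loading a previously serialized engine, reusing the engine object from the construction sketch above; "model.engine" is a hypothetical path:

    import tensorrt as trt

    # Deserializes the engine file and returns the TensorRT engine object.
    cuda_engine = engine.load_engine("model.engine")
    assert isinstance(cuda_engine, trt.ICudaEngine)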
- is_dynamic(onnx_file_path: str) bool [source]
Returns True if the model is dynamic, False otherwise. By dynamic we mean that the model has at least one input with a dynamic dimension.
- Parameters:
onnx_file_path (str) – Path to the onnx file
- Returns:
True if the model is dynamic, False otherwise
- Return type:
bool
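A short sketch of checking for dynamic inputs before building; "model.onnx" is a hypothetical path:

    # A dynamic model (at least one dynamic input dimension) leads to a
    # dynamic engine build, as described for build_engine() below.
    if engine.is_dynamic("model.onnx"):
        print("Model has at least one dynamic input dimension")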
- create_context() IExecutionContext [source]
Creates a TensorRT execution context from the engine.
- Parameters:
tensorrt_engine (trt.ICudaEngine) – TensorRT engine
- Returns:
TensorRT execution context
- Return type:
trt.IExecutionContext
- build_engine(onnx_file_path: Optional[str] = None, engine_file_path: Optional[str] = None, mode: str = 'fp16', max_workspace_size: int = 30) ICudaEngine [source]
Builds a TensorRT engine from an onnx file. If the onnx model is detected as dynamic, then a dynamic engine is built, otherwise a static engine is built.
- Parameters:
onnx_file_path (str, optional) – Path to an onnx file, defaults to None
engine_file_path (str, optional) – Path to the engine file to save, defaults to None
mode (str, optional) – Datatype mode fp32, fp16 or int8, defaults to fp16
max_workspace_size (int, optional) – Maximum workspace size to use, defaults to 30
- Returns:
TensorRT engine
- Return type:
trt.ICudaEngine
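A sketch of an explicit build call with the documented parameters; the paths are placeholders:

    cuda_engine = engine.build_engine(
        onnx_file_path="model.onnx",      # hypothetical ONNX model
        engine_file_path="model.engine",  # where the serialized engine is saved
        mode="fp16",                      # fp32, fp16 or int8
        max_workspace_size=30,            # documented default
    )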
- do_inference(stream: Optional[pycuda.driver.Stream] = None) None [source]
Executes the inference on the engine. This function assumes that the buffers are already filled with the input data.
- Parameters:
stream (cuda.Stream, optional) – Cuda Stream, defaults to None
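A sketch of running the engine on a dedicated CUDA stream, reusing the engine object from above and assuming a CUDA context is already active (pycuda.autoinit is one way to get one) and that the buffers were filled beforehand, for example through infer():

    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda

    stream = cuda.Stream()
    # The buffers are assumed to already contain the input data.
    engine.do_inference(stream=stream)
    stream.synchronize()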
- infer(inputs: Dict[str, darray], batched_input: bool = False, force_infer: bool = False, stream: Optional[pycuda.driver.Stream] = None) Optional[Dict[str, Bufferizer]] [source]
Method to call to perform inference on the engine. This method automatically fills the buffers with the input data and executes the inference once the buffers are full. You can still force the inference by setting force_infer to True.
The inputs argument expects a dictionary of dolphin.darray objects or a dictionary of lists of dolphin.darray. The keys of the dictionary must match the names of the inputs of the model. A usage sketch is given after the return type below.
- Parameters:
inputs (Dict[str, darray]) – Dictionary of inputs
batched_input (bool, optional) – Consider the input as batched, defaults to False
force_infer (bool, optional) – Force the inference even if the buffers are not full, defaults to False
stream (cuda.Stream, optional) – Cuda stream to use, defaults to None
- Returns:
Output of the model
- Return type:
Union[Dict[str, dolphin.Bufferizer], None]
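A usage sketch of infer(); "input" stands for whatever input name the ONNX graph declares, and frame is assumed to be an existing dolphin.darray holding the preprocessed data (constructing a darray is outside this section):

    # Keys must match the model's input names.
    outputs = engine.infer(inputs={"input": frame}, batched_input=False)

    if outputs is not None:
        # Inference ran; outputs maps each output name to its dolphin.Bufferizer.
        for name, bufferizer in outputs.items():
            print(name, bufferizer)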
- property output: Dict[str, Bufferizer]
Returns the output of the dolphin.CudaTrtBuffers of the engine.
- Returns:
Output bufferizer of the engine.
- Return type:
Dict[str, dolphin.Bufferizer]
- property input_shape: Dict[str, tuple]
Returns the shape of the inputs of the engine.
- Returns:
Shape of the inputs
- Return type:
Dict[str, tuple]
- property input_dtype: Dict[str, dtype]
Returns the datatype of the inputs of the engine.
- Returns:
Datatype of the inputs
- Return type:
Dict[str, dolphin.dtype]
- property output_shape: Dict[str, tuple]
Returns the shape of the outputs of the engine.
- Returns:
Shape of the outputs
- Return type:
Dict[str, tuple]
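A short sketch that inspects the engine's bindings through the properties above, for example to validate preprocessing shapes and datatypes:

    for name, shape in engine.input_shape.items():
        print(f"input  {name}: shape={shape}, dtype={engine.input_dtype[name]}")

    for name, shape in engine.output_shape.items():
        print(f"output {name}: shape={shape}")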