

YOLOv5 is one of the most popular object detection models. You can find more details at



import cv2
import numpy as np

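# Module path assumed from the source location "furiosa/models/vision/" cited on this page.
from furiosa.models.vision import YOLOv5l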
from furiosa.runtime import session

yolov5l = YOLOv5l.load()

with session.create(yolov5l.enf) as sess:
    image = cv2.imread("tests/assets/yolov5-test.jpg")
    inputs, contexts = yolov5l.preprocess([image])
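    # Reconstructed call: add a batch axis to the first input and run it on the session.
    output = sess.run(np.expand_dims(inputs[0], axis=0)).numpy()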
    yolov5l.postprocess(output, contexts=contexts)


The input is a 3-channel image of size 640 x 640 (height, width).

  • Data Type: numpy.uint8
  • Tensor Shape: [1, 640, 640, 3]
  • Memory Format: NHWC, where
    • N - batch size
    • H - image height
    • W - image width
    • C - number of channels
  • Color Order: RGB
  • Optimal Batch Size (minimum: 1): <= 2
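
As a minimal sketch of the layout listed above, the following builds a tensor with the documented shape, data type, and color order from a single image that is already 640 x 640. It uses only NumPy. The helper name `to_input_tensor` is hypothetical and is not part of the library, whose `preprocess()` also takes care of resizing and padding.

```python
import numpy as np


def to_input_tensor(bgr_image: np.ndarray) -> np.ndarray:
    """Turn one 640x640 BGR image (HWC, uint8) into a [1, 640, 640, 3] RGB tensor."""
    if bgr_image.shape != (640, 640, 3) or bgr_image.dtype != np.uint8:
        raise ValueError("expected a 640x640x3 uint8 image")
    rgb = bgr_image[:, :, ::-1]        # reverse the channel axis: BGR -> RGB
    batch = rgb[np.newaxis, ...]       # add the batch axis: HWC -> NHWC
    return np.ascontiguousarray(batch)


# A synthetic image stands in for cv2.imread(), which returns BGR data.
image = np.zeros((640, 640, 3), dtype=np.uint8)
image[..., 0] = 255                    # pure blue in BGR order
tensor = to_input_tensor(image)
print(tensor.shape, tensor.dtype)
```

After the channel flip, the blue value sits in the last position of each pixel, which is what an RGB consumer expects.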


The outputs are three numpy.float32 tensors with the shapes listed below. Refer to the postprocess() function to learn how to decode boxes, classes, and confidence scores.

Tensor  Shape             Data Type  Memory Format  Description
0       (1, 45, 80, 80)   float32    NCHW
1       (1, 45, 40, 40)   float32    NCHW
2       (1, 45, 20, 20)   float32    NCHW
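
The spatial sizes of these tensors follow from the 640 x 640 input and the P3/8, P4/16, P5/32 strides named in the postprocess documentation: 640/8 = 80, 640/16 = 40, 640/32 = 20. The sketch below checks that arithmetic on dummy arrays. Splitting the 45 channels into 3 anchors of 15 values each is an assumption based on the usual YOLOv5 head layout and is not stated on this page.

```python
import numpy as np

INPUT_SIZE = 640
STRIDES = (8, 16, 32)   # P3/8, P4/16, P5/32
NUM_ANCHORS = 3         # assumption: the usual YOLOv5 head uses 3 anchors per scale

# Dummy tensors with the documented NCHW shapes.
outputs = [
    np.zeros((1, 45, INPUT_SIZE // s, INPUT_SIZE // s), dtype=np.float32)
    for s in STRIDES
]

for out in outputs:
    n, c, h, w = out.shape
    per_anchor = c // NUM_ANCHORS   # 15 values per anchor under the assumption above
    # NCHW -> (N, anchors, H, W, values), a convenient layout for decoding
    grid = out.reshape(n, NUM_ANCHORS, per_anchor, h, w).transpose(0, 1, 3, 4, 2)
    print(out.shape, "->", grid.shape)
```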

The pre/postprocessing class provides preprocess and postprocess methods. The preprocess method converts input images to input tensors, and the postprocess method converts model output tensors to a list of bounding boxes, scores, and labels. You can find examples at YOLOv5l Usage.

Preprocess input images to a batch of input tensors


Name Type Description Default
images Sequence[Union[str, np.ndarray]]

Color images have (NCHW: Batch, Channel, Height, Width) dimensions.

with_quantize bool

Whether to put quantize operator in front of the model or not.



Type Description
Tuple[np.ndarray, List[Dict[str, Any]]]

A tuple of the pre-processed images and, for each image, its scale and padded sizes (width, height). The first element is a stacked numpy array containing a batch of images. To learn more about the outputs of preprocess (i.e., model inputs), please refer to YOLOv5l Inputs or YOLOv5m Inputs.

The second element is a list of dict objects describing the original images, one per image. In each dict, the 'scale' key holds the resize ratio for width (= target/width) and height (= target/height), and the 'pad' key holds the padded width and height in pixels. This list is passed to postprocessing as the contexts parameter so that predicted coordinates can be mapped back to the coordinates of the original input image.
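
To illustrate how a scale and a pad can map a predicted box back to the original image, here is a minimal sketch. It assumes an aspect-preserving (letterbox) resize with a single ratio and symmetric padding. The function names and the exact layout of the dict are hypothetical, so the real contexts produced by preprocess may differ.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # left, top, right, bottom


def letterbox_params(width: int, height: int, target: int = 640) -> Tuple[float, Tuple[float, float]]:
    """Return the resize ratio and the (pad_w, pad_h) added on each side."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad = ((target - new_w) / 2, (target - new_h) / 2)
    return scale, pad


def to_original(box: Box, scale: float, pad: Tuple[float, float]) -> Box:
    """Undo padding first, then undo scaling."""
    left, top, right, bottom = box
    pad_w, pad_h = pad
    return (
        (left - pad_w) / scale,
        (top - pad_h) / scale,
        (right - pad_w) / scale,
        (bottom - pad_h) / scale,
    )


# A 1280x720 image is halved to 640x360 and padded by 140 px above and below.
scale, pad = letterbox_params(1280, 720)
context = {"scale": scale, "pad": pad}  # hypothetical dict layout
box = to_original((0.0, 140.0, 640.0, 500.0), context["scale"], context["pad"])
print(context, box)
```

The box covering the whole unpadded region of the 640 x 640 tensor maps back to the full 1280 x 720 frame.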

Convert the outputs of this model to a list of bounding boxes, scores and labels


Name Type Description Default
model_outputs Sequence[np.ndarray]

P3/8, P4/16, P5/32 features from the yolov5l model. To learn more about the model outputs, please refer to YOLOv5l Outputs or YOLOv5m Outputs.

contexts Sequence[Dict[str, Any]]

A configuration for each image generated by the preprocessor. For example, it could be the reduction ratio of the image, the actual image width and height.

conf_thres float

Confidence score threshold. Defaults to 0.25.

iou_thres float

IoU threshold value for the NMS processing. Defaults to 0.45.
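
The roles of the two thresholds can be sketched with a small NumPy implementation of greedy non-maximum suppression. This is a generic, class-agnostic illustration under stated assumptions, not the library's postprocess code: boxes scoring below conf_thres are dropped first, then any box whose IoU with a higher-scoring kept box exceeds iou_thres is suppressed.

```python
import numpy as np


def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one LTRB box and an array of LTRB boxes."""
    left = np.maximum(box[0], boxes[:, 0])
    top = np.maximum(box[1], boxes[:, 1])
    right = np.minimum(box[2], boxes[:, 2])
    bottom = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(right - left, 0, None) * np.clip(bottom - top, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes: np.ndarray, scores: np.ndarray,
        conf_thres: float = 0.25, iou_thres: float = 0.45) -> list:
    """Return indices of the boxes that survive both thresholds."""
    candidates = np.flatnonzero(scores >= conf_thres)      # confidence filter
    order = candidates[np.argsort(-scores[candidates])]    # best score first
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        # keep only the boxes that do not overlap the chosen one too much
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thres]
    return kept


boxes = np.array(
    [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300], [0, 0, 50, 50]],
    dtype=np.float32,
)
scores = np.array([0.9, 0.8, 0.7, 0.1], dtype=np.float32)
kept = nms(boxes, scores)
print(kept)
```

In this toy input, box 3 falls below the confidence threshold and box 1 is suppressed because it overlaps box 0 almost entirely.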



Type Description

Detected Bounding Box and its score and label represented as ObjectDetectionResult. The details of ObjectDetectionResult can be found below.

Definition of ObjectDetectionResult and LtrbBoundingBox
Source code in furiosa/models/vision/
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])
Source code in furiosa/models/vision/
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int
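
A short usage sketch of these two types follows. The classes are redefined locally so the example runs on its own. Decorating them with `@dataclass` is an assumption, since the listing above shows only the field annotations, and the sample values ("car", 0.87, index 2) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass  # assumption: the library declares these as dataclasses
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])


@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int


# Made-up detection, standing in for one element returned by postprocess().
result = ObjectDetectionResult(LtrbBoundingBox(10.0, 20.0, 110.0, 220.0), 0.87, "car", 2)

# __iter__ allows the box to be unpacked directly into its four edges.
left, top, right, bottom = result.boundingbox
width, height = right - left, bottom - top
print(f"{result.label} ({result.score:.2f}): {width} x {height}")
```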