YOLOv5L

YOLOv5 is one of the most popular object detection models. You can find more details at https://github.com/ultralytics/yolov5.

Overall

Usage

import cv2
import numpy as np

from furiosa.models.vision import YOLOv5l
from furiosa.runtime import session

yolov5l = YOLOv5l.load()

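with session.create(yolov5l.enf) as sess:
    image = cv2.imread("tests/assets/yolov5-test.jpg")
    # Convert the image to an input tensor; contexts keep the scale and padding per image
    inputs, contexts = yolov5l.preprocess([image])
    # Run inference on a single image with an explicit batch dimension
    output = sess.run(np.expand_dims(inputs[0], axis=0)).numpy()
    # Decode bounding boxes, scores, and labels
    detections = yolov5l.postprocess(output, contexts=contexts)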

Inputs

The input is a 3-channel image of size 640 x 640 (height x width).

  • Data Type: numpy.uint8
  • Tensor Shape: [1, 640, 640, 3]
  • Memory Format: NHWC, where
    • N - batch size
    • H - image height
    • W - image width
    • C - number of channels
  • Color Order: RGB
  • Optimal Batch Size (minimum: 1): <= 2
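
As a minimal sketch of the layout listed above, the following builds a dummy image with NumPy and checks it against the stated shape and data type. The helper name `to_model_input` is hypothetical and not part of furiosa.models; in the usage example this tensor comes from `preprocess` instead.

```python
import numpy as np


def to_model_input(image_hwc: np.ndarray) -> np.ndarray:
    """Add a batch dimension to one 640x640 RGB image (hypothetical helper)."""
    if image_hwc.shape != (640, 640, 3):
        raise ValueError(f"expected (640, 640, 3), got {image_hwc.shape}")
    if image_hwc.dtype != np.uint8:
        raise ValueError(f"expected uint8, got {image_hwc.dtype}")
    # HWC -> NHWC with N = 1
    return image_hwc[np.newaxis, ...]


dummy = np.zeros((640, 640, 3), dtype=np.uint8)
batch = to_model_input(dummy)
print(batch.shape, batch.dtype)
```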

Outputs

The outputs are 3 numpy.float32 tensors with the shapes listed below. Refer to the postprocess() function to learn how to decode boxes, classes, and confidence scores.

Tensor  Shape             Data Type  Memory Format
0       (1, 45, 80, 80)   float32    NCHW
1       (1, 45, 40, 40)   float32    NCHW
2       (1, 45, 20, 20)   float32    NCHW
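
The three grid sizes follow from the 640 x 640 input and the P3/8, P4/16, P5/32 feature strides named in the postprocess documentation. The sketch below only reproduces that arithmetic. Reading the 45 channels as 3 anchors x (5 box and objectness values + 10 class scores) is an assumption based on the usual YOLOv5 head layout and is not confirmed by this page.

```python
INPUT_SIZE = 640
STRIDES = (8, 16, 32)  # P3/8, P4/16, P5/32

# Assumption: usual YOLOv5 head layout, not confirmed by this page
NUM_ANCHORS = 3
NUM_CLASSES = 10
channels = NUM_ANCHORS * (5 + NUM_CLASSES)

# One NCHW output tensor per stride
shapes = [(1, channels, INPUT_SIZE // s, INPUT_SIZE // s) for s in STRIDES]
for index, shape in enumerate(shapes):
    print(index, shape)
```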

Pre/Postprocessing

The furiosa.models.vision.YOLOv5l class provides preprocess and postprocess methods. The preprocess method converts input images to input tensors, and the postprocess method converts model output tensors to a list of bounding boxes, scores, and labels. You can find examples at YOLOv5l Usage.

furiosa.models.vision.YOLOv5l.preprocess

Preprocess input images to a batch of input tensors

Parameters:

  • images (Sequence[Union[str, np.ndarray]], required): Color images have (NCHW: Batch, Channel, Height, Width) dimensions.
  • with_quantize (bool, default: False): Whether to put quantize operator in front of the model or not.

Returns:

Type: Tuple[np.ndarray, List[Dict[str, Any]]]

The preprocessed images together with the scale and padded sizes (width, height) of each image. The first element is a stacked numpy array containing a batch of images. To learn more about the outputs of preprocess (i.e., model inputs), please refer to YOLOv5l Inputs or YOLOv5m Inputs.

The second element is a list of dict objects describing the original images. In each dict, the 'scale' key holds the rescale ratio for width (= target/width) and height (= target/height), and the 'pad' key holds the padded width and height in pixels. This list is passed to postprocess as the contexts parameter so that predicted coordinates can be mapped from the model input coordinates back to the coordinates of the original image.
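
To illustrate how a scale and a pad can map predictions back to the original image, here is a minimal sketch of an aspect-preserving (letterbox) resize and its inverse. It assumes a single scale shared by both axes and padding split evenly on each side; the actual preprocess implementation and the exact layout of the 'scale' and 'pad' values may differ. The function names `letterbox_params` and `to_original` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def letterbox_params(height: int, width: int, target: int = 640) -> Tuple[float, Tuple[float, float]]:
    """Return (scale, (pad_w, pad_h)) for an aspect-preserving resize (assumed scheme)."""
    scale = min(target / height, target / width)
    new_w, new_h = round(width * scale), round(height * scale)
    # Padding is split evenly between both sides of each axis
    return scale, ((target - new_w) / 2, (target - new_h) / 2)


def to_original(box: Box, scale: float, pad: Tuple[float, float]) -> Box:
    """Map a (left, top, right, bottom) box from model input to original image coordinates."""
    left, top, right, bottom = box
    pad_w, pad_h = pad
    return (
        (left - pad_w) / scale,
        (top - pad_h) / scale,
        (right - pad_w) / scale,
        (bottom - pad_h) / scale,
    )


scale, pad = letterbox_params(height=720, width=1280)
restored = to_original((100.0, 240.0, 300.0, 440.0), scale, pad)
print(scale, pad, restored)
```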

furiosa.models.vision.YOLOv5l.postprocess

Convert the outputs of this model to a list of bounding boxes, scores and labels

Parameters:

  • model_outputs (Sequence[np.ndarray], required): P3/8, P4/16, P5/32 features from the yolov5l model. To learn more about the model outputs, please refer to YOLOv5l Outputs or YOLOv5m Outputs.
  • contexts (Sequence[Dict[str, Any]], required): A configuration for each image generated by the preprocessor. For example, it could be the reduction ratio of the image, the actual image width and height.
  • conf_thres (float, default: 0.25): Confidence score threshold.
  • iou_thres (float, default: 0.45): IoU threshold value for the NMS processing.
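
To show what the two thresholds control, here is a generic sketch of confidence filtering followed by greedy non-maximum suppression in NumPy. It is not the library's implementation (which, for example, may handle classes separately); the function names `iou` and `filter_and_nms` and the sample boxes are made up for illustration.

```python
import numpy as np


def iou(box: np.ndarray, others: np.ndarray) -> np.ndarray:
    """IoU between one (left, top, right, bottom) box and an array of boxes."""
    left = np.maximum(box[0], others[:, 0])
    top = np.maximum(box[1], others[:, 1])
    right = np.minimum(box[2], others[:, 2])
    bottom = np.minimum(box[3], others[:, 3])
    inter = np.clip(right - left, 0, None) * np.clip(bottom - top, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)


def filter_and_nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Return indices of boxes kept after score filtering and greedy NMS."""
    candidates = np.flatnonzero(scores > conf_thres)
    # Visit the remaining boxes from highest to lowest score
    order = candidates[np.argsort(-scores[candidates])]
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        # Drop remaining boxes that overlap the chosen one too much
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thres]
    return kept


boxes = np.array(
    [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]],
    dtype=np.float32,
)
scores = np.array([0.9, 0.8, 0.7, 0.1], dtype=np.float32)
print(filter_and_nms(boxes, scores))
```

In this sample, the last box falls below conf_thres, and the second box is suppressed because it overlaps the higher-scoring first box by more than iou_thres.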

Returns:

Type: List[List[ObjectDetectionResult]]

Detected bounding boxes with their scores and labels, represented as ObjectDetectionResult. The details of ObjectDetectionResult can be found below.

Definition of ObjectDetectionResult and LtrbBoundingBox
Source code in furiosa/models/vision/postprocess.py
@dataclass
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])
Source code in furiosa/models/vision/postprocess.py
@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int
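
As a short sketch of how the returned structure can be consumed, the following repeats the two dataclasses so the snippet runs on its own and iterates over a hand-written result. The detection values (box, score, label "car", index 2) are hypothetical; real values come from postprocess().

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])


@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int


# Hypothetical result for a single image; one inner list per input image
detections: List[List[ObjectDetectionResult]] = [
    [ObjectDetectionResult(LtrbBoundingBox(10.0, 20.0, 110.0, 220.0), 0.9, "car", 2)],
]

for per_image in detections:
    for det in per_image:
        # LtrbBoundingBox defines __iter__, so it unpacks like a tuple
        left, top, right, bottom = det.boundingbox
        width, height = right - left, bottom - top
        print(f"{det.label} ({det.index}) score={det.score:.2f} size={width}x{height}")
```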