YOLOv5L

YOLOv5 is one of the most popular object detection models. You can find more details at https://github.com/ultralytics/yolov5.

Overall

Framework: PyTorch
Model format: ONNX
Model task: Object Detection
Source: https://github.com/ultralytics/yolov5

Usage

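```python
import cv2
import numpy as np

from furiosa.models.vision import YOLOv5l
from furiosa.runtime import session

yolov5l = YOLOv5l.load()

with session.create(yolov5l.enf) as sess:
    image = cv2.imread("tests/assets/yolov5-test.jpg")
    # Scale and pad the image, and build the input batch plus per-image contexts.
    inputs, contexts = yolov5l.preprocess([image])
    # Run the first image as a batch of one: shape [1, 640, 640, 3].
    output = sess.run(np.expand_dims(inputs[0], axis=0)).numpy()
    # Decode boxes, scores, and labels; one list of results per input image.
    results = yolov5l.postprocess(output, contexts=contexts)
```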

Inputs

The input is a 3-channel image of 640x640 pixels (height x width).

Data Type: numpy.uint8
Tensor Shape: [1, 640, 640, 3]
Memory Format: NHWC, where
- N - batch size
- H - image height
- W - image width
- C - number of channels
Color Order: RGB
Optimal Batch Size (minimum: 1): <= 2
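The NHWC layout above can be illustrated with plain NumPy. The sketch below assumes each image is already 640x640 (real images must be scaled and padded first, which preprocess takes care of) and was loaded with OpenCV, which returns BGR channel order. The helper name `to_model_input` is hypothetical and is not part of furiosa-models.

```python
import numpy as np


def to_model_input(images_bgr):
    """Stack 640x640x3 BGR uint8 images into an NHWC RGB uint8 batch.

    Hypothetical helper for illustration; assumes every image is already 640x640.
    """
    batch = []
    for img in images_bgr:
        assert img.shape == (640, 640, 3) and img.dtype == np.uint8
        # Reverse the last (channel) axis: BGR -> RGB.
        batch.append(img[..., ::-1])
    # Stack along a new leading axis N -> shape (N, 640, 640, 3).
    return np.ascontiguousarray(np.stack(batch, axis=0))


# Two dummy images: one pure blue in BGR order (B=255), one all black.
blue = np.zeros((640, 640, 3), dtype=np.uint8)
blue[..., 0] = 255
black = np.zeros((640, 640, 3), dtype=np.uint8)

batch = to_model_input([blue, black])
print(batch.shape, batch.dtype)  # (2, 640, 640, 3) uint8

# A single-image batch of shape [1, 640, 640, 3], as in the Usage example.
single = np.expand_dims(batch[0], axis=0)
print(single.shape)  # (1, 640, 640, 3)
```

After the flip, the blue value sits in the last channel, matching the RGB color order listed above.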

Outputs

The model produces three numpy.float32 tensors with the shapes listed below. Refer to the postprocess() function to learn how to decode boxes, classes, and confidence scores.

Tensor	Shape	Data Type	Memory Format
0	(1, 45, 80, 80)	float32	NCHW
1	(1, 45, 40, 40)	float32	NCHW
2	(1, 45, 20, 20)	float32	NCHW
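The three grid sizes follow from the 640x640 input and the strides 8, 16, and 32 named in the postprocess parameters (P3/8, P4/16, P5/32). The sketch below additionally assumes the standard YOLOv5 head layout, in which the 45 channels split into 3 anchors x 15 values (4 box coordinates, 1 objectness score, and therefore 10 class scores). That split is inferred from the channel count, not stated on this page.

```python
import numpy as np

INPUT_SIZE = 640
STRIDES = (8, 16, 32)  # P3/8, P4/16, P5/32
NUM_ANCHORS = 3        # assumption: standard YOLOv5 head, 3 anchors per scale
VALUES_PER_ANCHOR = 45 // NUM_ANCHORS  # 15 = 4 box + 1 objectness + 10 classes (inferred)

grid_sizes = [INPUT_SIZE // s for s in STRIDES]
print(grid_sizes)  # [80, 40, 20]

# Dummy outputs with the documented NCHW shapes.
outputs = [np.zeros((1, 45, g, g), dtype=np.float32) for g in grid_sizes]

total = 0
for out in outputs:
    n, c, h, w = out.shape
    # Split channels into (anchor, value) and move the values last:
    # (N, 45, H, W) -> (N, 3, 15, H, W) -> (N, 3, H, W, 15)
    per_anchor = out.reshape(n, NUM_ANCHORS, VALUES_PER_ANCHOR, h, w).transpose(0, 1, 3, 4, 2)
    total += NUM_ANCHORS * h * w
    print(per_anchor.shape)

print(total)  # 25200 candidate boxes before confidence filtering and NMS
```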

Pre/Postprocessing

The furiosa.models.vision.YOLOv5l class provides preprocess and postprocess methods. The preprocess method converts input images to input tensors, and the postprocess method converts model output tensors to a list of bounding boxes, scores, and labels. You can find examples in YOLOv5l Usage.

`furiosa.models.vision.YOLOv5l.preprocess`

Preprocess input images to a batch of input tensors

Parameters:

Name	Type	Description	Default
`images`	`Sequence[Union[str, np.ndarray]]`	Color images have (NCHW: Batch, Channel, Height, Width) dimensions.	required
`with_quantize`	`bool`	Whether to insert a quantize operator in front of the model.	`False`

Returns:

Type	Description
`Tuple[np.ndarray, List[Dict[str, Any]]]`	The pre-processed images, plus the scale and padded size (width, height) of each image.

The first element is a stacked numpy array containing a batch of images. To learn more about the outputs of preprocess (i.e., model inputs), please refer to YOLOv5l Inputs or YOLOv5m Inputs.

The second element is a list of dict objects, one per original image, with the following keys. The 'scale' key holds the rescaling ratio for width (= target/width) and height (= target/height), and the 'pad' key holds the number of padded pixels in width and height. This list should be passed to postprocess as the contexts parameter, which uses it to map predicted coordinates from the scaled, padded model input back to the coordinates of the original image.
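The role of the 'scale' and 'pad' values can be sketched as follows. This assumes YOLOv5-style letterboxing, where a single ratio is applied to both axes (so it is stored in both the width and height slots) and the leftover space is split evenly between opposite sides. The helper names and the exact dict layout are hypothetical and may differ from what preprocess actually returns.

```python
def letterbox_context(width, height, target=640):
    """Compute a uniform scale and per-side padding to fit an image into target x target.

    Hypothetical helper: assumes YOLOv5-style letterboxing (one ratio for both axes,
    padding split evenly between left/right and top/bottom).
    """
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w = (target - new_w) / 2  # padding on each of left and right
    pad_h = (target - new_h) / 2  # padding on each of top and bottom
    return {"scale": (scale, scale), "pad": (pad_w, pad_h)}


def to_original(box, context):
    """Map a (left, top, right, bottom) box from model-input pixels back to the original image."""
    (sx, sy), (pw, ph) = context["scale"], context["pad"]
    left, top, right, bottom = box
    return ((left - pw) / sx, (top - ph) / sy, (right - pw) / sx, (bottom - ph) / sy)


# A 1280x720 frame is scaled by 0.5 to 640x360 and padded by 140 px above and below.
ctx = letterbox_context(1280, 720)
print(ctx)  # {'scale': (0.5, 0.5), 'pad': (0.0, 140.0)}

# A box covering the whole letterboxed content maps back to the full frame.
print(to_original((0, 140, 640, 500), ctx))  # (0.0, 0.0, 1280.0, 720.0)
```

Undoing the padding before dividing by the scale is what lets the predicted boxes line up with the original, unpadded image.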

`furiosa.models.vision.YOLOv5l.postprocess`

Convert the outputs of this model to a list of bounding boxes, scores and labels

Parameters:

Name	Type	Description	Default
`model_outputs`	`Sequence[np.ndarray]`	P3/8, P4/16, P5/32 features from the yolov5l model. To learn more about the model outputs, please refer to YOLOv5l Outputs or YOLOv5m Outputs.	required
`contexts`	`Sequence[Dict[str, Any]]`	Per-image metadata generated by preprocess, such as the scale ratio and the padded width and height of each image.	required
`conf_thres`	`float`	Confidence score threshold.	`0.25`
`iou_thres`	`float`	IoU threshold for non-maximum suppression (NMS).	`0.45`
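To illustrate how conf_thres and iou_thres interact, below is a minimal greedy non-maximum suppression in NumPy. It is a generic, class-agnostic sketch, not the library's postprocess implementation, which may apply per-class NMS and other YOLOv5-specific steps. The function name `filter_and_nms` is hypothetical.

```python
import numpy as np


def filter_and_nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Return indices of boxes kept after confidence filtering and greedy NMS.

    boxes: (N, 4) array of (left, top, right, bottom); scores: (N,) array.
    Hypothetical, class-agnostic sketch.
    """
    candidates = np.flatnonzero(scores >= conf_thres)
    # Visit candidates from highest to lowest score.
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        lt = np.maximum(boxes[best, :2], boxes[rest, :2])
        rb = np.minimum(boxes[best, 2:], boxes[rest, 2:])
        wh = np.clip(rb - lt, 0, None)
        inter = wh[:, 0] * wh[:, 1]
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop boxes that overlap the kept box by more than iou_thres.
        order = rest[iou <= iou_thres]
    return keep


boxes = np.array([
    [0, 0, 100, 100],      # 0: strong detection
    [10, 10, 110, 110],    # 1: heavy overlap with 0, lower score
    [200, 200, 300, 300],  # 2: separate object
    [0, 0, 50, 50],        # 3: below conf_thres
], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7, 0.1], dtype=np.float32)

print(filter_and_nms(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.68, so it is suppressed at the default 0.45 but would survive a looser threshold such as 0.7.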

Returns:

Type	Description
`List[List[ObjectDetectionResult]]`	For each input image, a list of detections, each represented as an `ObjectDetectionResult` holding a bounding box, score, and label. The definition of `ObjectDetectionResult` is given below.

Definition of ObjectDetectionResult and LtrbBoundingBox

Source code in furiosa/models/vision/postprocess.py

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class LtrbBoundingBox:
    # Box corners in (left, top, right, bottom) order.
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        # Allows unpacking: left, top, right, bottom = box
        return iter([self.left, self.top, self.right, self.bottom])


@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int
```
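Because LtrbBoundingBox defines __iter__, a box can be unpacked directly into its four coordinates. The sketch below repeats the two dataclasses so that it runs on its own and uses a hand-made result list in place of real postprocess output; the labels, indices, and scores are invented for illustration, and the `summarize` helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])


@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int


def summarize(detections: List[ObjectDetectionResult], min_score: float = 0.5) -> List[str]:
    """Format detections scoring at least min_score as 'label score (WxH)', best first."""
    lines = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < min_score:
            continue
        left, top, right, bottom = det.boundingbox  # unpacking via __iter__
        lines.append(f"{det.label} {det.score:.2f} ({right - left:.0f}x{bottom - top:.0f})")
    return lines


# Hand-made results for one image; postprocess returns one such list per input image.
image_results = [
    ObjectDetectionResult(LtrbBoundingBox(10, 20, 110, 220), 0.91, "car", 2),
    ObjectDetectionResult(LtrbBoundingBox(300, 40, 340, 140), 0.42, "person", 0),
    ObjectDetectionResult(LtrbBoundingBox(50, 60, 90, 100), 0.77, "sign", 7),
]
print(summarize(image_results))
# ['car 0.91 (100x200)', 'sign 0.77 (40x40)']
```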