YOLOv5m
YOLOv5 is one of the most popular object detection models. You can find more details at https://github.com/ultralytics/yolov5.
Overall
- Framework: PyTorch
- Model format: ONNX
- Model task: Object Detection
- Source: https://github.com/ultralytics/yolov5.
Usage
```python
import cv2
import numpy as np

from furiosa.models.vision import YOLOv5m
from furiosa.runtime import session

yolov5m = YOLOv5m.load()

with session.create(yolov5m.enf) as sess:
    image = cv2.imread("tests/assets/yolov5-test.jpg")
    # preprocess() returns the input tensors plus a per-image context
    # (scale and padding) that postprocess() needs later.
    inputs, contexts = yolov5m.preprocess([image])
    output = sess.run(np.expand_dims(inputs[0], axis=0)).numpy()
    yolov5m.postprocess(output, contexts=contexts)
```
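The last call returns the detections; as a minimal follow-up sketch (the result layout is described under Pre/Postprocessing below):

```python
# Continuing the example above: postprocess() returns one list of
# detections per input image.
results = yolov5m.postprocess(output, contexts=contexts)
for det in results[0]:
    print(det)  # one detected object: bounding box, score, and label
```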
Inputs
The input is a 3-channel 640x640 (height x width) image.
- Data Type: numpy.uint8
- Tensor Shape: [1, 640, 640, 3]
- Memory Format: NHWC, where
- N - batch size
- H - image height
- W - image width
- C - number of channels
- Color Order: RGB
- Optimal Batch Size (minimum: 1): <= 4
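As an illustrative sketch (not the library's actual preprocessing), an input matching this spec could be assembled by hand; note that the real preprocess() letterboxes with padding instead of the naive resize used here:

```python
import cv2
import numpy as np

# OpenCV loads images as BGR/HWC; the model expects RGB color order.
bgr = cv2.imread("tests/assets/yolov5-test.jpg")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

# Naive resize to 640x640 (preprocess() letterboxes instead).
resized = cv2.resize(rgb, (640, 640))

# Add the batch dimension: NHWC, uint8, shape [1, 640, 640, 3].
tensor = np.expand_dims(resized, axis=0).astype(np.uint8)
assert tensor.shape == (1, 640, 640, 3)
```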
Outputs
The outputs are three numpy.float32 tensors in the shapes listed below.
You can refer to the postprocess() function to learn how to decode boxes, classes, and confidence scores.
Tensor | Shape | Data Type | Memory Format | Description |
---|---|---|---|---|
0 | (1, 45, 80, 80) | float32 | NCHW | P3/8 feature map |
1 | (1, 45, 40, 40) | float32 | NCHW | P4/16 feature map |
2 | (1, 45, 20, 20) | float32 | NCHW | P5/32 feature map |
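Continuing the Usage example, where output holds the list of three raw output arrays, a quick sanity check against this table could look like the following sketch:

```python
import numpy as np

# Each feature map has 45 channels; in the standard YOLOv5 layout this is
# 3 anchors x (4 box coordinates + 1 objectness score + class scores).
expected_shapes = [(1, 45, 80, 80), (1, 45, 40, 40), (1, 45, 20, 20)]
for tensor, expected in zip(output, expected_shapes):
    assert tensor.shape == expected
    assert tensor.dtype == np.float32
```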
Pre/Postprocessing
The furiosa.models.vision.YOLOv5m class provides preprocess and postprocess methods.
The preprocess method converts input images into input tensors, and the postprocess method converts model output tensors into a list of bounding boxes, scores, and labels.
You can find examples at YOLOv5m Usage.
furiosa.models.vision.YOLOv5m.preprocess
Preprocesses input images into a batch of input tensors.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
images | Sequence[Union[str, np.ndarray]] | A sequence of image file paths or color images, where each image is an np.ndarray with (Height, Width, Channel) dimensions. | required |
with_quantize | bool | Whether to put a quantize operator in front of the model or not. | False |
Returns:

Type | Description |
---|---|
Tuple[np.ndarray, List[Dict[str, Any]]] | Pre-processed images together with per-image scales and padded sizes (width, height). The first element is a stacked numpy array containing a batch of images; to learn more about the outputs of preprocess (i.e., model inputs), please refer to YOLOv5l Inputs or YOLOv5m Inputs. The second element is a list of dict objects, one per original image: the 'scale' key holds the rescale ratio per width (= target/width) and height (= target/height), and the 'pad' key holds the number of padded width and height pixels. This list is passed to postprocess() as the contexts parameter so that predicted coordinates in the normalized space can be mapped back to the original image's coordinate system. |
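As a short sketch of what these contexts contain (reusing the yolov5m instance from the Usage example; preprocess() also accepts image paths):

```python
inputs, contexts = yolov5m.preprocess(["tests/assets/yolov5-test.jpg"])

# One context dict per input image, holding what postprocess() needs to
# map predicted boxes back to the original image's coordinate system.
for ctx in contexts:
    print("scale:", ctx["scale"])  # rescale ratio (target/original)
    print("pad:", ctx["pad"])      # padded width/height in pixels
```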
furiosa.models.vision.YOLOv5m.postprocess
Converts the outputs of this model into a list of bounding boxes, scores, and labels.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_outputs | Sequence[np.ndarray] | P3/8, P4/16, P5/32 features from the model. To learn more about the model outputs, please refer to YOLOv5l Outputs or YOLOv5m Outputs. | required |
contexts | Sequence[Dict[str, Any]] | A configuration for each image generated by the preprocessor, such as the rescale ratio and the padded width and height of each image. | required |
conf_thres | float | Confidence score threshold. Defaults to 0.25. | 0.25 |
iou_thres | float | IoU threshold for the NMS processing. Defaults to 0.45. | 0.45 |
Returns:

Type | Description |
---|---|
List[List[ObjectDetectionResult]] | Detected bounding boxes with their scores and labels, represented as ObjectDetectionResult, with one inner list of detections per input image. |
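Putting it together, the result could be consumed as in the sketch below; the boundingbox, score, and label field names are assumptions for illustration, not confirmed by this page, so check the ObjectDetectionResult definition for the exact attributes:

```python
results = yolov5m.postprocess(output, contexts=contexts)

# One inner list of detections per input image.
for image_idx, detections in enumerate(results):
    for det in detections:
        # boundingbox, score, and label are assumed attribute names;
        # see the ObjectDetectionResult definition for the actual API.
        print(image_idx, det.label, det.score, det.boundingbox)
```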