SSD MobileNet v1

SSD MobileNet v1 backbone model trained on COCO (300x300). This model has been used since MLCommons v0.5.

Overall

Framework: PyTorch
Model format: ONNX
Model task: Object detection
Source: This model originates from the SSD MobileNet v1 ONNX model available at MLCommons - Supported Models.

Usage

Python Postprocessor

from furiosa.models.vision import SSDMobileNet
from furiosa.runtime.sync import create_runner

image = ["tests/assets/cat.jpg"]

mobilenet = SSDMobileNet()
with create_runner(mobilenet.model_source()) as runner:
    inputs, contexts = mobilenet.preprocess(image)
    outputs = runner.run(inputs)
    mobilenet.postprocess(outputs, contexts[0])

Native Postprocessor

from furiosa.models.types import Platform
from furiosa.models.vision import SSDMobileNet
from furiosa.runtime.sync import create_runner

image = ["tests/assets/cat.jpg"]

mobilenet = SSDMobileNet(postprocessor_type=Platform.RUST)
with create_runner(mobilenet.model_source()) as runner:
    inputs, contexts = mobilenet.preprocess(image)
    outputs = runner.run(inputs)
    mobilenet.postprocess(outputs, contexts[0])

Inputs

The input is a 3-channel image of 300x300 (height, width).

Data Type: numpy.float32
Tensor Shape: [1, 3, 300, 300]
Memory Format: NCHW, where:
- N - batch size
- C - number of channels
- H - image height
- W - image width
Color Order: RGB
Optimal Batch Size (minimum: 1): <= 8
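
To make the NCHW memory format concrete, the following minimal sketch computes the number of elements and the byte size of one input tensor, and the flat offset of an element in a contiguous NCHW buffer. The helper `nchw_offset` is a hypothetical name used only for illustration; it is not part of furiosa-models.

```python
# Input tensor shape from the Inputs section: [N, C, H, W].
N, C, H, W = 1, 3, 300, 300
FLOAT32_BYTES = 4  # size of one numpy.float32 element


def nchw_offset(n: int, c: int, h: int, w: int) -> int:
    """Flat index of element (n, c, h, w) in a contiguous NCHW buffer.

    Hypothetical helper for illustration only.
    """
    return ((n * C + c) * H + h) * W + w


num_elements = N * C * H * W
num_bytes = num_elements * FLOAT32_BYTES

print(num_elements, num_bytes)
# The first element of channel 1 sits right after all H*W values of channel 0.
print(nchw_offset(0, 1, 0, 0))
```

In this layout, all pixels of one channel are stored together before the next channel begins, which is why an image decoded in HWC order has to be transposed before it matches the tensor shape above.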

Outputs

The outputs are 12 numpy.float32 tensors with the shapes listed below. Refer to the postprocess() function to learn how to decode boxes, classes, and confidence scores.

Tensor	Shape	Data Type	Memory Format
0	(1, 273, 19, 19)	float32	NCHW
1	(1, 12, 19, 19)	float32	NCHW
2	(1, 546, 10, 10)	float32	NCHW
3	(1, 24, 10, 10)	float32	NCHW
4	(1, 546, 5, 5)	float32	NCHW
5	(1, 24, 5, 5)	float32	NCHW
6	(1, 546, 3, 3)	float32	NCHW
7	(1, 24, 3, 3)	float32	NCHW
8	(1, 546, 2, 2)	float32	NCHW
9	(1, 24, 2, 2)	float32	NCHW
10	(1, 546, 1, 1)	float32	NCHW
11	(1, 24, 1, 1)	float32	NCHW
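
The shapes in the table can be sanity-checked with a little arithmetic. The sketch below assumes the conventional SSD head layout, which this page does not state explicitly: even-numbered tensors are treated as per-anchor class scores and odd-numbered tensors as per-anchor box regressions with 4 coordinates each. Under that assumption it derives the anchors per feature-map cell, the scores per anchor, and the total anchor count. The authoritative decoding remains the postprocess() implementation.

```python
# Output shapes copied from the table above, in tensor order 0..11.
shapes = [
    (1, 273, 19, 19), (1, 12, 19, 19),
    (1, 546, 10, 10), (1, 24, 10, 10),
    (1, 546, 5, 5), (1, 24, 5, 5),
    (1, 546, 3, 3), (1, 24, 3, 3),
    (1, 546, 2, 2), (1, 24, 2, 2),
    (1, 546, 1, 1), (1, 24, 1, 1),
]

BOX_COORDS = 4  # assumption: each anchor regresses 4 box coordinates

total_anchors = 0
# Assumption: tensors come in (scores, boxes) pairs per feature map.
for scores, boxes in zip(shapes[0::2], shapes[1::2]):
    _, score_channels, h, w = scores
    _, box_channels, box_h, box_w = boxes
    assert (h, w) == (box_h, box_w)  # both heads share one feature map

    anchors_per_cell = box_channels // BOX_COORDS
    classes = score_channels // anchors_per_cell
    total_anchors += anchors_per_cell * h * w
    print(f"{h}x{w}: {anchors_per_cell} anchors/cell, {classes} scores/anchor")

print("total anchors:", total_anchors)
```

With these shapes, every feature map yields 91 scores per anchor and the total comes to 1917 anchors.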

Pre/Postprocessing

The furiosa.models.vision.SSDMobileNet class provides preprocess and postprocess methods. The preprocess method converts input images to input tensors, and the postprocess method converts model output tensors to a list of bounding boxes, scores, and labels. You can find examples at SSDMobileNet Usage.

`furiosa.models.vision.SSDMobileNet.preprocess`

Preprocess input images to a batch of input tensors.

Parameters:

Name	Type	Description	Default
`images`	`Sequence[Union[str, ndarray]]`	A list of paths to image files (e.g., JPEG, PNG), or a stacked image loaded as a numpy array in BGR or grayscale order.	required
`with_scaling`	`bool`	Whether to apply model-specific techniques that scale the model's input and convert its data type to float32. Refer to the code for a precise understanding of the techniques used.	`False`

Returns:

Type	Description
`Tuple[ArrayLike, List[Dict[str, Any]]]`	The first element is a batch of 3-channel 300x300 images in NCHW format, and the second element is a list of contexts holding metadata about the original images. This context data must be passed to post-processing. To learn more about the outputs of preprocess (i.e., the model inputs), refer to Inputs.
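
One reason post-processing needs the context is that detections have to be mapped back to the original image size. The following is a minimal sketch of that idea, not the library's implementation: it assumes boxes are expressed in normalized [0, 1] coordinates and that the context stores the original size under the hypothetical keys "width" and "height". The function name `to_pixel_box` is also hypothetical.

```python
from typing import Any, Dict, Tuple

Box = Tuple[float, float, float, float]


def to_pixel_box(normalized_ltrb: Box, context: Dict[str, Any]) -> Box:
    """Scale a (left, top, right, bottom) box from [0, 1] to pixel units.

    Hypothetical helper: the "width" and "height" keys are assumptions
    about the context layout, not a documented contract.
    """
    left, top, right, bottom = normalized_ltrb
    width, height = context["width"], context["height"]
    return (left * width, top * height, right * width, bottom * height)


# Made-up context for a 640x480 source image.
context = {"width": 640, "height": 480}
print(to_pixel_box((0.25, 0.5, 0.75, 1.0), context))
```

In practice you do not need to do this yourself: pass the context returned by preprocess() to postprocess() and use the boxes it returns.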

`furiosa.models.vision.SSDMobileNet.postprocess`

Convert the outputs of this model to a list of bounding boxes, scores, and labels.

Parameters:

Name	Type	Description	Default
`model_outputs`	`Sequence[numpy.ndarray]`	The outputs of the model. To learn more about the model outputs, refer to Outputs.	required
`contexts`	`Sequence[Dict[str, Any]]`	The context returned by `preprocess()`.	required

Returns:

Type	Description
`List[List[ObjectDetectionResult]]`	Detected bounding boxes with their scores and labels, each represented as an `ObjectDetectionResult`. The definition of `ObjectDetectionResult` is given below.

Definitions of ObjectDetectionResult and LtrbBoundingBox

Source code in furiosa/models/vision/postprocess.py

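from dataclasses import dataclass
from typing import Iterator


@dataclass
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    # Allows unpacking: left, top, right, bottom = box
    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])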

Source code in furiosa/models/vision/postprocess.py

@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int
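
As an illustration of how these two dataclasses can be consumed, the sketch below repeats the definitions so that it runs on its own, then filters detections by score and unpacks each box through `__iter__`. The helper `filter_by_score`, the threshold, and the sample detections (including the `index` values) are made up for the example and are not part of furiosa-models.

```python
from dataclasses import dataclass
from typing import Iterator, List


# Definitions repeated from above so this example is self-contained.
@dataclass
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])


@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int


def filter_by_score(
    results: List[ObjectDetectionResult], threshold: float
) -> List[ObjectDetectionResult]:
    """Hypothetical helper: keep detections scoring at least `threshold`."""
    return [r for r in results if r.score >= threshold]


# Made-up detections standing in for one image's postprocess() output.
results = [
    ObjectDetectionResult(LtrbBoundingBox(10.0, 20.0, 110.0, 220.0), 0.92, "cat", 16),
    ObjectDetectionResult(LtrbBoundingBox(0.0, 0.0, 30.0, 40.0), 0.15, "dog", 17),
]

for result in filter_by_score(results, threshold=0.5):
    # __iter__ lets the box unpack directly into its four coordinates.
    left, top, right, bottom = result.boundingbox
    area = (right - left) * (bottom - top)
    print(result.label, result.score, area)
```

Because `LtrbBoundingBox` is iterable, it can also be passed straight to `list()` or `tuple()` when a drawing or serialization routine expects plain coordinates.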

Native Postprocessor

This class provides an alternative postprocessing implementation that is highly optimized for the NPU. The implementation leverages the NPU IO architecture and runtime.

To use this implementation, pass postprocessor_type=Platform.RUST when constructing the model, as in the following example:

Example

from furiosa.models.types import Platform
from furiosa.models.vision import SSDMobileNet
from furiosa.runtime.sync import create_runner

image = ["tests/assets/cat.jpg"]

mobilenet = SSDMobileNet(postprocessor_type=Platform.RUST)
with create_runner(mobilenet.model_source()) as runner:
    inputs, contexts = mobilenet.preprocess(image)
    outputs = runner.run(inputs)
    mobilenet.postprocess(outputs, contexts[0])