SSD MobileNet v1
SSD MobileNet v1 backbone model trained on COCO (300x300). This model has been used since MLCommons v0.5.
Overall
- Framework: PyTorch
- Model format: ONNX
- Model task: Object detection
- Source: This model originates from SSD MobileNet v1 in ONNX, available at MLCommons - Supported Models.
Usage
from furiosa.models.vision import SSDMobileNet
from furiosa.runtime import session
image = ["tests/assets/cat.jpg"]
mobilenet = SSDMobileNet.load()
with session.create(mobilenet.enf) as sess:
    inputs, contexts = mobilenet.preprocess(image)
    outputs = sess.run(inputs).numpy()
    mobilenet.postprocess(outputs, contexts)
# Variant that loads the model with the native postprocessor (use_native=True)
from furiosa.models.vision import SSDMobileNet
from furiosa.runtime import session
image = ["tests/assets/cat.jpg"]
mobilenet = SSDMobileNet.load(use_native=True)
with session.create(mobilenet.enf) as sess:
    inputs, contexts = mobilenet.preprocess(image)
    outputs = sess.run(inputs).numpy()
    mobilenet.postprocess(outputs, contexts[0])
Inputs
The input is a 3-channel image of 300x300 (height, width).
- Data Type: numpy.float32
- Tensor Shape: [1, 3, 300, 300]
- Memory Format: NCHW, where:
  - N - batch size
  - C - number of channels
  - H - image height
  - W - image width
- Color Order: RGB
- Optimal Batch Size (minimum: 1): <= 8
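As an illustration of the layout listed above, the following sketch rearranges an HWC, BGR image array into a float32, NCHW, RGB tensor using only NumPy. It is not the library's preprocess() implementation: resizing and any value normalization are omitted because this page does not specify them, and the helper name `to_nchw_rgb` is hypothetical.

```python
import numpy as np


def to_nchw_rgb(image_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: HWC/BGR image -> [1, 3, H, W] RGB float32 tensor.

    Assumes the image is already 300x300. Resizing and normalization are
    intentionally left out (they are not described in this document).
    """
    rgb = image_bgr[:, :, ::-1]          # reverse the channel axis: BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))   # HWC -> CHW
    # Add the batch axis (N = 1) and convert to contiguous float32.
    return np.ascontiguousarray(chw[np.newaxis], dtype=np.float32)


dummy = np.zeros((300, 300, 3), dtype=np.uint8)
dummy[:, :, 0] = 255                     # fill the blue channel (index 0 in BGR)
tensor = to_nchw_rgb(dummy)
print(tensor.shape, tensor.dtype)
```

In practice, preprocess() is the supported way to build model inputs; the sketch only makes the memory format and color order concrete.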
Outputs
The outputs are 12 numpy.float32 tensors of various shapes, listed below.
Refer to the postprocess() function to learn how to decode boxes, classes, and confidence scores.
Tensor | Shape | Data Type | Memory Format | Description |
---|---|---|---|---|
0 | (1, 273, 19, 19) | float32 | NCHW | |
1 | (1, 12, 19, 19) | float32 | NCHW | |
2 | (1, 546, 10, 10) | float32 | NCHW | |
3 | (1, 24, 10, 10) | float32 | NCHW | |
4 | (1, 546, 5, 5) | float32 | NCHW | |
5 | (1, 24, 5, 5) | float32 | NCHW | |
6 | (1, 546, 3, 3) | float32 | NCHW | |
7 | (1, 24, 3, 3) | float32 | NCHW | |
8 | (1, 546, 2, 2) | float32 | NCHW | |
9 | (1, 24, 2, 2) | float32 | NCHW | |
10 | (1, 546, 1, 1) | float32 | NCHW | |
11 | (1, 24, 1, 1) | float32 | NCHW | |
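The shapes in the table can be sanity-checked with a little arithmetic. The sketch below assumes, as is typical for SSD heads but not stated on this page, that even-indexed tensors hold class scores and odd-indexed tensors hold box regressions with 4 coordinates per anchor. The function name `summarize` is hypothetical.

```python
# Output shapes copied from the table above, in tensor index order.
OUTPUT_SHAPES = [
    (1, 273, 19, 19), (1, 12, 19, 19),
    (1, 546, 10, 10), (1, 24, 10, 10),
    (1, 546, 5, 5), (1, 24, 5, 5),
    (1, 546, 3, 3), (1, 24, 3, 3),
    (1, 546, 2, 2), (1, 24, 2, 2),
    (1, 546, 1, 1), (1, 24, 1, 1),
]


def summarize(shapes):
    """Return (total anchors, set of score channels per anchor).

    Assumption: tensors come in (class scores, box regressions) pairs and
    each anchor has 4 box coordinates.
    """
    total_anchors = 0
    class_counts = set()
    for cls_shape, box_shape in zip(shapes[0::2], shapes[1::2]):
        _, box_channels, height, width = box_shape
        anchors_per_cell = box_channels // 4
        class_counts.add(cls_shape[1] // anchors_per_cell)
        total_anchors += anchors_per_cell * height * width
    return total_anchors, class_counts


total_anchors, class_counts = summarize(OUTPUT_SHAPES)
print(total_anchors, class_counts)
```

Under these assumptions the shapes imply 1917 candidate boxes across the six feature maps and 91 score channels per anchor.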
Pre/Postprocessing
The furiosa.models.vision.SSDMobileNet class provides preprocess and postprocess methods.
The preprocess method converts input images to input tensors, and the postprocess method converts model output tensors to a list of bounding boxes, scores, and labels.
You can find examples at SSDMobileNet Usage.
furiosa.models.vision.SSDMobileNet.preprocess
Preprocess input images to a batch of input tensors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | Sequence[Union[str, np.ndarray]] | A list of paths of image files (e.g., JPEG, PNG) or a stacked image loaded as a numpy array in BGR order or gray order. | required |
Returns:
Type | Description |
---|---|
Tuple[npt.ArrayLike, List[Dict[str, Any]]] | The first element is a batch of 3-channel 300x300 images in NCHW format, and the second element is a list of contexts holding metadata about the original images. This context data should be passed to and used during post-processing. To learn more about the outputs of preprocess (i.e., model inputs), please refer to Inputs. |
furiosa.models.vision.SSDMobileNet.postprocess
Convert the outputs of this model to a list of bounding boxes, scores, and labels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_outputs | Sequence[numpy.ndarray] | The outputs of the model. To learn more about the model outputs, please refer to Outputs. | required |
contexts | Sequence[Dict[str, Any]] | Context coming from preprocess(). | required |
Returns:
Type | Description |
---|---|
List[List[ObjectDetectionResult]] | Detected bounding boxes with their scores and labels, represented as ObjectDetectionResult. |
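To show how a nested result list of this shape might be consumed, here is a minimal sketch that keeps only high-confidence detections. The `Detection` dataclass and its fields are a hypothetical stand-in, not the library's ObjectDetectionResult, and the sketch assumes the outer list holds one entry per input image.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for a detection result; not the library's class."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # assumed (left, top, right, bottom)


def filter_by_score(batch: List[List[Detection]], threshold: float) -> List[List[Detection]]:
    """For each image in the batch, keep detections scoring at least `threshold`."""
    return [[d for d in per_image if d.score >= threshold] for per_image in batch]


batch = [
    [Detection("cat", 0.91, (10, 20, 200, 220)), Detection("dog", 0.12, (0, 0, 50, 50))],
    [],  # an image with no detections
]
kept = filter_by_score(batch, threshold=0.5)
print([[d.label for d in per_image] for per_image in kept])
```

The same pattern should carry over to the real result objects once their actual attribute names are substituted.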
Native Postprocessor
This class provides another version of the postprocessing implementation which is highly optimized for NPU. The implementation leverages the NPU IO architecture and runtime.
To use this implementation, pass use_native=True to load() or load_async() when loading the model. The following is an example:
Example
from furiosa.models.vision import SSDMobileNet
from furiosa.runtime import session
image = ["tests/assets/cat.jpg"]
mobilenet = SSDMobileNet.load(use_native=True)
with session.create(mobilenet.enf) as sess:
    inputs, contexts = mobilenet.preprocess(image)
    outputs = sess.run(inputs).numpy()
    mobilenet.postprocess(outputs, contexts[0])