SSD ResNet34

SSD object-detection model with a ResNet34 backbone, trained on COCO (input size 1200x1200). This model has been used in MLCommons benchmarks since v0.5.

Overall

  • Framework: PyTorch
  • Model format: ONNX
  • Model task: Object detection
  • Source: This model originates from SSD ResNet34 in ONNX, available at MLCommons - Supported Models.

Usages

Basic usage:

from furiosa.models.vision import SSDResNet34
from furiosa.runtime import session

resnet34 = SSDResNet34.load()

with session.create(resnet34.enf) as sess:
    image, contexts = resnet34.preprocess(["tests/assets/cat.jpg"])
    output = sess.run(image).numpy()
    resnet34.postprocess(output, contexts=contexts)

With the native post-processor (see Native Postprocessor below):

from furiosa.models.vision import SSDResNet34
from furiosa.runtime import session

resnet34 = SSDResNet34.load(use_native=True)

with session.create(resnet34.enf) as sess:
    image, contexts = resnet34.preprocess(["tests/assets/cat.jpg"])
    output = sess.run(image).numpy()
    resnet34.postprocessor(output, contexts=contexts[0])

Inputs

The input is a 3-channel image of 1200x1200 (height, width).

  • Data Type: numpy.float32
  • Tensor Shape: [1, 3, 1200, 1200]
  • Memory Format: NCHW, where
    • N - batch size
    • C - number of channels
    • H - image height
    • W - image width
  • Color Order: RGB
  • Optimal Batch Size: 1
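As a quick sanity check, an input batch matching this spec can be constructed directly. This is only a minimal sketch; real inputs should come from preprocess():

```python
import numpy as np

# Dummy input matching the documented spec: NCHW, float32, batch size 1.
# Real inputs should be produced by SSDResNet34.preprocess().
batch = np.zeros((1, 3, 1200, 1200), dtype=np.float32)

assert batch.shape == (1, 3, 1200, 1200)  # N=1, C=3, H=1200, W=1200
assert batch.dtype == np.float32
```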

Outputs

The outputs are 12 numpy.float32 tensors in various shapes, as listed below. You can refer to the postprocess() function to learn how to decode boxes, classes, and confidence scores.

Tensor  Shape             Data Type  Memory Format
0       (1, 324, 50, 50)  float32    NCHW
1       (1, 486, 25, 25)  float32    NCHW
2       (1, 486, 13, 13)  float32    NCHW
3       (1, 486, 7, 7)    float32    NCHW
4       (1, 324, 3, 3)    float32    NCHW
5       (1, 324, 3, 3)    float32    NCHW
6       (1, 16, 50, 50)   float32    NCHW
7       (1, 24, 25, 25)   float32    NCHW
8       (1, 24, 13, 13)   float32    NCHW
9       (1, 24, 7, 7)     float32    NCHW
10      (1, 16, 3, 3)     float32    NCHW
11      (1, 16, 3, 3)     float32    NCHW
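A hedged reading of these shapes, assuming the standard MLPerf SSD-ResNet34 head layout (not confirmed by this page): the first six tensors appear to hold class scores (81 COCO classes x anchors per location) and the last six hold box regressions (4 coordinates x anchors per location):

```python
# Assumption: channel counts factor as (81 classes x anchors) for tensors 0-5
# and (4 box coordinates x anchors) for tensors 6-11, per the MLPerf
# SSD-ResNet34 reference head layout.
feature_maps = [50, 25, 13, 7, 3, 3]    # H (= W) of each feature map
anchors_per_cell = [4, 6, 6, 6, 4, 4]   # inferred from the channel counts

for anchors in anchors_per_cell:
    assert 81 * anchors in (324, 486)   # class-score channels (tensors 0-5)
    assert 4 * anchors in (16, 24)      # box-regression channels (tensors 6-11)

total = sum(s * s * a for s, a in zip(feature_maps, anchors_per_cell))
print(total)  # 15130 candidate boxes before non-maximum suppression
```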

Pre/Postprocessing

The furiosa.models.vision.SSDResNet34 class provides preprocess and postprocess methods. The preprocess method converts input images into input tensors, and the postprocess method converts model output tensors into a list of bounding boxes, scores, and labels. You can find examples in the Usages section above.

furiosa.models.vision.SSDResNet34.preprocess

Preprocesses input images into a batch of input tensors.

Parameters:

  • images (Sequence[Union[str, np.ndarray]], required): A list of paths of image files (e.g., JPEG, PNG), or images already loaded as numpy arrays in BGR channel order or grayscale.

Returns:

  • Tuple[npt.ArrayLike, List[Dict[str, Any]]]: The first element is a batch of 3-channel 1200x1200 images in NCHW format, and the second element is a list of contexts holding the original image metadata. These contexts must be passed along to post-processing. To learn more about the outputs of preprocess (i.e., the model inputs), please refer to Inputs above.
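For reference, the core tensor transformation can be sketched in plain NumPy. The mean/std constants below are assumed from the MLPerf SSD-ResNet34 reference preprocessing and should be verified against the furiosa-models source; this is a sketch, not the library's implementation:

```python
import numpy as np

# Assumed normalization constants (MLPerf SSD-ResNet34 reference); verify
# against furiosa.models.vision before relying on them.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_like(image_rgb: np.ndarray) -> np.ndarray:
    """Normalize an HWC uint8 RGB image and reshape it to NCHW float32."""
    x = image_rgb.astype(np.float32) / 255.0
    x = (x - MEAN) / STD                       # per-channel normalization
    return x.transpose(2, 0, 1)[np.newaxis]    # HWC -> CHW, add batch dim

dummy = np.zeros((1200, 1200, 3), dtype=np.uint8)  # stand-in for a real image
print(preprocess_like(dummy).shape)  # (1, 3, 1200, 1200)
```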

furiosa.models.vision.SSDResNet34.postprocess

Converts the outputs of this model into a list of bounding boxes, scores, and labels.

Parameters:

  • model_outputs (Sequence[np.ndarray], required): The outputs of the model. To learn more about the model outputs, please refer to Outputs above.
  • contexts (Sequence[Dict[str, Any]], required): The contexts returned by preprocess().

Returns:

  • List[List[ObjectDetectionResult]]: Detected bounding boxes with their scores and labels, each represented as an ObjectDetectionResult. The definition of ObjectDetectionResult is given below.

Definition of ObjectDetectionResult and LtrbBoundingBox
Source code in furiosa/models/vision/postprocess.py
@dataclass
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])
Source code in furiosa/models/vision/postprocess.py
@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int
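Since LtrbBoundingBox implements __iter__, a result's coordinates can be unpacked directly. A self-contained sketch using the dataclasses above, with a hypothetical detection in place of real postprocess() output:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class LtrbBoundingBox:
    left: float
    top: float
    right: float
    bottom: float

    def __iter__(self) -> Iterator[float]:
        return iter([self.left, self.top, self.right, self.bottom])

@dataclass
class ObjectDetectionResult:
    boundingbox: LtrbBoundingBox
    score: float
    label: str
    index: int

# Hypothetical detection, shaped like one entry of postprocess()'s output
det = ObjectDetectionResult(
    boundingbox=LtrbBoundingBox(10.0, 20.0, 110.0, 220.0),
    score=0.92,
    label="cat",
    index=16,
)
left, top, right, bottom = det.boundingbox  # __iter__ enables unpacking
print(f"{det.label} {det.score:.2f} at ({left}, {top}, {right}, {bottom})")
```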

Native Postprocessor

This class provides an alternative postprocessing implementation that is highly optimized for the NPU, leveraging the NPU IO architecture and runtime.

To use this implementation, pass use_native=True to load() or load_async() when loading the model. The following is an example:

Example

from furiosa.models.vision import SSDResNet34
from furiosa.runtime import session

resnet34 = SSDResNet34.load(use_native=True)

with session.create(resnet34.enf) as sess:
    image, contexts = resnet34.preprocess(["tests/assets/cat.jpg"])
    output = sess.run(image).numpy()
    resnet34.postprocessor(output, contexts=contexts[0])