furiosa.runtime package

This package provides high-level Python APIs for Furiosa AI NPUs.

Runtime Variants

The package is divided into three main modules:

  Module                    Methods   Generation
  ------------------------  --------  ----------
  furiosa.runtime           Async     Current
  furiosa.runtime.sync      Sync      Current
  furiosa.runtime.session   Sync      Legacy

The current generation API was first introduced in FuriosaRT 0.10.0. The legacy generation API (furiosa.runtime.session) is still supported for backward compatibility but is slated for removal in a future release. See Use of legacy modules for more information.

Each module further contains two different sets of interfaces:

Runners

Runners provide a single class with a run method.

Multiple run calls can be active at any time, either from multiple tasks (for async modules) or from multiple threads (for sync modules).

Queues

Queues provide two separate classes with send and recv methods respectively. Each send should be paired with a context value to distinguish different recv outputs.

While multiple inputs can be sent at any time, only a single task or thread should call recv.

Use of legacy modules

Deprecated since version 0.10.0: Any new use is strongly discouraged.

The package contains many historical modules, including furiosa.runtime.session and furiosa.runtime.errors (see Legacy Supports for the full list). As of 0.10.0 they are deprecated and will report deprecation warnings.

Legacy modules are by default largely wrappers around the current APIs; as such there are some slight incompatibilities, most notably the lack of error subclasses. These incompatibilities are marked as legacy only or current only.

If this is not desirable, you may enable the [legacy] extra on install to force the old implementation and disable compatibility warnings. Note that you cannot use current-only APIs when the extra is enabled. For example:

  Availability      Example                              Default          With [legacy]
  ----------------  -----------------------------------  ---------------  -------------
  Current only      furiosa.runtime.Runtime              Available        Unavailable
  Current & Legacy  furiosa.runtime.full_version         Available        Available
  Legacy            furiosa.runtime.session.Session      ⚠️ Available      Available
  Legacy only       furiosa.runtime.errors.NativeError   Unavailable      Available

Model Inputs

class furiosa.runtime.ModelSource

Represents a value that specifies how to load the model. This is not a real class, but a type alias for either:

  1. Any Path-like type or a string, representing a path to the model file.

  2. Bytes, a byte array, or any class with a __bytes__ method, representing raw model data.

Future versions may allow additional types.

New in version 0.10.

See Compiler for supported model formats and restrictions.
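Case 2 above means that a user-defined wrapper can also serve as a ModelSource, as long as it implements __bytes__. A minimal sketch (CompiledModel is an illustrative name, not part of the furiosa.runtime API):

```python
class CompiledModel:
    """Hypothetical wrapper whose __bytes__ yields raw model data."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def __bytes__(self) -> bytes:
        # The runtime can extract the raw model data via bytes(source).
        return self._raw


source = CompiledModel(b"\x00\x01\x02")
assert bytes(source) == b"\x00\x01\x02"
```

A plain path string or bytes object works the same way without any wrapper.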

Tensor Inputs and Outputs

Tensors use the numpy.ndarray type from NumPy as the primary representation, but the following type aliases exist for documentation and typing purposes.

class furiosa.runtime.TensorArray

A type alias for a list of input tensors. This is not a real class, but can be either:

  • Any iterable of Tensors (but the iterable itself shouldn’t be an ndarray)

  • A single Tensor, if only one tensor is required

The corresponding output type is always a list of ndarrays.

New in version 0.10.
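The two accepted forms can be illustrated with plain NumPy (the runner call is shown only as a comment, since actually running it needs an NPU):

```python
import numpy as np

# A single Tensor, usable directly when the model takes one input:
single = np.zeros((1, 3, 224, 224), dtype=np.float32)

# The general form: an iterable (here a list) of Tensors.
pair = [single, np.zeros((1, 10), dtype=np.float32)]

# outputs = await runner.run(single)  # single-tensor shortcut
# outputs = await runner.run(pair)    # general form

# The iterable itself must not be an ndarray: a 2-D array would be
# ambiguous between "one 2-D tensor" and "a sequence of 1-D tensors".
assert isinstance(pair, list) and not isinstance(pair, np.ndarray)
```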

class furiosa.runtime.Tensor

A type alias for a single input tensor. This is not a real class; numpy.ndarray is the primary representation, as noted above.

The corresponding output type is always a single ndarray.

New in version 0.10.

Legacy Interface

Legacy interfaces support the same input and output types, but for technical reasons the following concrete types are used for tensor outputs.

class furiosa.runtime.tensor.TensorArray

A subclass of list in which all items are Tensors; additional methods are provided.

Legacy only: This class was previously not a subclass of list and supported only len(array), array[i] and array[i] = tensor.

is_empty()

True if the array is empty.

Return type:

bool

view()

Returns a list of ndarray views to all tensors, as in Tensor.view.

Return type:

list[numpy.ndarray]

numpy()

Returns a list of copies of ndarray from all tensors, as in Tensor.numpy.

Return type:

list[numpy.ndarray]

class furiosa.runtime.tensor.Tensor

A subclass of numpy.ndarray providing additional methods.

Legacy only: This class was previously not a subclass of numpy.ndarray, and no methods other than those documented here were available.

property shape: tuple[int, ...]

Same as tensor.view().shape.

property numpy_dtype: type

Same as tensor.view().dtype.type. Returns a Numpy type object like numpy.int8.

Deprecated since version 0.10: Contrary to its name, it didn’t return numpy.dtype which was misleading and thus deprecated. Use tensor.view().dtype instead.

copy_from(data)

Replaces the entire tensor with given data.

Parameters:

data (numpy.ndarray or numpy.generic) – Data to replace this tensor with

view()

Returns an ndarray view of this tensor. Multiple views of the same tensor refer to the same memory region.

Return type:

numpy.ndarray

numpy()

Returns a copy of this tensor as an ndarray. Multiple copies of the same tensor are independent of each other and of the original tensor.

Return type:

numpy.ndarray
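The view()/numpy() distinction follows ordinary NumPy view-versus-copy semantics, which can be demonstrated without the runtime at all:

```python
import numpy as np

base = np.arange(4, dtype=np.int8)  # stands in for the underlying tensor
shared = base.view()   # like Tensor.view(): shares the same memory region
owned = base.copy()    # like Tensor.numpy(): an independent copy

base[0] = 42
assert shared[0] == 42  # the view observes the write
assert owned[0] == 0    # the copy does not
```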

Runtime Object

Runtime objects are associated with a set of NPU devices and can be used to create inference sessions.

New in version 0.10: Previously, sessions could only be created directly via the create and create_async functions.

class furiosa.runtime.Runtime(device=None)

Asynchronous runtime object.

Parameters:

device (str or None) – A textual device specification; see Device Specification below for the format and defaults

This type is mainly used as a context manager:

async with Runtime() as runtime:
    async with runtime.create_runner("path/to/model.onnx") as runner:
        outputs = await runner.run(inputs)

The runtime acts as a scope for all subsequent inference sessions. Its lifetime starts when the object is successfully created, and ends when the close method is called for the first time. All other methods will fail when the runtime has been closed.

async close()

Tries to close the runtime if not yet closed. Waits until the runtime is indeed closed, or until closing takes too long (in which case the runtime may still be open).

The context manager internally calls this method at the end, and warns when the timeout has been reached.

Returns:

True if the runtime has been closed in time.

Most other methods are documented in the Runner API and Queue API sections.

class furiosa.runtime.sync.Runtime(device=None)

Same as furiosa.runtime.Runtime, but all async methods are made synchronous.

with Runtime() as runtime:
    with runtime.create_runner("path/to/model.onnx") as runner:
        outputs = runner.run(inputs)

Device Specification

The runtime identifies a set of NPU devices by a textual string.

Implicit, any available device

ARCH(X)*Y denotes any set of available devices where:

  • ARCH is a target architecture and currently should be warboy.

  • X is the number of PEs to be used per each device.

  • Y is the number of devices to be used.

(1) can be omitted, so for example warboy*1 is a valid specification, equivalent to warboy(1)*1.

Device and PE index pair

npu:X:Y denotes a PE number Y in the device number X, where X and Y are 0-based indices.

npu:X:Y-Z denotes a fused PE made of the two PEs npu:X:Y and npu:X:Z. Intermediate tensors may occupy multiple PEs' worth of memory in this mode.

Note

Device and PE indices are determined by the kernel driver and should not be heavily relied upon, especially when there are multiple devices.

Raw device name

npuXpeY and npuXpeY-Z are the same as npu:X:Y and npu:X:Y-Z respectively.

These names are identical to the raw device file names in /dev.

Deprecated since version 0.10.0.

Multiple devices

The aforementioned specifications can be combined with ,. For example npu:1:0,npu:1:1 is a valid specification.

Warning

Implicit specifications allocate devices in a greedy manner, so the runtime may fail to initialize even when valid allocations exist. Mixed specifications like warboy*1,npu:1:0 are not recommended for this reason.
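The implicit ARCH(X)*Y form described above is just a formatted string; a tiny hypothetical helper (not part of the API) makes the pieces explicit:

```python
def device_spec(arch: str = "warboy", pes: int = 2, count: int = 1) -> str:
    """Format an implicit ARCH(X)*Y device specification string.

    Illustrative helper only; the runtime simply accepts the string.
    """
    return f"{arch}({pes})*{count}"


assert device_spec() == "warboy(2)*1"       # the documented default
assert device_spec(pes=1) == "warboy(1)*1"  # same meaning as "warboy*1"
```

The resulting string would be passed as the device argument of Runtime or create_runner.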

Environment variables

The following environment variables are used for the default device specification when no explicit specification is given.

FURIOSA_DEVICES

New in version 0.10.

Takes precedence over NPU_DEVNAME if given.

This environment variable is current only.

NPU_DEVNAME

Deprecated since version 0.10: Use FURIOSA_DEVICES instead.

Note

Environment variables can never override the explicit specification.

Default devices

If there is no device specification and no relevant environment variable, a single Warboy device with 2 fused PEs is assumed, and the default specification warboy(2)*1 is used.
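A sketch of how an application might seed the default before the runtime is created; explicit device= arguments always override these variables:

```python
import os

# Seed the default device specification only if the user hasn't set one.
# FURIOSA_DEVICES is the current variable; NPU_DEVNAME is its deprecated
# predecessor and is ignored whenever FURIOSA_DEVICES is present.
os.environ.setdefault("FURIOSA_DEVICES", "npu:0:0-1")

print(os.environ["FURIOSA_DEVICES"])
```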

Model Metadata

Changed in version 0.10: These types were previously provided by the furiosa.runtime.model module.

class furiosa.runtime.Axis

Represents an inferred axis of tensors.

This type is used only for compiler diagnostics and doesn’t affect the execution.

class property WIDTH: Axis
class property HEIGHT: Axis
class property CHANNEL: Axis
class property BATCH: Axis
class property UNKNOWN: Axis

Constants for well-known axes:

  Property  Abbreviation  Description
  --------  ------------  ------------------------------------------
  WIDTH     W             Width
  HEIGHT    H             Height
  CHANNEL   C             Depth, or input channel for convolutions
  BATCH     N             Batch, or output channel for convolutions
  UNKNOWN   ?             Other axes the compiler has failed to infer

This enumeration also contains some private axes internally used by the compiler. Their names and meanings are not stable and can change at any time.

class furiosa.runtime.DataType(v)

Represents a data type for each item in the tensor.

Parameters:

v (dtype or ndarray) – A value to determine its data type

The constructor can also be used to determine an NPU-supported data type from other objects:

import numpy as np
DataType(np.float32)                       # => DataType.FLOAT32
DataType(np.zeros((3, 3), dtype='int8'))   # => DataType.INT8
class property FLOAT16: DataType
class property FLOAT32: DataType
class property BFLOAT16: DataType
class property INT8: DataType
class property INT16: DataType
class property INT32: DataType
class property INT64: DataType
class property UINT8: DataType

Constants for supported types:

  Property  Description
  --------  --------------------------------------------------------
  FLOAT16   IEEE 754 half-precision (binary16) floating point type
  FLOAT32   IEEE 754 single-precision (binary32) floating point type
  BFLOAT16  Bfloat16 floating point type
  INT8      8-bit signed integer type
  INT16     16-bit signed integer type
  INT32     32-bit signed integer type
  INT64     64-bit signed integer type
  UINT8     8-bit unsigned integer type

This enumeration also contains some private types internally used by the compiler. Their representations are not stable and can change at any time.

property numpy: numpy.dtype

Returns a corresponding Numpy data type object.

Raises:

ValueError – if this type has no Numpy equivalent

property numpy_dtype: type

Returns a Numpy type object corresponding to this data type, like numpy.int8.

Raises:

ValueError – if this type has no Numpy equivalent

Deprecated since version 0.10: Contrary to its name, it didn’t return numpy.dtype which was misleading and thus deprecated. Use the numpy property instead.
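The distinction behind this deprecation can be seen in plain NumPy: np.dtype('int8') is a dtype object, while its .type attribute is the scalar type np.int8. The numpy property returns the former; the deprecated numpy_dtype returned the latter, despite its name suggesting a dtype:

```python
import numpy as np

dt = np.dtype('int8')          # a dtype object (what `numpy` returns)
assert isinstance(dt, np.dtype)
assert dt.type is np.int8      # the scalar type (what `numpy_dtype` returned)
assert not isinstance(np.int8, np.dtype)  # np.int8 is a type, not a dtype
```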

class furiosa.runtime.TensorDesc

Describes a single tensor in the input or output.

property name: str or None

A tensor name if any. This is only available for some model formats.

property ndim: int

The number of dimensions.

dim(idx)

The size of idx-th dimension.

Parameters:

idx (int) – 0-based dimension index

Return type:

int

property shape: tuple[int, ...]

The shape of given tensor.

desc.shape is conceptually same as (desc.dim(0), ..., desc.dim(desc.ndim - 1)).

axis(idx)

The compiler-inferred axis for idx-th dimension. Defaults to Axis.UNKNOWN.

Parameters:

idx (int) – 0-based dimension index

Return type:

Axis

property size: int

The byte size of given tensor.

stride(idx)

The stride of the idx-th dimension, in items. It is the distance between two adjacent elements along the given dimension; this convention notably differs from ndarray.strides, which is in bytes.

Parameters:

idx (int) – 0-based dimension index

Return type:

int
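The items-versus-bytes difference can be checked with plain NumPy; for a C-contiguous (2, 3) int32 array, the byte strides divided by the item size give the item strides that TensorDesc.stride uses:

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.int32)  # 4-byte items, C-contiguous

assert a.strides == (12, 4)           # ndarray convention: bytes
in_items = tuple(s // a.itemsize for s in a.strides)
assert in_items == (3, 1)             # TensorDesc.stride convention: items
```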

property length: int

The number of total elements in given tensor.

property format: str

A concatenation of abbreviated axes in given tensor, e.g. NCHW.

property dtype: DataType

A data type for each element.

property numpy_dtype: numpy.dtype

A Numpy data type for each element. Same as desc.dtype.numpy.

class furiosa.runtime.Model

Describes a single model with possibly multiple input and output tensors.

property input_num: int

The number of input tensors.

property output_num: int

The number of output tensors.

input(idx)

A description for idx-th input tensor.

Parameters:

idx (int) – 0-based tensor index

Return type:

TensorDesc

output(idx)

A description for idx-th output tensor.

Parameters:

idx (int) – 0-based tensor index

Return type:

TensorDesc

inputs()

A list of descriptions for each input tensor.

Return type:

list[TensorDesc]

outputs()

A list of descriptions for each output tensor.

Return type:

list[TensorDesc]

summary()

A human-readable summary of given model:

>>> print(model.summary())
Inputs:
{0: TensorDesc(shape=(1, 1, 28, 28), dtype=FLOAT32, format=NCHW, size=3136, len=784)}
Outputs:
{0: TensorDesc(shape=(1, 10), dtype=FLOAT32, format=??, size=40, len=10)}
Return type:

str

print_summary()

Same as print(model.summary()).

Runner API

New in version 0.10.

Runner APIs provide a simple functional interface via a single Runner class.

async furiosa.runtime.Runtime.create_runner(model, *, worker_num=None, batch_size=None)

Creates a new inference session for given model.

Parameters:
  • model (ModelSource) – Model path or data

  • worker_num (int or None) – The number of worker threads

  • batch_size (int or None) – The number of batches per each run

Return type:

Runner

async furiosa.runtime.create_runner(model, *, device=None, worker_num=None, batch_size=None)

Same as above, but the runtime is implicitly created, and is closed when the session cannot be created or gets closed.

See create_runner and Runtime for arguments.

class furiosa.runtime.Runner

An inference session returned by Runtime.create_runner or create_runner.

This type is mainly used as a context manager:

async with runtime.create_runner("path/to/model.onnx") as runner:
    outputs = await runner.run(inputs)
property model: Model

Information about the associated model:

async with runtime.create_runner("path/to/model.onnx") as runner:
    model = runner.model
    for i in range(model.input_num):
        print(f"Input tensor #{i}:", model.input(i))

When the batch size is given, the first dimension of all input and output tensors is multiplied by that batch size. This dimension generally corresponds to the BATCH axis.

async run(inputs)

Runs a single inference.

Parameters:

inputs (TensorArray) – Input tensors

Return type:

list[numpy.ndarray]

Input tensors are not copied to a new buffer. Modifications during the inference may result in an unexpected output; the runtime only ensures that such modifications do not cause a crash.

async close()

Tries to close the session if not yet closed. Waits until the session is indeed closed, or until closing takes too long (in which case the session may still be open). If the session was created via the top-level create_runner function, the implicitly initialized runtime is closed as well.

The context manager internally calls this method at the end, and warns when the timeout has been reached. It is also called at an unspecified point after the Runner becomes subject to garbage collection.

Returns:

True if the session (and the runtime if any) has been closed in time.

Synchronous versions of those interfaces are also available through furiosa.runtime.sync:

furiosa.runtime.sync.Runtime.create_runner(model, *, worker_num=None, batch_size=None)

Synchronous version of Runtime.create_runner above.

furiosa.runtime.sync.create_runner(model, *, worker_num=None, batch_size=None)

Synchronous version of create_runner above.

class furiosa.runtime.sync.Runner

Synchronous version of Runner above.

Legacy Interface

furiosa.runtime.session.create(model, *, device=None, worker_num=None, batch_size=None, compiler_hints=None)

Compiles the given model if needed, allocates an NPU device, and initializes a new inference session.

Parameters:
  • model (ModelSource) – Model path or data

  • device (str or None) – A textual device specification; see Device Specification above for the format and defaults

  • worker_num (int or None) – The number of worker threads

  • batch_size (int or None) – The number of batches per each run

  • compiler_hints (bool or None) – If True, compiler prints additional hint messages

Return type:

Session

Changed in version 0.10: All optional arguments are now keyword-only. Positional arguments are still accepted and behave identically for now, but will warn about the future incompatibility.

class furiosa.runtime.session.Session

An inference session returned by create.

This type is mainly used as a context manager:

with create("path/to/model.onnx") as session:
    outputs = session.run(inputs)

Queue API

New in version 0.10.

Queue APIs provide two objects, Submitter and Receiver, for separately handling inputs and outputs. They are so named because they represent two queues around the actual processing:

Input queue

Holds submitted input tensors until some worker is available and can process them.

Output queue

Holds output tensors that have finished processing but have not yet been read.

Both have a configurable but finite size, so submitting inputs too quickly or failing to receive outputs in time will block further processing.

async furiosa.runtime.Runtime.create_queue(model, *, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)

Creates a new inference session for given model.

Parameters:
  • model (ModelSource) – Model path or data

  • worker_num (int or None) – The number of worker threads

  • batch_size (int or None) – The number of batches per each run

  • input_queue_size (int or None) – The input queue size

  • output_queue_size (int or None) – The output queue size

Return type:

tuple[Submitter, Receiver], but see below

It is also possible to call this function directly as an asynchronous context manager:

async with runtime.create_queue("path/to/model.onnx") as (submitter, receiver):
    submit_task = asyncio.create_task(submit_with(submitter))
    recv_task = asyncio.create_task(recv_with(receiver))
    await submit_task
    await recv_task
async furiosa.runtime.create_queue(model, *, device=None, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)

Same as above, but the runtime is implicitly created, and is closed when the session cannot be created or gets closed.

See create_queue and Runtime for arguments.

class furiosa.runtime.Submitter

A submitting half of the inference session returned by Runtime.create_queue or create_queue.

This type is mainly used as a context manager:

submitter, receiver = await runtime.create_queue("path/to/model.onnx")
async with submitter:
    await submitter.submit(inputs)
async with receiver:
    _, outputs = await receiver.recv()
property model: Model

Information about the associated model:

submitter, receiver = await runtime.create_queue("path/to/model.onnx")
async with submitter:
    model = submitter.model
    for i in range(model.input_num):
        print(f"Input tensor #{i}:", model.input(i))

When the batch size is given, the first dimension of all input and output tensors is multiplied by that batch size. This dimension generally corresponds to the BATCH axis.

allocate()

Returns a list of fresh tensors suitable as input tensors. Their initial contents are unspecified (but probably zeroed).

Return type:

list[numpy.ndarray]

While this is no different from creating the tensors yourself, the runtime may allocate tensors in a device-friendly way, so using this method is recommended whenever appropriate.

async submit(inputs, context=None)

Submits a single inference with an associated value.

Parameters:
  • inputs (TensorArray) – Input tensors

  • context – Associated value, to distinguish output tensors from Receiver

The method returns immediately unless the input queue is full, in which case it blocks. Output tensors become available through Receiver later.

Input tensors are not copied to a new buffer. Modifications during the inference may result in an unexpected output; the runtime only ensures that such modifications do not cause a crash.

Warning

Associated values should be simple values like an integer or a UUID, as they are retained as long as progress can be made and can cause logical memory leaks.

async close()

Tries to close the submitter if not yet closed. Waits until the submitter is indeed closed, or it takes too much time to close.

This does not close the corresponding Receiver. Any remaining tensors in the input queue will still be processed, after which the output queue will be signaled that no more results will be returned.

The context manager internally calls this method at the end, and warns when the timeout has been reached. It is also called at an unspecified point after the Submitter becomes subject to garbage collection.

Returns:

True if the submitter has been closed in time.

class furiosa.runtime.Receiver

A receiving half of the inference session returned by Runtime.create_queue or create_queue.

This type is mainly used as a context manager:

submitter, receiver = await runtime.create_queue("path/to/model.onnx")
async with submitter:
    await submitter.submit(inputs)
async with receiver:
    _, outputs = await receiver.recv()
property model: Model

Information about the associated model:

submitter, receiver = await runtime.create_queue("path/to/model.onnx")
async with receiver:
    model = receiver.model
    for i in range(model.output_num):
        print(f"Output tensor #{i}:", model.output(i))

The same remarks as for Submitter apply when the batch size is given.

The type can be used as an asynchronous iterable, which only finishes when either the corresponding Submitter or the receiver itself has been closed:

submitter, receiver = await runtime.create_queue("path/to/model.onnx")
task = asyncio.create_task(submit_with(submitter))
async for context, outputs in receiver:
    handle_output(context, outputs)

Note that in this usage an async with block is not strictly needed, as the Submitter should already have been closed when the loop finishes.

It is also possible to manually receive results:

async recv()

Waits for a single inference result, and returns it with the associated value.

Returns:

A tuple of the associated value and output tensors, in this order

Return type:

tuple[any, TensorArray]

The runtime guarantees that each inference result is received at most once, but the completion order may differ from the submission order. Use the associated value given to Submitter.submit to recover the original order.

Multiple parallel recv calls are fine but have no additional benefit. On the other hand, if no recv calls are made for a while, the output queue eventually fills up and blocks any further processing.

Note

This method does not support timeout unlike others, because asyncio.wait_for provides an idiomatic way to do that:

try:
    context, outputs = await asyncio.wait_for(receiver.recv(), timeout=10)
except asyncio.TimeoutError:  # Not the built-in `TimeoutError`!
    print('Timed out!')
async close()

Tries to close the receiver if not yet closed. This also notifies the corresponding Submitter and will block further submissions. Waits until the receiver is indeed closed, or it takes too much time to close.

If the receiver was created via the top-level create_queue function, the implicitly initialized runtime is also closed as well.

The context manager internally calls this method at the end, and warns when the timeout has been reached. It is also called at an unspecified point after the Receiver becomes subject to garbage collection.

Returns:

True if the receiver (and the runtime if any) has been closed in time.

Synchronous versions of those interfaces are also available through furiosa.runtime.sync:

furiosa.runtime.sync.Runtime.create_queue(model, *, device=None, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)

Synchronous version of Runtime.create_queue above.

furiosa.runtime.sync.create_queue(model, *, device=None, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)

Synchronous version of create_queue above.

class furiosa.runtime.sync.Submitter

Synchronous version of Submitter above.

class furiosa.runtime.sync.Receiver

Synchronous version of Receiver above, with the following exception:

recv(timeout=None)

Unlike the asynchronous version, this method accepts a timeout, because it is otherwise impossible to specify one in a synchronous context.

Parameters:

timeout (float or None) – The timeout in seconds

Raises:

TimeoutError – When timeout is given and the timeout has been reached

Note

Unlike CompletionQueue.recv, this method raises a standard Python exception.

Legacy Interface

furiosa.runtime.session.create_async(model, *, device=None, worker_num=None, batch_size=None, compiler_hints=None, input_queue_size=None, output_queue_size=None)

Compiles the given model if needed, allocates an NPU device, and initializes a new inference session.

Parameters:
  • model (ModelSource) – Model path or data

  • device (str or None) – A textual device specification; see Device Specification above for the format and defaults

  • worker_num (int or None) – The number of worker threads

  • batch_size (int or None) – The number of batches per each run

  • compiler_hints (bool or None) – If True, compiler prints additional hint messages

  • input_queue_size (int or None) – The input queue size

  • output_queue_size (int or None) – The output queue size

Return type:

tuple[AsyncSession, CompletionQueue]

Changed in version 0.10: All optional arguments are now keyword-only. Positional arguments are still accepted and behave identically for now, but will warn about the future incompatibility.

Legacy only: The input and output queues were unbounded when their sizes were not given. This was never documented, and the current API doesn't support unbounded queues. To facilitate migration, however, the legacy API continues to use much larger default queue sizes.

class furiosa.runtime.session.AsyncSession

A submitting half of the inference session returned by create_async.

This type is mainly used as a context manager:

session, queue = create_async("path/to/model.onnx")
with session:
    session.send(inputs)
with queue:
    _, outputs = queue.recv()
class furiosa.runtime.session.CompletionQueue

A receiving half of the inference session returned by create_async.

This type is mainly used as a context manager:

session, queue = create_async("path/to/model.onnx")
with session:
    session.send(inputs)
with queue:
    _, outputs = queue.recv()

Profiler

The furiosa.runtime.profiler module provides a basic profiling facility.

class furiosa.runtime.profiler.RecordFormat

Profiler format to record profile data.

class property ChromeTrace: RecordFormat
class property PandasDataFrame: RecordFormat
class furiosa.runtime.profiler.Resource

Profiler target resource to be recorded.

class property All: Resource
class property Cpu: Resource
class property Npu: Resource
class furiosa.runtime.profiler.profile(resource=Resource.All, format=RecordFormat.ChromeTrace, file=None)

Profile context manager:

from furiosa.runtime.profiler import RecordFormat, profile

with open("profile.json", "w") as f:
    with profile(format=RecordFormat.ChromeTrace, file=f) as profiler:
        # Profiler enabled from here
        with profiler.record("Inference"):
            ... # Traces recorded with a span named 'Inference'

Please note that activating the profiler incurs non-trivial performance overhead.

Parameters:
  • resource (Resource) – Target resource to be profiled. e.g. Cpu or Npu.

  • format (RecordFormat) – Profiler format. e.g. ChromeTrace.

file – File that traces will be written to. If no file is given, a temporary file is used. The format of the traces written to the file depends on the record format: Chrome tracing for ChromeTrace and CSV for PandasDataFrame.

Raises:

ProfilerError – Raised when the profiler could not be created with the given configuration.

Changed in version 0.10.0: The record format-dependent parameter config is replaced by the parameter file.

record(name: str)

Returns a profiler record context manager.

At entry to the context, a profiler span trace with the specified name starts; it ends at exit from the context. Traces for execution within the context are recorded as children of the created span.

Parameters:

name (str) – Profiler record span name.

Return type:

ProfilerRecordObject

pause()

Pause profiling temporarily within the profiling context.

The profiler stops recording traces and runs with minimal overhead until resume is called. If the profiler is already paused, this method does nothing. It can be called an arbitrary number of times within the profiling context. Note that it doesn't affect the profiler's time measurement: it simply acts as if no events occurred during the pause, creating an empty interval.

New in version 0.10.

resume()

Resume paused profiling.

If the profiler is not in a paused state, this method does nothing. For more details on paused state, see pause.

New in version 0.10.

get_pandas_dataframe()

Returns a DataFrame for recorded traces.

The returned dataframe will look like:

                            trace_id    parent_span_id           span_id                start                  end      cat                     name  thread.id  dram_base  pe_index  execution_index  instruction_index  operator_index        dur
0    6ffe9ac3080814bc134ae4c5e58269e0  0000000000000000  a61dd01a47ce8dee  1690798389820453606  1690798390204660478  Runtime                  Compile         35       <NA>      <NA>             <NA>               <NA>            <NA>  384206872
1    079f8437488528d5768780162ed59374  0000000000000000  2d18b0e17e760325  1690798390205840825  1690798390267819096  Runtime            ProgramBinary         26       <NA>      <NA>             <NA>               <NA>            <NA>   61978271
2    fb4610c2fd1be67e63e01ca9169b6fef  0000000000000000  2a092524d04a4077  1690798390267849007  1690798390267857471  Runtime             AllocateDram         26       <NA>      <NA>             <NA>               <NA>            <NA>       8464
3    009b425f06ca0065a64f0586d1a999b0  0000000000000000  cdac229f8d8569d7  1690798389793627190  1690798390268011450  Runtime                 Register          1       <NA>      <NA>             <NA>               <NA>            <NA>  474384260
4    348ee82fdf97fad9f782cc12a58d447d  0000000000000000  59b5a5d06439f9f1  1690798390270474367  1690798390270526470  Runtime                  Enqueue         32       <NA>      <NA>             <NA>               <NA>            <NA>      52103
5    27efb2c82a5ac93bed911142e9187c45  174b38c90d1f7a10  ff7c4f8798d75b63  1690798390270558295  1690798390270570293  Runtime               FeedInputs         32       <NA>      <NA>             <NA>               <NA>            <NA>      11998
Return type:

pandas.DataFrame

get_pandas_dataframe_with_filter(column: str, value: str)

Returns a DataFrame containing only the rows whose column column has the value value in the DataFrame returned by get_pandas_dataframe.

Parameters:
  • column (str) – name of the column used for filtering.

  • value (str) – column value for target rows.

Return type:

pandas.DataFrame
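In spirit, this filter is ordinary pandas boolean indexing; a hedged sketch using a toy frame in place of the real trace DataFrame:

```python
import pandas as pd

# Toy stand-in for the trace DataFrame returned by get_pandas_dataframe().
df = pd.DataFrame({"cat": ["Runtime", "NPU", "NPU"],
                   "dur": [100, 200, 300]})

# get_pandas_dataframe_with_filter("cat", "NPU") behaves like:
filtered = df[df["cat"] == "NPU"]
assert filtered["dur"].tolist() == [200, 300]
```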

get_cpu_pandas_dataframe()

Returns a DataFrame for execution on CPU whose category is “Runtime”.

Return type:

pandas.DataFrame

get_npu_pandas_dataframe()

Returns a DataFrame containing only the rows executed on the NPU, i.e. rows whose category is “NPU”.

Return type:

pandas.DataFrame

print_npu_operators()

Prints an NPU operator report to the terminal.

Each row contains aggregate timing information for a different NPU operator.

The output will look like:

┌─────────────────────────┬──────────────────────┬───────┐
│ name                     average_elapsed (ns)  count │
╞═════════════════════════╪══════════════════════╪═══════╡
│ LowLevelConv2d           5119.9375             64    │
│ LowLevelDepthwiseConv2d  1091.0                56    │
│ LowLevelPad              561.482143            56    │
│ LowLevelExpand           2.0                   16    │
│ LowLevelSlice            2.0                   16    │
│ LowLevelReshape          2.0                   232   │
└─────────────────────────┴──────────────────────┴───────┘

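The per-operator figures above can be reproduced from the raw trace with an ordinary pandas group-by; a minimal sketch with hypothetical data (the real rows would come from get_npu_pandas_dataframe):

```python
import pandas as pd

# Hypothetical NPU trace rows; real data comes from get_npu_pandas_dataframe().
df = pd.DataFrame({
    "name": ["LowLevelConv2d", "LowLevelConv2d", "LowLevelPad"],
    "dur":  [5000, 5200, 561],
})

# average_elapsed and count per operator, as in the report above.
report = (
    df.groupby("name")["dur"]
      .agg(average_elapsed="mean", count="count")
      .reset_index()
)
```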
print_npu_executions()

Prints an NPU execution report to the terminal. Each row contains information for a single NPU task execution.

The output will look like:

┌──────────────────────────────────┬──────────────────┬──────────┬─────────────────┬───────────┬─────────┬────────────┐
│ trace_id                          span_id           pe_index  execution_index  NPU Total  NPU Run  NPU IoWait │
╞══════════════════════════════════╪══════════════════╪══════════╪═════════════════╪═══════════╪═════════╪════════════╡
│ 39ffc55ef7b21775338e9fa2d1fb70f1  555899badb3f8e58  0         0                116971     105186   11785      │
│ 9c8aa64bbb878e3b62194f8dec670e3c  4e9a13e698f4fa19  0         0                117011     105186   11825      │
│ 0ce2a8ce2c591e34e92e0c421f394614  5cd8a081758f41c4  0         0                116961     105185   11776      │
│ a941ace17a2c5e615a8f05d8872fa932  a3726d0ebb2705cd  0         0                116909     105186   11723      │
└──────────────────────────────────┴──────────────────┴──────────┴─────────────────┴───────────┴─────────┴────────────┘

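In the sample above, NPU Total is the sum of NPU Run and NPU IoWait for each execution, which can be checked directly:

```python
# Rows taken from the sample output above: (NPU Total, NPU Run, NPU IoWait).
rows = [
    (116971, 105186, 11785),
    (117011, 105186, 11825),
    (116961, 105185, 11776),
    (116909, 105186, 11723),
]

# Total time on the NPU splits into active run time and I/O wait time.
for total, run, iowait in rows:
    assert total == run + iowait
```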
print_external_operators()

Prints an external operator report to the terminal. Each row corresponds to an operator executed outside the NPU, such as Quantize or Dequantize.

The output will look like:

┌──────────────────────────────────┬──────────────────┬───────────┬────────────┬────────────────┬────────┐
│ trace_id                          span_id           thread.id  name        operator_index  dur    │
╞══════════════════════════════════╪══════════════════╪═══════════╪════════════╪════════════════╪════════╡
│ 7d65ff7ae5587d3345d5df5a77ebfaad  53e3fb9c02964361  35         Quantize    0               175246 │
│ 33371e09f89cfa06c41286df1311a30f  8d5a00c6e4e8c2c0  35         Lower       1               183803 │
│ 9f7df939abc20da11431c18024c41af1  064dacd9a108c4a0  35         Unlower     2               60459  │
│ 1bda703f4ffc878a4294ec62533ac8d0  cb2f103208d2fa45  35         Dequantize  3               19468  │
│ 9f769c8951f39d98e6ee216e346bc7e5  91c0bdd8c5b81327  35         Quantize    0               85724  │
│ 048e5cab6d4d676e4e6b10e8276b5489  714834cb8dc59f4b  35         Lower       1               306893 │
│ 6bb481ca3b1eab843b795a786549558b  46d538d7b4c72d25  35         Unlower     2               73313  │
│ e0f13a5fb0bf2942ed16171844ccb293  71a432e3e3dc55f6  35         Dequantize  3               37079  │
│ c3b2fdba80f16f781e4b313af3a571b6  066e3916590edf38  35         Quantize    0               67805  │
│ 4bebe5f61e84d502f5b5dc7d221e4f5a  9dfb32069b2b5a98  35         Lower       1               310303 │
│ b8cabf53ae39a4ad18144af26ce136c9  cb767fbdd718da89  35         Unlower     2               72378  │
│ e40956dda5ecc0a1774e39377b1ef245  090d9cbd5e60032a  35         Dequantize  3               33951  │
│ 3d13f40c0966940439adcce4c19981a4  4702a924e4b6d38b  35         Quantize    0               76999  │
│ 53746b998038e994a5e378f9a28caa5a  522b7a9e354de2b3  35         Lower       1               339339 │
│ 76a2080bc0917db26b7313e29a81def3  4b1b0bf55f344258  35         Unlower     2               74708  │
│ 4c0a04dc669b04416f18e781d6afc3c6  8eb55fb2b618933a  35         Dequantize  3               33661  │
└──────────────────────────────────┴──────────────────┴───────────┴────────────┴────────────────┴────────┘

print_inferences()

Prints an inference summary to the terminal. Each row corresponds to a single inference.

The output will look like:

┌──────────────────────────────────┬──────────────────┬───────────┬─────────┐
│ trace_id                          span_id           thread.id  dur     │
╞══════════════════════════════════╪══════════════════╪═══════════╪═════════╡
│ b5edc4d40493df2028d186d4073d5487  a61af3b9ad70b956  1          4430749 │
│ 983e136f80e1c070dca3ad854f37cf97  f2dd4e899d52531d  1          4181392 │
│ dada8a5830272b5d255fda801568fc5e  cda7127619be5c33  1          4275757 │
│ 6ad054709f76095c86fba6dcd9254ca0  9d7f199a445003aa  1          4215571 │
└──────────────────────────────────┴──────────────────┴───────────┴─────────┘

print_summary()

Prints an overall latency summary to the terminal.

The output will look like:

================================================
  Inference Results Summary
================================================
Inference counts                : 4
Min latency (ns)                : 4181392
Max latency (ns)                : 4430749
Mean latency (ns)               : 4275867
Median latency (ns)             : 4245664
90.0 percentile Latency (ns)    : 4384251
95.0 percentile Latency (ns)    : 4407500
97.0 percentile Latency (ns)    : 4416800
99.0 percentile Latency (ns)    : 4426099
99.9 percentile Latency (ns)    : 4430284
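The summary statistics can be reproduced from the four inference durations shown earlier; numpy's linear-interpolation percentiles, rounded to whole nanoseconds, match the report above:

```python
import numpy as np

# Inference durations (ns) from the sample inference report above.
lat = [4430749, 4181392, 4275757, 4215571]

mean   = round(float(np.mean(lat)))              # 4275867
median = round(float(np.percentile(lat, 50)))    # 4245664
p90    = round(float(np.percentile(lat, 90)))    # 4384251
p97    = round(float(np.percentile(lat, 97)))    # 4416800
```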
export_chrome_trace(filename: str)

Exports profiled data to a Chrome tracing file, which can be loaded in the chrome://tracing viewer.

Parameters:

filename (str) – filename to write.
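Chrome tracing files use the Trace Event format, a JSON document with a traceEvents array; a minimal hand-written example of that format (the event values below are illustrative, not produced by this package):

```python
import json

# A minimal Chrome Trace Event document: one complete ("X") event with
# a start timestamp (ts) and a duration (dur), both in microseconds.
trace = {
    "traceEvents": [
        {"name": "Infer", "cat": "Runtime", "ph": "X",
         "ts": 0, "dur": 4430, "pid": 1, "tid": 1},
    ],
}

# export_chrome_trace writes a document of this shape to `filename`.
serialized = json.dumps(trace)
```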

Diagnostics

furiosa.runtime.full_version()

Returns a string for the full version of this package.

exception furiosa.runtime.FuriosaRuntimeError

A base class for all runtime exceptions.

Changed in version 0.10: Previously this was named NativeException and was a subclass of FuriosaError. A large number of subclasses were also removed to make room for an upcoming restructuring.

exception furiosa.runtime.FuriosaRuntimeWarning

A base class for all runtime warnings.

New in version 0.10.

This package currently doesn’t have a dedicated logging interface, but the following environment variable can be used to set the basic logging level:

FURIOSA_LOG_LEVEL

If set, it should be one of the following, in order of decreasing verbosity:

  • INFO (default)

  • WARN

  • ERROR

  • OFF
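Since the variable is typically read when the runtime initializes, it should be set before the package is imported; a sketch that silences everything below errors:

```python
import os

# Set before importing furiosa.runtime, so the logger picks up
# the level at initialization time.
os.environ["FURIOSA_LOG_LEVEL"] = "ERROR"

# import furiosa.runtime  # imported afterwards
```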

Legacy Supports

The following submodules exist only to support legacy code.

Deprecated since version 0.10: All legacy submodules will warn on import, and any major incompatibilities will be reported. Also, unless otherwise stated, all functions and types described here are either not available or will always fail. The [legacy] extra may be used to disable these behaviors at the expense of losing the current APIs.

furiosa.runtime.compiler

furiosa.runtime.compiler.generate_compiler_log_path()

Generates a file path for the compilation log.

Return type:

Path

This function is legacy only.

furiosa.runtime.consts

This module is empty.

furiosa.runtime.envs

furiosa.runtime.envs.current_npu_device()

Returns the current NPU device name.

Returns:

NPU device name

Return type:

str

This function is legacy only.

furiosa.runtime.envs.is_compile_log_enabled()

Returns whether the compile log is enabled.

Returns:

True if the compile log is enabled, or False.

Return type:

bool

This function is legacy only.

furiosa.runtime.envs.log_dir()

Returns FURIOSA_LOG_DIR where the logs are stored.

Returns:

The log directory of Furiosa SDK

Return type:

str

This function is legacy only.

furiosa.runtime.envs.profiler_output()

Returns FURIOSA_PROFILER_OUTPUT_PATH, the path where profiler outputs are written.

For compatibility, NUX_PROFILER_PATH is also currently supported, but it will eventually be deprecated in favor of FURIOSA_PROFILER_OUTPUT_PATH.

Returns:

The file path of profiler output if specified, or None.

Return type:

str or None

This function is legacy only.

furiosa.runtime.errors

exception furiosa.runtime.errors.IncompatibleModel
exception furiosa.runtime.errors.CompilationFailed
exception furiosa.runtime.errors.InternalError
exception furiosa.runtime.errors.UnsupportedTensorType
exception furiosa.runtime.errors.UnsupportedDataType
exception furiosa.runtime.errors.IncompatibleApiClientError
exception furiosa.runtime.errors.InvalidYamlException
exception furiosa.runtime.errors.ApiClientInitFailed
exception furiosa.runtime.errors.NoApiKeyException
exception furiosa.runtime.errors.InvalidSessionOption
exception furiosa.runtime.errors.QueueWaitTimeout
exception furiosa.runtime.errors.SessionTerminated
exception furiosa.runtime.errors.DeviceBusy
exception furiosa.runtime.errors.InvalidInput
exception furiosa.runtime.errors.TensorNameNotFound
exception furiosa.runtime.errors.UnsupportedFeature
exception furiosa.runtime.errors.InvalidCompilerConfig
exception furiosa.runtime.errors.SessionClosed

Specific subclasses of FuriosaRuntimeError.

They are legacy only and have been mostly replaced with standard Python exceptions like TypeError or ValueError, except for the following:

This module also reexports FuriosaRuntimeError as NativeException.
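Code migrating off the legacy subclasses can usually catch the standard exceptions that replaced them (or the base FuriosaRuntimeError) instead; a hypothetical sketch with a stand-in runner:

```python
class FakeRunner:
    # Stand-in for a real runner; raises standard exceptions for invalid
    # inputs, as the current API does in place of the legacy subclasses.
    def run(self, inputs):
        if inputs is None:
            raise ValueError("invalid input")
        return inputs

def run_safely(runner, inputs):
    try:
        return runner.run(inputs)
    except (TypeError, ValueError):  # replaced most legacy error subclasses
        return None
```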

furiosa.runtime.model

This module reexports Model.

furiosa.runtime.session

This module reexports Session, AsyncSession, CompletionQueue, create and create_async.

furiosa.runtime.tensor

furiosa.runtime.tensor.numpy_dtype(value)

Returns the numpy dtype corresponding to any eligible object.

furiosa.runtime.tensor.zeros(desc)

Returns a zero-filled tensor matching the given tensor description.

Parameters:

desc (TensorDesc) – Tensor description

furiosa.runtime.tensor.rand(desc)

Returns a random tensor matching the given tensor description.

Parameters:

desc (TensorDesc) – Tensor description

This is meant to be a quick testing aid; no guarantees are made about quality, performance, or correctness.

Legacy only: The function was only correctly defined for floating point types.
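Both helpers resemble the corresponding numpy constructors; a sketch with hypothetical shape and dtype standing in for what a TensorDesc would describe:

```python
import numpy as np

# Hypothetical stand-ins for the shape and dtype a TensorDesc would report.
shape, dtype = (1, 3, 224, 224), np.float32

z = np.zeros(shape, dtype=dtype)          # like tensor.zeros(desc)
r = np.random.rand(*shape).astype(dtype)  # like tensor.rand(desc), floats only
```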

This module also contains Tensor and TensorArray which are described separately.

This module also reexports Axis, DataType and TensorDesc.