furiosa.runtime package
This package provides high-level Python APIs for Furiosa AI NPUs.
Runtime Variants
The package is divided into three main modules:
Module | Methods | Generation
---|---|---
furiosa.runtime | Async | Current
furiosa.runtime.sync | Sync | Current
furiosa.runtime.session | Sync | Legacy
The current generation API was first introduced in FuriosaRT 0.10.0.
The legacy generation API (furiosa.runtime.session) is still supported for backward compatibility,
but is slated for removal in a future release.
See Use of legacy modules for more information.
Each module further contains two different sets of interfaces:
- Runners
Runners provide a single class with a run method. Multiple run calls can be active at any time, made possible by multiple tasks (for async modules) or threads (for sync modules).
- Queues
Queues provide two separate classes with send and recv methods respectively. Each send should be paired with a context value to distinguish different recv outputs. While multiple inputs can be sent at any time, only a single task or thread should call recv.
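The send/recv pairing can be illustrated with plain asyncio primitives. This is a simulation only: fake_infer stands in for the NPU, and nothing here is a Furiosa API.

```python
import asyncio

async def fake_infer(x):
    # Stand-in for NPU work: just double the input.
    await asyncio.sleep(0)
    return x * 2

async def main():
    queue = asyncio.Queue()

    async def send(ctx, x):
        # Queue-style `send`: pair each input with a context value so the
        # consumer can tell the outputs apart.
        await queue.put((ctx, await fake_infer(x)))

    # Many producers may submit concurrently...
    await asyncio.gather(send("a", 1), send("b", 2))

    # ...but only a single consumer should call `recv`.
    results = {}
    for _ in range(2):
        ctx, out = await queue.get()
        results[ctx] = out
    return results

print(asyncio.run(main()))
```

The context value is what lets results be re-associated with their inputs even when completion order differs from submission order.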
Use of legacy modules
Deprecated since version 0.10.0: Any new use is strongly discouraged.
The package contains many historical modules,
including furiosa.runtime.session
and furiosa.runtime.errors
(see Legacy Supports for the full list).
As of 0.10.0 they are deprecated and will report deprecation warnings.
Legacy modules are largely thin wrappers around the current APIs by default, so there exist some slight incompatibilities, most notably the lack of error subclasses. Those incompatibilities are marked as legacy only or current only.
If this is not desirable, you may enable the [legacy] extra on install
to force the old implementation and disable compatibility warnings.
Please note that you cannot use current-only APIs when the extra is enabled.
For example:
Availability | Example | Default | With [legacy]
---|---|---|---
Current only | | ✔ | ✘
Current & Legacy | | ✔ | ✔
Legacy | | ⚠️ | ✔
Legacy only | | ✘ | ✔
Model Inputs
- class furiosa.runtime.ModelSource
Represents a value that specifies how to load the model. This is not a real class, but a type alias for either:
- Any Path-like type or a string, representing a path to the model file.
- Bytes, a byte array, or any class with a __bytes__ method, representing raw model data.
Future versions may allow additional types.
New in version 0.10.
See Compiler for supported model formats and restrictions.
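For illustration, here are values that fit the alias. The model path is hypothetical and no runner is actually created here:

```python
from pathlib import Path

# A ModelSource may be a path (Path-like or plain string)...
path_source = Path("path/to/model.onnx")   # Path-like
str_source = "path/to/model.onnx"          # plain string path

# ...or raw model data: bytes, a bytearray, or any class with __bytes__.
class InMemoryModel:
    """Any class with a __bytes__ method qualifies as raw model data."""
    def __init__(self, data: bytes):
        self.data = data
    def __bytes__(self) -> bytes:
        return self.data

raw_source = bytes(InMemoryModel(b"\x00\x01\x02"))
assert raw_source == b"\x00\x01\x02"
```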
Tensor Inputs and Outputs
Tensors use the numpy.ndarray type from Numpy as the primary representation,
but the following type aliases exist for documentation and typing purposes.
- class furiosa.runtime.TensorArray
A type alias for a list of input tensors. This is not a real class, but can be either:
- Any iterable of Tensors (but the iterable itself shouldn't be an ndarray)
- A single Tensor, if only one tensor is required
The corresponding output type is always a list of ndarrays.
New in version 0.10.
- class furiosa.runtime.Tensor
A type alias for a single input tensor. This is not a real class, but can be either:
- A single numpy.ndarray
- A single scalar value or numpy.generic, if the tensor is zero-dimensional
- Any other value that implements the NumPy array interface
The corresponding output type is always a single ndarray.
New in version 0.10.
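For illustration with plain NumPy (the shapes below are arbitrary examples, not from any real model):

```python
import numpy as np

# A single Tensor may be an ndarray, a scalar for 0-d tensors,
# or any object implementing the NumPy array interface.
image = np.zeros((1, 3, 224, 224), dtype=np.float32)

# A TensorArray can be a list of Tensors for multi-input models...
inputs = [image, np.zeros((1, 10), dtype=np.int8)]

# ...or just the single Tensor when the model takes one input.
single_input = image

# Outputs, by contrast, always come back as a list of ndarrays,
# so downstream code can iterate over them uniformly.
assert all(isinstance(t, np.ndarray) for t in inputs)
```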
Legacy Interface
Legacy interfaces support the same input and output types, but for technical reasons the following concrete types are used for tensor outputs.
- class furiosa.runtime.tensor.TensorArray
A subclass of list whose items should all be Tensor; additional methods are provided.
Legacy only: This class used to not be a subclass of list, and supported only len(array), array[i] and array[i] = tensor.
- view()
Returns a list of ndarray views to all tensors, as in Tensor.view.
- Return type:
- numpy()
Returns a list of copies of ndarray from all tensors, as in Tensor.numpy.
- Return type:
- class furiosa.runtime.tensor.Tensor
A subclass of numpy.ndarray providing additional methods.
Legacy only: This class used to not be a subclass of numpy.ndarray, and no methods were available except those documented here.
- property numpy_dtype: type
Same as tensor.view().dtype.type. Returns a Numpy type object like numpy.int8.
Deprecated since version 0.10: Contrary to its name, it didn't return numpy.dtype, which was misleading; it is thus deprecated. Use tensor.view().dtype instead.
- copy_from(data)
Replaces the entire tensor with the given data.
- Parameters:
data (numpy.ndarray or numpy.generic) – What to replace this tensor with
- view()
Returns an ndarray view to this tensor. Multiple views to the same tensor refer to the same memory region.
- Return type:
Runtime Object
Runtime objects are associated with a set of NPU devices and can be used to create inference sessions.
New in version 0.10: Previously sessions could only be created directly via the create and create_async functions.
- class furiosa.runtime.Runtime(device=None)
Asynchronous runtime object.
- Parameters:
device (str or None) – A textual device specification, see the section for defaults
This type is mainly used as a context manager:

    async with Runtime() as runtime:
        async with runtime.create_runner("path/to/model.onnx") as runner:
            outputs = await runner.run(inputs)

The runtime acts as a scope for all subsequent inference sessions. Its lifetime starts when the object is successfully created, and ends when the close method is called for the first time. All other methods will fail once the runtime has been closed.
- async close()
Tries to close the runtime if not yet closed. Waits until the runtime is indeed closed, or until it takes too much time to close (in which case the runtime may still be open).
The context manager internally calls this method at the end, and warns when the timeout has been reached.
- Returns:
True if the runtime has been closed in time.
Most other methods are documented in the Runner API and Queue API sections.
- class furiosa.runtime.sync.Runtime(device=None)
Same as furiosa.runtime.Runtime, but all async methods are made synchronous.

    with Runtime() as runtime:
        with runtime.create_runner("path/to/model.onnx") as runner:
            outputs = runner.run(inputs)
Device Specification
Runtime identifies a set of NPU devices by a textual string.
- Implicit, any available device
ARCH(X)*Y denotes any set of available devices where:
- ARCH is a target architecture and currently should be warboy.
- X is the number of PEs to be used per device.
- Y is the number of devices to be used.
(1) can be omitted, so for example warboy*1 is a valid specification.
is a valid specification.- Device and PE index pair
npu:X:Y
denotes a PE number Y in the device number X, where X and Y are 0-based indices.npu:X:Y-Z
denotes a fused PE made of two PEsnpu:X:Y
andnpu:X:Z
. Intermediate tensors may occupy multiple PE worth of memory in this mode.Note
Device and PE indices are determined by the kernel driver, and so should not be heavily relied upon especially when there are multiple devices.
- Raw device name
npuXpeY and npuXpeY-Z are the same as npu:X:Y and npu:X:Y-Z respectively. These names are identical to the raw device file names in /dev.
Deprecated since version 0.10.0.
- Multiple devices
The aforementioned specifications can be joined with `,`. For example npu:1:0,npu:1:1 is a valid specification.
Warning
Implicit specifications allocate devices in a greedy manner, so the runtime may fail to initialize even when a valid allocation exists. Mixed specifications like warboy*1,npu:1:0 are not recommended for this reason.
is not recommended for this reason.- Environment variables
Following environment variables will be used for the default device specification when no explicit device specification is given.
- FURIOSA_DEVICES
New in version 0.10.
Takes precedence over
NPU_DEVNAME
if given.This environment variable is current only.
- NPU_DEVNAME
Deprecated since version 0.10: Use
FURIOSA_DEVICES
instead.
Note
Environment variables can never override the explicit specification.
- Default devices
If there is no device specification and no relevant environment variable, a single Warboy device with 2 fused PEs is assumed, and the default specification warboy(2)*1 is used.
Model Metadata
Changed in version 0.10: These types were previously provided by the furiosa.runtime.model module.
- class furiosa.runtime.Axis
Represents an inferred axis of tensors.
This type is used only for compiler diagnostics and doesn’t affect the execution.
- class property WIDTH: Axis
- class property HEIGHT: Axis
- class property CHANNEL: Axis
- class property BATCH: Axis
- class property UNKNOWN: Axis
Constants for well-known axes:

Property | Abbreviation | Description
---|---|---
WIDTH | W | Width
HEIGHT | H | Height
CHANNEL | C | Depth, or input channel for convolutions
BATCH | N | Batch, or output channel for convolutions
UNKNOWN | ? | Other axes the compiler has failed to infer
This enumeration also contains some private axes internally used by the compiler. Their names and meanings are not stable and can change at any time.
- class furiosa.runtime.DataType(v)
Represents a data type for each item in the tensor.
The constructor can also be used to determine an NPU-supported data type from other objects:

    import numpy as np

    DataType(np.float32)                      # => DataType.FLOAT32
    DataType(np.zeros((3, 3), dtype='int8'))  # => DataType.INT8
- class property FLOAT16: DataType
- class property FLOAT32: DataType
- class property BFLOAT16: DataType
- class property INT8: DataType
- class property INT16: DataType
- class property INT32: DataType
- class property INT64: DataType
- class property UINT8: DataType
Constants for supported types:

Property | Description
---|---
FLOAT16 | IEEE 754 half-precision (binary16) floating point type
FLOAT32 | IEEE 754 single-precision (binary32) floating point type
BFLOAT16 | Bfloat16 floating point type
INT8 | 8-bit signed integer type
INT16 | 16-bit signed integer type
INT32 | 32-bit signed integer type
INT64 | 64-bit signed integer type
UINT8 | 8-bit unsigned integer type
This enumeration also contains some private types internally used by the compiler. Their representations are not stable and can change at any time.
- property numpy: numpy.dtype
Returns a corresponding Numpy data type object.
- Raises:
ValueError – if this type has no Numpy equivalent
- property numpy_dtype: type
Returns a Numpy type object corresponding to this data type, like numpy.int8.
- Raises:
ValueError – if this type has no Numpy equivalent
Deprecated since version 0.10: Contrary to its name, it didn't return numpy.dtype, which was misleading; it is thus deprecated. Use the numpy property instead.
- class furiosa.runtime.TensorDesc
Describes a single tensor in the input or output.
- property name: str or None
A tensor name if any. This is only available for some model formats.
- dim(idx)
The size of the idx-th dimension.
- property shape: tuple[int, ...]
The shape of the tensor. desc.shape is conceptually the same as (desc.dim(0), ..., desc.dim(desc.ndim - 1)).
- axis(idx)
The compiler-inferred axis for the idx-th dimension. Defaults to Axis.UNKNOWN.
- stride(idx)
The stride of the idx-th dimension in items. It is the distance between two adjacent elements in the given dimension; this convention notably differs from ndarray.strides, which is in bytes.
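The item-versus-byte convention can be checked against a plain ndarray; dividing byte strides by the item size yields the item strides this method uses:

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=np.float32)  # itemsize == 4 bytes

# ndarray.strides counts *bytes*:
assert a.strides == (48, 16, 4)

# TensorDesc.stride(idx) counts *items* instead; the equivalent
# conversion for an ndarray is:
item_strides = tuple(s // a.itemsize for s in a.strides)
assert item_strides == (12, 4, 1)
```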
- property numpy_dtype: numpy.dtype
A Numpy data type for each element. Same as desc.dtype.numpy.
- class furiosa.runtime.Model
Describes a single model with possibly multiple input and output tensors.
- input(idx)
A description of the idx-th input tensor.
- Parameters:
idx (int) – 0-based tensor index
- Return type:
- output(idx)
A description of the idx-th output tensor.
- Parameters:
idx (int) – 0-based tensor index
- Return type:
- inputs()
A list of descriptions for each input tensor.
- Return type:
- outputs()
A list of descriptions for each output tensor.
- Return type:
- summary()
A human-readable summary of the model:

    >>> print(model.summary())
    Inputs:
    {0: TensorDesc(shape=(1, 1, 28, 28), dtype=FLOAT32, format=NCHW, size=3136, len=784)}
    Outputs:
    {0: TensorDesc(shape=(1, 10), dtype=FLOAT32, format=??, size=40, len=10)}

- Return type:
- print_summary()
Same as print(model.summary()).
Runner API
New in version 0.10.
Runner APIs provide a simple functional interface via a single Runner class.
- async furiosa.runtime.Runtime.create_runner(model, *, worker_num=None, batch_size=None)
Creates a new inference session for the given model.
- Parameters:
model (ModelSource) – Model path or data
worker_num (int or None) – The number of worker threads
batch_size (int or None) – The number of batches per each run
- Return type:
- async furiosa.runtime.create_runner(model, *, device=None, worker_num=None, batch_size=None)
Same as above, but the runtime is implicitly created, and it will be closed when the session could not be created or gets closed.
See create_runner and Runtime for arguments.
- class furiosa.runtime.Runner
An inference session returned by Runtime.create_runner or create_runner.
This type is mainly used as a context manager:

    async with runtime.create_runner("path/to/model.onnx") as runner:
        outputs = await runner.run(inputs)
- property model: Model
Information about the associated model:

    async with runtime.create_runner("path/to/model.onnx") as runner:
        model = runner.model
        for i in range(model.num_inputs):
            print(f"Input tensor #{i}:", model.input(i))

When a batch size is given, the first dimension of all input and output tensors is multiplied by that batch size. This dimension generally corresponds to the BATCH axis.
- async run(inputs)
Runs a single inference.
- Parameters:
inputs (TensorArray) – Input tensors
- Return type:
Input tensors are not copied to a new buffer. Modifications during the inference may result in unexpected output; the runtime only ensures that such modifications do not cause a crash.
- async close()
Tries to close the session if not yet closed. Waits until the session is indeed closed, or until it takes too much time to close (in which case the session may still be open). If the session was created via the top-level create_runner function, the implicitly initialized runtime is closed as well.
The context manager internally calls this method at the end, and warns when the timeout has been reached. It is also called at an unspecified point after the Runner becomes subject to garbage collection.
- Returns:
True if the session (and the runtime, if any) has been closed in time.
Synchronous versions of those interfaces are also available through furiosa.runtime.sync:
- furiosa.runtime.sync.Runtime.create_runner(model, *, worker_num=None, batch_size=None)
Synchronous version of Runtime.create_runner above.
- furiosa.runtime.sync.create_runner(model, *, worker_num=None, batch_size=None)
Synchronous version of create_runner above.
Legacy Interface
- furiosa.runtime.session.create(model, *, device=None, worker_num=None, batch_size=None, compiler_hints=None)
Compiles the given model if needed, allocates an NPU device, and initializes a new inference session.
- Parameters:
model (ModelSource) – Model path or data
device (str or None) – A textual device specification, see the section for defaults
worker_num (int or None) – The number of worker threads
batch_size (int or None) – The number of batches per each run
compiler_hints (bool or None) – If True, the compiler prints additional hint messages
- Return type:
Changed in version 0.10: All optional arguments are now keyword-only. Positional arguments are still accepted and behave identically for now, but will warn about the future incompatibility.
Queue API
New in version 0.10.
Queue APIs provide two objects, Submitter and Receiver, for separately handling inputs and outputs.
They are named so because they represent two queues around the actual processing:
- Input queue
Holds submitted input tensors until some worker is available to process them.
- Output queue
Holds output tensors that have been processed but not yet received.
Both have a configurable but finite size, so submitting inputs too quickly or failing to receive outputs in time will block further processing.
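The blocking behavior can be sketched with a bounded asyncio.Queue. This is a simulation only and uses no Furiosa APIs:

```python
import asyncio

async def main():
    # A bounded queue, like the runtime's input/output queues: once
    # `maxsize` items are waiting, further `put` calls block until a
    # consumer drains the queue.
    q = asyncio.Queue(maxsize=1)
    await q.put("first")  # fits
    try:
        # With no consumer running, this second put cannot complete.
        await asyncio.wait_for(q.put("second"), timeout=0.01)
        return "accepted"
    except asyncio.TimeoutError:
        return "blocked"

print(asyncio.run(main()))  # blocked
```

In the real API the same pressure propagates backwards: a full output queue stalls the workers, and stalled workers stop draining the input queue.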
- async furiosa.runtime.Runtime.create_queue(model, *, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)
Creates a new inference session for given model.
- Parameters:
model (ModelSource) – Model path or data
worker_num (int or None) – The number of worker threads
batch_size (int or None) – The number of batches per each run
input_queue_size (int or None) – The input queue size
output_queue_size (int or None) – The output queue size
- Return type:
It is also possible to call this function directly as an asynchronous context manager:

    async with runtime.create_queue("path/to/model.onnx") as (submitter, receiver):
        submit_task = asyncio.create_task(submit_with(submitter))
        recv_task = asyncio.create_task(recv_with(receiver))
        await submit_task
        await recv_task
- async furiosa.runtime.create_queue(model, *, device=None, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)
Same as above, but the runtime is implicitly created, and it will be closed when the session could not be created or gets closed.
See create_queue and Runtime for arguments.
- class furiosa.runtime.Submitter
A submitting half of the inference session returned by Runtime.create_queue or create_queue.
This type is mainly used as a context manager:

    submitter, receiver = await runtime.create_queue("path/to/model.onnx")
    async with submitter:
        await submitter.submit(inputs)
    async with receiver:
        _, outputs = await receiver.recv()
- property model: Model
Information about the associated model:

    submitter, receiver = await runtime.create_queue("path/to/model.onnx")
    async with submitter:
        model = submitter.model
        for i in range(model.num_inputs):
            print(f"Input tensor #{i}:", model.input(i))

When a batch size is given, the first dimension of all input and output tensors is multiplied by that batch size. This dimension generally corresponds to the BATCH axis.
- allocate()
Returns a list of fresh tensors, suitable as input tensors. Their initial contents are not specified (but probably zeroed).
- Return type:
While this is no different from creating the tensors yourself, the runtime may allocate tensors in a device-friendly way, so it is recommended to use this method whenever appropriate.
- async submit(inputs, context=None)
Submits a single inference with an associated value.
- Parameters:
inputs (TensorArray) – Input tensors
context – Associated value, used to distinguish output tensors from Receiver
The method returns immediately unless the input queue is full, in which case it blocks. Output tensors become available through Receiver later.
Input tensors are not copied to a new buffer. Modifications during the inference may result in unexpected output; the runtime only ensures that such modifications do not cause a crash.
Warning
Associated values should be simple values like an integer or a UUID; they are retained as long as progress can be made, and so can cause logical memory leaks.
- async close()
Tries to close the submitter if not yet closed. Waits until the submitter is indeed closed, or until it takes too much time to close.
This does not close the corresponding Receiver. Any remaining tensors in the input queue will be processed nevertheless, and the output queue will then be signaled that no more results will be returned.
The context manager internally calls this method at the end, and warns when the timeout has been reached. It is also called at an unspecified point after the Submitter becomes subject to garbage collection.
- Returns:
True if the submitter has been closed in time.
- class furiosa.runtime.Receiver
A receiving half of the inference session returned by Runtime.create_queue or create_queue.
This type is mainly used as a context manager:

    submitter, receiver = await runtime.create_queue("path/to/model.onnx")
    async with submitter:
        await submitter.submit(inputs)
    async with receiver:
        _, outputs = await receiver.recv()
- property model: Model
Information about the associated model:

    submitter, receiver = await runtime.create_queue("path/to/model.onnx")
    async with receiver:
        model = receiver.model
        for i in range(model.num_outputs):
            print(f"Output tensor #{i}:", model.output(i))

The same remarks as for Submitter apply when a batch size is given.
The type can be used as an asynchronous iterable, which only finishes when either the corresponding Submitter or the receiver itself has been closed:

    submitter, receiver = await runtime.create_queue("path/to/model.onnx")
    task = asyncio.create_task(submit_task(submitter))
    async for context, outputs in receiver:
        handle_output(context, outputs)

Note that in this usage an async with block is not strictly needed, as the Submitter should have already been closed when the loop finishes.
It is also possible to manually receive results:
- async recv()
Waits for a single inference result, and returns it with the associated value.
- Returns:
A tuple of the associated value and output tensors, in this order
- Return type:
tuple[any, TensorArray]
The runtime guarantees that each inference result is received at most once, but the completion order may differ from the submission order. Pass an associated value to Submitter.submit to recover the original order.
Multiple parallel recv calls are fine but have no additional benefit. On the other hand, if no recv calls are made for a while, the output queue eventually fills up and blocks any further processing.
Note
This method does not support a timeout argument unlike others, because asyncio.wait_for provides an idiomatic way to do that:

    try:
        async def recv():
            context, outputs = await receiver.recv()
        task = asyncio.create_task(recv())
        await asyncio.wait_for(task, timeout=10)
    except asyncio.TimeoutError:
        # Not the built-in `TimeoutError`!
        print('Timed out!')
- async close()
Tries to close the receiver if not yet closed. This also notifies the corresponding Submitter and will block further submissions. Waits until the receiver is indeed closed, or until it takes too much time to close.
If the receiver was created via the top-level create_queue function, the implicitly initialized runtime is closed as well.
The context manager internally calls this method at the end, and warns when the timeout has been reached. It is also called at an unspecified point after the Receiver becomes subject to garbage collection.
- Returns:
True if the receiver (and the runtime, if any) has been closed in time.
Synchronous versions of those interfaces are also available through furiosa.runtime.sync:
- furiosa.runtime.sync.Runtime.create_queue(model, *, device=None, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)
Synchronous version of Runtime.create_queue above.
- furiosa.runtime.sync.create_queue(model, *, device=None, worker_num=None, batch_size=None, input_queue_size=None, output_queue_size=None)
Synchronous version of create_queue above.
- class furiosa.runtime.sync.Receiver
Synchronous version of Receiver above, with the following exception:
- recv(timeout=None)
Unlike the asynchronous version, this method does accept a timeout, because it is otherwise impossible to specify one in a synchronous context.
- Parameters:
timeout (float or None) – The timeout in seconds
- Raises:
TimeoutError – When timeout is given and the timeout has been reached
Note
Unlike CompletionQueue.recv, this method raises a standard Python exception.
Legacy Interface
- furiosa.runtime.session.create_async(model, *, device=None, worker_num=None, batch_size=None, compiler_hints=None, input_queue_size=None, output_queue_size=None)
Compiles the given model if needed, allocates an NPU device, and initializes a new inference session.
- Parameters:
model (ModelSource) – Model path or data
device (str or None) – A textual device specification, see the section for defaults
worker_num (int or None) – The number of worker threads
batch_size (int or None) – The number of batches per each run
compiler_hints (bool or None) – If True, the compiler prints additional hint messages
input_queue_size (int or None) – The input queue size
output_queue_size (int or None) – The output queue size
- Return type:
Changed in version 0.10: All optional arguments are now keyword-only. Positional arguments are still accepted and behave identically for now, but will warn about the future incompatibility.
Legacy only: The input and output queues were unbounded when their sizes were not given. This was never documented, and the current API doesn't support unbounded queues. To facilitate migration, however, the legacy API will continue to use much larger default queue sizes.
- class furiosa.runtime.session.AsyncSession
A submitting half of the inference session returned by create_async.
This type is mainly used as a context manager:

    session, queue = create_async("path/to/model.onnx")
    with session:
        session.send(inputs)
    with queue:
        _, outputs = queue.recv()
- class furiosa.runtime.session.CompletionQueue
A receiving half of the inference session returned by create_async.
This type is mainly used as a context manager:

    session, queue = create_async("path/to/model.onnx")
    with session:
        session.send(inputs)
    with queue:
        _, outputs = queue.recv()
Profiler
The furiosa.runtime.profiler
module provides a basic profiling facility.
- class furiosa.runtime.profiler.RecordFormat
Profiler format to record profile data.
- class property ChromeTrace: RecordFormat
- class property PandasDataFrame: RecordFormat
- class furiosa.runtime.profiler.Resource
Profiler target resource to be recorded.
- class furiosa.runtime.profiler.profile(resource=Resource.All, format=RecordFormat.ChromeTrace, file=None)
Profile context manager:

    from furiosa.runtime.profiler import RecordFormat, profile

    with open("profile.json", "w") as f:
        with profile(format=RecordFormat.ChromeTrace, file=f) as profiler:
            # Profiler enabled from here
            with profiler.record("Inference"):
                ...  # Recorded under a span named 'Inference'
Please note that activating the profiler incurs non-trivial performance overhead.
- Parameters:
resource (Resource) – Target resource to be profiled, e.g. Cpu or Npu
format (RecordFormat) – Profiler format, e.g. ChromeTrace
file – File that traces will be written to. If not given, a temporary file is used. The format of the traces written to the file depends on the record format: Chrome tracing for ChromeTrace and CSV for PandasDataFrame.
- Raises:
runtime.ProfilerError – Raised when the profiler could not be created with the given configuration.
Changed in version 0.10.0: The record format-dependent parameter config was replaced by the parameter file.
Returns a profiler record context manager.
At the enter of the context, profiler span trace with specified name starts, and ends at the exit of the context. Traces for execution within the context will be recorded as a child of the created span.
- Parameters:
name (str) – Profiler record span name.
- Return type:
ProfilerRecordObject
- pause()
Pauses profiling temporarily within the profiling context.
The profiler stops recording traces and runs with minimal overhead until resume is called. If the profiler is already paused, it does nothing. This method can be called an arbitrary number of times within the profiling context. Note that this method doesn't affect the time measurement of the profiler: it simply acts as if no events occurred during the pause, creating an empty interval.
New in version 0.10.
- resume()
Resumes paused profiling.
If the profiler is not paused, this method does nothing. For more details on the paused state, see pause.
New in version 0.10.
- get_pandas_dataframe()
Returns a DataFrame of the recorded traces.
The returned dataframe will look like:

       trace_id                          parent_span_id    span_id           start                end                  cat      name           thread.id  dram_base  pe_index  execution_index  instruction_index  operator_index  dur
    0  6ffe9ac3080814bc134ae4c5e58269e0  0000000000000000  a61dd01a47ce8dee  1690798389820453606  1690798390204660478  Runtime  Compile        35         <NA>       <NA>      <NA>             <NA>               <NA>            384206872
    1  079f8437488528d5768780162ed59374  0000000000000000  2d18b0e17e760325  1690798390205840825  1690798390267819096  Runtime  ProgramBinary  26         <NA>       <NA>      <NA>             <NA>               <NA>            61978271
    2  fb4610c2fd1be67e63e01ca9169b6fef  0000000000000000  2a092524d04a4077  1690798390267849007  1690798390267857471  Runtime  AllocateDram   26         <NA>       <NA>      <NA>             <NA>               <NA>            8464
    3  009b425f06ca0065a64f0586d1a999b0  0000000000000000  cdac229f8d8569d7  1690798389793627190  1690798390268011450  Runtime  Register       1          <NA>       <NA>      <NA>             <NA>               <NA>            474384260
    4  348ee82fdf97fad9f782cc12a58d447d  0000000000000000  59b5a5d06439f9f1  1690798390270474367  1690798390270526470  Runtime  Enqueue        32         <NA>       <NA>      <NA>             <NA>               <NA>            52103
    5  27efb2c82a5ac93bed911142e9187c45  174b38c90d1f7a10  ff7c4f8798d75b63  1690798390270558295  1690798390270570293  Runtime  FeedInputs     32         <NA>       <NA>      <NA>             <NA>               <NA>            11998
- Return type:
pandas.DataFrame
- get_pandas_dataframe_with_filter(column: str, value: str)
Returns a DataFrame only containing the rows whose column column has the value value in the DataFrame returned by get_pandas_dataframe.
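Conceptually this is a boolean-mask selection over the full trace DataFrame. In the sketch below, df is a small stand-in for the result of get_pandas_dataframe:

```python
import pandas as pd

# A stand-in for the trace DataFrame (columns abbreviated).
df = pd.DataFrame({"cat": ["Runtime", "NPU", "Runtime"],
                   "name": ["Compile", "Execute", "Enqueue"],
                   "dur": [100, 250, 50]})

# Equivalent of get_pandas_dataframe_with_filter("cat", "Runtime"):
filtered = df[df["cat"] == "Runtime"]
assert list(filtered["name"]) == ["Compile", "Enqueue"]
```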
- get_cpu_pandas_dataframe()
Returns a DataFrame for execution on the CPU, whose category is "Runtime".
- Return type:
pandas.DataFrame
- get_npu_pandas_dataframe()
Returns a DataFrame for execution on the NPU, whose category is "NPU".
- Return type:
pandas.DataFrame
- print_npu_operators()
Prints an NPU operator report to the terminal. Each row contains information for a different NPU operator.
The output will look like:
    ┌─────────────────────────┬──────────────────────┬───────┐
    │ name                    ┆ average_elapsed (ns) ┆ count │
    ╞═════════════════════════╪══════════════════════╪═══════╡
    │ LowLevelConv2d          ┆ 5119.9375            ┆ 64    │
    │ LowLevelDepthwiseConv2d ┆ 1091.0               ┆ 56    │
    │ LowLevelPad             ┆ 561.482143           ┆ 56    │
    │ LowLevelExpand          ┆ 2.0                  ┆ 16    │
    │ LowLevelSlice           ┆ 2.0                  ┆ 16    │
    │ LowLevelReshape         ┆ 2.0                  ┆ 232   │
    └─────────────────────────┴──────────────────────┴───────┘
- print_npu_executions()
Prints an NPU execution report to the terminal. Each row contains information for one NPU task execution.
The output will look like:
    ┌──────────────────────────────────┬──────────────────┬──────────┬─────────────────┬───────────┬─────────┬────────────┐
    │ trace_id                         ┆ span_id          ┆ pe_index ┆ execution_index ┆ NPU Total ┆ NPU Run ┆ NPU IoWait │
    ╞══════════════════════════════════╪══════════════════╪══════════╪═════════════════╪═══════════╪═════════╪════════════╡
    │ 39ffc55ef7b21775338e9fa2d1fb70f1 ┆ 555899badb3f8e58 ┆ 0        ┆ 0               ┆ 116971    ┆ 105186  ┆ 11785      │
    │ 9c8aa64bbb878e3b62194f8dec670e3c ┆ 4e9a13e698f4fa19 ┆ 0        ┆ 0               ┆ 117011    ┆ 105186  ┆ 11825      │
    │ 0ce2a8ce2c591e34e92e0c421f394614 ┆ 5cd8a081758f41c4 ┆ 0        ┆ 0               ┆ 116961    ┆ 105185  ┆ 11776      │
    │ a941ace17a2c5e615a8f05d8872fa932 ┆ a3726d0ebb2705cd ┆ 0        ┆ 0               ┆ 116909    ┆ 105186  ┆ 11723      │
    └──────────────────────────────────┴──────────────────┴──────────┴─────────────────┴───────────┴─────────┴────────────┘
- print_external_operators()
Prints an external operator report to the terminal.
The output will look like:
    ┌──────────────────────────────────┬──────────────────┬───────────┬────────────┬────────────────┬────────┐
    │ trace_id                         ┆ span_id          ┆ thread.id ┆ name       ┆ operator_index ┆ dur    │
    ╞══════════════════════════════════╪══════════════════╪═══════════╪════════════╪════════════════╪════════╡
    │ 7d65ff7ae5587d3345d5df5a77ebfaad ┆ 53e3fb9c02964361 ┆ 35        ┆ Quantize   ┆ 0              ┆ 175246 │
    │ 33371e09f89cfa06c41286df1311a30f ┆ 8d5a00c6e4e8c2c0 ┆ 35        ┆ Lower      ┆ 1              ┆ 183803 │
    │ 9f7df939abc20da11431c18024c41af1 ┆ 064dacd9a108c4a0 ┆ 35        ┆ Unlower    ┆ 2              ┆ 60459  │
    │ 1bda703f4ffc878a4294ec62533ac8d0 ┆ cb2f103208d2fa45 ┆ 35        ┆ Dequantize ┆ 3              ┆ 19468  │
    │ 9f769c8951f39d98e6ee216e346bc7e5 ┆ 91c0bdd8c5b81327 ┆ 35        ┆ Quantize   ┆ 0              ┆ 85724  │
    │ 048e5cab6d4d676e4e6b10e8276b5489 ┆ 714834cb8dc59f4b ┆ 35        ┆ Lower      ┆ 1              ┆ 306893 │
    │ 6bb481ca3b1eab843b795a786549558b ┆ 46d538d7b4c72d25 ┆ 35        ┆ Unlower    ┆ 2              ┆ 73313  │
    │ e0f13a5fb0bf2942ed16171844ccb293 ┆ 71a432e3e3dc55f6 ┆ 35        ┆ Dequantize ┆ 3              ┆ 37079  │
    │ c3b2fdba80f16f781e4b313af3a571b6 ┆ 066e3916590edf38 ┆ 35        ┆ Quantize   ┆ 0              ┆ 67805  │
    │ 4bebe5f61e84d502f5b5dc7d221e4f5a ┆ 9dfb32069b2b5a98 ┆ 35        ┆ Lower      ┆ 1              ┆ 310303 │
    │ b8cabf53ae39a4ad18144af26ce136c9 ┆ cb767fbdd718da89 ┆ 35        ┆ Unlower    ┆ 2              ┆ 72378  │
    │ e40956dda5ecc0a1774e39377b1ef245 ┆ 090d9cbd5e60032a ┆ 35        ┆ Dequantize ┆ 3              ┆ 33951  │
    │ 3d13f40c0966940439adcce4c19981a4 ┆ 4702a924e4b6d38b ┆ 35        ┆ Quantize   ┆ 0              ┆ 76999  │
    │ 53746b998038e994a5e378f9a28caa5a ┆ 522b7a9e354de2b3 ┆ 35        ┆ Lower      ┆ 1              ┆ 339339 │
    │ 76a2080bc0917db26b7313e29a81def3 ┆ 4b1b0bf55f344258 ┆ 35        ┆ Unlower    ┆ 2              ┆ 74708  │
    │ 4c0a04dc669b04416f18e781d6afc3c6 ┆ 8eb55fb2b618933a ┆ 35        ┆ Dequantize ┆ 3              ┆ 33661  │
    └──────────────────────────────────┴──────────────────┴───────────┴────────────┴────────────────┴────────┘
- print_inferences()
Print inference summary to the terminal.
The output will look like:
```
┌──────────────────────────────────┬──────────────────┬───────────┬─────────┐
│ trace_id                         ┆ span_id          ┆ thread.id ┆ dur     │
╞══════════════════════════════════╪══════════════════╪═══════════╪═════════╡
│ b5edc4d40493df2028d186d4073d5487 ┆ a61af3b9ad70b956 ┆ 1         ┆ 4430749 │
│ 983e136f80e1c070dca3ad854f37cf97 ┆ f2dd4e899d52531d ┆ 1         ┆ 4181392 │
│ dada8a5830272b5d255fda801568fc5e ┆ cda7127619be5c33 ┆ 1         ┆ 4275757 │
│ 6ad054709f76095c86fba6dcd9254ca0 ┆ 9d7f199a445003aa ┆ 1         ┆ 4215571 │
└──────────────────────────────────┴──────────────────┴───────────┴─────────┘
```
- print_summary()
Print overall summary to the terminal.
The output will look like:
```
================================================ Inference Results Summary ================================================
Inference counts              : 4
Min latency (ns)              : 4181392
Max latency (ns)              : 4430749
Mean latency (ns)             : 4275867
Median latency (ns)           : 4245664
90.0 percentile Latency (ns)  : 4384251
95.0 percentile Latency (ns)  : 4407500
97.0 percentile Latency (ns)  : 4416800
99.0 percentile Latency (ns)  : 4426099
99.9 percentile Latency (ns)  : 4430284
```
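The summary is a set of simple aggregates over the per-inference `dur` values shown by `print_inferences()`. As a sanity check, the figures above can be reproduced from the four durations with a short script (the linear-interpolation percentile convention used here is an assumption inferred from the numbers, not documented behavior):

```python
import statistics

# The four per-inference durations (ns) from the print_inferences() output above.
durations = [4430749, 4181392, 4275757, 4215571]

def percentile(data, p):
    # Linear-interpolation percentile over sorted data; this convention is an
    # assumption, but it reproduces the summary figures exactly.
    s = sorted(data)
    k = (len(s) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (k - lo) * (s[hi] - s[lo])

print("Min latency (ns)    :", min(durations))                       # 4181392
print("Max latency (ns)    :", max(durations))                       # 4430749
print("Mean latency (ns)   :", round(statistics.mean(durations)))    # 4275867
print("Median latency (ns) :", round(statistics.median(durations)))  # 4245664
print("90.0 pct latency    :", round(percentile(durations, 90.0)))   # 4384251
```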
Diagnostics
- furiosa.runtime.full_version()
Returns a string for the full version of this package.
- exception furiosa.runtime.FuriosaRuntimeError
A base class for all runtime exceptions.
Changed in version 0.10: Previously this was named `NativeException` and was a subclass of `FuriosaError`. A large number of subclasses were also removed, to make room for the upcoming restructuring.
- exception furiosa.runtime.FuriosaRuntimeWarning
A base class for all runtime warnings.
New in version 0.10.
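Since `FuriosaRuntimeWarning` is an ordinary `Warning` subclass, the standard `warnings` machinery applies to it. A minimal sketch, using a local stand-in class so that it runs without the SDK installed (with the SDK, import the real class from `furiosa.runtime` instead):

```python
import warnings

# Stand-in for furiosa.runtime.FuriosaRuntimeWarning, so this sketch is
# self-contained; the real class is a Warning subclass as documented above.
class FuriosaRuntimeWarning(Warning):
    pass

# Capture runtime warnings to inspect them programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("illustrative runtime warning", FuriosaRuntimeWarning)

assert caught[0].category is FuriosaRuntimeWarning

# Or, to silence all runtime warnings globally:
warnings.simplefilter("ignore", FuriosaRuntimeWarning)
```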
This package currently doesn’t have a dedicated logging interface, but the following environment variable can be used to set the basic logging level:
- FURIOSA_LOG_LEVEL
If set, it should be one of the following, in order of decreasing verbosity:
- `INFO` (default)
- `WARN`
- `ERROR`
- `OFF`
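Since `FURIOSA_LOG_LEVEL` is read from the environment, it is typically set in the shell or via `os.environ`. Setting it before the runtime is imported is safest (that the variable is read at import time is an assumption, not documented behavior):

```python
import os

# One of INFO (default), WARN, ERROR, OFF, in decreasing verbosity.
os.environ["FURIOSA_LOG_LEVEL"] = "ERROR"

# import furiosa.runtime  # subsequent runtime use would log at ERROR level only
```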
Legacy Supports
The following submodules exist only to support legacy code.
Deprecated since version 0.10: All legacy submodules will warn on import, and any major incompatibilities will be reported.
Also, unless otherwise stated, all functions and types described here are
either not available or will always fail.
The `[legacy]` extra may be used to disable this behavior,
at the expense of losing access to the current APIs.
furiosa.runtime.compiler
- furiosa.runtime.compiler.generate_compiler_log_path()
Generates a path for the compilation log file.
- Return type:
Path
This function is legacy only.
furiosa.runtime.consts
This module is empty.
furiosa.runtime.envs
- furiosa.runtime.envs.current_npu_device()
Returns the current NPU device name.
- Returns:
NPU device name
- Return type:
str
This function is legacy only.
- furiosa.runtime.envs.is_compile_log_enabled()
Returns whether the compile log is enabled.
- Returns:
True if the compile log is enabled, or False.
- Return type:
bool
This function is legacy only.
- furiosa.runtime.envs.log_dir()
Returns `FURIOSA_LOG_DIR`, the directory where logs are stored.
- Returns:
The log directory of Furiosa SDK
- Return type:
This function is legacy only.
- furiosa.runtime.envs.profiler_output()
Returns `FURIOSA_PROFILER_OUTPUT_PATH`, the path where profiler outputs are written.
For compatibility, `NUX_PROFILER_PATH` is also currently supported, but it will later be superseded by `FURIOSA_PROFILER_OUTPUT_PATH`.
- Returns:
The file path of profiler output if specified, or None.
- Return type:
str or None
This function is legacy only.
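Based on the description above, `profiler_output()` amounts to an environment lookup with a legacy fallback. A hedged sketch of the behavior (the precedence between the two variables is an assumption, not documented):

```python
import os

def profiler_output():
    # FURIOSA_PROFILER_OUTPUT_PATH is the current variable; NUX_PROFILER_PATH
    # is the legacy fallback mentioned above. Returns None if neither is set.
    return (os.environ.get("FURIOSA_PROFILER_OUTPUT_PATH")
            or os.environ.get("NUX_PROFILER_PATH"))

# Only the legacy variable set: the fallback path is returned.
os.environ.pop("FURIOSA_PROFILER_OUTPUT_PATH", None)
os.environ["NUX_PROFILER_PATH"] = "/tmp/profile.json"
print(profiler_output())  # /tmp/profile.json
```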
furiosa.runtime.errors
- exception furiosa.runtime.errors.IncompatibleModel
- exception furiosa.runtime.errors.CompilationFailed
- exception furiosa.runtime.errors.InternalError
- exception furiosa.runtime.errors.UnsupportedTensorType
- exception furiosa.runtime.errors.UnsupportedDataType
- exception furiosa.runtime.errors.IncompatibleApiClientError
- exception furiosa.runtime.errors.InvalidYamlException
- exception furiosa.runtime.errors.ApiClientInitFailed
- exception furiosa.runtime.errors.NoApiKeyException
- exception furiosa.runtime.errors.InvalidSessionOption
- exception furiosa.runtime.errors.QueueWaitTimeout
- exception furiosa.runtime.errors.SessionTerminated
- exception furiosa.runtime.errors.DeviceBusy
- exception furiosa.runtime.errors.InvalidInput
- exception furiosa.runtime.errors.TensorNameNotFound
- exception furiosa.runtime.errors.UnsupportedFeature
- exception furiosa.runtime.errors.InvalidCompilerConfig
- exception furiosa.runtime.errors.SessionClosed
Specific subclasses of `FuriosaRuntimeError`. They are legacy only and have been mostly replaced with standard Python exceptions like `TypeError` or `ValueError`, except for the following:
This module also reexports `FuriosaRuntimeError` as `NativeException`.
furiosa.runtime.model
This module reexports `Model`.
furiosa.runtime.session
This module reexports `Session`, `AsyncSession`, `CompletionQueue`, `create` and `create_async`.
furiosa.runtime.tensor
- furiosa.runtime.tensor.numpy_dtype(value)
Returns numpy dtype from any eligible object.
- furiosa.runtime.tensor.zeros(desc)
Returns a zero tensor matching given tensor description.
- Parameters:
desc (TensorDesc) – Tensor description
- furiosa.runtime.tensor.rand(desc)
Returns a random tensor matching given tensor description.
- Parameters:
desc (TensorDesc) – Tensor description
This is meant to be a quick test function and no guarantees are made for quality, performance and correctness.
Legacy only: The function was only correctly defined for floating point types.
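To illustrate, a stand-in sketch of what a `rand(desc)`-style helper amounts to for floating-point descriptions. The `TensorDesc` below is a hypothetical stand-in carrying only shape and dtype; the real class comes from `furiosa.runtime`:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class TensorDesc:  # hypothetical stand-in for furiosa.runtime's TensorDesc
    shape: tuple
    dtype: type

def rand(desc):
    # Quick-test helper only: as noted above, correct only for floating-point
    # dtypes, with no quality or performance guarantees.
    return np.random.random_sample(desc.shape).astype(desc.dtype)

t = rand(TensorDesc(shape=(1, 3, 8, 8), dtype=np.float32))
assert t.shape == (1, 3, 8, 8) and t.dtype == np.float32
```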
This module also contains `Tensor` and `TensorArray`, which are described separately.
This module also reexports `Axis`, `DataType` and `TensorDesc`.