Kubernetes Support

Kuberentes is an open source platform for managing containerized workloads and services. Furiosa SDK provides the following components to support the Kubernetes environment.

FuriosaAI NPU Device Plugin (Introduction to Kubernetes Device Plugin)
FuriosaAI NPU Feature Discovery (Introduction to Node Feature Discovery)

The two components above provide the following functions.

Make the Kubernetes cluster aware of the NPUs available to the node.
Through Kubernetes spec.containers[].resources.limits , schedule the NPU simultaneously when distributing Pod workload.
Identify NPU information of NPU-equipped machine, and register it as node label (you can selectively schedule Pods with this information and nodeSelector)
- The node-feature-discovery needs to be installed to the cluster, and the nfd-worker Pod must be running in the nodes equipped with NPUs.

The setup process for Kubernetes support is as follows.

1. Preparing NPU nodes

Requirements for Kubernetes nodes are as follows.

Ubuntu 20.04 or higher
Intel compatible CPU

You also need to install NPU driver and toolkit on each node of NPU-equipped Kubernetes. If the APT server is set up (see APT server configuration), you can easily install as follows.

apt-get update && apt install -y furiosa-driver-warboy furiosa-toolkit

Once the required package is installed as above, you can check for NPU recognition as follows, with the furiosactl command included in furiosa-toolkit. If the NPU is not recognized with the command below, try again after rebooting - depending on the environment.

$ furiosactl info
+------+--------+----------------+-------+--------+--------------+
| NPU  | Name   | Firmware       | Temp. | Power  | PCI-BDF      |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+

2. Installing Node Feature Discovery

In order to make Kubernetes to recognize NPUs, you need to install Node Feature Discovery. By running the command as shown in the example below, if there is a node label that begins with feature.node.kubernetes.io/..., Node Feature Discovery’s DaemonSet has already been installed

$ kubectl get no -o json | jq '.items[].metadata.labels'
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/os": "linux",
  "feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
  ...

If you do not have the Node Feature Discovery in your cluster, refer to the following document.
- Quick start / Installation
The following options must be applied when executing Node Feature Discovery.
- beta.furiosa.ai needs to be included in the --extra-label-ns option of nfd-master
- In the config file of nfd-worker, * Only vendor in the sources.pci.deviceLabelFields value * "12" must be included as a value in sources.pci.deviceClassWhitelist

Note

Installing Node Feature Discovery is not mandatory, but is recommended. The next step

will explain the additional tasks that must be performed if you are not using Node Feature Discovery.

3. Installing Device Plugin and NPU Feature Discovery

When the NPU node is ready, install Device Plugin and NPU Feature Discovery’s DaemonSet as follows.

kubectl apply -f https://raw.githubusercontent.com/furiosa-ai/furiosa-sdk/v0.9.0/kubernetes/deployments/device-plugin.yaml
kubectl apply -f https://raw.githubusercontent.com/furiosa-ai/furiosa-sdk/v0.9.0/kubernetes/deployments/npu-feature-discovery.yaml

After executing the above command, you can check whether the installed daemonset is functioning normally with the kubectl get daemonset -n kube-system command. For reference, the DaemonSet is distributed only to nodes equipped with NPUs, and uses alpha.furiosa.ai/npu.family=warboy information that the Node Feature Discovery (feature.node.kubernetes.io/pci-1ed2.present=true) attaches to each node.

 $ kubectl get daemonset -n kube-system
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                      AGE
 furiosa-device-plugin          3         3         3       3            3           feature.node.kubernetes.io/pci-1ed2.present=true   128m
 furiosa-npu-feature-discovery  3         3         3       3            3           feature.node.kubernetes.io/pci-1ed2.present=true   162m

The metadata attached by the Node Feature Discovery is shown in the following table.

NPU Node Labels
Label	Value	Description
beta.furiosa.ai/npu.count	1	The number of NPUs e x b number of NPUs attached to node
beta.furiosa.ai/npu.product	warboy, warboyB0	NPU Product Name (Code)
beta.furiosa.ai/npu.family	warboy, renegade	NPU Architecture (Family)
beta.furiosa.ai/machine.vendor	(depends on machine)	Machine Manufacturer
beta.furiosa.ai/machine.name	(depends on machine)	The Nmae of Machine (Code)
beta.furiosa.ai/driver.version	1.3.0	NPU Device Driver Version
beta.furiosa.ai/driver.version.major	1	Major Version Number of NPU Device Driver Version
beta.furiosa.ai/driver.version.minor	3	Minor Version Number of NPU Device Driver
beta.furiosa.ai/driver.version.patch	0	Patch Version Number of NPU Device Driver
beta.furiosa.ai/driver.reference	57ac7b0	Build Commit Hash of NPU Device Driver

If you want to check node labels, then execute the kubectl get nodes --show-labels command. If you see labels which start with beta.furiosa.ai Node Feature Discovery is successfully installed.

kubectl get nodes --show-labels

warboy-node01     Ready   <none>  65d   v1.20.10   beta.furiosa.ai/npu.count=1,beta.furiosa.ai/npu.product=warboy...,kubernetes.io/os=linux
warboy-node02     Ready   <none>  12d   v1.20.10   beta.furiosa.ai/npu.count=1,beta.furiosa.ai/npu.product=warboy...,kubernetes.io/os=linux

Device Plugin Configuration

Execution options for Device Plugin can be set by the argument of command line or configuration file.

Command Line Arguments

The option can be set by the k8s-device-plugin command as follows.

$ k8s-device-plugin --interval 10

For the Pod or DaemonSet specification command line arguments can be set as follows.

apiVersion: v1
kind: Pod
metadata:
  name: furiosa-device-plugin
  namespace: kube-system
spec:
  containers:
    - name: device-plugin
      image: ghcr.io/furiosa-ai/k8s-device-plugin:latest
      command: ["/usr/bin/k8s-device-plugin"]
      args: ["--interval", "10"]
# (the reset is omitted)

arguments of k8s-device-plugin
Item	Explanation	Default Value
default-pe	default core type when pod is allocated (Fusion/Single)	Fusion
interval	interval for searching device (seconds)	10
disabled-devices	devices not for allocations (several devices can be designated using comma)
plugin-dir	directory path of kubelet device-plugin	/var/lib/kubelet/device-plugins
socket-name	file name of socket created under <plugin-dir>	furiosa-npu
resource-name	name of NPU resource registered for k8s node	beta.furiosa.ai/npu

Setting Configuration File

You may set configuration file by executing k8s-device-plugin command with argument config-file. If config-file is set then the other arguments are not permitted.

$ k8s-device-plugin --config-file /etc/furiosa/device-plugin.conf

/etc/furiosa/device-plugin.conf

interval: 10
defaultPe: Fusion
disabledDevices:             # device npu1 equipped in warboy-node01 will not be used
  - devName: npu1
    nodeName: warboy-node01
pluginDir: /var/lib/kubelet/device-plugins
socketName: furiosa-npu
resourceName: beta.furiosa.ai/npu

Configuration file is a text file with Yaml format. The modification of file contents is applied to Device Plugin immediately. Updated configuration is recorded on log of Device Plugin. (but, modifications on pluginDir , socketName, or resourceName require reboot.)

3. Installing Device Plugin and NPU Feature Discovery provides device-plugin.yaml which is default configuration file based on ConfigMap. If you want to modify execution options of Device Plugin, modify ConfigMap. Once modified ConfigMap is applied to Pod, Device Plugin reads the ConfigMap and then reflects modification.

$ kubectl edit configmap npu-device-plugin -n kube-system

configmap/npu-device-plugin

apiVersion: v1
data:
  config.yaml: |
    defaultPe: Fusion
    interval: 15
    disabledDevices:
      - devName: npu2
        nodeName: npu-001
kind: ConfigMap

4. Creating a Pod with NPUs

To allocate NPU to a Pod, add as shown below to spec.containers[].resources.limits.

resources:
    limits:
        beta.furiosa.ai/npu: "1" # requesting 1 NPU

Full example for Pod creation is as follows.

$ cat > npu-pod.yaml <<EOL
apiVersion: v1
kind: Pod
metadata:
  name: npu-pod
spec:
  containers:
    - name: npu-pod
      image: ubuntu:focal
      resources:
        limits:
          cpu: "4"
          memory: "8Gi"
          beta.furiosa.ai/npu: "1"
        requests:
          cpu: "4"
          memory: "8Gi"
          beta.furiosa.ai/npu: "1"
EOL

$ kubectl apply -f npu-pod.yaml

After Pod creation, you can check NPU allocation as follows.

$ kubectl get pods npu-pod -o yaml | grep alpha.furiosa.ai/npu
    beta.furiosa.ai/npu: "1"
    beta.furiosa.ai/npu: "1"

The SDK application automatically recognizes the allocated NPU device. If there are multiple NPU devices on a node, you can check which device is allocated as follows:

$ kubectl exec npu-pod -it -- /bin/bash

$ echo $NPU_DEVNAME
npu0pe0-1

If furiosa-toolkit is installed in the Pod, you can check for more detailed device information using the furiosactl command as shown below.

See APT server configuration for installation guide using APT.

$ furiosactl
furiosactl 0.11.0
Controls the FURIOSA NPU. Find more information at https://furiosa.ai/

USAGE:
    furiosactl <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    help       Prints this message or the help of the given subcommand(s)
    info       Shows details of NPU devices
    list       Display NPU device list
    ps         Prints a list of processes using the NPU device.
    top        Periodically print the NPU utilization by process and device.
    version    Prints the furiosactl version information

$ furiosactl info
+------+--------+----------------+-------+--------+--------------+
| NPU  | Name   | Firmware       | Temp. | Power  | PCI-BDF      |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+

5. NPU monitoring

If you install npu-metrics-exporter, its daemon set and service will be created in your kubernetes cluster. The Pod that is executed through DaemonSet outputs various NPU status information that may be useful for monitoring. The data is expressed in Prometheus format. If Prometheus is installed, and service discovery is active, Prometheus will automatically collect data through the Exporter.

The collected data may be reviewed with visualization tools such as Grafana.

npu-metrics-exporter collection category list
Name	Details
furiosa_npu_alive	NPU operation status (1:normal)
furiosa_npu_uptime	NPU operation time (s)
furiosa_npu_error	Number of detected NPU errors
furiosa_npu_hw_temperature	Temperature of each NPU components (°mC)
furiosa_npu_hw_power	NPU instantaneous power usage (µW)
furiosa_npu_hw_voltage	NPU instantaenous voltage (mV)
furiosa_npu_hw_current	NPU instantaneous current (mA)