GPGPU on Raspberry Pi

By Wolfgang Keller
Draft
Originally written 2021-11-01
Last modified 2022-11-19

VideoCore versions

There exist two versions of the VideoCore GPU, VideoCore IV and VideoCore VI, whose instruction sets differ considerably.

A more detailed list can be found at Raspberry Pi Documentation - Processors [visited 2022-06-18T17:43:32Z].

Disassemblers and instruction encodings

Resources

For building the disassembler and analyzing the instruction set, we use the following resources:

VideoCore IV:

VideoCore VI:

Another library of Mesa that we use is libmesa_util.a. Its source code can be found at src/util · main · Mesa / mesa · GitLab [visited 2022-08-29T21:59:22Z].

Links to test cases for the instruction sets are given further below in the section "Run examples".

Disassemblers

Build disassemblers

A disassembler for machine instructions of the VideoCore IV and VideoCore VI QPUs can be found at nubok/vcqpudisasm [visited 2022-08-28T10:50:40Z]. This repository was inspired by Terminus-IMRC/vc6qpudisas: Disassembler of VideoCore VI QPU [visited 2022-08-28T12:29:55Z].

Let's go through its install instructions to set it up. The following steps were tested on Raspberry Pi OS (32 bit; based on Debian Bullseye) and Ubuntu 20.04 (WSL2; x86-64).

Build Mesa

First, set up Mesa. We need to install various packages that are required for building Mesa:

sudo apt-get install meson
sudo apt-get install python3-mako
sudo apt-get install libdrm-dev
sudo apt-get install flex
sudo apt-get install bison
sudo apt-get install libx11-dev
sudo apt-get install libxext-dev
sudo apt-get install libxfixes-dev
sudo apt-get install libxcb-glx0-dev
sudo apt-get install libxcb-shm0-dev
sudo apt-get install libx11-xcb-dev
sudo apt-get install libxcb-dri2-0-dev
sudo apt-get install libxcb-dri3-dev
sudo apt-get install libxcb-present-dev
sudo apt-get install libxshmfence-dev
sudo apt-get install libxxf86vm-dev
sudo apt-get install libxrandr-dev

If you perform these steps on Ubuntu 20.04 (for example under WSL2), you might additionally need to run

sudo apt-get install pkg-config

Now run

git clone https://gitlab.freedesktop.org/mesa/mesa.git --depth=1
cd mesa/
git fetch --all --tags
git checkout tags/mesa-21.3.9 -b mesa21_3_9
mkdir build/
cd build/
meson .. -Dgallium-drivers=vc4,v3d -Dvulkan-drivers=broadcom -Dplatforms=x11
ninja src/broadcom/qpu/libbroadcom_qpu.a src/util/libmesa_util.a src/gallium/drivers/vc4/libvc4.a
cd ../../

Remark: In the line

git checkout tags/mesa-21.3.9 -b mesa21_3_9

replace 21.3.9 with another suitable (more recent) version if necessary.

Build vcqpudisasm

If necessary, install cmake via

sudo apt install cmake

Start with

git clone https://github.com/nubok/vcqpudisasm.git
cd vcqpudisasm/
mkdir build/
cd build/
cmake .. -DCMAKE_PREFIX_PATH="$(realpath ../../mesa);$(realpath ../../mesa/build)"
make

Run examples

Now for running some examples:

VideoCore IV: Run

./vc4qpudisasm <<< '0x10025020cc9e7081'
add rb0, r0, r2 ; v8adds r0, r0, r1

Other examples can be found at userland/host_applications/linux/apps/hello_pi/hello_fft/hex at master · raspberrypi/userland [visited 2022-08-28T12:51:09Z] (also available at firmware/opt/vc/src/hello_pi/hello_fft/hex at master · raspberrypi/firmware [visited 2022-08-28T12:52:14Z]).

VideoCore VI: There seem to exist three versions of V3D: 3.3, 4.1 and 4.2 (corresponding to values 33, 41 and 42 of the ver field of struct v3d_device_info); see

The GPU of the respective Raspberry Pi versions seems to use version 4.2 (value 42).

With this knowledge, you can run the example:

./vc6qpudisas 42 <<< '0x54001f4038f91fbf'
add  r0, r1, r2      ; fmul  rf61, rf62, rf63

Other examples can, as mentioned above, be found at src/broadcom/qpu/tests/qpu_disasm.c · main · Mesa / mesa · GitLab [visited 2022-07-24T14:17:12Z].

Tutorials

VideoCore IV

Setup

Set up an SD card with the Raspberry Pi OS (formerly called Raspbian) based on Debian 10 (“buster”). The newer versions of Raspberry Pi OS that are based on Debian 11 (“bullseye”) don't seem to work because they don't support the fkms driver. See BULLSEYE: "dtoverlay=vc4-fkms-v3d" OR "dtoverlay=vc4-kms-v3d" That is one of many Questions - Raspberry Pi Forums [visited 2022-06-21T18:31:19Z] for details. See also "New" old functionality with Raspberry Pi OS (Legacy) - Raspberry Pi [visited 2022-06-21T18:31:31Z].

Then do the following steps:

Just for the sake of completeness, here is a table of the possible settings and whether the hello_fft.bin demo that we run in the section "Run the FFT example" works with each of them:

Setting in /boot/config.txt | Corresponding setting in raspi-config (6 Advanced Options → A2 GL Driver) | Does hello_fft.bin (section "Run the FFT example") work?
dtoverlay=… commented out | G1 Legacy | Yes
dtoverlay=vc4-fkms-v3d | G2 GL (Fake KMS) | Yes
dtoverlay=vc4-kms-v3d | G3 GL (Full KMS) | No
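
As a rough illustration, the table can be expressed as a lookup keyed by the active dtoverlay line. The following Python sketch is not part of the original setup. The names HELLO_FFT_RESULT and gl_driver_setting are hypothetical, and the parser makes simplifying assumptions: it only looks for vc4-fkms-v3d and vc4-kms-v3d, treats # as the comment character, lets the last matching line win, and ignores conditional [section] filters in config.txt.

```python
# Results copied from the table above: overlay name -> (raspi-config entry, works?).
# None stands for "no vc4 dtoverlay line is active" (commented out).
HELLO_FFT_RESULT = {
    None: ("G1 Legacy", True),
    "vc4-fkms-v3d": ("G2 GL (Fake KMS)", True),
    "vc4-kms-v3d": ("G3 GL (Full KMS)", False),
}


def gl_driver_setting(config_text):
    """Return the last active vc4 overlay name in config_text, or None."""
    active = None
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.startswith("dtoverlay="):
            continue
        # An overlay line may carry comma-separated parameters after the name.
        name = line[len("dtoverlay="):].split(",", 1)[0].strip()
        if name in ("vc4-fkms-v3d", "vc4-kms-v3d"):
            active = name
    return active


if __name__ == "__main__":
    example = "#dtoverlay=vc4-kms-v3d\ndtoverlay=vc4-fkms-v3d\n"
    entry, works = HELLO_FFT_RESULT[gl_driver_setting(example)]
    print(entry, "->", "hello_fft.bin works" if works else "hello_fft.bin fails")
```

On a Raspberry Pi, the text of /boot/config.txt could be passed to gl_driver_setting instead of the example string.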

Run the FFT example

Sources:

The Raspberry Pi OS version based on “Buster” contains a folder /opt/vc. If it is missing (or, more interestingly, you want to use the latest available version), you can create it manually with the following steps:

It is best to copy /opt/vc/src/hello_pi/hello_fft to your home folder:

cp -r /opt/vc/src/hello_pi/hello_fft ~

Now run

cd ~/hello_fft
make
sudo ./hello_fft.bin 8

The command sudo mknod char_dev c 100 0 that is mentioned in the blog article linked above is no longer necessary.

Explanation of the FFT example

An explanation of the FFT example by the original author Andrew Holme with lots of additional information and links can be found at GPU_FFT [visited 2022-11-02T09:52:58Z].

QPU assembler code

Mailbox functionality

TODO

Instruction set

Registers
Instructions

An instruction consists of 64 bits (8 bytes) in little-endian format.
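
To make the byte order concrete, here is a small Python sketch that converts between a 64-bit instruction word, written as in the disassembler example above, and its 8-byte representation. The helper names are hypothetical, and the sketch assumes that the whole instruction is stored as a single little-endian 64-bit value.

```python
import struct


def instruction_to_bytes(word):
    """Pack a 64-bit instruction word into 8 little-endian bytes."""
    return struct.pack("<Q", word)


def bytes_to_instruction(raw):
    """Unpack 8 little-endian bytes into a 64-bit instruction word."""
    (word,) = struct.unpack("<Q", raw)
    return word


if __name__ == "__main__":
    word = 0x10025020CC9E7081  # VideoCore IV example used with vc4qpudisasm
    raw = instruction_to_bytes(word)
    print(raw.hex())  # → 81709ecc20500210
    assert bytes_to_instruction(raw) == word
```

The least significant byte (0x81) comes first in memory, the most significant byte (0x10) last.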

VideoCore VI

Instruction set

Width (bits) | 6 | 26
Field | type | ...

Width (bits) | 32
Field | ...

The type field determines the type of instruction (branch instruction or ALU instruction):

type | Instruction type
= 000000 | branch instruction (b)
≠ 000000 | ALU instruction

Table: Encoding of the type field
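
The distinction can be sketched in a few lines of Python. This is only an illustration under one assumption: the fields of the layout are listed from the most significant bit downwards, so that type occupies bits 63 to 58 of the 64-bit word. The function name instruction_type is hypothetical.

```python
def instruction_type(word):
    """Classify a 64-bit VideoCore VI instruction word by its type field."""
    type_field = (word >> 58) & 0x3F  # assumed: the six most significant bits
    return "branch" if type_field == 0 else "alu"


if __name__ == "__main__":
    # VideoCore VI example from the section on running the disassembler.
    print(instruction_type(0x54001F4038F91FBF))  # → alu
```

The sketch checks only the type field. The branch layout additionally contains fixed bits directly after type, which a stricter decoder would also verify.
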

Branch instructions

Width (bits) | 6 | 2 | 21 | 3
Field | 000000 | 10 | addr_low | cond

Width (bits) | 8 | 1 | 2 | 3 | 3 | 1 | 2 | 6 | 6
Field | addr_high | - | msfign | - | bdu | ub | bdi | raddr_a | -
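
The field widths above can be turned into a generic unpacking routine. The following Python sketch assumes that the fields are listed from the most significant bit (bit 63) down to bit 0; with this assumption ub ends up at bit 14, which agrees with the remark on the bdu field further below. The names BRANCH_LAYOUT and unpack_fields are hypothetical, and "fixed" is our label for the constant bits 10.

```python
# (name, width in bits), most significant field first; None marks unnamed bits.
BRANCH_LAYOUT = [
    ("type", 6), ("fixed", 2), ("addr_low", 21), ("cond", 3),
    ("addr_high", 8), (None, 1), ("msfign", 2), (None, 3),
    ("bdu", 3), ("ub", 1), ("bdi", 2), ("raddr_a", 6), (None, 6),
]


def unpack_fields(word, layout):
    """Split a 64-bit word into named fields according to layout."""
    fields = {}
    shift = 64
    for name, width in layout:
        shift -= width  # position of the field's least significant bit
        if name is not None:
            fields[name] = (word >> shift) & ((1 << width) - 1)
    if shift != 0:
        raise ValueError("layout does not cover exactly 64 bits")
    return fields


if __name__ == "__main__":
    # Hand-built word: fixed bits 10, cond = 110, ub = 1, bdi = 11.
    word = (0b10 << 56) | (0b110 << 32) | (1 << 14) | (0b11 << 12)
    print(unpack_fields(word, BRANCH_LAYOUT))
```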

For the value of the cond field:

cond | Suffix | Explanation
000 | - | unconditional branch
010 | .a0 | TODO
011 | .na0 | TODO
100 | .alla | TODO
101 | .anyna | TODO
110 | .anya | TODO
111 | .allna | TODO

Table: Encoding of the cond field

The disassembler in libbroadcom_qpu.a decodes the value 001 in the cond field identically to 000 (unconditional branch).
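
As a sketch, the cond table can be written as a lookup that produces the branch mnemonic with its suffix. The code assumes that cond occupies bits 34 to 32 (derived from the field widths of the branch layout, counted from bit 63 downwards) and mirrors the observation that 001 is treated like 000. The names COND_SUFFIX and branch_mnemonic are hypothetical.

```python
# Suffixes from the cond table; 0b001 is handled like 0b000 (see the note above).
COND_SUFFIX = {
    0b000: "",
    0b001: "",
    0b010: ".a0",
    0b011: ".na0",
    0b100: ".alla",
    0b101: ".anyna",
    0b110: ".anya",
    0b111: ".allna",
}


def branch_mnemonic(word):
    """Return 'b' plus the condition suffix of a branch instruction word."""
    cond = (word >> 32) & 0x7  # assumed position: bits 34..32
    return "b" + COND_SUFFIX[cond]


if __name__ == "__main__":
    print(branch_mnemonic(0b110 << 32))  # → b.anya
```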

For the msfign field:

msfign | Suffix | Explanation (qpu_instr.h)
00 | - | Ignore multisample flags when determining branch condition.
01 | p (pixel) | If no multisample flags are set in the lane (a pixel in the FS, a vertex in the VS), ignore the lane's condition when computing the branch condition.
10 | q (quad) | If no multisample flags are set in a 2x2 quad in the FS, ignore the quad's a/b conditions.
11 | | invalid

Table: Encoding of the msfign field
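
The msfign table translates into a similar lookup. This Python sketch assumes that msfign occupies bits 22 to 21 (again derived from the field widths of the branch layout) and rejects the invalid value 11. The names MSFIGN_SUFFIX and msfign_suffix are hypothetical.

```python
# Suffix letters from the msfign table; the value 0b11 is invalid.
MSFIGN_SUFFIX = {0b00: "", 0b01: "p", 0b10: "q"}


def msfign_suffix(word):
    """Return the msfign suffix letter of a branch instruction word."""
    msfign = (word >> 21) & 0x3  # assumed position: bits 22..21
    if msfign not in MSFIGN_SUFFIX:
        raise ValueError("msfign value 11 is invalid")
    return MSFIGN_SUFFIX[msfign]


if __name__ == "__main__":
    print(msfign_suffix(0b10 << 21))  # → q
```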

The bdi field encodes the type of destination:

bdi | Type | Name | Parameter
00 | absolute | zero_addr+… | offset (from addr)
01 | relative | | offset (from addr)
10 | link register | lri | -
11 | register file | rf… | raddr_a

Table: Encoding of the bdi field

If ub = 1 (bit 14), the bdu field has the following encoding:

bdu | Type | Name | Parameter
00 | absolute | a:unif | -
01 | relative | r:unif | -
10 | link register | lri | -
11 | register file | rf… | raddr_a

Table: Encoding of the bdu field
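
The interplay of bdi, ub and bdu can be sketched as follows. The bit positions (raddr_a at bits 11..6, bdi at 13..12, ub at 14, bdu at 17..15) are derived from the field widths of the branch layout and are an assumption apart from ub, whose position is stated above. The bdu field is three bits wide, while the table lists only the values 00 to 11, so the sketch reports larger values as "unknown". The names DEST_TYPE and branch_destinations are hypothetical.

```python
# Destination types shared by the bdi and bdu tables.
DEST_TYPE = {
    0b00: "absolute",
    0b01: "relative",
    0b10: "link register",
    0b11: "register file",
}


def branch_destinations(word):
    """Describe the bdi destination and, if ub is set, the bdu destination."""
    raddr_a = (word >> 6) & 0x3F
    bdi = (word >> 12) & 0x3
    ub = (word >> 14) & 0x1
    bdu = (word >> 15) & 0x7
    result = {"bdi": DEST_TYPE[bdi], "bdu": None, "raddr_a": None}
    if ub:
        # Values above 0b11 are not listed in the bdu table.
        result["bdu"] = DEST_TYPE.get(bdu, "unknown")
    # raddr_a is only a parameter for register file destinations.
    if "register file" in (result["bdi"], result["bdu"]):
        result["raddr_a"] = raddr_a
    return result


if __name__ == "__main__":
    print(branch_destinations((0b11 << 12) | (5 << 6)))
```
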

ALU instructions

Width (bits) | 6 | 5 | 21
Field | type | sig | ...

Width (bits) | 20 | 6 | 6
Field | ... | raddr_a | raddr_b
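
The named fields of this layout can be extracted with a few shifts and masks. The following Python sketch assumes, as before, that the fields are listed from bit 63 downwards, so that type is at bits 63..58, sig at 57..53, raddr_a at 11..6 and raddr_b at 5..0. The function name alu_fields is hypothetical. The test word is the VideoCore VI example from the section on running the disassembler.

```python
def alu_fields(word):
    """Extract the named fields of a VideoCore VI ALU instruction word."""
    return {
        "type": (word >> 58) & 0x3F,
        "sig": (word >> 53) & 0x1F,
        "raddr_a": (word >> 6) & 0x3F,
        "raddr_b": word & 0x3F,
    }


if __name__ == "__main__":
    print(alu_fields(0x54001F4038F91FBF))
    # → {'type': 21, 'sig': 0, 'raddr_a': 62, 'raddr_b': 63}
```

The values 62 and 63 for raddr_a and raddr_b are consistent with the operands rf62 and rf63 that the disassembler prints for this word.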

TODO

GPGPU

Idein Inc. (GitHub page: Idein Inc. · GitHub [visited 2022-07-22T19:46:58Z]) has done a lot of work on GPGPU for the VideoCore VI.

We consider the repository py-videocore6: GitHub - Idein/py-videocore6: Python library for GPGPU programming on Raspberry Pi 4 [visited 2022-07-22T19:52:46Z].

TODO

Various material

Vulkan on the Raspberry Pi:

VideoCore VI: