GPU Programming: Raspberry Pi (VideoCore IV, VideoCore VI, VideoCore VII)

By Wolfgang Keller
Draft
Originally written 2021-11-01
Last modified 2024-03-06

VideoCore versions

There exist three versions of the VideoCore GPU whose instruction sets differ quite a bit:

A more detailed list can be found at Raspberry Pi Documentation - Processors [visited 2022-06-18T17:43:32Z].

Resources

Here are some resources from the internet on the instruction sets of the VideoCore versions:

VideoCore IV:

VideoCore VI:

Another library of Mesa that we use is libmesa_util.a. Its source code can be found at src/util · main · Mesa / mesa · GitLab [visited 2022-08-29T21:59:22Z].

Links to test cases for the instruction sets are given further below.

Disassemblers

Build disassemblers

A disassembler for machine instructions of the VideoCore IV and VideoCore VI QPUs can be found at nubok/vcqpudisasm [visited 2022-08-28T10:50:40Z]. This repository was inspired by Terminus-IMRC/vc6qpudisas: Disassembler of VideoCore VI QPU [visited 2022-08-28T12:29:55Z].

Let's go through its installation instructions to set it up. The following steps were tested on Raspberry Pi OS (32 bit; based on Debian Bullseye) and Ubuntu 20.04 (WSL2; x86-64).

Build Mesa

First, set up Mesa. We need to install several packages that are required for building Mesa:

sudo apt-get install meson
sudo apt-get install python3-mako
sudo apt-get install libdrm-dev
sudo apt-get install flex
sudo apt-get install bison
sudo apt-get install libx11-dev
sudo apt-get install libxext-dev
sudo apt-get install libxfixes-dev
sudo apt-get install libxcb-glx0-dev
sudo apt-get install libxcb-shm0-dev
sudo apt-get install libx11-xcb-dev
sudo apt-get install libxcb-dri2-0-dev
sudo apt-get install libxcb-dri3-dev
sudo apt-get install libxcb-present-dev
sudo apt-get install libxshmfence-dev
sudo apt-get install libxxf86vm-dev
sudo apt-get install libxrandr-dev

If you do these steps in Ubuntu 20.04 or Ubuntu 22.04 LTS (for example under WSL2), you additionally need to install pkg-config:

sudo apt-get install pkg-config

Now run

git clone https://gitlab.freedesktop.org/mesa/mesa.git --depth=1
cd mesa/
git fetch --all --tags

Remark: In the line

git checkout tags/mesa-21.3.9 -b mesa21_3_9

or

git checkout tags/mesa-23.1.3 -b mesa23_1_3

replace 21.3.9 or 23.1.3 by a suitable other (more recent) version if necessary (similarly for your chosen branch name).

mkdir build
cd build
meson .. -Dgallium-drivers=vc4,v3d -Dvulkan-drivers=broadcom -Dplatforms=x11
ninja src/broadcom/qpu/libbroadcom_qpu.a src/util/libmesa_util.a src/gallium/drivers/vc4/libvc4.a
cd ../../

Build vcqpudisasm

If necessary, install cmake via

sudo apt install cmake

Start with

git clone https://github.com/nubok/vcqpudisasm.git
cd vcqpudisasm
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH="$(realpath ../../mesa);$(realpath ../../mesa/build)"
make

Run examples

Now for running some examples:

VideoCore IV: Run

./vc4qpudisasm <<< '0x10025020cc9e7081'
add rb0, r0, r2 ; v8adds r0, r0, r1

Other examples can be found at userland/host_applications/linux/apps/hello_pi/hello_fft/hex at master · raspberrypi/userland [visited 2022-08-28T12:51:09Z] (also available at firmware/opt/vc/src/hello_pi/hello_fft/hex at master · raspberrypi/firmware [visited 2022-08-28T12:52:14Z]).

VideoCore VI: There seem to exist at least four versions of V3D: 3.3, 4.0, 4.1 and 4.2 (corresponding to values 33, 40, 41 and 42 of the ver field of struct v3d_device_info). See

The GPU of the respective Raspberry Pi versions seems to use version 4.2 (value 42).

With this knowledge, you can run the example:

./vc6qpudisasm 42 <<< '0x54001f4038f91fbf'
add  r0, r1, r2      ; fmul  rf61, rf62, rf63

Other examples can, as mentioned above, be found at src/broadcom/qpu/tests/qpu_disasm.c · main · Mesa / mesa · GitLab [visited 2022-07-24T14:17:12Z].

Instruction set of VideoCore IV

Registers

Instructions

An instruction consists of 64 bits (8 bytes) in little-endian format.

Width (bits) | 4   | 28
Field        | sig | ...

Width (bits) | 32
Field        | ...
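As a minimal sketch of what "64 bits in little-endian format" means in practice, the following Python snippet unpacks raw bytes into 64-bit instruction words with the standard struct module. The function name read_instructions is hypothetical and not part of any tool mentioned in this document; the sample bytes are simply the little-endian encoding of the instruction 0x10025020cc9e7081 from the disassembler example above.

```python
import struct
from typing import List


def read_instructions(raw: bytes) -> List[int]:
    """Split raw QPU machine code into 64-bit instruction words.

    Assumption: every instruction occupies 8 bytes, least significant
    byte first (little-endian), as stated in the text.
    """
    if len(raw) % 8 != 0:
        raise ValueError("machine code length must be a multiple of 8 bytes")
    # '<Q' = one unsigned 64-bit integer in little-endian byte order
    return [word for (word,) in struct.iter_unpack("<Q", raw)]


# Little-endian byte sequence of the instruction 0x10025020cc9e7081
sample = bytes.fromhex("81709ecc20500210")
print([hex(word) for word in read_instructions(sample)])
```

The resulting integers can then be inspected with ordinary shifts and masks.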

The sig field determines the type of instruction (branch instruction, load immediate instruction or ALU instruction):

sig         | Instruction type
= 1111      | branch instruction (b)
= 1110      | load immediate instruction
≠ 1110,1111 | ALU instruction
Table: Encoding of the sig field
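The classification in the table above can be sketched in a few lines of Python. This is only an illustration: the function name vc4_instruction_type is hypothetical, and the code assumes that the first row of the layout describes the upper 32 bits of the instruction, so that sig occupies the four most significant bits (bits 60 to 63).

```python
def vc4_instruction_type(instr: int) -> str:
    """Classify a 64-bit VideoCore IV instruction by its sig field.

    Assumption: sig is stored in the four most significant bits.
    """
    sig = (instr >> 60) & 0xF
    if sig == 0b1111:
        return "branch"
    if sig == 0b1110:
        return "load immediate"
    return "ALU"


# The instruction from the vc4qpudisasm example has sig = 0001.
print(vc4_instruction_type(0x10025020CC9E7081))
```

For the instruction from the vc4qpudisasm example, sig is 0001, which matches the ALU disassembly (add ... ; v8adds ...) shown there.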

Branch instructions

TODO

Load immediate instructions

TODO

ALU instructions

TODO

Instruction set of VideoCore VI

Registers

TODO

Instructions

An instruction consists of 64 bits (8 bytes) in little-endian format.

Width (bits) | 6      | 26
Field        | op_mul | ...

Width (bits) | 32
Field        | ...

The op_mul field determines the type of instruction (branch instruction or ALU instruction):

op_mul   | Instruction type
= 000000 | branch instruction (b)
≠ 000000 | ALU instruction
Table: Encoding of the op_mul field
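A minimal Python sketch of this check follows. The function name v3d_instruction_type is hypothetical, and the code assumes that op_mul occupies the six most significant bits (bits 58 to 63) of the 64-bit instruction word, as the layout above suggests. It implements only the rule from the table and performs no further validity checks.

```python
def v3d_instruction_type(instr: int) -> str:
    """Classify a 64-bit VideoCore VI instruction by its op_mul field.

    Assumption: op_mul is stored in the six most significant bits.
    """
    op_mul = (instr >> 58) & 0x3F
    return "branch" if op_mul == 0 else "ALU"


# The instruction from the vc6qpudisasm example has op_mul = 010101.
print(v3d_instruction_type(0x54001F4038F91FBF))
```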

Branch instructions

Width (bits) | 6      | 2  | 21       | 3
Field        | 000000 | 10 | addr_low | cond

Width (bits) | 8         | 1 | 2      | 3 | 3   | 1  | 2   | 6       | 6
Field        | addr_high | - | msfign | - | bdu | ub | bdi | raddr_a | -

For the value of the cond field:

cond | Suffix | Explanation
000  | -      | unconditional branch
010  | .a0    | above 0 (?)
011  | .na0   | not above 0 (?)
100  | .alla  | all above (?)
101  | .anyna | any not above (?)
110  | .anya  | any above (?)
111  | .allna | all not above (?)
Table: Encoding of the cond field

The disassembler in libbroadcom_qpu.a decodes 001 in the cond field identically to 000 (unconditional branch).
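The cond table can be turned into a small lookup, sketched here in Python. The names BRANCH_COND_SUFFIX and branch_cond_suffix are hypothetical. The code assumes that cond occupies the lowest three bits of the upper 32-bit word (bits 32 to 34 of the instruction), following the branch layout above, and it treats 001 like 000 as described for the disassembler.

```python
# Suffixes from the cond table; 001 is treated like 000 (no suffix).
BRANCH_COND_SUFFIX = {
    0b000: "",
    0b001: "",
    0b010: ".a0",
    0b011: ".na0",
    0b100: ".alla",
    0b101: ".anyna",
    0b110: ".anya",
    0b111: ".allna",
}


def branch_cond_suffix(instr: int) -> str:
    """Return the condition suffix of a 64-bit branch instruction.

    Assumption: cond is stored in bits 32..34 of the instruction word.
    """
    cond = (instr >> 32) & 0x7
    return BRANCH_COND_SUFFIX[cond]


print(repr(branch_cond_suffix(0b110 << 32)))
```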

msfign field

For the msfign field:

msfign | Suffix    | Explanation (qpu_instr.h)
00     | -         | Ignore multisample flags when determining branch condition.
01     | p (pixel) | If no multisample flags are set in the lane (a pixel in the FS, a vertex in the VS), ignore the lane's condition when computing the branch condition.
10     | q (quad)  | If no multisample flags are set in a 2x2 quad in the FS, ignore the quad's a/b conditions.
11     | invalid
Table: Encoding of the msfign field

bdi field

The bdi field encodes the type of destination:

bdi | Type          | Name        | Parameter
00  | absolute      | zero_addr+… | offset (from addr)
01  | relative      |             | offset (from addr)
10  | link register | lri         | -
11  | register file | rf…         | raddr_a
Table: Encoding of the bdi field

bdu field

If ub = 1 (bit 14), the bdu field has the following encoding:

bdu | Type          | Name   | Parameter
00  | absolute      | a:unif | -
01  | relative      | r:unif | -
10  | link register | lri    | -
11  | register file | rf…    | raddr_a
Table: Encoding of the bdu field
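To illustrate how the second row of the branch layout can be read, here is a Python sketch that splits the lower 32-bit word into its fields from left (most significant) to right. The helper split_fields, the constant BRANCH_LOW_WORD and the function decode_branch_low_word are hypothetical names; the field widths are copied from the layout above, and unnamed fields (shown as "-") are skipped.

```python
from typing import Dict, List, Optional, Tuple

# (name, width) pairs of the lower 32-bit word, most significant field first.
# None marks the unnamed fields shown as "-" in the layout.
BRANCH_LOW_WORD: List[Tuple[Optional[str], int]] = [
    ("addr_high", 8), (None, 1), ("msfign", 2), (None, 3),
    ("bdu", 3), ("ub", 1), ("bdi", 2), ("raddr_a", 6), (None, 6),
]


def split_fields(word: int, layout, total_bits: int = 32) -> Dict[str, int]:
    """Cut a word into bit fields, walking from the top bit downwards."""
    fields = {}
    shift = total_bits
    for name, width in layout:
        shift -= width
        if name is not None:
            fields[name] = (word >> shift) & ((1 << width) - 1)
    if shift != 0:
        raise ValueError("field widths do not add up to total_bits")
    return fields


def decode_branch_low_word(instr: int) -> Dict[str, int]:
    """Decode the lower 32 bits of a 64-bit branch instruction."""
    return split_fields(instr & 0xFFFFFFFF, BRANCH_LOW_WORD)


# Hand-built example word: ub = 1 ends up in bit 14, as stated in the text.
example = (0xAB << 24) | (1 << 21) | (2 << 15) | (1 << 14) | (3 << 12) | (5 << 6)
print(decode_branch_low_word(example))
```

Walking the widths from left to right keeps the code close to the layout, so a mistyped width is caught by the final sum check.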

ALU instructions

Width (bits) | 6      | 5   | 7    | 1  | 1  | 6       | 6
Field        | op_mul | sig | cond | mm | ma | waddr_m | waddr_a

Width (bits) | 8      | 3     | 3     | 3     | 3     | 6       | 6
Field        | op_add | mul_b | mul_a | add_b | add_a | raddr_a | raddr_b
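The following Python sketch extracts all ALU fields from a 64-bit instruction word. The function name decode_alu_fields is hypothetical; the bit positions are derived from the field widths in the layout above, under the assumption that the first row describes the upper 32 bits. The sample value is the instruction from the vc6qpudisasm example.

```python
from typing import Dict


def decode_alu_fields(instr: int) -> Dict[str, int]:
    """Extract the bit fields of a 64-bit VideoCore VI ALU instruction.

    Assumption: shifts follow the layout above, upper 32-bit word first.
    """
    def bits(shift: int, width: int) -> int:
        return (instr >> shift) & ((1 << width) - 1)

    return {
        "op_mul": bits(58, 6),
        "sig": bits(53, 5),
        "cond": bits(46, 7),
        "mm": bits(45, 1),
        "ma": bits(44, 1),
        "waddr_m": bits(38, 6),
        "waddr_a": bits(32, 6),
        "op_add": bits(24, 8),
        "mul_b": bits(21, 3),
        "mul_a": bits(18, 3),
        "add_b": bits(15, 3),
        "add_a": bits(12, 3),
        "raddr_a": bits(6, 6),
        "raddr_b": bits(0, 6),
    }


fields = decode_alu_fields(0x54001F4038F91FBF)
for name, value in fields.items():
    print(f"{name:8} = {value}")
```

For this instruction the sketch yields waddr_a = 0, add_a = 1 and add_b = 2, as well as waddr_m = 61, raddr_a = 62 and raddr_b = 63, which fits the disassembly "add r0, r1, r2 ; fmul rf61, rf62, rf63" shown earlier.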

op_mul field

TODO

op_add field

TODO

sig field

For the sig field (signaling bits) see the declarations

in src/broadcom/qpu/qpu_pack.c · main · Mesa / mesa · GitLab [visited 2022-12-30T23:29:25Z]. The disassembler code can be found at src/broadcom/qpu/qpu_disasm.c · main · Mesa / mesa · GitLab [visited 2023-07-20T20:07:53Z] (v3d_qpu_disasm_sig, which calls v3d_qpu_disasm_sig_addr if necessary).

Version 3.3:

Signal columns (grouped under MISC, R3, R4 and R5): thrsw, small_imm (smimm), ucb, rotate (rot), ldvary, ldvpm, ldtlb, ldtlbu, ldtmu, ldunif

sig         | Code
00000       |
00001       | ; thrsw
00010       | ; ldunif
00011       | ; thrsw; ldunif
00100       | ; ldtmu
00101       | ; thrsw; ldtmu
00110       | ; ldtmu; ldunif
00111       | ; thrsw; ldtmu; ldunif
01000       | ; ldvary
01001       | ; thrsw; ldvary
01010       | ; ldvary; ldunif
01011       | ; thrsw; ldvary; ldunif
01100       | ; ldvary; ldtmu
01101       | ; thrsw; ldvary; ldtmu
01110       | TODO
01111       | TODO
10000       | ; ldtlb
10001       | ; ldtlbu
10010-10101 | reserved
10110       | TODO
10111       | TODO
11000       | ; ldvpm
11001       | ; thrsw; ldvpm
11010       | ; ldvpm; ldunif
11011       | ; thrsw; ldvpm; ldunif
11100       | ; ldvpm; ldtmu
11101       | ; thrsw; ldvpm; ldtmu
11110       | TODO
11111       | TODO
Table: Encoding of the sig field (version 3.3)
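One observation that can be read off the version 3.3 table: for the values 00000 to 01101, the low four bits behave like independent flags (bit 0 = thrsw, bit 1 = ldunif, bit 2 = ldtmu, bit 3 = ldvary). The Python sketch below reproduces the Code column for exactly these rows. This is only a pattern derived from the rows listed above, not a statement about how Mesa implements the mapping, and the function name sig_code_v33_low is hypothetical.

```python
def sig_code_v33_low(sig: int) -> str:
    """Rebuild the Code column of the version 3.3 table for sig 00000..01101.

    Observation from the table only: in this range the bits act as flags.
    Values outside the range do not follow the pattern and are rejected.
    """
    if not 0 <= sig <= 0b01101:
        raise ValueError("flag pattern only covers sig values 00000..01101")
    parts = []
    # The order matches the order used in the Code column of the table.
    if sig & 0b0001:
        parts.append("thrsw")
    if sig & 0b1000:
        parts.append("ldvary")
    if sig & 0b0100:
        parts.append("ldtmu")
    if sig & 0b0010:
        parts.append("ldunif")
    return "".join("; " + part for part in parts)


for value in range(0b01110):
    print(f"{value:05b} {sig_code_v33_low(value)}")
```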

Version 4.0:

Signal columns (grouped under MISC, R3, R4 and R5): thrsw, small_imm (smimm), ucb, rotate (rot), ldvary, ldtlb, ldtlbu, ldtmu, ldunif, wrtmuc

sig         | Code
00000       |
00001       | ; thrsw
00010       | ; ldunif
00011       | ; thrsw; ldunif
00100       | ; ldtmu
00101       | ; thrsw; ldtmu
00110       | ; ldtmu; ldunif
00111       | ; thrsw; ldtmu; ldunif
01000       | ; ldvary
01001       | ; thrsw; ldvary
01010       | ; ldvary; ldunif
01011       | ; thrsw; ldvary; ldunif
01100-01101 | reserved
01110       | TODO
01111       | TODO
10000       | ; ldtlb
10001       | ; ldtlbu
10010       | ; wrtmuc
10011       | ; thrsw; wrtmuc
10100       | ; ldvary; wrtmuc
10101       | ; thrsw; ldvary; wrtmuc
10110       | TODO
10111       | TODO
11000-11110 | reserved
11111       | TODO
Table: Encoding of the sig field (version 4.0)

Version ≥4.1: From version 4.1 on, the disassembly of some of these signals uses another value, the sig_addr; their cells are highlighted with a colored background.

Signal columns (the first four are grouped under MISC): thrsw, small_imm (smimm), ucb, rotate (rot), ldvary, ldtlb, ldtlbu, ldtmu, ldunif, ldunifrf, ldunifa, ldunifarf, wrtmuc

sig         | Code
00000       |
00001       | ; thrsw
00010       | ; ldunif
00011       | ; thrsw; ldunif
00100       | ; ldtmu
00101       | ; thrsw; ldtmu
00110       | ; ldtmu; ldunif
00111       | ; thrsw; ldtmu; ldunif
01000       | ; ldvary
01001       | ; thrsw; ldvary
01010       | ; ldvary; ldunif
01011       | ; thrsw; ldvary; ldunif
01100       | ; ldunifrf
01101       | ; thrsw; ldunifrf
01110       | TODO
01111       | TODO
10000       | ; ldtlb
10001       | ; ldtlbu
10010       | ; wrtmuc
10011       | ; thrsw; wrtmuc
10100       | ; ldvary; wrtmuc
10101       | ; thrsw; ldvary; wrtmuc
10110       | TODO
10111       | TODO
11000       | ; ldunifa
11001       | ; ldunifarf
11010-11110 | reserved
11111       | TODO
Table: Encoding of the sig field (version ≥4.1)
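For experiments it can be handy to have the version ≥4.1 table as a lookup. The Python sketch below contains only the rows that are filled in above; the TODO and reserved values are deliberately left out and yield None. The names SIG_V41 and sig_code_v41 are hypothetical, and the sig_addr handling mentioned above is not modelled.

```python
from typing import Dict, List, Optional

# Rows of the version >= 4.1 table; TODO and reserved values are omitted.
SIG_V41: Dict[int, List[str]] = {
    0b00000: [],
    0b00001: ["thrsw"],
    0b00010: ["ldunif"],
    0b00011: ["thrsw", "ldunif"],
    0b00100: ["ldtmu"],
    0b00101: ["thrsw", "ldtmu"],
    0b00110: ["ldtmu", "ldunif"],
    0b00111: ["thrsw", "ldtmu", "ldunif"],
    0b01000: ["ldvary"],
    0b01001: ["thrsw", "ldvary"],
    0b01010: ["ldvary", "ldunif"],
    0b01011: ["thrsw", "ldvary", "ldunif"],
    0b01100: ["ldunifrf"],
    0b01101: ["thrsw", "ldunifrf"],
    0b10000: ["ldtlb"],
    0b10001: ["ldtlbu"],
    0b10010: ["wrtmuc"],
    0b10011: ["thrsw", "wrtmuc"],
    0b10100: ["ldvary", "wrtmuc"],
    0b10101: ["thrsw", "ldvary", "wrtmuc"],
    0b11000: ["ldunifa"],
    0b11001: ["ldunifarf"],
}


def sig_code_v41(sig: int) -> Optional[str]:
    """Return the Code column for a 5-bit sig value, or None if not listed."""
    signals = SIG_V41.get(sig & 0x1F)
    if signals is None:
        return None
    return "".join("; " + name for name in signals)


print(sig_code_v41(0b10101))
```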

Connecting to application code

VideoCore IV

Mailbox functionality

TODO

VideoCore VI

TODO

Libraries

VideoCore general

V3DLib

V3DLib: C++ library for programming the VideoCore GPU on all Raspberry Pi's. Builds upon QPULib; wimrijnders/V3DLib: C++ library for programming the VideoCore GPU on all Raspberry Pi's. [visited 2023-09-29T11:16:00Z]

VideoCore IV

py-videocore

See also py-videocore6.

py-videocore: Idein/py-videocore: Python library for GPGPU on Raspberry Pi [visited 2024-01-06T20:55:59Z].

QPULib

QPULib: Language and compiler for the Raspberry Pi GPU; mn416/QPULib: Language and compiler for the Raspberry Pi GPU [visited 2023-09-29T11:15:34Z]

VideoCore VI

py-videocore6

See also py-videocore.

Idein Inc. (GitHub page: Idein Inc. · GitHub [visited 2022-07-22T19:46:58Z]) did a lot of work for GPGPU on the VideoCore VI.

We consider the repository py-videocore6: GitHub - Idein/py-videocore6: Python library for GPGPU programming on Raspberry Pi 4 [visited 2022-07-22T19:52:46Z].

GPGPU examples

VideoCore IV

FFT example

Setup

Set up an SD card with the Raspberry Pi OS (formerly called Raspbian) based on Debian 10 (“buster”). The newer versions of Raspberry Pi OS that are based on Debian 11 (“bullseye”) don't seem to work because they don't support the fkms driver. See BULLSEYE: "dtoverlay=vc4-fkms-v3d" OR "dtoverlay=vc4-kms-v3d" That is one of many Questions - Raspberry Pi Forums [visited 2022-06-21T18:31:19Z] for details. See also "New" old functionality with Raspberry Pi OS (Legacy) - Raspberry Pi [visited 2022-06-21T18:31:31Z].

Then do the following steps:

Just for the sake of completeness, here is a table with the various possible settings and whether the hello_fft.bin demo that we run further below works with each setting:

Setting in /boot/config.txt | Corresponding setting in raspi-config (6 Advanced Options > A2 GL Driver) | Does hello_fft.bin work?
dtoverlay=… commented out   | G1 Legacy        | Yes
dtoverlay=vc4-fkms-v3d      | G2 GL (Fake KMS) | Yes
dtoverlay=vc4-kms-v3d       | G3 GL (Full KMS) | No

Run the FFT example

Sources:

The Raspberry Pi OS version based on “Buster” contains a folder /opt/vc. If it is missing (or - more interesting - you want to use the latest available version), you can create it manually by doing the following steps:

It is best to copy /opt/vc/src/hello_pi/hello_fft to your home folder:

cp -r /opt/vc/src/hello_pi/hello_fft ~

Now run

cd ~/hello_fft
make
sudo ./hello_fft.bin 8

(the command sudo mknod char_dev c 100 0 that is mentioned in the above linked blog article is not necessary anymore).

Explanation of the FFT example

An explanation of the FFT example by the original author Andrew Holme with lots of additional information and links can be found at GPU_FFT [visited 2022-11-02T09:52:58Z].

Various material

VideoCore IV

QPU assembler code

VC4ASM/vcio2

OpenCL

VideoCore VI

Vulkan on the Raspberry Pi:

VideoCore VI: