By Wolfgang Keller
Draft
Originally written 2021-11-01
Last modified 2024-07-05
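There exist three versions of the VideoCore GPU whose instruction sets differ quite a bit: VideoCore IV (Raspberry Pi 1, 2, 3 and Zero), VideoCore VI (Raspberry Pi 4) and VideoCore VII (Raspberry Pi 5).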
A more detailed list can be found at Raspberry Pi Documentation - Processors [visited 2022-06-18T17:43:32Z].
Here are some resources from the internet on the instruction sets of the VideoCore versions:
VideoCore IV:
src/gallium/drivers/vc4 · main · Mesa / mesa · GitLab [visited 2022-08-05T17:41:09Z]: source code of the vc4 driver that is built as libvc4.a. This library contains a disassembler for VideoCore IV instructions:
Also of interest is vc4_qpu_validate, which validates a GPU program. For this function not to degenerate into a trivial return, the DEBUG macro has to be defined when building libvc4.a.
VideoCore VI:
Another library of Mesa that we use is libmesa_util.a. Its source code can be found at src/util · main · Mesa / mesa · GitLab [visited 2022-08-29T21:59:22Z].
Links to test cases for the instruction sets are given further below.
A disassembler for machine instructions of the VideoCore IV and VideoCore VI QPUs can be found at nubok/vcqpudisasm [visited 2022-08-28T10:50:40Z]. This repository was inspired by Terminus-IMRC/vc6qpudisas: Disassembler of VideoCore VI QPU [visited 2022-08-28T12:29:55Z].
Let's go through its install instructions to set it up. The following steps were tested on Raspberry Pi OS (32 bit; based on Debian Bullseye) and Ubuntu 20.04 (WSL2; x86-64).
First, set up Mesa. This requires various packages for building Mesa. For Ubuntu 24.04 LTS (for example under WSL2) these are
sudo apt-get install meson
sudo apt-get install python3-mako
sudo apt-get install libdrm-dev
sudo apt-get install flex
sudo apt-get install bison
sudo apt-get install libx11-dev
sudo apt-get install libxext-dev
sudo apt-get install libxfixes-dev
sudo apt-get install libxcb-glx0-dev
sudo apt-get install libxcb-shm0-dev
sudo apt-get install libx11-xcb-dev
sudo apt-get install libxcb-dri2-0-dev
sudo apt-get install libxcb-dri3-dev
sudo apt-get install libxcb-present-dev
sudo apt-get install libxshmfence-dev
sudo apt-get install libxxf86vm-dev
sudo apt-get install libxrandr-dev
sudo apt-get install pkg-config
or in short
sudo apt-get install meson python3-mako libdrm-dev flex bison libx11-dev libxext-dev libxfixes-dev libxcb-glx0-dev libxcb-shm0-dev libx11-xcb-dev libxcb-dri2-0-dev libxcb-dri3-dev libxcb-present-dev libxshmfence-dev libxxf86vm-dev libxrandr-dev pkg-config
Optionally (but recommended), also install zlib1g-dev and cmake. The latter is needed to build the disassembler further below:
sudo apt install zlib1g-dev cmake
Now run
git clone https://gitlab.freedesktop.org/mesa/mesa.git --depth=1
cd mesa/
git fetch --all --tags
Check out the correct tag:
git checkout tags/mesa-24.1.3 -b mesa24_1_3
Remark: In this command, replace 24.1.3 by a suitable other (more recent) version if necessary, and adjust the branch name accordingly.
Then build the required Mesa libraries:
mkdir build
cd build
meson setup .. -Dgallium-drivers=vc4,v3d -Dvulkan-drivers=broadcom -Dplatforms=x11
ninja src/broadcom/qpu/libbroadcom_qpu.a src/util/libmesa_util.a src/gallium/drivers/vc4/libvc4.a
cd ../..
Run
git clone https://github.com/nubok/vcqpudisasm.git
cd vcqpudisasm
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH="$(realpath ../../mesa);$(realpath ../../mesa/build)"
make
Now let's run some examples:
VideoCore IV: Run
./vc4qpudisasm <<< '0x10025020cc9e7081'
which outputs
add rb0, r0, r2 ; v8adds r0, r0, r1
Other examples can be found at userland/host_applications/linux/apps/hello_pi/hello_fft/hex at master · raspberrypi/userland [visited 2022-08-28T12:51:09Z].
VideoCore VI:
There seem to be at least four versions of V3D: 3.3, 4.0, 4.1 and 4.2. They correspond to the values 33, 40, 41 and 42 of the ver field of struct v3d_device_info. See the struct v3d_device_info data structure. The definition of the function v3d_qpu_sig_unpack references the versions 3.3, 4.0 and 4.1 (and 7.1).
The GPU of the respective Raspberry Pi versions seems to use version 4.2 (value 42).
With this knowledge, you can run the example
./vc6qpudisasm 42 <<< '0x54001f4038f91fbf'
which outputs
add r0, r1, r2 ; fmul rf61, rf62, rf63
Other examples can, as mentioned above, be found at src/broadcom/qpu/tests/qpu_disasm.c · main · Mesa / mesa · GitLab [visited 2022-07-24T14:17:12Z].
VideoCore VII:
The GPU of the respective Raspberry Pi versions seems to use version 7.1 (value 71). See src/broadcom/qpu/qpu_pack.c · main · Mesa / mesa · GitLab [visited 2024-07-04T19:28:41Z]. There, the definition of the function v3d_qpu_sig_unpack references the version 7.1 (and 3.3, 4.0).
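The ver values listed above follow a simple pattern: the tens digit is the major version and the units digit is the minor version. The following Python sketch converts between the two forms. It assumes that ver == 10 × major + minor, which holds for all values mentioned here (33, 40, 41, 42, 71). The function names are hypothetical. The mapping from VideoCore generation to ver value only restates the tentative findings above.

```python
# Hypothetical helpers; assumes ver == 10 * major + minor, which holds for
# all values mentioned in the text (33, 40, 41, 42, 71).

# Tentative mapping, restating the "seems to use" findings from the text.
VER_BY_GENERATION = {"VideoCore VI": 42, "VideoCore VII": 71}


def ver_to_version(ver: int) -> str:
    """Convert a ver value of struct v3d_device_info (e.g. 42) to "4.2"."""
    major, minor = divmod(ver, 10)
    return f"{major}.{minor}"


def version_to_ver(version: str) -> int:
    """Convert a version string such as "4.2" to the ver value 42."""
    major, minor = (int(part) for part in version.split("."))
    return 10 * major + minor


print(ver_to_version(VER_BY_GENERATION["VideoCore VI"]))  # 4.2
```

The value printed here (42 for VideoCore VI) is the one passed as the first argument to vc6qpudisasm in the example above.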
An instruction consists of 64 bits (8 bytes) in little-endian format.
4 | 28 |
---|---|
sig | ... |
32 |
---|
... |
The sig field determines the type of instruction (branch instruction, load immediate instruction or ALU instruction):
sig | Instruction type |
---|---|
= 1111 | branch instruction (b) |
= 1110 | load immediate instruction |
≠ 1110, 1111 | ALU instruction |
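Below is a minimal Python sketch of how the sig field (bits 63 to 60) classifies a VideoCore IV instruction. To connect this with the vc4qpudisasm example above, it also decodes the ALU word 0x10025020cc9e7081.

The ALU format is still a TODO in this text, so the following are assumptions taken from Broadcom's VideoCore IV 3D Architecture Reference Guide:

- the ALU field positions,
- the opcode tables,
- the register naming rules.

All function names are hypothetical. Conditions, packing, signals and small immediates are ignored.

```python
# Sketch of a VideoCore IV QPU instruction decoder.
# Only the position of sig (bits 63..60) comes from the table above; the ALU
# field positions and opcode names are assumptions based on the VideoCore IV
# 3D Architecture Reference Guide. Conditions, packing, signals and small
# immediates are ignored, and nop is printed with its (unused) operands.

ADD_OPS = {
    0: "nop", 1: "fadd", 2: "fsub", 3: "fmin", 4: "fmax", 5: "fminabs",
    6: "fmaxabs", 7: "ftoi", 8: "itof", 12: "add", 13: "sub", 14: "shr",
    15: "asr", 16: "ror", 17: "shl", 18: "min", 19: "max", 20: "and",
    21: "or", 22: "xor", 23: "not", 24: "clz", 30: "v8adds", 31: "v8subs",
}
MUL_OPS = ["nop", "fmul", "mul24", "v8muld", "v8min", "v8max", "v8adds", "v8subs"]


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 64-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def instruction_type(word: int) -> str:
    """Classify an instruction by its sig field, as in the table above."""
    sig = field(word, 63, 60)
    if sig == 0b1111:
        return "branch"
    if sig == 0b1110:
        return "load immediate"
    return "alu"


def write_name(waddr: int, regfile: str) -> str:
    # 0..31 address register file A or B, 32..35 the accumulators r0..r3;
    # other (I/O) write addresses are not named in this sketch.
    if waddr < 32:
        return f"r{regfile}{waddr}"
    if waddr < 36:
        return f"r{waddr - 32}"
    return f"waddr{waddr}"


def mux_name(mux: int, raddr_a: int, raddr_b: int) -> str:
    # Input muxes 0..5 select r0..r5, 6 register file A, 7 register file B.
    # Read addresses >= 32 (I/O registers) are printed as plain numbers.
    if mux < 6:
        return f"r{mux}"
    return f"ra{raddr_a}" if mux == 6 else f"rb{raddr_b}"


def disasm_alu(word: int) -> str:
    if instruction_type(word) != "alu":
        raise ValueError("not an ALU instruction")
    ws = field(word, 44, 44)
    raddr_a, raddr_b = field(word, 23, 18), field(word, 17, 12)
    # ws = 1 swaps the register files written by the add and mul pipelines.
    add_rf, mul_rf = ("b", "a") if ws else ("a", "b")
    add = "{} {}, {}, {}".format(
        ADD_OPS.get(field(word, 28, 24), "?"),
        write_name(field(word, 43, 38), add_rf),
        mux_name(field(word, 11, 9), raddr_a, raddr_b),
        mux_name(field(word, 8, 6), raddr_a, raddr_b),
    )
    mul = "{} {}, {}, {}".format(
        MUL_OPS[field(word, 31, 29)],
        write_name(field(word, 37, 32), mul_rf),
        mux_name(field(word, 5, 3), raddr_a, raddr_b),
        mux_name(field(word, 2, 0), raddr_a, raddr_b),
    )
    return f"{add} ; {mul}"


print(disasm_alu(0x10025020CC9E7081))  # add rb0, r0, r2 ; v8adds r0, r0, r1
```

For this word, sig = 0001 (an ALU instruction), and the printed text matches the vc4qpudisasm output shown earlier.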
TODO
TODO
TODO
TODO
An instruction consists of 64 bits (8 bytes) in little-endian format.
6 | 26 |
---|---|
op_mul | ... |
32 |
---|
... |
The op_mul field determines the type of instruction (branch instruction or ALU instruction):
op_mul | Instruction type |
---|---|
= 000000 | branch instruction (b) |
≠ 000000 | ALU instruction |
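The two statements above can be combined in a small Python sketch:

- "64 bits in little-endian format" means that a program buffer can be split into instruction words with struct.
- The op_mul field (bits 63 to 58) then tells branch and ALU instructions apart.

The function names are hypothetical. The sketch checks only op_mul, exactly as the table does, although the branch layout below also fixes the next two bits to 10.

```python
import struct


def words_from_bytes(data: bytes):
    """Split a QPU program into 64-bit instruction words (little-endian)."""
    if len(data) % 8 != 0:
        raise ValueError("program size must be a multiple of 8 bytes")
    return [word for (word,) in struct.iter_unpack("<Q", data)]


def instruction_type(word: int) -> str:
    """Classify a VideoCore VI instruction by op_mul (bits 63..58)."""
    op_mul = word >> 58
    return "branch" if op_mul == 0 else "alu"


# The example instruction from above, as its 8 bytes appear in memory.
program = bytes.fromhex("bf1ff938401f0054")
for word in words_from_bytes(program):
    print(f"0x{word:016x}", instruction_type(word))  # 0x54001f4038f91fbf alu
```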
6 | 2 | 21 | 3 |
---|---|---|---|
000000 | 10 | addr_low | cond |
8 | 1 | 2 | 3 | 3 | 1 | 2 | 6 | 6 |
---|---|---|---|---|---|---|---|---|
addr_high | - | msfign | - | bdu | ub | bdi | raddr_a | - |
For the value of the cond field:
cond | Suffix | Explanation |
---|---|---|
000 | - | unconditional branch |
010 | .a0 | above 0 (?) |
011 | .na0 | not above 0 (?) |
100 | .alla | all above (?) |
101 | .anyna | any not above (?) |
110 | .anya | any above (?) |
111 | .allna | all not above (?) |
The disassembler in libbroadcom_qpu.a decodes 001 in the cond field identically to 000 (unconditional branch).
For the msfign field:
msfign | Suffix | Explanation (qpu_instr.h) |
---|---|---|
00 | - | Ignore multisample flags when determining branch condition. |
01 | p (pixel) | If no multisample flags are set in the lane (a pixel in the FS, a vertex in the VS), ignore the lane's condition when computing the branch condition. |
10 | q (quad) | If no multisample flags are set in a 2x2 quad in the FS, ignore the quad's a/b conditions. |
11 | - | invalid |
The bdi field encodes the type of destination:
bdi | Type | Name | Parameter |
---|---|---|---|
00 | absolute | zero_addr+… | offset (from addr ) |
01 | relative | … | offset (from addr ) |
10 | link register | lri | - |
11 | register file | rf… | raddr_a |
If ub = 1 (bit 14), the 3-bit bdu field has the following encoding:
bdu | Type | Name | Parameter |
---|---|---|---|
000 | absolute | a:unif | - |
001 | relative | r:unif | - |
010 | link register | lri | - |
011 | register file | rf… | raddr_a |
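The branch tables above can be turned into a small decoder. The following Python sketch assumes the V3D 4.x branch layout shown above. Two details are based on my reading of Mesa's branch unpacking code in qpu_pack.c and should be treated as assumptions:

- The offset is assembled from addr_low (as bits 23 to 3) and addr_high (as bits 31 to 24).
- A relative offset is printed as a signed 32-bit number.

The output format imitates the style of Mesa's branch disassembly (for example b.anynaq), but the exact spacing is an assumption. All names are hypothetical.

```python
# Sketch of a VideoCore VI (V3D 4.x) branch decoder following the tables above.
# The textual format imitates Mesa's branch disassembly; spacing is assumed.

COND_SUFFIX = {0: "", 1: "", 2: ".a0", 3: ".na0", 4: ".alla",
               5: ".anyna", 6: ".anya", 7: ".allna"}  # 001 decodes like 000
MSFIGN_SUFFIX = {0: "", 1: "p", 2: "q"}  # 11 is invalid


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 64-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def is_branch(word: int) -> bool:
    # op_mul == 000000 followed by the two bits 10 (see the branch layout)
    return field(word, 63, 58) == 0 and field(word, 57, 56) == 0b10


def disasm_branch(word: int) -> str:
    if not is_branch(word):
        raise ValueError("not a branch instruction")
    msfign = field(word, 22, 21)
    if msfign not in MSFIGN_SUFFIX:
        raise ValueError("invalid msfign")
    ub = field(word, 14, 14)
    bdi, bdu = field(word, 13, 12), field(word, 17, 15)
    raddr_a = field(word, 11, 6)
    # Assumption: addr_low gives offset bits 23..3, addr_high bits 31..24.
    offset = (field(word, 55, 35) << 3) | (field(word, 31, 24) << 24)

    text = "b" + ("u" if ub else "") + COND_SUFFIX[field(word, 34, 32)]
    text += MSFIGN_SUFFIX[msfign]
    if bdi == 0:
        text += f"  zero_addr+0x{offset:08x}"
    elif bdi == 1:
        signed = offset - (1 << 32) if offset & (1 << 31) else offset
        text += f"  {signed}"
    elif bdi == 2:
        text += "  lri"
    else:
        text += f"  rf{raddr_a}"

    if ub:  # second destination, encoded in bdu
        bdu_text = {0: "a:unif", 1: "r:unif", 2: "lri", 3: f"rf{raddr_a}"}
        if bdu not in bdu_text:
            raise ValueError("unknown bdu value")
        text += ", " + bdu_text[bdu]
    return text


print(disasm_branch(0x020000050040E000))  # bu.anynaq  lri, r:unif
```

In this example, cond = 101 gives .anyna, msfign = 10 gives q, bdi = 10 selects lri, and ub = 1 with bdu = 001 adds r:unif.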
6 | 5 | 7 | 1 | 1 | 6 | 6 |
---|---|---|---|---|---|---|
op_mul | sig | cond | mm | ma | waddr_m | waddr_a |
8 | 3 | 3 | 3 | 3 | 6 | 6 |
---|---|---|---|---|---|---|
op_add | mul_b | mul_a | add_b | add_a | raddr_a | raddr_b |
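Because the widths in the two ALU layout tables add up to 32 bits each, the fields can be cut out of the word one after another, starting at the most significant bit. The Python sketch below does this for the example word 0x54001f4038f91fbf. It relies on one assumption, based on Mesa's V3D 4.x definitions: mux values 0 to 5 select the accumulators r0 to r5, 6 selects raddr_a and 7 selects raddr_b. The function names are hypothetical.

```python
# Sketch: split a VideoCore VI ALU instruction into the fields of the table
# above. Widths are listed from the most significant bit downwards.
ALU_LAYOUT = [
    ("op_mul", 6), ("sig", 5), ("cond", 7), ("mm", 1), ("ma", 1),
    ("waddr_m", 6), ("waddr_a", 6),                       # bits 63..32
    ("op_add", 8), ("mul_b", 3), ("mul_a", 3), ("add_b", 3),
    ("add_a", 3), ("raddr_a", 6), ("raddr_b", 6),         # bits 31..0
]


def unpack_alu(word: int) -> dict:
    fields = {}
    shift = 64
    for name, width in ALU_LAYOUT:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    assert shift == 0, "field widths must add up to 64"
    return fields


def mux_operand(mux: int, fields: dict) -> str:
    # Assumption (V3D 4.x): muxes 0..5 select the accumulators r0..r5,
    # 6 the register file address raddr_a and 7 raddr_b.
    if mux < 6:
        return f"r{mux}"
    return f"rf{fields['raddr_a'] if mux == 6 else fields['raddr_b']}"


ex = unpack_alu(0x54001F4038F91FBF)  # add r0, r1, r2 ; fmul rf61, rf62, rf63
print(ex["op_add"], mux_operand(ex["add_a"], ex), mux_operand(ex["add_b"], ex))
# 56 r1 r2
print(ex["op_mul"], ex["waddr_m"], mux_operand(ex["mul_a"], ex), mux_operand(ex["mul_b"], ex))
# 21 61 rf62 rf63
```

The operands r1, r2, rf62, rf63 and the destination rf61 agree with the vc6qpudisasm output. The sketch does not derive that op_add = 56 means add and op_mul = 21 means fmul; that information is read off the disassembler output.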
TODO
TODO
For the sig field (signaling bits), see the declarations v33_sig_map, v40_sig_map and v41_sig_map in src/broadcom/qpu/qpu_pack.c · main · Mesa / mesa · GitLab [visited 2024-07-04T19:28:41Z].
The disassembler code can be found at src/broadcom/qpu/qpu_disasm.c · main · Mesa / mesa · GitLab [visited 2023-07-20T20:07:53Z] (v3d_qpu_disasm_sig, which calls v3d_qpu_disasm_sig_addr if necessary).
Version 3.3:
Signals (grouped into MISC, R3, R4 and R5): thrsw, small_imm (smimm), ucb, rotate (rot), ldvary, ldvpm, ldtlb, ldtlbu, ldtmu, ldunif.
sig | Code |
---|---|
00000 | |
00001 | ; thrsw |
00010 | ; ldunif |
00011 | ; thrsw; ldunif |
00100 | ; ldtmu |
00101 | ; thrsw; ldtmu |
00110 | ; ldtmu; ldunif |
00111 | ; thrsw; ldtmu; ldunif |
01000 | ; ldvary |
01001 | ; thrsw; ldvary |
01010 | ; ldvary; ldunif |
01011 | ; thrsw; ldvary; ldunif |
01100 | ; ldvary; ldtmu |
01101 | ; thrsw; ldvary; ldtmu |
01110 | TODO |
01111 | TODO |
10000 | ; ldtlb |
10001 | ; ldtlbu |
10010-10101 | reserved |
10110 | TODO |
10111 | TODO |
11000 | ; ldvpm |
11001 | ; thrsw; ldvpm |
11010 | ; ldvpm; ldunif |
11011 | ; thrsw; ldvpm; ldunif |
11100 | ; ldvpm; ldtmu |
11101 | ; thrsw; ldvpm; ldtmu |
11110 | TODO |
11111 | TODO |
Version 4.0:
Signals (grouped into MISC, R3, R4 and R5): thrsw, small_imm (smimm), ucb, rotate (rot), ldvary, ldtlb, ldtlbu, ldtmu, ldunif, wrtmuc.
sig | Code |
---|---|
00000 | |
00001 | ; thrsw |
00010 | ; ldunif |
00011 | ; thrsw; ldunif |
00100 | ; ldtmu |
00101 | ; thrsw; ldtmu |
00110 | ; ldtmu; ldunif |
00111 | ; thrsw; ldtmu; ldunif |
01000 | ; ldvary |
01001 | ; thrsw; ldvary |
01010 | ; ldvary; ldunif |
01011 | ; thrsw; ldvary; ldunif |
01100-01101 | reserved |
01110 | TODO |
01111 | TODO |
10000 | ; ldtlb |
10001 | ; ldtlbu |
10010 | ; wrtmuc |
10011 | ; thrsw; wrtmuc |
10100 | ; ldvary; wrtmuc |
10101 | ; thrsw; ldvary; wrtmuc |
10110 | TODO |
10111 | TODO |
11000-11110 | reserved |
11111 | TODO |
Version ≥4.1: From version 4.1 on, the disassembly of some of these signals additionally uses another value, the sig_addr.
Signals: thrsw, small_imm (smimm), ucb, rotate (rot) (MISC), ldvary, ldtlb, ldtlbu, ldtmu, ldunif, ldunifrf, ldunifa, ldunifarf, wrtmuc.
sig | Code |
---|---|
00000 | |
00001 | ; thrsw |
00010 | ; ldunif |
00011 | ; thrsw; ldunif |
00100 | ; ldtmu |
00101 | ; thrsw; ldtmu |
00110 | ; ldtmu; ldunif |
00111 | ; thrsw; ldtmu; ldunif |
01000 | ; ldvary |
01001 | ; thrsw; ldvary |
01010 | ; ldvary; ldunif |
01011 | ; thrsw; ldvary; ldunif |
01100 | ; ldunifrf |
01101 | ; thrsw; ldunifrf |
01110 | TODO |
01111 | TODO |
10000 | ; ldtlb |
10001 | ; ldtlbu |
10010 | ; wrtmuc |
10011 | ; thrsw; wrtmuc |
10100 | ; ldvary; wrtmuc |
10101 | ; thrsw; ldvary; wrtmuc |
10110 | TODO |
10111 | TODO |
11000 | ; ldunifa |
11001 | ; ldunifarf |
11010-11110 | reserved |
11111 | TODO |
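The three sig tables share most of their rows. The following Python sketch transcribes them into lookup tables: a common part plus per-version additions. It then renders the Code column. Details:

- Rows still marked TODO and reserved values are not included, so the lookup returns None for them.
- Only ver values 33, 40, 41 and 42 are accepted, since version 7.1 is not tabulated here.
- The function names are hypothetical.

```python
# Sketch: signals encoded in the 5-bit sig field, transcribed from the three
# tables above. TODO rows and reserved values are deliberately left out.

_COMMON = {  # identical in versions 3.3, 4.0 and >= 4.1
    0b00000: [], 0b00001: ["thrsw"],
    0b00010: ["ldunif"], 0b00011: ["thrsw", "ldunif"],
    0b00100: ["ldtmu"], 0b00101: ["thrsw", "ldtmu"],
    0b00110: ["ldtmu", "ldunif"], 0b00111: ["thrsw", "ldtmu", "ldunif"],
    0b01000: ["ldvary"], 0b01001: ["thrsw", "ldvary"],
    0b01010: ["ldvary", "ldunif"], 0b01011: ["thrsw", "ldvary", "ldunif"],
    0b10000: ["ldtlb"], 0b10001: ["ldtlbu"],
}
_V33_ONLY = {
    0b01100: ["ldvary", "ldtmu"], 0b01101: ["thrsw", "ldvary", "ldtmu"],
    0b11000: ["ldvpm"], 0b11001: ["thrsw", "ldvpm"],
    0b11010: ["ldvpm", "ldunif"], 0b11011: ["thrsw", "ldvpm", "ldunif"],
    0b11100: ["ldvpm", "ldtmu"], 0b11101: ["thrsw", "ldvpm", "ldtmu"],
}
_WRTMUC = {  # versions 4.0 and >= 4.1
    0b10010: ["wrtmuc"], 0b10011: ["thrsw", "wrtmuc"],
    0b10100: ["ldvary", "wrtmuc"], 0b10101: ["thrsw", "ldvary", "wrtmuc"],
}
_V41_ONLY = {
    0b01100: ["ldunifrf"], 0b01101: ["thrsw", "ldunifrf"],
    0b11000: ["ldunifa"], 0b11001: ["ldunifarf"],
}

SIG_TABLES = {
    33: {**_COMMON, **_V33_ONLY},
    40: {**_COMMON, **_WRTMUC},
    41: {**_COMMON, **_WRTMUC, **_V41_ONLY},
}


def signals(ver: int, sig: int):
    """Return the signal names for sig, or None if reserved or still TODO."""
    if not 0 <= sig < 32:
        raise ValueError("sig is a 5-bit field")
    if ver not in (33, 40, 41, 42):
        raise ValueError("only versions 3.3 to 4.2 are tabulated above")
    return SIG_TABLES[min(ver, 41)].get(sig)  # 4.2 uses the >= 4.1 table


def sig_code(ver: int, sig: int) -> str:
    """Render the Code column of the tables, e.g. '; thrsw; ldunif'."""
    names = signals(ver, sig)
    if names is None:
        raise ValueError("reserved or not yet tabulated sig value")
    return "".join("; " + name for name in names)


print(sig_code(33, 0b11101))  # ; thrsw; ldvpm; ldtmu
```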
TODO
TODO
V3DLib: C++ library for programming the VideoCore GPU on all Raspberry Pi's. Builds upon QPULib; wimrijnders/V3DLib: C++ library for programming the VideoCore GPU on all Raspberry Pi's. [visited 2023-09-29T11:16:00Z]
See also py-videocore6.
py-videocore: Idein/py-videocore: Python library for GPGPU on Raspberry Pi [visited 2024-01-06T20:55:59Z].
QPULib: Language and compiler for the Raspberry Pi GPU; mn416/QPULib: Language and compiler for the Raspberry Pi GPU [visited 2023-09-29T11:15:34Z]
See also py-videocore.
Idein Inc. (GitHub page: Idein Inc. · GitHub [visited 2022-07-22T19:46:58Z]) did a lot of work for GPGPU on the VideoCore VI.
We consider the repository py-videocore6: GitHub - Idein/py-videocore6: Python library for GPGPU programming on Raspberry Pi 4 [visited 2022-07-22T19:52:46Z].
Set up an SD card with the Raspberry Pi OS (formerly called Raspbian) based on Debian 10 (“buster”). The newer versions of Raspberry Pi OS that are based on Debian 11 (“bullseye”) don't seem to work because they don't support the fkms driver. See BULLSEYE: "dtoverlay=vc4-fkms-v3d" OR "dtoverlay=vc4-kms-v3d" That is one of many Questions - Raspberry Pi Forums [visited 2022-06-21T18:31:19Z] for details. See also "New" old functionality with Raspberry Pi OS (Legacy) - Raspberry Pi [visited 2022-06-21T18:31:31Z].
Then do the following steps:
Set dtoverlay=vc4-fkms-v3d (not to be confused with dtoverlay=vc4-kms-v3d) in /boot/config.txt, for example via sudo pico /boot/config.txt. Change the line if dtoverlay is set to a different value. Alternatively, run sudo raspi-config → 6 Advanced Options → A2 GL Driver → G2 GL (Fake KMS).
For the sake of completeness, here is a table of the possible settings and whether the hello_fft.bin demo that we run further below works with each of them:
Setting in /boot/config.txt | Corresponding setting in raspi-config (6 Advanced Options → A2 GL Driver) | Does hello_fft.bin work? |
---|---|---|
dtoverlay=… commented out | G1 Legacy | Yes |
dtoverlay=vc4-fkms-v3d | G2 GL (Fake KMS) | Yes |
dtoverlay=vc4-kms-v3d | G3 GL (Full KMS) | No |
Sources:
The Raspberry Pi OS version based on “Buster” contains a folder /opt/vc. You can create it manually with the following steps if it is missing, or (more interestingly) if you want to use the latest available version:
It is best to copy /opt/vc/src/hello_pi/hello_fft to your home folder:
cp -r /opt/vc/src/hello_pi/hello_fft ~
Now run
cd ~/hello_fft
make
sudo ./hello_fft.bin 8
(the command sudo mknod char_dev c 100 0 that is mentioned in the above linked blog article is not necessary anymore).
An explanation of the FFT example by the original author Andrew Holme with lots of additional information and links can be found at GPU_FFT [visited 2022-11-02T09:52:58Z].
Hacking The GPU For Fun And Profit [visited 2021-10-31T22:40:09Z] (SHA-256)
Parts:
Vulkan on the Raspberry Pi:
VideoCore VI: