By Wolfgang Keller
Draft
Originally written 2021-11-01
Last modified 2023-09-30
There exist two versions of the VideoCore GPU whose instruction sets differ quite a lot:
A more detailed list can be found at Raspberry Pi Documentation - Processors [visited 2022-06-18T17:43:32Z].
For building the disassembler and analyzing the instruction set, we use the following resources:
VideoCore IV:
src/gallium/drivers/vc4 · main · Mesa / mesa · GitLab [visited 2022-08-05T17:41:09Z]: source code of the vc4 driver that is built as libvc4.a. This library contains a disassembler for VideoCore IV instructions:
Also of interest is vc4_qpu_validate, which validates a GPU program. For this function not to degenerate into a trivial return, the DEBUG macro has to be defined when building libvc4.a.
VideoCore VI:
Another library of Mesa that we use is libmesa_util.a. Its source code can be found at src/util · main · Mesa / mesa · GitLab [visited 2022-08-29T21:59:22Z].
Links to test cases for the instruction sets are documented further below.
A disassembler for machine instructions of the VideoCore IV and VideoCore VI QPUs can be found at nubok/vcqpudisasm [visited 2022-08-28T10:50:40Z]. This repository was inspired by Terminus-IMRC/vc6qpudisas: Disassembler of VideoCore VI QPU [visited 2022-08-28T12:29:55Z].
Let's go through its install instructions to set it up. The following steps were tested on Raspberry Pi OS (32 bit; based on Debian Bullseye) and Ubuntu 20.04 (WSL2; x86-64).
First set up Mesa. We need to install various packages that are required for building Mesa:
sudo apt-get install meson
sudo apt-get install python3-mako
sudo apt-get install libdrm-dev
sudo apt-get install flex
sudo apt-get install bison
sudo apt-get install libx11-dev
sudo apt-get install libxext-dev
sudo apt-get install libxfixes-dev
sudo apt-get install libxcb-glx0-dev
sudo apt-get install libxcb-shm0-dev
sudo apt-get install libx11-xcb-dev
sudo apt-get install libxcb-dri2-0-dev
sudo apt-get install libxcb-dri3-dev
sudo apt-get install libxcb-present-dev
sudo apt-get install libxshmfence-dev
sudo apt-get install libxxf86vm-dev
sudo apt-get install libxrandr-dev
If you do these steps in Ubuntu 20.04 or Ubuntu 22.04 LTS (for example under WSL2), you additionally need to install pkg-config:
sudo apt-get install pkg-config
Now run
git clone https://gitlab.freedesktop.org/mesa/mesa.git --depth=1
cd mesa/
git fetch --all --tags
Remark: In the line
git checkout tags/mesa-21.3.9 -b mesa21_3_9
or
git checkout tags/mesa-23.1.3 -b mesa23_1_3
replace 21.3.9 or 23.1.3 by a suitable other (more recent) version if necessary (similarly for your chosen branch name).
mkdir build
cd build
meson .. -Dgallium-drivers=vc4,v3d -Dvulkan-drivers=broadcom -Dplatforms=x11
ninja src/broadcom/qpu/libbroadcom_qpu.a src/util/libmesa_util.a src/gallium/drivers/vc4/libvc4.a
cd ../../
If necessary, install cmake via
sudo apt install cmake
Start with
git clone https://github.com/nubok/vcqpudisasm.git
cd vcqpudisasm
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH="$(realpath ../../mesa);$(realpath ../../mesa/build)"
make
Now for running some examples:
VideoCore IV: Run
./vc4qpudisasm <<< '0x10025020cc9e7081'
add rb0, r0, r2 ; v8adds r0, r0, r1
Other examples can be found at userland/host_applications/linux/apps/hello_pi/hello_fft/hex at master · raspberrypi/userland [visited 2022-08-28T12:51:09Z] (also available at firmware/opt/vc/src/hello_pi/hello_fft/hex at master · raspberrypi/firmware [visited 2022-08-28T12:52:14Z]).
VideoCore VI:
There seem to exist at least four versions of V3D: 3.3, 4.0, 4.1 and 4.2 (corresponding to the values 33, 40, 41 and 42 of the ver field of struct v3d_device_info). See the struct v3d_device_info data structure; in v3d_qpu_sig_unpack, the versions 3.3, 4.0 and 4.1 are referenced.
The GPU of the respective Raspberry Pi versions seems to use version 4.2 (value 42).
With this knowledge, you can run the example:
./vc6qpudisasm 42 <<< '0x54001f4038f91fbf'
add r0, r1, r2 ; fmul rf61, rf62, rf63
Other examples can, as mentioned above, be found at src/broadcom/qpu/tests/qpu_disasm.c · main · Mesa / mesa · GitLab [visited 2022-07-24T14:17:12Z].
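The two command lines above can also be driven from a script. The following is a minimal sketch, assuming that the binaries built above read one hexadecimal word from standard input (as in the examples) and that vc6qpudisasm takes the ver value as its first argument. The helper names ver_value, format_word and disassemble are hypothetical and not part of vcqpudisasm.

```python
import subprocess


def ver_value(version: str) -> int:
    """Convert a V3D version string such as "4.2" into the ver value 42."""
    major, minor = version.split(".")
    return int(major) * 10 + int(minor)


def format_word(word: int) -> str:
    """Format a 64-bit instruction word like the here-strings above."""
    return f"0x{word:016x}"


def disassemble(word: int, command: list[str]) -> str:
    """Feed one instruction word to a disassembler and return its output.

    `command` is the argument list to execute, for example
    ["./vc4qpudisasm"] or ["./vc6qpudisasm", str(ver_value("4.2"))].
    """
    result = subprocess.run(
        command,
        input=format_word(word) + "\n",  # a here-string also appends a newline
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```

If the tools behave as assumed, disassemble(0x54001f4038f91fbf, ["./vc6qpudisasm", str(ver_value("4.2"))]) should return the same line as the shell example above.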
A VideoCore IV instruction consists of 64 bits (8 bytes) in little-endian format.
4 | 28 |
---|---|
sig | ... |
32 |
---|
... |
The sig field determines the type of instruction (branch instruction, load immediate instruction or ALU instruction):
sig | Instruction type |
---|---|
= 1111 | branch instruction (b) |
= 1110 | load immediate instruction |
≠ 1110,1111 | ALU instruction |
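The classification by the sig field can be sketched in a few lines. This sketch assumes that the first layout row above shows the upper 32 bits with the most significant bit on the left, so that sig occupies bits 63..60. The function names are hypothetical.

```python
import struct


def vc4_instruction_type(word: int) -> str:
    """Classify a 64-bit VideoCore IV instruction word by its sig field."""
    sig = (word >> 60) & 0xF  # assumption: sig occupies the top 4 bits
    if sig == 0b1111:
        return "branch"
    if sig == 0b1110:
        return "load immediate"
    return "ALU"


def words_from_bytes(code: bytes) -> list[int]:
    """Split raw machine code into 64-bit little-endian instruction words.

    Trailing bytes that do not fill a whole instruction are ignored.
    """
    count = len(code) // 8
    return list(struct.unpack(f"<{count}Q", code[: count * 8]))
```

For the example word 0x10025020cc9e7081 used earlier, sig is 0001, so the sketch reports an ALU instruction, which agrees with the add/v8adds disassembly shown above.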
TODO
TODO
TODO
TODO
A VideoCore VI instruction consists of 64 bits (8 bytes) in little-endian format.
6 | 26 |
---|---|
op_mul | ... |
32 |
---|
... |
The op_mul field determines the type of instruction (branch instruction or ALU instruction):
op_mul | Instruction type |
---|---|
= 000000 | branch instruction (b) |
≠ 000000 | ALU instruction |
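As a minimal sketch of this rule, assuming that op_mul occupies the six most significant bits (63..58) of the 64-bit word, as the layout row above suggests (the function name is hypothetical):

```python
def v3d_instruction_type(word: int) -> str:
    """Classify a 64-bit VideoCore VI instruction word by its op_mul field."""
    op_mul = (word >> 58) & 0x3F  # assumption: op_mul is bits 63..58
    return "branch" if op_mul == 0 else "ALU"
```

For the example word 0x54001f4038f91fbf from above, op_mul is 010101, so the word is classified as an ALU instruction.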
6 | 2 | 21 | 3 |
---|---|---|---|
000000 | 10 | addr_low | cond |
8 | 1 | 2 | 3 | 3 | 1 | 2 | 6 | 6 |
---|---|---|---|---|---|---|---|---|
addr_high | - | msfign | - | bdu | ub | bdi | raddr_a | - |
For the value of the cond field:
cond | Suffix | Explanation |
---|---|---|
000 | - | unconditional branch |
010 | .a0 | above 0 (?) |
011 | .na0 | not above 0 (?) |
100 | .alla | all above (?) |
101 | .anyna | any not above (?) |
110 | .anya | any above (?) |
111 | .allna | all not above (?) |
The disassembler in libbroadcom_qpu.a decodes 001 in the cond field identically to 000 (unconditional branch).
For the msfign field:
msfign | Suffix | Explanation (qpu_instr.h) |
---|---|---|
00 | - | Ignore multisample flags when determining branch condition. |
01 | p (pixel) | If no multisample flags are set in the lane (a pixel in the FS, a vertex in the VS), ignore the lane's condition when computing the branch condition. |
10 | q (quad) | If no multisample flags are set in a 2x2 quad in the FS, ignore the quad's a/b conditions. |
11 | - | invalid |
The bdi field encodes the type of destination:
bdi | Type | Name | Parameter |
---|---|---|---|
00 | absolute | zero_addr+… | offset (from addr ) |
01 | relative | … | offset (from addr ) |
10 | link register | lri | - |
11 | register file | rf… | raddr_a |
If ub = 1 (bit 14), the bdu field has the following encoding:
bdu | Type | Name | Parameter |
---|---|---|---|
00 | absolute | a:unif | - |
01 | relative | r:unif | - |
10 | link register | lri | - |
11 | register file | rf… | raddr_a |
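The branch layout and the field tables above can be combined into a small decoder. This is only a sketch: the bit positions are derived from the two layout rows read from the most significant bit downwards (which matches the remark that ub is bit 14), bdu is returned as the raw value of its 3-bit field, and all names are hypothetical rather than taken from Mesa.

```python
from dataclasses import dataclass

# Suffixes from the cond table; 001 is decoded like 000 by libbroadcom_qpu.a.
COND_SUFFIX = {
    0b000: "", 0b001: "", 0b010: ".a0", 0b011: ".na0",
    0b100: ".alla", 0b101: ".anyna", 0b110: ".anya", 0b111: ".allna",
}
MSFIGN_SUFFIX = {0b00: "", 0b01: "p", 0b10: "q"}  # 0b11 is invalid
BDI_TYPE = {
    0b00: "absolute", 0b01: "relative",
    0b10: "link register", 0b11: "register file",
}


def bits(word: int, low: int, width: int) -> int:
    """Extract `width` bits of `word`, starting at bit position `low`."""
    return (word >> low) & ((1 << width) - 1)


@dataclass
class Branch:
    addr_low: int
    addr_high: int
    cond: str      # suffix from the cond table
    msfign: str    # suffix from the msfign table
    bdi: str       # destination type from the bdi table
    ub: bool
    bdu: int       # raw field value; only meaningful if ub is set
    raddr_a: int


def decode_branch(word: int) -> Branch:
    """Split a VideoCore VI branch instruction word into its fields."""
    if bits(word, 58, 6) != 0:
        raise ValueError("op_mul is not 000000: not a branch instruction")
    msfign = bits(word, 21, 2)
    if msfign == 0b11:
        raise ValueError("msfign = 11 is invalid")
    return Branch(
        addr_low=bits(word, 35, 21),
        addr_high=bits(word, 24, 8),
        cond=COND_SUFFIX[bits(word, 32, 3)],
        msfign=MSFIGN_SUFFIX[msfign],
        bdi=BDI_TYPE[bits(word, 12, 2)],
        ub=bool(bits(word, 14, 1)),
        bdu=bits(word, 15, 3),
        raddr_a=bits(word, 6, 6),
    )
```

The sketch deliberately does not combine addr_high and addr_low into one address, because the tables above do not say how the two parts are joined.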
6 | 5 | 7 | 1 | 1 | 6 | 6 |
---|---|---|---|---|---|---|
op_mul | sig | cond | mm | ma | waddr_m | waddr_a |
8 | 3 | 3 | 3 | 3 | 6 | 6 |
---|---|---|---|---|---|---|
op_add | mul_b | mul_a | add_b | add_a | raddr_a | raddr_b |
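To make the ALU layout concrete, here is a minimal sketch that slices a 64-bit word into the fields listed in the two rows above. It assumes the rows are ordered from the most significant bit to the least significant bit; the names ALU_FIELDS and decode_alu_fields are hypothetical.

```python
# (name, width) pairs, most significant field first, as in the two rows above.
ALU_FIELDS = [
    ("op_mul", 6), ("sig", 5), ("cond", 7), ("mm", 1), ("ma", 1),
    ("waddr_m", 6), ("waddr_a", 6),
    ("op_add", 8), ("mul_b", 3), ("mul_a", 3), ("add_b", 3), ("add_a", 3),
    ("raddr_a", 6), ("raddr_b", 6),
]


def decode_alu_fields(word: int) -> dict[str, int]:
    """Return the raw value of every ALU instruction field of `word`."""
    # The field widths have to add up to the full instruction width.
    assert sum(width for _, width in ALU_FIELDS) == 64
    fields = {}
    shift = 64
    for name, width in ALU_FIELDS:
        shift -= width  # move down to the lowest bit of this field
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

For the example word 0x54001f4038f91fbf, this yields waddr_m = 61, raddr_a = 62 and raddr_b = 63, which lines up with the rf61, rf62 and rf63 operands in the disassembly shown earlier.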
TODO
TODO
For the sig field (signaling bits), see the declarations v33_sig_map, v40_sig_map and v41_sig_map in src/broadcom/qpu/qpu_pack.c · main · Mesa / mesa · GitLab [visited 2022-12-30T23:29:25Z].
The disassembler code can be found at src/broadcom/qpu/qpu_disasm.c · main · Mesa / mesa · GitLab [visited 2023-07-20T20:07:53Z] (v3d_qpu_disasm_sig, which calls v3d_qpu_disasm_sig_addr if necessary).
Version 3.3:
Signals (grouped under MISC, R3, R4 and R5): thrsw, small_imm (smimm), ucb, rotate (rot), ldvary, ldvpm, ldtlb, ldtlbu, ldtmu, ldunif.
sig | Code |
---|---|
00000 | |
00001 | ; thrsw |
00010 | ; ldunif |
00011 | ; thrsw; ldunif |
00100 | ; ldtmu |
00101 | ; thrsw; ldtmu |
00110 | ; ldtmu; ldunif |
00111 | ; thrsw; ldtmu; ldunif |
01000 | ; ldvary |
01001 | ; thrsw; ldvary |
01010 | ; ldvary; ldunif |
01011 | ; thrsw; ldvary; ldunif |
01100 | ; ldvary; ldtmu |
01101 | ; thrsw; ldvary; ldtmu |
01110 | TODO |
01111 | TODO |
10000 | ; ldtlb |
10001 | ; ldtlbu |
10010-10101 | reserved |
10110 | TODO |
10111 | TODO |
11000 | ; ldvpm |
11001 | ; thrsw; ldvpm |
11010 | ; ldvpm; ldunif |
11011 | ; thrsw; ldvpm; ldunif |
11100 | ; ldvpm; ldtmu |
11101 | ; thrsw; ldvpm; ldtmu |
11110 | TODO |
11111 | TODO |
Version 4.0:
Signals (grouped under MISC, R3, R4 and R5): thrsw, small_imm (smimm), ucb, rotate (rot), ldvary, ldtlb, ldtlbu, ldtmu, ldunif, wrtmuc.
sig | Code |
---|---|
00000 | |
00001 | ; thrsw |
00010 | ; ldunif |
00011 | ; thrsw; ldunif |
00100 | ; ldtmu |
00101 | ; thrsw; ldtmu |
00110 | ; ldtmu; ldunif |
00111 | ; thrsw; ldtmu; ldunif |
01000 | ; ldvary |
01001 | ; thrsw; ldvary |
01010 | ; ldvary; ldunif |
01011 | ; thrsw; ldvary; ldunif |
01100-01101 | reserved |
01110 | TODO |
01111 | TODO |
10000 | ; ldtlb |
10001 | ; ldtlbu |
10010 | ; wrtmuc |
10011 | ; thrsw; wrtmuc |
10100 | ; ldvary; wrtmuc |
10101 | ; thrsw; ldvary; wrtmuc |
10110 | TODO |
10111 | TODO |
11000-11110 | reserved |
11111 | TODO |
Version ≥4.1: From version 4.1 on, the disassembly of some of these signals uses another value, the sig_addr; their cells are highlighted with a █-colored background.
Signals: MISC (thrsw, small_imm (smimm), ucb, rotate (rot)), ldvary, ldtlb, ldtlbu, ldtmu, ldunif, ldunifrf, ldunifa, ldunifarf, wrtmuc.
sig | Code |
---|---|
00000 | |
00001 | ; thrsw |
00010 | ; ldunif |
00011 | ; thrsw; ldunif |
00100 | ; ldtmu |
00101 | ; thrsw; ldtmu |
00110 | ; ldtmu; ldunif |
00111 | ; thrsw; ldtmu; ldunif |
01000 | ; ldvary |
01001 | ; thrsw; ldvary |
01010 | ; ldvary; ldunif |
01011 | ; thrsw; ldvary; ldunif |
01100 | ; ldunifrf |
01101 | ; thrsw; ldunifrf |
01110 | TODO |
01111 | TODO |
10000 | ; ldtlb |
10001 | ; ldtlbu |
10010 | ; wrtmuc |
10011 | ; thrsw; wrtmuc |
10100 | ; ldvary; wrtmuc |
10101 | ; thrsw; ldvary; wrtmuc |
10110 | TODO |
10111 | TODO |
11000 | ; ldunifa |
11001 | ; ldunifarf |
11010-11110 | reserved |
11111 | TODO |
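The rows of the table for version ≥4.1 that are already filled in can be expressed as a lookup table. The following sketch only mirrors those rows; it leaves out the TODO and reserved values and ignores sig_addr. The names V41_SIG and sig_suffix are hypothetical and are not the identifiers used in qpu_pack.c.

```python
# sig value -> signals, copied from the filled-in rows of the table above.
# Rows marked TODO or reserved are deliberately left out.
V41_SIG = {
    0b00000: (),
    0b00001: ("thrsw",),
    0b00010: ("ldunif",),
    0b00011: ("thrsw", "ldunif"),
    0b00100: ("ldtmu",),
    0b00101: ("thrsw", "ldtmu"),
    0b00110: ("ldtmu", "ldunif"),
    0b00111: ("thrsw", "ldtmu", "ldunif"),
    0b01000: ("ldvary",),
    0b01001: ("thrsw", "ldvary"),
    0b01010: ("ldvary", "ldunif"),
    0b01011: ("thrsw", "ldvary", "ldunif"),
    0b01100: ("ldunifrf",),
    0b01101: ("thrsw", "ldunifrf"),
    0b10000: ("ldtlb",),
    0b10001: ("ldtlbu",),
    0b10010: ("wrtmuc",),
    0b10011: ("thrsw", "wrtmuc"),
    0b10100: ("ldvary", "wrtmuc"),
    0b10101: ("thrsw", "ldvary", "wrtmuc"),
    0b11000: ("ldunifa",),
    0b11001: ("ldunifarf",),
}


def sig_suffix(sig: int) -> str:
    """Return the disassembly suffix (e.g. "; thrsw; ldunif") for a sig value."""
    if sig not in V41_SIG:
        raise KeyError(f"sig {sig:05b} is reserved or not documented here yet")
    return "".join(f"; {name}" for name in V41_SIG[sig])
```

Keeping the undocumented values out of the dictionary makes the sketch fail loudly instead of guessing a decoding for them.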
TODO
TODO
Set up an SD card with the Raspberry Pi OS (formerly called Raspbian) based on Debian 10 (“buster”). The newer versions of Raspberry Pi OS that are based on Debian 11 (“bullseye”) don't seem to work because they don't support the fkms driver. See BULLSEYE: "dtoverlay=vc4-fkms-v3d" OR "dtoverlay=vc4-kms-v3d" That is one of many Questions - Raspberry Pi Forums [visited 2022-06-21T18:31:19Z] for details. See also "New" old functionality with Raspberry Pi OS (Legacy) - Raspberry Pi [visited 2022-06-21T18:31:31Z].
Then do the following steps:
Set dtoverlay=vc4-fkms-v3d (not to be confused with dtoverlay=vc4-kms-v3d) in /boot/config.txt, for example via sudo pico /boot/config.txt (in case dtoverlay is set to a different value), or run sudo raspi-config → 6 Advanced Options → A2 GL Driver → G2 GL (Fake KMS).
Just for the sake of completeness, here is a table with the various possible settings and whether the hello_fft.bin demo that we run further below works with each of them:
Setting in /boot/config.txt | Corresponding setting in raspi-config (6 Advanced Options → A2 GL Driver) | Does hello_fft.bin work? |
---|---|---|
dtoverlay=… commented out | G1 Legacy | Yes |
dtoverlay=vc4-fkms-v3d | G2 GL (Fake KMS) | Yes |
dtoverlay=vc4-kms-v3d | G3 GL (Full KMS) | No |
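The table can be turned into a quick check of an existing configuration. The following is only a sketch: it assumes that the relevant setting is an uncommented dtoverlay= line naming one of the two vc4 overlays, it ignores conditional sections in config.txt, and the names HELLO_FFT_WORKS and active_gl_overlay are hypothetical.

```python
# Overlay -> does hello_fft.bin work, taken from the table above.
HELLO_FFT_WORKS = {
    None: True,            # dtoverlay line commented out (G1 Legacy)
    "vc4-fkms-v3d": True,  # G2 GL (Fake KMS)
    "vc4-kms-v3d": False,  # G3 GL (Full KMS)
}


def active_gl_overlay(config_text: str):
    """Return the vc4 overlay set by an uncommented dtoverlay= line, or None."""
    overlay = None
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("dtoverlay="):
            # Overlay parameters, if any, follow after a comma.
            value = line[len("dtoverlay="):].split(",", 1)[0]
            if value in ("vc4-fkms-v3d", "vc4-kms-v3d"):
                overlay = value  # a later line overrides an earlier one
    return overlay
```

On a Raspberry Pi, the text could be read with open("/boot/config.txt").read() and the result looked up in HELLO_FFT_WORKS.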
Sources:
The Raspberry Pi OS version based on “Buster” contains a folder /opt/vc. If it is missing (or - more interesting - you want to use the latest available version), you can create it manually by doing the following steps:
It is best to copy /opt/vc/src/hello_pi/hello_fft to your home folder:
cp -r /opt/vc/src/hello_pi/hello_fft ~
Now run
cd ~/hello_fft
make
sudo ./hello_fft.bin 8
(the command sudo mknod char_dev c 100 0 that is mentioned in the above linked blog article is not necessary anymore).
An explanation of the FFT example by the original author Andrew Holme with lots of additional information and links can be found at GPU_FFT [visited 2022-11-02T09:52:58Z].
Idein Inc. (GitHub page: Idein Inc. · GitHub [visited 2022-07-22T19:46:58Z]) did a lot of work for GPGPU on the VideoCore VI.
We consider the repository py-videocore6: GitHub - Idein/py-videocore6: Python library for GPGPU programming on Raspberry Pi 4 [visited 2022-07-22T19:52:46Z].
Hacking The GPU For Fun And Profit [visited 2021-10-31T22:40:09Z] (SHA-256)
Parts:
Vulkan on the Raspberry Pi:
VideoCore VI: