diff --git a/docs/dual-screen-3d-modes.rst b/docs/dual-screen-3d-modes.rst new file mode 100644 index 0000000..a22d5d4 --- /dev/null +++ b/docs/dual-screen-3d-modes.rst @@ -0,0 +1,529 @@ +Dual screen 3D modes +#################### + +The 3D engine of the Nintendo DS 3D can only be used to display 3D graphics on +one screen under normal conditions, but it is possible to display 3D on both +screens by using a special video capture circuit. + +This technique isn't straightforward. The video hardware of the NDS is very +flexible, and there are several ways to achieve 3D graphics on both screens. + +This document is a very detailed explanation of how it works in Nitro Engine +I've explained the reasons why each mode may be used, as well as their +limitations. + +If this document is too long for you to read, just use ``NE_InitDual3D_DMA()`` +in your games and skip the read! However, feel free to keep reading if you care +about the details about how all of it is implemented. + +Something that applies to all the systems is that the 3D engine can only send +the output to one of the two screens during a specific frame. Any dual 3D setup +implies saving that 3D engine output to VRAM and displaying it on the other +screen during the following frame. Essentially, the main graphics engine and sub +graphics engine need to switch from one screen to the other after every frame. + +A huge thanks to mtheal and JustBurn for the `NDS Homebrew VRAM Banks Selector +`_ and the `NDS Homebrew VRAM BG Allocation +Conflict Viewer `_, tools that made my life a lot +easier when working on this. Check them out, and their source code on `GitHub +`_. + +1. Memory usage +--------------- + +First of all, it is important to mention that dual 3D comes at a big cost. It is +impossible to have dual 3D without using two of the main VRAM banks (A, B, C and +D) as framebuffers. This means that half of the memory that the developer can +allocate for textures won't be available at all. There are two main things to +consider: + +- Compressed textures: They are the most efficient way to have textures in VRAM, + and they need to be stored in a combination of VRAM A and B or VRAM B and C. + All the dual 3D modes of Nitro Engine leave VRAM banks A and B available for + textures, so compressed textures can be used. + +- Clear bitmap: It is possible to store a bitmap with its own depth map (sort of + a Z-buffer for the bitmap) in VRAM banks C and D. While this is an interesting + feature, it isn't used nearly enough to worry about it, as it is a very + inefficient way to use VRAM (it requires 50% of the texture VRAM). In dual 3D + mode the developer can only use only 50% of the total memory available for + textures, so clear bitmaps aren't useful. + +This is a comparative table of all modes supported by Nitro Engine and how the +different VRAM banks are used. Each mode will be explained in each own section: + ++--------+-------------------+-------------------+-------------------+-------------------+ +| | Single screen 3D | Dual screen 3D | Dual 3D FB | Dual 3D DMA | ++========+===================+===================+===================+===================+ +| VRAM A | | | | | ++--------+ | 3D textures | 3D textures | 3D textures | +| VRAM B | | | | | ++--------+ 3D textures [1] +-------------------+-------------------+-------------------+ +| VRAM C | | Framebuffer 1 | Framebuffer 1 | Framebuffer 1 + | +| | | | | Debug console [3] | ++--------+ +-------------------+-------------------+-------------------+ +| VRAM D | | Framebuffer 2 | Framebuffer 2 | Framebuffer 2 | ++--------+-------------------+-------------------+-------------------+-------------------+ +| VRAM E | 3D palettes | 3D palettes | 3D palettes | 3D palettes | ++--------+-------------------+-------------------+-------------------+-------------------+ +| VRAM F | Debug console | Debug console | [2] | | ++--------+-------------------+-------------------+-------------------+-------------------+ +| VRAM G | | | | | ++--------+-------------------+-------------------+-------------------+-------------------+ +| VRAM H | | | | | ++--------+-------------------+-------------------+-------------------+-------------------+ +| VRAM I | | | | Pseudo FB + | +| | | | | Debug console [3] | ++--------+-------------------+-------------------+-------------------+-------------------+ + +| + +Note [1]: It is possible to tell Nitro Engine to not use specific VRAM banks out +of A, B, C and D. This is useful when using Nitro Engine at the same time as +some other library like NFlib. It is also possible to tell Nitro Engine to not +use VRAM banks A or B in dual 3D mode, but this isn't normally useful. + +Note [2]: The debug console isn't supported in this mode. + +Note [3]: The debug console is replicated in both VRAM I and the space of VRAM C +that isn't used by the video capture. This also means that enabling the debug +console doesn't require any additional memory banks, unlike in other modes +(however, it uses a bit of CPU). More details below. + +2. The capture unit +------------------- + +The capture unit allows you to capture the output of the main graphics engine. +It is possible to capture the 3D output by itself before any layering effect, +or the combined 2D+3D output. This last mode is useful if you want to display a +2D background on top of the 3D output, and then display that on the other +screen. In Nitro Engine this is used to display the debug console on top of the +3D output. + +It is a very flexible unit. It lets you select the size of the capture, the VRAM +bank used as destination (A, B, C or D), and even an offset for the destination +bank (for example, if you want to start saving the captured output after the +first fourth of the bank). Check `GBATEK +`_ for +more information. + +The only problem with this unit is that the destination bank must be mapped to +the CPU (LCD mode if you are using libnds). This restricts a lot what you can +do. There is a mode that lets you use the contents in VRAM as a framebuffer even +when it is mapped to the CPU, but it doesn't let you layer anything on top of +it. + +3. Single screen 3D mode +------------------------ + +Before talking about dual 3D, let's start with single screen 3D, which is easier +to explain. VRAM banks are used like this: + +- VRAM A, B, C, D: Textures. +- VRAM E: Texture palettes. +- VRAM F: Debug console (if it is enabled). + +The output in the main screen is: + +- Layer 0: 3D output + +When the debug console is enabled, it changes a bit. The priorities of layers 0 +and 1 are swapped so that the debug console appears on top of the 3D output. +VRAM F is setup as main engine background memory, and: + +- Layer 0: 3D output, priority 1. +- Layer 1: Debug console, priority 0. + +4. Dual screen 3D mode +---------------------- + +The source code `for regular dual 3D mode is here `_ +and `the source code for the console initialization is here `_. + +This is the traditional way to handle dual 3D, and it is used in multiple +examples that can be found `online `_. + +VRAM banks are used like this: + +- VRAM A, B: Textures. +- VRAM C, D: Framebuffers. +- VRAM E: Texture palettes. +- VRAM F: Debug console of main screen (if it is enabled). + +**Odd frames:** + +- VRAM C is mapped as sub engine background. It holds a 16 bit framebuffer. +- VRAM D is mapped to the CPU. +- VRAM F is mapped as main engine background. It holds the debug console. + +The output of the main screen is the combined output of the 3D engine and the +main 2D engine, which displays the debug console. + +The output of the sub screen is the background stored in VRAM C. + +The capture unit takes as input the same output as the main screen. + +**Even frames:** + +- VRAM C is mapped to the CPU. +- VRAM D is mapped as sub engine sprites. It holds a 16 bit framebuffer. +- VRAM F isn't used. + +The output of the main screen is the combined output of the 3D engine and the +main 2D engine, which isn't showing anything right now. + +The output of the sub screen is the background stored in VRAM D. However, it +isn't possible to use VRAM D as sub engine background memory, so a grid of 64x64 +sprites is setup so that they display the contents of VRAM D as a 256x192 image. + +The capture unit takes as input the same output as the main screen. + +.. image:: ./images/ne_dual_3d.png + :alt: Dual 3D diagram + :align: center + +| + +**Advantages:** + +- The latency of the 3D rendered image is zero. It is directly displayed on the + screen without any delay. + +- The screen effects based on changing the horizontal scroll every scanline + work. + +- The debug console is supported. + +- No CPU or DMA processing is needed. + +- All 256x192 pixels of the screens are used. + +**Disadvantages:** + +- If the framerate ever drops below 60 FPS, the same 3D output will be displayed + on both screens until the next frame is rendered. It can happen when loading + assets at runtime, for example. This happens because only one frame is + actually displayed from a framebuffer. The other one depends on the actual + output of the 3D graphics engine. + + While screen A is displayed from a framebuffer, the framebuffer of screen B is + capturing the output of the new screen B. During the next frame, if screen A + can't be drawn, the framebuffer of screen B will be displayed from a + framebuffer while the 3D engine still displays the previous screen, which is + screen B. + +- The textures and palettes of the NDS use 15 bit colors, that is, they use 5 + bits per component (R, G, B). However, the LCD screen of the NDS uses 6 bits + per component. Normally the least significant bit isn't used, but it can be + used if two poligons are blended together, for example. + + This is a problem because the video capture unit doesn't capture the 6 bits. + If the output switches between actual 3D output and captured 3D output the + user may notice that the image changes slightly between frames. + +- The debug console requires an additional VRAM bank, even if it's one of the + small ones. + +**Conclusion:** + +I discourage you from using this mode. It's too easy to fall below 60 FPS and +hit the issue where both screens will show the same output. + +5. Dual screen 3D FB mode +------------------------- + +The source code `for dual 3D FB mode is here `_. + +This is another well-known way to handle dual 3D, but it prevents you from using +the 2D engine in the screen that is being updated in that frame. This prevents +us from displaying a debug console in one screen because it would only be +displayed there half of the frames. + +It is named after the ``MODE_FB2`` and ``MODE_FB3`` constants of `libnds +`_. + +VRAM banks are used like this: + +- VRAM A, B: Textures. +- VRAM C, D: Framebuffers. +- VRAM E: Texture palettes. + +**Odd frames:** + +- VRAM C is mapped as sub engine background. It holds a 16 bit framebuffer. +- VRAM D is mapped to the CPU. + +The main engine is setup in direct VRAM display mode, so the output of the main +screen is the contents of VRAM D. + +The output of the sub screen is the background stored in VRAM C. + +The capture unit takes as input the 3D output directly, and it saved it to VRAM +D, which can be accessed by the capture hardware because it's mapped to the CPU. + +**Even frames:** + +- VRAM C is mapped to the CPU. +- VRAM D is mapped as sub engine sprites. It holds a 16 bit framebuffer. + +The main engine is setup in direct VRAM display mode, so the output of the main +screen is the contents of VRAM C. + +The output of the sub screen is the background stored in VRAM D. However, it +isn't possible to use VRAM D as sub engine background memory, so a grid of 64x64 +sprites is setup so that they display the contents of VRAM D as a 256x192 image. + +The capture unit takes as input the 3D output directly, and it saved it to VRAM +C, which can be accessed by the capture hardware because it's mapped to the CPU. + +.. image:: ./images/ne_dual_3d_fb.png + :alt: Dual 3D diagram in FB mode + :align: center + +| + +**Advantages:** + +- If the framerate ever drops below 60 FPS, the output of both screens will be + stable, no matter how long the wait is until the next frame is rendered. + +- No CPU or DMA processing is needed. + +- All 256x192 pixels of the screens are used. + +**Disadvantages:** + +- The latency of the 3D rendered image is high. The capture unit saves the 3D + output to the same bank that is being used as framebuffer, but it is saved + at the same time the framebuffer is sent to the screen, which means the actual + output is frame stored in the framebuffer from the last time it was drawn, 2 + frames ago. + +- The screen effects based on changing the horizontal scroll every scanline + don't work. + +- The debug test console can't be used. The main screen always uses the contents + of VRAM directly, it isn't possible to layer anything on top of it. The main + and sub screens switch every frame, so it wouldn't be possible to display a + stable console on any screen. + +**Conclusion:** + +The high latency of this mode may make it unsuitable for fast-paced games. +Also, due to the lack of a usable debug text console, it may not be useful for +developers. + +If latency isn't an issue in your game, it is possible to use this mode for the +release version of the game, after you don't rely on the debug console for +anything. + +6. Dual screen 3D DMA mode +-------------------------- + +The source code `for this mode is here `_ +and `this is the code to initialize the debug console `_. + +Thanks to `Gericom `_ for all the advice regarding +this system! + +This system is the most complex one, by far. It combines the ability of having +stable output with low framerates, the ability of showing special effects, and +the ability to have a debug console. However, this has a cost in CPU time. + +It is always required to use 2 VRAM banks to hold the two framebuffers of the +screens. This mode is similar to the previous one, but it takes the current VRAM +bank used as destination of the video capture unit, and it copies it to a pseudo +framebuffer in VRAM I with DMA. By copying every line right after it has been +drawn it is possible to use VRAM I as a 16 bit bitmap. + +The main problem is that VRAM I isn't big enough to hold a full 256x192. The +workaround is to either write out of bounds and rely on the mirroring of +addresses of VRAM I, or to only copy to the first line of VRAM I, and to stretch +it to fill the whole screen. Nitro Engine uses this last system so that the rest +of VRAM I can be used for other things. + +VRAM banks are used like this: + +- VRAM A, B: Textures. +- VRAM C: Framebuffer of main screen and debug console of main screen. +- VRAM D: Framebuffer. +- VRAM E: Texture palettes. +- VRAM I: Pseudo framebuffer and debug console of sub screen. + +As you can see, there are two copies of the console. This is done to save VRAM. +In short, there is a lot of unused memory in VRAM C, D and I with this setup. +In odd frames the console is displayed on the sub screen from VRAM I; in even +frames it is displayed in the main screen from VRAM C. + +**Odd frames:** + +- VRAM C is mapped to the CPU. +- VRAM D is mapped as main engine background. It holds a 16 bit framebuffer. +- VRAM I is mapped as sub engine background. It holds one line of a 16 bit + framebuffer, as well as the debug console. + +The output of the main screen is the framebuffer stored in VRAM D. + +The output of the sub screen is the pseudo framebuffer stored in VRAM I, with +the debug console on top of it. + +The capture unit takes as input the 3D output. This is stored in VRAM C. The +important thing is that the image is written to VRAM C at the same rate it would +be drawn on the screen. If the DMA is setup to copy from VRAM C to VRAM I right +after each line is written to VRAM C, it is possible to display the captured +image right away. The only problem with this is that there is a one scanline +delay in the copy, so the first line of the screen won't be able to show +anything useful. In Nitro Engine, the vertical blank interrupt handler sets it +to black so that it doesn't show different values every frame. + +Finally, that one line is stretched to fill the whole screen, and that's how by +simply copying the line every horizontal blanking period the screen can be +filled with different lines. + +This is different than in the dual 3D FB mode. In the FB mode it isn't possible +to have this delay, so the image displayed on the screen is the one rendered 2 +frames ago. In DMA mode the image being displayed is the one rendered this +frame. + +**Even frames:** + +- VRAM C is mapped as main engine background. It holds a 16 bit framebuffer. +- VRAM D is mapped to the CPU. +- VRAM I is mapped as sub engine background. It holds one line of a 16 bit + framebuffer, as well as the debug console (but the debug console isn't used + right now). + +The output of the main screen is the framebuffer stored in VRAM C, with the +debug console stored in VRAM C on top of it. + +The output of the sub screen is the pseudo framebuffer stored in VRAM I. + +The capture unit takes as input the 3D output. The same notes about the DMA +apply as in odd frames. + +.. image:: ./images/ne_dual_3d_dma.png + :alt: Dual 3D diagram in DMA mode + :align: center + +| + +Note that VRAM D doesn't require a copy of the console, only VRAM C and I. Also, +libnds only supports printing to one console at a time, so it is needed to load +the font to both VRAM C and I, setup the map there, and then synchronize both +maps manually. Which bank to use as main console bank? VRAM I, because it is +always mapped as sub background RAM. Nitro Engine copies the map from VRAM I to +VRAM C whenever VRAM C is mapped as main background RAM. + +VRAM C (128 KB) is arranged like this: + ++-------------------------+-------+----------------------+ +| Memory range | Size | Used as | ++=========================+=======+======================+ +| 0x06000000 - 0x06001000 | 4 KB | Tile base 0 | ++-------------------------+-------+----------------------+ +| | | Free | ++-------------------------+-------+----------------------+ +| 0x06004000 - 0x06004800 | 2 KB | Map base 8 | ++-------------------------+-------+----------------------+ +| | | Free | ++-------------------------+-------+----------------------+ +| 0x06008000 - 0x06020000 | 96 KB | BMP base 0 (256x192) | ++-------------------------+-------+----------------------+ + +| + +VRAM viewer: https://mtheall.com/vram.html#T1=3&NT1=128&MB1=8&TB1=0&T2=6&MB2=2&S2=1 + +The reason to shift the framebuffer is that map bases over index 31 can't be +accessed normally. It requires setting DISPCNT map base 64 KB step to a non-zero +value, which would affect all backgrounds. By shifting the bitmap it is possible +to use low indices and not have to worry about DISPCNT. + +VRAM I (16 KB) is arranged like this: + ++-------------------------+-------+------------------------------------+ +| Memory range | Size | Used as | ++=========================+=======+====================================+ +| 0x06208000 - 0x06209000 | 4 KB | Tile base 2 | ++-------------------------+-------+------------------------------------+ +| | | Free | ++-------------------------+-------+------------------------------------+ +| 0x0620B000 - 0x0620B200 | 512 B | Framebuffer (one line, 256 pixels) | ++-------------------------+-------+------------------------------------+ +| | | Free | ++-------------------------+-------+------------------------------------+ +| 0x0620B800 - 0x0620C000 | 2 KB | Map base 23 | ++-------------------------+-------+------------------------------------+ + +| + +VRAM viewer: https://mtheall.com/vram.html#SUB=1&T1=3&NT1=128&MB1=23&TB1=2&T2=6&MB2=2&S2=0 + +Yes, there is an overlap between the framebuffer and the tileset. The size of +each tileset slot is 16 KB, so there is only space for one tileset in VRAM I. +Unfortunately, there is also only one slot for bitmaps in VRAM I, as they are +organized in 16 KB slots. Both slots need to share the space. + +Luckily, the tileset of the console doesn't use the whole 16 KB. The current +console of libnds uses up to 256 characters, which requires (for a 4 bpp +tileset) 8 KB of VRAM. This means that anything in the second half of the bank +(VRAM I is 16 KB in size) can be used for maps or to store the pseudo +framebuffer line. + +8 KB is the same size as 16 lines in a 16-bit background. We can setup our +16-bit bitmap as if it started at the same base as the tileset, and we can store +our line bitmap at some point between 16 and 32 lines. + +Now, remember that the map uses the last 2 KB of the VRAM bank, and that uses as +much memory as 8 lines of a 16-bit bitmap. The free lines of the bitmap are +actually 16 to 24. + +Nitro Engine uses line 20, located at offset 0x3000 from the start of the bank. + +**Advantages:** + +- If the framerate ever drops below 60 FPS, the output of both screens will be + stable, no matter how long the wait is until the next frame is rendered. + +- The latency of the 3D rendered image is zero. It is directly displayed on the + screen without any delay. + +- The screen effects based on changing the horizontal scroll every scanline + work. + +- The debug console is supported. + +**Disadvantages:** + +- Some CPU and DMA processing is needed. + +- Display lists can't be sent to the GPU using the DMA in GFX FIFO mode because + of the conflicts with the HBL DMA used to copy frames from VRAM C and D to + VRAM I. This adds a bit of overhead to the process (but not that much) + +- The first row of the screens remains unused. Only 256x191 pixels are used. + +**Conclusion:** + +This is the mode I'd recommend everyone to use for your applications. The CPU +overhead is well worth the reliability under low framerates, the availability of +the debug console and the low latency of the 3D output. + +7. Conclusion +------------- + +Yes, this is complicated. + +I've tried to give as much flexibility to the developers as possible, and this +is the result. It is certainly overwhelming for someone who has never done +anything similar. I've also tried to keep compatibility with the old unreliable +dual 3D system, just in case the new ones break some game created with old +versions of Nitro Engine. + +My advice is to use the DMA method and forget about the others. The classic dual +3D system is too unreliable, you can't expect to have a game that always draws +exactly 60 FPS. The FB method has a pretty big latency when displaying the 3D +output, so it is only useful for slow-paced games. + +I hope that this document has helped you understand how tricky this is, and how +flexible the NDS video hardware is! diff --git a/docs/images/ne_dual_3d.png b/docs/images/ne_dual_3d.png new file mode 100644 index 0000000..3552609 Binary files /dev/null and b/docs/images/ne_dual_3d.png differ diff --git a/docs/images/ne_dual_3d_dma.png b/docs/images/ne_dual_3d_dma.png new file mode 100644 index 0000000..ee646d5 Binary files /dev/null and b/docs/images/ne_dual_3d_dma.png differ diff --git a/docs/images/ne_dual_3d_fb.png b/docs/images/ne_dual_3d_fb.png new file mode 100644 index 0000000..1080801 Binary files /dev/null and b/docs/images/ne_dual_3d_fb.png differ diff --git a/docs/images/ne_dual_3d_modes.dia b/docs/images/ne_dual_3d_modes.dia new file mode 100644 index 0000000..dc885f4 Binary files /dev/null and b/docs/images/ne_dual_3d_modes.dia differ