There is a lot of buzz around DirectX 12 and Vulkan nowadays, because they are going to change gaming and game development in general. Basically, DirectX 12 and Vulkan are low-level graphics APIs.
But what is an API? An API (application programming interface) is a set of rules and commands that lets two software components interact with each other (here, the game engine and the graphics driver). It helps developers write programs that request services from another application or from the operating system (OS).
Nowadays, games use 3D APIs to generate computer graphics. These APIs also give developers access to the hardware, though only at a very abstract level. DirectX and OpenGL are the two most popular graphics APIs out there. DirectX is developed by Microsoft, while OpenGL and Vulkan are developed by the Khronos Group.
Traditional APIs like DirectX 11 and OpenGL worked fine, but while hardware evolved, the APIs stayed largely the same. They are high-level APIs, so hardware-level access is very limited. Developers have asked for direct access to the hardware for many years, but it was not possible with these traditional APIs. DirectX 12 and Vulkan are going to change that.
Issues with Traditional APIs
DirectX 11 and OpenGL are high-level APIs that can only communicate with the hardware at an abstract level, which means developers do not have full access to the hardware. Without direct access, they could not optimize games as far as they wanted, so performance suffered and hardware was used inefficiently.
As hardware evolved, processors became faster and more efficient than their previous generations, and the era of multi-core CPUs and multi-threading began. Processor and graphics technology moved on, but the APIs stayed much the same, which has led to several issues.
Multi-Core and Multi-Threading Support
Traditional APIs offer only limited support for multi-core and multi-threaded CPU architectures: in practice, the API lets a game use only one of the many available cores to talk to the GPU. This led to underutilization of hardware and higher API overhead. The image below shows how a traditional API works.
A draw call is a request from the processor to the GPU to draw graphics on the screen. The processor issues millions of draw calls to the GPU while a game is running, and the more complex games get, the more draw calls they make. Traditional APIs like DirectX 11 and OpenGL do not use more than one core at a time to make draw calls. Because fewer calls can be issued, both the processor and the GPU are underutilized, which leads to lower performance in games.
Traditional APIs were monolithic. Because they are highly abstracted, developers could not explicitly schedule and batch work; this was handled by the graphics driver. DirectX and OpenGL relied heavily on the graphics driver for tasks such as batch scheduling, memory management, error validation and correction, and synchronization. This can stall GPU operations, so performance may suffer.
Developers want explicit control over scheduling and batching so that they can prioritize their workload as they see fit. This allows more efficient use of the hardware and improves GPU performance.
DirectX 11 and OpenGL did not natively support multi-GPU configurations, although they could be made to work with additional tools. The resulting workaround is known as Alternate Frame Rendering (AFR).
Graphics cards essentially work on a series of buffers, where the results of rendering tasks are stored and then displayed on the screen. In AFR, the graphics cards in a multi-GPU configuration take turns rendering whole frames, which are stored in a queue and then displayed one at a time.
The problem with AFR is that each graphics card needs to store an identical copy of the game’s data for proper synchronization and error handling. This limits the amount of Video Random Access Memory (VRAM) a game can use, because the game data is cloned on every card. There was also no way to spread the workload among the GPUs individually.
For example, the amount of VRAM available in a multi-GPU configuration:
DirectX 11: AMD RX 480 4 GB + AMD RX 480 4 GB = 4 GB VRAM
What has changed with DirectX 12 and Vulkan?
DirectX 12 and Vulkan are low-level APIs developed by Microsoft and the Khronos Group, respectively. DirectX 12 was developed from scratch, while Vulkan was derived from Mantle, AMD’s own low-level API.
Because both APIs are low level, game developers have direct access to the hardware, which was not possible previously. This has opened a new door for game developers and gamers alike. With direct access to the metal (the hardware), developers can optimize games to a much greater extent and achieve console-like in-game performance, if not better.
Multi-Threaded Command Buffer Recording
In applications based on traditional APIs, multi-core CPUs are underutilized. The CPU drives a game’s rendering through the command buffer, a list of operations that the CPU must organize and present to the GPU so that the graphics work can be done.
This lack of utilization is owed to traditional APIs’ relative inability to break a game’s command buffer into small, parallel chunks that can be spread among the available cores.
DirectX 12 has changed command buffer handling quite a bit:
- Overhead is significantly reduced by distributing driver and API code across all cores in the processor.
- The time required to process complex CPU tasks is significantly reduced.
- Game workloads are distributed across all cores.
- The CPU is now able to deliver more draw calls to the GPU.
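To make the idea of multi-threaded command buffer recording a bit more concrete, here is a minimal sketch in Python. It is not DirectX 12 or Vulkan code: the function names (record_command_list, split_into_chunks, record_frame) and the string "commands" are made up for illustration, and Python threads only show the structure of the approach, not a real speedup. The point is the pattern: split the draw calls into chunks, let each worker record its own command list, then submit the lists in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(draw_calls):
    """Record one chunk of draw calls into its own command list.

    A command list is just a Python list of strings in this sketch.
    """
    return [f"draw({name})" for name in draw_calls]


def split_into_chunks(items, chunk_count):
    """Split items into chunk_count contiguous chunks of nearly equal size."""
    base, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for index in range(chunk_count):
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def record_frame(draw_calls, worker_count=4):
    """Record a frame's draw calls on several workers, then merge in order."""
    chunks = split_into_chunks(draw_calls, worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # map() yields results in chunk order, so the final submission
        # order does not depend on which worker finishes first.
        command_lists = list(pool.map(record_command_list, chunks))
    # Single submission step: concatenate the per-worker command lists.
    return [command for commands in command_lists for command in commands]


if __name__ == "__main__":
    frame = record_frame([f"mesh{i}" for i in range(10)], worker_count=4)
    print(frame)
```

Each worker writes only to its own list, so no locking is needed during recording. That independence is what lets the recording work spread across cores in a real low-level API.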
In a GPU, processing is performed by compute units (CUs). These CUs consist of shaders, which perform various computations. The GPU handles work in the form of a graphics queue. Under traditional APIs the graphics pipeline was sequential, so these APIs were not very efficient and did not use the full horsepower the GPU has to offer.
In a GPU, latency is the biggest enemy: it shows up as gaps in the GPU pipeline during which the GPU sits idle. Developers have struggled with latency for a long time, but the problem was not with the hardware. It was at the API level, where parallel operations were not possible with traditional APIs.
DirectX 12 and Vulkan introduced asynchronous shaders, which enable simultaneous execution of pipelines on the GPU. This is known as asynchronous compute (async compute). Note that async compute cannot be enabled by software alone; it requires special components inside the GPU, known on AMD hardware as Asynchronous Compute Engines (ACEs), which work alongside the graphics command processor.
Asynchronous compute changes the GPU pipeline. It gives developers the ability to program pipelines that are executed in parallel, which has led to significant performance gains in games that support asynchronous compute.
Because latency was high in the graphics pipeline of traditional APIs, gaps appeared in it, and during those gaps the GPU remained idle, which degraded performance. Async compute helps fill these gaps: it keeps feeding the GPU with work and prevents it from going idle. The work is executed in a non-synchronous fashion, hence the name asynchronous compute. The video below from AMD’s YouTube channel will help you understand it better.
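The gap-filling idea can be shown with a toy model. The sketch below is only an illustration under strong assumptions: the graphics pipeline is described as (busy, idle) segments in milliseconds, the compute work is freely divisible and independent of the graphics work, and all numbers and function names are invented for the example. It compares a frame where compute runs after graphics with a frame where compute is slotted into the idle gaps.

```python
def frame_time_serial(graphics_segments, compute_ms):
    """Frame time when compute work runs only after all graphics work.

    graphics_segments is a list of (busy_ms, idle_ms) pairs.
    """
    graphics_ms = sum(busy + idle for busy, idle in graphics_segments)
    return graphics_ms + compute_ms


def frame_time_async(graphics_segments, compute_ms):
    """Frame time when compute work is allowed to fill the idle gaps.

    Only as much compute as fits into the gaps is hidden; the rest
    still adds to the frame time.
    """
    idle_ms = sum(idle for _, idle in graphics_segments)
    hidden_ms = min(idle_ms, compute_ms)
    return frame_time_serial(graphics_segments, compute_ms) - hidden_ms


if __name__ == "__main__":
    segments = [(4.0, 1.0), (6.0, 2.0)]  # made-up busy/idle times in ms
    print(frame_time_serial(segments, 2.5))
    print(frame_time_async(segments, 2.5))
```

In this model the benefit is capped by the total idle time: once the gaps are full, extra compute work lengthens the frame again, which is one reason real-world gains vary from game to game.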
With async compute, in-game performance can increase by up to 40%. This has delighted gamers, as they get extra in-game FPS without investing in newer hardware (on AMD cards only).
AMD has quite an advantage with async compute because it has been adding ACEs to its GPU architecture since the early HD 7000 series graphics cards. These graphics cards have shown significant performance gains just from changing the API in the graphics settings.
Previous Nvidia GPU architectures like Kepler (GTX 700) and Maxwell (GTX 900) do not support asynchronous compute. The latest-generation Pascal architecture (GTX 1000) from Nvidia does support it, and has shown a slight improvement in performance.
Explicit Multi Adapter (EMA)
Explicit Multi Adapter allows the user to use any GPU in the system, regardless of the manufacturer. That means you can pair AMD and Nvidia GPUs and use them in a multi-GPU configuration. It is also possible to pair a discrete GPU with an iGPU (the integrated graphics on the processor), so that some minor workloads are offloaded to the iGPU while the discrete GPU takes care of the rest. This feature is known as asymmetric multi-GPU. It is similar to the Dual Graphics technology that was launched with AMD’s APUs, and DirectX 12 supports it natively.
In DirectX 11, frames were rendered using AFR in a multi-GPU configuration: GPU_A renders only the even frames while GPU_B renders the odd frames, and the frames are then queued and displayed on the monitor one by one. With DirectX 12 we were introduced to Split Frame Rendering (SFR), which is used to handle graphics pipelines on multi-GPU systems. With SFR, a single frame that is to be displayed on the monitor is split in half, and each half is rendered by a separate GPU. Because the frame is rendered in real time, there is no need for a queue to store frames. As a result, frames can be rendered 2-3 times faster than with the traditional technique.
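The difference between AFR and SFR comes down to how work is handed out, which a few lines of Python can illustrate. This is a simplified sketch, not driver or API code: the function names are hypothetical, GPUs are identified by index (0 stands for GPU_A, 1 for GPU_B), and SFR is assumed to split the frame into horizontal bands of rows.

```python
def afr_assignment(frame_count, gpu_count=2):
    """Alternate Frame Rendering: whole frames are handed out round-robin.

    Returns a dict mapping frame number to the index of the GPU that renders it.
    """
    return {frame: frame % gpu_count for frame in range(frame_count)}


def sfr_assignment(frame_height, gpu_count=2):
    """Split Frame Rendering: every GPU renders one band of the same frame.

    Returns one (first_row, end_row) pair per GPU, with end_row exclusive.
    """
    base, extra = divmod(frame_height, gpu_count)
    bands, start = [], 0
    for gpu in range(gpu_count):
        rows = base + (1 if gpu < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


if __name__ == "__main__":
    print(afr_assignment(4))     # even frames on GPU 0, odd frames on GPU 1
    print(sfr_assignment(1080))  # top half on GPU 0, bottom half on GPU 1
```

With AFR each GPU needs everything required to draw a complete frame, while with SFR all GPUs cooperate on the frame that is about to be displayed.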
DirectX 11 and previous APIs did not support multi-GPU configurations natively. Support was added later through graphics drivers provided by the manufacturers. With traditional APIs, a multi-GPU setup offered only one GPU’s worth of VRAM. This is due to the drawback of AFR, which requires each GPU in the setup to hold an identical copy of the game’s data set to ensure synchronization and avoid scene corruption.
With DirectX 12, AFR is no longer mandatory, so there is no requirement to keep identical game data on each GPU. This has opened the door for developers to construct large game workloads that can be processed across multiple GPUs. With DirectX 12’s multi adapter, developers can control GPUs individually, along with what is allocated in their memory. Unlike traditional APIs, DirectX 12 can combine the total amount of VRAM into one memory pool and use it as a whole.
DirectX 11: AMD RX 480 4 GB + AMD RX 480 4 GB = 4 GB VRAM
DirectX 12: AMD RX 480 4 GB + AMD RX 480 4 GB = 8 GB VRAM
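The arithmetic behind these two lines is simple enough to write down. The sketch below uses hypothetical helper names and treats pooling as a best case: with mirrored data (AFR) the usable VRAM is that of the smallest card, while with a combined pool it is at most the sum of all cards. How much of that sum a game really gets depends on how the developer distributes the data.

```python
def usable_vram_mirrored(vram_per_gpu_gb):
    """AFR-style setup: every GPU holds the same data, so the smallest card sets the limit."""
    return min(vram_per_gpu_gb)


def usable_vram_pooled(vram_per_gpu_gb):
    """Explicit multi adapter, best case: the memory of all GPUs adds up."""
    return sum(vram_per_gpu_gb)


if __name__ == "__main__":
    cards = [4, 4]  # two 4 GB cards, as in the example above
    print(usable_vram_mirrored(cards))  # 4
    print(usable_vram_pooled(cards))    # 8
```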
Note: Vulkan 1.0 does not support multi-GPU or Explicit Multi Adapter.
A representative of the Khronos Group stated the following:
There is no multiple GPU support in version 1.0. That was unfortunately a feature Khronos had to cut in order to preserve schedule. It is expected to be near the top of the list for Vulkan 1.1. It is perfectly possible for a Vulkan implementation to expose multiple GPUs. What Vulkan currently can’t do is allow resource sharing between them. So from a point of view of, for example, a Windows system manager, its possible to recognize multiple ways to render to a surface and then use operating system hooks to transfer that to the screen. What Vulkan doesn’t have is the ability to share a texture or a render target between multiple GPUs.
Some of the upcoming and released titles that support DirectX 12 and Vulkan:
- Ashes of the Singularity – Oxide Games
- Rise of the Tomb Raider – Crystal Dynamics
- Hitman – IO Interactive
- Doom – id Software
- The Talos Principle – Croteam
DirectX 12 and Vulkan are exciting, and they are going to improve game development and gaming in general: developers are getting better control, and gamers are getting better FPS, stability and power utilization.
Hope you enjoyed it. If you did, please share it on your social networks. If you have any suggestions, please comment below. Stay tuned for more. Catch you all later.