GF100 architecture

As most of you already know, the architecture is code-named Fermi, after the Italian physicist Enrico Fermi, and physics is something that shines through the whole project. According to NVIDIA the delays came about because it couldn't get everything to work just the way it wanted. Either way, the result is the biggest update since the launch of the G80 architecture in 2006, with the chip named GF100 (GeForce Fermi 100).

There is little to no information on any other DirectX 11 chips from NVIDIA, but the company promises a transition to DirectX 11 capable chips across the lineup within 12 months. In the absence of other relatives to analyze, we can start dissecting GF100 right away. The architecture and a walkthrough of it can be found below.


GF100 core – 16 multiprocessors with 32 cores each

NVIDIA has chosen the difficult path by building a huge die consisting of no less than 3.0 billion transistors. The chip is divided into 16 "Streaming Multiprocessors" (SMs), each somewhat similar to a CPU core, and each SM in turn contains 32 CUDA cores (shaders) that do the actual work. Each SM has 16 or 48KB of L1 cache, and on top of that there is a shared L2 cache of 768KB. The L1 cache is listed as 16 or 48KB because each SM has 64KB of on-chip memory that can be split either as 16KB L1 cache and 48KB shared memory, or as 48KB L1 cache and 16KB shared memory, depending on what the workload needs the most.


Structure of the GF100 core
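As a small illustration of how that 16/48KB split is exposed to developers, the CUDA runtime lets a program hint, per kernel, how the 64KB should be divided between L1 cache and shared memory. The sketch below is our own minimal example and not NVIDIA's code; the scale kernel is just a placeholder.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: scales a vector, only used to show the cache hint.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    // Ask the runtime to configure the 64KB per-SM memory as
    // 48KB L1 cache + 16KB shared memory for this kernel.
    // (cudaFuncCachePreferShared would flip it to 48KB shared / 16KB L1.)
    cudaFuncSetCacheConfig(scale, cudaFuncCachePreferL1);

    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    printf("done\n");
    return 0;
}
```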


SM – Streaming Multiprocessor

After this short walkthrough of the core we feel that we have to compare NVIDIA's two latest architectures, and as expected the new GF100 slaps the older G80 back into the stone age. This doesn't necessarily mean dramatic performance improvements at all times, but it is clear that NVIDIA has designed GF100 both for programmability (with shared memory, L1 cache and L2 cache) and for the previously mentioned physics bit, which is best translated into FLOPS, in other words floating point capacity (a rough worked example follows the table below).

GPU                                          G80                 GT200               GF100
Transistors                                  681M                1.4B                3.0B
CUDA Cores (Shaders)                         128                 240                 512 (480 in GTX 480)
Double Precision Floating Point Capability   –                   30 FMA ops/clock    256 FMA ops/clock
Single Precision Floating Point Capability   128 MAD ops/clock   240 MAD ops/clock   512 FMA ops/clock
Special Function Units/SM                    2                   2                   4
Warp Schedulers/SM                           1                   1                   2
Shared Memory/SM                             16KB                16KB                16 or 48KB (configurable)
L1 Cache/SM                                  –                   –                   16 or 48KB (configurable)
L2 Cache                                     –                   –                   768KB
ECC Memory Support                           No                  No                  Yes
Concurrent Kernels                           No                  No                  Up to 16
Load/Store Address Width                     32-bit              32-bit              64-bit
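To put the floating point figures in perspective: each CUDA core can retire one multiply-add per shader clock, which counts as two floating point operations. Assuming the GTX 480's 480 enabled cores and its roughly 1.4 GHz shader clock, a back-of-the-envelope peak single precision figure is 480 cores × 2 ops × 1.4 GHz ≈ 1.35 TFLOPS. Treat this as our own ballpark estimate; sustained throughput depends entirely on how well the code keeps the cores fed.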

One thing you may have noticed in the table above is that GF100 is specified to have 512 CUDA cores, while the GTX 480 launched today only has 480. Most likely this is due to the extremely complex manufacturing of a 3 billion transistor chip. We presume NVIDIA had to give in and disable one SM (32 cores) to get a decent amount of working dies from each wafer, instead of repeating the catastrophe from the launch of GeForce GTX 280, where availability was extremely scarce due to the complex and difficult to manufacture architecture.

For handling memory, Fermi is designed a lot like an ordinary CPU. It has shared memory, L1 and L2 cache, and a 384-bit memory bus attached to GDDR5 memory (a first for NVIDIA). It also supports ECC memory, another first.


The memory hierarchy
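For anyone who wants to check these figures on an actual card, most of them can be read back through the CUDA runtime. The sketch below is our own minimal example and assumes a CUDA-capable card at device index 0; ECC is typically only switched on in the Tesla versions of the chip, so expect a "no" on GeForce cards.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    printf("GPU:              %s\n", prop.name);
    printf("SM count:         %d\n", prop.multiProcessorCount);
    printf("Shared mem/block: %zu KB\n", prop.sharedMemPerBlock / 1024);
    printf("L2 cache:         %d KB\n", prop.l2CacheSize / 1024);
    printf("Memory bus:       %d-bit\n", prop.memoryBusWidth);
    printf("ECC enabled:      %s\n", prop.ECCEnabled ? "yes" : "no");
    return 0;
}
```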

On top of that NVIDIA has been working hard on its GigaThread technology, the scheduler that lets the GPU execute multiple kernels and threads simultaneously, much like a modern CPU, so the full potential of the GPU is used instead of commands queuing up. A very intuitive illustration can be found below.


GigaThread in GF100 – executes your threads without pause
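To give an idea of the kind of workload GigaThread is meant to handle, the sketch below launches two independent kernels into separate CUDA streams, which tells the scheduler they are free to run concurrently (GF100 supports up to 16 concurrent kernels). The kernels and sizes are our own placeholders, not anything from NVIDIA's material.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two dummy kernels standing in for independent pieces of work.
__global__ void workA(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

__global__ void workB(float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = y[i] * 0.5f - 1.0f;
}

int main()
{
    const int n = 1 << 16;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Separate streams mark the launches as independent, so a
    // Fermi-class GPU may run them concurrently instead of
    // queuing one after the other.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    workA<<<(n + 255) / 256, 256, 0, s1>>>(x, n);
    workB<<<(n + 255) / 256, 256, 0, s2>>>(y, n);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    printf("done\n");
    return 0;
}
```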

After this walkthrough of GF100 and its architecture we will reacquaint ourselves with DirectX 11 and its features and check what it has to offer against the competition.
