Nvidia Ampere A100: Full graphics functionality and near independence from the CPU


In a background conversation with chief developer Jonah Alben, c’t learned some more interesting details about Nvidia’s recent A100 chip for data centers. It has full graphics functions and one HBM2 module "too many", and as an EGX edge accelerator with a smart network chip it can operate almost independently of the CPU.

The A100 is the first chip with Nvidia’s new Ampere architecture. In it, the manufacturer integrates a comprehensive arsenal of compute units for different purposes, such as tensor cores for machine learning, and plenty of ECC-protected memory, from the L1 cache to the 40 GB of local HBM2. And that is even though the A100 is not yet the full configuration of the Ampere chip (GA100).

When presenting the feature set, Nvidia focused on the features that score particularly well with the target clientele: strong improvements in machine learning (ML) and in high-performance computing (HPC) with double-precision floating-point calculations. Other aspects, such as the chip’s graphics functions, got short shrift in the presentation.


The accompanying material for the Ampere architecture, the so-called whitepaper, stated that the A100 graphics chip has no display connectors, no ray-tracing units and no hardware block for video encoding. This led to speculation as to whether the GPU might be designed as a pure data center accelerator and be unusable as a graphics card.

Nvidia’s Senior Vice President of GPU Engineering, Jonah M. Alben, told c’t in a conference call that these functions were indeed not the focus and were therefore not discussed in the presentation. The situation is similar to Nvidia’s previous data center GPU, the Tesla V100, which does have full graphics functionality. The same applies to the A100: from the rasterization units to the raster output stages (ROPs), everything is present. The A100 only has to do without the ray-tracing cores of the Turing gaming graphics cards. Gaming graphics cards with the A100 are nevertheless very unlikely.


The full Ampere configuration GA100: 8192 FP32 shader cores, 48 MB of L2 cache and 48 GB of HBM2.

One memory stack remains unused

The A100 is sold with 40 GB of local HBM2 memory, which corresponds to five 8 GB stacks. However, six memory stacks can be seen in all available images of the chip package. In its full configuration, the A100 can indeed address six HBM2 stacks via 12 512-bit memory controllers. According to Alben, every memory site is populated on the first A100, which means that 8 GB of expensive HBM2 lie idle in the initial configuration. This reduces the chances that such a GPU configuration will trickle down into the desktop or even the gaming market.
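The capacity figures above can be checked with a little arithmetic. The following is a minimal sketch only: the stack and controller counts come from the article, while the even split of two controllers per stack and all names in the code are assumptions made for illustration.

```python
# Back-of-the-envelope check of the HBM2 figures quoted in the article.
STACK_GB = 8            # capacity per HBM2 stack
STACKS_FULL = 6         # stacks addressable in the full GA100 configuration
STACKS_ACTIVE = 5       # stacks actually used on the first A100
CONTROLLERS_FULL = 12   # 512-bit memory controllers in the full configuration
CONTROLLER_BITS = 512


def memory_config(active_stacks: int) -> dict:
    """Capacity and bus width for a given number of active stacks."""
    # Assumption (not stated in the article): controllers are spread
    # evenly across the stacks, i.e. two controllers per stack.
    controllers_per_stack = CONTROLLERS_FULL // STACKS_FULL
    controllers = active_stacks * controllers_per_stack
    return {
        "capacity_gb": active_stacks * STACK_GB,
        "controllers": controllers,
        "bus_width_bits": controllers * CONTROLLER_BITS,
    }


full = memory_config(STACKS_FULL)
a100 = memory_config(STACKS_ACTIVE)
idle_gb = full["capacity_gb"] - a100["capacity_gb"]

print("full GA100:", full)
print("first A100:", a100)
print("idle HBM2 (GB):", idle_gb)
```

Under these assumptions, the gap between the full configuration and the shipping A100 is exactly one 8 GB stack.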

SmartNIC makes the server CPU (a bit) superfluous

In addition to the SXM4 module for server use, the A100 will also come as an EGX version. Here the chip sits on a PCIe 4.0 card that also carries the 200 Gbps ConnectX-6 Dx network chip. The latter comes from Mellanox, Nvidia’s recently acquired networking subsidiary, and handles Ethernet or InfiniBand. With this platform, Nvidia can on the one hand demonstrate its integration plans for Mellanox technology and on the other take first steps toward decoupling a GPU-based data center from the host processors: the ConnectX-6 Dx has a crypto unit that encrypts and decrypts 200 Gbps in real time. The data can then be moved directly into graphics memory. This could enable, for example, flexible virtualized GPU clusters in which data travels to and from the GPU without involving host processors, for instance via NVMe over Fabrics.

Multi-Instance GPU

MIG stands for Multi-Instance GPU. A single A100 accelerator can be divided into up to seven smaller units. This increases flexibility in the data center if, for example, many less demanding inference instances have to run in parallel at peak times, while in the evening the network is retrained with newly acquired data.

The partitioning only requires a GPU reset for initialization; after that, the resource blocks can be changed at runtime. The minimum is one of the A100’s seven active graphics processing clusters, and at most all of them can be combined. However, there is some overhead, that is, a performance cost. For this reason the mode is not permanently active or enabled ex works. Exactly how much performance is lost, Nvidia could not say when asked by c’t during a virtual talk at GTC 2020. That certainly depends on the respective application.

According to Nvidia, both the data paths within the GPU and the memory areas are completely separated from one another. An instance cannot interfere with the others, for example with their cache accesses, and sees only its own memory area. An important restriction: only one of the instances is a full GPU with graphics capability; the remaining six are pure compute instances and therefore cannot output images.
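The MIG constraints described above can be captured in a small model. This is a hypothetical sketch that encodes only what the article states (seven active GPCs, at least one GPC per instance, at most seven instances, at most one graphics-capable instance). The class and function names are invented for illustration and do not correspond to any Nvidia API, and Nvidia’s actual MIG profiles may impose further restrictions not covered here.

```python
from dataclasses import dataclass
from typing import List

TOTAL_GPCS = 7      # active graphics processing clusters on the A100
MAX_INSTANCES = 7   # an A100 can be divided into up to seven units


@dataclass(frozen=True)
class Instance:
    name: str
    gpcs: int                # GPCs assigned to this instance
    graphics: bool = False   # True for the single graphics-capable instance


def validate(instances: List[Instance]) -> None:
    """Raise ValueError if a layout violates the constraints from the text."""
    if not 1 <= len(instances) <= MAX_INSTANCES:
        raise ValueError("need between 1 and 7 instances")
    if any(inst.gpcs < 1 for inst in instances):
        raise ValueError("every instance needs at least one GPC")
    if sum(inst.gpcs for inst in instances) > TOTAL_GPCS:
        raise ValueError("more GPCs requested than the A100 has active")
    if sum(inst.graphics for inst in instances) > 1:
        raise ValueError("only one instance can be graphics-capable")


# Peak hours: seven small inference instances, one of them with graphics.
daytime = [Instance(f"infer{n}", gpcs=1, graphics=(n == 0)) for n in range(7)]

# Evening: all seven GPCs combined into one instance for retraining.
evening = [Instance("train", gpcs=7)]

validate(daytime)
validate(evening)
print("both layouts satisfy the modeled constraints")
```

A layout such as two instances with four GPCs each would be rejected by `validate`, because it asks for eight GPCs where only seven are active.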

No retail sales for now

For the time being, however, Nvidia sells the A100 only as part of its preconfigured HGX boards and DGX rack servers. In the simpler HGX variant, four A100s are connected directly to one another via their 12 NVLinks each. For eight chips, six NVSwitches assist, and a 16x A100 variant will additionally be equipped with a backplane.
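For the four-GPU HGX variant, the link budget can be worked out directly. This is a rough sketch: the 12 NVLinks per GPU come from the article, while the even distribution of links across all peers is an assumption, and the function name is made up for illustration.

```python
from itertools import combinations

LINKS_PER_GPU = 12   # NVLinks per A100


def links_per_pair(num_gpus: int) -> int:
    """NVLinks between each GPU pair in a direct all-to-all mesh.

    Assumption (not from the article): every GPU spreads its links
    evenly over all of its peers.
    """
    if num_gpus < 2:
        raise ValueError("a mesh needs at least two GPUs")
    peers = num_gpus - 1
    if LINKS_PER_GPU % peers:
        raise ValueError("links do not divide evenly among peers")
    return LINKS_PER_GPU // peers


gpus = range(4)
pairs = list(combinations(gpus, 2))    # every GPU pair in the mesh
per_pair = links_per_pair(len(gpus))   # links between each pair
total_links = len(pairs) * per_pair    # point-to-point links on the board

print("GPU pairs:", len(pairs))
print("links per pair:", per_pair)
print("total links:", total_links)
```

With four GPUs, each has three peers, so the 12 links split evenly into four per pair. With eight GPUs, the same function raises an error, because 12 links cannot be divided evenly among seven peers. That is one way to see why a switched topology becomes attractive at that size, although the article does not give Nvidia’s own reasoning.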

The DGX A100 server is equipped with two 64-core EPYC 7742 main processors, which replace the Intel Xeon CPUs used in the previous DGX generation. An important reason is that, in the mainstream segment, only EPYC 7xx2 CPUs have so far offered PCIe 4.0, which does not throttle the A100 chips.

A detailed analysis of the Nvidia Ampere architecture can be found in the upcoming c’t issue 13/2020 and then on heise+.
