Nvidia has just unveiled its fastest GPU yet here at GTC 2016, a brand new graphics chip based on the company’s next generation Pascal architecture.
Nvidia claims that GP100 is the largest FinFET GPU that has ever been made, measuring 600mm² and packing over 15 billion transistors.
NVIDIA Pascal architecture for exponential performance leap: A Pascal-based Tesla P100 solution delivers over a 12x increase in neural network training performance compared with a previous-generation NVIDIA Maxwell™-based solution.
CoWoS with HBM2 for big data workloads: The Pascal architecture unifies processor and data into a single package to deliver unprecedented compute efficiency.
New AI algorithms for peak performance: New half-precision instructions deliver more than 21 teraflops of peak performance for deep learning.
The GP100 GPU is comprised of 3840 CUDA cores, 240 texture units and a 4096-bit memory interface. The massive GP100 GPU has significantly more Pascal streaming multiprocessors, or CUDA core blocks. Perhaps one of the most exciting, yet predictable, revelations about the GP100 Pascal flagship GPU is that it can achieve clocks even higher than Maxwell. We’re looking at actual frequencies of upwards of 1500MHz on the GeForce equivalent of the P100. We’ve already seen AMD take advantage of HBM memory technology with its Fiji XT GPU last year. TSMC’s new 16nm FinFET process promises to be significantly more power efficient than planar 28nm. One of the more significant features that was revealed for Pascal was the addition of FP16 compute support, otherwise known as mixed precision compute or half precision compute.
However, due to its very attractive power efficiency advantages over FP32 and FP64, FP16 can be used in scenarios where a high degree of computational precision isn’t necessary.
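To make the precision trade-off concrete, here is a minimal sketch that rounds a few values to half and single precision on the CPU and prints the error. It uses Python's standard `struct` module, not a GPU; the helper names `to_fp16` and `to_fp32` and the sample values are illustrative assumptions, not anything from Nvidia.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical sample values, chosen only to show the rounding error.
for value in (0.1, 3.14159265, 2049.0):
    half = to_fp16(value)
    single = to_fp32(value)
    print(
        f"{value!r}: fp16={half!r} (error {abs(half - value):.2e}), "
        f"fp32={single!r} (error {abs(single - value):.2e})"
    )
```

FP16 keeps only an 11-bit significand, so even a small integer like 2049 cannot be stored exactly, while FP32 represents it without error. That is why half precision suits workloads that tolerate coarse results.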
NVLink targets GPU-accelerated servers where cross-chip communication is extremely bandwidth limited and a major system bottleneck.
NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly.
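As a rough illustration of what a 5 to 12 times speedup over PCIe Gen3 x16 could mean, the sketch below estimates transfer times for a payload. The baseline is the theoretical per-direction PCIe Gen3 x16 rate (8 GT/s per lane, 16 lanes, 128b/130b encoding); the 16GB payload and the function name `transfer_seconds` are hypothetical, and real-world throughput would be lower than these peak figures.

```python
# Theoretical PCIe Gen3 x16 bandwidth in one direction, in GB/s:
# 8 GT/s per lane * 16 lanes * 128b/130b encoding / 8 bits per byte.
PCIE3_X16_GBPS = 8 * 16 * (128 / 130) / 8  # about 15.75 GB/s


def transfer_seconds(gigabytes: float, speedup: float = 1.0) -> float:
    """Time to move `gigabytes` over a link `speedup` times faster than PCIe Gen3 x16."""
    return gigabytes / (PCIE3_X16_GBPS * speedup)


payload_gb = 16.0  # hypothetical payload size, for illustration only
for speedup in (1, 5, 12):
    seconds = transfer_seconds(payload_gb, speedup)
    print(f"{speedup:>2}x PCIe Gen3 x16: {seconds:.3f} s for {payload_gb:.0f} GB")
```

Under these assumptions, a transfer that takes about a second over PCIe drops to a fraction of that at the quoted multiples.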
Pascal will be the company’s first graphics architecture to use next generation stacked memory technology, HBM.


Up to eight Tesla P100 GPUs can be interconnected with NVLink to maximize application performance in a single node, and IBM has implemented NVLink on its POWER8 CPUs for fast CPU-to-GPU communication. It is engineered to deliver the fastest performance and best energy efficiency for workloads with near-infinite computing needs. Within each Pascal streaming multiprocessor there are two partitions of 32 CUDA cores each, two dispatch units, a warp scheduler and a fairly large instruction buffer, matching that of Maxwell.
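The streaming multiprocessor layout can be tied back to the chip-level figures with some simple arithmetic. The sketch below uses only numbers quoted in this article (3840 CUDA cores, 240 texture units, two 32-core partitions per multiprocessor); the variable names are illustrative.

```python
# Figures quoted in the article for the full GP100 die.
CUDA_CORES = 3840
TEXTURE_UNITS = 240
CORES_PER_PARTITION = 32
PARTITIONS_PER_SM = 2

# Each streaming multiprocessor (SM) holds two 32-core partitions.
cores_per_sm = CORES_PER_PARTITION * PARTITIONS_PER_SM
sm_count = CUDA_CORES // cores_per_sm
texture_units_per_sm = TEXTURE_UNITS // sm_count

print(f"CUDA cores per SM:    {cores_per_sm}")
print(f"SMs on the full die:  {sm_count}")
print(f"Texture units per SM: {texture_units_per_sm}")
```

This works out to 64 cores per multiprocessor and 60 multiprocessors on the full die, with four texture units apiece.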
Despite Nvidia opting for very conservative clock speeds on its professional GPUs like the Tesla and Quadro products, the P100 actually has a base clock speed of 1328MHz and a boost clock speed of 1480MHz.
Pascal brings four major features, namely HBM, mixed precision compute, NV-Link and the smaller, more power efficient TSMC 16nm FinFET manufacturing process.
In FP16 mode the accuracy of the result to any computational problem is significantly lower than with the standard FP32 method, which is required for all major graphics programming interfaces in games and has been for more than a decade.
Nvidia states that NV-Link will be 5 to 12 times faster than traditional PCIe 3.0, making it a major step forward in platform atomics. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect.
However, the focus is crystal clear and is 100% about pushing power efficiency and compute performance higher than ever before. It will also be the first ever to feature a brand new from the ground-up high-speed proprietary interconnect, NV-Link.
Keeping this massive GPU fed is 4MB of L2 cache and a whopping 14MB worth of register files. GPU Boost 2.0 actually allows these cards to operate at even higher clock speeds than the nominal boost clock. Each is very important in its own right and as such we’re going to break down every one of these four separately. The new memory standard will also allow for a huge increase in memory capacities, 2.7X the memory capacity of Maxwell to be precise. AMD also announced last month at its Capsaicin event that it will be bringing HBM2 with its next generation Vega architecture, succeeding its 14nm FinFET Polaris architecture launching this summer with GDDR5 memory.


The new process would enable Nvidia to build faster, significantly more complex and more power efficient GPUs.
This includes DirectX 12, 11, 10 and DX9 Shader model 3.0 which debuted almost a decade ago. Nvidia’s Maxwell GPU architecture, featured in the GTX 900 series of GPUs, is limited to FP32 operations; this in turn means that FP16 and FP32 operations are processed at the same rate by the GPU.
Earlier this year Nvidia announced that IBM will be integrating this new interconnect into its upcoming PowerPC server CPUs. NVLink will debut with Nvidia’s Pascal in 2016 before it makes its way to Volta in 2018.
Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.
Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors.
In turn we should expect even more performance out of each Pascal CUDA core compared to Maxwell.
That 2.7X increase indicates that the new Pascal flagship will feature 32GB of video memory, a mind-bogglingly huge number.
However, adding the mixed precision capability in Pascal means that the architecture will now be able to process FP16 operations twice as quickly as FP32 operations. And as mentioned above this can be of great benefit in power limited, light compute scenarios.
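The doubled FP16 rate can be turned into a back-of-the-envelope peak throughput estimate. The sketch below assumes each CUDA core completes one fused multiply-add (counted as two floating point operations) per clock, and that all 3840 cores of the full GP100 die run at the 1480MHz boost clock. Both are assumptions for illustration: a shipping product may enable fewer cores, and the function name `peak_tflops` is hypothetical.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak throughput in teraflops.

    Assumes one fused multiply-add (2 floating point operations)
    per core per clock unless told otherwise.
    """
    return cores * clock_mhz * 1e6 * flops_per_core_per_clock / 1e12


# Figures quoted in the article: full GP100 die at the P100 boost clock.
fp32_tflops = peak_tflops(cores=3840, clock_mhz=1480)
# Pascal processes FP16 at twice the FP32 rate.
fp16_tflops = 2 * fp32_tflops

print(f"Peak FP32: {fp32_tflops:.2f} TFLOPS")
print(f"Peak FP16: {fp16_tflops:.2f} TFLOPS")
```

Under these assumptions the estimate comes to roughly 11.4 teraflops for FP32 and 22.7 for FP16, in line with Nvidia's "more than 21 teraflops" half-precision claim.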
First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance.


