TSMC’s New Wafer-on-Wafer Process to Empower NVIDIA and AMD GPU Designs
Andrew Wheeler posted on May 03, 2018

Transistor density scaling is slowing, and Moore's law with it. Coupled with the practical limits on the area of a single GPU die, this is a crucial concern for graphics card manufacturers, whose chips power advancements in GPU-based, high-performance computing applications such as artificial intelligence. Those manufacturers now expect foundries like Taiwan Semiconductor Manufacturing Company (TSMC) to find new ways of delivering the generational performance gains the industry has come to expect from Moore's law.

TSMC unveiled Wafer-on-Wafer (WoW) chip manufacturing technology at its TSMC 2018 Technology Symposium, an annual conference which was held in Santa Clara this year. (Image courtesy of TSMC.)

What is a Multi-Chip Module (MCM) GPU Design?

To keep performance scaling at the pace Moore's law once guaranteed, chip designers are turning to package-level integration of multiple GPU modules (GPMs) linked by high-bandwidth signaling technologies. Partitioning a GPU into GPMs requires optimizing the architecture to minimize latency across the links that bind the modules together, and to improve the locality of each GPM's data. These are two of the primary challenges facing designers working to make Multi-Chip Module (MCM) GPUs a convincing insurance policy against the slowing tide of Moore's law.

Since TSMC has a vested interest as both silicon foundry and chipmaker, it has to ensure that its manufacturing capabilities can continue to scale up GPU performance for big clients like NVIDIA and AMD, even as transistor density scaling slows industry-wide. TSMC recently showed off a promising solution, Wafer-on-Wafer (WoW) technology, which addresses latency between the different GPU clusters that make up an MCM-based GPU.

To understand TSMC's WoW approach, the current approach must first be understood. An MCM is generally manufactured using a custom interconnect and an interposer, and the interconnect is the bottleneck: the dies in an MCM are positioned side by side and connected laterally through it, and that lateral interconnect is the primary source of latency between them.

To get around this, TSMC’s new proposal involves the use of Through-Silicon Vias (TSVs). These 10-micron holes pass vertically through the silicon, letting signals travel directly between the two stacked wafers. This approach from TSMC is meant to demonstrate that stacking dies on top of one another can improve power efficiency and decrease the latency lost between GPMs. (Image courtesy of TSMC.)

What is an Interconnect?

The interconnect is one of the most crucial considerations when planning, designing, engineering and manufacturing MCMs and integrated circuits such as a System on a Chip (SoC) or an Application-Specific Integrated Circuit (ASIC).

Historically, interconnects were built using a wiring approach called subtractive aluminum, in which blanket films of aluminum are deposited, patterned and then etched, leaving isolated, exposed wires that are then coated in dielectric material. Vias are tiny etched holes that connect the wiring layers to one another through the insulating material. Dynamic Random-Access Memory (DRAM) chips are built this way.

In recent years, the interconnect wiring material has changed from aluminum to copper. As the number of interconnected transistors on modern microprocessors grew exponentially, timing delays in the interconnect wiring grew as well, prompting the switch. Copper offers roughly 40 percent lower resistance than aluminum, which translated to about 15 percent faster processor speeds overall.

Even with the damascene copper manufacturing process, miniaturization of interconnect wires produces resistance-capacitance (RC) delay issues. As wires grow longer while their cross-sections shrink, it becomes harder and harder to drive signals through them at the same speed.
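The effect described above can be sketched with a first-order RC model: a wire's resistance scales as ρL/(W·t), and its capacitance roughly with its length. Below is a minimal Python illustration; the resistivities are textbook values, and the wire geometry and per-length capacitance are purely illustrative assumptions, not figures from the article or from any foundry.

```python
RHO_AL = 2.7e-8  # ohm-meters, aluminum (textbook value)
RHO_CU = 1.7e-8  # ohm-meters, copper (textbook value)

def wire_rc_delay(rho, length_m, width_m, thickness_m, cap_per_m=2e-10):
    """First-order estimate: delay ~ R * C, with R = rho*L/(W*t) and C ~ L."""
    resistance = rho * length_m / (width_m * thickness_m)
    capacitance = cap_per_m * length_m
    return resistance * capacitance

# Same geometry, different metal: copper cuts the RC delay by roughly 37 percent.
d_al = wire_rc_delay(RHO_AL, 1e-3, 100e-9, 200e-9)
d_cu = wire_rc_delay(RHO_CU, 1e-3, 100e-9, 200e-9)
print(f"copper vs. aluminum delay ratio: {d_cu / d_al:.2f}")  # ~0.63

# Doubling length while halving width blows the delay up by 8x,
# since R grows 4x (longer, narrower) and C grows 2x (longer).
d_scaled = wire_rc_delay(RHO_CU, 2e-3, 50e-9, 200e-9)
print(f"2x longer, 2x narrower wire: {d_scaled / d_cu:.0f}x slower")
```

The quadratic blow-up in the second case is why simply routing longer, thinner lateral interconnects between modules stops paying off, and why a short vertical path looks attractive.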

The combination of a plateauing rate of transistor scaling and increasing latency between GPMs is part of the reason why TSMC is turning to TSVs for 3D integration.

What Are the Benefits of TSMC’s Wafer on Wafer Tech for GPUs?

GPUs designed by NVIDIA and AMD and manufactured with TSMC’s wafer-on-wafer technology could become more powerful without growing in physical size, because the layers are stacked vertically rather than spread horizontally across the printed circuit board (PCB), much as dies are stacked in modern solid-state drives (SSDs).

Currently, NVIDIA and AMD GPUs are built from a single wafer, which is why TSMC’s research and development teams set out to stack and bond two wafers, one above the other, in a single package. The resulting package is cube-shaped, and its two stacked wafers are connected by an electrical interface known as an interposer, which routes the connections between them.

Switching wafer scaling from horizontal to vertical may not sound like the most innovative engineering move of all time, but it is no simple task. The reason it will have an impact on the industry is that TSMC can now offer NVIDIA and AMD the ability to put two GPUs on one graphics card as a new or refreshed product offering, without having to develop a new GPU architecture to fit more cores. The announcement is designed to ease anxiety about future GPU performance as transistor scaling becomes ever harder to sustain at the exponential rate of years past.

GPU Gains from Wafer-on-Wafer Tech

The operating system would detect the twin-wafer GPU stack as one chip rather than a multi-GPU configuration, increasing capacity while occupying the same amount of room as a single card.

As with 3D NAND and DRAM, TSMC should be able to offer NVIDIA and AMD added capacity through stacking, but stacking processor wafers will likely prove to be a high-cost endeavor. Costs accrue at inspection: if one wafer out of the two fails, both must be discarded.
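That discard rule compounds yield loss multiplicatively: a bonded pair is good only if both wafers are good. A back-of-the-envelope sketch in Python makes the economics concrete; the 90 percent per-wafer yield and $1,000 wafer cost are illustrative assumptions, not TSMC figures.

```python
def stacked_yield(y_bottom, y_top):
    # A bonded pair works only if BOTH wafers work, so yields multiply.
    return y_bottom * y_top

def cost_per_good_stack(cost_per_wafer, y_bottom, y_top):
    # Two wafers are consumed per attempt; only y_bottom*y_top of attempts succeed.
    return 2 * cost_per_wafer / stacked_yield(y_bottom, y_top)

print(f"pair yield at 90% per wafer: {stacked_yield(0.9, 0.9):.0%}")
print(f"cost per good WoW stack:     ${cost_per_good_stack(1000, 0.9, 0.9):,.0f}")
print(f"cost of two good singles:    ${2 * 1000 / 0.9:,.0f}")
```

Two wafers at 90 percent yield each give only an 81 percent pair yield, so a good stack costs more than two good single-wafer parts. This is also why high-yield nodes matter: the penalty shrinks rapidly as per-wafer yield approaches 100 percent.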

The WoW approach from TSMC is similar to the way the dies are stacked like DRAM and NAND, which allows for faster interfacing and significantly more GPU cores to work with. (Image courtesy of TSMC.)

TSMC’s customers include both NVIDIA and AMD, so this wafer-on-wafer stacking process will supposedly ensure that the core count can continue to increase as the technology for transistor scaling slows down industry-wide. It's too early to tell for sure if all the engineering bottlenecks have been addressed.

Won’t Wafer-on-Wafer Be Too Hot?

Mounting the wafers with TSVs leaves an air gap between the two wafers, and both generate heat, which is problematic. The bottom wafer's heat warms the top wafer, yet the top wafer, sitting closest to the heat sink, is cooled far more effectively than the bottom one. The heat sink would somehow need to pull heat away from the small, buried area around the bottom wafer, which is costly to do.
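The asymmetry can be seen in a simple one-dimensional thermal-resistance model: all power ultimately leaves through the sink above the top wafer, but the bottom wafer's power must also cross the inter-wafer path. Every number below is an illustrative assumption, not TSMC or GPU-vendor data.

```python
T_AMBIENT = 25.0        # deg C
THETA_SINK = 0.2        # deg C per watt, heat sink to ambient (assumed)
THETA_INTERWAFER = 0.5  # deg C per watt, bottom-to-top wafer path (assumed)

def wafer_temperatures(p_top_w, p_bottom_w):
    # Both wafers' power flows out through the sink above the top wafer...
    t_top = T_AMBIENT + (p_top_w + p_bottom_w) * THETA_SINK
    # ...but the bottom wafer's power must first cross the inter-wafer path,
    # so the bottom wafer always runs hotter than the top one.
    t_bottom = t_top + p_bottom_w * THETA_INTERWAFER
    return t_top, t_bottom

t_top, t_bottom = wafer_temperatures(150.0, 150.0)
print(f"top wafer:    {t_top:.0f} C")     # 85 C
print(f"bottom wafer: {t_bottom:.0f} C")  # 160 C
```

With two 150 W wafers, the buried wafer ends up far above the top one in this toy model, which is the cooling problem the article alludes to.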

Since WoW is costly, TSMC is likely to deploy it on mature, high-yield production nodes to reduce losses from discarded wafers. On the cooling question, however, TSMC has stayed mum. Still, the prospect of AMD and NVIDIA stacking dies on top of each other and, presto, doubling the GPU core count for chip refreshes without developing new GPU architectures means that GPU-based, high-performance computing applications such as artificial intelligence can continue to rely on historical rates of growth in GPU processing power.
