Die-To-Die Stress Becomes A Major Issue

Advanced nodes and packaging are turning minor issues into major ones.

Stress is becoming more critical to identify and plan for at advanced nodes and in advanced packages, where a simple mismatch can impact performance, power, and the reliability of a device throughout its projected lifetime.

In the past, the chip, package, and board in a system generally were designed separately and connected through interfaces from the die to the package, and from the package to the board. But there are so many connections and possible interactions these days that it’s no longer possible to work in isolation. Each of these is a potential source of stress, and they can compound as designs become more integrated and complex.

“There are no longer a few hundred connections in the interface,” said Sooyong Kim, director and product specialist in 3D-IC chip package systems and multiphysics at Ansys. “There are millions upon millions of bump connections these days, from one die to another. And the materials being used may come from different foundries, and different packaging groups, bringing all kinds of heterogeneous input into the picture.”

Stress adds a whole new layer of complexity to this equation, and it is forcing some fundamental changes in the chain of responsibility. “This used to be the off-chip engineer’s job,” Kim said. “Now it’s the silicon engineer’s job. A lot of parts are being siliconized, and because of that, there are different approaches needed to analyze mechanical problems due to these tight integrations. Those tight integrations also impact performance and drive up the heat easily. Originally, this was an electrical problem. Then it became a heat problem. Now it is a mechanical problem. However, this mechanical problem affects the electrical problem, which affects the heat inside the chip, and so on, in a complex cycle of multiphysics.”

Strain and stress fall under the law of elasticity, or Hooke’s Law, which states that for relatively small deformations of an object — like a semiconductor device — the displacement or size of the deformation is directly proportional to the deforming force or load. Under these conditions, the object returns to its original shape and size when the load is removed.
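As a rough illustration of that proportionality, the sketch below applies the one-dimensional form of Hooke's Law, stress = E × strain. The modulus is an assumed, approximate figure for silicon along one crystal direction (real silicon is anisotropic), and the function name is hypothetical.

```python
def hooke_stress_pa(youngs_modulus_pa: float, strain: float) -> float:
    """Uniaxial Hooke's Law: stress = E * strain.

    Only meaningful for small, elastic deformations, where the material
    returns to its original shape once the load is removed.
    """
    return youngs_modulus_pa * strain


# Assumption: roughly 130 GPa, an approximate modulus for silicon along <100>.
E_SILICON_PA = 130e9

if __name__ == "__main__":
    strain = 0.001  # 0.1% elongation, chosen purely for illustration
    stress = hooke_stress_pa(E_SILICON_PA, strain)
    print(f"{stress / 1e6:.0f} MPa")
```

Doubling the strain doubles the stress, which is the linear relationship the law describes. The model stops applying once the material yields or cracks.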

“When we talk about stress inside the IC, maybe 10 years ago, people would bring up the concept of strained silicon,” explained CT Kao, solution architect in the Digital & Signoff Group at Cadence. “Strained silicon inside the transistor pulls the silicon atoms apart. The silicon layers sit on a silicon germanium layer, and the atoms are pushed apart when silicon is deposited on top of the silicon germanium. This increases the electrical mobility, bringing good effects, and reducing some obstacles. In this way, stress is of a mechanical, thermal, and force origin.”

In the early years of CMOS, these effects largely could be controlled by design rules. Transistors were large enough that controlling the spacing and enclosures within the active and well regions kept the influences negligible. Now, with semiconductor design and manufacturing in single-digit nanometer processes, these effects are showing themselves in ways that are less easily avoided. When stresses are applied to a transistor, they modify carrier mobilities, which changes electrical behavior. Those stresses often take the form of medium-range geometrical configurations within the chip.

“With the end of Dennard scaling, foundries continue to look for ways to boost the performance of transistors,” noted Yves Laplanche, distinguished engineer in the Physical Design Group at Arm. “The use of mobility enhancement techniques based on stress engineering has become widespread. These techniques either focus directly on the channel and source/drain material, like the use of germanium to change the lattice structure, or involve specific process steps that induce stress on the transistors from their surroundings. This is the case for Stress Memorization Techniques (SMT) or the selection of specific Contact Etch Stop Layers (CESL). In these cases, the environment of the device on which the stress is applied greatly influences the efficiency of these techniques, and the effect varies with the flavor of the device; NMOS and PMOS, for instance, can have opposite behaviors. With FinFET technologies and small geometries in the 7nm and 5nm nodes, the relative influence range of the variations increases.”

Causes of stress
For example, the distance between a transistor and the edge of the well can create stresses that, if they are not modeled, make electrical simulations inaccurate. These layout issues affect not only the chip-level design but also individual IP blocks.

“Consider, for example, a chip in which an IP block is placed multiple times. When simulated in standalone mode, the IP block may pass with flying colors, but in context, each placement creates different behavior because of the different stresses applied by the surrounding elements. Another example is an analog block, for which symmetry matching is critical to correct performance. The standalone circuit may behave as ideally symmetrical, but in context, one component may experience different stresses from its mirrored twin, thus destroying the intended symmetrical behavior,” explained John Ferguson, product marketing director, Calibre DRC applications at Siemens EDA.

Indeed, at the physical IP development stage, stress effects are both a challenge and an opportunity. “In memory macros or analog blocks, the layout and vicinity of all devices can be closely controlled. On the opposite side of the spectrum, in the logic libraries, the surrounding of each standard cell can change. Specifically, Place and Route tools can create one combination of cells out of an almost infinite number of possibilities. In the design of physical IP at Arm, the stress effects are thoroughly analyzed and accounted for in the definition of the architecture to maintain the potential performance variations within a defined set of boundaries without compromising the area of the full system. The remaining variations are considered in the electrical modeling of our IP designs. Judicious implementation choices enable performance differentiation to target the power, performance, and area (PPA) required by the market,” Arm’s Laplanche explained.

Certain factors are bigger causes of stress than others. Chips are getting bigger, and they often are packaged on a wafer, such as in TSMC’s system on wafer (SoW) approach. “If you look at a picture of it, the real wafer after it is generated, but before being bonded to a die, is actually noticeably bent,” said Ansys’ Kim. “How are you going to handle that? When this happens, the electrical properties will change. Does it bond at all? Is it a different material? Understanding the silicon process is now critically important, and conventional mechanical engineers don’t know the silicon processes. But without knowing that, you can’t really do a proper analysis.”

All of this has an impact on reliability. “If it’s SoW, it’s a very power-hungry chip, and depending on where it is applied, the application can be very demanding. For example, if it’s doing display processing, there may be clustered current coming into the subset area of the microbumps. Is that a sustainable heat source compared to the structural readiness of that 3D IC structure? That’s another problem. After it is manufactured, and when it is used in the end application, does it hold? Sometimes there’s a meltdown in between,” Kim explained.

Strain engineering adds yet another complication. Strain is a type of stress that is applied to silicon to improve electron mobility. But too much strain can cause problems, such as cracks in the interconnect. Over time, at ever-smaller nodes, these issues can increase in intensity.

“When new things pop up, initially, people tend to cover them with crude approximations such as first-order effects,” said Victor Moroz, a Synopsys fellow in the Silicon Engineering Group. “Usually that comes down to buffering, where another buffer is created to make sure you’re set for the worst condition. But this means some performance is being left on the table. Eventually, this gets refined to include second-order effects in order to not waste anything.”

Dark silicon concepts can be useful here, where blocks or transistors are powered down until they are needed. “If you use all the transistors at the same time, it’s definitely going to melt. You have to make your circuits lazy enough. On-chip monitors can be used here to sense the temperature as you go, and then, if overheating is detected, it just slows everything down,” Moroz said.
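The monitor-and-slow-down behavior Moroz describes can be sketched as a simple governor loop. This is a minimal illustration, not any vendor's implementation: the temperature limit, clock step, and clock bounds are hypothetical values, and the sensor readings are passed in as a plain list rather than read from hardware.

```python
def next_clock_mhz(temp_c, clock_mhz, limit_c=95.0, step_mhz=100.0,
                   min_mhz=400.0, max_mhz=2000.0):
    """Pick the clock for the next interval from the latest sensor reading.

    All thresholds are hypothetical. Above the limit the clock steps down;
    otherwise it steps back up, clamped to [min_mhz, max_mhz].
    """
    if temp_c > limit_c:
        return max(min_mhz, clock_mhz - step_mhz)
    return min(max_mhz, clock_mhz + step_mhz)


def run_governor(readings_c, start_mhz=2000.0):
    """Apply the throttling rule to a sequence of temperature readings."""
    clock = start_mhz
    trace = []
    for temp_c in readings_c:
        clock = next_clock_mhz(temp_c, clock)
        trace.append(clock)
    return trace


if __name__ == "__main__":
    # Two readings above the limit slow the clock twice; it recovers afterward.
    print(run_governor([80.0, 97.0, 99.0, 90.0]))
```

A real controller would typically add hysteresis and act on voltage as well as frequency. The point here is only the feedback from on-chip sensor to clock rate.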

Chips are heating up
Heat is another cause of stress, and it’s a problem that has been steadily increasing with more transistor density and more compute intensity, particularly in AI chips where the goal of many architectures is higher utilization of processing elements.

“A large die may have localized hot spots due to more active device switching,” said Siemens’ Ferguson. “These hotspots change the stress profile across the chip. The heat issue is even more concerning in the multi-die world. One die stacked on top of another needs a longer path to dissipate its heat. As it disperses its own heat, it adds some of that heat to the die below, which imparts new stresses on that chip’s transistors that must be accounted for.”

Thermal mismatch due to varying coefficients of thermal expansion makes this much more complicated. “When you heat up a structure that consists of several materials, different materials expand and contract at different rates,” said Moroz. “Usually, the metals expand more with temperature and shrink more as you cool down, compared to dielectrics or semiconductors, and that’s what creates the stress at the interfaces there.”
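To give a feel for the magnitudes involved, the sketch below estimates the stress in a thin metal film bonded to a much thicker silicon substrate, using the standard elastic thin-film approximation: stress = E / (1 − ν) × (α_substrate − α_film) × ΔT. The material values are approximate textbook figures for copper and silicon, used here as assumptions, and the function name is hypothetical.

```python
def film_thermal_stress_pa(e_film_pa, poisson_film, cte_film_per_k,
                           cte_substrate_per_k, delta_t_k):
    """Biaxial stress in a thin film on a thick substrate (elastic only).

    delta_t_k is the temperature change from the stress-free state.
    Positive result = tension in the film, negative = compression.
    Ignores plasticity, film thickness effects, and geometry.
    """
    biaxial_modulus = e_film_pa / (1.0 - poisson_film)
    mismatch_strain = (cte_substrate_per_k - cte_film_per_k) * delta_t_k
    return biaxial_modulus * mismatch_strain


# Assumed approximate room-temperature properties.
E_COPPER_PA = 120e9        # Young's modulus of copper
NU_COPPER = 0.34           # Poisson's ratio of copper
CTE_COPPER = 17e-6         # per kelvin
CTE_SILICON = 2.6e-6       # per kelvin

if __name__ == "__main__":
    # Cooling by 100 K: the metal wants to shrink more than the silicon
    # allows, so the film ends up in tension (positive stress).
    cooling = film_thermal_stress_pa(E_COPPER_PA, NU_COPPER,
                                     CTE_COPPER, CTE_SILICON, -100.0)
    print(f"cooling 100 K: {cooling / 1e6:.0f} MPa")
```

Under these assumptions a 100 K swing produces stress in the hundreds of megapascals, which is why the mismatch matters at metal-to-silicon interfaces. In practice the metal may yield before reaching the elastic estimate.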

Fig. 1: Transient behavior of heating and cooling at different thermal contacts of a thermal resistor for two different package types. Source: Synopsys

Addressing this challenge requires the addition of temperature simulation, not just of the chip, but of the entire package. Accurately designing and characterizing the electrical behavior of any given transistor requires context, including the package, the input power, and the switching across the whole system. While this is possible, it also can be impractical. The alternative is designing soft IP so that it will be safe to use in virtually any context. Similarly, chiplets must be safe to insert into all sorts of packages, Ferguson said.

This is difficult, however, when self-heating is involved. “If you have self-heating, then you might have mobility changes, but that’s negligible. The self-heating is not big enough to change mobility much, although it may degrade the transistor performance by a few percent. However, it also accelerates the aging process in transistors, like NBTI (negative-bias temperature instability) threshold shift, which slows down the circuit because of accelerated aging,” Moroz noted.
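A common first-order way to reason about temperature-accelerated aging is the Arrhenius acceleration factor, AF = exp[(Ea / k) × (1/T_ref − 1/T_hot)], with temperatures in kelvin. The sketch below is a generic illustration under that model only. The 0.5 eV activation energy is a placeholder assumption, not a measured NBTI value, and real NBTI models also depend on voltage and stress time.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_ref_c, t_hot_c, activation_energy_ev=0.5):
    """Arrhenius acceleration factor between two junction temperatures.

    Returns how many times faster a thermally activated mechanism proceeds
    at t_hot_c than at t_ref_c. The default activation energy is a
    placeholder assumption chosen only for illustration.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Effect of 10 degrees C of extra self-heating on top of an 85 C baseline.
    print(f"{arrhenius_acceleration(85.0, 95.0):.2f}x")
```

Even a modest temperature rise gives a noticeable multiplier under this model, which is consistent with self-heating mattering more for aging than for mobility.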

Packaging considerations
Packaging is a big part of the strain/stress discussion. There is a wide range of packaging strategies to choose from, starting with inexpensive options that conduct heat poorly.

“Given that silicon is a fairly good thermal conductor, it spreads heat well, but a bottleneck is introduced with a cheap package to radiate the heat outside,” Moroz said. “Inside, everything is going to be the same temperature because silicon is a better thermal conductor than a cheap package. There are more expensive packages that you probably cannot afford to use in something like a mobile application, and which are more suited to a high-performance computing application. Those packages would be able to radiate 10 times or 20 times more heat than cheaper packages. That changes the picture, as well, and has to be factored in.”

Commercial tools and solutions from the leading EDA tool providers are helping designers figure out how to account for this.

Further, once the package becomes a better heat conductor, it removes the bottleneck of the package but also makes other things more complicated. “Specifically, there are then non-uniform temperatures inside the package. Wherever you have a hotspot, it gets radiated outside. But the temperature inside is not uniform, so you also have to take care of that,” he said.

Simply assembling the packages introduces stresses. But as more packages are customized, those stresses become more difficult to identify. Detailed models need to be constructed, but those models can quickly balloon in size.

“As the industry continues to evolve from the historic Moore’s Law model and expand to a chiplet world, we get a new set of sources of stress. TSVs, bumps, BGAs, and stacked devices all contribute more stress to a design,” said Ferguson. “Historically, packaging solutions could consider some of these stress impacts, but they did so without knowledge of the location or intended electrical behavior of the transistors. That approach results in either over-constraining the advanced package, impacting the total size or cost of the completed design, or in designs that don’t function properly (due to the induced stresses being ignored). While capturing these stresses is possible with historic approaches, the compute requirement to do so is too large to be practical for designs with millions or even billions of transistors. An alternative approach is the use of dedicated compact models that can be applied to get much more accurate results within reasonable compute times.”

Avoiding problems
While tools and solutions are still evolving to address stress from the chip level all the way to the system level, engineering teams can take some steps now in an effort to preclude problems later.

“Previously, design engineers only verified after they built something,” said Ansys’ Kim. “But that concept is no longer valid because architecturally, the structure is already very complex, and without making decisions up front, designs will fail down the road. For this reason, it is very important to come up with a good methodology in the prototyping flow or architecture flow, and run ‘what if’ scenarios before having the design done.”

Ferguson agreed. “The whole topic of stress impacts represents an entirely new level of optimization that is going to be required more and more as the industry continues down Moore’s Law, as well as toward a chiplet-based economy. Components must be characterized for stresses and heat across multiple corners. Integration tools must be able to make informed decisions on placements based on those parameters. And while the industry is actively gearing for this in several directions, it’s clear there’s still plenty of work to do.”

The starting point is what caused the stress in the first place. “With stress, we need to look at the physics, we need to look at what caused the stress,” said Cadence’s Kao. “When people see the wafer bowing up and down, they see the deformation, they see the strain, they measure the strain. Then, they look at the case of wafers. It’s not due to a force pulling down the spring. It’s due to the temperature elevation, and to the multiple materials inside the wafer, shrinking and expanding at different rates. That’s what caused the strain, and the stress. When we look at the problem, we see the effect, then we try and find out the cause. Then we fix the adverse effect. Inside the chip, all the stress there will cause breakdown, failure, or separation. At the end of the day, materials, manufacturing, and EDA design tools must all come together to solve this challenge.”

2 comments

GT says:

Good article. I enjoyed reading. But does CESL stand for “contact etch stop layers” (instead of “contact edge stop layers”)?

Yves Laplanche (Arm) says:

You are right, this is a typo. CESL stands for Contact Etch Stop Layers.
