Mapping Heat Across A System

Addressing heat issues requires a combination of more tools, strategies for removing that heat, and more accurate thermal analysis early in the design flow.

Thermal issues are becoming more difficult to resolve as chip features get smaller and systems get faster and more complex. They now require the integration of technologies from both the design and manufacturing flows, making design for power and heat a much broader problem.

This is evident in the evolution of the smart phone. Phones sold 10 years ago were very different devices. Functionality was lower, processors were slower, and the phones themselves were much thicker. With a slimmer form factor, today’s phones present a much harsher thermal environment, which needs to be accounted for in every component designed for that system.

“Heat is a waste by-product from the operation of the electronics, and has to be removed,” noted John Parry, strategic business development manager at Siemens EDA. “Heat degrades electronic performance. Hot computers run slower, and applications that are running can ‘lock.’ Heat also reduces reliability, with heat being the single biggest factor in field failures. And finally, heat can injure people. Exposed surfaces must be kept cool enough to touch. In the case of wearables, temperatures only slightly higher than the human body can cause a low temperature burn over time.”

When thermal mitigation is active, device frequency must be reduced constantly, which in turn may impact a product’s competitiveness. It’s not just the chip, either. Thermal issues impact the PCB and the rest of a system, as well.

“It’s so important to have a vision before the hardware is released, and have some sort of understanding about how the product would operate in its environment,” explained Melika Roshandell, product marketing director for the system analysis team in Cadence’s Custom IC & PCB Group.

While thermal analysis isn’t new, it hasn’t been a major focus for most chip designers. That’s starting to change, particularly as more compute power is packed into mobile devices and systems that are expected to last longer in the field. And it’s especially important as designs become more heterogeneous, and as advanced packaging increasingly pushes into the Z axis.

“It’s the scale of what you’re looking at,” said Marc Swinnen, director of product marketing for the Semiconductor Division of Ansys. “Chip designers are used to looking at hundreds of microns, which is the area for all of these effects for thermal, for electromagnetic. But once you get into the 3D-IC, we’re talking centimeters from one end to the other, with power rings around them. Anything inside that ring is going to feel that electromagnetic field, and it’s a significant effect. It’s much bigger than anything electrical engineers are used to on the chip. With thermal differences on a few hundred microns, you’re not going to see that much thermal gradient. But across 2 or 3 or 5 centimeters, you are going to see thermal gradient, and the scale makes a difference. A significant change in scale is not just a quantitative difference. It’s also a qualitative difference, which always has been the distinction between PCB and chip. If you make a big enough difference in scale, it completely changes the tools and methodologies you use, even though the fundamental problems are exactly the same.”

This requires more tools, and more specific tools, to address thermal issues. And the importance of those tools will only grow as the industry migrates to chiplets over the next several years, creating the need for analyzing, verifying and optimizing in-package heat.

“Designs assembled from chiplets introduce new elements to consider, assuming we are talking about 2.xD configurations, such as different maximum junction requirements,” said Javier DeLaCruz, senior director of system integration at Arm. “Here, devices in the same package may need to be more thermally decoupled. Advancements with heat spreaders with embedded heat pipes are one effective way to mitigate the differing thermal needs of each of the chiplets.”

Concurrently, the chiplets need to be modeled individually to determine their heat generation signatures. From there, the chiplets need to be considered as part of the packaged system, including the coupling of these heat sources, which impacts each chiplet, and which may then have impacts on timing and other performance metrics, DeLaCruz said.

Another consideration is that while chiplets allow for die optimization for given functions — which could reduce power loss — if chiplets are utilized to add more functionality through 3D stacking, power loss should increase, said Danny Clavette, distinguished engineer and DC-DC Compute system architect at Infineon Technologies Americas Corp.

To account for this, semiconductor companies are interested in introducing power conversion chiplets into future systems, which will help improve transient response thanks to reduced parasitics to the loads. This could increase the need for thermal analysis and system optimization.

When the packaged chip eventually makes it into its system housing, there are specific thermal considerations and cooling requirements. For example, most substrates are limited to approximately 105°C, so designers need to consider how much added power is integrated.

“Also, system thermal impedance from chips to top and chips to bottom need to be understood for given systems,” said Clavette. “Most systems try to limit the amount of heat transferred to the motherboard, with the preferred thermal path being to heat sink or cold plate mounted on top.”

It’s possible today to abstract states of energy consumption based on software workloads, but the software running on those designs will require more accurate models for dynamic power consumption and thermal effects. Modeling complexity is dependent on the desired accuracy, he said.

That complexity also can vary greatly between handset systems and high-performance compute systems, for example. But each system has a limit for what it can handle, and understanding those limits is critical.

“Often, packaged parts are designed and thermally simulated with an abstract set of assumptions that may not represent the complete system enclosure,” said Arm’s DeLaCruz. “This can pose a challenge at the system enclosure, as there can be many parts vying for the same thermal dissipation path.”

There are several tools in the toolbox for thermal management, DeLaCruz noted. “One tool is to leverage the thermal mass of the packaged system to allow for adequate warning of a thermal event. It takes many clock cycles for a part to heat up, which can be observed and compensated for within software.”
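The warning time that thermal mass provides can be estimated with a first-order lumped model. The sketch below is illustrative only. It assumes a single thermal resistance and a single thermal capacitance, and every name and number in it (thermal resistance, capacitance, power, temperature limit) is hypothetical rather than taken from any real part.

```python
import math

def temperature_at(t_s, power_w, r_th, c_th, t_ambient):
    """Step response of a single RC thermal stage.

    T(t) = T_amb + P * R * (1 - exp(-t / (R * C)))
    r_th is in K/W, c_th is in J/K, so R * C is a time constant in seconds.
    """
    tau = r_th * c_th
    return t_ambient + power_w * r_th * (1.0 - math.exp(-t_s / tau))

def time_to_reach(t_limit, power_w, r_th, c_th, t_ambient):
    """Seconds after a power step until t_limit is reached.

    Returns None if the steady-state temperature never reaches t_limit.
    """
    if t_limit <= t_ambient:
        return 0.0
    steady_state_rise = power_w * r_th
    if t_limit >= t_ambient + steady_state_rise:
        return None
    tau = r_th * c_th
    return -tau * math.log(1.0 - (t_limit - t_ambient) / steady_state_rise)

# Hypothetical values: 10 K/W, 2 J/K (20 s time constant), 5 W step, 25 C ambient.
warning_s = time_to_reach(60.0, power_w=5.0, r_th=10.0, c_th=2.0, t_ambient=25.0)
```

With these assumed values the limit is reached after roughly 24 seconds, which at gigahertz clock rates is billions of cycles in which software can observe a sensor and respond.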

Fig. 1: Thermal profile of a smart phone with detailed package, PCB and chip model. Source: Cadence

Tools made specifically for 3D-ICs need to provide boundary conditions, as this is needed to fully understand thermal conditions, said Ansys’ Swinnen. “Thermal tools need to be able to call on other related tools to put the whole picture together, such as a multiphysics tool for analyzing multi-die chip packages and interconnects for power integrity, parasitic extraction, signal integrity, thermal behavior, and thermo-mechanical stress, which needs to call on the PCB system-level thermal tool with computational fluid dynamics so fans and heat sinks are taken into account. By doing this, a boundary condition can be determined for the edge conditions so that an analysis can be done for the chip. And thermal leads into CFD, along with expansion and warpage, which leads to mechanical and stresses. So more and more physics are being dragged into the 3D-IC picture.”

Cadence’s Roshandell agrees. “Thermal and fluidics go hand in hand with electronic designs because the constraints of the electronics system model will determine your thermal behavior. In that respect, you would know the power envelope of your system, and based on that you can optimize your thermal. For example, if the power envelope is 4 watts, I know that after I reach 4 watts I have to do something for thermal. I either have to mitigate it, or I have to bring the frequency lower. This is how thermal comes into play with the whole electronic design. It goes with the constraint of your system and how you want to improve your system performance.”
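The choice Roshandell describes, lowering frequency once the power envelope is reached, can be sketched as picking the fastest operating point that still fits the envelope. This is a simplified illustration, not any vendor's governor. It uses the standard dynamic power approximation P = C·V²·f, ignores leakage, and the operating points and effective capacitance are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence

class OperatingPoint(NamedTuple):
    freq_hz: float
    voltage_v: float

def dynamic_power_w(op: OperatingPoint, c_eff_farads: float) -> float:
    """Dynamic power approximation P = C * V^2 * f (leakage is ignored)."""
    return c_eff_farads * op.voltage_v ** 2 * op.freq_hz

def fastest_within_envelope(
    points: Sequence[OperatingPoint], c_eff_farads: float, envelope_w: float
) -> Optional[OperatingPoint]:
    """Return the highest-frequency point whose power fits the envelope, or None."""
    fitting = [op for op in points if dynamic_power_w(op, c_eff_farads) <= envelope_w]
    return max(fitting, key=lambda op: op.freq_hz) if fitting else None

# Hypothetical voltage/frequency pairs and effective switched capacitance.
POINTS = [
    OperatingPoint(1.0e9, 0.8),  # about 0.64 W
    OperatingPoint(2.0e9, 1.0),  # about 2.00 W
    OperatingPoint(3.0e9, 1.2),  # about 4.32 W, over a 4 W envelope
]
C_EFF = 1.0e-9

chosen = fastest_within_envelope(POINTS, C_EFF, envelope_w=4.0)
```

Under a 4 watt envelope the 3 GHz point is rejected and the 2 GHz point is chosen, which is the "bring the frequency lower" branch of the decision.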

A lot of the industry is still just learning about thermal issues. “Chip designers typically haven’t had to worry about it too much,” said Swinnen. “It’s something that’s more a system-level concern, and it’s generic for the chip. The chip is often simply modeled as a certain temperature — just one for the entire chip — and that’s certainly not adequate anymore. You need to have more. The power that a chip creates depends on the temperature it’s at, but the temperature depends on the power it creates. In a power/temperature table, for every power you can see what the temperature will be, and that table is basically what some refer to as a chip thermal model (CTM). With this model, you can then subjectively measure at the system level what temperature your chip’s going to be, and the table will tell you what the power output of that chip will be. This feeds back into what the temperature gradient will be, so you can converge on a consistent solution and whether that power output matches the temperature it’s at.”
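The convergence Swinnen describes can be sketched as a fixed-point iteration between a power/temperature table and a system-level temperature estimate. The table format, the numbers, and the single-resistance system model below are all assumptions made for illustration. A real chip thermal model and system solver are far more detailed.

```python
from bisect import bisect_right

# Hypothetical power/temperature table: (chip temperature in C, chip power in W).
CTM = [(25.0, 2.0), (50.0, 2.2), (75.0, 2.5), (100.0, 3.0), (125.0, 3.8)]

def power_from_ctm(temp_c, table=CTM):
    """Linearly interpolate chip power at temp_c, clamping at the table ends."""
    temps = [t for t, _ in table]
    if temp_c <= temps[0]:
        return table[0][1]
    if temp_c >= temps[-1]:
        return table[-1][1]
    i = bisect_right(temps, temp_c)
    (t0, p0), (t1, p1) = table[i - 1], table[i]
    return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)

def system_temperature(power_w, t_ambient=25.0, r_th=15.0):
    """Stand-in for a system-level solver: one thermal resistance to ambient."""
    return t_ambient + r_th * power_w

def converge(t_start=25.0, tol=0.01, max_iter=100):
    """Alternate table lookup and system solve until the temperature stops moving."""
    temp = t_start
    for _ in range(max_iter):
        power = power_from_ctm(temp)
        new_temp = system_temperature(power)
        if abs(new_temp - temp) < tol:
            return new_temp, power
        temp = new_temp
    raise RuntimeError("no self-consistent operating point found")

temp_c, power_w = converge()
```

With these assumed numbers the loop settles near 59.8°C and 2.3 W, the point at which the power read from the table and the temperature it produces agree with each other.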

Thermal mitigation strategies
Heat is transmitted by conduction, convection, and radiation. “From a cooling perspective, we can mainly improve conduction and convection,” said Siemens EDA’s Parry. “The only alternative, if the electronics get too hot, is to reduce the power consumption by temporarily lowering the product’s performance.”

Because the thermal envelope is tied to the power envelope, exceeding the power budget can push temperatures out of control. “If you do not have a thermal mitigation plan, what will happen is it will go to thermal runaway whereby the temperature keeps going up,” Roshandell said. “Then the power keeps going up because leakage is the exponential function of temperature. And when the leakage goes up, what happens? Your temperature goes up again, and you go into a feedback loop that will make the device either burn, or cause a huge reliability issue. That’s why it’s so important to have thermal mitigation. What the power envelope gives is, ‘Up to this power, the thermal will be fine.’ For example, with a power envelope of 4 watts, you know that up to 4 watts you do not hit the thermal mitigation. Once you get to 4.1 watts you’re going to hit the thermal mitigation. If you do not have thermal mitigation, you’re going to go to thermal runaway. It’s so important to have thermal mitigation or air flow, if it is an option.”
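The feedback loop between leakage and temperature can be illustrated with a small steady-state model. This is a sketch under stated assumptions, not a device model: leakage is taken to grow exponentially with temperature, the path to ambient is a single thermal resistance, and the coefficients, abort limit, and function name are hypothetical.

```python
import math

def settle_or_run_away(p_dynamic_w, r_th, t_ambient=25.0, leak_ref_w=0.5,
                       k_per_c=0.03, t_abort=150.0, tol=1e-3, max_iter=1000):
    """Iterate temperature -> leakage -> temperature.

    Returns the settled temperature in C, or None if the temperature climbs
    past t_abort (runaway, or simply hotter than the assumed limit).
    """
    temp = t_ambient
    for _ in range(max_iter):
        # Assumed leakage model: exponential in temperature above ambient.
        leakage_w = leak_ref_w * math.exp(k_per_c * (temp - t_ambient))
        new_temp = t_ambient + r_th * (p_dynamic_w + leakage_w)
        if new_temp > t_abort:
            return None
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    return None

stable = settle_or_run_away(p_dynamic_w=2.0, r_th=10.0)   # settles near 59 C
runaway = settle_or_run_away(p_dynamic_w=3.5, r_th=10.0)  # None: no stable point
cooled = settle_or_run_away(p_dynamic_w=3.5, r_th=5.0)    # settles with better cooling
```

In this toy model the same 3.5 W load that runs away at 10 K/W settles once the thermal resistance is halved, which mirrors the two remedies in the quote: reduce power, or improve the path for heat to escape.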

Thermal mitigation depends on the system. “The best course of action at the chip level could be reducing the chip’s frequency, optimizing the floor plan, employing leakage recovery, or having a temperature sensor close to the hotspot,” Roshandell said. “At the package level, thermal vias can be included, or a higher copper density can be used. Then, for the whole system, if you can have a fan, that’s definitely going to help. But with a lot of electronics getting smaller, you really cannot have a fan in there. However, there are new technologies like heat pipes or advanced thermal interface materials that can help to reduce the heat.”

Additionally, to cool electronics via conduction, the following techniques can be used:

  • Gap pads. These can be useful in conducting heat from the tops of components, or the back side of a PCB to an enclosure.
  • Underfill. Components that are subjected to drop, shock, and vibration in harsh environments can be underfilled, which improves the conduction between the component and the board, particularly in leaded packages.
  • Thermal vias. Heat conduction into the board can be increased by adding these vias below and around the package, where they connect down to one or more ground planes, improving heat spreading.
  • Peltier devices, or thermoelectric coolers. These are used to control temperature, but increase system-level heat dissipation.

To cool electronics via convection, Parry suggested the following approaches:

  • Fans. These are the most common ways to increase convection.
  • Liquid cooling. Liquids are a much better medium for transferring heat than air, and are used for the highest heat loads. The downside is that the cooling solution will be around 5X more expensive, and heat transfer requires a radiator.
  • Heat pipes. These can be very useful for moving heat from where it is generated to a place where it is easier to cool. An example is a laptop computer, where the heat from the CPU is moved to right in front of an exhaust fan.
  • Heat sinks. These typically are added to the top of a component. A heat sink is a passive heat exchanger that acts to enhance the heat transfer from a surface to an adjacent fluid medium. The apparent simplicity makes this seem a very desirable choice. The heat sink helps because the limiting thermal resistance in the path from the source at the die junction to the local ambient is the case-to-ambient thermal resistance.

Adding a heat sink changes the heat flow from the top of the package to the local ambient. In doing so, it also changes how much heat is conducted into the board. So in addition to reducing the component temperature, it also reduces the board temperature, which affects other components. The task of a thermal engineer is to consider how best to use the thermal budget, which is the maximum junction temperature increase above ambient minus some margin for error.
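The effect of a heat sink on both junction temperature and board heating can be approximated with two parallel thermal resistances, one through the top of the package and one through the board. The sketch below is a deliberate simplification: it assumes both paths end at the same ambient temperature, and all resistance, power, and temperature values are hypothetical.

```python
def junction_rise(power_w, r_top, r_board):
    """Junction temperature rise above ambient for two parallel heat paths (K/W)."""
    r_parallel = (r_top * r_board) / (r_top + r_board)
    return power_w * r_parallel

def board_heat_fraction(r_top, r_board):
    """Fraction of the total heat that flows through the board path."""
    return r_top / (r_top + r_board)

def thermal_budget(tj_max, t_ambient, margin):
    """Allowed junction rise: maximum rise above ambient minus a safety margin."""
    return (tj_max - t_ambient) - margin

POWER_W = 4.0
R_BOARD = 20.0           # junction to ambient through the board
R_TOP_BARE = 40.0        # junction to ambient through the bare package top
R_TOP_HEAT_SINK = 10.0   # same path with a heat sink attached

budget = thermal_budget(tj_max=100.0, t_ambient=45.0, margin=10.0)  # 45 C
rise_bare = junction_rise(POWER_W, R_TOP_BARE, R_BOARD)             # about 53 C
rise_sink = junction_rise(POWER_W, R_TOP_HEAT_SINK, R_BOARD)        # about 27 C
board_bare = board_heat_fraction(R_TOP_BARE, R_BOARD)               # about 2/3
board_sink = board_heat_fraction(R_TOP_HEAT_SINK, R_BOARD)          # about 1/3
```

With these assumed numbers the bare package exceeds the 45°C budget while the heat sink version fits within it, and the share of heat entering the board drops from about two thirds to about one third.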

Conclusion
While there are many techniques and tools that can be used to take thermal and fluidics into consideration, for designs to be successful today the thermal team of any company has to get involved from the very first stages of the design.

“The thermal team can give a vision about how to improve the entire system performance under thermal constraints,” said Roshandell. “They can do ‘what-if’ studies. And while it will build on the previous projects, keep in mind that we are moving into newer technologies, and these new technology nodes have different leakage power and different dynamic power. All of those things have a huge impact on the thermal. So involve the thermal team from the first stage of the design, not after the design is completed and then asking them to fix it. That would be a little late.”


