Leveraging Chip Data To Improve Productivity

Collecting, analyzing, and utilizing data can pay big dividends for design productivity, reliability, and yield.


The semiconductor ecosystem is scrambling to use data more effectively in order to increase the productivity of design teams, improve yield in the fab, and ultimately increase reliability of systems in the field.

Data collection, analysis, and utilization are at the center of all these efforts and more. Data can be collected at every point in the design-through-manufacturing flow and into the field, and it can be looped back into current and future designs to prevent costly glitches or failures, or pushed forward to prevent unwanted interactions between software and hardware.

“When we look at data in the field, and what goes on in a new foundry/fab bring-up process, we’ve seen how they’re creating tons of data on each chip made in order to determine how to improve the yield,” said Kam Kittrell, vice president of product management in the Digital & Signoff Group at Cadence. “This was always a big data problem, and there were lots of esoteric things that went into this — what it takes to improve the yield, and design for manufacturability. It’s taking big data, analyzing it, looking at how to tweak what’s going on in order to get better yield going forward. That’s right at the process node development. But now it’s going further and further downstream, because we want to be able to do system-level test as hyperscaler companies are putting together very complex cloud systems with thousands of computers, and building thousands of sites with these computers that are all identical.”

Making that mix of systems, software, and hardware work together is a big challenge, and collecting field data from test chips and feeding it back into the design-through-manufacturing flow can significantly improve yield in production chips and systems. “If they’ve learned something about the chip, they can test downstream to see if there’s any immediate failure,” said Kittrell. “You don’t want to be doing the customer’s payload in the cloud, then see it fail, and then you find out something that you could have known from chip test.”

Others agree. “The megatrend that’s going on here is product complexity,” said Chris Mueth, senior manager of new markets and digital twin program manager at Keysight. “That’s the big driver. You could say there’s some regulatory standards and miniaturization going on, but it’s all really about complexity, and it’s going to just keep getting worse because consumers want more capability in the palm of their hands. Developers are going to continue to push more and more functionality into products.”

Consider a 2.5G mobile chip 15 years ago. “There may have been 100 requirements for a PA chip that would go in a phone,” Mueth said. “Now it’s a multi-function 5G chip that could have 2,000 requirements. And it may have multiple bands. It has to operate on multiple voltages, in multiple operating modes, and all this has to be managed and verified. We’ve heard stories of chip manufacturers that missed verifying a requirement and only caught it after it was already in a chip in a phone.”

And that’s only part of the story. All of this needs to be viewed in the context of changes that can affect an entire system, so data needs to be collected and analyzed end-to-end.

“If you’re a traditional company that’s making a connected appliance, you do user research and focus groups,” said Rob Conant, vice president of software and ecosystem for Infineon’s Connected Secure System Business Unit. “You use that to inform your future product direction. In the IoT space, they really haven’t gone into that model wholeheartedly. It’s kind of an afterthought add-on to those products. However, other products have been built with connectivity at the core. A home security system is a good example of that. It has a very specific value proposition and a very specific customer, a tangible customer deliverable. Connectivity is core to that deliverable. For the companies that make those services and solutions, it’s not about how attractive the packaging is on their camera. It’s about how economically and how meaningfully they can deliver that specific customer value, so they’re much more aggressive about using data to understand how their products are operating, how their customers are using those products, and how those products tie in to that customer value. And that idea of customer success exists to some extent in consumer hardware. If you take it a layer down into the semiconductor companies themselves, this idea is trickling down into the lower-level component providers like Infineon and others, but it has not been native. It’s not something where our people start with those ideas. It’s an add-on for the products we sell.”

Better automation
Utilizing data effectively can pay big dividends for design teams. The gap between increasing design complexity and the available engineering talent is widening, and data is an essential factor in closing it.

“Any EDA company spends a lot of time in design and test, but because of the megatrends happening, there’s a budding area in data management to deal with how to manage all the requirements,” said Mueth. “How do I know if those requirements have been met? What are the criteria for simulating or testing devices to make sure they meet the requirements? Then there’s requirements management itself, and configuration. How do I know the IP in my chip is the right IP when I manufactured it? What is the traceability of the tools and the versions I use? All of this is important when you’re doing the verification. And as you can imagine, I’m piling up tons and tons of data out there.”

That data, in turn, can be used to improve simulation and verification, and it can shorten the debug process.

“In the simulation realm, engineering teams are doing three main tasks,” said Simon Davidmann, CEO of Imperas. “One is building software that runs. They try to get software up and running, and have certain data they want to understand. Two is the other extreme, whereby engineers are performing the verification around RISC-V. Then there’s another bunch who are looking at improving performance. All of these people want different types of data.”

But engineering teams need to understand what they are collecting data for and what type of data they need.

“It’s the ‘what’ and ‘why.’ The ‘how’ is obviously a necessity for us engineers because, for example with our modeling, we are speed freaks,” Davidmann said. “We don’t want to put anything in the model that will slow us down. If you want to start doing analysis on what’s going through the model, it’s going to slow it down. So we are very concerned about what data people want because it’s going to have performance impacts. Then, when an engineering team wants to add data analysis, from our point of view, there are several types of data that can be useful. First, they must identify what use they want to put it to and what granularity they want. Some people are trying to tune software and need very specific data, such as cycle-by-cycle data. Or, if someone’s trying to verify something, they’ll need completely different data to be related to the hardware events and the like. Once they’ve got the data, there are different abstractions. For example, if we’re helping an engineering team port Linux, they don’t want to look at the events in the RTL. They don’t even want to look at the register values. They want to look at the abstraction of C, or even more, they want to look at the abstraction of functions. Or, they want to look at the abstraction of the scheduler of the jobs within the OS. That’s all data that can be collected. Then, they can analyze it to see how well it performs, or what bits of the OS they’ve explored.”
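To make that granularity tradeoff concrete, here is a minimal sketch of collapsing a raw instruction-level trace into a function-level profile, the kind of higher abstraction described above for OS porting. The trace format, function names, and addresses are invented for illustration, not any Imperas API.

```python
from collections import Counter

def function_profile(trace):
    """Collapse (pc, function_name) trace events into per-function counts."""
    counts = Counter()
    for _pc, func in trace:
        counts[func] += 1  # one executed instruction attributed to this function
    return counts

# Tiny synthetic trace standing in for cycle-by-cycle simulator output.
trace = [(0x1000, "sched"), (0x1004, "sched"), (0x2000, "memcpy")]
print(function_profile(trace))  # Counter({'sched': 2, 'memcpy': 1})
```

In practice the aggregation would run offline, so the fast simulation path only has to emit events, which is exactly the performance concern raised in the quote.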

Ensuring a device meets requirements is a big challenge, which is why an estimated two-thirds (or more) of chip development is spent in verification. “Pressure here comes in a lot of different ways,” Mueth said. “Part of it is just defining the requirements I need, and defining how those requirements will be simulated or tested. Then, the process definition is needed, along with construction of automation, because you’re probably not going to do all this stuff manually. You want to do it in an automated fashion. Then you have to determine how to collect the data, reduce it, and make sense of it.”
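As a minimal illustration of that automation, the sketch below cross-checks a requirements list against test results tagged with requirement IDs and flags anything that was never verified. The data shapes, IDs, and testbench names are hypothetical.

```python
# Hypothetical requirements and test-result records; IDs are invented.
requirements = {
    "REQ-001": "Operates from a 1.8 V supply",
    "REQ-002": "Supports band n77",
}

test_results = [
    {"test": "tb_supply_min", "covers": ["REQ-001"], "passed": True},
    # REQ-002 has no passing test -- exactly the gap that slips into silicon.
]

covered = {req for r in test_results if r["passed"] for req in r["covers"]}
for req_id in sorted(set(requirements) - covered):
    print(f"UNVERIFIED {req_id}: {requirements[req_id]}")
```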

Leveraging data more effectively
Data can vary significantly depending on mission profiles, which are far different for an automotive chip than a 5G phone chip, and it can vary depending on how and where it is used in the flow.

“Today we ask our customers, ‘Do you have any insight into what your profile actually looks like?’ The answer is, more or less, ‘No,’ and this is the same case for HPC,” said Pawini Mahajan, product manager, silicon lifecycle solutions for automotive at Synopsys. “As a result, what we’re trying to do through silicon lifecycle management (SLM) is insert monitors early in the design lifecycle at the architecture level, and collect data throughout the production lifecycle, including the ramp, production, and manufacturing phases. We can constantly collect data even before the device gets to the field. All of that data is something we’ve gathered throughout the lifecycle, which we provide back to the design engineer as a feedback loop to further improve the next-generation design. What we do with that lifecycle data also can be done with in-field data. For example, you could monitor for mission profile, or for aging and degradation. All of that data is collected and stored, and it can be used from an end-consumer perspective or fed back to the design team to enhance future designs.”
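A rough sketch of what such lifecycle telemetry could look like appears below: each on-die monitor sample is tagged with the phase that produced it, and a simple per-phase aggregation yields the feedback a design team might consume. The record fields and phase names are assumptions for illustration, not Synopsys’ SLM data model.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class PVTSample:
    phase: str       # "ramp", "production", "in_field", ...
    monitor_id: str  # which on-die monitor reported the sample
    temp_c: float
    voltage_v: float

samples = [
    PVTSample("production", "pvt0", 85.0, 0.78),
    PVTSample("in_field", "pvt0", 102.0, 0.74),
    PVTSample("in_field", "pvt0", 98.5, 0.75),
]

# Per-phase averages show how the field mission profile diverges from
# what the part saw during manufacturing test.
for phase in ("production", "in_field"):
    sel = [s for s in samples if s.phase == phase]
    print(phase, round(mean(s.temp_c for s in sel), 1), "C",
          round(mean(s.voltage_v for s in sel), 3), "V")
```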

How that data gets sliced up and analyzed depends on what it’s being used for. Much of this is done using in-house tools that users have created for their own purposes. And in some cases, data is incomplete because data collection is blocked by contractual obligations.

“This is an area that is quite broken and disjointed,” Mahajan said. “Every semiconductor company, Tier One, or OEM, depending where they are in the lifecycle, has their own pieces of the solution. Some semiconductor companies may have their own version of process, voltage, and temperature monitors that they’re using to gather data off of their chips. But once that chip goes into a car, there is probably no mechanism as of today to pull that data out through the OEM’s data lake because some of the interactions don’t exist in these contracts.”

In the design flow, the overriding concern is productivity improvements. Data is important, but often it’s not leveraged as effectively as it could be.

“The challenges become so difficult with growing design sizes that they just don’t have time to do the rounds of optimization that they’d like to do,” said Mark Richards, product manager, DesignDash at Synopsys. “It’s largely about getting it done within the time window they have, so whatever PPA, whatever QoR they end up with, they end up with. What they’re trying to do as these designs get larger, and take longer to iterate over, is just get better productivity somehow. Data is a means to an end, and they’ll take the best means that can be found at present to get them there.”

If the right data is collected and analyzed, it can be used to significantly improve that optimization. “If we can start to expose that to the user in a way that they can absorb it efficiently, that helps in that thrust and drive to productivity,” Richards said. “How can we tap into all of these engines, use under-the-hood methods to be able to read the engines at a much finer-grain level than they get from a log file? Then we can start to coalesce and merge that data, and try and find patterns within it. If we can extract those patterns, we can start to see them either as causation, as correlation, or something in between. That helps the engineer do their job more efficiently, which is ultimately what they’re trying to get to.”
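As a toy version of that pattern-finding, the sketch below correlates a hypothetical fine-grained engine metric with a QoR outcome across runs (it needs Python 3.10+ for statistics.correlation). The metric names and values are invented, and real data would come from instrumentation well below the log-file level.

```python
from statistics import correlation  # Python 3.10+

# One record per completed run; names and values are invented.
runs = [
    {"congestion_hotspots": 12, "worst_slack_ps": -45},
    {"congestion_hotspots": 30, "worst_slack_ps": -120},
    {"congestion_hotspots": 5, "worst_slack_ps": -10},
    {"congestion_hotspots": 22, "worst_slack_ps": -90},
]

x = [r["congestion_hotspots"] for r in runs]
y = [r["worst_slack_ps"] for r in runs]

# A strong negative value suggests, but does not prove, that congestion
# is dragging timing down: causation, correlation, or something in between.
print(round(correlation(x, y), 3))
```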

Once a chip is manufactured and passes testing, it then can be used for hardware/software bring-up. “Assuming the silicon is good quality and it’s a known good die, you want to (depending on the application) add the first level of software, such as the software kernel, and then test it,” said Vivek Chickermane, senior director of R&D for Tessent Embedded Analytics at Siemens Digital Industries Software. “Then you put it with the next level, which could be the operating system micro-services, then the full operating system, and maybe add some apps. You build up the software stack, and do as much testing as you can in the lab. There, engineers really want to identify and debug issues before it’s deployed in the field, and they focus tightly on a few questions. For example, if the system is a single die, it may use IPs that have not been used before and which have never been proven in silicon. You obviously want to verify that. The next level is adding the kernels or operating system, etc. We may not be sure how robust the ecosystem for that is. How good are the compiler, the kernel services, micro-services? You want to make sure all of that works well. Then you want to check that the software will behave as expected, and will achieve the required performance.”
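A minimal sketch of that staged bring-up flow, with invented layer names and stand-in tests, might look like this: run each layer’s smoke tests in order and halt at the first failure, so a kernel problem is isolated before the OS or apps can obscure it.

```python
# Stand-in smoke tests for each software layer; real ones would exercise
# the kernel, OS services, and applications in the lab.
def test_kernel():
    return True

def test_os():
    return True

def test_apps():
    return False

stages = [("kernel", test_kernel), ("os", test_os), ("apps", test_apps)]

for name, smoke_test in stages:
    if not smoke_test():
        print(f"bring-up halted: {name} layer failed")
        break
    print(f"{name}: OK")
```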

Once the design team meets debug goals and the hardware/software co-design is deemed to be good, they are ready to move from lab to field. That brings a whole different set of challenges.

“There’s a lot of uncertainty and unpredictability in the field, because you can’t always ensure that all the software that is going to be used in the field has ever been tried before,” Chickermane said. “Let’s say you buy a mobile phone, and now you’re going to add an app that was developed sometime later. The silicon provider hasn’t tested that app. Imagine now that it’s a large system, and there are all kinds of applications with complex requirements. That is a completely different set of problems. In the field, the challenge is how to build a very reliable and predictable system, especially when you have a lot of dynamic components? How do I build it from less-reliable and less-predictable components? The reliability isn’t a function of the silicon being poorly designed. It’s that the silicon hasn’t been proven 100%. You release it with maybe 95% confidence. So there’s still uncertainty, which gets stressed in the field.”

Using data effectively is essential for closing that gap, whether that involves aggregate or monitoring data.

“With aggregate data, I may want to run an application and see how many cache misses there are or how many floating point operations happen,” he said. “Does the power-mode savings kick in, and how many times does it kick in? The system is designed with that aggregate data because a lot of metrics, like battery life or CPU utilization, depend on it. That data comes from performance counters. On the other hand, monitoring data is more comprehensive. While performance counters provide aggregate data, monitors provide time-sensitive data, so there’s a timestamp. Let’s say you decide you’re going to gather data every 100 microseconds, so every 100 microseconds there’s a timestamp. That provides a snapshot of whatever I care about. For example, our users want what is known as the instruction trace. They want to know what instructions were executing at that time. If it’s a bus monitor, then it’s looking at what transaction is happening on the bus. Is the CPU talking to the memory? Is it talking to the network? Is it talking to an I/O?”
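That distinction maps naturally onto two data shapes, sketched below with invented field names and values: aggregate counters are one number per event class for the whole run, while monitor samples carry a timestamp per sampling period and so preserve ordering.

```python
# Aggregate data: one counter per event class for the entire run.
perf_counters = {"cache_misses": 41823, "fp_ops": 997000, "pm_entries": 6}

# Monitoring data: one timestamped sample per 100-microsecond period.
bus_trace = [
    {"t_us": 0, "master": "cpu0", "target": "ddr", "op": "read"},
    {"t_us": 100, "master": "cpu0", "target": "nic", "op": "write"},
    {"t_us": 200, "master": "dma0", "target": "ddr", "op": "write"},
]

# Timestamps preserve ordering, so you can ask questions counters cannot,
# e.g. which targets the CPU touched, and in what sequence.
cpu_targets = [s["target"] for s in bus_trace if s["master"] == "cpu0"]
print(perf_counters["cache_misses"], cpu_targets)
```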

The power of data
It’s one thing to have the data. It’s a big leap to apply it in ways that can make a significant difference in design, which translates into value for consumers.

“If you look at a company like Apple with the iPhone, they have incredible amounts of information about cellular connectivity,” said Infineon’s Conant. “Where does cellular connectivity work? Where does it not work? Why does it work? Why does it not work in various places? How are customers using this product? How does power consumption vary depending on how the MCU is used? Their ability to design their own core, or write their own core chipset for that phone, is largely determined by the richness of the data they have, which provides insights into the usage model.”

The real value comes from depth of understanding of different use cases, and the ability to utilize data to build products for those use cases. “If you look at the performance of the phones that use their silicon, it is very good, because they understand deeply the usage model,” Conant said. “The design of those chips is driven by that data, and by deep understanding of it. That’s something that semiconductor companies need to aspire to — having a deep understanding of the usage model for their products. But today, they’re often at arm’s length.”


