ARM Cortex-A72 preview: the successor to the A57 is coming to phones in 2016
The Cortex-A72 arrives with some huge promises from ARM: the company says the A72 will deliver up to 3.5 times the performance of the Cortex-A15 on smartphones, and that it will feature ‘breakthrough energy efficiency’. Those are brave claims for a core that is expected to arrive in real-world devices towards the end of 2016, some 18 months from now. Let’s explore these claims in a bit more depth and see whether all is as rosy as the headlines would have you believe.
Cortex-A72: to be manufactured on 16nm FinFET+
When you see ARM’s bar charts with the Cortex-A72 towering over the performance-driven Cortex-A57 (which is expected to ship soon in flagships like the Galaxy S6 and HTC One (M9)) and the Cortex-A15, you’d be forgiven for overlooking the fact that ARM is comparing apples to oranges here. The cores are built on different manufacturing nodes: the comparison pits a Cortex-A72 made on a 16nm FinFET process against a Cortex-A57 made on a 20nm process and a Cortex-A15 made on a 28nm node.
It’s hard to estimate how much of the advertised 3.5x performance difference comes from the simple fact that the switch from 28nm to 16nm amounts to two full process nodes AND a transition from planar transistors to FinFET, but one thing is certain: it’s a big part of that gain.
It’s important to remember that TSMC’s 16nm FinFET (currently available in two varieties, CLN16FF and FinFET Plus) uses the back-end-of-line (BEOL) interconnect architecture of the company’s 20nm process, but with FinFET transistors rather than conventional planar ones. In addition, TSMC recently unveiled a third 16nm process it calls ultra-low-power (ULP), a strong hint that 16nm will be a long-lasting node, as 28nm was.
Back to the main topic: once you discount the gains that come from the switch to a new node, the effective performance boost should be humbler than the numbers advertised by ARM, but right now we can only guess what that improvement will be. A word of caution, though: it’s too early to crown the A72 as the new king, as the chip is expected to arrive in actual devices in some 18 months.
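As a rough illustration of how the advertised figure might split between process and design, the short Python sketch below treats the total speedup as the product of a process-node factor and an architectural factor. Only the 3.5x total comes from ARM’s claim; the process factors are hypothetical placeholders, not figures from ARM or TSMC, and the function name is ours.

```python
# ARM's advertised A72 (16nm FinFET) vs A15 (28nm) speedup.
ADVERTISED_SPEEDUP = 3.5


def architectural_gain(total_speedup, process_speedup):
    """Speedup left for the core design once the process share is factored out.

    Assumes the two contributions multiply: total = process * architecture.
    """
    if process_speedup <= 0:
        raise ValueError("process_speedup must be positive")
    return total_speedup / process_speedup


# Hypothetical process-only speedups, chosen purely for illustration.
for process_speedup in (1.25, 1.5, 1.75, 2.0):
    gain = architectural_gain(ADVERTISED_SPEEDUP, process_speedup)
    print(f"process {process_speedup:.2f}x -> architecture {gain:.2f}x")
```

Whatever the real process factor turns out to be, the point stands: the larger the share attributed to the node shrink, the smaller the part of the 3.5x that the A72 design itself accounts for.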
An important claim for the A72 is a targeted 1.9x improvement in sustained performance over 20nm A57 chips like the Snapdragon 810. ARM has not said at what clock speed that improvement is measured, but we’ll keep an eye on this as, hopefully, the company releases more details in the near future.
OMG, more cores: on octa-cores, and the pressure of marketing
The Cortex-A72 is a ‘big’ core that succeeds the A57, which means we can expect big.LITTLE systems with multiple performance-driven A72 cores working alongside multiple energy-efficient A53 cores. Rumors already point to an octa-core chip by MediaTek using four A72s and four A53s in a big.LITTLE configuration, and chances are that Qualcomm and Samsung will have similar solutions. In fact, more than ten companies have already licensed the A72 core for use in their chips, including Chinese rising stars like HiSilicon and Rockchip.
We are seeing the ‘core wars’ heating up as more and more companies introduce octa-core chips, yet it’s hard to find any factual proof of a tangible benefit from going to a whopping octa-core design. Interestingly, Apple is the only company that has stayed immune to the octa-core trend so far, yet the Apple A8 remains the most powerful smartphone chip currently available in terms of CPU power, and it’s a dual-core one (Apple also uses a super-low-power M8 motion coprocessor for sensor data).
The whole story: a new interconnect, new Mali GPU
Along with the Cortex-A72, ARM has also introduced the new CoreLink CCI-500 interconnect, a substantial upgrade over the current CCI-400. The interconnect is the part of the system that allows for seamless communication between big and LITTLE cores in ARM’s big.LITTLE, and the introduction of this new interconnect shows that the company will continue to pursue big.LITTLE systems.
Some industry watchers have pointed out that big.LITTLE results in system chips that use more cores, a welcome development for ARM since it is the company selling the IP for those cores (the more cores, the better for ARM, goes this argument). It’s a speculative but interesting point of view, especially in light of what can be achieved with well-designed dual-core chips where a large core scales well at all clock levels.
The CCI-500 adds a snoop filter that acts as a central hub for cache coherence, something particularly important for large, multi-core systems with multiple L2 caches. Adding a snoop filter also results in up to 30% better memory performance on the CPU ports.
The last piece of the A72 puzzle (or at least the surrounding story) is the Mali T880 GPU, the newest addition to the Mali graphics series. Details are scant at the moment: we only know that ARM promises a 1.8x increase in performance over the Mali T760 GPU as well as a 40% decrease in energy consumption.
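To make the snoop filter idea concrete, here is a minimal Python sketch of a directory that tracks which CPU clusters may hold a given cache line, so that a coherency lookup only has to query those clusters instead of broadcasting to all of them. It is a toy model under simplifying assumptions (the class and method names, the cluster labels and the addresses are all hypothetical), not a description of how the CCI-500 is actually implemented.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy directory: remembers which clusters may cache each line."""

    def __init__(self, clusters):
        self.clusters = set(clusters)
        self.holders = defaultdict(set)  # line address -> clusters holding it

    def record_fill(self, cluster, line):
        """Note that `cluster` has pulled `line` into its cache."""
        self.holders[line].add(cluster)

    def record_evict(self, cluster, line):
        """Note that `cluster` no longer holds `line`."""
        self.holders[line].discard(cluster)

    def snoop_targets(self, requester, line):
        """With the filter: snoop only the clusters known to hold the line."""
        return self.holders.get(line, set()) - {requester}

    def broadcast_targets(self, requester):
        """Without a filter: every other cluster has to be snooped."""
        return self.clusters - {requester}


# Hypothetical three-cluster system.
sf = SnoopFilter(["cluster0", "cluster1", "cluster2"])
sf.record_fill("cluster0", 0x1000)

print(sorted(sf.snoop_targets("cluster1", 0x1000)))  # ['cluster0']
print(sorted(sf.snoop_targets("cluster1", 0x2000)))  # []
print(sorted(sf.broadcast_targets("cluster1")))      # ['cluster0', 'cluster2']
```

In this model the saving comes from the lookups that never leave the interconnect: a request for a line nobody else holds triggers no snoops at all, whereas a broadcast scheme would still query every other cluster.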
reference: ARM