Apple iPhone 6 (Apple A8) performance review: CPU and GPU compared to the best Android phones out there
When it comes to performance and power, no device is as widely misunderstood as the iPhone. The new iPhone 6 (and iPhone 6 Plus) is no exception: in practically every online forum you’ll find bashing comments about its comparatively low clock speed, its ‘only’ two CPU cores, its small amount of RAM, its lack of expandable storage, and whatnot.
Looking at numbers without fully understanding them, though, is a dangerous business. This iPhone 6 performance review aims to clear up some of the widespread misunderstandings and to give a more detailed overview of the state of mobile CPUs, and of how Apple’s efforts compare to those of its main rival: the mostly Qualcomm-powered Android fleet.
Apple’s move to designing its own CPU cores coincides with its industry-first introduction of 64-bit chips: the first 64-bit phone, the iPhone 5s, arrived two years after Apple introduced its first processor, and Apple has clearly used this time to outpace the industry. To this day, Apple remains uniquely positioned in the transition to 64-bit on mobile: all first-party apps were 64-bit-ready on the iOS 7 launch date, and the company has given developers an ample timeline and great tools to port their apps to 64-bit quickly and with little effort. Thanks to the extremely low fragmentation of Apple’s ecosystem (iOS updates are adopted widely and within days, while on Android such transitions span months, if not years), the company is one year away from a lineup consisting of 64-bit devices only. This should happen next year, when the 32-bit iPhone 5c is expected to go out of production and the 64-bit iPhone 5s with the Apple A7 (or, as speculated, a plastic derivative of the 5s with similar hardware) takes the lowest place in Apple’s lineup.
Looking over to the Android camp, we see the platform lagging a full year or more behind. As of late 2014, the biggest Android vendors, like Samsung, HTC, LG, and others, are all releasing their flagships with 32-bit chips like the Snapdragon 805 and Snapdragon 801. Both of those chips are based on the now three-year-old Krait core (with some tweaks, of course), and later in this article you’ll be able to spot the difference in compute power. Naturally, using 32-bit chips like the 805 means those flagships can’t benefit from the 64-bit optimizations of ART in Android L.
Given how long it takes for a meaningful share of the Android install base to switch to an ART-enabled version of the platform (keep in mind that there is no announced minimum Android version for ART yet, and chances are it won’t be KitKat but Android L), Android is clearly in a much less favorable position when it comes to 64-bit readiness.
Apple is secretive and, unlike Intel, does not disclose detailed processor specifications. That leaves us tech reviewers a little joy: trying to reverse-engineer its efforts.
The most drastic change is in the size of the CPU block: the new CPU measures 12.2mm², nearly 30% smaller than the 17.1mm² CPU block in the Apple A7. By all visible clues, the rest of the architecture remains the same: we have 64KB/64KB of L1 instruction/data cache (L1 is the fastest cache, located on the CPU die), and a 1MB block of L2 cache shared between the cores.
On the subject of clock speeds, it’s worth noting that ARM, the company that licenses the technology behind both the Snapdragon and the Apple A8 CPU cores, has claimed that the current generation of its processors offers the best balance of thermal output and performance at around 1.2GHz. Going above that has big consequences: AnandTech has shared estimates showing that exceeding the 1.5GHz threshold by just 100MHz requires a shocking increase in voltage and, since power grows with the square of voltage, an even steeper rise in the power consumed by the chip.
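As a rough illustration of why small clock bumps are so costly, here is a minimal Python sketch of the textbook first-order model for CMOS switching power, P ≈ C·V²·f: power grows linearly with frequency but with the square of voltage. The operating points below are hypothetical values chosen for illustration, not measurements of any Snapdragon, Exynos, or Apple core.

```python
# First-order CMOS dynamic-power model: P = C * V^2 * f.
# All operating points below are HYPOTHETICAL illustration values,
# not measured figures for any ARM, Qualcomm, Samsung, or Apple core.


def dynamic_power(capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """Switching power in watts for effective capacitance C, voltage V, frequency f."""
    return capacitance_f * voltage_v ** 2 * freq_hz


def relative_power(v0: float, f0: float, v1: float, f1: float) -> float:
    """Power at (v1, f1) relative to (v0, f0), assuming capacitance stays fixed."""
    return (v1 / v0) ** 2 * (f1 / f0)


if __name__ == "__main__":
    # Hypothetical: going from 1.5GHz to 1.6GHz needs 10% more voltage.
    base_v, base_f = 1.00, 1.5e9
    boost_v, boost_f = 1.10, 1.6e9
    ratio = relative_power(base_v, base_f, boost_v, boost_f)
    speedup = boost_f / base_f
    print(f"clock: +{(speedup - 1) * 100:.1f}%, power: +{(ratio - 1) * 100:.1f}%")
```

With these made-up numbers, a clock increase of under 7% that needs 10% more voltage costs close to 30% more power, the kind of trade-off that makes sustained high clocks hard to afford in a phone.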
Apple A8 and ARM's architecture license
When it comes to the CPU, it’s worth starting off with a quick refresh on the facts. The overwhelming majority of mobile devices - be it Android, Windows Phone, or iOS ones - are based on ARM-derived architectures. ARM offers two types of licenses to its clients: a processor license and an architecture license.
Most manufacturers use the processor license, which grants them the right to take an ARM-designed core and use it in their SoC. Examples of ARM-designed cores include the battery-optimized Cortex A7 (and its newer, 64-bit successor, the Cortex A53) and the Cortex A15 (with its newer, 64-bit heir, the Cortex A57). Phone makers like Samsung, for instance, take two of those cores and combine them in various big.LITTLE configurations to come up with SoCs like the Exynos 5430 in the Galaxy Alpha, where the company combines four power-efficient A7s running at lower clock speeds with four performance-driven A15s that can reach higher clocks but also draw more power.
The other type of licensees, those in ARM’s architecture license program, take a totally different approach: they use only the ARM instruction set and build their own CPU cores. The most prominent companies that do that are Qualcomm and… Apple. Apple operated under an ARM processor license all the way until the iPhone 4s, but switched to an architecture license for the iPhone 5 and has been building its own CPU cores ever since.
The state of 64-bit
On Android, the earliest the move to 64-bit flagships could (and likely will) happen is in spring 2015, when the first wave of next year’s Android flagships is expected to arrive. Some (and hopefully most) of those devices are said to feature the Snapdragon 810, Qualcomm’s first top-level 64-bit SoC. In just over a year, Qualcomm has overhauled its portfolio to consist of 64-bit chips at practically all levels, from the low end to the high end. However, the Snapdragon 810 does not use a custom Qualcomm core (such a core would likely take more time to develop); instead, the company goes back to using an ARM processor license and equips the 810 with a big.LITTLE setup of four low-power Cortex A53 and four performance-driven Cortex A57 cores.
Apple A8 die break-down
Both TSMC and Samsung are said to be making the A8 in a 40-60 ratio
Despite Apple’s secrecy, we’re not completely in the dark: over the past two release cycles, Apple has disclosed the transistor counts of its chips, and the Apple A8 packs a whopping 2 billion of them, double the number in the A7. As far as we can tell, this is the most ever in a smartphone chip; for comparison, some estimates put the Snapdragon 805 at 700 million transistors.
From here on, our journey towards a better understanding of the Apple A8 starts with a teardown of the iPhone 6 and Chipworks’ images of the A8 die. Those images give us a detailed breakdown of the die and the location of its various components.
Despite the doubling of the transistor count, the die has actually shrunk, to 89mm² in the A8 from 102mm² in the A7, thanks to the move to a 20nm manufacturing process. Apple has also rearranged the die: the CPU now sits at the bottom left (it was at the bottom right), with a large block of L3 cache above it. Although that SRAM block is about 20% smaller, the SRAM cells themselves have shrunk by a third (from 0.12µm² to 0.08µm²), which more than makes up for the difference, so it’s likely we’re still dealing with 4MB of L3 cache. At the time of this writing, the first benchmarks show that memory latency has indeed improved, by a hefty 20ns, for accesses that go out to the L2 cache and beyond.
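The die figures above can be cross-checked with some quick arithmetic. The Python sketch below computes transistor density and the relevant percentage changes; the A7’s transistor count of about 1 billion is an assumption inferred from the statement that the A8 doubles it, and the SRAM cell sizes are treated as areas in µm².

```python
# Back-of-the-envelope die arithmetic using the figures quoted in the text.
# ASSUMPTION: the A7's transistor count (~1 billion) is inferred from the
# statement that the A8's 2 billion is "double the number from the A7".

A7 = {"transistors": 1.0e9, "die_mm2": 102.0}
A8 = {"transistors": 2.0e9, "die_mm2": 89.0}


def density_m_per_mm2(chip: dict) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return chip["transistors"] / chip["die_mm2"] / 1e6


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a shrink)."""
    return (new - old) / old * 100


if __name__ == "__main__":
    print(f"A7 density: {density_m_per_mm2(A7):.1f} M/mm2")
    print(f"A8 density: {density_m_per_mm2(A8):.1f} M/mm2")
    print(f"density gain: {density_m_per_mm2(A8) / density_m_per_mm2(A7):.2f}x")
    print(f"die area change: {pct_change(A7['die_mm2'], A8['die_mm2']):.1f}%")
    # SRAM cell area quoted as 0.12 -> 0.08 (um^2): a one-third shrink.
    print(f"SRAM cell change: {pct_change(0.12, 0.08):.1f}%")
```

Under these assumptions, density more than doubles (from roughly 9.8 to 22.5 million transistors per mm²) while the die shrinks by about 13%, and a one-third cell shrink leaves room for the same cache capacity in a block that is only 20% smaller.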
Apple has provided a few important details about the CPU performance of its new A8. First, the company says the new CPU delivers a 25% performance improvement, and illustrates this with a chart showing generational gains all the way back to the original (2G) iPhone: the 25% figure comes from comparing the iPhone 5s’s 40x CPU performance over the 2G iPhone with the 50x peak of the iPhone 6.
On clock speeds and deceptive marketing
With only a modest boost in CPU clock speed, from 1.3GHz to 1.4GHz (an 8% speed-up), most of the 25% improvement clearly comes from other tweaks and tricks. Before diving deeper into benchmarks, though, this is the place for a quick aside about clock speeds and the state of the industry. Commentators in forums are quick to point out the apparent inferiority of Apple’s clock speeds compared to the much higher speeds declared for rival Snapdragon and Exynos chips, for instance. The most up-to-date example is the Snapdragon 805, with a declared clock speed of ‘up to 2.7GHz’. At first sight, Apple’s Cyclone core looks like a sore loser, with a declared speed of roughly half that, at 1.4GHz.
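Combining Apple’s 40x and 50x figures with the clock bump gives a rough sense of how much of the gain comes from sources other than frequency. A minimal Python sketch, assuming performance scales linearly with clock speed (a simplification, since memory-bound code does not):

```python
# Split Apple's claimed generational speed-up into a clock-speed part and a
# per-clock part. ASSUMPTION: performance scales linearly with clock speed.


def claimed_speedup(old_multiple: float, new_multiple: float) -> float:
    """Speed-up implied by Apple's 'N times the original iPhone' figures."""
    return new_multiple / old_multiple


def per_clock_gain(total_speedup: float, old_ghz: float, new_ghz: float) -> float:
    """Speed-up left over after removing the clock-speed contribution."""
    return total_speedup / (new_ghz / old_ghz)


if __name__ == "__main__":
    total = claimed_speedup(40, 50)        # iPhone 5s (40x) -> iPhone 6 (50x)
    clock = 1.4 / 1.3                      # A7 -> A8 clock bump
    per_clock = per_clock_gain(total, 1.3, 1.4)
    print(f"total: {total:.2f}x, clock: {clock:.3f}x, per-clock: {per_clock:.3f}x")
```

Under that assumption, the 1.25x total splits into roughly 1.08x from frequency and about 1.16x from per-clock improvements.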
Most people would call it a day at this point: the Snapdragon hugely outperforms the A8, case closed. This, however, would be naïve. Running real-world applications and games shows right away that the 2.7GHz speed can only be held for very short periods of time; after those short bursts, the chip quickly throttles back to a much saner ~1.3GHz. Put simply, the 2.7GHz figure you read about is not the nominal frequency but a maxed-out turbo speed that is not sustainable in the long run. Apple, in fact, is being much more transparent: it declares the actual nominal (not turbo) speed of its chip, and it goes on to disclose a second important detail: sustained performance times. Apple claims its A8 can sustain its nominal speed for (at least) 20 minutes.
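To see why a peak clock says little about sustained behavior, consider a toy model in which a chip runs at its peak clock for a short burst and then throttles to a lower sustained clock. The Python sketch below uses the 2.7GHz and ~1.3GHz figures from the text; the 30-second burst length is a hypothetical assumption, not a measured value.

```python
# Toy model of "turbo" marketing vs. sustained clocks: the chip runs at its
# peak clock for a short burst, then throttles to a lower sustained clock.
# ASSUMPTION: the 30-second burst length is hypothetical, not measured.


def average_clock_ghz(peak_ghz: float, sustained_ghz: float,
                      burst_s: float, total_s: float) -> float:
    """Time-weighted average clock over a workload lasting total_s seconds."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    burst = min(burst_s, total_s)  # a short workload may never leave the burst
    return (peak_ghz * burst + sustained_ghz * (total_s - burst)) / total_s


if __name__ == "__main__":
    # 2.7GHz peak and ~1.3GHz throttled clock; a 20-minute gaming session.
    avg = average_clock_ghz(2.7, 1.3, burst_s=30, total_s=20 * 60)
    print(f"average over 20 min: {avg:.2f} GHz")
```

Over a 20-minute session, the time-weighted average lands around 1.33GHz in this model, far closer to the throttled clock than to the headline figure.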
Cyclone CPU block diagram, courtesy of AnandTech
We don’t currently have the tools to run low-level tests closer to the metal, but better exploitation of instruction-level parallelism is the likely source of the A8’s per-clock gains.
We start our tests with Geekbench 3.2, a cross-platform benchmark that gives a nicely detailed breakdown of integer, floating-point and memory performance, split into single-core and multi-core results. Geekbench also gives a grand total score that makes it easy to compare devices if you don't want to look at the results in excruciating detail.
It's important to note that benchmark scores vary a little between runs (an insignificant margin), so results may differ slightly, but the difference should be no bigger than about 2-3%. Geekbench 3.1 added support for 64-bit chips and introduced some changes to the memory tests, while version 3.2, released just recently, adds support for 32-bit ARMv8 processors.
Note that Geekbench 3 total scores are calibrated against a baseline score of 2500 (which is the score of an Intel Core i5-2520M @ 2.50 GHz). Higher scores are better, with double the score indicating double the performance.
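Because Geekbench 3 scores scale linearly against the 2500-point baseline, converting a score into a multiple of the Core i5-2520M is a single division. The Python sketch below does that and also flags whether a difference between two phones exceeds the roughly 3% run-to-run variation mentioned above; the example scores are hypothetical placeholders, not results from our testing.

```python
# Reading Geekbench 3 scores: linear scale, calibrated so that 2500 equals an
# Intel Core i5-2520M @ 2.50GHz. The 3% noise band mirrors the run-to-run
# variation mentioned in the text. Example scores are HYPOTHETICAL.

GB3_BASELINE = 2500  # Core i5-2520M reference score


def relative_to_baseline(score: float) -> float:
    """Performance as a multiple of the baseline machine."""
    return score / GB3_BASELINE


def meaningfully_faster(a: float, b: float, noise: float = 0.03) -> bool:
    """True if score a beats score b by more than the run-to-run noise band."""
    return a > b * (1 + noise)


if __name__ == "__main__":
    phone_a, phone_b = 1600, 1570  # hypothetical single-core scores
    print(f"A vs baseline: {relative_to_baseline(phone_a):.2f}x")
    print(f"A meaningfully faster than B: {meaningfully_faster(phone_a, phone_b)}")
```

Here, a 1600-point phone would deliver about 0.64x the performance of the baseline laptop chip, and its 30-point lead over a 1570-point rival falls inside the noise band.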
Overall, after the huge boost in encryption with the A7, here we're seeing more balanced gains. Still, the improvements in image processing are particularly large, as shown by the image compression and Sobel tests. Scores in the Lua test (Lua being a simple, lightweight scripting language often used by game programmers) have also improved, and Dijkstra, used for AI path-finding in games, sees a huge improvement. Integer and floating-point performance have each grown by more than 20%.
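For readers unfamiliar with the Dijkstra test, the sketch below shows the kind of shortest-path search a game AI might run over a weighted grid map. It is written in Python for illustration only and is not Geekbench’s actual workload code; the grid format (per-cell entry costs, negative values as walls) is our own assumption.

```python
# Illustrative Dijkstra shortest-path search over a grid map, the kind of
# path-finding a game AI might perform. This is NOT Geekbench's workload code;
# the grid format (per-cell entry costs, negative values = walls) is assumed.
import heapq
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def dijkstra_grid(costs: List[List[int]], start: Cell, goal: Cell) -> Optional[int]:
    """Return the cheapest total cost from start to goal using 4-neighbour moves.

    Entering a cell costs costs[row][col]; the start cell itself is free.
    Returns None if the goal cannot be reached.
    """
    rows, cols = len(costs), len(costs[0])
    best = {start: 0}          # cheapest known cost to each discovered cell
    frontier = [(0, start)]    # min-heap of (cost so far, cell)
    while frontier:
        cost, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue           # stale entry: a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and costs[nr][nc] >= 0:
                new_cost = cost + costs[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost, (nr, nc)))
    return None


if __name__ == "__main__":
    # A small map: 1 = open ground, 5 = rough terrain, -1 = wall.
    level = [
        [1, 1, 1, 1],
        [1, -1, 5, 1],
        [1, -1, 1, 1],
        [1, 1, 1, 1],
    ]
    print(dijkstra_grid(level, (0, 0), (2, 2)))
```

On this map the search returns 6: the route through the rough-terrain cell is shorter in steps but costs 8, so the algorithm walks around it instead.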
We have not been able to measure the power footprint of the Apple A8, but some reports claim that average power consumption has increased by some 15%, from 4.3W for the A7 to around 5.0W for the A8.
What all of these results show is that Apple is still far ahead of the Android pack when it comes to single-core performance.
To investigate CPU performance further, we turn to the SunSpider benchmark. SunSpider is one of the most popular browser-based cross-platform benchmarks, and as such many vendors optimize their designs for it.
For a second reference point, a similarly stressful CPU test is Mozilla's Kraken. The A8 with Cyclone has a significant lead here as well.
RAM
The iPhone 5s was the first iPhone to use LPDDR3 RAM, and the iPhone 6 continues that tradition. All models come with 1GB of RAM, an amount Apple considers sufficient.
We would have preferred to see 2GB to make the phone more future-proof, but given the way iOS handles multitasking, 1GB of RAM does not seem to slow the iPhone 6 down.
A comparison between the Rogue architecture (top) and Rogue XT (bottom) used in the Apple A8 GPU
Apple claims that the A8’s GPU delivers up to a 50% improvement, in line with ImgTec’s CES announcement that Series6XT GPUs are “up to 50% faster compared to their Series6 counterparts, clock for clock, cluster for cluster”.
In the GX6450 in particular, we have ASTC texture compression support added, as well as ray tracing hardware. ImgTec has also made some nice architectural improvements for better low-power performance, such as switching parts of the design (like individual clusters) on and off, better resource management, an updated rasterizer, and improved GPU compute paths.
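ASTC’s appeal is its flexibility: every compressed block occupies 128 bits regardless of its footprint, so developers trade quality for size by choosing the block dimensions. The Python sketch below computes the resulting bit rate and the compressed size of a 2D texture level; the 1024x1024 texture is an arbitrary example.

```python
# ASTC stores every compressed block in 128 bits whatever its footprint, so the
# chosen block size sets the bit rate. Texture sizes here are arbitrary examples.
import math

ASTC_BLOCK_BITS = 128


def astc_bits_per_pixel(block_w: int, block_h: int) -> float:
    """Bit rate of a 2D ASTC footprint (e.g. 4x4 -> 8 bpp)."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_size_bytes(width: int, height: int, block_w: int, block_h: int) -> int:
    """Compressed size of one 2D texture level; partial edge blocks cost a full block."""
    blocks = math.ceil(width / block_w) * math.ceil(height / block_h)
    return blocks * ASTC_BLOCK_BITS // 8


if __name__ == "__main__":
    uncompressed = 1024 * 1024 * 4  # 32-bit RGBA, for comparison
    for bw, bh in ((4, 4), (6, 6), (8, 8), (12, 12)):
        size = astc_size_bytes(1024, 1024, bw, bh)
        print(f"{bw}x{bh}: {astc_bits_per_pixel(bw, bh):.2f} bpp, "
              f"{size} bytes ({uncompressed / size:.1f}:1 vs RGBA8)")
```

The same texture stored as uncompressed 32-bit RGBA takes 4MB, so even the highest-quality 4x4 footprint cuts memory and bandwidth use by a factor of four.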
At the iPhone 6 unveiling, Apple showed a demo of a game by Super Evil Megacorp: a live rendering of a virtual world of 1.3 million polygons running at 60fps. Luckily, for graphics, we have a nice set of cross-platform benchmarks that give us a good idea of how the GX6450 performs against the latest on Android. The iPhone 6 GPU is at the top, with performance similar to that of the Adreno 420 in the Snapdragon 805 (currently available only on the Galaxy Note 4).
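The demo figure translates into a throughput number with simple arithmetic, assuming all 1.3 million polygons are drawn in every frame (an assumption on our part; real engines typically cull part of a scene). A minimal Python sketch:

```python
# Throughput implied by the demo figure, ASSUMING every one of the 1.3 million
# polygons is drawn in every frame (real engines typically cull part of a scene).


def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


def polygons_per_second(polygons_per_frame: float, fps: float) -> float:
    """Sustained polygon throughput needed to hit the target frame rate."""
    return polygons_per_frame * fps


if __name__ == "__main__":
    print(f"frame budget: {frame_budget_ms(60):.2f} ms")
    print(f"throughput: {polygons_per_second(1.3e6, 60) / 1e6:.0f} million polygons/s")
```

Under that assumption, the demo works out to 78 million polygons per second, with each frame rendered in under 16.7ms.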
A breakdown of GFXBench results for the iPhone 6
Conclusion
With the A8, Apple has made gradual progress, polishing what was already one of the most powerful CPUs out there and further extending its CPU performance lead over Android. Apple has made great use of its scale and is the first smartphone maker to deliver a 20nm chip in meaningful volume (Samsung’s 20nm Exynos in the Galaxy Alpha is not really shipping in big quantities). The company has also done great work on the ISP, which has allowed it to pack features never before seen on a smartphone, like 240fps video recording at 720p.
What’s particularly impressive compared with the Android ecosystem is how smoothly and quickly Apple is executing the transition to 64-bit. Next year, we expect all iPhones to ship with 64-bit hardware, and in a couple of years the transition will be complete. For Android, this will be a much more painful process, as even current flagships ship with 32-bit chips, and the situation will only start changing early next year.
Admittedly, the 1GB of RAM makes the new iPhone 6 less future-proof, but given the way iOS multitasks, this does not result in any performance slowdowns. On the GPU front, the new GX6450 is at the top of the charts as well, but here there is relative parity with Android’s leading hardware in the form of the Adreno 420 in the Snapdragon 805.