Talkin’ Treble: How Android engineers are winning the war on fragmentation

Q&A: Android engineers Dave Burke and Iliyan Malchev talk Android P progress at length.

With the launch of Android 8.0 last year, Google released Project Treble into the world. Treble was one of Android's biggest engineering projects ever, modularizing the Android operating system away from the hardware and greatly reducing the amount of work needed to update a device. The goal here is nothing short of fixing Android's continual fragmentation problem, and now, six months later, it seems like the plan is actually working.

At Google I/O this year, you could see signs of the Treble revolution all over the show. The Android P beta launched, but it wasn't just on Google's own Pixel devices—for the first time ever, an Android Developer Preview launched simultaneously on devices from Google, Nokia, OnePlus, Xiaomi, Essential, Vivo, Sony, and Oppo, all thanks to Project Treble compatibility. Even car makers—some of the slowest adopters of technology on Earth—were on the Project Treble train. Dodge and Volvo both had prototype cars running Android as the infotainment system, and both were running Android P.

As is becoming custom for our annual trip to Google I/O, we were able to sit down with some core members of the Android Team: Iliyan Malchev, the head of Project Treble, and Dave Burke, Android's VP of engineering. (We quoted Iliyan Malchev a million times during the Android 8.0 review, so it was nice to get information from him firsthand, and Dave Burke has been through the Ars interview gauntlet several times now.) Through this lengthy chat, we got a better understanding of what life is like now that Project Treble is seeing some uptake from OEMs.

What follows is a transcript with some of the interview lightly edited for clarity. For a fuller perspective, we've also included some topical background comments in italics.

Proving out Project Treble with Android P
First up, a recap of what's going on with Treble right now.

Iliyan Malchev: With Treble, the operating system has separated to the adaptation layers that tailor down to the hardware. And that's still the case, but the devil is in the details. There's a ton of nuance that we still need to get right, and this is what we've focused on with this [Android P] release. What is the case today—and I think that gets overlooked by a lot of the press on Treble—is that any device that is preloaded with Google's apps, any device that launches with Oreo or subsequent releases, must work smoothly with a binary image of Android that we built for certification purposes.

This image isn't a product. The intent is not to launch this, but the idea is, by requiring that this "golden image" run on everything out there, we create a centripetal force that pushes our partners ever so gently toward not changing Android in ways that aren't really meaningful to their bottom lines. We finished that technical work with Android P this year, and we started working with silicon manufacturers.

Dave Burke: Yeah, I think this is actually one of the biggest shifts. After finishing the technical work, there was the actual process of engagement of working with the silicon vendors, which is a big change.

Malchev: We have teams in Taipei, Seoul, and San Diego that work with Mediatek, Samsung Semiconductor, and with Qualcomm, respectively. We took our work and we applied it to their BSPs [Board Support Package]. Qualcomm and everyone else will take AOSP [Android Open Source Project] as we publish it, incorporate it into that BSP, and then give that in turn to the device manufacturers. That BSP really is where devices start. They don't start with AOSP, because AOSP is, by itself, not a complete product.

Like Malchev says, the open source parts of Android (AOSP) just consist of operating-system code and won't actually run on a piece of hardware. A Board Support Package (BSP) combines AOSP with all the other code needed to make Android run on a piece of hardware. This is usually things like the Linux kernel and drivers. Like the diagram shows, Google publishes AOSP, SoC vendors like Qualcomm combine AOSP with a specific version of the Android Linux kernel and drivers to create a BSP, and OEMs load the BSP onto a phone, adding hardware and software customizations.

Malchev: It was the standing issues with the BSPs that we tackled, because if we release AOSP in August and then Qualcomm does three months of work to release the BSP, then it's already the end of the year. If you're a device manufacturer, you're basically out of luck. So we absorbed all of this work to make it simultaneous with the internal Android development.

Burke: I think one example is Telecom and Telephony. And how many changes did we upstream? There were a lot.

Malchev: Right, so in addition to all of this, we also started making AOSP more of a fleshed-out product by upstreaming 150 features that our partners had to maintain out-of-tree. That's very important to them because the ongoing costs of maintaining all of this kind of code is massive.

"Upstream" for Android is the Linux Kernel. Google maintains its own fork of Linux for Android, but the two are closer today than they have ever been. Correction: Nevermind, "Upstream" here means "upstream of OEMs," which is AOSP. Malchev is referring to including third-party phone features in AOSP so OEMs don't have to manage as much code.

Burke: And the other big shift is just our workflow. The silicon vendors, the three in this case for the chipsets that we're supporting, are actually committing code into AOSP. All the companies are working together on one repository. That's a huge change in how we operate, because we used to build the OS to a certain point; then the silicon vendor would take it to a certain point; then the device maker would take it to a certain point, all serialized. Now, we can work in parallel with the silicon vendors on the same codebase. When we have a release candidate of P, they have what they call "CS," or "commercial sampling," and they're ready at the same time, which is a huge difference.

Burke and Malchev are describing the process we saw at I/O with the launch of the Android P Beta. Google, Qualcomm, and other SoC vendors and OEMs all have a hand in bringing a new build of Android to a device. And, before, the "serialized" development process meant each company had to finish before the next company could start. With a stable interface between the hardware and software parts of Android, everyone can work simultaneously to port a new version of Android to a piece of hardware.

For an idea of what this is like now, Google was even nice enough to send along this quote from Essential, which gives us a timeline of how long the Android P beta took to port. 

"Making sure our Essential users have the very latest OS updates is incredibly important to us. Once we'd enforced vendor and system separation on Oreo using Treble, our small team got Android P running on PH-1 in only 3 days." --Rebecca Zavin, VP Software, Essential

The Essential Phone is probably the best-case scenario for an update: a Project Treble-compatible device with stock Android, so there won't be a ton of software modification required. Three days for a port still seems like an incredible amount of progress, considering many OEMs take months to update.

Malchev: And that's really the part of the iceberg that's beneath the surface. Dave just described cooperating with these silicon manufacturers; it means we both needed to change our development practices dramatically. Qualcomm has a 6,000 person-strong engineering department that works on Android. For Project Treble, they started working jointly on our infrastructure with us. So we're both keeping their BSP up to date, and we make sure it matures as Android itself does. And so that's a massive, massive change. And we're doing the same with MediaTek and with Samsung Semiconductor.

Ars: So what did you have to change in Treble for Android P? I don't know if you pay attention to the custom ROM scene...

Malchev: I do a lot, yeah.

Ars: Well with Android O, people were already able to take something like a Huawei phone, which ran a heavy Android skin, replace it with AOSP, and it worked. So Treble seemed pretty done... 

Malchev: I was so happy when the first reports started streaming in, because these were independent proofs that this generic nature works. The reason these custom ROMs are so easy now is because they are on top of this generic image that we require for Treble compatibility. So even though we don't publish the instructions on how we create it, it's basically Android open source.

Burke: Previously it used to be that, to have a compatible device, a device maker would have to pass CTS—our Compatibility Test Suite. Now they have to pass CTS on top of this golden image on top of their silicon implementation.

Ars: And when you say the golden image, you mean the AOSP build? It just has to boot, and everything has to work?

Burke: Yeah, it has to pass all these tests—like eight million, or whatever, CTS tests. And it all has to run through this golden image.

We'll get to what's actually new in Treble for Android P in a moment, but at this point in the conversation Dave Burke whips out a Qualcomm Mobile Test Platform (MTP) device as a visual aid. This is Qualcomm's developer platform and usually the first form factor a new SoC shows up in before the OEMs can integrate it into a consumer product. An MTP is basically the biggest smartphone on earth. It's at least an inch thick, with a ridiculous port selection like RJ-45, LTE antenna hookups, and multiple MicroSD slots. Burke had the fancy new Snapdragon 845 version, but you can see an older version from Mobile World Congress 2014, above. 

Ars: Oh, is that the giant Qualcomm brick?

Burke: This is actually a Pixel 3.

Ars: Oh, OK. (laughing)

Burke: We decided bigger was better.

Ars: Hey, if it's all battery, that's fine. (laughing)

Malchev: So this is an example of, this is the hardware that Qualcomm puts forth, right? It's called an MTP.

Ars: Yeah, I've seen those at Mobile World Congress.

Malchev: Yeah, right. So this is the progenitor of every phone out there. Like, every phone that has this SoC, which is the 845 in this case, is basically a near clone of this reference design. It's like a PC, but, sort of in its own micro universe per chipset. This is running Android P right now with Qualcomm's addons. And basically this is the BSP running Android P right now. It is basically ready for OEMs to take and launch devices on.

You ask what we did for Treble that enabled custom ROMs. They started with Oreo, right? Which is what we did in lockdown with the launch. In O MR1 [Android 8.1], and in P, we improved Treble. If you view Android from the position of a custom ROM, you'll basically have to do zero effort to support future versions of Android on top of that vendor's scope. We made that seal basically airtight in P. Whereas, in Oreo, you have to do some work around something that we call the VNDK (Vendor NDK) that you've touched on, I think, in your original post on Treble. The VNDK wasn't quite finished in Oreo. We finished that with MR1 and P. And we sort of tightened a few screws, in a way.

It's hard. The boundary between the top and the bottom is very, very long and squiggly, if you will. So, defining it and enforcing it is quite an effort. The more that we can automate it, the better off we are. So I think we've finished the automation piece.

A bit about Vendor NDKs: Project Treble works by giving each subsection of Android a HAL (a hardware abstraction layer). These are the standardized interfaces that Google and hardware vendors both write to, allowing, for instance, Android's camera code to work with a vendor's camera hardware. There were about 60 of these in Android Oreo, for things like the camera, audio, location, and more. In Oreo these didn't cover every possible device type, so to allow hardware vendors to make their own device-type HALs, Google created Android's Vendor NDK. Imagine something like Samsung's iris scanner on the Galaxy S9: if iris scanners were not an official device type in Android O, Samsung could make its own device type with the VNDK. Having more device types in Android P leads to less work for software makers, professional or otherwise.
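
To make the HAL idea a bit more concrete, here is a rough sketch of the pattern: framework code written against a standardized interface, with a vendor implementation plugged in underneath. This is plain Python for illustration only and is not how Android's real HAL interfaces are defined; `CameraHal`, `VendorXCamera`, and `take_photo` are hypothetical names made up for this example.

```python
from abc import ABC, abstractmethod


class CameraHal(ABC):
    """Hypothetical stand-in for a standardized camera interface."""

    @abstractmethod
    def capture(self) -> bytes:
        """Return raw image data from the hardware."""


class VendorXCamera(CameraHal):
    """Hypothetical vendor implementation living below the boundary."""

    def capture(self) -> bytes:
        # A real implementation would talk to the vendor's driver here.
        return b"raw-frame-from-vendor-x"


def take_photo(hal: CameraHal) -> int:
    """Framework-side code: depends only on the interface, not the vendor."""
    frame = hal.capture()
    return len(frame)


# The framework code never changes when a different vendor class is swapped in.
frame_size = take_photo(VendorXCamera())
```

The point of the pattern is that `take_photo` can be updated or replaced without touching `VendorXCamera`, as long as both sides keep honoring the interface in between.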

Ars: Was there anything that surprised you when you finally rolled out Treble? It's been out about six months now.

Malchev: We've kept our heads down working on the next thing.

Ars: Well, for instance I was visiting the Android Auto section, and they have car companies already running Android P. Especially for car companies, that seems like something that would have been impossible in the past.

Malchev: It's a really good feeling to know that what you theorized and you believed strongly is true. The first time when it gets this independent confirmation, it's an awesome feeling, because, really, however certain you are, there's like a flicker of doubt.

Burke: Wait, you told me you were totally confident about it! (laughs)

Malchev: I was totally confident! Like I got Dave's support by lying to him that this was definitely going to work, when in fact I didn't know. (laughing)

But these developer previews are the proof, right? It's amazing. These are vastly different manufacturers. We do very different things to Android; then for them to do their different things to Android on time, to launch a beta, proves that the work that we've done so far really is enough for them to do their thing and launch this year.

Ars: Are you expecting this to happen every year, where developer previews come out on this wide collection of phones?

Malchev: I want it to only get broader and broader, yeah.

Burke: Yeah, I think so. And I think the timing's interesting. I think the industry's maturing to a point where cost efficiencies are now the things that CFOs, and device makers, and silicon vendors care a lot about. They see what we're doing and are like, "Oh, I can save a lot of money if I just use the standard image and I don't make changes there? Okay, fine, let's do it." Where it used to be, in the beginning, they would say "Oh it's open source, let's do stuff!" Now they realize they are better off just making smaller changes, and, like, reusing what's already there. If you just look at how to streamline the whole ODM industry on the hardware side, we're just trying to enable out the software side.

Malchev: That whole "Be together, not the same" ethos is, you don't want to force conformity, but you want to enable that flexibility without the costs that are inherent in doing it by modifying lines of source code, right? Because every time you have to recompile Android, it's an untold amount of money that has to be spent to verify it and approve it.

What's new in Android P?
Ars: One of the more interesting Android P-era changes is requiring the Play Store apps to target newer versions of Android. Is that going to change the way you look at future Android versions, knowing you don't have to support a ton of legacy APIs when everyone has to be on a new version? Do you start turning off old APIs?

Burke: It's an interesting question. It wasn't our primary objective. Our primary objective is just to get app developers using the latest capabilities. Like, for example, when runtime permissions came in, in Marshmallow [Android 6.0], and there are still apps that targeted pre-N [Android 7.0], and that just shouldn't be happening at this point. So it was a loose end that we hadn't tidied up. We're trying to be very thoughtful about how we introduce it. So it's August for new apps, and then November for updates to existing apps that have to conform to it.

I think there is an interesting question, like, when you look forward in a couple years and then you think about 32-bit and 64-bit. You can definitely start trying to steer the app ecosystems so that it's in lockstep with how the silicon's evolving. So it will give us some levers for that, but that was not our primary goal.

You're totally right to bring it up, because it gives us that opportunity to remove old APIs. It's just the last thing we want to do is trash developers. It's really hard to build an app and be successful, and that's the challenge of running a business. The last thing we want to do is make it harder for people. On the other hand it's in everyone's interest to move forward in time, so we'll think it through, and, yeah, you're right, it'll give us some options, but again, not the primary result.

Ars: So there's Wi-Fi RTT support in Android P? Wasn't there support for this before in Android 5.0? There was an RTT manager, I think?

Burke: Yeah, so there was. So this is 802.11mc. So it was in the system for use by the system, but now we've added a public API that apps can use, can get access to that information.

Ars: Oh, so it was not public before, and it is now?

Burke: Yeah. It wasn't available for third-party apps, and so now it is. There's going to be a talk later here on one-meter location. I think it'll show a video of indoor navigation. It's really cool to see it. The video is literally, like, turn-by-turn navigation of a person walking in a building using just RTT. And it's really clever stuff.

One of their challenges with it is it seems like it's easy. Like it's like, oh, you know, it's two-sided RTT. So your phone sends a ping to the access point (AP), the AP timestamps it, and then sends it back to you. And you just know the speed of light and it'll figure out the distance. But the real challenge is, to figure out your location, you actually have to ping multiple access points, and, the thing is, they don't arrive at the same time. So you've got to—I think it's called "multilateration"—you've got to solve the constraint equation to figure out where you are. I know, it's really cool stuff. Very excited about it.

There really was a Google I/O talk about Wi-Fi RTT (Round Trip Time). GPS requires open sky and is good for about 5-meter accuracy, but Wi-Fi RTT proved that if you have several Wi-Fi points around, it can do better than that, with it eventually reaching 1-2 meter accuracy. The video of indoor navigation starts at about 5:30 into the talk. Android P is getting both an 802.11mc API for developers to play with directly, and eventually it will be included in Google's easy-to-use "fused location provider" API, which automatically uses a number of signals (GPS, Wi-Fi, cellular, gyroscope) to determine location.

While Android will support Wi-Fi RTT, you'll also need several Wi-Fi RTT-compatible routers or beacons (Google Wi-Fi will get 802.11mc at the end of the year), and you'll need to know where those Wi-Fi points are on a map. At the RTT Google I/O talk, Google said it plans to crowdsource the location of Wi-Fi routers in the future, to make RTT actually work. Google Maps already does crowdsourced router locations for its current, less accurate Wi-Fi location scheme. 
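
For the curious, the math Burke describes can be sketched in a few lines. The example below is a simplified model under stated assumptions, not Android's implementation: it works in two dimensions, treats range measurements as exact, and reduces the actual 802.11mc exchange to a single round-trip time minus a `turnaround_s` delay inside the access point. The function names and the sample coordinates are invented for illustration.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def rtt_to_distance_m(rtt_s: float, turnaround_s: float = 0.0) -> float:
    """One-way distance from a round-trip time, minus time spent inside the AP."""
    return SPEED_OF_LIGHT_M_PER_S * (rtt_s - turnaround_s) / 2.0


def multilaterate_2d(anchors, distances):
    """Least-squares 2D position from >= 3 known anchor positions and ranges.

    Subtracting the first circle equation from the others turns the problem
    into a linear system A @ [x, y] = b, solved here via the normal equations.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least three anchors, one distance per anchor")
    (x0, y0), d0 = anchors[0], distances[0]
    rows = []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a1 = 2.0 * (x0 - xi)
        a2 = 2.0 * (y0 - yi)
        b = di**2 - d0**2 - xi**2 - yi**2 + x0**2 + y0**2
        rows.append((a1, a2, b))
    # Build the 2x2 normal equations (A^T A) p = A^T b.
    s11 = sum(a1 * a1 for a1, _, _ in rows)
    s12 = sum(a1 * a2 for a1, a2, _ in rows)
    s22 = sum(a2 * a2 for _, a2, _ in rows)
    t1 = sum(a1 * b for a1, _, b in rows)
    t2 = sum(a2 * b for _, a2, b in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    x = (t1 * s22 - t2 * s12) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y


# Made-up example: three access points at known spots, phone at (3, 4) meters.
access_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [math.hypot(ax - 3.0, ay - 4.0) for ax, ay in access_points]
estimated_x, estimated_y = multilaterate_2d(access_points, ranges)
```

With noisy real-world ranges, more than three access points help, which is why the function accepts any number of anchors and solves in the least-squares sense rather than requiring an exact intersection.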

Ars: So there used to be a developer option in Android O that would switch the GPU renderer between "default" and "Skia." Now it's gone, and I think Android P is running Skia by default now...

Burke: Hmmm, where are we with that? I'm trying to remember what's in this release and what's not public yet.

Ars: Well what was the old system? And why are you switching to Skia? 

Burke: In the early days, Android had no hardware rendering of its standard widget set. It basically was on top of Skia, which is software rendering, and then in parallel we created something called the GL renderer, which is built in OpenGL. And there were two paths. In the system there was the hardware accelerator path that would sort of go outside of Skia and to the GL renderer. And then you had Skia for other pieces. But, long story short, some of the GL renderer was using Skia. It's a little bit weird.

Malchev: Some Skia started moving to GL over time, but not all of it.

Burke: I'm actually fine talking about this. So we're working on a GL backend to Skia. We're cleaning the architecture. So basically, the architecture we're moving towards will have Android's UI framework talk to Skia which will then talk to a hardware-accelerated backend to Skia. Rather than this weird world where something would attempt to talk to the GL renderer. So that's it, we're cleaning it up. I just can't remember how much is in P versus later. I don't think it's fully landed. I know it isn't fully landed. But that's where we're going. You'll be able to see that in the source code. It's not going to be a big secret.

Skia, by the way, is an open source graphics engine primarily developed by Google. After asking around a bit more at the show I got a few more details like Skia enabling things like faster shadow rendering, which is heavily used in Material Design. Hopefully I'll be able to nail down a more complete answer soon, but this was a good first step in the research process. 

Update: After this interview went up, Burke clarified that the Skia GL back end is actually completed in Android P.

Ars: Why is there a new fingerprint reader API in Android P? What was wrong with the old one? It seemed fine.

Burke: There's multiple reasons, but the main one is that we're seeing device makers like Vivo wanting to support under-glass fingerprint. And so you want to be able to have a standard UI, otherwise the apps will be all confused. If we could go back in time we would've created it that way, like having a standard system dialog, so it can adapt.

Ars: Oh, it makes the "press your finger here" UI?

Burke: Yeah. So that's the main reason. And then it's generalized into biometrics, as well, because more biometrics are coming online, so that's the main reason. So yeah, the "FingerprintManager" API is deprecated in favor of BiometricPrompt.

An app getting "confused" is a good point. An under-screen fingerprint reader would require a pop-up over top of the current app so that your touch input doesn't trigger something in the app. With a lockdown on floating windows for security purposes, this would be tough to do without Google's involvement and a standardized API.

Ars: So about these background activity changes. You're going to use AI to be more aggressive about turning off apps now? 

Burke: OK, so we're trying to save battery, and we have four buckets, the first of which is apps that you are going to use in the next three hours, and then the last one is apps you're not going to use until at least 24 hours from now, and then it's gradated between that. Then we're using AI to predict which apps go in which bucket depending on your usage patterns. Each bucket has different levels of access to the network and CPU. So, we're basically trying to make a system where only the apps that you're actually going to use next have higher access to battery, network, and CPU.
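
The bucketing step Burke describes can be sketched as a simple threshold lookup. The example below is a toy model, not Android's code: it takes the predicted time until next use as an input and leaves the prediction itself out of scope. Only the 3-hour and 24-hour edges come from the interview; the 12-hour boundary between the two middle buckets and the placeholder bucket names are assumptions made up for this sketch.

```python
from enum import IntEnum


class Bucket(IntEnum):
    """Four buckets, most to least favored; names are placeholders."""
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4


# Only the 3-hour and 24-hour edges come from the interview.
# The 12-hour midpoint is an assumption made up for this sketch.
FIRST_BUCKET_MAX_HOURS = 3.0
ASSUMED_MIDPOINT_HOURS = 12.0
LAST_BUCKET_MIN_HOURS = 24.0


def assign_bucket(predicted_hours_until_next_use: float) -> Bucket:
    """Map a predicted time-to-next-use onto one of the four buckets."""
    if predicted_hours_until_next_use < 0:
        raise ValueError("prediction must be non-negative")
    if predicted_hours_until_next_use <= FIRST_BUCKET_MAX_HOURS:
        return Bucket.FIRST
    if predicted_hours_until_next_use <= ASSUMED_MIDPOINT_HOURS:
        return Bucket.SECOND
    if predicted_hours_until_next_use < LAST_BUCKET_MIN_HOURS:
        return Bucket.THIRD
    return Bucket.FOURTH


# Hypothetical predictions, in hours, for three made-up apps.
predictions = {"chat_app": 0.5, "news_app": 8.0, "travel_app": 72.0}
buckets = {app: assign_bucket(hours) for app, hours in predictions.items()}
```

A system like this would then grant each bucket a different allowance of network and CPU time, with the first bucket getting the most.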

Ars: When you say "access" to the phone, you mean JobScheduler windows?

Burke: Yeah, it's basically JobScheduler windows. And so a good reason why the API update floor helps is that we want to move everyone to JobScheduler so that it all can streamline through that, rather than some of the older ways of doing things.

You know, that one's pretty interesting. We could train a model on everyone's usage and we'd have a pretty interesting model that would tell you how Twitter gets used. We didn't want to do that, so we built a pretty elaborate system that's used on-device. It has a generic usage model, and when you're using your phone, we're looking at how you use it. Let's say you're on your Twitter, we look at how you use Twitter, and we find the closest matching models. It's more like personalization than on-device training. And being on-device addresses any of the privacy issues around it.

JobScheduler is an API that started in Android 5.0 Lollipop as part of Project Volta. It's Android's traffic cop for background CPU activity and network access. Rather than letting every app access the network individually, JobScheduler batches up unimportant requests for background processing, allowing the device to sleep longer and save battery, then wake up periodically for a window of background activity from all your apps. Before, JobScheduler could also shut down apps you wouldn't use while on battery, but now, for Android P's "Adaptive Battery" feature, it seems like there will be a more nuanced version of this depending on app usage.
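
The batching idea is easy to picture with a toy model. The sketch below is not the JobScheduler API, which is an Android framework API; it only illustrates the principle of deferring background requests to shared windows so the device wakes up less often. The function name, the job names, and the 10-minute window are all invented for this example.

```python
import math
from collections import defaultdict


def batch_into_windows(requests, window_minutes):
    """Defer each (job_name, requested_minute) to the next window boundary."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    windows = defaultdict(list)
    for name, requested_minute in requests:
        # Round the requested time up to the next multiple of the window size.
        run_at = math.ceil(requested_minute / window_minutes) * window_minutes
        windows[run_at].append(name)
    return dict(sorted(windows.items()))


# Four made-up background requests arriving at different minutes.
requests = [
    ("mail_sync", 2),
    ("photo_backup", 7),
    ("news_refresh", 9),
    ("weather", 14),
]
schedule = batch_into_windows(requests, window_minutes=10)
wakeups_without_batching = len(requests)
wakeups_with_batching = len(schedule)
```

In this made-up run, four separate wakeups collapse into two, one at minute 10 and one at minute 20, which is the kind of saving that lets a device sleep longer between bursts of background work.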

Coming soon (hopefully): Faster updates
Ars: Any final words before we pack up?

Burke: I personally spend a lot of time with device makers and silicon vendors, and they love Project Treble, it's like the biggest improvement for them. It was a hard project, because it touched so many parts in the system that we really went in, and it required everybody across my team, from media framework to locations, to UI toolkits, to do something. It was a big investment, and that's why, with O, we have less consumer-facing features because we actually just spent our engineering budget on doing Treble.

Then in P, we went back to more user, consumer-facing features. But, what's awesome with this release is we have lots of cool features, and we're now reaping the benefit of the investment in O by getting more devices out more quickly.

Ars: Oh, is there a story behind which devices with Treble got the Android P beta? Were there some manufacturers who were just like, "No, we don't want to do the beta?"

Burke: Yeah, not every device. Not every Treble device got a beta.

Malchev: What's more important is, we need to pilot this process of corporationalizing Treble. Well, I want to point this out: it's important to work with these big companies around the world so that we can proof the process out. Once we show that this works for whatever device they choose to do a beta for, then they expand this to more and more of their portfolio until it covers everything.

Ars: Yeah, and you can point to it and say, "Look, this actually works."

Malchev: Yes, yeah.

Ars: So we're going to get a ton of day-one updates when Android P ships, right?

Burke: Yeah, or Iliyan gets fired! (laughing) Well, yeah, I mean, that's the goal.

Malchev: (laughing) That's the goal, yeah. It's psychologically important for the next release to happen shortly after AOSP is published. And we're working on shortening that window as much as possible. 

