TalkBack can read images even if your phone is offline – thanks to the on-device Gemini Nano

A screenshot from the Android Developers Blog showing the new TalkBack functionality.
TalkBack, the indispensable Android feature for people who are blind or have low vision, gets a lot more useful – and powerful – thanks to the Gemini Nano with multimodality model.

There's an extensive piece on the Android Developers Blog, where the team opens up about the latest enhancement to the screen reader feature from the Android Accessibility Suite.



TalkBack includes a feature that provides image descriptions when developers haven’t added descriptive alt text. Previously, this feature relied on a small machine learning model called Garcon, which generated brief and generic responses, often lacking specific details like landmarks or products.
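For app developers, the fallback never has to kick in at all: supplying alt text directly remains the first-class path. Here is a minimal Kotlin sketch of how an app labels an image so TalkBack reads the developer's text instead of generating one; `photoView` and the description string are illustrative placeholders.

```kotlin
import android.widget.ImageView

// A minimal sketch: `photoView` and the description string are
// illustrative placeholders, not code from TalkBack itself.
fun labelImageForTalkBack(photoView: ImageView) {
    // TalkBack announces contentDescription when the view gains focus;
    // model-generated descriptions are only a fallback for images
    // that ship without one.
    photoView.contentDescription =
        "Panorama of the Sydney Opera House and Harbour Bridge at night"
}
```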

The introduction of Gemini Nano with multimodal capabilities presented an ideal opportunity to enhance TalkBack’s accessibility features. Now, when users opt in on eligible devices, TalkBack leverages Gemini Nano’s advanced multimodal technology to automatically deliver clear and detailed image descriptions in apps like Google Photos and Chrome, even when the device is offline or experiencing an unstable network connection.
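Google hasn't published the exact binding TalkBack uses to reach the model, so the sketch below is a stand-in rather than a real API: the `NanoVision` interface and its `describe` call are hypothetical, and only illustrate the shape of an on-device flow that never touches the network.

```kotlin
import android.graphics.Bitmap

// Hypothetical interface standing in for whatever on-device binding
// TalkBack actually uses; nothing here is Google's real API.
interface NanoVision {
    suspend fun describe(image: Bitmap, prompt: String): String
}

suspend fun describeForTalkBack(model: NanoVision, image: Bitmap): String =
    // The whole call stays on-device, which is why descriptions keep
    // working offline or on a flaky connection.
    model.describe(image, prompt = "Describe this image for a screen reader user.")
```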

Google's team provides an example that illustrates how Gemini Nano improves image descriptions. First, Garcon is presented with a panorama of the Sydney, Australia, shoreline at night – and it might read: "Full moon over the ocean". Gemini Nano with multimodality, however, can paint a richer picture, with a description like: "A panoramic view of Sydney Opera House and the Sydney Harbour Bridge from the north shore of Sydney, New South Wales, Australia". Sounds far better, right?

"Utilizing an on-device model like Gemini Nano was the only practical solution for TalkBack to automatically generate detailed image descriptions, even when the device is offline."

– Lisie Lillianfeld, product manager at Google

When implementing Gemini Nano with multimodality, the Android accessibility team had to choose between more detailed output and faster inference, a trade-off partly determined by input image resolution: Gemini Nano currently accepts images at either 512 or 768 pixels.
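To make that trade-off concrete, here is a hedged Kotlin sketch that scales a bitmap to one of the two input sizes before inference. The helper and its parameters are invented for illustration; only the 512- and 768-pixel figures come from the blog post.

```kotlin
import android.graphics.Bitmap

const val FAST_INPUT_PX = 512      // first token arrives sooner, less detail
const val DETAILED_INPUT_PX = 768  // slower to start, richer descriptions

// Hypothetical helper: downscales the longer edge to the chosen input
// size (never upscales) before the image is handed to the model.
fun prepareForModel(source: Bitmap, preferDetail: Boolean = true): Bitmap {
    val target = if (preferDetail) DETAILED_INPUT_PX else FAST_INPUT_PX
    val scale = (target.toFloat() / maxOf(source.width, source.height)).coerceAtMost(1f)
    return Bitmap.createScaledBitmap(
        source,
        (source.width * scale).toInt(),
        (source.height * scale).toInt(),
        /* filter = */ true,
    )
}
```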

While the 512-pixel resolution generates the first token almost two seconds faster than the 768-pixel option, the resulting descriptions are less detailed. The team ultimately prioritized providing longer, more detailed descriptions, even at the cost of increased latency. To reduce the impact of this delay on the user experience, the tokens are streamed directly to the text-to-speech system, allowing users to begin hearing the response before the entire text is generated.
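That streaming trick is easy to picture in code. Here is a sketch, assuming the model's output arrives as a Kotlin `Flow` of text chunks (`tokens` is a stand-in, not TalkBack's real plumbing), queued into Android's standard `TextToSpeech` API:

```kotlin
import android.speech.tts.TextToSpeech
import kotlinx.coroutines.flow.Flow

// Speaks a description while it is still being generated: chunks are
// queued into TTS at punctuation boundaries instead of waiting for
// the full text. `tokens` is a hypothetical stream of model output.
suspend fun speakWhileGenerating(tokens: Flow<String>, tts: TextToSpeech) {
    val buffer = StringBuilder()
    var utteranceId = 0
    tokens.collect { token ->
        buffer.append(token)
        // Flush on punctuation so the user hears the opening words of
        // the description within the time-to-first-token, not after
        // the whole response has finished.
        if (token.endsWith(".") || token.endsWith(",")) {
            tts.speak(buffer.toString(), TextToSpeech.QUEUE_ADD, null, "desc-${utteranceId++}")
            buffer.clear()
        }
    }
    if (buffer.isNotEmpty()) {
        tts.speak(buffer.toString(), TextToSpeech.QUEUE_ADD, null, "desc-$utteranceId")
    }
}
```

Queueing with `QUEUE_ADD` keeps the chunks in order, so the spoken result sounds seamless even though it was synthesized piecemeal.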

While I'm not yet fully aboard the AI hype train, AI-powered features like this are stunning – just think about the potential! And then there are stories like this one that make you want to tone down this "wonderful" progress of ours.
