
Google Gemini new functions and features

By Sebastian Pier

Updated: May 14, 2024, 2:12 PM

Google's I/O conference started, after a not-so-brief Marc Rebillet techno fiesta, with a focus on AI and, in particular, Google's own AI: Gemini.

If 2023 was the year that catapulted AI into the mainstream, 2024 is going to be the year that puts (Google's) AI in everyone's hand, home and head.

Google CEO Sundar Pichai highlighted that all of Google's products with 2 billion users now use Gemini. This is just the start, as Pichai said:

We’re still in the beginning of our Gemini era.

Okay, let's check it out!

AI Overviews coming now

Google kicked off the I/O 2024 event with a major announcement: AI Overviews, previously tested in Search Labs as the Search Generative Experience (SGE), will roll out to US users within the week.

AI Overviews will automatically answer specific searches in the US, offering concise explanations at the top of search results pages before the traditional list of links. Over the next few days, hundreds of millions of users in the US will experience AI overviews, with plans to expand to over a billion users worldwide by the end of the year.

Soon, you’ll be able to adjust your AI Overview with options to simplify the language or break it down in more detail. This can be particularly useful if you’re new to a topic, or if you’re trying to simplify something to satisfy your kid’s curiosity.

AI Overviews will help with increasingly complex questions. Rather than breaking your question into multiple searches, you can ask your most complex questions, with all the nuances and caveats you have in mind, all in one go.

For example, maybe you’re looking for a new yoga or pilates studio, and you want one that’s popular with locals, conveniently located for your commute, and also offers a discount for new members. Soon, with just one search, you’ll be able to ask something like "find the best yoga or pilates studios in Boston and show me details on their intro offers, and walking time from Beacon Hill."

Beyond finding the right answer or information for a complex question, Search will also be able to plan with you.

With planning capabilities built directly into Search, you can get help creating plans for whatever you need, starting with meals and vacations. Search for something like "create a 3-day meal plan for a group that’s easy to prepare," and you’ll get a starting point with a wide range of recipes from across the web.

With advancements in video understanding, you can now search using videos. For instance, if you bought a record player at a thrift shop and the needle arm is drifting unexpectedly, you can simply search with a video of the issue. This saves you from trying to describe the problem in words and provides an AI Overview with troubleshooting steps and resources.

Video search will soon be available for Search Labs users in the U.S. in English, with plans to expand to more regions over time.

Talk to your gallery with Ask Photos

Gemini is making its way further into the Photos app, which soon will be able to complete tasks you tell it to do.

In the upcoming months, Google Photos will introduce context-aware voice and text prompts to help users search for specific images or details within images. The Ask Photos feature goes beyond conventional image searches by using Gemini to recognize image content. For example, it can read a car's license plate in your photos, so you can ask for your plate number and get it back without scrolling through your gallery.

The rollout of Ask Photos is expected to begin in the coming months, with a tentative release timeframe set for summer.

"Double the tokens, please!"

Pichai also revealed that Gemini 1.5 Pro, the newest iteration of Google's AI model, will now be accessible to all users through the Gemini Advanced app. The public version comes with a context window of 1 million tokens. Additionally, Google has upgraded Gemini 1.5 Pro to handle 2 million tokens, but this larger window will be limited to developers in a private preview.

In AI, a token is like a building block or a piece of a puzzle. It's a small unit of information that represents something meaningful, like a word or a part of a sentence. Tokens help AI understand and process language by breaking it down into manageable pieces, making it easier for computers to analyze and generate text.
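
To make the idea concrete, here is a minimal Python sketch that splits a sentence into rough tokens and checks whether a prompt of a given size would fit the context windows mentioned above. The `toy_tokenize` and `fits_in_context` helpers are hypothetical names made up for this illustration, and the splitting rule is an assumption: real models, Gemini included, use learned subword vocabularies, so their actual token counts will differ.

```python
import re

# Context window sizes quoted in the article, in tokens.
PUBLIC_CONTEXT_TOKENS = 1_000_000
PREVIEW_CONTEXT_TOKENS = 2_000_000


def toy_tokenize(text: str) -> list[str]:
    """Split text into words and punctuation marks.

    Illustrative only: this is NOT how Gemini tokenizes text.
    Real tokenizers use learned subword vocabularies.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def fits_in_context(token_count: int, context_tokens: int) -> bool:
    """Return True if a prompt with token_count tokens fits the window."""
    return token_count <= context_tokens


tokens = toy_tokenize("Double the tokens, please!")
print(tokens)       # the individual pieces
print(len(tokens))  # how many pieces the sentence became

# A hypothetical 1.5-million-token prompt against both window sizes.
print(fits_in_context(1_500_000, PUBLIC_CONTEXT_TOKENS))
print(fits_in_context(1_500_000, PREVIEW_CONTEXT_TOKENS))
```

Under these assumptions, the sample sentence becomes six pieces (four words and two punctuation marks), and a 1.5-million-token prompt would only fit the larger, developer-only window.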

AI will scan your inbox with Gemini Pro in Workspace Labs

Gemini in Gmail is set to revolutionize email management by offering a comprehensive search feature that summarizes your entire email history in a convenient sidebar.

Starting today, Gemini in the side panel of Gmail, Docs, Drive, Slides and Sheets will use Gemini 1.5 Pro. With a longer context window and more advanced reasoning, Gemini can answer a wider variety of questions and provide more insightful responses. Plus, it's easy to get started with summaries that will appear in the side panel, suggested prompts and more.

This solution addresses the common issue of sifting through numerous emails to find relevant information. With Gemini, users can simply request a summary of emails from a specific contact, receiving a concise bullet-point list of key details and quick access to the original emails. In a one-minute demo, Google showcased how users can swiftly respond to emails directly from the Gemini sidebar, streamlining the communication process.

For the Gmail mobile app, there are three useful AI upgrades:

Summarize emails: With this feature, Gemini can analyze email threads and provide a summarized view directly in the Gmail app. Simply tap the summarize button at the top of your email thread to get the highlights. This will be available to Workspace Labs users this month, and to all Gemini for Workspace customers and Google One AI Premium subscribers next month.
Contextual Smart Reply: Soon, Gemini in Gmail will offer even more detailed and nuanced suggested replies based on context from your email thread. With Contextual Smart Reply, you can edit or simply send as-is. This will be available to Workspace Labs users on mobile and web starting in July.
Gmail Q&A: Soon when you click the new Gemini icon in the mobile app, Gemini in Gmail will offer helpful options, like “summarize this email,” “list the next steps” or “suggest a reply.” And similar to the side panel on desktop, you can use the open prompt box when you have more specific requests. For instance, you could ask Gemini to “find the bid from the roofing contractor” that’s buried somewhere in your inbox. Gmail Q&A will be available to Workspace Labs users on mobile and web starting in July.

Audio Overviews

Google is improving NotebookLM, its AI tool for understanding documents, by adding "audio overviews" that create a podcast-style conversation between two speakers.

This upgrade is great for people who prefer learning by listening rather than reading. In a demo, NotebookLM was given some physics lessons to work with. It then made a conversation between two speakers, explaining how basketball relates to the physics topic, like force, when asked by Google's Josh Woodward.

Gemini 1.5 Flash

Google is introducing a new model called Gemini 1.5 Flash, designed to be fast and efficient.

Gemini 1.5 Flash is "great at summarizing, chatting, captioning images and videos, extracting data from long documents and tables, and more," wrote Demis Hassabis, CEO of Google DeepMind, in a blog post. Hassabis explained that Google made Gemini 1.5 Flash because developers wanted a model that was lighter and cheaper than the Pro version announced in February.

Gemini 1.5 Flash sits between Gemini 1.5 Pro and Gemini Nano, Google's smallest model, which runs directly on devices. Even though it's lighter than Gemini 1.5 Pro, it's still powerful.

Imagen 3 is here to blow you away

Also, Google announced two new AI tools for media creation: Veo, which can create high-quality 1080p videos, and Imagen 3, the latest version of its text-to-image framework.

Google says Veo understands natural language and visual concepts to generate the video you want. These AI-generated videos can be over a minute long and include advanced cinematic techniques like timelapses.

Imagen 3 is described as Google's highest-quality text-to-image model, producing highly detailed and photorealistic images with fewer errors. Google claims Imagen 3 is better at understanding and managing detailed prompts and handles text more effectively than previous versions.

Enter Trillium

Next, Google introduced the 6th generation of Google Cloud TPUs called Trillium. These new AI-specific hardware units support Google's latest AI models like Gemini 1.5 Flash, Imagen 3, and Gemma 2.0.

Trillium offers a 4.7 times increase in performance per chip compared to the previous TPU v5e, with double the memory and bandwidth. It includes a third-generation SparseCore accelerator for processing large data sets in ranking and recommendation tasks.

Google claims Trillium can train AI models faster and serve them with lower latency and cost, and that it is the company's most energy-efficient TPU yet, at over 67% more energy-efficient than TPU v5e.

Full Multimodal Capabilities Coming to Gemini Nano

Android is set to become the first mobile operating system to feature a built-in, on-device foundation model with the introduction of Gemini Nano. This innovation aims to deliver fast and secure experiences while keeping user information private. The latest model, Gemini Nano with multimodality, will launch later this year, starting with Pixel devices. This upgrade will enable phones to process not only text input but also contextual information such as sights, sounds, and spoken language.

Later this year, Gemini Nano’s multimodal capabilities will be integrated into TalkBack, providing richer and clearer descriptions for people with blindness or low vision. TalkBack users encounter an average of 90 unlabeled images daily. This update will help by offering more details about photos from family or friends and descriptions of clothing styles and cuts when shopping online. Since Gemini Nano operates on-device, these descriptions are provided quickly and work even without a network connection.

A new feature is being tested using Gemini Nano to provide real-time alerts during phone calls if it detects patterns commonly associated with scams. For instance, you would receive an alert if a “bank representative” urgently asks you to transfer funds, pay with a gift card, or requests personal information like PINs or passwords—requests banks typically do not make. This protection happens entirely on-device, ensuring your conversation remains private. More details about this opt-in feature will be shared later this year.

Let Gemini Advanced plan your vacation

Planning trips can be time-consuming, so this is where Gemini Advanced will soon kick in and help you.

Picture this scenario: You tell Gemini you're heading to Miami for Labor Day with your family. Your son loves art, and your husband craves fresh seafood. Can Gemini pull flight and hotel details from your Gmail and assist in planning the weekend?

Gemini does more than just provide generic suggestions. It considers your flight schedule, dining preferences, and local attractions. By accessing your Gmail for flight information, tapping Google Maps for nearby restaurant and museum suggestions, and utilizing Search for additional activities, Gemini creates a personalized itinerary. Whether it's a walking tour of the Design District or beach time, Gemini ensures your day is filled with activities that match your interests. Plus, the itinerary updates automatically if you make changes or add more details.

This dynamic planning experience will be available on Gemini Advanced in the coming months.

Personalized Gems and Live for Gemini Advanced

Gemini Advanced subscribers will soon have the option to create Gems for an even more personalized experience. Gems are customized versions of Gemini tailored to your preferences. Whether you need a gym buddy, sous chef, coding partner, or creative writing guide, Gems can be designed to suit your needs.

Creating a Gem is straightforward. You simply describe what you want your Gem to do and how you want it to respond. For example, you could request a running coach to provide daily plans with a positive and motivating attitude. Gemini will then take your instructions and, with a single click, create a Gem that fulfills your specific requirements.

Also, Google is introducing new ways to interact with Gemini more naturally, whether you're texting or speaking. With Gemini in Google Messages, you can now chat with it within the same app you use to message your friends.

In the upcoming months, the tech giant will be launching Live for Gemini Advanced subscribers, offering a new mobile conversational experience. This feature utilizes cutting-edge speech technology to make conversing with Gemini more intuitive. With Gemini Live, you can engage in a conversation with Gemini and choose from various natural-sounding voices for its responses. You can also speak at your own pace or interrupt with clarifying questions, mimicking a real conversation.

For instance, if you're preparing for a job interview, you can go Live and ask Gemini to assist you. It can help you rehearse and even suggest skills to emphasize during the interview. Later this year, you'll also be able to use your camera during Live sessions, enabling discussions about your surroundings.

Circle to Search and your (son's) homework

Since its debut at Samsung Unpacked in January 2024, Circle to Search has been enhanced with new features such as full-screen translation, and its availability has been extended to more Pixel and Samsung devices.

As of today, Circle to Search can assist students with their homework, providing them with a deeper understanding rather than just delivering answers, directly from their phones and tablets. When students get stuck on a problem, they can circle it to get step-by-step instructions for solving a range of physics and math word problems, all without leaving their digital materials. Later this year, Circle to Search will expand its capabilities to solve even more complex problems involving symbolic formulas, diagrams, graphs, and beyond.

Circle to Search is currently available on over 100 million devices, and Google aims to double that reach by the end of the year by extending the experience to more devices.

SynthID for text and video

"As the outputs from our models become more realistic, we must also consider how they could be misused," Google's top officials say. Last year, Google introduced SynthID, a technology that adds imperceptible watermarks to AI-generated images and audio so they’re easier to identify, and to protect against misuse. Today, SynthID is expanding to two new modalities: text and video.

Sebastian, a veteran tech writer with over 15 years of experience in media and marketing, blends his lifelong fascination with writing and technology to provide valuable insights into the realm of mobile devices. Embracing the evolution from PCs to smartphones, he harbors a special appreciation for the Google Pixel line due to its superior camera capabilities. Known for his engaging storytelling style, sprinkled with rich literary and film references, Sebastian critically explores the impact of technology on society, while also perpetually seeking out the next great tech deal, making him a distinct and relatable voice in the tech world.