We Put Google’s New AI Writing Assistant to the Test
When I asked Google’s AI writing aid to draft a happy birthday email to a friend, it left my brain in the dust. I had taken about 90 seconds to craft a decent 81-word greeting. But the search giant’s text-generation feature knocked out a flawless 87 words in a third of the time.
That’s exactly what Google wants to see. The Help Me Write feature that launched in March and was rolled out more broadly at the company’s annual conference last week is a radical step beyond the Smart Reply and Smart Compose tools that Gmail has offered for years to generate short phrases. With the new feature, you type a brief description of the email you want to send—“Wish happy birthday to a friend I made last year in San Francisco.” Then you click a button labeled Create, and a full draft appears. Each one bears a disclaimer: “This is a creative writing aid, and is not intended to be factual.”
Help Me Write is the first of a slew of generative AI features Google has planned for its productivity suite, under the umbrella branding of Duet AI for Workspace. I spent a few days testing it in Gmail and Google Docs to speed up wedding planning and uncover its boundaries.
Though it can rapidly unspool drafts of polite emails to businesses or fluent essays on mundane topics, what I gained in time I sometimes lost through new headaches. Duet’s writing often came across as stiff, it sometimes snuck in gender stereotypes and inaccurate information, and it wouldn’t expound on subjects I needed it to—like drinking games. “We’re still learning, and can’t help with that. Try another request,” the tool too often responded to me.
Frustrations aside, the system will undoubtedly be widely adopted among the 2 billion people using Gmail and the 3 billion using Google productivity software such as Docs. Existing AI offerings Smart Reply and Smart Compose drew 180 billion uses last year, Google CEO Sundar Pichai said last week.
Help Me Write loads via a pencil-and-star button located along the bottom of the Compose window in Gmail or on the left margin of a Google Docs page, and it provides the sort of responses that have become synonymous with OpenAI’s ChatGPT. Microsoft is testing a version of that technology in services including Word and Outlook with some business customers. But Google’s Duet technology is the first comparable AI writing aid offered to consumers and built into widely used services.
Hundreds of thousands of English-speaking users in the US and other countries who have signed up for Google’s Workspace Labs have access. They’ve been testing it for job applications, client letters, and lesson plans, says Kristina Behr, the Google vice president of product overseeing collaboration services and generative AI integrations. My “You’re in!” email arrived days after I signed up. The AI writing companion is free and has no usage limits, but Google hasn’t determined whether that will be true forever, she says.
My experience with Duet began with a request that I agree to terms of service. I was to understand that prompts and responses would not be tied to my Google account, but they could be reviewed by humans, so I should watch what I type. I still used it for personal tasks, including helping with emails and speaking scripts for my upcoming wedding, offering up my data in the spirit of informing WIRED readers.
One of the first things I noticed is that Duet’s behavior can be inconsistent across Google services. I wanted to finish up a script for friends who will emcee a pre-wedding party filled with competitions, speeches, and musical performances. But the version of Duet in Google Docs wouldn’t help me write a description of the well-known drinking game Flip Cup. Nor would it explain Beer Pong. The Duet over in Gmail correctly described both games.
Behr says that happened because Gmail’s version of the feature is tuned to be less formal than the one in Google Docs, which is more likely to be used in workplaces or schools. The two products have separate teams testing and setting Duet’s boundaries.
Now that I was in Gmail, I sought help writing emails to guests who were scheduled to participate in the wedding welcome event. Duet suggested some points I might not have thought to include: “We want you to feel free to be as creative as you want with your roast.” But the overall output resembled something sent by corporate HR and legal departments.
The AI-made messages were devoid of my hallmark sentences lacking a verb or starting with “Just,” and they included nary a single emoji 😡. The text generator showed little appreciation for how I or anyone else communicates informally. My partner shrieked in horror when she saw I had sent one of Duet’s drafts to two friends, with only light edits, to see their response. (So far, neither has replied.)
Behr says I could have asked for a loose and informal tone in my prompt to the AI writer. Google is trying to figure out how to educate users on tricks like that. “We’re effectively building with our customers” in real time, she says.
Pichai’s demonstration at Google’s I/O conference last week featured the writing of a formal refund request to an airline, and I found Duet in Gmail a skilled grumbler. Complaint to consumer protection regulators about event ticketing technology? No problem. Complaint to a shoe maker for soles wearing out too fast? On point. Note to a veterinarian asking for a doggy doctor’s note? Got it. Google has built a formidable complaint machine—an aspect of Duet that will probably spur companies to use generative AI to defend themselves.
For consumers, improvements are already in the works. By the end of this month, Gmail’s text generator will draw on information from past emails in the same thread. The I/O demo showed that a user planning a potluck could generate an email that referenced a planning document shared earlier in the thread. My complaints about shoes or tickets would become more persuasive if the system pulled transaction dates, model numbers, and other info from my inbox.
The same button used to summon Help Me Write also loads toggles to lengthen, shorten, or formalize either AI-crafted text or your own compositions. Those all work surprisingly well. In Docs, users can even enter their own editing filter, like “Sound more confident!” Gmail has an “I’m feeling lucky” option, which applies a surprise goofy filter to text, like turning it piratical by changing “hello” into “ahoy” and “your” into “yer.” Another time it turned “car” into “flying car.”
Back over in Docs, my frustrations with Duet grew. It refused to generate wedding vows (a use ChatGPT will serve) or a “wedding reception speech with wife.” But dropping “with wife” and trying related prompts showed it could generate speeches from the point of view of a groom’s best man. The notion of a newly wedded couple speaking together was seemingly too alien for the technology.
Duet could be more useful if it could ask for additional guidance before a draft is generated, like asking a user to specify the perspective for the text. Behr says Google is considering “multi-turn experiences,” similar to ChatGPT, where a user can engage the text generator in a dialog to perfect the output.
Help Me Write, like other text generators, can make slip-ups around gender. In Docs, it wrote a nice online review of a wedding officiant—but assumed the officiant was a “he.” Asked to compose letters to my future son and then daughter, it signed them as being written by “Dad” and “Father,” though the system does not know my gender, according to Behr.
In 2018, I reported that the Smart Compose feature, which uses machine learning to help you finish sentences in Gmail, would not suggest pronouns because the company feared user backlash for getting them wrong. Duet lacks those precautions. Behr says that while Google’s commitment to inclusive language remains, guardrails for newer AI models require different engineering that is a work in progress.
Duet’s struggles with gender didn’t stop with botched pronouns. I asked the system to suggest gift ideas for a young boy and then a young girl. While the lists of ideas overlapped, exclusive to the boy’s side was “a remote control car or plane” and other items leaning science and tech, and only the girl’s list mentioned “a dollhouse or playset” and “jewelry.” The Help Me Write box flashes prompt ideas while waiting for users to type, and a similar experiment using even one of its suggestions (“poem about a six-year-old boy”) perpetuated gender conventions.
Stereotypes also popped up when I tried asking for movies to watch with “a gay friend” or just “a friend.” In response to the first prompt, Duet in Docs listed three movies featuring gay romances, but for the second it made only generic suggestions, like something “you both love.”
Other times, Google’s AI helper handled pronouns deftly. Asked to write a greeting card for a new baby on the way, it said “they will be a beautiful, happy, and healthy baby” without using any gendered language. But my tests suggest that people who prefer inclusive language or want to avoid stereotypes will need to be careful.
Duet sometimes avoids tricky subjects. It wouldn’t help write a Nigerian prince scam email, an evil plan to take over the world using AI, a speech about conservative commentator Tucker Carlson, or most anything mentioning terrorism or guns. (Water and Nerf guns were an exception.)
The Duet features also refused some prompts referring to demographic characteristics, with much inconsistency. Google’s AI writer was happy to give housewarming gift ideas for an Indian family (Indian thalis, basket of Indian snacks, Indian art) but not a Black family. It answered a request for jobs that Sikh people are good at (entrepreneurs, doctors) but not the same query for Jewish people. A five-paragraph essay on British literature? Yes. An essay on the British role in the Atlantic slave trade? Nope.
When a Duet feature refuses to generate text, it is impossible to tell whether the cause is a bug, a poor prompt, or a content concern, because in Google’s speedy rollout, the company hasn’t gotten around to fine-tuning error notices, Behr acknowledges.
As human writers know, getting words on the page is one challenge, but getting the facts correct is another. Duet in Docs rightly described the term “welfare queen” as pejorative and wrote a sharp memo on options to mitigate labor costs at any company.
But its work began to look sloppy on more specific requests. Asked to write a memo on consumer preferences in Paraguay compared to Uruguay, the system incorrectly described Paraguay as less populous. It hallucinated, or made up, the meaning behind a song from a 1960s Hindi film being performed at my pre-wedding welcome event.
Most ironically, when prompted about the benefits of Duet AI, the system described Duet AI as a startup founded by two former Google employees to develop AI for the music industry with over $10 million in funding from investors such as Andreessen Horowitz and Y Combinator. It appears no such company exists. Google encourages users to report inaccuracies through a thumbs-down button below AI-generated responses.
Behr says Google screens topics, keywords, and other content cues to avoid responses that are offensive or unfairly affect people, especially based on their demographics or political or religious beliefs. She acknowledged that the system makes mistakes, but she said feedback from public testing is vital to counter the tendency of AI systems to reflect biases seen in their training data or pass off made-up information. “AI is going to be a forever project,” she says.
Still, Behr says early users, like employees at Instacart and Victoria’s Secret’s Adore Me underwear brand, have been positive about the technology. Instacart spokesperson Lauren Svensson says—in a manually written email—that the company is excited about testing Google’s AI features but not ready to share any insights.
Behr says that in Google’s internal testing, emails from colleagues have not become “vanilla” or “generic” so far. The tools have boosted human ingenuity and creativity, not suppressed them, she says. Behr too would love an AI model that imitates her style, but she says “those are the types of things that we’re still evaluating.”
Despite their disappointments and limitations, the Duet features in Docs and Gmail seem likely to lure back some users who began to rely on ChatGPT or rival AI writing software. Google is going further than most other options can match, and what we are seeing today is only a preview of what’s to come.
When—or if—Duet matures from promising drafter to unbiased and expert document finisher, usage of it will become unstoppable. Until then, when it comes to writing those heartfelt vows and speeches, that’s a blank screen left entirely to me.