
Amazon Releases 'Create with Alexa', New Generative AI-Driven Storytime Feature For Kids

Create with Alexa uses machine learning models to generate unique children's stories with visuals and audio.
Created on November 29 | Last edited on November 30
AI models like DALL·E 2 and GPT-3 have already been used to help kids, along with their parents, write, illustrate, and self-publish original picture books.
Using these tools directly, however, is difficult for kids to do alone, especially young children. Generative AI becomes more accessible day by day, so it's about time kids got a tool of their own to play with.
Today, Amazon has released a new feature for Alexa devices called Create with Alexa, which lets kids guide an AI model through creating original and unique stories. Though not as free-form as broader generative models like DALL·E 2 and GPT-3, Create with Alexa should be able to keep kids' little brains active and occupied for a good while.

Using Create with Alexa

The process begins with choosing a theme, mood, and characters from a selection shown on the screen. From there, the underlying AI model generates a completely unique short story for kids to enjoy, complete with visuals and sounds. Even if the same prompts are selected each time, a new story will always emerge.
Create with Alexa was made specifically for kids (and, by extension, for adults who want a break from being the bedtime story-creating machine). Its creation was informed by what kids would want to see in such a tool, such as dinosaurs, a suggestion from the son of Eshan Bhatnagar, head of product for Alexa AI.
As of right now, Create with Alexa is available only in the United States and supports only English. It is also available only on Alexa devices that have a screen, so there's no audio-only option.


Create with Alexa under the hood

Create with Alexa uses many individual AI models working together through a full pipeline to produce complete experiences.
First up is the planner model, which takes user input (theme, character, mood, etc.) and generates a long list of keywords that will guide generation down the line. The story generator model creates several stories based on these keywords, and the most coherent of them is ultimately chosen as the story to present. Carefully constructed, human-written stories were the backbone for training these models.
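To make the plan-then-generate-then-pick flow concrete, here is a minimal sketch in Python. It is not Amazon's implementation: the function names, the keyword table, the random stand-in generator, and the keyword-coverage "coherence" score are all assumptions invented for illustration. The real system uses trained models at each step.

```python
import random
from typing import List

def plan_keywords(theme: str, character: str, mood: str) -> List[str]:
    """Hypothetical planner: expand the user's choices into guiding keywords."""
    extras = {"underwater": ["coral", "bubbles"], "space": ["rocket", "stars"]}
    return [theme, character, mood] + extras.get(theme, [])

def generate_candidates(keywords: List[str], n: int, rng: random.Random) -> List[str]:
    """Stand-in generator: each candidate mentions a random subset of keywords."""
    candidates = []
    for _ in range(n):
        k = rng.randint(1, len(keywords))
        used = rng.sample(keywords, k)
        candidates.append("A story about " + ", ".join(used) + ".")
    return candidates

def coherence_score(story: str, keywords: List[str]) -> float:
    """Toy proxy for coherence: fraction of planned keywords the story mentions."""
    return sum(kw in story for kw in keywords) / len(keywords)

def pick_best(candidates: List[str], keywords: List[str]) -> str:
    """Keep the candidate with the highest (assumed) coherence score."""
    return max(candidates, key=lambda s: coherence_score(s, keywords))

keywords = plan_keywords("space", "dinosaur", "silly")
candidates = generate_candidates(keywords, n=5, rng=random.Random())
print(pick_best(candidates, keywords))
```

Because the stand-in generator is random, repeated runs with the same choices can produce different stories, which loosely mirrors the behavior described earlier.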
The raw story text is fed through two additional NLP models that clean it up and make it easier for downstream models to interpret. The first replaces potentially ambiguous words, like pronouns, with the entities they refer to, while the second constructs a relationship graph for all entities throughout the story.
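The two clean-up steps can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the pronoun table, the "most recent matching entity" rule, and the same-sentence co-occurrence graph are simple stand-ins for the learned NLP models the article describes.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical mapping from pronoun forms to a pronoun class.
PRONOUN_CLASS = {"he": "he", "him": "he", "she": "she", "her": "she", "it": "it"}

def resolve_pronouns(sentences: List[str], entities: Dict[str, str]) -> List[str]:
    """Replace each pronoun with the most recently mentioned entity of its class."""
    last_seen: Dict[str, str] = {}
    resolved = []
    for sentence in sentences:
        words = []
        for word in sentence.split():
            core = word.strip(".,!?")
            if core in entities:
                last_seen[entities[core]] = core
            elif core.lower() in PRONOUN_CLASS:
                name = last_seen.get(PRONOUN_CLASS[core.lower()])
                if name is not None:
                    word = word.replace(core, name)
            words.append(word)
        resolved.append(" ".join(words))
    return resolved

def relationship_graph(sentences: List[str], entities: Dict[str, str]) -> Dict[str, Set[str]]:
    """Link entities that appear together in the same sentence."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for sentence in sentences:
        present = sorted({w.strip(".,!?") for w in sentence.split()} & set(entities))
        for a, b in combinations(present, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

# Each entity is tagged with the pronoun class that may refer to it (an assumption).
entities = {"Rex": "he", "Mia": "she", "Dot": "it"}
story = ["Rex met Mia.", "She found Dot.", "He liked it."]
resolved = resolve_pronouns(story, entities)
print(resolved)
print(relationship_graph(resolved, entities))
```

Running the graph step on the unresolved text would miss the links between Mia and Dot and between Rex and Dot, which is why resolving pronouns first helps the later stages.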
The cleaned and organized data is sent to the scene generation pipeline, which first selects the best background image to use (from a selection of pre-made assets) and then decides which objects to place in the scene and how they should be placed (again, drawing on pre-made assets).
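As a rough illustration of selecting from pre-made assets, the Python sketch below picks a background by keyword overlap and spreads objects evenly along a ground line. The asset names, tags, and layout rule are hypothetical; the article does not say how the real pipeline scores backgrounds or positions objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical library of pre-made backgrounds, each tagged with keywords.
BACKGROUNDS: Dict[str, Set[str]] = {
    "ocean_floor": {"underwater", "coral", "fish"},
    "forest_clearing": {"forest", "tree", "owl"},
    "moon_surface": {"space", "rocket", "stars"},
}

def select_background(keywords: Set[str]) -> str:
    """Pick the background whose tags overlap most with the story keywords."""
    # Sorting the names makes tie-breaking deterministic.
    return max(sorted(BACKGROUNDS), key=lambda name: len(BACKGROUNDS[name] & keywords))

@dataclass
class Placement:
    asset: str
    x: float  # horizontal position, 0.0 (left) to 1.0 (right)
    y: float  # vertical position, 0.0 (top) to 1.0 (bottom)

def place_objects(assets: List[str], ground_y: float = 0.8) -> List[Placement]:
    """Spread the chosen assets evenly along a ground line."""
    n = len(assets)
    return [Placement(a, (i + 1) / (n + 1), ground_y) for i, a in enumerate(assets)]

print(select_background({"space", "dinosaur", "stars"}))
for placement in place_objects(["dinosaur", "rocket", "flag"]):
    print(placement)
```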
Music generation is a little more complicated: while it still builds on pre-made assets, the system generates its own melodies and, with the help of two text-processing models, scores the music to match the text as it's being read out, deciding, for example, when the music should swell or when a sound effect should trigger.
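The idea of timing audio cues to the narration can be sketched in a few lines of Python. This is only an assumed simplification: it uses a fixed reading speed and a hand-written table of cue words, whereas the article describes text-processing models making these decisions.

```python
from typing import Dict, List, Tuple

def schedule_cues(
    text: str,
    cues: Dict[str, str],
    seconds_per_word: float = 0.5,
) -> List[Tuple[float, str]]:
    """Return (timestamp, cue) pairs for each cue word in the narration.

    Assumes a constant reading speed, so the timestamp of a word is just
    its position in the text multiplied by seconds_per_word.
    """
    schedule = []
    for index, word in enumerate(text.split()):
        key = word.strip(".,!?").lower()
        if key in cues:
            schedule.append((index * seconds_per_word, cues[key]))
    return schedule

# Hypothetical cue table: narration word -> audio event to trigger.
cues = {"jumped": "boing_effect", "roared": "music_swell"}
print(schedule_cues("The dinosaur jumped and the rocket roared.", cues))
```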

Find out more

You can read about the release of Create with Alexa in the release blog post, and if you want more information on how Create with Alexa works, take a look at the in-depth blog post from its initial unveiling in September.
Tags: ML News