Team: Pranav Nair, Su Fang

Tools: Procreate, Illustrator, Adobe XD, iMovie, Alexa Skill, Storyline App

Research Methods: Video editing, Prototyping, Voice User Interface design, Interaction design, Interface testing

Contributions: User research, Ideation, Interaction design, Interface testing, Video production and editing

Homework 3: Auditory User Interface


At its essence, Storyline exists to make a child’s bedtime story, improvised by their parent, more immersive by playing ambient sound effects (e.g., rainfall, swinging swords, howling wolves) based on words or utterances spoken by the parent. It also uses natural language processing (NLP) algorithms to understand the tone and context of the story being told and plays sound effects accordingly.

Storyline also reduces the parent’s cognitive load by remembering the story they told the night before and presenting it as a reminder in the ‘Previously on…’ section. At the end of the story, it plays a lullaby to help the child fall asleep.




We felt that a voice user interface (VUI) would be an interesting match for the problem space of improvisational storytelling, since traditional storytelling relies solely on the storyteller’s voice. In contrast to many traditional VUIs, which primarily support system-to-user interaction, our design aims to augment the existing human-to-human interaction of storytelling. For this assignment, we narrowed our scope to improvisational stories told by parents when putting their children to bed at night.



Our primary user would be a parent working with Amazon’s Alexa (via an Echo device) and our app, Storyline, to come up with an engaging and immersive story to tell their child at bedtime.


The secondary users of our interface would be the child or children, who engage with Storyline during the priming phase of the experience (which we describe below as the trailer) to get excited about the story they are about to hear.



In this particular design, we’ve defined auditory UI as an interface where users can provide voice-based input and receive speech/non-speech output.


Based on our initial ideation, we identified a few basic tasks our system would need to accomplish to make the experience engaging for both our primary and secondary users. These are outlined below:


  • Play ambient sounds based on word triggers provided by the user. The trigger words would be given to the user in the form of cards (an example is shown below). Each card would contain a set of words belonging to a specific theme; the cards would primarily serve to maintain consistency and tone throughout the story.


  • Prime users by triggering trivia interactions to increase interest. Early feedback from our colleagues led us to introduce the concept of ‘priming’ our users for the story that is about to take place. The goal of this feature is to build the child’s excitement for the story.


  • Remember and remind users of previous stories. While charting out our ideal user experience, we realised we also wanted the system to remember previously told stories and remind users of them, creating some continuity between interactions.

  • Understand user mood. Given the peripheral nature of interactions with our system, we wanted it to be smart enough to tell whether users were in the right mood to engage in the first place (e.g., after asking “Hey Scott, how was your day?”, the system would wait for a “Good”, “Not bad”, or “Okay” before proceeding with trivia or further engagement).




Based on these tasks, we then determined the requirements the system must fulfill:

  • Play audio. This functionality is necessary for system-to-user interaction in an auditory user interface. Moreover, audio builds on the existing aural modality of storytelling and makes the experience more immersive.

  • Consistency. The system must play consistent sound effects, in the relevant contexts, to ensure that the story told is cohesive and most closely resembles traditional, human storytelling.

  • Remember previous sessions. Just as a human storyteller would, the system must be able to remember previous storytelling sessions so that subsequent stories can build on earlier ones.

  • Differentiate between users. The system must be able to tell who the primary and secondary users are in order to play the correct cues. Additionally, since there may be more than one parent or caregiver who is telling a story to the child, the system must be able to know who the storyteller is and which audio cues are associated with which user.

  • Understand context of words. Given the contextual nature of language, the system must be able to understand linguistic nuances, such as qualifying phrases which might have implications for interpreting meaning and determining user intent (e.g. “there was no snow on the ground”).

  • Recover gracefully from errors. While some errors or delays in interpreting user intent are inevitable, the system must let the user recover from them quickly. The system should also provide a way for users to recover from their own mistakes (e.g., forgetting what took place during the previous storytelling session).
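The Storyline prototype was built without code, so the following is only a hypothetical Python sketch of how the card-based trigger requirement and the “no snow on the ground” case might be handled together. The card contents, the sound-file names, the negation word list, and the three-word window are all assumptions for illustration; a real system would rely on proper NLP rather than keyword matching.

```python
import re

# Hypothetical "forest" theme card: trigger phrase -> sound-effect file.
FOREST_CARD = {
    "forest": "forest_ambience.mp3",
    "light his torch": "torch_ignite.mp3",
    "wolves": "wolves_howling.mp3",
    "snow": "snow_wind.mp3",
}

# Hypothetical words that cancel a trigger when they appear just before it ("no snow").
NEGATIONS = {"no", "not", "never", "without"}


def find_cues(utterance: str, card: dict[str, str], window: int = 3) -> list[str]:
    """Return the sound effects an utterance triggers, in spoken order.

    A phrase fires only if none of the `window` words before it is a negation.
    Each phrase fires at most once per utterance.
    """
    words = re.findall(r"[a-z']+", utterance.lower())
    hits: list[tuple[int, str]] = []
    for phrase, sound in card.items():
        target = phrase.split()
        for i in range(len(words) - len(target) + 1):
            if words[i:i + len(target)] != target:
                continue
            preceding = words[max(0, i - window):i]
            if NEGATIONS.isdisjoint(preceding):
                hits.append((i, sound))
                break  # the first non-negated occurrence is enough
    # Sort by word position so cues play in the order they were spoken.
    return [sound for _, sound in sorted(hits)]
```

A keyword window like this is only a rough stand-in for the contextual understanding described above; it would still misread a phrase like “the snow had all melted.”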



Based on the features and requirements we listed above, we ideated on what the ideal user experience of Storyline would look like, which we have presented in a storyboard format below. Each frame is presented with a narrative walkthrough to provide some additional context.

[Figure: Storyline storyboard]



Using our storyboard as a reference, we generated a list of features that would need to exist for Storyline to become a fully realised product on the market. The list assumes that users interact with Storyline implicitly throughout the day, not just during the bedtime routine, which would let Storyline provide a more comprehensive user experience.

  • Trailers. Once the parent has decided on a theme for the story, Storyline builds anticipation by playing ambient sounds related to the theme when the child returns home and by presenting the child with random bits of trivia to get them excited for what’s to come.

  • Adapt to user’s mood. Storyline first asks the child a few questions to gauge their mood and decides how to proceed accordingly.

  • Remember past storylines. While brainstorming our concept, we realised our users might want some continuity between their stories (i.e., the ability for the parent to remember and refer to past stories). We therefore decided that it should be the system’s responsibility, not the user’s, to keep a record of previous stories and their associated sound effects.

  • Cards for consistency. One of the initial challenges we faced with our interface was its ability to recognize the tone and context of a given story. For example, a story with a darker tone might call for more ominous sound effects. Theme cards address this by giving the parent a set of words tied to a single theme, which keeps the sound effects consistent in tone.

  • Ambient music for sleep assist. At the end of each story, Storyline would transition from the last sound effect played to a lullaby that would assist the child in falling asleep and provide a gentle transition out of the story.
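One way the system could take on the record-keeping described under “Remember past storylines” is sketched below in Python. The `StorySession` and `StoryMemory` names, the choice to key sessions by child, and the idea of storing a few spoken highlights per night are assumptions for illustration, not part of the Storyline prototype; a deployed skill would also need persistent storage rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class StorySession:
    """One night's story, as a hypothetical record the system might keep."""
    storyteller: str
    theme: str
    told_on: date
    highlights: list[str]
    sound_effects: list[str] = field(default_factory=list)


class StoryMemory:
    """Hypothetical in-memory store of past sessions, keyed by the child listening."""

    def __init__(self) -> None:
        self._sessions: dict[str, list[StorySession]] = {}

    def save(self, child: str, session: StorySession) -> None:
        self._sessions.setdefault(child, []).append(session)

    def last_session(self, child: str) -> Optional[StorySession]:
        sessions = self._sessions.get(child, [])
        # Most recent by date, so sessions may be saved in any order.
        return max(sessions, key=lambda s: s.told_on) if sessions else None

    def previously_on(self, child: str) -> str:
        """Build the 'Previously on...' reminder from the most recent session."""
        last = self.last_session(child)
        if last is None:
            return f"This is the first story for {child}."
        return "Previously on... " + "; ".join(last.highlights) + "."
```

Recording the storyteller with each session leaves room for filtering recaps by caregiver, one possible route toward the requirement to differentiate between users.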


Considering how our design works with existing technologies, namely Amazon Alexa, there are a few system constraints that we wanted to acknowledge:

  • Location within home for interaction. Amazon Alexa is primarily accessed through Amazon Echo speakers, so the interaction is tied to where those speakers physically are. For the experience we designed to take place, speakers would need to be available in multiple locations, such as the kitchen and the child’s bedroom. For example, a small Echo Dot in the child’s bedroom could be part of a larger network of Echo devices; to support the sense of immersion, these speakers must be available at multiple points within the home.

  • Limited to word sets provided by each theme. To reduce the number of interruptions that occur during the storytelling experience due to errors, we introduced the idea of ‘cards’ which would contain a fixed set of words our users could pull from to tell their story. This may in practice limit the scope of a story to a specific setting or theme.

  • Requires user to monitor system functionality at all stages. Because the system’s primary means of providing feedback to the user is by playing sound effects or by simply stating that it did not understand the user’s command, the user must take on the role of monitoring the system’s awareness of their inputs.

  • System runs for a long duration. Because the experience we designed takes place over an extended period, which may range from the morning until the bedtime story is actually told, the user would need to run Storyline alongside the other apps they use throughout the day. For example, during the trailer phase, the parent may also want Alexa to provide instructions while they cook, which cannot happen at the same time as the trailer.


[Figure: Storyline interaction flow]

In the figure above, we present the interaction flow for Storyline. The diagram walks through each stage of the day, shows how Storyline interfaces with Alexa, and highlights the inputs provided by the parent and child along with the corresponding output from Alexa. The input words that trigger a reaction from Alexa are highlighted in bold.



Before designing our prototype, we familiarized ourselves with Amazon’s voice user interface (VUI) developer guidelines and best practices. This helped us understand the design frameworks that govern how Echo devices receive input and produce output. We tailored our design and interactions to fit within these frameworks so that we could implement a working prototype.


In the sections below, we present example utterances for each stage spoken by the user (U), along with the corresponding output triggered by Alexa (A). The quotes in black are incorporated in our prototype; those in gray are utterances that we imagine would trigger an analogous response in a higher-fidelity prototype.


Stage I: Trailer Setup


  • U: “Let’s figure out what story we wanna tell Scott tonight”

    • A: “Sounds good, what were you thinking?”

  • U: “I have an idea for what I wanna tell Scott…..”

    • A: “Sure, let’s hear it.”

  • U: “I don’t really know what story to tell Scott, help me figure it out…”

    • A: “Happy to help! Let’s try to set a theme first and take it from there… you have rainforests, ice mountains, and volcanoes…”

** System now knows the theme/context of the story being told. Prepares sound effects and trivia for building excitement.**


Stage II: Trailer

  • A: “Hey Scott, how was your day?”

  • ** Trigger trivia mode **

  • U: “Good” / “Bad” / “Okay”

    • A (if Good): “Nice! Did you know that…”

    • A (if Bad): “Oh no! Hope I can make you feel better…”

    • A (if Okay): “…”

** System detects mood of the child through a simple interaction. Calibrates its response accordingly.**
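To make the branching in Stage II concrete, here is a minimal Python sketch of how a reply such as “Good”, “Not bad”, or “Okay” might be mapped to the next step of the trailer. The vocabularies, the step names (`trivia`, `comfort`, `reprompt`), and the negated-phrase handling are hypothetical; the prototype itself only responded to a fixed set of words.

```python
import re
from typing import Optional

# Hypothetical two-word replies checked first, so "not good" is not read as "good".
NEGATED = {"not bad": "okay", "not good": "bad"}

# Hypothetical vocabularies for the three Stage II moods.
MOOD_WORDS = {
    "good": {"good", "great", "awesome", "fun"},
    "bad": {"bad", "sad", "terrible", "awful"},
    "okay": {"okay", "ok", "fine", "alright"},
}


def classify_mood(reply: str) -> Optional[str]:
    """Map the child's reply to 'good', 'bad', 'okay', or None if unclear."""
    text = reply.lower()
    for phrase, mood in NEGATED.items():
        if phrase in text:
            return mood
    words = set(re.findall(r"[a-z]+", text))
    for mood, vocabulary in MOOD_WORDS.items():
        if words & vocabulary:
            return mood
    return None


def next_step(reply: str) -> str:
    """Choose the (hypothetical) next trailer step from the child's mood."""
    mood = classify_mood(reply)
    if mood is None:
        return "reprompt"  # ask again rather than guess
    if mood == "bad":
        return "comfort"   # e.g. "Oh no! Hope I can make you feel better..."
    return "trivia"        # good or okay: proceed, e.g. "Nice! Did you know that..."
```

Asking again when the reply is unclear, rather than guessing, is one way to satisfy the requirement to recover gracefully from errors.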


Stage III: Story


Below is an example story that a parent might tell using Storyline. It is presented in a Mad Libs style: the words inside the brackets serve as trigger words that tell the system which sound effect to play, and they could be replaced with different words if a different story were being told. The ellipses (...) denote places where the parent might flesh out the story further. To demonstrate our design’s functionality, our prototype was built with the words currently shown in the brackets, but we imagine a fully developed system working with a much wider range of cues.


To interact with our prototype, find it in the Alexa Skills Store. You can then read a snippet of the story aloud and try some of the trigger words.

“Once upon a time, there was a hiker in the [forest]. He had been out hunting reindeer for several days, and now he was finally on his journey home. Along the way, he heard a noise that sounded like [an animal crying] for help. It was a dark night, and the hunter could not see anything around him… he reached into his bag for matches to [light his torch] to get a better view.


The hiker hears a sound in the distance… it sounds like [a swing, creaking in the wind]. “But how could that be?” the hunter thinks. It isn’t even a windy day; not even the trees are rustling.


Then the hiker hears a noise: it sounds like [keys jingling]... Then he sees scratch marks… and paw prints… All this reminds him of his dog, Buddy, who disappeared into the forest months ago. The hiker starts feeling hopeful, and he thinks, “It seems impossible… but maybe I’ll see Buddy again one day.” He follows the nearby trail that leads him closer to [the sound]. As he moves deeper into the woods, he notices multiple claw marks and starts to worry. “Was Buddy taken by a pack of [wolves]?” Very carefully and silently, he hangs his reindeer bag on a tree, and suddenly hears [branches and leaves rustling behind him].


He [draws his bow] and proceeds with caution. As he moves closer, he starts to hear sounds: not howls, but more like [low, playful growls]… He must be getting close… His [heart starts beating faster] and his [breath starts to get shallower]... He pinpoints the nearest source of the [low growling sounds] and prepares to attack if he needs to… He [draws his bow taut]… he takes aim… when suddenly… he sees a tiny puppy, who lets out a sharp [yap!]


The tiny puppy pokes his head out from the bushes, [snarling] at the hiker… and then Buddy appears behind him!


… As it turns out, the hiker realizes that Buddy has found himself a family of his own and is happily settled with them in the forest. The hunter gifts them some of the reindeer he had hunted, says his farewells, and continues on his way home, [whistling] the same tune he used to hum when he and Buddy walked home together. The end.”

Stage IV: End

U: “…And that’s all for tonight, Scott. Good night! Sweet dreams.”

** System transitions from the last sound effect to a lullaby.**
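The hand-off from the last sound effect to the lullaby could be implemented as a crossfade. The Python sketch below assumes an equal-power curve and a 10-second fade, neither of which comes from the Storyline prototype; it only computes the two gain values at a given moment and leaves actual audio playback out of scope.

```python
import math


def crossfade_gains(elapsed_s: float, duration_s: float = 10.0) -> tuple[float, float]:
    """Equal-power crossfade from the last sound effect to the lullaby.

    Returns (effect_gain, lullaby_gain), each in [0, 1]. Because
    cos^2 + sin^2 = 1, the combined power of two uncorrelated sounds stays
    constant, avoiding the dip in loudness a linear fade has at its midpoint.
    """
    # Fraction of the fade completed, clamped to [0, 1].
    t = min(max(elapsed_s / duration_s, 0.0), 1.0)
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    # Print the gains once per second over the (hypothetical) 10-second fade.
    for second in range(11):
        effect, lullaby = crossfade_gains(second)
        print(f"{second:2d}s  effect={effect:.2f}  lullaby={lullaby:.2f}")
```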



To prototype our platform, we explored a range of Alexa Skill prototyping tools. Given the timeline and scope of this project, we decided to avoid programming and instead create a minimum viable product (MVP). Even so, our prototyping platform still had to meet certain requirements:

  • Ability to play audio files for the background ambient sounds

  • Ability to detect overlapping audio input: as the story progresses, the system needs to adapt and update sound effects in real time

  • Ability to ‘test’ and ‘share’ a link to the prototype

  • Ability to simulate ‘timed delays’

Through browsing VUI prototyping blogs, we came across Invocable, a prototyping platform that lets you build interactions using simple I/O flow diagrams that can be exported and tested on Alexa. It met our core requirements: it could play audio files, export directly to an Alexa Skill, and detect overlapping audio. Because our design already fit within the Alexa Skill frameworks, it did not take us long to build our first set of interactions. We followed an iterative design process, constantly testing our prototype by exporting it to an Alexa Skill to make sure the interactions were correct and made sense in real time.

For our video, we wanted to simulate what our interface might look like when running on a real Echo device. As neither of us owned an Echo, we emulated the experience with a browser-based interface to Alexa that allows developers working with the Alexa Skills Kit (ASK) to test skills in development.



While we were able to build a functioning prototype for Storyline, the prototype presented in our video does not fully reflect the intended user interface, as we were constrained by the platforms we used to realize our idea. Some of these hurdles are highlighted below:

  • Linearity of flow. Although our prototype is interactive, it is still at an early stage of development and hence fairly linear. Users currently cannot jump in and out of the ‘stages’ presented earlier in this document. They must complete each stage sequentially before moving to the next ‘stage.’

  • Interactivity. The prototype is currently limited to the specific set of words we provided in the dialogues and utterances section. A higher-fidelity prototype would accept a broader set of words and phrases that trigger the same interaction.

  • User recognition. The prototype currently cannot recognize who is speaking, which matters in some stages of the interaction because the user addressing Storyline changes (e.g., “Hey Scott, how was your day?” is answered by the child, not the parent).

  • Can’t actually share a link. Unfortunately, and Alexa Skills Kit do not allow us to share a link to our prototype. Hence, we are forced to publish the app on the Alexa Store, and there is no real way for us to test its functionality, as neither of us possesses an Echo device.




Below is a snapshot of the flow diagram we created in Invocable to design our interactive prototype:

[Figure: Invocable flow diagram for the Storyline prototype]

Link to video/Prototype Demo




One of the major challenges we faced was accounting for the nuances of language, which humans recognize fairly intuitively but a system may not. In improvisational storytelling, for example, it is critical that the sounds match both the tone and the intent of the speaker, that the system differentiates between speakers, and that it distinguishes speech directed at the story from conversation between the primary and secondary users while they use Storyline.


To address tone and context during the storytelling experience, we created theme cards that guide the user toward input that is easier for the system to process and respond to appropriately. In doing so, we eliminated some potential sources of error (mismatches between the user and the system in intent or setting). We did not, however, address differentiating between speakers or recognizing relevant voice input in this prototype, due to the constraints of our prototyping options and timeline.


Another major challenge of designing a voice interface was that, every time we tested it, there was no visual confirmation that the device had actually received our command. This uncertainty significantly increased the time it took us to debug our interaction flow.



One of our biggest takeaways from this project was that designing for voice is deceptively difficult: although we communicate with other humans through speech, ironically, it is not a modality we are used to designing for. This prompted us to consider more deeply the nature of the problem space, the needs of the user, and how a system might solve their problem, without jumping too quickly to solutions.


Additionally, because designing for speech was relatively new territory for us, we spent considerable time researching the available tools and choosing them based on our specific needs. Given the scope of the project, we were looking for prototyping software that did not require coding, let us test the prototype ourselves, and let us share our design with others for evaluation.



Creating Storyline was a valuable experience: it pushed us to design outside our comfort zone of two-dimensional interfaces and wireframes and to explore a modality that is often overlooked. Yet VUIs are also on the rise, given recent developments in Internet of Things (IoT) platforms and smart home devices and assistants (e.g., Google Assistant, Siri, Alexa). Completing this assignment also resembled a design sprint more than a full product development cycle, which pushed us to focus more keenly on ideating and building, and less on need-finding and analysis. While our design could certainly be improved with input from users and further user testing, focusing on ideating and building was a valuable perspective shift. For example, simply building the prototype meant overcoming several hurdles, which forced us to consider what we were trying to achieve. At the end of the day, our goal was to communicate our concept, and we did the best we could with the tools at our disposal.