This document discusses the Internet of Things (IoT) and an Internet of Notes application. It describes common IoT concepts like smart hubs, gateways, triggers, actions, properties, and recipes. It also discusses implementing an Alexa skill to approve travel requests using voice commands. The document provides information on developing Alexa skills, including using the Alexa Developer Console, Lambda functions, AWS services, and ask-sdk library. It emphasizes reusability and continuous improvement of code.
The Internet of (Notes) Things
1. The Internet of (Notes) Things
Keith Strickland
JavaScript Guru – Red Pill Now
The New Chapter Begins
#domino2025
Keith Strickland
JavaScript Guru – Red Pill Now
Peter Presnell
CEO – Red Pill Now
22. Triggers
My name is mentioned on Twitter
I take a photo
A contact is added
Receive an e-mail from my boss
A meeting is about to start
It is raining outside
Jira task is assigned to me
New item added
New customer added
24. Actions
Approve travel request
Turn off lights
Create an invoice
Post message to a space
Create item in SharePoint list
Lower temperature 4°
Create new to-do
Send text message
Create contact
Great! I will show you how to get started, the web sites you need to know about and some of the documentation.
To me, the most difficult part of building an Alexa skill was figuring out all the things you need to get up and running before you write any code. Hopefully I can deliver some knowledge here so that you can start developing your own Alexa skill without too much hassle.
When setting up your account, take into consideration the name and email address you use.
If you need to set up a business account, use a different name or add the company name to your name. This lets you add your business account to your Alexa household and still tell the two accounts apart. When I first did this, I went to remove my business account from the household and there were two entries labeled “Keith Strickland” with no way to differentiate between them; I ended up removing the wrong one. Once you remove an account, it's 180 days before that account can join another household. So: developer beware.
The major parts of an Alexa skill are:
Interaction Model
Interfaces
Endpoints
The interaction model defines the structure of your skill and how Alexa maps spoken requests to the code that should run. It's made up of four parts:
Invocation Phrase
Intents
Slot Types
Dialog Model
This is the phrase that a user speaks to start your skill and open a session. It should be at least two words and can't be any of Alexa's restricted launch words. Once the invocation phrase is spoken, Alexa runs the “LaunchRequestHandler”, which we'll get into with the code.
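As a sketch, a launch handler in the ask-sdk handler shape looks like this (the welcome wording is illustrative, not the actual hello-engage text):

```javascript
// Handler that runs when the invocation phrase opens a session.
// Every ask-sdk handler is an object with canHandle() and handle().
const LaunchRequestHandler = {
  canHandle(handlerInput) {
    // Claim only the LaunchRequest that opens the session.
    return handlerInput.requestEnvelope.request.type === 'LaunchRequest';
  },
  handle(handlerInput) {
    // Build the spoken response and keep the session open with a reprompt.
    return handlerInput.responseBuilder
      .speak('Welcome to Hello Engage. Would you like to find a session?')
      .reprompt('Would you like to find a session?')
      .getResponse();
  }
};
```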
”An intent represents an action that fulfills a user's spoken request. Intents can optionally have arguments called slots.”
Defining these makes it possible for you to provide different responses to different requests. For example in our “hello-engage” skill, there are several intents.
Finding Sessions
Yes
No
General Greeting
Then we implemented some of the built-in intents:
Cancel
Stop
Help
Repeat
There will be a corresponding function within your code to handle each intent you define; these functions are what build the response to the user's request.
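For example, a handler for a hypothetical "FindSessionsIntent" might look like this (the intent name, slot name, and wording are illustrative, not taken from the actual skill):

```javascript
// Handler for a hypothetical FindSessionsIntent with a "speaker" slot.
const FindSessionsIntentHandler = {
  canHandle(handlerInput) {
    // Claim only IntentRequests for this specific intent.
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'FindSessionsIntent';
  },
  handle(handlerInput) {
    // Pull the "speaker" slot value out of the request, if present.
    const slots = handlerInput.requestEnvelope.request.intent.slots || {};
    const speaker = slots.speaker ? slots.speaker.value : 'that speaker';
    return handlerInput.responseBuilder
      .speak('Looking up sessions for ' + speaker + '.')
      .getResponse();
  }
};
```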
“A set of likely spoken phrases mapped to the intents. This should include as many representative phrases as possible.”
You will want to define a bunch of sample utterances for each intent, and get some help coming up with them. The utterances are how Alexa determines what is being requested so that your code can build a response, and they effectively set the requirements for your skill. It's best to do this with a group of people, as each person will ask the same question differently. The more utterances you can provide, the better your skill will be.
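In the interaction-model JSON, each intent carries its sample utterances in a `samples` array. A sketch of the shape (the intent name, slot name, and phrases here are illustrative, not the actual hello-engage model):

```
{
  "name": "FindSessionsIntent",
  "slots": [
    { "name": "speaker", "type": "speakerName" }
  ],
  "samples": [
    "find sessions by {speaker}",
    "what sessions is {speaker} presenting",
    "when does {speaker} speak"
  ]
}
```

The `{speaker}` placeholders are slots, which are covered next.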
There are two types of slots:
Built-In – Definitions that handle all sorts of things like company names, people's names, and room names. In total there are probably about 50 different types to choose from.
Custom – A representative list of possible values for a slot. Custom slot types are used for lists of items that are not covered by one of Amazon's built-in slot types.
Think of a slot as a variable that is part of your utterance: for example a name, a place, or anything that can be used to drive a lookup.
You provide a list of possible values and, optionally, an ID and a set of synonyms for each.
When defining these, be sure to test them by speaking to either an Echo device or the Test console, so you can find differences between what you say and how Alexa interprets it. For example, in my testing, with my accent “Peter Presnell” would be interpreted as “Peter Prescott”.
In this case I added “Peter Prescott” as a synonym of “Peter Presnell”. While this isn't really feasible for something like a name in general, it is a way to “correct” issues that you encounter.
Of course, in most cases you can't take that particular approach, but since “Peter Prescott” isn't a speaker at Engage 2018, it works here.
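In the interaction-model JSON, that correction is just a synonym on the slot value. A sketch (the slot type name and ID are illustrative):

```
{
  "name": "speakerName",
  "values": [
    {
      "id": "PETER_PRESNELL",
      "name": {
        "value": "Peter Presnell",
        "synonyms": ["Peter Prescott"]
      }
    }
  ]
}
```

Alexa resolves either spoken form to the same value, and the optional `id` gives your code a stable key to look up.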
Also be aware that special characters like quotes, parentheses, the pound sign, asterisks, etc. are not allowed in the value or synonyms of a slot definition. This can cause problems. For example:
The name of this session is ”The Internet of (Notes) Things” and that’s what the title is in the database. However, the value of that slot definition is “The Internet of Notes Things”, so I can’t really query the database properly for that particular session name.
This is an optional piece of the interaction model.
“A structure that identifies the steps for a multi-turn conversation between your skill and the user to collect all the information needed to fulfill each intent. This simplifies the code you need to write to ask the user for information.”
Even though our hello-engage skill works as a multi-turn conversation, we didn't use the Dialog Model. Instead, we created Yes and No intents that are smart enough to know what to do next based on your answer.
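A sketch of that approach: a Yes handler that reads a session attribute to decide what the user is saying yes to (the attribute name and wording are illustrative):

```javascript
// Handler for the built-in AMAZON.YesIntent. Instead of the Dialog Model,
// it inspects a session attribute recording what the skill last asked.
const YesIntentHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'AMAZON.YesIntent';
  },
  handle(handlerInput) {
    // Session attributes persist across turns within one session.
    const attributes = handlerInput.attributesManager.getSessionAttributes();
    if (attributes.lastQuestion === 'findAnotherSession') {
      return handlerInput.responseBuilder
        .speak('OK. Which speaker are you interested in?')
        .reprompt('Which speaker are you interested in?')
        .getResponse();
    }
    // Fall back when we don't know what the "yes" refers to.
    return handlerInput.responseBuilder
      .speak("I'm not sure what you are saying yes to.")
      .getResponse();
  }
};
```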
This is where you enable specialty features for your skill. For Example:
An audio player – Will give you the ability to control streaming of audio
Display Interface – Enables ability to send Card data to an Echo Show or Echo Spot
Video App – Enables ability to control streaming of video
Alexa Gadget – “Alexa Gadgets is a new category of connected products that enhance voice interactions with compatible Amazon Echo devices. The Gadgets Skill API and the Gadgets SDK enable developers to build experiences and products that turn an Amazon Echo device into a hub for interactive play.”
This is where you will define the AWS ARN of your Lambda function. This is how Alexa knows where your code resides.
Despite the name, an AWS Lambda function isn't simply an anonymous function. AWS Lambda is Amazon's serverless compute service; it runs your code in response to events (such as an Alexa request) without you managing a server. The name is borrowed from the anonymous “lambda” functions of functional programming.
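To tie the pieces together, here is a minimal sketch of a skill's Lambda entry point, assuming the ask-sdk-core npm module is installed (the handler and its wording are illustrative):

```javascript
// Lambda entry point for the skill (requires the ask-sdk-core module).
const Alexa = require('ask-sdk-core');

// A minimal handler; a real skill registers one handler per intent.
const LaunchRequestHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'LaunchRequest';
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .speak('Welcome!')
      .getResponse();
  }
};

// SkillBuilders wires the handlers together and returns the Lambda
// handler that Alexa invokes via the ARN configured in the Endpoint section.
exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(LaunchRequestHandler)
  .lambda();
```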
There are several programming languages that can be used to write your lambda function:
C#
Go
Java 8
Python
JavaScript (Node.js)
On the AWS Services Console you can write your code directly in the web page. However, this quickly becomes a problem: you have no type-ahead support and it's just a slower workflow. We recommend that you develop your skill locally and upload it to your Lambda function using the aws-cli.
With Red Pill's JavaScript developers, we found Node.js v6 the easiest to get up and running quickly. Since we're using Node.js, we already had all the relevant skills and development environments installed on our machines, so there wasn't a big learning curve, with the exception of the lack of Alexa documentation. There are, however, lots of examples of how to do things.
The ask-sdk is a Node.js module for working with the Alexa API. Before the ask-sdk there was the alexa-sdk, which did not provide TypeScript typings and whose documentation was not very good; it was very difficult to find examples or API documentation.
The ask-sdk is now split into three modules: ask-sdk, ask-sdk-core, and ask-sdk-model. These modules provide TypeScript typings and robust type-ahead support, and if your editor supports linking to source you can follow those links to see how things should be implemented. The interfaces of the ask-sdk are only somewhat documented, but that documentation is still much better than the previous alexa-sdk's. The biggest barrier to using these SDKs remains the lack of API documentation.
The AWS Command Line Interface (aws-cli) makes it possible to upload code to a Lambda function. I highly recommend going this route to automate creating a zip file and uploading your code to your Lambda function.
The main barrier here is the difficulty of following the documentation to get everything needed to set up authentication. Once authentication is set up, it's fairly easy to search for whatever you're trying to do.
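A minimal deploy step might look like this (the function name and file layout are illustrative, and this assumes the aws-cli is already configured with credentials):

```shell
# Zip the skill code, including node_modules, from the project root.
zip -r skill.zip index.js node_modules

# Push the new code to the existing Lambda function.
aws lambda update-function-code \
  --function-name hello-engage \
  --zip-file fileb://skill.zip
```

Put these two commands in an npm script or shell script and redeploying after every change becomes a one-liner.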
While all of this is well and good, there are a LOT of opportunities for improvement here, especially in name and session-title resolution. Something like Watson could analyze Alexa's transcription of a name and match it to a conference speaker, since that is one of the biggest problems with this particular skill.
We could pass the name that Alexa thought she heard to Watson and Watson could figure out what name was meant based on the conference speakers.
Also, Watson could probably come in handy for translating session titles. For example, take the title of this session, “The Internet of (Notes) Things”. Alexa doesn't know what to do with the parens in this title; she doesn't understand that a different tone should be applied because the word is in parens. Watson could help, using Natural Language Processing, to convert this into proper SSML with an emphasis tag around “(Notes)”. Likewise, there are other session titles that contain quotes around a word that should also be surrounded with an emphasis tag.
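For reference, Alexa's speech output is controlled with SSML, so the desired rendering of the title would be markup along these lines (a sketch of what such a conversion could produce, not something the skill currently generates):

```
<speak>
  The Internet of <emphasis level="moderate">Notes</emphasis> Things
</speak>
```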
Could we do this ourselves without the use of Watson? Of course, but it would be a lot of ugly tedious code that would be prone to breakage and confusion when revisited at a later date.