Sikuli-Slides is a tool that allows users to automate and test graphical user interfaces (GUIs) by annotating screenshots in PowerPoint slides. It uses computer vision to recognize and interact with elements on the screen based on the annotated slides. Sikuli-Slides also enables the creation of interactive tutorials by linking GUI interactions to narrations and on-screen text. The tool supports different modes for automating tests, guiding users through tutorials, and developing new slideshows.
10. • Uses Computer Vision.
• Requires no scripting API support or source code access.
• Interacts with anything you see on the screen.
• OCR support.
• Works on Web-based UIs.
• Works on virtual machines and remote desktops.
Why Sikuli?
15. • Automate and test GUIs by using screenshots and annotating them.
• Make visual automation accessible to everyone.
• Use a tool that most users already know how to use.
• Reinvent computer-based tutoring.
Sikuli-Slides
16. • PowerPoint is already a popular tool for creating test cases.
• Online tutorials that include annotated screenshots.
Motivation
17. How to tell computers how to interact with applications?
22. • Most users already know how to use PowerPoint.
• Office Open XML file format, a.k.a. OOXML.
• DrawingML
• Shapes, pictures, etc.
• Data Interoperability.
Why PowerPoint
23. [Diagram components: PowerPoint Document (.pptx file); Document Parser; Visual Automation Processor; Java API; C++ Engine; OpenCV; java.awt.Robot]
System Architecture
25. Action | Shape
Left click | Rectangle
Right click | Oval
Double click | Frame
Keyboard typing | Text Box
Open default browser | Cloud
Text ……..
www.sikuli.org
Drag and drop | Rounded Rectangle connected by an arrow pointing to the drag and drop direction.
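The mapping on this slide can be sketched as a simple lookup table. This is only an illustrative sketch, not the tool's actual implementation: the dictionary name, the action identifiers, and the `action_for_shape` helper are all hypothetical.

```python
# Hypothetical lookup table mirroring the slide's action/shape pairs.
# The keys and action identifiers are illustrative names, not Sikuli-Slides API.
SHAPE_TO_ACTION = {
    "rectangle": "left_click",
    "oval": "right_click",
    "frame": "double_click",
    "text_box": "type_text",
    "cloud": "open_browser",
    "rounded_rectangle": "drag_and_drop",
}


def action_for_shape(shape_name):
    """Return the action for a shape name, or None if the shape is unmapped."""
    key = shape_name.strip().lower().replace(" ", "_")
    return SHAPE_TO_ACTION.get(key)


print(action_for_shape("Rounded Rectangle"))
print(action_for_shape("Star"))
```

Unmapped shapes return `None`, so a caller can treat them as plain annotations instead of actions.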
45. • How to add audio narrations to the slides and sync the audio with the GUI input actions for a more interactive experience when running the slides?
• How to annotate the screen with informative text that explains what is running on the screen?
Adding audio or narration
47. • Support three modes in which you can run presentation slides:
• Action mode.
• Tutorial mode.
• Development mode.
Where are we now?
48. • Sikuli-Slides:
• Uses Computer Vision.
• Can run presentation slides on Windows/Mac/Linux.
• Makes visual automation accessible to everyone.
• Features a new way to create computer based tutorials.
Conclusions
Automation has always been an important part of personal computing, but automation tools were long limited to system administrators and “power users” familiar with scripting languages. Performing repetitive tasks manually is time-consuming and labor-intensive. Most users know how tedious it can be to perform menial, repetitive tasks such as launching applications and web pages, entering data into text fields, resizing image files, and typing out frequently used words.
Action(s) lets you build workflows that accomplish manual chores quickly, efficiently, and effortlessly. You don’t have to know any scripting languages or write any code. Instead, you create and execute automation “workflows” simply by dragging and dropping each individual step of a process. It’s like creating a kitchen recipe.
GUIs constitute a large part of the software code.
A GUI represents the information and actions available to a user through graphical icons and visual indicators such as secondary notation.
Current GUI testing techniques are incomplete, ad hoc, and largely manual. The most common tools use record-playback techniques. A test designer interacts with the GUI, generating mouse and keyboard events. The tool records the user events, captures the GUI session screens, and then stores the session—usually as a script.
Software testing is already labor and resource intensive—often accounting for 50 to 60 percent of total software development costs—and GUI testing poses further difficulties that traditional software testing techniques do not adequately address.
For example, Android has a testing tool called the Monkey, a command-line tool that sends random events to your device. Some scripts are tied to x/y pixel coordinates.
Sikuli means “God’s eye” in the language of the Huichol (pronounced “wichol”) people, a Native American people. It refers to the ability to see invisible things. Sikuli is a visual approach to searching and automating GUIs using screenshots.
Template matching[1] is a technique in digital image processing for finding small parts of an image that match a template image.
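To make the idea concrete, here is a minimal sketch of template matching using a sum of squared differences over tiny grayscale grids. It is not the algorithm Sikuli uses (the notes say only that the core matching is built on OpenCV); the function name `match_template` and the sample data are assumptions for illustration.

```python
def match_template(image, template):
    """Slide the template over the image and return the best (row, col) and its score.

    The score is the sum of squared pixel differences, so 0 means an exact match.
    """
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    for row in range(image_h - tmpl_h + 1):
        for col in range(image_w - tmpl_w + 1):
            score = sum(
                (image[row + dr][col + dc] - template[dr][dc]) ** 2
                for dr in range(tmpl_h)
                for dc in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (row, col), score
    return best_pos, best_score


# A made-up 4x5 "screen" containing a 2x2 "button" at row 1, column 2.
screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
button = [[9, 8], [7, 9]]

position, score = match_template(screen, button)
print(position, score)  # (1, 2) 0
```

A real tool would accept a match only when the score passes a similarity threshold, since the lowest score on the screen is not always a true match.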
One of the issues with image-based UI tools is that we need to capture all the target images and then work on them.
QA Engineers use PowerPoint to show their test cases to the team. PowerPoint makes it better.
Example: PowerStory is a popular agile tool that allows you to create your use cases in PowerPoint and then add UI mockups to the slides to make a UI storyboard.
We annotate the screenshots/images with shapes, text, arrows and more to draw viewer's attention and make our points clear.
Based on XML and ZIP technologies.
OOXML files are ZIP archives containing various XML files (parts) organized into a single package.
This breaking up or chunking of the data into pieces makes it easier and quicker to access data and reduces the chances of data corruption.
The parts can contain any type of data; to keep track of the data type of each part without relying on file extensions, the type for each part is specified in a file within the package called [Content_Types].xml. The relationships of the parts to the package, as well as relationships that any part may have, are abstracted from the parts and stored separately in relationship files: one for the package as a whole and one for each part that has relationships. In this way references are stored only once and can therefore be easily changed when necessary.
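The package layout described above can be explored with Python's standard `zipfile` module. The sketch below builds a toy package in memory rather than opening a real .pptx; the part contents are placeholders and not valid presentation markup, and only the file names follow the usual package layout.

```python
import io
import zipfile

# Minimal content-types part; a real package lists many more entries.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="xml" ContentType="application/xml"/>'
    "</Types>"
)

# Build a toy package in memory (placeholder part bodies, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", CONTENT_TYPES)
    package.writestr("_rels/.rels", "<Relationships/>")
    package.writestr("ppt/slides/slide1.xml", "<sld/>")

# Reopen it the way a parser would: list the parts, then read one of them.
with zipfile.ZipFile(buffer) as package:
    part_names = package.namelist()
    content_types = package.read("[Content_Types].xml").decode("utf-8")

print(part_names)
```

Pointing `zipfile.ZipFile` at an actual .pptx file instead of the in-memory buffer lists that presentation's parts in the same way.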
DrawingML is the language for defining graphical objects such as pictures, shapes, charts, and diagrams within OOXML documents. It also specifies package-wide appearance characteristics, i.e., the package's theme.
The document parser is a standard SAX parser that parses the XML files in the package.
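As a rough illustration of SAX-style parsing, the sketch below uses Python's `xml.sax` to collect preset shape names from a hand-written slide fragment. The fragment, the `ShapeHandler` class, and the decision to match on the literal `a:prstGeom` prefix are simplifying assumptions; the tool's own parser is not shown in these notes, and a robust parser would be namespace-aware.

```python
import xml.sax

# Hand-written slide fragment; real slide parts contain far more markup.
SLIDE_XML = b"""<p:sld xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
       xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <p:cSld><p:spTree>
    <p:sp><p:spPr><a:prstGeom prst="rect"/></p:spPr></p:sp>
    <p:sp><p:spPr><a:prstGeom prst="ellipse"/></p:spPr></p:sp>
  </p:spTree></p:cSld>
</p:sld>"""


class ShapeHandler(xml.sax.ContentHandler):
    """Collects the preset geometry name of every shape seen during the parse."""

    def __init__(self):
        super().__init__()
        self.shapes = []

    def startElement(self, name, attrs):
        # Simplification: match the prefixed name directly instead of the namespace URI.
        if name == "a:prstGeom":
            self.shapes.append(attrs.get("prst"))


handler = ShapeHandler()
xml.sax.parseString(SLIDE_XML, handler)
print(handler.shapes)  # ['rect', 'ellipse']
```

Because SAX delivers events as it streams through the file, the handler never needs to hold the whole slide in memory.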
The Visual Automation Processor performs pixel-based computation, crops images, searches the screen, and acts as a façade to the Java API.
The core pattern-matching algorithm was implemented in C++ using OpenCV, an open source computer vision library. The full API was implemented using the Java Robot class to execute mouse and keyboard actions. All components of the system have been tested on Mac OS X 10.7, Windows 8, and Ubuntu Linux.
There are three kinds of software tutorials: 1) video tutorials that the user views; 2) interactive tutorials where the user follows on-screen instructions (and, in some cases, watches short instructional movies), then does the tutorial exercises and receives feedback depending on their actions; and 3) webinars where users participate in real-time lectures, online tutoring, or workshops remotely using web conferencing software.