Data Analytics and Industry-Academic Partnerships: An Irish Perspective
John Breslin - NUI Galway - @johnbreslin
linkd.in/johnbreslin
First: Ireland and Chile!
• John/Juan Garland, governor of Valdivia; Ambrosio O’Higgins (from
Sligo), governor of Chile and founder of the first transcontinental postal
service; his son Bernardo O’Higgins, first president of independent
Chile; John/Juan McKenna and Thomond O’Brien, Chilean-Irish
independence fighters; McKenna’s grandson Benjamin McKenna, writer and
liberal politician; George O’Brien, founder of the Chilean navy;
Patricio Lynch, Chilean-Irish naval hero (grandfather from Galway)
and ancestor of Che Guevara; the drill head used in the rescue of the
Atacama miners was made in Ireland (County Clare)
• http://brenspeedie.blogspot.com/2010/11/what-did-irish-ever-do-for-chile.html
(a colleague of mine from NUI Galway wrote this)
• http://www.irlandeses.org/0610griffin1.htm
• http://en.wikipedia.org/wiki/Irish_Chilean
GALWAY TECH MAP
European Microcity of the Future
1. Menlo Castle (origin of Menlo Park in Silicon Valley, California)
2. Computer Museum of Ireland (at DERI)
3. NUI Galway (where Stoney, namer of the “electron”, was a prof.)
4. Java’s (Zachary Quinto, AKA Spock, waited on tables here)
5. Claddagh (birthplace of the Claddagh Ring, and Angel from Buffy!)
6. Ignite TTO Business Innovation Centre
7. Galway Technology Centre
8. Innovation in Business Centre (at GMIT)
9. Marine Institute
Also on the map: OnePageCRM, Insight, Ex Ordo
@johnbreslin @technologyvo @startupgalw #upgalway #gaillimhabu
bit.ly/galway (v1.201405271)
NUI Galway in brief
• Established in 1845:
• One of Ireland’s seven universities
• 105 hectare campus (260
acres)
• 120 links with universities
around the world
• 17,300 students:
• 12,500 undergraduates, 3,600
postgraduates, 1,200 other
• 2,541 staff:
• 1,078 academics, 1,015 admin and
support, 448 research
• 90,000 alumni in over a hundred
countries
Famous alumni
• Alice Perry, first female graduate engineer in the
world, 1906
• Michael O’Shaughnessy, a Civil Engineering
graduate from the University in the 1880s, was San
Francisco Chief Engineer, and commissioned the
Golden Gate Bridge
• Honorary degrees to Nelson Mandela, Hillary Clinton
• TV and movie star Martin Sheen (The West Wing’s
President Bartlet) studied here in 2006/2007
The College of Engineering and Informatics
• Includes the largest School of Engineering in Ireland
(finished 2011, 14,000 square metres, €43 million)
• Information Technology, Electrical and Electronic
Engineering, Biomedical Engineering, Mechanical
Engineering, Civil Engineering, DERI (now Insight)
Insight Centre for Data
Analytics
Incorporating DERI (Digital Enterprise
Research Institute) at NUI Galway
TURNING DATA
INTO DECISIONS
Using sensor data to
optimise forestry resources
during harvesting.
Stem volume prediction, yield
management etc.
Remote sensing +
autonomous cutter control.
TURNING DATA
INTO KNOWLEDGE
Creating a network of knowledge.
Data ⇒ Semantics
Semantics ⇒ Discovery
e.g. discovering links
between drugs, genes,
and diseases.
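The "Data ⇒ Semantics ⇒ Discovery" idea can be illustrated with a minimal sketch: facts are stored as subject-predicate-object triples, and new links are found by following two hops through a shared gene. Everything here is hypothetical (the entity names, the `targets` and `associated_with` predicates, and the `discover_drug_disease_links` helper); it is not Insight's actual pipeline.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all names are placeholders.
TRIPLES = [
    ("drug:A", "targets", "gene:X"),
    ("drug:B", "targets", "gene:Y"),
    ("gene:X", "associated_with", "disease:P"),
    ("gene:X", "associated_with", "disease:Q"),
    ("gene:Y", "associated_with", "disease:Q"),
]


def build_index(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def discover_drug_disease_links(triples):
    """Return sorted (drug, gene, disease) paths: drug -targets-> gene -associated_with-> disease."""
    index = build_index(triples)
    links = set()
    for subject, predicate, gene in triples:
        if predicate != "targets":
            continue
        # Second hop: follow the gene's own outgoing edges.
        for next_predicate, disease in index[gene]:
            if next_predicate == "associated_with":
                links.add((subject, gene, disease))
    return sorted(links)


if __name__ == "__main__":
    for drug, gene, disease in discover_drug_disease_links(TRIPLES):
        print(f"{drug} may be relevant to {disease} (via {gene})")
```

The point of the sketch is that no triple states a drug-disease link directly; the link only appears once the separate facts are joined into one graph.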
MINING PATTERNS
FROM REAL-TIME
SOURCES
Physical shockwaves travel at 4.8km/sec but knowledge of the
earthquake traveled at 70km/sec to Galway.
14:42 Earthquake strikes.
14:43 First tweet from @Bacanalnica in nearby Managua.
14:44 120 secs later the first tweets are posted.
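The arithmetic behind the two speeds can be checked with a short sketch. It assumes (this is an interpretation, not stated on the slide) that the 70km/sec figure comes from dividing the distance to Galway by the 120-second delay; the constant and function names are illustrative only.

```python
# Figures taken from the slide.
SEISMIC_SPEED_KM_S = 4.8   # speed of the physical shockwave
NEWS_SPEED_KM_S = 70.0     # effective speed at which news reached Galway
NEWS_DELAY_S = 120.0       # assumption: the delay the 70km/sec figure is based on


def implied_distance_km(speed_km_s, delay_s):
    """Distance covered at a given effective speed over a given delay."""
    return speed_km_s * delay_s


def travel_time_s(distance_km, speed_km_s):
    """Time needed to cover a distance at a given speed."""
    return distance_km / speed_km_s


distance = implied_distance_km(NEWS_SPEED_KM_S, NEWS_DELAY_S)
shockwave_seconds = travel_time_s(distance, SEISMIC_SPEED_KM_S)

if __name__ == "__main__":
    print(f"Implied distance: {distance:.0f} km")
    print(f"Shockwave travel time: {shockwave_seconds / 60:.1f} minutes")
```

Under that assumption, the news covers in two minutes a distance that the shockwave itself would need roughly half an hour to cross.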
Case studies: finding
insights in business data
1. Finding expertise and content
2. Holistic energy management
3. AYLIEN text analytics
4. Social analytics for
recommendation and
communities
1. Expertise and content
• Saffron (saffron.deri.ie) extracts knowledge from text,
with business applications in expert finding,
community detection, recommender systems, and
enterprise search, e.g.:
1. Ecommerce system with Kennys Bookshop that
analyses book descriptions and reviews to extract a
fine-grained book topic categorisation for use in
recommending books to customers
2. EnRG entity relatedness for applications in
semantic search (EnRG is built on a large matrix
derived from Wikipedia and uses the DBpedia ontology)
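To make "entity relatedness over a large matrix" concrete, here is a minimal sketch that scores relatedness as the cosine similarity between rows of a tiny entity-by-term matrix. This is an assumption for illustration only: the slides do not say how EnRG computes relatedness, and the matrix values and the `most_related` helper are made up.

```python
import math

# Hypothetical entity-by-term counts (rows: entities, columns: terms).
MATRIX = {
    "Galway": {"ireland": 3, "city": 2, "university": 1},
    "Dublin": {"ireland": 3, "city": 3},
    "Electron": {"physics": 4, "particle": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_related(entity, matrix, top_n=2):
    """Rank all other entities by similarity to the given one."""
    scores = [
        (other, cosine(matrix[entity], vector))
        for other, vector in matrix.items()
        if other != entity
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for name, score in most_related("Galway", MATRIX):
        print(f"{name}: {score:.3f}")
```

A semantic search system could use such scores to expand a query about one entity with closely related ones.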
2. Holistic energy
management
Managing energy related to:
• Office IT
• Data centres
• Facilities
• Business travel
• Daily commutes
Keep in mind business context:
• Energy expended
• Finances required
• Resource allocation
• Human resources
• Asset management
More challenges
• Technology and data
interoperability: data
scattered among different
systems, multiple
incompatible technologies
make it difficult to use
• Interpreting dynamic and
static data: sensors, ERP,
BMS, assets databases
• Need to proactively identify
efficiency opportunities
• Empowering actions and
including users in the loop
• Understanding of direct
and indirect impacts of
activities
• Embedding impacts
within business
processes
• Engaging users
Architecture (layers, top to bottom):
• Applications: Energy Analysis Model, Complex Events, Situation
Awareness Apps, Energy and Sustainability Dashboards, Decision
Support Systems
• Linked Data Support Services: Entity Management Service, Data
Catalog, Complex Event Processing Engine, Provenance, Search & Query
• Sources: each connected through its own Adapter
Energy saving applications
Energy awareness
Semantic event processing
Collaborative data management
Cloud of energy data
Linked sensor middleware
Resource Description Framework (RDF)
Semantic sensor networks
Constrained application protocol (CoAP)
Linked energy intelligence
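As a rough illustration of what an event processing rule over energy data might look like, the sketch below raises an event whenever the rolling average of power readings exceeds a threshold. The window size, threshold, readings, and the `detect_high_usage` name are all assumptions for illustration; the slides do not describe the actual engine or its rule language.

```python
from collections import deque


def detect_high_usage(readings, window=3, threshold_kw=5.0):
    """Return (timestamp, average) for each full window whose mean exceeds the threshold.

    `readings` is an iterable of (timestamp, kilowatts) pairs in time order.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` values
    events = []
    for timestamp, kilowatts in readings:
        recent.append(kilowatts)
        if len(recent) < window:
            continue  # not enough data for a full window yet
        average = sum(recent) / window
        if average > threshold_kw:
            events.append((timestamp, average))
    return events


# Hypothetical sensor readings: (timestamp, kW).
READINGS = [(0, 2.0), (1, 4.0), (2, 6.0), (3, 8.0), (4, 3.0)]

if __name__ == "__main__":
    for timestamp, average in detect_high_usage(READINGS):
        print(f"t={timestamp}: rolling average {average:.2f} kW exceeds threshold")
```

In a real deployment the readings would arrive from sensor adapters rather than a list, and a detected event could feed a dashboard or a decision support application.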
3. AYLIEN text analytics
• AYLIEN is based in Dublin, backed by SOS Ventures
• 7 employees, started as B2C, switched to B2B in
2014
• Vision to “extract reality from data” (information
retrieval domain)
• Research collaboration with NUI Galway through
John (sentiment analysis on large-scale social media)
• http://bresl.in/aylientechcrunch
Text Analysis API (TAA)
● A package of easy-to-use
tools for extracting information
and insights from any text
● Language detection
● Supported Languages (EN, DE,
PT, ES, IT, FR)
● 168 customers
● Academic, ad intelligence and
brand protection, sentiment
analysis/opinion mining, PR and
media, CRMs, education,
psychology/interest graph
● Endpoints:
o Extraction
o Classification
o Summarisation
o Concept/entity extraction
o Hashtag suggestion
o Sentiment analysis
How is TAA used?
o Three major methods for deploying text analytics
services: API (via the “cloud”), on-premises
deployments, other integrations
o TAA is mainly provided using an API/subscription
model (monthly) via Mashape or 3scale
o Additional integrations with Google Spreadsheets
and other platforms (Telerik, Azuqua)
o In future: on-premises deployment, subscription
(yearly), custom solutions (bespoke)
Under the TAA hood…
• Based on Machine Learning (ML)
techniques (supervised,
unsupervised and semi-supervised)
• Extraction: useful for scraping text,
media and metadata from web pages
• Annotate text, media and metadata in a
training set
• Extract a set of heuristic rules and use
them to extract text, media and metadata
• Summarisation: Extracting n-best key
sentences from a document, based
on heuristics and a sentence
similarity matrix (initially), learning
over time
• Classification and document-level
sentiment analysis: assigning a label
to any piece of text (“sports”,
“technology”, “positive”, “negative”)
• Create word vectors from an annotated
dataset
• Train a classifier, use it to predict future
classes for a new instance
• Similar to a spam filter
• Concept extraction: Find the concepts
mentioned in a document and
disambiguate them based on
contextual clues, e.g. when Apple is
mentioned, how do we find out if it’s
the fruit or the company?
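The summarisation step described above (n-best key sentences chosen with a sentence similarity matrix) can be sketched as follows. This is a simplified stand-in, not AYLIEN's implementation: it assumes bag-of-words cosine similarity and ranks each sentence by its total similarity to the others, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def similarity(bag_a, bag_b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(bag_a[word] * bag_b[word] for word in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarise(text, n=2):
    """Return the n sentences most similar to the rest, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    bags = [Counter(tokenize(s)) for s in sentences]
    size = len(sentences)
    # Sentence similarity matrix; the diagonal is zeroed so a sentence
    # does not score points for matching itself.
    matrix = [
        [similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(size)]
        for i in range(size)
    ]
    scores = [sum(row) for row in matrix]
    best = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    sample = (
        "Galway hosts a data analytics centre. "
        "The data analytics centre works with industry. "
        "Industry partners fund data analytics research. "
        "Bananas are yellow."
    )
    print(summarise(sample, n=2))
```

The "learning over time" part mentioned on the slide is not covered here; a production system would presumably adjust the scoring using feedback or training data.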
TAA market (SMEs)
Segments: SMEs, enterprises
Size: “many times the $2bn
forecast”
US, UK, Germany, Spain, India
See “Text Analytics 2014: User
Perspectives on Solutions and
Providers”
Market: natural language
processing [related markets:
machine learning, text mining]
SME segment: AlchemyAPI,
Semantria (Lexalytics), Textalytics,
Fluxifi
Enterprise segment: SAS, IBM,
Lexalytics
Main target: SMEs
Differentiators: feature-richness,
quality, price, progression
http://www.programmableweb.com/news/how-5-natural-language-processing-apis-stack/analysis/2014/07/28
4. Social analytics
• Some applications include cross-domain
recommendations, community detection and
evolution monitoring
• SemStim (Cisco)
• Whassapi (Volvo Ocean Race)
• SociaLens (ROBUST)
How does it work?
User profile with DBpedia URIs
from multiple source domains
Cross-domain recommendation
algorithm using DBpedia as
background knowledge
Input Background knowledge C