4. Getting to the Data
Energy Company Options:
• No API
• Data Download (Not Real-Time)
• Tight Coupling
5. Lesson
No ready-packaged, complete solution = opportunity
Build the ‘novel’ on what exists: go find your bricks
Photo Credit: Justin Hamilton
6. Where I Ended
• MQTT
• Kafka
• ELK Stack
• Cassandra
• One Linux box
• Programming language: Python
• Sensor peripherals (Efergy electricity monitor and R820 tuner)
7. Getting to the Source
• Updates every 10 seconds
• Understand the impact of each event
• Getting to the data is necessary … don’t be afraid to ask for expert help
Elite Classic
Photo Credit: Jude
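Knowing that the monitor reports every 10 seconds is what lets you put a number on each event. The Python sketch below assumes the readings have already been converted to watts (the monitor's raw signal is not necessarily in watts); `energy_wh` and `event_impact_wh` are hypothetical helper names, not part of any Efergy tool.

```python
# Sketch: turn power readings sampled every 10 seconds into energy.
# Assumption: each reading is the average power in watts over its 10 s interval.

SAMPLE_INTERVAL_S = 10  # the monitor reports roughly every 10 seconds


def energy_wh(watts_readings, interval_s=SAMPLE_INTERVAL_S):
    """Energy in watt-hours: sum of (power x interval), divided by 3600 s/h."""
    return sum(w * interval_s for w in watts_readings) / 3600.0


def event_impact_wh(readings, baseline_w, interval_s=SAMPLE_INTERVAL_S):
    """Hypothetical helper: extra energy an event used above a baseline load."""
    return sum(max(w - baseline_w, 0.0) * interval_s for w in readings) / 3600.0


if __name__ == "__main__":
    # One hour of a steady 1200 W load is 360 readings.
    print(energy_wh([1200.0] * 360))
```

For example, a steady 1,200 W load for an hour (360 readings) comes to 1,200 Wh, or 1.2 kWh.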
9. Intercepting Data
R820 tuner RTL-SDR (USB dongle)
• Software-defined radio (SDR)
• A computer-based radio scanner
• Enables easy processing of TV and radio signals
10. Decoding the Data
• C program to decode baseband data
• New tools and skill sets for data transformation
• Polyglot (C for decoding, Python elsewhere)
11. Decoding the Data
• Frequency: 433.51 MHz
• Bandwidth: 200 kHz
• Resampled to 96 kHz
• Tuner gain: 49.6 dB
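These settings map naturally onto a command line for `rtl_fm`, the demodulator that ships with the rtl-sdr tools (`-f` frequency, `-s` sample rate, `-r` resample rate, `-g` tuner gain, and `-` to write samples to stdout). The sketch below assumes the 200 kHz figure is passed as `rtl_fm`'s sample rate and that the output is piped into the C decoder; `./efergy_decoder` is a hypothetical binary name, and the deck does not state the exact command used.

```python
import shlex
import subprocess

# Assumed values taken from the slide; ./efergy_decoder is hypothetical.
FREQ_HZ = 433_510_000      # 433.51 MHz
SAMPLE_RATE_HZ = 200_000   # 200 kHz
RESAMPLE_HZ = 96_000       # 96 kHz output rate
GAIN_DB = 49.6             # tuner gain in dB


def rtl_fm_args(freq_hz=FREQ_HZ, sample_rate=SAMPLE_RATE_HZ,
                resample=RESAMPLE_HZ, gain_db=GAIN_DB):
    """Build the rtl_fm argument list; the trailing '-' means stdout."""
    return ["rtl_fm", "-f", str(freq_hz), "-s", str(sample_rate),
            "-r", str(resample), "-g", str(gain_db), "-"]


def start_pipeline(decoder_cmd="./efergy_decoder"):
    """Equivalent of `rtl_fm ... - | ./efergy_decoder` in a shell."""
    tuner = subprocess.Popen(rtl_fm_args(), stdout=subprocess.PIPE)
    decoder = subprocess.Popen(shlex.split(decoder_cmd), stdin=tuner.stdout,
                               stdout=subprocess.PIPE, text=True)
    tuner.stdout.close()  # so rtl_fm gets SIGPIPE if the decoder exits
    return tuner, decoder


if __name__ == "__main__":
    print(" ".join(rtl_fm_args()))
```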
14. 1st Class Citizen Data
• Data that is critical to the value stream needs to be shared
• Enterprise Options:
• ETL (Extract-Transform-Load)
• APIs
• Data Stream
16. Requirements Met by Both
• Support for data streams (pub/sub)
• Decouple consumers from producers
• Offer a measure of reliability (QoS 0, 1 & 2)
• Address messaging needs
• Forward compatibility (accommodate new consumers)
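The pub/sub and decoupling requirements can be illustrated with a toy in-process bus. This is neither MQTT nor Kafka and provides none of their reliability; `TopicBus` and the `home/power` topic are hypothetical names, used only to show that a producer publishes to a topic without knowing who consumes it, so new consumers can be added later without touching the producer.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class TopicBus:
    """Toy pub/sub: producers address topics, never consumers directly."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A new consumer only registers interest in a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver to every current subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

The producer's call to `publish` is the same whether zero, one, or ten consumers are listening, which is the forward-compatibility property the slide lists.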
17. The Smart Broker for Dumb Clients
A TCP-based protocol designed for lightweight messages (as little as 2 bytes of header overhead per message)
• Easy for machines to parse (length-prefixed fields)
• Broker keeps track of client state (Last Will and Testament message)
• Guarantees message delivery through retries (QoS 1 and 2)
• Security (username/password over TLS)
• Dynamic topics
• Suited to clients such as Arduino or Raspberry Pi
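To make the small overhead concrete, here is a stdlib-only sketch that encodes an MQTT QoS 0 PUBLISH packet: a 1-byte packet type, a variable-length Remaining Length field (7 bits per byte, high bit set when more bytes follow), the topic as a 2-byte length-prefixed UTF-8 string, and the payload. It is an illustration of the wire format, not a client; a real application would use an MQTT library. The function names are hypothetical.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """MQTT Remaining Length: 7 bits per byte, 0x80 marks a continuation."""
    if not 0 <= n <= 268_435_455:  # maximum allowed by the spec (4 bytes)
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)


def encode_utf8_string(s: str) -> bytes:
    """MQTT strings carry a 2-byte big-endian length prefix."""
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def publish_packet_qos0(topic: str, payload: bytes) -> bytes:
    """Fixed header (0x30 = PUBLISH, QoS 0) + remaining length, then the
    topic name and payload; QoS 0 carries no packet identifier."""
    remaining = encode_utf8_string(topic) + payload
    return bytes([0x30]) + encode_remaining_length(len(remaining)) + remaining
```

For a short message the fixed header is exactly two bytes: `publish_packet_qos0("a", b"hi")` yields `b"\x30\x05\x00\x01ahi"`.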
18. The Dumb Broker for Smart Clients
A distributed commit log with support for high throughput and message durability, even at scale
• Distributed log aggregator
• Client keeps track of which messages it has processed (offsets)
• Meant for server-side environments (server to server)
• Static topics (topic - static, key - dynamic)
• Limited, configurable retention window
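The "dumb broker, smart client" split can be sketched with a toy partitioned log: the broker only appends records and serves reads by offset, while each consumer remembers its own position. Keys pick a partition, so the topic stays static while keys vary. `PartitionedLog` and `Consumer` are hypothetical classes, and CRC32 stands in for the murmur2 hash used by Kafka's default partitioner; none of this is Kafka's API.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class PartitionedLog:
    """Toy commit log for one static topic, split into append-only partitions.
    It stores records and serves reads by offset; it tracks no consumer state."""

    def __init__(self, num_partitions: int = 3) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition: int, offset: int, max_records: int = 100) -> List[Record]:
        return self.partitions[partition][offset:offset + max_records]


class Consumer:
    """The 'smart client': it owns its next offset for every partition."""

    def __init__(self, log: PartitionedLog) -> None:
        self.log = log
        self.offsets: Dict[int, int] = {p: 0 for p in range(len(log.partitions))}

    def poll(self) -> List[Record]:
        records: List[Record] = []
        for p, offset in list(self.offsets.items()):
            batch = self.log.read(p, offset)
            records.extend(batch)
            self.offsets[p] = offset + len(batch)  # record the new position
        return records
```

Because offsets live in the consumer, two consumers can read the same log independently, and a consumer restarted from a saved offset resumes where it left off.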
19. Lesson
• Look at your options
• Avoid analysis paralysis
• Start with what you know … you can always optimize later
• Guard against premature optimization
21. Drawing Insights with Visualization
• A way to interact with data
• Monitor data
• Starting point for decision making
• ELK Stack (Elasticsearch, Logstash, Kibana)
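Before Kibana can chart anything, readings have to be indexed into Elasticsearch, a job Logstash normally does in this stack. To show the shape of what arrives there, this sketch builds a request body for Elasticsearch's `_bulk` endpoint (newline-delimited JSON: an action line followed by the document, ending with a newline). The index name `energy-readings`, the field names, and the `localhost:9200` URL are assumptions.

```python
import json
import urllib.request
from typing import Iterable, Mapping


def bulk_body(index: str, docs: Iterable[Mapping]) -> str:
    """NDJSON for _bulk: an 'index' action line, then the document itself."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline


def send_bulk(body: str, url: str = "http://localhost:9200/_bulk") -> dict:
    """POST the body; assumes an unsecured local Elasticsearch node."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"), method="POST",
        headers={"Content-Type": "application/x-ndjson"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```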
22. • What time of the day do I use the most power?
• Trend by day of the week
• Energy spikes? (Where, When and What)
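The time-of-day and day-of-week questions reduce to grouping readings into time buckets. A minimal sketch, assuming readings are `(timestamp, watts)` pairs from a single meter; the function names and the fixed spike threshold are hypothetical, and the "where" and "what" of a spike would need extra data such as per-circuit readings.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Reading = Tuple[datetime, float]  # (timestamp, watts), an assumed shape


def average_by(readings: List[Reading], bucket: Callable[[datetime], int]) -> Dict[int, float]:
    """Average watts per bucket, e.g. hour of day or day of week."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, watts in readings:
        key = bucket(ts)
        sums[key] += watts
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


def by_hour(readings: List[Reading]) -> Dict[int, float]:
    return average_by(readings, lambda ts: ts.hour)


def by_weekday(readings: List[Reading]) -> Dict[int, float]:
    return average_by(readings, lambda ts: ts.weekday())  # Monday == 0


def peak_hour(readings: List[Reading]) -> int:
    """Hour of the day with the highest average draw."""
    hourly = by_hour(readings)
    return max(hourly, key=hourly.get)


def spikes(readings: List[Reading], threshold_w: float) -> List[Reading]:
    """Readings above a fixed threshold: the 'when' of an energy spike."""
    return [(ts, w) for ts, w in readings if w > threshold_w]
```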
23. Are there Anomalies?
• First define Normal
• Normal
• Something behaves in a consistent way with respect to itself over time
• Something behaves in a consistent way with respect to things around it
• Anomaly
• Change with respect to self as a function of time
• Relative difference compared to peers within a population
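Both definitions of normal can be expressed as the same z-score computed against two different reference sets: the meter's own history, or its peers at the same moment. A sketch under those assumptions; the threshold `k = 3` and the function names are illustrative choices, not values from the deck.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def zscore(value: float, reference: Sequence[float]) -> float:
    """Number of standard deviations between value and the reference mean."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma


def anomalous_vs_self(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Change with respect to self over time: compare against this meter's
    own past readings (for example, the same hour on previous days)."""
    return abs(zscore(current, history)) > k


def anomalous_vs_peers(peers: Sequence[float], value: float, k: float = 3.0) -> bool:
    """Relative difference within a population: compare against what
    comparable peers (other meters or circuits) report at the same time."""
    return abs(zscore(value, peers)) > k
```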
24. Using Unsupervised Learning
• Very small number of positive (anomalous) examples above threshold: 0 to 20
• Large number of negative (normal) examples
• Future anomalies may look nothing like past anomalies or the samples in the training set
• Gaussian (normal) distribution
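With only a handful of positive examples, a common unsupervised approach is to fit a Gaussian per feature on the mostly normal data, multiply the per-feature densities (treating features as independent), and flag points whose density falls below a threshold epsilon. The sketch below implements that technique; the feature layout (for example, watts and minutes an appliance stays on) and the epsilon value are assumptions, and the few labelled anomalies would typically serve as a validation set for choosing epsilon.

```python
import math
from typing import List, Sequence, Tuple


def fit_gaussian(X: Sequence[Sequence[float]],
                 min_var: float = 1e-9) -> Tuple[List[float], List[float]]:
    """Per-feature mean and variance; min_var avoids division by zero."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    var = [max(sum((row[j] - mu[j]) ** 2 for row in X) / n, min_var)
           for j in range(d)]
    return mu, var


def density(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """p(x) as a product of independent 1-D normal densities."""
    p = 1.0
    for xj, m, v in zip(x, mu, var):
        p *= math.exp(-((xj - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return p


def is_anomaly(x: Sequence[float], mu: Sequence[float],
               var: Sequence[float], epsilon: float) -> bool:
    """Flag points that are improbable under the fitted model."""
    return density(x, mu, var) < epsilon
```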
25. Why Not Use Supervised
• You have some labels, so why not use supervised learning?
• Supervised learning works better when there is a large number of both positive and negative labels
• A supervised algorithm needs enough examples to learn what an anomaly could look like
• Future anomalies might look nothing like past anomalies
26. • Anomaly detection using Machine Learning
• ML model based on Bayesian statistics
• Example: forgetting to unplug the iron, or leaving the stove on
27. Feature Selection
• Transforming features to be more Gaussian
• Features that might take unusually large or small values in the event of an anomaly
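A common way to make a right-skewed feature such as power draw more Gaussian is a log transform. This sketch, with hypothetical helper names, measures skewness before and after; `c` is an offset you would set to something like 1 when readings can be zero, since log(0) is undefined.

```python
import math
from typing import List, Sequence


def skewness(values: Sequence[float]) -> float:
    """Population skewness: third central moment / std**3 (0 if symmetric)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def log_transform(values: Sequence[float], c: float = 0.0) -> List[float]:
    """log(x + c); requires x + c > 0 for every value."""
    return [math.log(v + c) for v in values]
```

For values spread over orders of magnitude, such as `[1, 10, 100, 1000, 10000]`, the raw skewness is strongly positive, while the logged values are evenly spaced and therefore symmetric.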