1. What about Little Data?
Big Data Forum Lightning Brief
Matthew Carroll, GM of 42six
March 19, 2014
2. Who the heck are we?
Technologies
Accumulo (active contributor)
Ozone (active contributor)
Storm
Niagara Files
Apache Kafka
Titan (active contributor)
CouchDB
MongoDB (active contributor)
Cassandra
HBase
Neo4j
OpenStack
Puppet
Programs
RTRG
Intelink
Kickawesome
* Ozone & Apps Mall
* Ozone Mobile (DISA)
* Red Disk (Army-INSCOM)
* ORION (DIA)
FireTruck (DIA-DCTC)
Coral Reef (Army/NMEC)
WE ARE CSC’S BIG DATA SERVICES ARM PROVIDING
OPERATIONALLY FOCUSED SOLUTIONS AND CONSULTING TO THE
U.S. INTELLIGENCE COMMUNITY, INTERAGENCY & PRIVATE SECTOR.
3. Big Data’s Little Brother
Big Data is what organizations know
about entities — be they people,
places, things, etc. Data is
aggregated from a large number of
sources, assembled into a massive
data store, and analyzed for
patterns. The results are more
accurate predictions, more targeted
communications, and more
personalized services.
Good For: General questions, recommendations
for all users, social network analysis (SNA), anomaly detection
Little Data is what we know about
ourselves. What we search. Who
we know. What we care about. How
we spend our time. We’ve always
had a sense for these things — after
all, it’s our job. But thanks to the
combination of social and cloud
technologies, it’s easier than ever
to gain insight into our own
behavior.
Good For: Individual recommendations,
personal goals, efficiency analysis, individual
pattern detection
Big Data gets all the attention, but there is as much value, if not more, in Little
Data…
4. So how do I collect Little Data?
AS THE GOVERNMENT MIGRATES TO DISTRIBUTED SYSTEMS,
SPECIFICALLY PaaS, ENGINEERING TEAMS NEED TO FOCUS ON
INDIVIDUAL-BASED LOGGING & ANALYTICS.
1. Build simple APIs to log user activity, including time in app, number of
searches executed, common search terms, etc.
2. Insert “actions” into traditional web apps, such as “Was this report
interesting?” or a value rating expressed as a percentage.
3. Hook into task management systems when you can: track what the user is
working on, where they are, and for how long.
4. Build personalized analytics pages with time and space visualizations to
help users see the context of their activity.
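Items 1 and 2 above can be sketched as a small per-user activity log. This is a minimal illustration, not an existing library: `ActivityLog`, `ActivityEvent`, the action names (`"open"`, `"close"`, `"search"`, `"rate"`) and the in-memory storage are all hypothetical assumptions, and a real PaaS deployment would persist events to a shared store instead.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ActivityEvent:
    """One user action. Field names are illustrative, not a standard schema."""
    user: str
    app: str
    action: str                   # hypothetical actions: "open", "close", "search", "rate"
    timestamp: datetime
    detail: Optional[str] = None  # search term, report rating, etc.


class ActivityLog:
    """Hypothetical in-memory activity log, indexed by user."""

    def __init__(self) -> None:
        self._by_user: dict[str, list[ActivityEvent]] = defaultdict(list)

    def log(self, user: str, app: str, action: str,
            timestamp: datetime, detail: Optional[str] = None) -> None:
        self._by_user[user].append(ActivityEvent(user, app, action, timestamp, detail))

    def search_count(self, user: str) -> int:
        return sum(1 for e in self._by_user.get(user, []) if e.action == "search")

    def top_search_terms(self, user: str, n: int = 3) -> list[tuple[str, int]]:
        # Case-insensitive tally of one user's search terms.
        terms = Counter(e.detail.lower() for e in self._by_user.get(user, [])
                        if e.action == "search" and e.detail)
        return terms.most_common(n)

    def seconds_in_app(self, user: str, app: str) -> float:
        # Sum open -> close intervals; assumes the app emits paired events.
        total, opened = 0.0, None
        for e in sorted(self._by_user.get(user, []), key=lambda e: e.timestamp):
            if e.app != app:
                continue
            if e.action == "open":
                opened = e.timestamp
            elif e.action == "close" and opened is not None:
                total += (e.timestamp - opened).total_seconds()
                opened = None
        return total


# Usage: one 30-minute session with three searches and one report rating.
log = ActivityLog()
t0 = datetime(2014, 3, 19, 9, 0)
log.log("alice", "reports", "open", t0)
log.log("alice", "reports", "search", t0 + timedelta(minutes=1), "Kafka")
log.log("alice", "reports", "search", t0 + timedelta(minutes=5), "kafka")
log.log("alice", "reports", "search", t0 + timedelta(minutes=7), "Storm")
log.log("alice", "reports", "rate", t0 + timedelta(minutes=8), "80")  # "value percentage"
log.log("alice", "reports", "close", t0 + timedelta(minutes=30))
print(log.search_count("alice"))               # 3
print(log.top_search_terms("alice"))           # [('kafka', 2), ('storm', 1)]
print(log.seconds_in_app("alice", "reports"))  # 1800.0
```

Keying the store by user from the start keeps per-user queries cheap, which is what the personalized analytics pages in item 4 would read from.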
5. Some Examples
Graph Clustering: What are the groups
that make up my activity?
Geospatial: What are the locations
associated with my activity?
Temporal Activity: When am I
active and what am I doing?
Textual: What do I search for? What are
my common topics?
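The temporal and textual examples reduce to simple counting over one user's events. The sketch below is an illustration under stated assumptions: the function names, the (weekday, hour) bucketing, and the tiny stopword list are hypothetical choices, not the analytics used in the original demos.

```python
import re
from collections import Counter
from datetime import datetime

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
# Illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def activity_by_slot(timestamps: list[datetime]) -> Counter:
    """Temporal: count events per (weekday, hour) bucket."""
    return Counter((DAYS[ts.weekday()], ts.hour) for ts in timestamps)


def busiest_slot(timestamps: list[datetime]):
    """Return the (weekday, hour) bucket with the most events, or None."""
    counts = activity_by_slot(timestamps)
    return counts.most_common(1)[0][0] if counts else None


def common_topics(queries: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Textual: most frequent non-stopword tokens across a user's searches."""
    words = Counter()
    for q in queries:
        words.update(w for w in re.findall(r"[a-z0-9]+", q.lower()) if w not in STOPWORDS)
    return words.most_common(n)


stamps = [datetime(2014, 3, 21, 9, 5), datetime(2014, 3, 21, 9, 40), datetime(2014, 3, 19, 14, 0)]
queries = ["Kafka topics", "Storm topology", "kafka consumer lag", "The Storm UI"]
print(busiest_slot(stamps))  # ('Friday', 9)
print(common_topics(queries, n=2))
```

Weekday names come from a fixed tuple rather than `strftime("%A")` so the output does not depend on the system locale.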
6. Ok I have my Little Data
ONCE LOGGING IS HOOKED INTO THE ARCHITECTURE AND USERS HAVE
ACCESS TO THEIR DATA, SET UP PERSONALIZED ANALYTIC
ENGINES FOR REMINDERS AND AUTOMATED ALERTING
1. Design the GUI for event-based processing first (e.g., Node.js). Think through
dynamic updates driven by user actions, not necessarily by new data in the
system.
2. Think like IFTTT – set the stage for the user to define desired personalized
goals, e.g. 20% search, 50% read, 30% write.
3. Design data models with users in mind. Index data so that it is keyed by
user. Treat users as first-class citizens.
4. Think about useful information to guide the general user experience, e.g., a
reminder that you typically search for the term “X” in this app on Fridays.
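Items 2 through 4 can be combined in a short sketch: a user-keyed search index, an IFTTT-style rule that fires when time spent drifts from the user's goals, and a weekday reminder. Everything here is hypothetical: `GOALS`, `SEARCHES_BY_USER`, `record_search`, `goal_alerts`, `weekday_reminder`, and the 10-percentage-point tolerance are illustrative assumptions, with the goal split taken from the slide's 20/50/30 example.

```python
from collections import Counter, defaultdict
from datetime import datetime

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# Hypothetical user-defined goals: share of time per activity (the slide's 20/50/30 split).
GOALS = {"alice": {"search": 0.20, "read": 0.50, "write": 0.30}}

# User-first index: every search is stored under the user who ran it.
SEARCHES_BY_USER: dict[str, list[tuple[datetime, str, str]]] = defaultdict(list)


def record_search(user: str, ts: datetime, app: str, term: str) -> None:
    SEARCHES_BY_USER[user].append((ts, app, term))


def goal_alerts(user: str, minutes: dict[str, float], tolerance: float = 0.10) -> list[str]:
    """IF an activity's share of time drifts past tolerance from the goal THEN alert."""
    total = sum(minutes.values())
    if total == 0:
        return []
    alerts = []
    for activity, target in GOALS.get(user, {}).items():
        actual = minutes.get(activity, 0) / total
        if abs(actual - target) > tolerance:
            alerts.append(f"{activity}: {actual:.0%} of your time vs. a goal of {target:.0%}")
    return alerts


def weekday_reminder(user: str, app: str, now: datetime):
    """Suggest the user's most common search term in this app on today's weekday."""
    terms = Counter(term for ts, a, term in SEARCHES_BY_USER.get(user, [])
                    if a == app and ts.weekday() == now.weekday())
    if not terms:
        return None
    term, _ = terms.most_common(1)[0]
    return f'You typically search for "{term}" in {app} on {DAYS[now.weekday()]}s.'


record_search("alice", datetime(2014, 3, 7, 11, 0), "reports", "Kafka")   # Friday
record_search("alice", datetime(2014, 3, 14, 10, 0), "reports", "Kafka")  # Friday
record_search("alice", datetime(2014, 3, 12, 9, 0), "reports", "Storm")   # Wednesday
print(goal_alerts("alice", {"search": 60, "read": 30, "write": 30}))
print(weekday_reminder("alice", "reports", datetime(2014, 3, 21, 8, 0)))
```

With 60/30/30 minutes, search (50%) and read (25%) fall outside the tolerance and raise alerts, while write (25% vs. 30%) does not; the reminder for Friday, March 21 returns the Kafka suggestion.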