PUZZLING- HOW CONTEXT ACCUMULATES from Structure:Data 2012
1. PUZZLING
SPEAKER: Jeff Jonas
Chief Scientist
Entity Analytics
IBM
Tuesday, November 27, 2012
2. Puzzling
How Context Accumulates
Jeff Jonas, IBM Distinguished Engineer
Chief Scientist, IBM Entity Analytics
Email: jeffjonas@us.ibm.com
Blog: www.jeffjonas.typepad.com
Twitter: http://www.twitter.com/jeffjonas
10. State of the Union: “Pixel Analytics”
[Diagram labels: Observation Space; Red Puzzle Piece Analytics, Green Puzzle Piece Analytics, Blue Puzzle Piece Analytics; Consumer (an analyst, a system, the sensor itself, etc.)]
11. Without context … quality predictions are hard to come by.
12. Context definition
Better understanding something by
taking into account the things around it.
15. First … The Data Must Find the Data
[Diagram labels: Observation Space; Context Accumulation; Persistent Context; Relevance Detection; Consumer (an analyst, a system, the sensor itself, etc.)]
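One way to read the labels on this slide, as a minimal Python sketch and not the speaker's implementation: each incoming observation is itself used as the query against the persistent context, it is stored regardless of the outcome, and a consumer is notified only when it relates to something already known. The class name `PersistentContext`, the feature-set representation, and the `min_shared` threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PersistentContext:
    """Hypothetical store of everything observed so far."""
    min_shared: int = 2  # assumed threshold: shared features needed to count as related
    observations: dict = field(default_factory=dict)

    def ingest(self, obs_id: str, features: set, notify: Callable) -> list:
        new = frozenset(features)
        # The data finds the data: the new observation is the query.
        related = sorted(
            other for other, known in self.observations.items()
            if len(new & known) >= self.min_shared
        )
        # Context accumulation: persist the observation whether or not it matched.
        self.observations[obs_id] = new
        # Relevance detection: tell the consumer only when something connects.
        if related:
            notify(obs_id, related)
        return related


alerts = []
ctx = PersistentContext()
ctx.ingest("p1", {"red", "sky", "edge"}, lambda o, r: alerts.append((o, r)))
ctx.ingest("p2", {"green", "grass"}, lambda o, r: alerts.append((o, r)))
ctx.ingest("p3", {"red", "sky", "banjo"}, lambda o, r: alerts.append((o, r)))
```

In this toy run only the third observation triggers a notification, because it is the first one that shares enough features with something already in context.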
16. Big Data
Pile of ____ In Context
19. Big Data [in context]. New Physics.
§ More data: better predictions
– Lower false positives
– Lower false negatives
§ More data: bad data is good
– Suddenly glad your data is not perfect
§ More data: less compute
43. Incremental Context – Incremental Discovery
6:40pm START
22min “Hey, this one is a duplicate!”
35min “I think some pieces are missing.”
37min “Looks like a bunch of hillbillies on a porch.”
44min “Hillbillies, playing guitars, sitting on a porch,
near a barber sign … and a banjo!”
51. Incremental Context – Incremental Discovery
47min “We should take the sky and grass off the table.”
2hr “Let’s switch sides, and see if we can make sense
of this from different perspectives.”
2hr10m “Wait, there are three … no, four puzzles.”
2hr17m “We need a bigger table.”
2hr18m “I think you threw in a few random pieces.”
58. How Context Accumulates
§ With each new observation, one of three assertions is made:
1) Un-associated; 2) Placed near like neighbors; or 3) Connected
§ Must favor the false negative
§ New observations sometimes reverse earlier assertions
§ As the working space expands, computational effort increases
§ Given sufficient observations, there can come a tipping point.
Thereafter, confidence improves while computational effort
decreases!
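The three assertions and the bias toward false negatives can be sketched in Python. This is an illustrative toy under stated assumptions, not the method used in the experiments: pieces are represented as feature sets, similarity is plain Jaccard overlap, and the names `ContextAccumulator`, `near_at`, and `connect_at` are hypothetical. The connect threshold is deliberately strict, so a doubtful match is left as "near" rather than joined, and `retract` models a later observation reversing an earlier assertion.

```python
from enum import Enum


class Assertion(Enum):
    UNASSOCIATED = 1
    NEAR = 2
    CONNECTED = 3


def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ContextAccumulator:
    def __init__(self, near_at=0.3, connect_at=0.75):
        # A high connect_at favors the false negative: when in doubt, do not connect.
        self.near_at = near_at
        self.connect_at = connect_at
        self.pieces = {}
        self.links = set()  # pairs currently asserted as connected

    def observe(self, piece_id, features):
        new = frozenset(features)
        best_id, best = None, 0.0
        for other_id, known in self.pieces.items():
            score = jaccard(new, known)
            if score > best:
                best_id, best = other_id, score
        self.pieces[piece_id] = new
        if best >= self.connect_at:
            self.links.add(frozenset((piece_id, best_id)))
            return Assertion.CONNECTED, best_id
        if best >= self.near_at:
            return Assertion.NEAR, best_id
        return Assertion.UNASSOCIATED, None

    def retract(self, a, b):
        """Reverse an earlier CONNECTED assertion when new evidence contradicts it."""
        self.links.discard(frozenset((a, b)))


acc = ContextAccumulator()
first = acc.observe("a", {"sky", "blue", "edge", "cloud"})
second = acc.observe("b", {"sky", "blue", "edge", "cloud", "bird"})
third = acc.observe("c", {"sky", "blue", "sign"})
links_before = len(acc.links)
acc.retract("a", "b")
links_after = len(acc.links)
```

Note that `observe` compares each new piece against everything already on the table, so cost grows with the working space. The tipping point described above would need an index or similar shortcut that this sketch does not attempt.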
62. Sensemaking on Streams
§ Each person gets one piece per round, no collaborating
§ Only work the piece
– Figure out where it goes
– If you stumble upon something else worth fixing, fix it
– When there is no more to do on that piece, stop and say you are done
§ If you have new insight, tell me
§ Each assembler is timed for each piece
65. Deep Reflection and Consolidation
§ Every 10 rounds (40 pieces) you can re-consider what is already known and
collaborate while doing this
§ Spend as much time as needed, until not much more can be accomplished
§ Puzzle chunks are counted before and after
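The experiment protocol on these two slides (timed per-piece work on a stream, then periodic consolidation with chunks counted before and after) can be sketched as below. It is a simplified model, not the actual procedure. Four assemblers per round is inferred from "10 rounds (40 pieces)"; representing a piece as a set of hypothetical "edge keys", and treating two pieces as joinable when they share a key, are assumptions made purely for illustration.

```python
import time

ASSEMBLERS = 4       # inferred: 40 pieces per 10 rounds, one piece per person per round
REFLECT_EVERY = 10   # rounds between deep-reflection passes


def place(piece, chunks):
    """Work one piece only: attach it to the first chunk that shares an edge key."""
    for chunk in chunks:
        if chunk & piece:
            chunk |= piece
            return
    chunks.append(set(piece))


def reflect(chunks):
    """Deep reflection: merge all chunks that share an edge key."""
    merged = []
    for chunk in chunks:
        chunk = set(chunk)
        for other in [m for m in merged if m & chunk]:
            chunk |= other
            merged.remove(other)
        merged.append(chunk)
    return merged


def run(stream):
    chunks, timings, report = [], [], []
    for count, piece in enumerate(stream, start=1):
        started = time.perf_counter()
        place(set(piece), chunks)
        timings.append(time.perf_counter() - started)  # each piece is timed
        if count % (ASSEMBLERS * REFLECT_EVERY) == 0:
            before = len(chunks)
            chunks = reflect(chunks)
            report.append((before, len(chunks)))  # chunks counted before and after
    return chunks, timings, report


# Toy stream: 40 pieces forming one chain, delivered in an unhelpful order.
evens = [{i, i + 1} for i in range(0, 40, 2)]
odds = [{i, i + 1} for i in range(1, 40, 2)]
chunks, timings, report = run(evens + odds)
```

In this toy stream the per-piece pass leaves many separate chunks that only the reflection pass joins, which is the kind of effect the before-and-after chunk count is meant to expose.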
75. Noteworthy Events
§ @ 1.3% (4 pieces) new insight: “It’s Las Vegas and Sahara Hotel!”
§ @ 4% (12 pieces) the first two pieces connect
§ @ 37% (112 pieces) a puzzle piece is processed by a “pipeline” in 2.7 seconds
– Why? “Never seen anything like it.”
§ @ 48% (144 pieces) new insight:
– “Big welcome Las Vegas sign with everything from the strip around it.”
§ @ 65% (196 pieces) the first false positive is detected and corrected
§ @ 75% (224 pieces) new insight: “It is getting easier.”
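The percentages and piece counts above are consistent with a puzzle of about 300 pieces. That total is inferred from "1.3% (4 pieces)" and is not stated on the slides. A short Python check of the arithmetic, with `TOTAL_PIECES` as the labeled assumption:

```python
TOTAL_PIECES = 300  # assumption inferred from "1.3% (4 pieces)"; not stated in the slides

# Pieces processed -> percentage as shown on the slide.
events = {4: 1.3, 12: 4, 112: 37, 144: 48, 196: 65, 224: 75}


def percent_done(pieces, total=TOTAL_PIECES):
    return 100 * pieces / total


# Difference between the computed and the stated percentage for each event.
deviations = {p: abs(percent_done(p) - stated) for p, stated in events.items()}
```

Under this assumption every stated figure is within half a percentage point of the computed one, so the slide values read as rounded percentages of the same total.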
83. Lessons Learned
The last piece was almost as fast as the first.
Deep reflection (batch-based pattern discovery) was significantly
more important than I had thought.
91. SIBOS Conference 2011
§ 100 executives, 10 teams
§ 10 puzzles, 10 small tables
§ Duplicate and missing pieces
Lessons:
1. They learned federated search bites.
2. I watched as an early bias misdirected
their attention … but then over time
new observations corrected this bias.
105. Puzzling Project #4: Commentary
§ Despite having only 100 pieces and eight collaborating eyeballs:
– We began to suspect there were missing and random pieces
– We had an alarming number of false positives [1]
– It took significantly more effort/time than expected
§ Why? Common shapes and that vast purple haze (lots of ambiguity).
[1] The primary source being one overly intoxicated pipeline.
109. Experiment #4: Notes to Self
§ Excessive ambiguity drives computational cost way up
§ Some drunk people get unreasonably optimistic
113. My Recommendations
§ Context Accumulation
– Investments in general-purpose information fusion will often yield greater value
than investments in specialized, single-sensor algorithms (pixel analytics).
§ Real-time Analytics Over Big Data
– If something can be engineered for real-time, do that. There is a competitive
advantage when one can respond intelligently while a transaction is still happening.
§ Deep Reflection
– Do not underestimate the value of periodic deep reflection (pattern discovery). Do
this more often. And put this emerging insight to immediate use via feedback loops.
114. Related Blog Posts
Puzzling: How Observations Are Accumulated Into Context
Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
Data Finds Data
Big Data. New Physics.
General Purpose Sensemaking Systems and Information Colocation
Data Beats Math
And Easy to Reach
Email: jeffjonas@us.ibm.com
Twitter: http://www.twitter.com/jeffjonas