Data Diffing Based Software
Architecture Patterns
Huahai Yang
Juji Inc.
What is diffing?
• Given two elements a and b, calculate the difference d between them
• Function (diff a b) ;=> d
• Function (patch a d)
• Such that (= b (patch a d))
• Or: (= b (patch a (diff a b)))
• These are normally true:
• (not= (diff a b) (diff b a))
• (= (diff a c) (concat (diff a b) (diff b c)))
• (< (size d) (min (size a) (size b)))
• (< (time (patch a d)) (time (diff a b)))
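The round-trip law above can be sketched in a few lines. This is a minimal illustration for flat maps only; `map-diff` and `map-patch` are hypothetical helper names, not functions from any library, and the `[key op value]` edit shape is just an assumption chosen for readability.

```clojure
;; Hypothetical helpers for flat maps only; not a real library's API.
(defn map-diff
  "Returns a vector of [key op value] edits that turn map a into map b."
  [a b]
  (vec
   (concat
    ;; keys present in a but missing from b are deletions
    (for [k (keys a) :when (not (contains? b k))]
      [k :-])
    ;; keys that are new in b, or whose value changed, are additions/replacements
    (for [[k v] b
          :when (or (not (contains? a k)) (not= v (get a k)))]
      [k (if (contains? a k) :r :+) v]))))

(defn map-patch
  "Applies the edits produced by map-diff to map a."
  [a d]
  (reduce (fn [m [k op v]]
            (if (= op :-) (dissoc m k) (assoc m k v)))
          a
          d))

(def a {:name "rep" :turns 1 :mood :calm})
(def b {:name "rep" :turns 2 :topic :intro})

(def d (map-diff a b))

;; The round-trip law: patching a with (diff a b) yields b.
(assert (= b (map-patch a d)))
;; Diffing is directional: (diff a b) differs from (diff b a).
(assert (not= (map-diff a b) (map-diff b a)))
```

Note that `map-patch` never inspects what the values mean, which is the "blind" property the later slides rely on.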
Evolution of diffing (1)
• The earliest diff was developed by Doug McIlroy for Unix at Bell Labs in 1974
• Works on text files; the units of work are lines of text
• Purpose: reduce the storage needed to maintain multiple versions of a file
• Uses: comparing content, tracking changes, verifying output, version control
Evolution of diffing (2)
• Diffing in 3D graphics programming
• World modeled as a scene graph
• Only re-render changed subtrees
• Purpose: performance optimization
• Conceptually simple programming model: render everything
• Inspired React.js
• ClojureScript wrappers of React can be faster than plain React, thanks to faster diffing with immutable data
Evolution of diffing (3)
• Data oriented programming
• Data, not text
• Data are directly meaningful for code, no need for parsing or decoding
• Generic data literals, not specialized opaque programming constructs
• Diff input and output are both data
• Diffing as a software architecture consideration, not just an
implementation detail, impacting
• Delineation of system components
• Data model design
• API design
Diffing enables decoupling
• diff & patch functions are generic and blind
• They don't have to understand their input for them to work
• Semantic asymmetry between sender and receiver enforces separation of concerns
• Also supports a kind of natural encapsulation, not forced as in OOP
• d is still open for inspection if the receiver chooses to look
• Graded: the receiver doesn't need to know much, but can know a lot if it chooses to
Sender: (diff a a’) ;=> d
d is sent to the receiver
Receiver: (patch a d) ;=> a’
Diffing encourages data model reuse
• Thanks to diffing, data duplication between components is faithful and cheap
• Advantageous to reuse the same data model throughout the system, dramatically simplifying the system
Diffing tracks changes
• Thanks to diffing, each version of the
world state can be cheaply saved
and replayed to recover originals
• Application statefulness can be
externalized and managed
Editscript: a Clojure data
diffing library
• https://github.com/juji-io/editscript
• Works on vectors, lists, sets and maps
• Edits are a vector of vectors, each containing:
• Path
• Op :+, :-, or :r
• Value
• Diffing algorithms
• Quick: fast
• A* : optimal diff size
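To make the path/op/value shape concrete, here is a minimal sketch of applying such edits to nested maps using only clojure.core. `apply-edit` and `apply-edits` are hypothetical names written for this illustration; this is not Editscript's implementation, and it deliberately ignores vectors, lists and sets.

```clojure
;; A hypothetical interpreter for [path op value] edits on nested maps.
;; It only illustrates the edit shape and handles maps, nothing else.
(defn apply-edit
  [doc [path op value]]
  (case op
    :+ (assoc-in doc path value)          ; add a value at path
    :r (assoc-in doc path value)          ; replace the value at path
    :- (if (= 1 (count path))             ; delete the value at path
         (dissoc doc (first path))
         (update-in doc (pop path) dissoc (peek path)))))

(defn apply-edits
  "Applies a vector of edits to doc, in order."
  [doc edits]
  (reduce apply-edit doc edits))

;; Illustrative data; the keys are made up for this example.
(def doc {:bot {:name "Juji" :lang "en"} :draft true})

(def edits
  [[[:bot :name] :r "Rep"]
   [[:bot :voice] :+ "warm"]
   [[:draft] :-]])

(def patched (apply-edits doc edits))

(assert (= {:bot {:name "Rep" :lang "en" :voice "warm"}} patched))
```

Because the edits are plain vectors, they can be printed, stored, or sent over the wire like any other data.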
Case study: Juji Studio UI Re-design
• Complete UI redesign
• Re-implementation
• One-month turnaround
• The short turnaround was mainly due to switching from a resource-oriented API to a diffing based API
[Screenshots: Juji Studio UI before and after the redesign]
UI Data model: config doc
• Single Page Application (SPA) in cljs
• States in an EDN document – config doc
• SPA, server and DB all hold copies of the config doc
[Diagram: SPA, Server and DB, each holding a config doc; the SPA talks to the Server through a GraphQL API]
Traditional GraphQL API
• Resource-oriented (RESTful)
• Server-side config doc is the source of truth
• API is CRUD on server resources
• i.e. paths in the config doc
• Repetitive CRUD calls for every type of node
• Thousands of lines of Lacinia schema
Diffing based GraphQL API
• All logic is in the SPA
• API is CRUD on the config doc
• An update is sent as a diff
• SPA periodically sends to the server:
(diff doc-prev doc-now)
• Server applies the diff, saves the doc in the DB, and replies with the SHA of the config doc
• SPA validates the SHA; if it differs, the SPA sends the full config doc to overwrite the server copy
• Removed all API calls on paths and nodes
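The exchange described on this slide can be sketched as follows. Everything here is a stand-in chosen for illustration: `doc-diff` and `doc-patch` are hypothetical flat-map helpers, Clojure's built-in `hash` replaces a real SHA, and an atom plays the role of the server and its DB. The actual Juji API is not shown in the slides.

```clojure
;; Hypothetical stand-ins: flat-map diff/patch, `hash` instead of a SHA,
;; and an atom instead of the server and its DB.
(defn doc-diff [a b]
  {:removed (vec (remove #(contains? b %) (keys a)))
   :changed (into {}
                  (remove (fn [[k v]] (and (contains? a k) (= v (get a k)))))
                  b)})

(defn doc-patch [a {:keys [removed changed]}]
  (merge (apply dissoc a removed) changed))

(def server-doc (atom {:title "Welcome" :lang "en"}))

(defn server-apply-diff!
  "Applies a diff to the server copy and replies with a checksum of the result."
  [d]
  (hash (swap! server-doc doc-patch d)))

(defn server-overwrite!
  "Replaces the server copy with the full doc sent by the client."
  [doc]
  (hash (reset! server-doc doc)))

(defn sync!
  "Sends (diff doc-prev doc-now); falls back to sending the full doc
  when the server's checksum does not match the client's."
  [doc-prev doc-now]
  (let [reply (server-apply-diff! (doc-diff doc-prev doc-now))]
    (if (= reply (hash doc-now))
      :patched
      (do (server-overwrite! doc-now) :overwritten))))

;; Normal case: the server copy matches doc-prev, so the diff is enough.
(def first-sync
  (sync! {:title "Welcome" :lang "en"} {:title "Hello" :lang "en"}))

;; Drift case: the server copy has diverged, so the checksum check
;; triggers the full overwrite.
(reset! server-doc {:title "Stale"})
(sync! {:title "Hello" :lang "en"} {:title "Hello" :lang "en" :theme :dark})
```

Either way the server ends up with the client's current doc, which is why the per-path and per-node API calls become unnecessary.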
Case study: externalize application states
• How to scale a highly stateful application?
• E.g. Juji initiates an agent (rep) for each chat session on a server node; the state of each rep is stored in an atom
• What if the server node becomes unavailable?
[Diagram: API Gateway in front of a Server Node]
Case study: externalize application states
• Each rep sends the diff of its state to a persistent log (e.g. Kafka)
• E.g. at each utterance, the rep sends (diff state-prev state-now)
• When a server becomes unavailable, the API gateway forwards traffic to another server, which recovers the agent state from the persistent log by sequentially applying all diffs to a shared initial state
[Diagram: API Gateway, Server Node and Persistent Log; the Server Node sends diffs to the log]
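Recovery by replay amounts to a `reduce` of patch over the logged diffs. The sketch below assumes a flat-map state; `state-diff`, `state-patch` and `update-rep!` are hypothetical names, and a vector in an atom stands in for the persistent log (a real system would write to something like a Kafka topic).

```clojure
;; Hypothetical stand-ins: a diff is a map of :removed keys and :changed
;; entries, and a vector in an atom plays the role of the persistent log.
(defn state-diff [a b]
  {:removed (vec (remove #(contains? b %) (keys a)))
   :changed (into {}
                  (remove (fn [[k v]] (and (contains? a k) (= v (get a k)))))
                  b)})

(defn state-patch [a {:keys [removed changed]}]
  (merge (apply dissoc a removed) changed))

(def initial-state {:turn 0})
(def log (atom []))
(def rep-state (atom initial-state))

(defn update-rep!
  "Updates the rep state and appends (diff state-prev state-now) to the log."
  [f & args]
  (let [[state-prev state-now] (apply swap-vals! rep-state f args)]
    (swap! log conj (state-diff state-prev state-now))
    state-now))

;; Three utterances, each producing one logged diff.
(update-rep! assoc :turn 1 :topic :greeting)
(update-rep! assoc :turn 2 :user-name "Ann")
(update-rep! dissoc :topic)

(defn recover-state
  "Rebuilds a rep's state by applying every logged diff to the initial state."
  [initial diffs]
  (reduce state-patch initial diffs))

;; Another server can rebuild the same state from the log alone.
(assert (= @rep-state (recover-state initial-state @log)))
```

The recovering server needs only the shared initial state and the log, not the failed node's memory.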
Case study: reduce component dependency
• Stateful components depend on one another
• Introducing user-invokable system functions leads to circular dependencies, e.g.
(juji.func.system/cleanup-chat rep)
[Diagram: dependencies among System, Reps, Rep, Subs and func.system; label [:rt jujiid]]
• Instead of depending on namespaces that contain subscriptions:
• Watch the reps atom
• Inspect the diff between its old and new values
• Handle the case when a rep is removed or cleaned up
• i.e. send a :user-left message to channels, and let the subscriptions clean themselves up
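A watch on the atom is enough to express this without importing the subscription namespaces. In this sketch `reps`, `outbox` and the message shape are hypothetical; `outbox` stands in for the channels that the subscriptions listen on, and the "diff" is reduced to the set of rep ids that disappeared.

```clojure
(require '[clojure.set :as set])

;; Hypothetical names: `reps` maps a rep id to its state, and `outbox`
;; stands in for the channels the subscriptions listen on.
(def reps (atom {}))
(def outbox (atom []))

(add-watch reps ::rep-removed
  (fn [_key _ref old new]
    ;; Compare old and new values; only removed ids matter here.
    (doseq [id (set/difference (set (keys old)) (set (keys new)))]
      (swap! outbox conj {:type :user-left :rep-id id}))))

(swap! reps assoc "rep-1" {:turn 0} "rep-2" {:turn 3})
(swap! reps dissoc "rep-1")

(assert (= [{:type :user-left :rep-id "rep-1"}] @outbox))
```

The watcher depends only on the atom's value, so the code that removes a rep does not need to know which subscriptions exist.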
Case study: synchronize collaborative editing
• Multiple parties sending diffs
• Parties fall out of sync when their diffs cross paths
• Difficult yet common problem
• E.g. enabling multiple users to edit the same chat at the same time
• Locking has bad UX
• Three-way merge has high latency
[Diagram: two copies of A; one party sends (diff A A’), the other sends (diff A A’’)]
Differential Synchronization
• Diffing based synchronization method
• Scalable
• Fault-tolerant
• Low latency
• Developed by Neil Fraser in 2009
• Used by Google Docs
• Client-server case
• Use two shadows
• Fault-tolerant case
• Keep a backup shadow
• Scaling
Data modeling guideline: Don’t use vector
• Minimize unnecessary use of ordered data structures, e.g. vectors or lists
• Diffing algorithms are slow for ordered data, because order is a strong constraint to satisfy
• Ordered O(mn) vs. unordered O(m+n)
• The implicit order of data elements is often a source of incidental complexity
• Meaningful order is often based on data fields
• Sets or maps suffice in most cases
Bad: [ {} {} {} … ]
Good: { {} {} {} … } or #{ {} {} {} … }
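One way to follow this guideline is to key records by an id and derive order from a field only when it is needed. The sketch below uses made-up chat topics; `:id`, `:position` and `ordered` are illustrative names, not part of any Juji data model.

```clojure
;; Hypothetical chat topics; :id and :position are illustrative field names.
(def topics-vec
  [{:id :intro    :position 1 :text "Hi"}
   {:id :ask-name :position 2 :text "Your name?"}])

;; Index by id so that a diff can compare entries key by key
;; instead of aligning two sequences.
(def topics-map
  (into {} (map (juxt :id identity)) topics-vec))

(defn ordered
  "Recovers a display order from a data field when one is needed."
  [topics]
  (sort-by :position (vals topics)))

;; Adding a topic that sorts first touches one key;
;; in a vector it would shift every index after it.
(def topics-map'
  (assoc topics-map :welcome {:id :welcome :position 0 :text "Welcome"}))

(assert (= [:welcome :intro :ask-name] (map :id (ordered topics-map'))))
```

The order lives in the `:position` field, so it is explicit data rather than an implicit property of the container.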
Conclusion
• Diffing offers a few properties that lead to
• Simplified software architecture
• Enhanced system decoupling
• Easier scaling of stateful applications
• Better solutions to data synchronization problems
• Worthwhile to consider diffing based software architecture
• Particularly for data-oriented programming
Thank you!
• Huahai Yang @huahaiy
• Juji Inc. https://juji.io

Data Diffing Based Software Architecture Patterns

  • 1. Data Diffing Based Software Architecture Patterns Huahai Yang Juji Inc.
  • 2. What is diffing? • Given two elements a and b,calculate the difference d between them • Function (diff a b) ;=> d • Function (patch a d) • Such that (= b (patch a d)) • Or: (= b (patch a (diff a b))) • These are normally true: • (not= (diff a b) (diff b a)) • (= (diff a c) (concat (diff a b) (diff b c))) • (< (size d) (min (size a) (size b))) • (< (time (patch a d)) (time (diff a b)))
  • 3. Evolution of diffing (1)
      • The earliest diff was developed by Doug McIlroy on Unix at Bell Labs in 1974
      • Works on text files; the work units are lines of text
      • Purpose: reduce the storage necessary to maintain multiple versions of a file
      • Use: compare content, track changes, verify output, version control
  • 4. Evolution of diffing (2)
      • Diffing in 3D graphics programming
          • World modeled as a scene graph
          • Only re-render changed subtrees
      • Purpose: performance optimization
      • Conceptually simple programming model: render everything
      • Inspired React.js
      • A ClojureScript wrapper of React could be faster than React itself, due to faster diffing with immutable data
  • 5. Evolution of diffing (3)
      • Data oriented programming
          • Data, not text
          • Data are directly meaningful for code, no need for parsing or decoding
          • Generic data literals, not specialized opaque programming constructs
          • Diff input and output are both data
      • Diffing as a software architecture consideration, not just an implementation detail, impacting
          • Delineation of system components
          • Data model design
          • API design
  • 6. Diffing enables decoupling
      • diff & patch functions are generic and blind
          • They don't have to understand their input in order to work
      • Semantic asymmetry between sender and receiver enforces separation of concerns
      • Also supports a kind of natural encapsulation, not forced as in OOP
          • d is still open for inspection if the receiver chooses to inspect it
          • Graded: the receiver doesn't need to know a lot, but can know a lot if it chooses to
      • Diagram: the sender computes (diff a a’) ;=> d and sends d; the receiver computes (patch a d) ;=> a’
  • 7. Diffing encourages data model reuse
      • Thanks to diffing, data duplication between components is faithful and cheap
      • It is advantageous to reuse the same data model throughout the system, which dramatically simplifies the system
  • 8. Diffing tracks changes
      • Thanks to diffing, each version of the world state can be cheaply saved and replayed to recover originals
      • Application statefulness can be externalized and managed
  • 9. Editscript: a Clojure data diffing library
      • https://github.com/juji-io/editscript
      • Works for vector, list, set and map
      • Edits are a vector of vectors:
          • Path
          • Op: :+, :-, or :r
          • Value
      • Diffing algorithms
          • Quick: fast
          • A*: optimal diff size
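A short usage sketch of the library described on this slide. It assumes Editscript is on the classpath (for example via the `juji/editscript` artifact) and that the options-map form `{:algo :quick}` is accepted, as in recent releases; older releases may take the option differently, so check the README for the installed version. The sample documents are made up.

```clojure
;; Assumes Editscript is available on the classpath.
(require '[editscript.core :as e])

;; Hypothetical sample data: two versions of a small document
(def a {:title "Intro" :tags #{:chat} :steps ["greet" "ask"]})
(def b {:title "Intro" :tags #{:chat :faq} :steps ["greet" "ask" "thank"]})

(def d       (e/diff a b))                 ; default algorithm (A*)
(def d-quick (e/diff a b {:algo :quick}))  ; faster, script may be larger

(e/get-edits d)            ; plain vector of [path op value] edits
(= b (e/patch a d))        ; both scripts turn a into b
(= b (e/patch a d-quick))
```

Because the result of `get-edits` is ordinary Clojure data, it can be printed, stored, or sent over the wire like any other value.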
  • 10. Case study: Juji Studio UI Re-design
      • Complete UI redesign
      • Re-implementation
      • One month turnaround
      • Mainly due to switching from a resource-oriented API to a diffing based API
      • [Figure: Before]
  • 11. Case study: Juji Studio UI Re-design
      • Complete UI redesign
      • Re-implementation
      • One month turnaround
      • Mainly due to switching from a resource-oriented API to a diffing based API
      • [Figure: After]
  • 12. UI Data model: config doc
      • Single Page Application (SPA) in cljs
      • States live in an EDN document, the config doc
      • SPA, server and DB all have copies of the config doc
      • Diagram: SPA, server and DB each hold a config doc; the SPA reaches the server through the GraphQL API
  • 13. Traditional GraphQL API
      • Resource oriented (RESTful)
      • The server side config doc is the truth
      • API is CRUD on server resources
          • i.e. paths in the config doc
      • Repetitive CRUD calls for each and every type of node
      • Thousands of lines of Lacinia schema
  • 14. Diffing based GraphQL API
      • All logic is in the SPA
      • API is CRUD on the config doc
      • Update is sending diffs
          • SPA periodically sends to server: (diff doc-prev doc-now)
          • Server applies the diff, saves the doc in DB, replies with the config doc SHA
          • SPA validates the SHA; if different, sends the config doc to overwrite
      • Removed all API calls on paths and nodes
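The update loop on this slide might look roughly like the following single-process sketch. It is not Juji's implementation: the server is simulated with an atom, all function names are hypothetical, and Clojure's `hash` stands in for the SHA (a real system would hash a canonical serialization that both client and server agree on). It assumes Editscript is on the classpath.

```clojure
(require '[editscript.core :as e])

;; Simulated server-side copy of the config doc (hypothetical shape)
(def server-doc (atom {:title "Untitled" :nodes {}}))

(defn server-apply-diff!
  "Server side: apply the edits, 'save' the doc, reply with a checksum."
  [edits]
  (hash (swap! server-doc e/patch (e/edits->script edits))))

(defn server-overwrite!
  "Server side: replace the stored doc wholesale."
  [doc]
  (hash (reset! server-doc doc)))

(defn sync!
  "Client side: send a diff, then fall back to a full overwrite
  if the server's checksum does not match the local doc."
  [doc-prev doc-now]
  (let [edits (e/get-edits (e/diff doc-prev doc-now))
        reply (server-apply-diff! edits)]
    (if (= reply (hash doc-now))
      :in-sync
      (do (server-overwrite! doc-now) :overwritten))))

;; Normal case: client and server start from the same doc
(def doc-0 @server-doc)
(def doc-1 (assoc-in doc-0 [:nodes :greeting] {:text "Hi"}))
(def result-1 (sync! doc-0 doc-1))

;; Drift case: the server copy has diverged behind the client's back
(reset! server-doc {:title "Stale" :nodes {}})
(def doc-2 (assoc doc-1 :title "Welcome bot"))
(def result-2 (sync! doc-1 doc-2))
```

The single `sync!` call replaces the per-path, per-node API calls of the resource-oriented design, since the server never needs to know which part of the doc changed.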
  • 15. Case study: externalize application states
      • How to scale a highly stateful application?
      • E.g. Juji initiates an agent (rep) for each chat session on a server node; the state of each rep is stored in an atom
      • What if the server node becomes unavailable?
      • Diagram: API gateway in front of server nodes
  • 16. Case study: externalize application states
      • Each rep sends the diff of its state to a persistent log (e.g. Kafka)
      • E.g. at each utterance, the rep sends (diff state-prev state-now)
      • When a server becomes unavailable, the API gateway forwards traffic to another server, which recovers the agent state from the persistent log by sequentially applying all diffs to a shared initial state
      • Diagram: API gateway in front of server nodes; server nodes send diffs to the persistent log
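The recovery step can be sketched as a fold over the logged diffs. In this sketch an in-memory vector of EDN strings stands in for the persistent log (no Kafka client is involved), and the rep state shape is made up. It assumes Editscript is on the classpath.

```clojure
(require '[editscript.core :as e]
         '[clojure.edn :as edn])

;; Shared initial state known to every server node (hypothetical shape)
(def initial-state {:turn 0 :topics #{}})

;; Successive states of one rep, one per utterance
(def states [initial-state
             {:turn 1 :topics #{:greeting}}
             {:turn 2 :topics #{:greeting :pricing}}])

;; Stand-in for the persistent log: each diff is stored as an EDN string
(def log
  (mapv (fn [[prev now]] (pr-str (e/get-edits (e/diff prev now))))
        (partition 2 1 states)))

(defn recover
  "Rebuilds a rep's state by applying every logged diff in order."
  [init entries]
  (reduce (fn [state entry]
            (e/patch state (e/edits->script (edn/read-string entry))))
          init
          entries))

(def recovered (recover initial-state log))
(= (last states) recovered)
```

Only the small per-utterance diffs travel to the log, while any node that knows the initial state can rebuild the full rep state.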
  • 17. Case study: reduce component dependency
      • Stateful components depend on one another
      • Introducing user-invokable system functions leads to circular dependency, e.g. (juji.func.system/cleanup-chat rep)
      • Diagram labels: System, Reps, Rep, Subs, func.system, [:rt jujiid]
  • 18. Instead of depending on namespaces that contain subscriptions:
      • Watch the reps atom
      • Inspect its diff between old and new
      • Handle the case when a rep is removed or cleaned
          • i.e. send a :user-left message to channels, and let the subscriptions clean themselves up
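One way to sketch the watch-and-inspect approach on this slide: a watch on the reps atom diffs the old and new values and reacts only to top-level removals. The atom layout, the watch key, and the `notifications` vector (a stand-in for the real channels) are all assumptions made for illustration. It assumes Editscript is on the classpath.

```clojure
(require '[editscript.core :as e])

(def reps (atom {}))           ; rep-id -> rep state (hypothetical layout)
(def notifications (atom []))  ; stand-in for messages sent to channels

;; The watcher knows nothing about subscriptions; it only inspects the diff.
(add-watch reps ::rep-removed
  (fn [_key _ref old-reps new-reps]
    (doseq [[path op] (e/get-edits (e/diff old-reps new-reps))
            ;; a :- edit with a one-element path is a removed rep
            :when (and (= op :-) (= 1 (count path)))]
      (swap! notifications conj {:rep-id (first path) :msg :user-left}))))

(swap! reps assoc
       :rep-1 {:turn 0 :topics #{}}
       :rep-2 {:turn 0 :topics #{}})
(swap! reps dissoc :rep-1)

@notifications
```

The code that removes a rep and the code that cleans up subscriptions no longer reference each other, which is how the circular dependency goes away.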
  • 19. Case study: synchronize collaborative editing
      • Multiple parties sending diffs
      • Out of sync when diffs cross paths
      • Difficult yet common problem
      • E.g. enable multiple users to edit the same chat at the same time
      • Locking has bad UX
      • Three-way merge has high latency
      • Diagram: two copies of A, one sending (diff A A’) and the other sending (diff A A’’)
  • 20. Differential Synchronization
      • Diffing based synchronization method
          • Scalable
          • Fault-tolerant
          • Low latency
      • Developed by Neil Fraser in 2009
      • Used by Google Docs
  • 22. Fault tolerant case
      • Keep a backup shadow
  • 24. Data modeling guideline: Don’t use vector
      • Minimize unnecessary use of ordered data structures, e.g. vector or list
      • Diffing algorithms are slow for ordered data, because order is a strong constraint to satisfy
          • Ordered O(mn) vs. unordered O(m+n)
      • The implicit order of data elements is often a source of incidental complexity
      • Meaningful order is often based on data fields
      • Sets or maps suffice in most cases
      • Bad: [ {} {} {} … ]
      • Good: { {} {} {} … } or #{ {} {} {} … }
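A small sketch of the guideline, using only clojure.core. The node shape, the `:position` field and the `index-by-id` helper are invented for illustration. It shows that two vectors holding the same elements in different orders count as different values, while the same elements keyed by id are equal, and that a meaningful order can be derived from a data field when it is needed.

```clojure
;; Hypothetical chat nodes; :position carries the meaningful order explicitly
(def nodes-v1 [{:id :greet :position 1 :text "Hi"}
               {:id :ask   :position 2 :text "How can I help?"}
               {:id :bye   :position 3 :text "Goodbye"}])

;; Same elements, different incidental order
(def nodes-v2 (vec (reverse nodes-v1)))

(defn index-by-id
  "Turns a sequence of nodes into a map keyed by :id."
  [nodes]
  (into {} (map (juxt :id identity)) nodes))

(not= nodes-v1 nodes-v2)                         ; vectors differ by order alone
(= (index-by-id nodes-v1) (index-by-id nodes-v2)) ; maps ignore that order

;; Recover the meaningful order from the data field when rendering
(def ordered (sort-by :position (vals (index-by-id nodes-v2))))
(= nodes-v1 (vec ordered))
```

With the keyed form, a reshuffle that carries no meaning produces no change to diff at all, and an intended reorder shows up as edits to `:position` values.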
  • 25. Conclusion
      • Diffing offers a few properties that lead to
          • Simplified software architecture
          • Enhanced system decoupling
          • Easier scaling of stateful applications
          • Better solutions to data synchronization problems
      • It is worthwhile to consider a diffing based software architecture
          • Particularly for data-oriented programming
  • 26. Thank you!
      • Huahai Yang @huahaiy
      • Juji Inc. https://juji.io