Cataloging
                The Art & Science of it...
                           Utkarsh
                    Principal Architect @
                        Flipkart.com



Sunday 3 March 13
Art vs Science
     Art:      Imaginative, Free Form, Creative
     Science:  Measurable, Formulative, Methodical, Set Patterns

What is Cataloging?
  • Catalog
    A list or itemized display usually including descriptive information
    or illustrations.
  • Cataloging
    a. To list or include in a catalog
    b. To classify according to a categorical system


       We define it as:
       Cataloging is the process of managing the inventory of products
       through the entire lifecycle of creation, update,
       de-provisioning/re-provisioning and deletion.
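
       The lifecycle in this definition can be read as a small state machine.
       The sketch below is one possible reading, not the implementation behind
       the talk: the state names, the allowed transitions and the `CatalogItem`
       class are all assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    DEPROVISIONED = "deprovisioned"
    DELETED = "deleted"


# Assumed transition table: action -> (allowed source states, target state).
TRANSITIONS = {
    "update": ({State.ACTIVE}, State.ACTIVE),
    "deprovision": ({State.ACTIVE}, State.DEPROVISIONED),
    "reprovision": ({State.DEPROVISIONED}, State.ACTIVE),
    "delete": ({State.ACTIVE, State.DEPROVISIONED}, State.DELETED),
}


class CatalogItem:
    """A product entry that moves through the catalog lifecycle."""

    def __init__(self, item_id, attributes):
        # Creation puts the item straight into the ACTIVE state.
        self.item_id = item_id
        self.attributes = dict(attributes)
        self.state = State.ACTIVE

    def apply(self, action, **changes):
        """Run one lifecycle action, rejecting transitions that are not allowed."""
        if action not in TRANSITIONS:
            raise ValueError(f"unknown action: {action}")
        sources, target = TRANSITIONS[action]
        if self.state not in sources:
            raise ValueError(f"cannot {action} an item in state {self.state.value}")
        if action == "update":
            self.attributes.update(changes)
        self.state = target
        return self


# Placeholder item: create, update, take off the shelf, bring back.
item = CatalogItem("item-1", {"title": "Sample Title"})
item.apply("update", price=100).apply("deprovision").apply("reprovision")
```

       Keeping the transitions in one table makes it easy to see, and to
       change, which lifecycle moves the catalog permits.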

Why is the problem interesting?
   • Ever growing - “size”
   • Dynamic nature of the Metadata - “elasticity”
   • Association(s) between data elements -
     “flexibility”
   • Flux of changes - “variability”
   • De-coupled systems & Data Ownership -
     “data duplication”



How do we solve it?
     • Be Comprehensive & Imaginative
     • Be Methodical & Flexible




     • Work with Patterns & Create new Patterns
     • Be a Composer, be an artist (blend where required)


What do we solve?
     • Identify Data Elements
     • Identify Relationships between Data Elements
     • Identify Data Usage patterns (Query patterns)
     • Create an ideal representation: Logical Model
     • Characterize the Data Store(s)
     • Architect the Catalog Data Cluster
     • Define Views/Interface(s)




Identify Data Elements
     [Diagram: data elements that make up the catalog]
     • Product Biblio
     • Stock
     • Sellers
     • Product Variants
     • Category
     • Product SLAs
     • Supplier
     • Product Images
     • Taxation
     • Pricing
     • Contributors
     • ?

     Be Comprehensive; Be Imaginative!!



Identify Relationships
     [Diagram: “Book” at the center, linked to the nodes below; edge labels in parentheses]
     • Physical Product (is A)
     • Compilation 1, Compilation 2 (has A)
     • Year, Author, Genre (belongs to)
     • ? (open-ended)

     Be Comprehensive; Be Imaginative!!
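
     The relationships on this slide can be sketched as plain data classes.
     This is a hedged reading of the diagram, assuming the edges mean "a Book
     is a Physical Product", "a Compilation has a Book" and "a Book belongs to
     a Year, an Author and a Genre". The class and field names are
     illustrative, and the sample values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalProduct:
    product_id: str


@dataclass
class Book(PhysicalProduct):  # "is A": a Book is a PhysicalProduct
    title: str = ""
    # "belongs to": each book points at the groupings it belongs to.
    year: int = 0
    author: str = ""
    genre: str = ""


@dataclass
class Compilation:  # "has A": a compilation holds books
    name: str
    books: list = field(default_factory=list)


book = Book("p1", title="Sample Title", year=2012, author="Author A", genre="Thriller")
compilation_1 = Compilation("Compilation 1", [book])
compilation_2 = Compilation("Compilation 2", [book])  # same book, second compilation
```

     The same `book` object appears in both compilations, which is the kind of
     many-to-many association that later drives the choice of data store.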


Identify Data Query Patterns
     •   Is the querying real-time or offline (from the customer's perspective)?
     •   Is the query “Id”-based, or does it use filters (ad hoc or pre-defined)?
     •   Does the query link multiple data elements?
     •   Understand: query SLAs at ever-increasing scale
     •   Question: why is the client writing such a query?


         E.g.:
     a. The book with the specific title “Secret of the Nagas”
     b. Books by Chetan Bhagat published in 2012
     c. Books that are Thrillers, published after 2005, written in Hindi and
        published by Rupa Publications
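
     The three example queries above can be sketched against an in-memory
     list to show how they differ in shape: (a) is a single-attribute match,
     (b) combines two attributes, and (c) mixes equality and range filters
     across four. The record layout, the `select` helper and every sample
     value other than the names quoted on the slide are assumptions.

```python
# Illustrative records only; apart from the names quoted on the slide,
# every value here is a placeholder, not real bibliographic data.
BOOKS = [
    {"id": "b1", "title": "Secret of the Nagas", "author": "Author A",
     "year": 2011, "genre": "Fiction", "language": "English",
     "publisher": "Publisher X"},
    {"id": "b2", "title": "Sample Title 2", "author": "Chetan Bhagat",
     "year": 2012, "genre": "Non-fiction", "language": "English",
     "publisher": "Publisher Y"},
    {"id": "b3", "title": "Sample Title 3", "author": "Author C",
     "year": 2008, "genre": "Thriller", "language": "Hindi",
     "publisher": "Rupa Publications"},
    {"id": "b4", "title": "Sample Title 4", "author": "Author C",
     "year": 2003, "genre": "Thriller", "language": "Hindi",
     "publisher": "Rupa Publications"},
]

# Id-based lookup: one key, one record.
BY_ID = {book["id"]: book for book in BOOKS}


def select(books, *predicates):
    """Filter-based query: ids of the books that satisfy every predicate."""
    return [b["id"] for b in books if all(p(b) for p in predicates)]


query_a = select(BOOKS, lambda b: b["title"] == "Secret of the Nagas")
query_b = select(
    BOOKS,
    lambda b: b["author"] == "Chetan Bhagat",
    lambda b: b["year"] == 2012,
)
query_c = select(
    BOOKS,
    lambda b: b["genre"] == "Thriller",
    lambda b: b["year"] > 2005,
    lambda b: b["language"] == "Hindi",
    lambda b: b["publisher"] == "Rupa Publications",
)
```

     A linear scan like this is only a way to state the queries; at catalog
     scale each added filter is a reason to ask which index or view serves it.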




Identification is Non Trivial
         Example “Book”


         Identification -->


         “Title”
         “Title” + “Publisher”
         “Title” + “Publisher” + “Edition”
         “Title” + “Publisher” + “Edition” + “Variant”
         “Title” + “Publisher” + “Edition” + “Variant” + ??

         Be Imaginative - an Artist’s brush stroke !!
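
         The progression above can be made concrete by grouping listings
         under keys of increasing width. The sketch below is illustrative
         only: the `BookIdentity` fields mirror the slide, while the
         `index_by` helper and the sample listings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookIdentity:
    title: str
    publisher: str
    edition: str
    variant: str  # e.g. binding; the slide leaves further components open ("??")


def index_by(listings, *fields):
    """Group listings by a key built from the chosen identity fields."""
    groups = {}
    for listing in listings:
        key = tuple(getattr(listing, name) for name in fields)
        groups.setdefault(key, []).append(listing)
    return groups


# Placeholder listings that all share one title.
LISTINGS = [
    BookIdentity("Sample Title", "Publisher X", "1st", "Paperback"),
    BookIdentity("Sample Title", "Publisher X", "1st", "Hardcover"),
    BookIdentity("Sample Title", "Publisher X", "2nd", "Paperback"),
    BookIdentity("Sample Title", "Publisher Y", "1st", "Paperback"),
]

# Number of distinct products seen under each candidate identity.
distinct = [
    len(index_by(LISTINGS, "title")),
    len(index_by(LISTINGS, "title", "publisher")),
    len(index_by(LISTINGS, "title", "publisher", "edition")),
    len(index_by(LISTINGS, "title", "publisher", "edition", "variant")),
]
```

         With these four listings, each component added to the key separates
         products that the narrower key had merged.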




Logical Model
     Schema

     Model:
       • Entities as Tables
       • Relationships as Constraints
       • Queries supported through indexes and joins

     Pros and cons:
       + Rich Query Support
       + Built-in support for Relationships
       + Indexes
       - Elasticity
           * Frequent addition/deletion of columns
           * Growing secondary indexes
       - Not optimized for some use-cases
           * Key-Values
           * Data Blobs/Graphs

     Relational Databases:
       * MySQL, Oracle, Postgres et al

Logical Model
         Semi-Schema

         Model:
           • Blobs (Documents) of Data
           • Linkages between Documents
           • Queries supported through document identifiers and
             document references

         Pros and cons:
           + Flexibility: “Documents” are less rigid
           + Query Language to retrieve based on content of “Document”
           - Complex Relationships are non-trivial
           - “Linked” Document Queries may not be optimized

         Document Stores:
           * MongoDB, Couchbase et al
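
         The semi-schema model can be sketched with plain dictionaries
         standing in for a document store. Nothing here uses a real store's
         API; the document ids, the `author_ref` field and the helper
         functions are assumptions made for illustration.

```python
# Documents are keyed by identifier and need not share the same fields.
DOCUMENTS = {
    "author:1": {"type": "author", "name": "Author A"},
    "book:1": {"type": "book", "title": "Sample Title 1", "year": 2012,
               "author_ref": "author:1", "binding": "Paperback"},
    "book:2": {"type": "book", "title": "Sample Title 2", "year": 2010,
               "author_ref": "author:1"},  # no "binding" field: less rigid
}


def get(doc_id):
    """Lookup by document identifier."""
    return DOCUMENTS[doc_id]


def find(**criteria):
    """Content-based query: ids of documents whose fields match all criteria."""
    return [
        doc_id
        for doc_id, doc in DOCUMENTS.items()
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def books_by_author_name(name):
    """A "linked" query: resolve author documents, then follow references."""
    author_ids = set(find(type="author", name=name))
    return [
        doc_id
        for doc_id, doc in DOCUMENTS.items()
        if doc.get("author_ref") in author_ids
    ]


books_2012 = find(type="book", year=2012)
linked = books_by_author_name("Author A")
```

         The linked query takes two passes over the data, which is the slide's
         point that queries across document references may not be optimized.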


Logical Model
         No Schema

         Model:
           • Data Blobs
           • Rules/Relationship definitions
           • Queries supported through data “views”, indexes,
             search based on reverse indexing etc ...

         Pros and cons:
           + Elasticity
               * Variability of data format
               * Secondary Indices
           + Tunable performance
           - Relational data is a force-fit (sub-optimal)
           +/- Querying models are specific to Stores

         Other NoSQL Stores:
           * HBase, Riak, Cassandra, et al
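
         A rough sketch of the schema-less model: opaque blobs stored by key,
         with a reverse index maintained alongside to make them searchable.
         The `BlobStore` class and its methods are hypothetical and do not
         correspond to the API of any store named on the slide.

```python
from collections import defaultdict


class BlobStore:
    """Key-value store with an optional reverse index on chosen fields."""

    def __init__(self):
        self._blobs = {}                   # key -> blob (any dict shape)
        self._reverse = defaultdict(set)   # (field, value) -> keys

    def put(self, key, blob, indexed_fields=()):
        # Note: this sketch does not remove stale index entries on overwrite.
        self._blobs[key] = blob
        for name in indexed_fields:
            if name in blob:
                self._reverse[(name, blob[name])].add(key)

    def get(self, key):
        """Primary access path: direct lookup by key."""
        return self._blobs[key]

    def search(self, name, value):
        """Secondary access path: keys whose indexed field equals value."""
        return sorted(self._reverse.get((name, value), ()))


store = BlobStore()
store.put("b1", {"title": "Sample Title 1", "genre": "Thriller"},
          indexed_fields=("genre",))
store.put("b2", {"title": "Sample Title 2", "genre": "Thriller", "pages": 300},
          indexed_fields=("genre",))
store.put("b3", {"name": "Differently shaped blob"})  # no shared format required

thrillers = store.search("genre", "Thriller")
```

         The store itself knows nothing about the data format; whatever needs
         to be queried has to be indexed explicitly at write time.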


Catalog Data Cluster

  [Diagram: data partitions in the cluster]
    Catalog Data, Biblio Data, Product Data, UGC on Products,
    Compliance Data, Pricing/Accounting, ?

  - “View”/“Data” Partitions
  - Blend multiple data stores
  - Interfaces provide view to the underlying data
  - Scale uniformly for data elements



Data Store Characterization
     • Data characteristics:
           - Reliability (availability and redundancy)
           - Consistency

     • Querying capability
           - Support for indexes
           - Filters; secondary indexes
           - Linkages/relationships

     • Elasticity
           - Increase in scale
           - Evolving catalog definitions

     • SLAs
           - Volumes
           - Throughput
           - Latencies

          Be Comprehensive; be Methodical but be unbounded by
          choices - a Scientist who has a palette of colors in hand !!


Data Store Characterization
    • CAP: which two do we pick? Can the data store be configured
      for any two?

      [Diagram: triangle with vertices C, A and P]


    • Operational ease (monitoring, reporting, config
      mgmt ..)
    • Pluggability with Distributed Computing platforms
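
    One way a store can make this trade-off configurable, offered here as an
    illustration that goes beyond the slide, is quorum replication: data is
    kept on N replicas, a write waits for W acknowledgements and a read
    consults R replicas. The function names below are hypothetical and not
    tied to any product's API.

```python
def quorums_overlap(n, w, r):
    """True when every read quorum shares a replica with every write quorum.

    This is the usual R + W > N condition for quorum-replicated stores.
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def tolerated_failures(n, w, r):
    """Replicas that can be down while both reads and writes still succeed."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return n - max(w, r)


# Two hypothetical settings for a 3-replica cluster.
consistency_leaning = (quorums_overlap(3, 2, 2), tolerated_failures(3, 2, 2))
availability_leaning = (quorums_overlap(3, 1, 1), tolerated_failures(3, 1, 1))
```

    Raising W and R buys overlap between reads and writes at the cost of
    tolerating fewer unavailable replicas; lowering them does the reverse.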


Define Views & Interfaces
      •   Cataloging has multiple use-cases which are business centric
      •   Use-cases evolve; and so do the “views” of the data
      •   “Views” as multiple interpretations of the data
      •   De-coupled from the underlying data
      •   Underlying data form has to be elastic
      •   Overlaid views have to be adaptive

      [Diagram: layered stack, top to bottom]
          View Layer: Precomputed View(s), Dynamic View(s)
          Data Access Interface
          Data 1, Data 2, Data 3, Data 4
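
      The layering on this slide can be sketched as two kinds of view sitting
      on a single data access interface. This is a simplified illustration
      under stated assumptions: the class names, the "biblio" and "pricing"
      sources and the `product_page` view are all hypothetical.

```python
class DataAccess:
    """Single interface over several underlying data sets (here, plain dicts)."""

    def __init__(self, **sources):
        self._sources = sources

    def fetch(self, source, key):
        return self._sources[source].get(key)


class DynamicView:
    """Computed on every request, so it always reflects the current data."""

    def __init__(self, access):
        self._access = access

    def product_page(self, product_id):
        biblio = self._access.fetch("biblio", product_id) or {}
        pricing = self._access.fetch("pricing", product_id) or {}
        return {"title": biblio.get("title"), "price": pricing.get("price")}


class PrecomputedView:
    """Materialized ahead of time; cheap to read, stale until refreshed."""

    def __init__(self, dynamic_view, product_ids):
        self._dynamic = dynamic_view
        self._product_ids = list(product_ids)
        self._pages = {}
        self.refresh()

    def refresh(self):
        self._pages = {
            pid: self._dynamic.product_page(pid) for pid in self._product_ids
        }

    def product_page(self, product_id):
        return self._pages[product_id]


# Placeholder data sets behind the interface.
biblio = {"p1": {"title": "Sample Title"}}
pricing = {"p1": {"price": 250}}
access = DataAccess(biblio=biblio, pricing=pricing)
dynamic = DynamicView(access)
precomputed = PrecomputedView(dynamic, ["p1"])

pricing["p1"]["price"] = 200  # the underlying data changes
fresh_price = dynamic.product_page("p1")["price"]
stale_price = precomputed.product_page("p1")["price"]
precomputed.refresh()
refreshed_price = precomputed.product_page("p1")["price"]
```

      Neither view touches the data sets directly, so a data set can move to
      a different store without changing the views built on top of it.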



Architect for Scale & Performance
     • Identify Usage Patterns
     • Right Tools for Job
     • Right Abstractions
     • Pluggable Solution Stacks
     • Decoupled Data
     • Offline Processing




Measure, Monitor & Evolve
     • SLAs change; the system has to be adaptive
     • Start off with established goals; benchmark and
       meet those initial goals
     • Changes are gradual; plan at the first symptom
     • Listen for system(s) that are not coping
     • Always work towards incremental changes; an entire
       overhaul of the systems will be counterproductive

           Be Curious, have doubts, deeply introspect -
           be the ultimate Scientist !!



Change is constant ... adapt

     • Requirements evolve
     • Business introduces flux
     • Data interpretations grow

     • Be flexible, adaptive, imaginative...
       work as a Scientist who appreciates
       Art !!


Thank you !
                      My Co-ordinates:
                    utkarsh@flipkart.com




