Converting Data to Information
29/04/2011         DataCyte (Pty) Ltd   1
DataCyte Group of Companies

 •     Founded in 1998
 •     Previously known as World Wide Objects
 •     Privately owned and funded
 •     Development done in Pretoria, South Africa
 •     Expanding to create distribution and partner network
 •     Building relationships with ISVs




DataCyte Timeline
     1998 - Product conceptualized; first version developed by late 1999.
     2000 - Lodged patent application
     2001 - Rated 5-10 years ahead of IBM's grid computing initiative by
      DARPA/CSC/Lockheed Martin
            - Awarded United States of America Department of Defense contract
     2002 - Defense contract suspended due to war on terror
     2003 - Returned to South Africa due to declaration of war against “terror”
             - Started delivering healthcare systems to the South African market
     2005 - Returned to the US market with healthcare and hi-tech value proposition
     2006 - Benchmarked data analysis capabilities with Zirmed, proving a 50% reduction in size
      and 10x faster performance
            - Entered into business relationship with Dr Patrick Soon-Shiong of Abraxis
            Biosciences and American Pharmaceutical Partners Inc.
     2008 - A conflict over product direction emerged with Dr Patrick Soon-Shiong, resulting
      in termination of the relationship. All intellectual property rights reverted to
      DataCyte.


DataCyte Timeline cont.
     2008 - Cedars Sinai Cancer and Proteomic Research Unit (UCLA) benchmark
           - DXS Health Care Systems Technology Partnership (www.dxs-systems.com)
           - Trash Can Kids Technology Partnership (www.trashcankidz.com )
           - Electronic Price Labeling Technology Partnership
           - Interactive Television/Phumelela Technology Partnership
           (www.phumelela.com)
     2009 - Established strategic partnership with Health One Global (www.healthoneglobal.com )
           - IR Global Partnership to deliver international roaming at dramatically discounted
           rates and to enable prepaid customers to roam as well.
           - Barlow World Logistics Product Development
           - Re-engaged with the United States Department of Defense through US presence
           - Granted US Patent #7571442




29/04/2011   e-Merchandising (Pty) Ltd t/a Revelation Systems   5
DataCyte Timeline cont.
     2010
 April - Booz Allen Hamilton (www.boozallen.com) presents DataCyte as future data
 solution at the American Association for the Advancement of Science. AAAS (www.aaas.org),
 founded in 1848, publishes Science, which has the largest paid circulation of any
 peer-reviewed general science journal in the world, and is considered one of the global
 authorities in the direction of Science, Engineering and Innovation
 May - Launch Interactive Television, 400 units rolled out in TABS. Prime Media is
 currently finalizing purchase of advertising slots for 12 month period.
 June - Launch of DXS (dxssynergy.com) web-based system in the USA market as part of
 its global rollout. DXS is a global vendor of healthcare-related systems.
 June – Launch of Trash Can Kids (www.trashcankidz.com)
 June – Launch of Process Discovery Product with 2 customers going live this month. The
 system has already been adopted by a large defense manufacturer.
 June - E-Discovery product launched with EM (the largest non-life actuarial consulting
 firm in the UK)




DataCyte Timeline cont.

    2010
 June - Negotiations started with Bytes Technology and its Med-e-mass
 (www.medemass.com) subsidiary to underpin their current suite of management systems
 with a comprehensive EHR solution for the South African market.
 July - Health One Global (www.healthoneglobal.com.au) launches Personal Electronic
 Health Record and Medical Management record in Australia. This launch coincides with the
 launch of the Australian Government unique personal health identifier, and has the support of
 the Australian Automobile Association and the Royal Academy of Physicians as a first step
 towards providing Australians with a health record management service. The Australian government
 has legislated that all citizens must have these records in place by 2013.




Why DataCyte?




Computing Challenge
 •     The “Global Village” has “Global Data”
       •     Boundaries removed
       •     Information flow is more pervasive
 •     Physical Storage
       •     Users store more data than ever before
       •     Little new development in Data Retrieval Systems
 •     Processing
       •     More processing required to retrieve similar data
       •     Little development in Computing Processing Systems
 •     Present Business Tendencies
       •     Swing back to centralized systems
       •     Swing back to thin client




DataCyte Patented Solution
•     Performance not dependent on number of records

•     No single point of vulnerability
      •      No central registry
      •      Information redundantly distributed
      •      RAIS – Redundant Array of Inexpensive Servers

•     Dynamic, Intelligent Information
      •      Contextual 'named' links between data entities
      •      Dynamic data structure
      •      Pervasive Associations

•     Self-managing, Distributed Information Structures
      •      Any Entity must have 'independence of existence'
      •      Entities 'self-aware' of environment

•     Web-enabled with open interface - Apache

Performance Features
•     Access by association
•     Fully distributed storage system
•     DataCyte data storage is 10% of the size of traditional
      systems
•     Sustained data creation at 400 000 cytes per second on
      a standard desktop computer
•     Random data access speed of over 250 000 cytes per
      second on a standard desktop computer
•     Caches up to 25 000 000 cytes in 2 GB of memory
•     Can access any one of 250 000 000 000 cytes in under a
      millisecond
•     Runs on Linux and Windows


DataCyte Technology

Cyte
       • Parent Registry
       • Child Registry
       • BLOB content
           • Any data form
           • Code
               • Lua
               • Others possible
       • Flags
           • Security/Access control
           • Content type, etc
       • Native methods
           • Provided by service
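
The parts of a cyte listed above can be pictured as a small record type. The following is a minimal sketch only, assuming an in-memory model; the class name, field names and the `link_child` helper are illustrative and are not taken from the DataCyte product. Native methods provided by the service are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(eq=False)  # compare by identity, since linked cytes reference each other
class Cyte:
    """Hypothetical in-memory model of a cyte (all names are illustrative)."""
    name: str
    content: bytes = b""                                        # BLOB content: any data form, including code
    flags: Dict[str, str] = field(default_factory=dict)        # e.g. access control, content type
    parents: Dict[str, "Cyte"] = field(default_factory=dict)   # parent registry
    children: Dict[str, "Cyte"] = field(default_factory=dict)  # child registry

    def link_child(self, child: "Cyte") -> None:
        """Record a named association in both registries."""
        self.children[child.name] = child
        child.parents[self.name] = self


patient = Cyte("patient", b"Jane Doe", {"content_type": "text/plain"})
visit = Cyte("visit-2011-04-29")
patient.link_child(visit)
```

Because every association is stored in both registries, a cyte can be reached from its parent or from its child, which is the "access by association" idea in its simplest form.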

Access Models
•    Multiple Logical Models: Data and application layer

         Network        Structured              Containment




Case Studies




Case Studies: DataCyte Pilot/Test Process

Current State Process
    BAH DW (Oracle; steady source, not part of test)
    → DataStage ETL (process timing: 4.5 hrs daily; 6.5 hrs for closings, 2 times a month)
    → PME Data Mart (Oracle, 750 GB)
    → MS SSIS → MS SSAS → MDX Query → MS SSRS Reports

DataCyte Test
    BAH DW (Oracle)
    → DataCyte Extraction & Translation
    → DataCyte Fact Maps
    → Web Service
    → Crystal Reports



Case Studies
•     BAH: Implementation of BI Reporting
                                       Oracle                        DataCyte
      •      Database size             750 GB                        51 GB (6.8%)
      •      Retrieval speeds:
              Indexed Random Access    0.152 secs                    0.008 secs (5.2%)
              Indexed Step Thru        0.963 secs                    0.016 secs (1.66%)
              Unindexed Step Thru      2.515 secs                    10.78 secs (425%)

      •      Hardware Platform:
              ±US$2 000 000                                          US$1 000
              Sun™ Grid Rack 400, SAN                                Low-end desktop VM
              1 x Staging areas, 1 x Cube storage                    2.33 GHz processor, 200 GB HDD

      •      Software:
              ±US$3 000 000                                          ±US$500 000
              ELT Toolkit,                                           DataCyte
              2 x Oracle 11g,
              MS HyperCube

Other Case Studies
•     Proteomic Research Unit
                                       Oracle                        DataCyte
      •      Database size             1.3 TB                        60 GB
      •      Retrieval speeds:         1½ minutes                    < 1 sec
                                       1 - 2 days                    < 11-66 mins
      •      Hardware Platform:        Sun™ Grid Rack of 400         Toshiba laptop
                                       Sun Fire™ x64 servers         1.86 GHz processor
                                                                     7 200 rpm drive


•     UCS SAP database
      •      860 GB in DB2 database; 100 GB in DataCyte
      •      Queries up to 1000 times faster




Applications Developed

 •      Knowledge Management Systems
        o    e-Learning Systems
        o    Interactive TV Management Systems
        o    Medical Information Systems


 •      Health Management – “Single Patient Record”
        •    Practice Management
        •    Clinic Management System
        •    Pathology Laboratory Management
        •    Clinical Trials System
        •    Hospital Management System




Applications Developed
•      Data Warehousing
       o     ETL
       o     “Data Cube”
       o     Lawgistics
       o     Fraud Detection
•      SME Payroll System
•      Process Management Server
       o     Document Tracking Systems
       o     Business Process Modeling
       o     Supply Chain Management System
•      Computational Performance Systems
       o     Biometrics
       o     Proteomic and Genomic Analysis
       o     Shortest Path Routing



DataCyte Benefits

 •     90% reduction in hardware requirements

 •     10 to 1000 times speed improvement

 •     Ability to populate archive/warehouse in real-time

 •     Ability to access archived data faster than the existing
       on-line live system

 •     Extension of life of live systems

 •     Greater security due to ALL history on-line


Contact Details
•     DataCyte (Pty) Ltd
      • 489 Clarence Street        Tel:    +27 12 993 1256
          Waterkloof Glen          Fax:    +27 12 993 2412
          Pretoria
•     Michael F Salomon            CEO
      • Cell:        +27 82 552 5411
•     Peter Salemink               COO
      • Cell:        +27 83 677 2783
•     Daniel Opland                Technology Evangelist
      • Cell:        +27 83 312 5947




Customers
      •      Booz Allen Hamilton Inc

      •      South African Fraud Prevention Service

      •      TrashCanKidz Limited

      •      Broadband Interactive TV System

      •      PayStaffOnline (Pty) Ltd

      •      360 Link-up Limited / EMC Limited




Back-up Slides




Technology Overview
Database Management System
• Access
   • Object
   • SQL
   • Cyte

Etymology: “Cyte”
• Ancient Greek word κύτος (kýtos)
   • Container or Receptacle
   • Human body → part of cell that keeps everything together

Developed in C++
• Runs on Windows and Linux

ODBC, XML and Web Service access

Apache module: mod_dsa
• HTTP(S), FTP, WSDL, SOAP, …
Technology Overview
• Store: any form of data → 'Cytes'
   • Serialized and persisted on creation (more later)
   • Accessed by association in a contextual / stateful manner
       • Collectively form multiple intersecting hierarchies
   • Each Cyte has the potential to form part of a distributed cloud
   • Virtualize disparate data → single federated view
   • Contain application business logic
       • Lua (www.lua.org)

• Lua
   • Powerful, fast, lightweight scripting language
   • Embeddable
   • Lua is widely used:
      • Industrial Applications (Adobe: Photoshop Lightroom)
      • Games (Blizzard: World of Warcraft)
      • Embedded Systems (Ginga, Digital TV in Brazil)
   • Lua Server Pages
      • Tag-based Web applications that dynamically generate
         Web pages
Technology Overview
Execution Layers
• Application Layer
• Data / Engine Layer
Technology Overview

Basic Performance
(1.6 GHz Dual Core, 3 GB RAM, 7 200 rpm drive)

Sustained creation speed
•   400 000 cytes per second
Sequential access speed
•   400 000 cytes per second
Random access speed
•   250 000 cytes per second
Cache
•   25 000 000 cytes in 2Gb memory
Access
•   Any element from 250 billion elements in under a millisecond
Data Structure

Cyte
  • Parent Registry
  • Child Registry
  • BLOB content
      • Any data form
      • Code
          • Lua
          • Others possible
  • Flags
      • Security/Access control
      • Content type, etc
  • Native methods
      • Provided by service
Data Structure




Demonstration: IDE
Data Structure

Logical representations: data & application layer
• Network representation
• Structured representation
• Containment representation




Data Structure

Complexity vs Simplicity
• Simpler → faster learning curve

• Translation layer
     • RDBMS
         • Programmed
         • Maintained
              • Adding features, fixing bugs, improvements
              • Collectively comprise 80% of lifetime cost

    • DataCyte
        • No translation layer
             • Saving: Development (Time and Cost)
        • Integrated into database layer (a la EJB)
             • e.g. Cytes with application logic
Data Structure


Impedance Mismatch (Translation Layer)
• Maintenance and Development (RDBMS)
   • Different mapping → mismatch and integrity violation
        • Subtle issues
        • Difficult to locate (time + money)
   • Lower impedance mismatch in DataCyte
        • No translation layer → natural modelling of data

Architecture: Simple
• Option: logically structure and constrain → RDBMS + ODBC
• Multiple logical views of the same data
   • Facilitates conformance to multiple standards
Discovery



   [Figure: three-level model. External level (Logical Model): views 1 to 4;
   Conceptual level: Conceptual Model; Physical level (Physical Model): stores 1 and 2]

Cytes
→ Logical representation of physical storage
→ Navigational construct
                each navigation → physical disk read
→ Brokered by DataCyte service
Discovery
Each Cyte is addressed as:
       IP address + Disc + File + Position in file
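
A minimal sketch of such an address as a value type follows. The four components come from the slide; the field names and the textual form `host/disc/file@position` are assumptions made for illustration and are not DataCyte's actual format.

```python
from typing import NamedTuple


class CyteAddress(NamedTuple):
    """Hypothetical address record; the four parts follow the slide."""
    host: str      # IP address of the machine holding the cyte
    disc: str      # disc (volume) on that machine
    file: str      # storage file on the disc
    position: int  # offset of the cyte within the file


def format_address(addr: CyteAddress) -> str:
    """Render an address in an assumed 'host/disc/file@position' form."""
    return f"{addr.host}/{addr.disc}/{addr.file}@{addr.position}"


def parse_address(text: str) -> CyteAddress:
    """Inverse of format_address for the same assumed textual form."""
    location, _, position = text.rpartition("@")
    host, disc, file = location.split("/", 2)
    return CyteAddress(host, disc, file, int(position))


addr = CyteAddress("192.168.0.10", "disc0", "store.dat", 4096)
```

Since the address already names the machine, file and offset, locating a cyte needs no lookup in a central registry, which matches the "no central registry" claim made earlier in the deck.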
Discovery

Cytes contain application logic
   • Variables → pointers
       • Access stored data
       • Execute application code




Discovery

Contextual Execution

 Demonstration: IDE



Discovery

• Cyte Discovery
   • Relative Paths
       • in reference to executing Cyte
             • getAge() example

   • Absolute paths
       • in reference to root Cyte
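
Relative and absolute discovery can be sketched with a small tree walk. This is an illustration under stated assumptions: the `Node` class, the `/`-separated path syntax with `..` for the parent, and the `get_age` helper are hypothetical stand-ins for the getAge() example, not DataCyte's query language.

```python
class Node:
    """Hypothetical stand-in for a cyte with one parent and named children."""

    def __init__(self, name, value=None, parent=None):
        self.name, self.value, self.parent = name, value, parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def root_of(node):
    """Walk parent links until the root is reached."""
    while node.parent is not None:
        node = node.parent
    return node


def resolve(current, path):
    """Resolve a path relative to `current`, or from the root if it starts with '/'."""
    node = root_of(current) if path.startswith("/") else current
    for step in path.split("/"):
        if step in ("", "."):
            continue
        node = node.parent if step == ".." else node.children[step]
    return node


def get_age(person, current_year):
    """Relative discovery: look up 'birth_year' beneath the executing node."""
    return current_year - resolve(person, "birth_year").value


root = Node("root")
people = Node("people", parent=root)
jane = Node("jane", parent=people)
Node("birth_year", 1980, parent=jane)
```

The same `get_age` logic works for any person node because the path is interpreted relative to the node it runs in; an absolute path such as `/people/jane/birth_year` always starts from the root.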




Query Approach

• Types of Queries
   • Without indexes
        • each record is checked in turn
   • Indexed
        • filtered records

• Query approach (same as RDBMS)
      Know what you are looking for
            AND
      where you want to look for it

• Query Steps:

   STEP 1: IDENTIFY RESULT SET
   STEP 2: POPULATE RESULT SET
Query Approach
• At time of query
   RESULT SET IDENTIFICATION
      • Improved indexes

   RESULT SET POPULATION (Compound)
      • Traverse logical layer (minimal reads)
      • Context = Stateful Results
                   vs
      • Additional external lookups




    Additional external lookups vs navigating the logical structure: O(n) → O(1)
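
The O(n) → O(1) contrast can be illustrated with a toy comparison. This is a sketch, assuming that an "external lookup" without an index means checking records in turn and that "navigating the logical structure" means following a stored named link; the function and class names are hypothetical.

```python
def lookup_by_scan(records, key):
    """External lookup without an index: check each record in turn, O(n)."""
    steps = 0
    for record in records:
        steps += 1
        if record["id"] == key:
            return record, steps
    return None, steps


class Linked:
    """Record that holds direct named links to its related records."""

    def __init__(self, payload):
        self.payload = payload
        self.links = {}


def follow(node, name):
    """Navigation: one dictionary access, independent of the total record count."""
    return node.links[name]


records = [{"id": i} for i in range(1000)]
order = Linked("order-17")
order.links["customer"] = Linked("customer-42")
```

Finding the last of the 1000 records by scanning takes 1000 comparisons, while `follow(order, "customer")` costs the same single step however many records exist.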
Query Approach

Define: Contextual / Stateful Results
Query Approach

Addressing schema
   • Defines context of access
   • Cyte → Unique ID within local file system
       • Offset within file
       • Cytes simply exist within file system
            • No global registry

Multiple Contexts = Multiple Addresses




 Address = Context = Chain of IDs (named)
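
The idea that one cyte has several addresses, one per context, can be sketched by listing every chain of named links that reaches it. The graph layout, the node labels and the `addresses` function are assumptions for illustration, and the sketch assumes the link graph has no cycles.

```python
def addresses(graph, root, target):
    """Return every chain of link names leading from root to target.

    `graph` maps a node to {link_name: child_node}; assumed acyclic.
    """
    found = []

    def walk(node, chain):
        if node == target:
            found.append("/".join(chain))
        for name, child in graph.get(node, {}).items():
            walk(child, chain + [name])

    walk(root, [])
    return found


# Hypothetical example: the same patient is reachable in two contexts.
graph = {
    "root": {"patients": "P", "doctors": "D"},
    "P": {"jane": "J"},
    "D": {"smith": "S"},
    "S": {"patient_jane": "J"},
}
```

Here `addresses(graph, "root", "J")` yields two chains, `patients/jane` and `doctors/smith/patient_jane`: the same data, addressed through two different contexts.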
Query Approach

• Query Language
   • Show of hands: SQL users
   • Similar to XPath
       • Parent or child Cytes (multiple criteria)
   • SQL Interface
       • Cytes that conform to relational model


• Lower complexity of architecture
   • More natural language
   • Steeper learning curve (learn more faster)
Performance & Scalability

• Sub-linear Performance Degradation
    • Logical Layer → Directed Searches
    • Example: Geo-spatial modelling

• Instantiation
    • Full control over level
    • No class hierarchy

• Multiple Logical Structures
   • Same data, different context
    • Multi-dimensional searches → single dimension
Performance & Scalability
• OLTP
   • Architecture marries Structured + Networking paradigms
   • Container Topology
        • Allows extensible heterogeneous data structuring
   • 3-state versioning protocol
        • Balance: performance and integrity

• Data Footprint
   • Encoding and Compressing on storage

• No Intermediate link tables




 [Figure: Products ↔ Intermediate Table ↔ Ingredients]
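
In a relational schema, a many-to-many relationship such as products and ingredients is normally held in an intermediate (junction) table. The sketch below shows the alternative the slide points to, with each side holding its associations directly. It is an illustration only; the `AssociationStore` class and its attribute names are hypothetical.

```python
from collections import defaultdict


class AssociationStore:
    """Many-to-many links kept on both sides, with no junction table."""

    def __init__(self):
        self.ingredients_of = defaultdict(set)   # product -> its ingredients
        self.products_using = defaultdict(set)   # ingredient -> products that use it

    def link(self, product, ingredient):
        """Record the association in both directions."""
        self.ingredients_of[product].add(ingredient)
        self.products_using[ingredient].add(product)


store = AssociationStore()
store.link("bread", "flour")
store.link("bread", "yeast")
store.link("cake", "flour")
```

Either side can now be queried in one step: `store.ingredients_of["bread"]` and `store.products_using["flour"]` both return their related items without a join through a third structure.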
Performance & Scalability
•   Proof of Concept: 2008
•   Cancer research hospital (Los Angeles)
•   Considerable funding
•   A proteomic analysis problem – blood analysis study
     • Data mining to search for cancer markers
     • 50 data samples
     • 250 billion data elements
     • 1.3 Tb in Oracle

• Results are from the same data set and same queries
                                 Cancer Center            DataCyte
       Single criteria queries   1½ minutes               < 1 sec
       Complex queries           ± 1 – 2 days             < 11-66 mins
       Hardware                  Sun™ Grid Rack 400,      Laptop
                                 Sun Fire™ x64 servers
       Data footprint            1.3 TB                   60 GB
Data Storage

o Serialized
o Encoded
o Compressed
o Pages
o Caching Stack
o Data Distance

[Figure: storage states. Open → (Encode) → Encoded → (Compress) → Encoded & Compressed; the reverse path is Decompress, then Decode.]
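
The serialize, encode and compress steps can be sketched as a round trip. This is a stand-in, not DataCyte's storage format: JSON is assumed for serialization, UTF-8 for encoding and zlib for compression, and the function names `store` and `load` are hypothetical.

```python
import json
import zlib


def store(obj):
    """Serialize, encode, then compress an object for storage."""
    serialized = json.dumps(obj, sort_keys=True)   # serialize to text
    encoded = serialized.encode("utf-8")           # encode to bytes
    return zlib.compress(encoded)                  # compress the bytes


def load(blob):
    """Reverse the pipeline: decompress, decode, then deserialize."""
    encoded = zlib.decompress(blob)
    return json.loads(encoded.decode("utf-8"))


record = {"sample": 7, "readings": [0] * 1000}
blob = store(record)
```

Repetitive data such as the readings above compresses to a small fraction of its encoded size, and `load(blob)` returns an object equal to the original.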
Data Storage
o   Leaf nodes
o   Stack management
o   Partial Decoding
o   Data Management
Performance
Four Spheres of Influence during Design
Data Storage
•   Enterprise Cloud Storage
•   Soft RAIS using commodity hardware
•   RAIS provides soft parallelized, grid computing
•   Soft RAIS enables redundant distribution of cytes
•   Granular scalability and full sharability of resources
•   Elastic auto provision of service and resources
•   Unified access to data through multiple data models
•   New programming approaches, unconfined by old designs and
    existing programming languages, to tackle the new data flood
• Green
    • Footprint
    • Power usage – running, cooling and start-up
Data Storage

• External Data Sources
   • Lua add-on libraries: LuaCOM and ADOLua

• Access: Data Services
   • MSSQL          NCLI           (Native Client Interface)
   • DB2            OLEDB          (Object Linking and Embedding, Database)
   • Oracle         OLEDB          (Object Linking and Embedding, Database)
Security
• Security implemented by the service on the cyte level
• Domain-based, inclusion, exclusion
• Cyte-to-cyte communication is encrypted
• Redundant distribution of cytes provides additional
  security
• Contextual access provides further flexibility for
  security
    • Child and parent presentation
• Hardware encryption of storage is preferable
• Cyte granularity enables
    • Blind security information retaining associations
    • Cleansed health data with relationships
• DataCyte can integrate with existing authentication /
  authorization systems (LDAP, Active Directory)
Disaster Recovery
• Transaction-based with roll-back
• All transactions are Atomic, Consistent, Isolated and
  Durable (ACID)
• 3-state versioning protocol
   • My old
   • My new
   • Yours
   • Provides fine-grained control
• Balance between performance and integrity
  mitigation
• Each service can partially recover from physical loss
• Redundancy could provide complete recovery
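
The slides do not spell out how the three states are used. One possible reading, sketched below purely as an assumption, is an optimistic commit check: "my old" is the version a writer read, "my new" is the writer's change, and "yours" is the version currently stored. The `commit` function is hypothetical and is not the DataCyte protocol.

```python
def commit(my_old, my_new, yours):
    """Decide what to store given the three versions (assumed interpretation)."""
    if yours == my_old:
        # Nobody else changed the value since it was read: accept my change.
        return my_new
    if yours == my_new:
        # Another writer already stored the same change: nothing to do.
        return yours
    # The stored value moved on in a different direction: refuse and roll back.
    raise RuntimeError("conflict: roll back and retry")
```

Under this reading, integrity is kept because a stale writer is rejected, and performance is kept because no lock is held between the read and the write.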

 

DataCyte - The Future of Data Storage & Retrieval

  • 1. Converting Data to Information 29/04/2011 DataCyte (Pty) Ltd 1
  • 2. DataCyte Group of Companies • Founded in 1998 • Previously known as World Wide Objects • Privately owned and funded • Development done in Pretoria, South Africa • Expanding to create a distribution and partner network • Building relationships with ISVs
  • 3. DataCyte Timeline • 1998 - Product conceptualized; first version developed by late 1999 • 2000 - Patent application lodged • 2001 - Rated 5-10 years ahead of the IBM grid computing initiative by DARPA/CSC/Lockheed Martin - Awarded a United States Department of Defense contract • 2002 - Defense contract suspended due to the war on terror • 2003 - Returned to South Africa due to the declaration of war against “terror” - Started delivering healthcare systems to the South African market • 2005 - Returned to the US market with a healthcare and hi-tech value proposition • 2006 - Benchmarked data analysis capabilities with Zirmed, proving a 50% reduction in size and a 10x speed-up - Entered into a business relationship with Dr Patrick Soon-Shiong of Abraxis Biosciences and American Pharmaceutical Partners Inc. • 2008 - A conflict over product direction emerged with Dr Patrick Soon-Shiong, resulting in termination of the relationship. All intellectual property rights reverted to DataCyte.
  • 4. DataCyte Timeline cont. • 2008 - Cedars Sinai Cancer and Proteomic Research Unit (UCLA) benchmark - DXS Health Care Systems Technology Partnership (www.dxs-systems.com) - Trash Can Kids Technology Partnership (www.trashcankidz.com) - Electronic Price Labeling Technology Partnership - Interactive Television/Phumelela Technology Partnership (www.phumelela.com) • 2009 - Established strategic partnership with Health One Global (www.healthoneglobal.com) - IR Global partnership to deliver international roaming at dramatically discounted rates and to enable prepaid customers to roam as well - Barlow World Logistics product development - Re-engaged with the United States Department of Defense through US presence - Granted US Patent #7571442
  • 5. 29/04/2011 e-Merchandising (Pty) Ltd t/a Revelation Systems 5
  • 6. DataCyte Timeline cont. • 2010 April - Booz Allen Hamilton (www.boozallen.com) presents DataCyte as a future data solution at the American Association for the Advancement of Science. AAAS (www.aaas.org), founded in 1848, publishes the peer-reviewed general science journal with the largest paid circulation in the world and is considered one of the global authorities on the direction of science, engineering and innovation. May - Launch of Interactive Television, with 400 units rolled out in TABS. Prime Media is currently finalizing the purchase of advertising slots for a 12-month period. June - Launch of the DXS (dxssynergy.com) web-based system in the USA market as part of its global rollout. DXS is a global vendor of healthcare-related systems. June - Launch of Trash Can Kids (www.trashcankidz.com) June - Launch of the Process Discovery product, with 2 customers going live this month. The system has already been adopted by a large defense manufacturer. June - E-Discovery product launched with EM (the largest non-life actuarial consulting firm in the UK)
  • 7. DataCyte Timeline cont. • 2010 June - Negotiations started with Bytes Technology and its Med-e-mass (www.medemass.com) subsidiary to underpin their current suite of management systems with a comprehensive EHR solution for the South African market. July - Health One Global (www.healthoneglobal.com.au) launches its Personal Electronic Health Record and Medical Management record in Australia. This launch coincides with the launch of the Australian Government's unique personal health identifier, and has the support of the Australian Automobile Association and the Royal Academy of Physicians as a first step towards providing Australians with a health record management service. The Australian government has legislated that all citizens must have these records in place by 2013.
  • 8. Why DataCyte?
  • 9. Computing Challenge • The “Global Village” has “Global Data” • Boundaries removed • Information flow is more pervasive • Physical Storage • Users store more data than ever before • Little new development in Data Retrieval Systems • Processing • More processing required to retrieve similar data • Little development in Computing Processing Systems • Present Business Tendencies • Swing back to centralized systems • Swing back to thin client
  • 10. DataCyte Patented Solution • Performance not dependent on number of records • No single point of vulnerability • No central registry • Information redundantly distributed • RAIS - Redundant Array of Inexpensive Servers • Dynamic, Intelligent Information • Contextual 'named' links between data entities • Dynamic data structure • Pervasive Associations • Self-managing, Distributed Information Structures • Any Entity must have 'independence of existence' • Entities 'self-aware' of environment • Web-enabled with open interface - Apache
  • 11. Performance Features • Access by association • Fully distributed storage system • DataCyte data storage is 10% of the size of traditional systems • Sustainable data creation at 400 000 cytes per second on a standard desktop computer • Random data access speed of over 250 000 cytes per second on a standard desktop computer • Caches up to 25 000 000 cytes in 2 GB of memory • Can access any one of 250 000 000 000 cytes in under a millisecond • Runs on Linux and Windows
  • 12. DataCyte Technology Cyte • Parent Registry • Child Registry • BLOB content • Any data form • Code • Lua • Others possible • Flags • Security/Access control • Content type, etc • Native methods • Provided by service
  • 13. Access Models • Multiple logical models (data and application layer): Network, Structured, Containment
  • 14. Case Studies
  • 15. Case Studies: DataCyte Pilot/Test Process (diagram) • Current state process (not part of test, steady source): BAH DW on Oracle (750 GB), DataStage ETL, MS SSIS, MS SSAS, MS SSRS, MDX query, PME data mart, reports • Process timing: 4.5 hrs daily; 6.5 hrs for closings (2 times a month) • DataCyte test: BAH DW on Oracle, DataCyte extraction & translation, DataCyte service, web, Crystal Reports, fact maps
  • 16. Case Studies • BAH: Implementation of BI Reporting (Oracle vs DataCyte) • Database size: 750 GB vs 51 GB (6.8%) • Retrieval speeds: Indexed Random Access 0.152 secs vs 0.008 secs (5.2%); Indexed Step Thru 0.963 secs vs 0.016 secs (1.66%); Unindexed Step Thru 2.515 secs vs 10.78 secs (425%) • Hardware platform: ±US$2 000 000 (Sun™ Grid Rack 400, SAN, 1 x staging areas, 1 x cube storage) vs US$1 000 (low-end desktop VM, 2.33 GHz processor, 200 GB HDD) • Software: ±US$3 000 000 (ELT Toolkit, 2 x Oracle 11g, MS HyperCube) vs ±US$500 000 (DataCyte)
  • 17. Other Case Studies • Proteomic Research Unit • Database size: 1.3 TB in Oracle vs 60 GB in DataCyte • Retrieval speeds: 1½ minutes vs < 1 sec in DataCyte; 1 - 2 days vs < 11-66 mins in DataCyte • Hardware platform: Sun™ Grid Rack of 400 Sun Fire™ x64 servers vs Toshiba laptop (1.86 GHz processor, 7 200 rpm drive) • UCS SAP database • 860 GB in DB2 database vs 100 GB in DataCyte • Queries up to 1000 times faster
  • 18. Applications Developed • Knowledge Management Systems o e-Learning Systems o Interactive TV Management Systems o Medical Information Systems • Health Management - “Single Patient Record” • Practice Management • Clinic Management System • Pathology Laboratory Management • Clinical Trials System • Hospital Management System
  • 19. Applications Developed • Data Warehousing o ETL o “Data Cube” o Lawgistics o Fraud Detection • SME Payroll System • Process Management Server o Document Tracking Systems o Business Process Modeling o Supply Chain Management System • Computational Performance Systems o Biometrics o Proteomic and Genomic Analysis o Shortest Path Routing
  • 20. DataCyte Benefits • 90% reduction in hardware requirements • 10 to 1000 times speed improvement • Ability to populate archive/warehouse in real time • Ability to access archived data faster than the existing on-line live system • Extension of life of live systems • Greater security due to ALL history being on-line
  • 21. Contact Details • DataCyte (Pty) Ltd • 489 Clarence Street, Waterkloof Glen, Pretoria • Tel: +27 12 993 1256 • Fax: +27 12 993 2412 • Michael F Salomon, CEO • Cell: +27 82 552 5411 • Peter Salemink, COO • Cell: +27 83 677 2783 • Daniel Opland, Technology Evangelist • Cell: +27 83 312 5947
  • 22. Customers • Booz Allen Hamilton Inc • South African Fraud Prevention Service • TrashCanKidz Limited • Broadband Interactive TV System • PayStaffOnline (Pty) Ltd • 360 Link-up Limited / EMC Limited
  • 23. Back-up Slides
  • 24. Technology Overview Database Management System • Access • Object • SQL • Cyte Etymology: “Cyte” • Ancient Greek word κύτος (kýtos) • Container or Receptacle • Human body → part of cell that keeps everything together Developed in C++ • Runs on Windows and Linux ODBC, XML and Web Service access Apache module: mod_dsa • HTTP(S), FTP, WSDL, SOAP, …
  • 25. Technology Overview • Store: any form of data → 'Cytes' • Serialized and persisted on creation (more later) • Accessed by association in a contextual / stateful manner • Collectively form multiple intersecting hierarchies • Each Cyte has the potential to form part of a distributed cloud • Virtualize disparate data → single federated view • Contain application business logic • Lua (www.lua.org) • Lua • Powerful, fast, lightweight scripting language • Embeddable • Lua is widely used: • Industrial Applications (Adobe: Photoshop Lightroom) • Games (Blizzard: World of Warcraft) • Embedded Systems (Ginga, Digital TV in Brazil) • Lua Server Pages • Tag-based Web applications that dynamically generate Web pages
  • 26. Technology Overview Execution Layers • Application Layer • Data / Engine Layer
  • 27. Technology Overview Basic Performance (1.6 GHz dual core, 3 GB RAM, 7 200 rpm drive) Sustained creation speed • 400 000 cytes per second Sequential access speed • 400 000 cytes per second Random access speed • 250 000 cytes per second Cache • 25 000 000 cytes in 2 GB of memory Access • Any element from 250 billion elements in under a millisecond
  • 28. Data Structure Cyte • Parent Registry • Child Registry • BLOB content • Any data form • Code • Lua • Others possible • Flags • Security/Access control • Content type, etc • Native methods • Provided by service
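The fields listed on this slide can be pictured as a small record type. The following is only an illustrative Python sketch, not DataCyte's actual layout: the class name `Cyte`, the use of dictionaries keyed by link name for the two registries, and the `link_child` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/child links
class Cyte:
    """Hypothetical in-memory model of the fields named on the slide."""
    content: bytes = b""                                       # BLOB content: any data form
    code: Optional[str] = None                                 # e.g. Lua source kept as text
    flags: Dict[str, str] = field(default_factory=dict)        # access control, content type, etc.
    parents: Dict[str, "Cyte"] = field(default_factory=dict)   # parent registry (named links)
    children: Dict[str, "Cyte"] = field(default_factory=dict)  # child registry (named links)

    def link_child(self, name: str, child: "Cyte", back_name: str) -> None:
        """Register a named link in both directions (assumed behaviour)."""
        self.children[name] = child
        child.parents[back_name] = self


patient = Cyte(flags={"content_type": "record"})
age = Cyte(content=b"42", flags={"content_type": "text/plain"})
patient.link_child("age", age, back_name="patient")
```

Keeping both registries on every node is what lets the same cyte be reached from several contexts without a central registry, which is the property the slides emphasise.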
  • 31. Data Structure Logical representations: data & application layer • Network representation • Structured representation • Containment representation Network Structured Containment
  • 32. Data Structure Complexity vs Simplicity • Simpler → faster learning curve • Translation layer • RDBMS • Programmed • Maintained • Adding features, fixing bugs, improvements • Collectively comprise 80% of lifetime cost • DataCyte • No translation layer • Saving: Development (Time and Cost) • Integrated into database layer (a la EJB) • e.g. Cytes with application logic
  • 33. Data Structure Impedance of Mismatch (Translation Layer) • Maintenance and Development (RDBMS) • Different mapping → mismatch and integrity violation • Subtle Issues • Difficult to locate (time + money) • Lower impedance of mismatch in DataCyte • No translation layer → natural modelling of data Architecture: Simple • Option: logically structure and constrain → RDBMS + ODBC • Multiple logical views of the same data • Facilitates conformance to multiple standards
  • 34. Discovery (diagram: logical model with external, conceptual and physical levels) • Cytes → Logical representation of physical storage → Navigational construct • Each navigation → physical disk read → Brokered by DataCyte service
  • 35. Discovery Each Cyte is addressed as: IP address + Disc + File + Position in file
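The four-part address on this slide can be modelled as a simple tuple. This is a minimal Python sketch under stated assumptions: the type name `CyteAddress`, the field names, and the `|`-separated text form are invented for illustration and say nothing about DataCyte's real wire or disk format.

```python
from typing import NamedTuple


class CyteAddress(NamedTuple):
    """Hypothetical address: IP address + disc + file + position in file."""
    ip: str
    disc: str
    file: str
    position: int  # byte offset within the file

    def encode(self) -> str:
        # Assumed text form; the parts must not themselves contain "|".
        return f"{self.ip}|{self.disc}|{self.file}|{self.position}"

    @classmethod
    def decode(cls, text: str) -> "CyteAddress":
        ip, disc, file, position = text.split("|")
        return cls(ip, disc, file, int(position))


addr = CyteAddress("192.0.2.10", "disc0", "cytes.dat", 4096)
```

Because the address already names the host, the file and the offset, following a link needs no lookup in a global registry, only a read at that position.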
  • 36. Discovery Cytes contain application logic • Variables → pointers • Access stored data • Execute application code
  • 38. Discovery • Cyte Discovery • Relative Paths • in reference to executing Cyte • getAge() example • Absolute paths • in reference to root Cyte
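The difference between relative and absolute discovery can be sketched with a small tree walker. This Python example is an assumption-laden illustration: the `Node` class, the `/`-separated path syntax with `..` for "parent", and the `get_age` function (loosely echoing the slide's getAge() example) are all hypothetical, and each node has a single parent here for simplicity, whereas the slides describe cytes that can sit in several hierarchies.

```python
class Node:
    """Hypothetical stand-in for a cyte with named children and one parent."""

    def __init__(self, name, value=None, parent=None):
        self.name, self.value, self.parent = name, value, parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def resolve(current, path):
    """Absolute paths start at the root; relative paths start at `current`."""
    node = current.root() if path.startswith("/") else current
    for part in path.split("/"):
        if part in ("", "."):
            continue
        node = node.parent if part == ".." else node.children[part]
    return node


root = Node("root")
patient = Node("patient", parent=root)
Node("birth_year", 1980, parent=patient)
age_logic = Node("getAge", parent=patient)  # the "executing" node


def get_age(executing, this_year):
    # Relative lookup: step up to the patient, then down to birth_year.
    return this_year - resolve(executing, "../birth_year").value
```

A relative path keeps the logic reusable for any patient it is attached to, while an absolute path such as `/patient/birth_year` always reaches the same node regardless of where the code runs.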
  • 40. Query Approach • Types of Queries • Without indexes • each record is checked in turn • Indexed • filtered records • Query approach (same as RDBMS): know what you are looking for AND where you want to look for it • Query Steps: STEP 1: IDENTIFY RESULT SET • STEP 2: POPULATE RESULT SET
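The two query types and the two query steps can be illustrated generically. The Python sketch below is not DataCyte code: the sample records, the function names, and the dictionary-based index are assumptions chosen to show an unindexed scan next to an indexed identify-then-populate query.

```python
from collections import defaultdict

records = [
    {"id": 1, "city": "Pretoria"},
    {"id": 2, "city": "Durban"},
    {"id": 3, "city": "Pretoria"},
]


def scan(rows, key, value):
    """Without an index: every record is checked in turn."""
    return [row for row in rows if row[key] == value]


def build_index(rows, key):
    """Map each value of `key` to the positions of the matching records."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[key]].append(position)
    return index


def indexed_query(rows, index, value):
    positions = index.get(value, [])          # step 1: identify the result set
    return [rows[p] for p in positions]       # step 2: populate the result set


city_index = build_index(records, "city")
```

Both functions return the same rows; the indexed version simply skips the records that cannot match, which is why the case-study figures differ so sharply between indexed and unindexed access.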
  • 41. Query Approach • At time of query • Result set identification: improved (compound) indexes • Result set population: traverse logical layer (minimal reads) • Context = stateful results vs additional external lookups • Additional external lookups O(n) → navigate through logical structure O(1)
  • 42. Query Approach Define: Contextual / Stateful Results
  • 43. Query Approach Define: Contextual / Stateful Results
  • 44. Query Approach Addressing schema • Defines context of access • Cyte → Unique ID within local file system • Offset within file • Cytes simply exist within file system • No global registry Multiple Contexts = Multiple Addresses Address = Context = Chain of IDs (named)
  • 45. Query Approach • Query Language • Show of hands: SQL users • Similar to XPath • Parent or child Cytes (multiple criteria) • SQL Interface • Cytes that conform to relational model • Lower complexity of architecture • More natural language • Steeper learning curve (learn more faster)
  • 46. Performance & Scalability • Sub-linear Performance Degradation • Logical Layer → Directed Searches • Example: Geo-spatial modelling • Instantiation • Full control over level • No class hierarchy • Multiple Logical Structures • Same data, different context • Multi-dimensional searches → single dimension
  • 47. Performance & Scalability • OLTP • Architecture marries Structured + Networking paradigms • Container Topology • Allows extensible heterogeneous data structuring • 3-state versioning protocol • Balance: performance and integrity • Data Footprint • Encoding and compressing on storage • No intermediate link tables (diagram: Products, intermediate table, Ingredients)
  • 48. Performance & Scalability • Proof of Concept: 2008 • Cancer research hospital (Los Angeles) • Considerable funding • A proteomic analysis problem - blood analysis study • Data mining to search for cancer markers • 50 data samples • 250 billion data elements • 1.3 TB in Oracle • Results are from the same data set and same queries (Cancer Center vs DataCyte) • Single criteria queries: 1½ minutes vs < 1 sec • Complex queries: ± 1 - 2 days vs < 11-66 mins • Hardware: Sun™ Grid Rack of 400 Sun Fire™ x64 servers vs laptop • Data footprint: 1.3 TB vs 60 GB
  • 49. Performance & Scalability • No intermediate link tables (diagram: Products, intermediate table, Ingredients)
  • 50. Query Approach Define: Contextual / Stateful Results
  • 51. Query Approach Define: Contextual / Stateful Results
  • 52. Data Storage o Serialized o Encoded o Compressed o Pages o Caching Stack o Data Distance (diagram: Open ⇄ Encoded via Encode / Decode; Encoded ⇄ Encoded & Compressed via Compress / Decompress)
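The serialize, encode and compress stages named on this slide can be mimicked with standard-library pieces. This Python sketch is only an analogy: the slides do not say which serialization, encoding or compression DataCyte uses, so JSON, UTF-8 and zlib are stand-ins, and the function names `store` and `load` are hypothetical.

```python
import json
import zlib


def store(obj):
    """Open -> encoded -> encoded & compressed (stand-in formats)."""
    serialized = json.dumps(obj, sort_keys=True)   # serialize to text
    encoded = serialized.encode("utf-8")           # encode to bytes
    return zlib.compress(encoded)                  # compress for storage


def load(blob):
    """Reverse the pipeline: decompress, decode, deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


sample = {"markers": ["marker-a"] * 1000}
blob = store(sample)
raw_size = len(json.dumps(sample, sort_keys=True).encode("utf-8"))
```

Repetitive data such as `sample` compresses to a small fraction of its raw size, which gives a feel for how encoding plus compression could contribute to the footprint reductions quoted in the case studies.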
  • 53. Data Storage o Leaf nodes o Stack management o Partial Decoding o Data Management
  • 54. Performance Four Spheres of Influence during Design
  • 55. Data Storage • Enterprise Cloud Storage • Soft RAIS using commodity hardware • RAIS provides soft parallelized grid computing • Soft RAIS enables redundant distribution of cytes • Granular scalability and full shareability of resources • Elastic auto-provisioning of service and resources • Unified access to data through multiple data models • New programming approaches, unconfined by old designs and existing programming languages, to tackle the new data flood • Green • Footprint • Power usage - running, cooling and start-up
  • 56. Data Storage • External Data Sources • Lua add-on libraries: LuaCOM and ADOLua • Access: Data Services • MSSQL NCLI (Native Client Interface) • DB2 OLE DB (Object Linking and Embedding, Database) • Oracle OLE DB (Object Linking and Embedding, Database)
  • 57. Security • Security implemented by the service on the cyte level • Domain-based, inclusion, exclusion • Cyte-to-cyte communication is encrypted • Redundant distribution of cytes provides additional security • Contextual access provides further flexibility for security • Child and parent presentation • Hardware encryption of storage is preferable • Cyte granularity enables • Blind security information retaining associations • Cleansed health data with relationships • DataCyte can integrate with existing authentication / authorization systems (LDAP, Active Directory)
  • 58. Disaster Recovery • Transaction-based with roll-back • All transactions are Atomic, Consistent, Isolated and Durable (ACID) • 3-state versioning protocol • My old • My new • Yours • Provides fine-grained control • Balance between performance and integrity • Each service can partially recover from physical loss • Redundancy could provide complete recovery