STORAGE CHARACTERISTICS
   OF CALL DATA RECORDS
IN COLUMN STORE DATABASES
           DAVID M WALKER
DATA MANAGEMENT & WAREHOUSING
OVERVIEW

 •  This presentation gives a brief overview of the
    storage characteristics of Call Data Records in
    Column Store Databases
 •  It discusses
     •  What are Call Data Records (CDRs)?
     •  What is a Column Store Database?
     •  How efficient is a column store database for storing CDR
        and other (similar) machine generated data?
 •  It does not:
     •  Examine performance in any detail
     •  Compare column store databases to traditional row-based databases

Jan 2012                © 2012 Data Management & Warehousing       2
WHAT ARE CALL DATA RECORDS (CDRs)?
 •  Every time a telephone call is made data about
    that call is recorded. At its most basic this will
    include:
     •     The Calling Number (who made the call)
     •     The Called Number (who was called)
     •     The Start Time
     •     The End Time (or the duration)
     •     Various pieces of technical information (which network
           switch was used, the mobile handset identifier, the call
           direction, whether it is a free x800-type call, etc.)
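
As a rough illustration, the basic fields listed above could be modelled as follows. The class name, field names and sample values are assumptions made for this sketch; real CDR layouts vary by switch manufacturer and billing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BasicCDR:
    """Hypothetical minimal CDR; names are illustrative, not a real format."""
    calling_number: str   # who made the call
    called_number: str    # who was called
    start_time: datetime
    end_time: datetime
    switch_id: str        # one example of the technical information
    direction: str        # e.g. "IN" or "OUT" (assumed coding)

    @property
    def duration(self) -> timedelta:
        # A format may hold the duration instead of the end time;
        # either one can be derived from the other.
        return self.end_time - self.start_time


cdr = BasicCDR(
    calling_number="44118000001",
    called_number="44799000002",
    start_time=datetime(2012, 1, 15, 9, 30, 0),
    end_time=datetime(2012, 1, 15, 9, 32, 30),
    switch_id="SW01",
    direction="OUT",
)
print(cdr.duration)
```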




CDRs AT MULTIPLE LEVELS

 •  A CDR is created at the switch. Each switch
    involved in a call creates its own CDRs; these are
    often called Network CDRs

 •  The Network CDRs are joined together into a single
    end-to-end call record through a process
    known as mediation. These are Unrated CDRs

 •  Finally the cost of the call is calculated and added
    to the Unrated CDRs to create Rated CDRs
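
The three levels can be pictured with a deliberately simplified sketch. The record layout, the `call_id` key used to join records and the flat per-second tariff are all assumptions for illustration; real mediation and rating systems are far more involved.

```python
from collections import defaultdict

# Hypothetical Network CDRs: one per switch involved in a call.
network_cdrs = [
    {"call_id": "c1", "switch": "SW01", "start": 100, "end": 160},
    {"call_id": "c1", "switch": "SW02", "start": 101, "end": 161},
    {"call_id": "c2", "switch": "SW01", "start": 200, "end": 230},
]


def mediate(records):
    """Join per-switch records into one unrated record per call."""
    by_call = defaultdict(list)
    for rec in records:
        by_call[rec["call_id"]].append(rec)
    unrated = []
    for call_id, parts in sorted(by_call.items()):
        unrated.append({
            "call_id": call_id,
            "start": min(p["start"] for p in parts),
            "end": max(p["end"] for p in parts),
            "switches": [p["switch"] for p in parts],
        })
    return unrated


def rate(unrated, price_per_second=0.01):
    """Add a cost to each unrated record (the flat tariff is an assumption)."""
    return [
        {**rec, "cost": round((rec["end"] - rec["start"]) * price_per_second, 2)}
        for rec in unrated
    ]


rated_cdrs = rate(mediate(network_cdrs))
```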


MORE CDR COMPLEXITY

 •  There are CDRs that are used for billing the subscriber,
    often called Retail CDRs

 •  There are also CDRs that are used to charge other
    operators when their calls travel over your network (e.g.
    when you make a mobile call that finishes on a land line
    from another operator). These are known as Interconnect
    CDRs or Wholesale CDRs

 •  There are also differences between Mobile and Fixed
    (Land) Line CDRs

 •  Finally, each Switch Manufacturer (there are over 60)
    and each Mediation and/or Billing system (again at least
    50) uses its own format

FOR THIS EXERCISE …

 •  We are using Mobile Rated Interconnect CDRs from a
    European Telephone Company (Telco)

 •  We have 12,902 files, containing 435,242,447 CDRs
    over a 181 day period from 482,883 subscribers

 •  Each CDR has 80 fields and 583 characters in a
    fixed length record format file. We have also
    added one mandatory field to hold the name of the
    source file from which the record came


DATA DISTRIBUTION IN THE CDR RECORDS (1)
 •  The structure of the data in the record has a
    massive impact on its storage. There are a number
    of factors to look at:
     •  Data Types, Padding, Place Holders and Data Cardinality

 •  The example data we are using has 2 Datetime
    fields, 11 Char fields, 10 Numeric fields, 33 Integer
    fields and 25 Varchar fields, which is a fairly typical
    mix for this type of machine generated data. In the
    source file these are all held as ASCII text.


DATA DISTRIBUTION IN THE CDR RECORDS (2)
 •  Fixed length records are padded. In our data set
    the ‘Calling Number’ fixed length field is defined as
    24 characters long, however the maximum field
    length in the actual data is only 11 characters.
    This means that there are always at least 13 space
    characters of padding after the value

 •  24 of our 80 fields have no information in them at all,
    43 of the fields are mandatory and are 100%
    populated. The remaining 13 fields have between
    25% and 75% of the records filled.
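
A minimal sketch of reading such a padded record is shown below. Only the 24-character width of the Calling Number field comes from the slides; the other fields, their offsets and the sample numbers are assumptions for illustration.

```python
# Hypothetical layout: (field name, start offset, length). Only the
# 24-character Calling Number width is taken from the presentation.
LAYOUT = [
    ("calling_number", 0, 24),
    ("called_number", 24, 24),
    ("flag", 48, 1),
]


def parse_record(line: str) -> dict:
    """Slice a fixed-length record and strip the trailing space padding."""
    return {name: line[start:start + length].rstrip(" ")
            for name, start, length in LAYOUT}


def padding_chars(line: str) -> int:
    """Count the characters in a record that are only padding."""
    fields = parse_record(line)
    return sum(length - len(fields[name]) for name, _, length in LAYOUT)


# Two 11-character numbers, each padded to 24 characters, plus a flag.
record = "44118000001".ljust(24) + "44799000002".ljust(24) + "1"
parsed = parse_record(record)
wasted = padding_chars(record)
```

In this made-up record more than half of the characters are padding, which is the kind of redundancy a database does not need to store.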

DATA DISTRIBUTION IN THE CDR RECORDS (3)
 •  Finally the number of discrete values (cardinality) a
    field has affects storage. One flag field has possible
    values of 0 or 1 and therefore a (low) cardinality of
    2; another field has a nearly unique value for every
    record and therefore a very high cardinality. Of the
    57 fields with data there are 20 fields with high
    cardinality, 5 fields with medium cardinality and the
    remaining 32 fields have a low cardinality
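
Cardinality can be profiled with a few lines of code. The sketch below is illustrative only: the thresholds separating low, medium and high cardinality are assumptions (the slides do not state the ones used), and the column names and values are invented.

```python
def cardinality(values):
    """Number of distinct non-empty values in a column."""
    return len({v for v in values if v != ""})


def classify(values, low=100, high_fraction=0.5):
    """Bucket a column as empty, low, medium or high cardinality.

    The thresholds are assumptions for this sketch only.
    """
    populated = [v for v in values if v != ""]
    if not populated:
        return "empty"
    distinct = cardinality(populated)
    if distinct <= low:
        return "low"
    if distinct / len(populated) >= high_fraction:
        return "high"
    return "medium"


columns = {
    "flag": ["0", "1"] * 500,                         # 2 distinct values
    "call_ref": [str(n) for n in range(1000)],        # unique per record
    "cell_id": [str(n % 200) for n in range(1000)],   # 200 distinct values
    "spare": [""] * 1000,                             # never populated
}
report = {name: classify(vals) for name, vals in columns.items()}
```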




WHAT IS A COLUMN STORE DATABASE?
 •  Traditionally databases are ‘row-based’ i.e. the
    fields of data in a record are stored next to each other.
           Forename    Surname                     Gender
           David       Walker                      Male
           Helen       Walker                      Female
           Sheila      Jones                       Female

 •  Column store databases store the values in columns
    and then hold a mapping to form the record
 •  This is transparent to the user, who queries a table
    with SQL in exactly the same way as they would a
    row-based database

COLUMN STORAGE

   Note: To the user this appears as a conventional row-based table
   that can be queried by standard SQL; it is only the underlying
   storage that is different

   First Name Value   F Token
   David              PPP
   Helen              QQQ
   Sheila             RRR

   Surname Value      S Token
   Jones              XXX
   Walker             YYY

   Gender Value       G Token
   Female             AAA
   Male               BBB

   Record mapping:
   F Token   S Token   G Token
   PPP       YYY       BBB
   QQQ       YYY       AAA
   RRR       XXX       AAA
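
The token idea can be sketched in a few lines. This is a simplified illustration, not how Sybase IQ or any particular product implements it: integer positions stand in for the PPP/XXX/AAA tokens, and the function names are invented for the example.

```python
rows = [
    ("David", "Walker", "Male"),
    ("Helen", "Walker", "Female"),
    ("Sheila", "Jones", "Female"),
]


def encode_column(values):
    """Return (dictionary, tokens): each distinct value is stored once."""
    dictionary = sorted(set(values))
    index = {value: token for token, value in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def encode_table(table_rows):
    """Split rows into per-column dictionaries plus a token mapping."""
    columns = list(zip(*table_rows))            # row-wise -> column-wise
    encoded = [encode_column(col) for col in columns]
    dictionaries = [d for d, _ in encoded]
    # The mapping holds one tuple of tokens per original record.
    mapping = list(zip(*(tokens for _, tokens in encoded)))
    return dictionaries, mapping


def decode_table(dictionaries, mapping):
    """Rebuild the original rows from the dictionaries and the mapping."""
    return [tuple(dictionaries[col][tok] for col, tok in enumerate(rec))
            for rec in mapping]


dictionaries, mapping = encode_table(rows)
restored = decode_table(dictionaries, mapping)
```

"Walker" and "Female" are each stored once in their dictionaries, however many records refer to them.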

EFFICIENCIES OF COLUMN STORE DATABASES
 •  Column store databases offer significant storage
    optimisation opportunities, especially where there are low
    or medium cardinality character strings (e.g. the
    telephone numbers and reference data), because long
    strings are not repeatedly stored
 •  In addition it is possible to compress the column
    data very efficiently
 •  In some column store implementations the column storage
    holds additional metadata that can
    be used to speed up specific queries (e.g. the number of
    records associated with each value in a column)
 •  Reducing the data volume stored means reduced I/O
    when querying the database, which in turn
    improves query performance
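
The saving on a low cardinality string column can be estimated roughly as below. The 4-byte token size, the operator names and the record count are assumptions for illustration; real products use their own encodings and add compression on top.

```python
from collections import Counter


def plain_size(values):
    """Characters needed if every value is stored in every record."""
    return sum(len(v) for v in values)


def dictionary_size(values, token_bytes=4):
    """One copy of each distinct value plus a fixed-size token per
    record (the 4-byte token is an assumption)."""
    return sum(len(v) for v in set(values)) + token_bytes * len(values)


# Hypothetical low cardinality column: 3 distinct operator names.
operators = ["Operator Alpha", "Operator Beta", "Operator Gamma"] * 1000

# Per-value record counts: the kind of metadata that can answer
# "how many records per value" without scanning the records.
counts = Counter(operators)

saving = plain_size(operators) / dictionary_size(operators)
print(f"about {saving:.1f}x smaller before any compression")
```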


INEFFICIENCIES OF COLUMN STORE DATABASES
 •  In general manipulating individual rows for updates
    is expensive, as the database has to go to each of the
    columns and then update the mapping table
     •  Some column store databases have specific technologies
        to limit the impact of this by caching updates
 •  Consequently Column Store Databases are not
    efficient for OLTP type applications. However, they
    are very efficient for DWH/BI/Archive type
    applications because the data is bulk loaded rather
    than inserted as individual rows, is not frequently
    updated and is used in large set based queries

HOW EFFICIENT IS IT TO STORE THIS DATA?
  •  What hardware was used and what would be needed for a
     production environment?

  •  How was the data loaded?

  •  What were the storage characteristics?




THE TEST ENVIRONMENT

 •  The test environment was designed to measure storage
    and not system performance
 •  This test was done using Sybase IQ 15.4
     •  Sybase has had a column store database called IQ since
        1996, and it is one of the most established of the 25 or so
        currently listed on Wikipedia
     •  The server was running CentOS 5.7 x64, a Red Hat Linux
        derivative
     •  The hardware consisted of:
           •    Intel Xeon Quad-Core X3363
           •    16GB Memory
           •    Adaptec 5405 RAID Controller with 2x 1TB 7200rpm Hard Disk (RAID1)
           •    The database was built on file systems rather than raw devices
 •  Total hardware cost was less than US$3000
 •  Software licences were provided on evaluation

A PRODUCTION ENVIRONMENT?

 •  Turning this into a production environment would
    depend on the volume of data per month, the
    number of months of data to be held and the type of CDR
 •  The biggest performance driver would be faster storage:
    more disk spindles (adding more, faster drives) or solid
    state disks. This would improve performance as well as
    adding greater capacity
     •  e.g. 16 1TB drives in RAID10 configuration would provide
        around 7.75TB of space and store 75 billion of these CDRs
     •  Using raw devices instead of file systems would also improve
        performance
 •  Other performance enhancements would include
     •  Moving from 1 to 2 or 4 Quad Core CPUs
     •  Adding another 16GB of memory
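
A back-of-envelope version of the capacity estimate is sketched below, using the CDR count and database sizes reported in the results of this presentation. It assumes the quoted sizes are binary gigabytes and terabytes and ignores working space, so it gives a figure of the same order as the 75 billion quoted above rather than reproducing it.

```python
GB = 1024 ** 3   # assumption: the quoted sizes are binary gigabytes
TB = 1024 ** 4   # and binary terabytes

cdrs_loaded = 435_583_388
indexed_bytes = 41.47 * GB       # fully indexed database size
unindexed_bytes = 27.48 * GB     # un-indexed database size


def raid10_usable(drives, drive_size):
    """RAID10 mirrors every drive, so half the raw capacity is usable."""
    return drives * drive_size / 2


def cdrs_that_fit(capacity_bytes, bytes_per_cdr):
    """Whole number of CDRs that fit in the given capacity."""
    return int(capacity_bytes // bytes_per_cdr)


bytes_per_cdr_indexed = indexed_bytes / cdrs_loaded
bytes_per_cdr_unindexed = unindexed_bytes / cdrs_loaded

raw_usable = raid10_usable(16, 1 * TB)   # before any file system overhead
quoted_usable = 7.75 * TB                # the figure quoted in the slide

estimate = cdrs_that_fit(quoted_usable, bytes_per_cdr_indexed)
print(f"{bytes_per_cdr_indexed:.0f} bytes per fully indexed CDR")
print(f"roughly {estimate / 1e9:.0f} billion CDRs in 7.75TB")
```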


LOADING THE DATA

 •  The data was loaded using PELT, an ETL tool written
    and used by Data Management & Warehousing

 •  The loading was done to production level quality

 •  Data is loaded into a load table (CDR_LOAD) which
    has a view (CDR_CONVERT) over it that applies
    data quality checks. The data is then selected from
    the view and inserted into the main table (CDRs)

 •  Each step is fully logged and audited
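
The load table / view / main table pattern can be sketched with SQLite as a stand-in (the test itself used Sybase IQ and PELT, neither of which is shown here). The column names and the two data quality rules in the view are assumptions for illustration, and this view filters failing rows out rather than reporting them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Load table: everything arrives as text, as in the source file.
    CREATE TABLE CDR_LOAD (
        calling_number TEXT, called_number TEXT,
        start_time TEXT, duration TEXT, source_file TEXT
    );
    -- Main table with typed, mandatory columns.
    CREATE TABLE CDRS (
        calling_number TEXT NOT NULL, called_number TEXT NOT NULL,
        start_time TEXT NOT NULL, duration INTEGER NOT NULL,
        source_file TEXT NOT NULL
    );
    -- View applying example data quality checks and conversions.
    CREATE VIEW CDR_CONVERT AS
        SELECT TRIM(calling_number) AS calling_number,
               TRIM(called_number)  AS called_number,
               TRIM(start_time)     AS start_time,
               CAST(TRIM(duration) AS INTEGER) AS duration,
               source_file
        FROM CDR_LOAD
        WHERE TRIM(calling_number) <> ''
          AND TRIM(duration) <> ''
          AND TRIM(duration) NOT GLOB '*[^0-9]*';
""")


def load_file(rows, source_file):
    """Load one file's rows, move the clean ones to CDRS, return counts."""
    conn.executemany(
        "INSERT INTO CDR_LOAD VALUES (?, ?, ?, ?, ?)",
        [(*row, source_file) for row in rows],
    )
    loaded = conn.execute("SELECT COUNT(*) FROM CDR_LOAD").fetchone()[0]
    clean = conn.execute("SELECT COUNT(*) FROM CDR_CONVERT").fetchone()[0]
    conn.execute("INSERT INTO CDRS SELECT * FROM CDR_CONVERT")
    conn.execute("DELETE FROM CDR_LOAD")   # SQLite has no TRUNCATE
    conn.commit()
    return loaded, clean


sample = [
    ("44118000001    ", "44799000002", "2012-01-15 09:30:00", " 150"),
    ("44118000003", "44799000004", "2012-01-15 09:31:00", "abc"),  # bad duration
]
loaded, inserted = load_file(sample, "file_0001.dat")
```

Comparing `loaded` with `inserted` for each file is one simple way to audit how many records failed the checks.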

THE LOADING STEPS

 •  Copy a compressed (Unix Compress .Z) flat file (as
    provided) from the incoming directory to the workspace
 •  Record the size of the .Z file in bytes
 •  Uncompress the file
 •  Record the size in bytes and the number of records in
    the uncompressed file
 •  Use iSQL ‘Load’ command to insert the data into a
    CDR_LOAD table
 •  Record the size of the CDR_LOAD table in kilobytes
 •  Insert into the main CDR table from the DQ view
    CDR_CONVERT over the CDR_LOAD table
 •  Record the size of the CDR table in kilobytes
 •  Truncate the CDR_LOAD table
 •  Compress the source file with ‘gzip -9’ (maximum
    compression, longest execution)
 •  Record the size of the .gz file in bytes
 •  Move the compressed .gz file to an archive directory
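
The file-handling steps (recording sizes, counting records, compressing at the maximum level and archiving) might look like the sketch below. It is not the PELT implementation: the function names and the dummy file are assumptions, the database steps are left out, and records are assumed to be newline-terminated. Decompressing the original .Z file is also omitted, since that would typically be done with the external `uncompress` utility.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def count_records(path: Path) -> int:
    """Number of newline-terminated records in an uncompressed file."""
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def archive(path: Path, archive_dir: Path) -> dict:
    """Gzip a flat file at level 9, move it to the archive, return sizes."""
    stats = {"raw_bytes": path.stat().st_size,
             "records": count_records(path)}
    gz_path = path.with_name(path.name + ".gz")
    with path.open("rb") as src, \
            gzip.open(gz_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    stats["gz_bytes"] = gz_path.stat().st_size
    archive_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(gz_path), str(archive_dir / gz_path.name))
    path.unlink()   # like the gzip command, remove the uncompressed file
    return stats


# Dummy workspace with two 583-character records (made-up content).
workspace = Path(tempfile.mkdtemp())
source = workspace / "cdr_0001.dat"
source.write_bytes((b"A" * 583 + b"\n") * 2)
stats = archive(source, workspace / "archive")
```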

RESULTS
 •  12,902 files were loaded with zero data quality errors
 •  435,583,388 CDRs
 •  236.50 GB of raw files
 •  Loading: 33 hours, 22 minutes, 12 seconds
 •  Indexing: 2 hours, 13 minutes, 9 seconds

 •  27.48 GB of un-indexed storage in the database
     •  8.6:1 Compression Ratio
 •  41.47 GB of fully indexed storage in the database
     •  5.7:1 Compression Ratio
 •  20.03 GB of storage in the original .Z files
     •  11.8:1 Compression Ratio
 •  12.42 GB of storage in the archive .gz files
     •  19.0:1 Compression Ratio
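
The compression ratios follow directly from the sizes: each is the raw file size divided by the stored size. The short sketch below recomputes them from the figures above and also derives the space taken by the indexes; the dictionary keys are just labels chosen for the example.

```python
raw_gb = 236.50
stored_gb = {
    "database, un-indexed": 27.48,
    "database, fully indexed": 41.47,
    "original .Z files": 20.03,
    "archive .gz files": 12.42,
}

# Compression ratio = raw size / stored size, to one decimal place.
ratios = {name: round(raw_gb / size, 1) for name, size in stored_gb.items()}

# Space attributable to the indexes alone.
index_overhead_gb = round(stored_gb["database, fully indexed"]
                          - stored_gb["database, un-indexed"], 2)

for name, ratio in ratios.items():
    print(f"{name}: {ratio}:1")
```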

ADDING INDEXES

 •  By default the table has no indexes
     •  This is the same in most databases
 •  For this test every field was indexed
     •  This added 63 indexes that took up an additional 14GB
        (41.47GB less 27.48GB)
 •  The total space used was still 5.7 times smaller than
    the space used by the raw files
 •  These indexes would significantly improve query
    performance
     •  However not all the indexes would be required in a
        production system as not all fields would be actively
        queried and this would reduce the space used

DISK SPACE USED




LOAD PERFORMANCE

 •  The average file had 33,760 records
 •  The ETL to load an average file took 11 seconds
     •  2 seconds to copy to the working directory and
        decompress
     •  3 seconds to import into the CDR_LOAD table
     •  3 seconds to copy from the CDR_CONVERT view to the CDRS table
     •  2 seconds to gzip -9 and archive
     •  1 second for logging and truncating tables
 •  None of the tables were indexed during the load
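
The per-step timings can be turned into a rough throughput figure. This is simple arithmetic on the rounded numbers above, so the projected total is only of the same order as the elapsed loading time reported in the results, not an exact match.

```python
step_seconds = {
    "copy and decompress": 2,
    "import into CDR_LOAD": 3,
    "copy from CDR_CONVERT to CDRS": 3,
    "gzip -9 and archive": 2,
    "logging and truncating": 1,
}
records_per_file = 33_760
files = 12_902

seconds_per_file = sum(step_seconds.values())
records_per_second = records_per_file / seconds_per_file
estimated_hours = files * seconds_per_file / 3600

print(f"{seconds_per_file} seconds per average file")
print(f"about {records_per_second:.0f} records per second end to end")
print(f"about {estimated_hours:.1f} hours for all files at this rate")
```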



OBSERVATIONS (1)

 •  The results were approximately in the middle of our
    expectations and previous experience of other
    similar data sets where the raw data has been
    compressed between 5 and 10 times
 •  Even low end hardware gives acceptable load
    performance, suitable for archive functionality, but
    production scale hardware is needed for BI/DWH




OBSERVATIONS (2)

 •  Some database tuning techniques are needed for truly
    massive data sets but can be designed in from the
    outset at low cost (e.g. which indexes/index types)
 •  It is worth considering putting each month (or some
    other similar date based partitioning) in separate tables
    for systems management purposes as it makes it easy to
    remove the data at the end of the archiving process
 •  Smaller reference tables added to the schema would
    have little/no compression but they are also very small
    and therefore would not contribute greatly to the space used
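
The month-per-table idea can be sketched as follows, again with SQLite as a stand-in for the column store. The table naming scheme, the two-column table and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_for(month: str) -> str:
    """Table name for a 'YYYY-MM' month, e.g. CDRS_2012_01."""
    year, mon = month.split("-")
    if not (year.isdigit() and mon.isdigit()):
        raise ValueError(f"bad month: {month!r}")
    return f"CDRS_{year}_{mon}"


def insert_cdr(calling_number: str, start_time: str) -> None:
    """Route a record to the table for the month of its start time."""
    table = table_for(start_time[:7])
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                 "(calling_number TEXT, start_time TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)",
                 (calling_number, start_time))


def drop_month(month: str) -> None:
    """Removing an expired month is a single DROP, not a mass DELETE."""
    conn.execute(f"DROP TABLE IF EXISTS {table_for(month)}")


def tables() -> list:
    """Names of the monthly tables that currently exist."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


insert_cdr("44118000001", "2012-01-15 09:30:00")
insert_cdr("44118000001", "2012-02-01 10:00:00")
drop_month("2012-01")   # end of the archive period for January
remaining = tables()
```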



ALTERNATIVE SCENARIOS

 •  This presentation uses information gathered on
    specific data used for a specific purpose by a client
 •  Companies may wonder how their data would
    work in both storage and performance terms
 •  Vendors may also wonder how their technologies
    compare in both storage and performance terms
 •  If you are interested in finding out please contact us
    with these or any other Data Warehousing/Business
    Intelligence enquiries



CONTACT US

 •  Data Management & Warehousing
     •  Website: http://www.datamgmt.com
     •  Telephone: +44 (0) 118 321 5930
 •  David Walker
     •     E-Mail: davidw@datamgmt.com
     •     Telephone: +44 (0) 7990 594 372
     •     Skype: datamgmt
     •     White Papers: http://scribd.com/davidmwalker




ABOUT US

Data Management & Warehousing is a UK based consultancy
that has been delivering successful business intelligence and
data warehousing solutions since 1995.

Our consultants have worked with major corporations around the
world including the US, Europe, Africa and the Middle East.

We have worked in many industry sectors such as telcos,
manufacturing, retail, financial and transport. We provide
governance and project management as well as expertise in the
leading technologies.




THANK YOU
© 2012 - DATA MANAGEMENT & WAREHOUSING
http://www.datamgmt.com

 
Conspectus data warehousing appliances – fad or future
Conspectus   data warehousing appliances – fad or futureConspectus   data warehousing appliances – fad or future
Conspectus data warehousing appliances – fad or futureDavid Walker
 
An introduction to social network data
An introduction to social network dataAn introduction to social network data
An introduction to social network dataDavid Walker
 
Using the right data model in a data mart
Using the right data model in a data martUsing the right data model in a data mart
Using the right data model in a data martDavid Walker
 
Implementing Netezza Spatial
Implementing Netezza SpatialImplementing Netezza Spatial
Implementing Netezza SpatialDavid Walker
 

Plus de David Walker (20)

Moving To MicroServices
Moving To MicroServicesMoving To MicroServices
Moving To MicroServices
 
Big Data Week 2016 - Worldpay - Deploying Secure Clusters
Big Data Week 2016  - Worldpay - Deploying Secure ClustersBig Data Week 2016  - Worldpay - Deploying Secure Clusters
Big Data Week 2016 - Worldpay - Deploying Secure Clusters
 
Data Works Berlin 2018 - Worldpay - PCI Compliance
Data Works Berlin 2018 - Worldpay - PCI ComplianceData Works Berlin 2018 - Worldpay - PCI Compliance
Data Works Berlin 2018 - Worldpay - PCI Compliance
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
 
Big Data Analytics 2017 - Worldpay - Empowering Payments
Big Data Analytics 2017  - Worldpay - Empowering PaymentsBig Data Analytics 2017  - Worldpay - Empowering Payments
Big Data Analytics 2017 - Worldpay - Empowering Payments
 
Data Driven Insurance Underwriting
Data Driven Insurance UnderwritingData Driven Insurance Underwriting
Data Driven Insurance Underwriting
 
Data Driven Insurance Underwriting (Dutch Language Version)
Data Driven Insurance Underwriting (Dutch Language Version)Data Driven Insurance Underwriting (Dutch Language Version)
Data Driven Insurance Underwriting (Dutch Language Version)
 
An introduction to data virtualization in business intelligence
An introduction to data virtualization in business intelligenceAn introduction to data virtualization in business intelligence
An introduction to data virtualization in business intelligence
 
BI SaaS & Cloud Strategies for Telcos
BI SaaS & Cloud Strategies for TelcosBI SaaS & Cloud Strategies for Telcos
BI SaaS & Cloud Strategies for Telcos
 
Building an analytical platform
Building an analytical platformBuilding an analytical platform
Building an analytical platform
 
Gathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesGathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data Warehouses
 
Data warehousing change in a challenging environment
Data warehousing change in a challenging environmentData warehousing change in a challenging environment
Data warehousing change in a challenging environment
 
Building a data warehouse of call data records
Building a data warehouse of call data recordsBuilding a data warehouse of call data records
Building a data warehouse of call data records
 
Struggling with data management
Struggling with data managementStruggling with data management
Struggling with data management
 
A linux mac os x command line interface
A linux mac os x command line interfaceA linux mac os x command line interface
A linux mac os x command line interface
 
Connections a life in the day of - david walker
Connections   a life in the day of - david walkerConnections   a life in the day of - david walker
Connections a life in the day of - david walker
 
Conspectus data warehousing appliances – fad or future
Conspectus   data warehousing appliances – fad or futureConspectus   data warehousing appliances – fad or future
Conspectus data warehousing appliances – fad or future
 
An introduction to social network data
An introduction to social network dataAn introduction to social network data
An introduction to social network data
 
Using the right data model in a data mart
Using the right data model in a data martUsing the right data model in a data mart
Using the right data model in a data mart
 
Implementing Netezza Spatial
Implementing Netezza SpatialImplementing Netezza Spatial
Implementing Netezza Spatial
 

Dernier

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Dernier (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Storage Characteristics Of Call Data Records In Column Store Databases

  • 1. STORAGE CHARACTERISTICS OF CALL DATA RECORDS IN COLUMN STORE DATABASES DAVID M WALKER, DATA MANAGEMENT & WAREHOUSING
  • 2. OVERVIEW •  This presentation gives a brief overview of the storage characteristics of Call Data Records in Column Store Databases •  It discusses •  What are Call Data Records (CDRs)? •  What is a Column Store Database? •  How efficient is a column store database for storing CDR and other (similar) machine generated data? •  It does not: •  Examine performance in any detail •  Compare column store to traditional row-based Jan 2012 © 2012 Data Management & Warehousing 2
  • 3. WHAT ARE CALL DATA RECORDS (CDRs)? •  Every time a telephone call is made, data about that call is recorded. At its most basic this will include: •  The Calling Number (who made the call) •  The Called Number (who was called) •  The Start Time •  The End Time (or the duration) •  Various pieces of technical information (which network switch was used, mobile handset identifier, call direction, whether it is a free x800 type call, etc.)
  • 4. CDRs AT MULTIPLE LEVELS •  A CDR is created at the switch. Each switch involved in a call creates its own CDRs; these are often called Network CDRs •  The Network CDRs are joined together into a single end-to-end call record through a process known as mediation. These are Unrated CDRs •  Finally the cost of the call is calculated and added to the Unrated CDRs to create Rated CDRs
  • 5. MORE CDR COMPLEXITY •  There are CDRs that are used for billing the subscriber, often called Retail CDRs •  There are also CDRs that are used to charge other operators when their call travels over your network (e.g. when you make a mobile call that finishes on a land line from another operator). These are known as Interconnect CDRs or Wholesale CDRs •  There are also differences between Mobile and Fixed (Land) Line CDRs •  Finally each Switch Manufacturer (there are over 60) and each Mediation and/or Billing system (again at least 50) uses its own format
  • 6. FOR THIS EXERCISE … •  We are using Mobile Rated Interconnect CDRs from a European Telephone Company (Telco) •  We have 12,902 files, containing 435,242,447 CDRs over a 181 day period from 482,883 subscribers •  Each CDR has 80 fields and 583 characters in a fixed length record format file. We have also added one mandatory field to hold the name of the source file from which the record came
  • 7. DATA DISTRIBUTION IN THE CDR RECORDS (1) •  The structure of the data in the record has a massive impact on its storage. There are a number of factors to look at: •  Data Types, Padding, Place Holders and Data Cardinality •  The example data we are using has 2 Datetime fields, 11 Char fields, 10 Numeric fields, 33 Integer fields and 25 Varchar fields which is a fairly typical mix for this type of machine generated data. In the source file these are all held as ASCII text.
  • 8. DATA DISTRIBUTION IN THE CDR RECORDS (2) •  Fixed length records are padded. In our data set the ‘Calling Number’ fixed length field is defined as 24 characters long, however the maximum field length in the actual data is only 11 characters. This means that there are always at least 13 space characters of padding afterwards •  24 of our 80 fields have no information in them at all, 43 of the fields are mandatory and are 100% populated. The remaining 13 fields are populated in between 25% and 75% of the records.
  • 9. DATA DISTRIBUTION IN THE CDR RECORDS (3) •  Finally the number of discrete values (cardinality) a field has affects storage. One flag field has possible values of 0 or 1 and therefore a (low) cardinality of 2, another field has a nearly unique value for every record and therefore a very high cardinality. Of the 57 fields with data there are 20 fields with high cardinality, 5 fields with medium cardinality and the remaining 32 fields have a low cardinality
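The three factors described on slides 7 to 9 (padding, population and cardinality) can be measured per field before choosing a storage layout. The following is a minimal sketch only: the function name `profile_field`, the field offsets and the sample records are all made up for illustration and are not taken from the client data set.

```python
from typing import Dict, List


def profile_field(records: List[str], start: int, width: int) -> Dict[str, float]:
    """Profile one fixed-width field across a list of fixed-length records."""
    # Trailing spaces are padding, so strip them before measuring anything.
    values = [record[start:start + width].rstrip(" ") for record in records]
    populated = [value for value in values if value]
    return {
        "max_length": max((len(value) for value in populated), default=0),
        "fill_rate": len(populated) / len(values) if values else 0.0,
        "cardinality": len(set(populated)),
    }


# Three made-up records: a 24-character number field followed by a 1-character flag.
records = [
    "07000000001".ljust(24) + "1",
    "07000000002".ljust(24) + "0",
    "".ljust(24) + "1",
]
number_profile = profile_field(records, start=0, width=24)
flag_profile = profile_field(records, start=24, width=1)
print(number_profile)
print(flag_profile)
```

A field whose `max_length` is well below its declared width is mostly padding, and a field with a low `cardinality` is a good candidate for the token-based storage described on the following slides.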
  • 10. WHAT IS A COLUMN STORE DATABASE? •  Traditionally databases are ‘row-based’ i.e. each field of data in a record is stored next to each other. Example rows (Forename, Surname, Gender): (David, Walker, Male), (Helen, Walker, Female), (Sheila, Jones, Female) •  Column store databases store the values in columns and then hold a mapping to form the record •  This is transparent to the user, who queries a table with SQL in exactly the same way as they would a row-based database
  • 11. COLUMN STORAGE •  First Name column (Value = F Token): David = PPP, Helen = QQQ, Sheila = RRR •  Surname column (Value = S Token): Jones = XXX, Walker = YYY •  Gender column (Value = G Token): Female = AAA, Male = BBB •  Record mapping (F Token, S Token, G Token): (PPP, YYY, BBB), (QQQ, YYY, AAA), (RRR, XXX, AAA) •  Note: To the user this appears as a conventional row-based table that can be queried by standard SQL; it is only the underlying storage that is different
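The token mapping on this slide can be sketched in a few lines of Python. This is an illustration of the general idea only: it assumes simple integer tokens in place of the PPP/YYY/AAA labels, the names `encode_column` and `decode_row` are hypothetical, and it does not describe how Sybase IQ or any other product stores data internally.

```python
from typing import Dict, List, Tuple

Column = Tuple[List[str], List[int]]


def encode_column(values: List[str]) -> Column:
    """Store each distinct value once and keep one small token per row."""
    dictionary: List[str] = []
    token_of: Dict[str, int] = {}
    tokens: List[int] = []
    for value in values:
        if value not in token_of:
            token_of[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(token_of[value])
    return dictionary, tokens


def decode_row(columns: List[Column], row: int) -> List[str]:
    """Rebuild one record by following the row's token in every column."""
    return [dictionary[tokens[row]] for dictionary, tokens in columns]


rows = [
    ("David", "Walker", "Male"),
    ("Helen", "Walker", "Female"),
    ("Sheila", "Jones", "Female"),
]
# One (dictionary, tokens) pair per column: Forename, Surname, Gender.
columns = [encode_column([row[i] for row in rows]) for i in range(3)]
print(columns[1])
print(decode_row(columns, 2))
```

In this sketch "Walker" and "Female" are each stored once however many rows use them, which is the storage effect the slide illustrates.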
  • 12. EFFICIENCIES OF COLUMN STORE DATABASES •  Column store databases offer significant storage optimisation opportunities, especially where there are low or medium cardinality character strings (e.g. the telephone numbers and reference data), because long strings are not repeatedly stored •  In addition it is possible to compress the data in column stores very efficiently •  In some column store implementations the column storage holds additional metadata that can be used to speed up specific queries (e.g. the number of records associated with each value in a column) •  Reducing the data volume stored means reduced I/O when querying the database, which in turn improves query performance
  • 13. INEFFICIENCIES OF COLUMN STORE DATABASES •  In general manipulating individual rows for updates is expensive, as the database has to go to each of the columns and then update the mapping table •  Some column store databases have specific technologies to limit the impact of this by caching updates •  Consequently Column Store Databases are not efficient for OLTP type applications. However they are very efficient for DWH/BI/Archive type applications because the data is bulk loaded rather than inserted row by row, is not frequently updated, and is used in large set-based queries
  • 14. HOW EFFICIENT IS IT TO STORE THIS DATA? •  What hardware was used and what would be needed for a production environment? •  How was the data loaded? •  What were the storage characteristics?
  • 15. THE TEST ENVIRONMENT •  The test environment was designed to measure storage and not system performance •  This test was done using Sybase IQ 15.4 •  Sybase has had a column storage database called IQ since 1996 and it is one of the most established of the 25 or so currently listed on Wikipedia •  The server was running CentOS 5.7 x64, a Red Hat Linux derivative •  The hardware consisted of: •  Intel Xeon Quad-Core X3363 •  16GB Memory •  Adaptec 5405 RAID Controller with 2x 1TB 7200rpm Hard Disk (RAID1) •  The database was built on file systems rather than raw devices •  Total hardware cost was less than US$3000 •  Software licences were provided on evaluation
  • 16. A PRODUCTION ENVIRONMENT? •  What is needed for a production environment would depend on the volume of data per month, the number of months' data to be held and the type of CDR •  The biggest performance driver would be to have more disk spindles, either by adding more (faster) drives or by using solid state disks. This would improve performance as well as adding greater capacity •  e.g. 16 1TB drives in a RAID10 configuration would provide around 7.75TB of space and store 75 billion of these CDRs •  Using raw devices instead of file systems would also improve performance •  Other performance enhancements would include •  Moving from 1 to 2 or 4 Quad Core CPUs •  Adding another 16GB of memory
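As a rough sanity check on the capacity example above, the bytes stored per CDR can be derived from the figures reported on the results slide (41.47GB fully indexed for 435,583,388 CDRs). This is a back-of-envelope sketch, not the author's calculation, and it assumes decimal units (1GB = 10^9 bytes, 1TB = 10^12 bytes) and that the fully indexed figure is the relevant one.

```python
# Figures taken from the slides; the unit conversions are assumptions.
INDEXED_GB = 41.47        # fully indexed database size for the test data
CDR_COUNT = 435_583_388   # CDRs loaded in the test
USABLE_TB = 7.75          # usable space quoted for 16 x 1TB drives in RAID10
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12

bytes_per_cdr = INDEXED_GB * BYTES_PER_GB / CDR_COUNT
capacity_cdrs = USABLE_TB * BYTES_PER_TB / bytes_per_cdr

print(f"{bytes_per_cdr:.1f} bytes per CDR")
print(f"about {capacity_cdrs / 1e9:.0f} billion CDRs")
```

Under these assumptions the estimate comes out a little above the 75 billion quoted on the slide, which would be consistent with leaving some headroom for temporary space and growth.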
  • 17. LOADING THE DATA •  The data was loaded using PELT, an ETL tool written and used by Data Management & Warehousing •  The loading was done to production level quality •  Data is loaded into a load table (CDR_LOAD) which has a view (CDR_CONVERT) over it that applies data quality checks. The data is then selected from the view and inserted into the main table (CDRs) •  Each step is fully logged and audited
  • 18. THE LOADING STEPS •  Copy a compressed (Unix Compress .Z) flat file (as provided) from the incoming directory to the workspace •  Record the size of the .Z file in bytes •  Uncompress the file •  Record the size in bytes and the number of records in the uncompressed file •  Use iSQL ‘Load’ command to insert the data into a CDR_LOAD table •  Record the size of the CDR_LOAD table in kilobytes •  Insert into the main CDR table from the DQ view CDR_CONVERT over the CDR_LOAD table •  Record the size of the CDR table in kilobytes •  Truncate the CDR_LOAD table •  Compress the source file with ‘gzip -9’ (maximum compression, longest execution) •  Record the size of the .gz file in bytes •  Move the compressed .gz file to an archive directory
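The file-handling end of these steps (record the raw size, compress at maximum level, record the compressed size, move to an archive directory) might look like the sketch below. It is not PELT, whose implementation is not shown in this presentation: the function name `archive_source_file`, the directory layout and the sample data are hypothetical, Python's `gzip` module at `compresslevel=9` stands in for the `gzip -9` command, and the database steps are deliberately left out.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def archive_source_file(source: Path, archive_dir: Path) -> dict:
    """Compress a loaded source file at maximum level, record sizes, archive it."""
    raw_bytes = source.stat().st_size
    gz_path = source.with_name(source.name + ".gz")
    # compresslevel=9 corresponds to the maximum compression used by 'gzip -9'.
    with source.open("rb") as src, gzip.open(gz_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    gz_bytes = gz_path.stat().st_size
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / gz_path.name
    shutil.move(str(gz_path), str(archived))
    source.unlink()  # the uncompressed working copy is no longer needed
    return {"raw_bytes": raw_bytes, "gz_bytes": gz_bytes, "archive_path": str(archived)}


# Demonstration with a made-up file of 1,000 padded 25-byte records.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    sample = workspace / "cdr_sample.dat"
    sample.write_bytes((b"07000000001".ljust(24) + b"\n") * 1000)
    result = archive_source_file(sample, workspace / "archive")
    archived_exists = Path(result["archive_path"]).exists()
    print(result["raw_bytes"], result["gz_bytes"], archived_exists)
```

Returning the sizes from each step, rather than only logging them, makes it straightforward to write them to an audit table of the kind the previous slide describes.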
  • 19. RESULTS •  12,902 files were loaded with zero data quality errors •  435,583,388 CDRs •  236.50GB of raw files •  20.03GB of storage in the original .Z files •  11.8:1 Compression Ratio •  12.42GB of storage in the archive .gz files •  19.0:1 Compression Ratio •  27.48GB of un-indexed storage in the database •  8.6:1 Compression Ratio •  41.47GB of fully indexed storage in the database •  5.7:1 Compression Ratio •  Loading: 33 hours, 22 minutes, 12 seconds •  Indexing: 2 hours, 13 minutes, 9 seconds
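Each compression ratio on this slide is the raw file size divided by the stored size. The short check below reproduces them from the reported sizes; the dictionary keys are just descriptive labels chosen for this sketch.

```python
RAW_GB = 236.50  # size of the uncompressed raw files

stored_gb = {
    "original .Z files": 20.03,
    "archive .gz files": 12.42,
    "database, un-indexed": 27.48,
    "database, fully indexed": 41.47,
}

# Compression ratio = raw size / stored size, rounded as on the slide.
ratios = {name: round(RAW_GB / size, 1) for name, size in stored_gb.items()}

# Space taken by the indexes alone: fully indexed minus un-indexed.
index_gb = round(stored_gb["database, fully indexed"] - stored_gb["database, un-indexed"], 2)

for name, ratio in ratios.items():
    print(f"{name}: {ratio}:1")
print(f"index overhead: {index_gb}GB")
```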
  • 20. ADDING INDEXES •  By default the table has no indexes •  This is the same in most databases •  For this test every field was indexed •  This added 63 indexes that took up approximately an additional 14GB (41.47GB fully indexed less 27.48GB un-indexed) •  The total space used was still 5.7 times smaller than the space used by the raw files •  These indexes would significantly improve query performance •  However not all the indexes would be required in a production system, as not all fields would be actively queried, and this would reduce the space used
  • 21. DISK SPACE USED
  • 22. LOAD PERFORMANCE •  The average file had 33,760 records •  The ETL to load an average file took 11 seconds •  2 seconds to copy to the working directory and decompress •  3 seconds to import into the CDR_LOAD table •  3 seconds to copy from the CDR_CONVERT view to the CDRS table •  2 seconds to gzip -9 and archive •  1 second for logging and truncating tables •  None of the tables were indexed during the load
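The step timings above add up to the quoted 11 seconds, and together with the average file size they give an approximate end-to-end load rate. The sketch below is simple arithmetic on the slide's figures; the step labels are paraphrased.

```python
AVERAGE_RECORDS_PER_FILE = 33_760

step_seconds = {
    "copy and decompress": 2,
    "import into CDR_LOAD": 3,
    "copy from CDR_CONVERT to CDRS": 3,
    "gzip -9 and archive": 2,
    "logging and truncating": 1,
}

total_seconds = sum(step_seconds.values())
records_per_second = AVERAGE_RECORDS_PER_FILE / total_seconds

print(f"{total_seconds} seconds per average file")
print(f"roughly {records_per_second:.0f} records per second end to end")
```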
  • 23. OBSERVATIONS (1) •  The results were approximately in the middle of our expectations and previous experience of other similar data sets where the raw data has been compressed between 5 and 10 times •  Even low end hardware gives acceptable load performance suitable for archive functionality but production scale hardware is needed for BI/DWH
  • 24. OBSERVATIONS (2) •  Some database tuning techniques are needed for truly massive data sets but can be designed in from the outset at low cost (e.g. which indexes/index types) •  It is worth considering putting each month (or some other similar date based partition) in a separate table for systems management purposes, as it makes it easy to remove the data at the end of the archiving process •  Smaller reference tables added to the schema would have little/no compression but they are also very small and would therefore not contribute greatly to the space used
  • 25. ALTERNATIVE SCENARIOS •  This presentation uses information gathered on specific data used for a specific purpose by a client •  Companies may wonder how their data would work in both storage and performance terms •  Vendors may also wonder how their technologies compare in both storage and performance terms •  If you are interested in finding out please contact us with these or any other Data Warehousing/Business Intelligence enquiries
  • 26. CONTACT US •  Data Management & Warehousing •  Website: http://www.datamgmt.com •  Telephone: +44 (0) 118 321 5930 •  David Walker •  E-Mail: davidw@datamgmt.com •  Telephone: +44 (0) 7990 594 372 •  Skype: datamgmt •  White Papers: http://scribd.com/davidmwalker
  • 27. ABOUT US Data Management & Warehousing is a UK based consultancy that has been delivering successful business intelligence and data warehousing solutions since 1995. Our consultants have worked with major corporations around the world including the US, Europe, Africa and the Middle East. We have worked in many industry sectors such as telcos, manufacturing, retail, financial and transport. We provide governance and project management as well as expertise in the leading technologies.
  • 28. THANK YOU © 2012 - DATA MANAGEMENT & WAREHOUSING http://www.datamgmt.com