Life and Work of Jim
Gray
January 5, 2013



                       1
JAMES ("JIM") NICHOLAS GRAY
United States – 1998
CITATION
For fundamental contributions to database and
transaction processing research and technical
leadership in system implementation from research
prototypes to commercial products. The transaction
is the fundamental abstraction underlying database
system concurrency and failure recovery. Gray’s
work [defined] the key transaction properties:
atomicity, consistency, isolation and durability, and
his locking and recovery work demonstrated how to
build … systems that exhibit these properties.
                                                        3
E. F. Codd invented the
relational database model in
1970, creating what is today a
100+ billion dollar/year
industry.
Codd’s Relational Model
● Simple model
● Data stored in relational tables
● Data Independence – separation of
  data storage and data access
● Declarative Queries
● Algebra to mathematically reason
  about data objects – made query
  optimization possible
● Ad-hoc queries through SQL.
● Embedded in operational systems.
                                      5
ACID properties are
fundamental to
Relational Systems and
necessary for on-line
transaction processing (OLTP)
systems
● Jim Gray defined ACID
  properties to guarantee
  database transactions are
  processed reliably:
   – Atomicity
   – Consistency
   – Isolation
   – Durability
                                            6
From Transactions to Transaction
  Processing Systems - II
[Figure: Reality vs. Abstraction. A Change in reality corresponds to a Transaction that takes the database DB to DB'; a Query against DB' returns an Answer.]

The real state is represented by an abstraction, called the database, and the
transformation of the real state is mirrored by the execution of a program, called a
transaction, that transforms the database.
   7
Gray defined
Data Manipulation Actions as

● Unprotected: transient and internal state
● Protected: grouped into transactions and reflected in the
  state of transaction outcome
● Real: involve sensors, actuators etc. They cannot be undone,
  but they can be compensated.



                                                                    8
Definitions
● A transaction is a sequence of operations that form a single
  unit of work
● A transaction is often initiated by an application program
   – begin a transaction
     START TRANSACTION
   – end a transaction
     COMMIT (if successful) or ROLLBACK (if errors)
● Either the whole transaction must succeed or the effect of all
  operations has to be undone (rollback)
● To achieve durable transaction atomicity, the transition to the
  "committed" state must be accomplished by a single write to
  non-volatile storage.
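
The begin, commit, and rollback steps listed above can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not a description of any system Gray built: the accounts table, the transfer function, and the balances are hypothetical names and values chosen for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work; return True on commit."""
    try:
        conn.execute("BEGIN")  # begin a transaction
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")  # success: both updates take effect together
        return True
    except Exception:
        conn.execute("ROLLBACK")  # error: undo the effect of every operation
        return False


# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN, COMMIT and
# ROLLBACK are issued explicitly, mirroring the statements on the slide.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

ok = transfer(conn, "alice", "bob", 60)      # commits
failed = transfer(conn, "alice", "bob", 60)  # would overdraw, so it rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second call fails its balance check, so the ROLLBACK also undoes the UPDATE statements it had already run, and the balances stay where the committed transfer left them.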
                                                                    9
Structure of a Transaction Program

[Figure: transaction program flow from BEGIN WORK () through WORK to COMMIT WORK (), with ROLL BACK WORK () as the abort path]
                                                    10
While at IBM San Jose Research
Laboratory
October 1972 to December 1980
● Jim Gray developed three key ideas related to transaction concurrency
  control:
   – The notion of transaction
   – Serializability; degrees of consistency;
   – Multi-granularity locking.

● There are two main transaction issues
   – concurrent execution of multiple transactions
   – recovery after hardware failures and system crashes

                                                                     11
Write Ahead Log (WAL) protocol
● The WAL protocol records the old and new states induced by
  protected actions separately from the actual state changes.
● The logged changes are written to stable storage before the
  actual changes are written back to stable storage (that's the
  "Write Ahead" part).
● Transactions are committed by simply appending and writing a
  'commit' record to the recovery log. Logged changes are used
  to undo protected actions of aborted transactions and of
  transactions in progress at the time of a system failure.

                                                              12
Write Ahead Log (WAL) protocol
● Log records are also used to redo committed actions
  whose actual changes have not been written back to
  stable storage at the time of a system failure.
● The WAL protocol allows changed data to be written to
  their stable storage home at any time after the log
  records describing the changes have been written into the
  stable log.
● This gives the Database Manager great flexibility in
  managing the contents of its volatile data buffer pools.
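
The protocol described above can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the TinyWAL class, its method names, and the use of Python lists and dicts to stand in for the stable log, the stable data pages, and the volatile buffer pool. It also assumes that transactions never overwrite each other's uncommitted data, which locking would normally ensure, and it leaves out explicit aborts.

```python
class TinyWAL:
    """Toy key-value store: a stable log, a stable 'disk', a volatile buffer."""

    def __init__(self):
        self.log = []     # stands in for the stable, append-only recovery log
        self.disk = {}    # stands in for the stable home of the data
        self.buffer = {}  # volatile buffer pool, lost in a crash

    def write(self, txn, key, new):
        old = self.buffer.get(key, self.disk.get(key))
        # Write ahead: log old and new values before changing any state.
        self.log.append(("update", txn, key, old, new))
        self.buffer[key] = new

    def commit(self, txn):
        # Committing is just appending a commit record to the log.
        self.log.append(("commit", txn))

    def flush(self, key):
        # Allowed at any time, because the log record is already stable.
        self.disk[key] = self.buffer[key]

    def crash_and_recover(self):
        self.buffer = {}  # volatile state is gone
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Redo: reapply committed changes in log order.
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, _, new = rec
                self.disk[key] = new
        # Undo: roll back uncommitted changes in reverse log order.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old, _ = rec
                if old is None:
                    self.disk.pop(key, None)  # the key did not exist before
                else:
                    self.disk[key] = old


db = TinyWAL()
db.write("T1", "x", 1)
db.commit("T1")          # committed, but never flushed to disk
db.write("T2", "y", 2)
db.flush("y")            # flushed to disk, but never committed
db.crash_and_recover()
print(db.disk)
```

After recovery the committed change to x has been redone from the log and the uncommitted change to y has been undone, even though only y had reached the disk before the crash.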
                                                          13
ACID Properties: First Definition
● Atomicity: A transaction's changes to the state are atomic:
  either all happen or none happen. These changes include
  database changes, messages, and actions on transducers.
● Consistency: A transaction is a correct transformation of the
  state. The actions taken as a group do not violate any of the
  integrity constraints associated with the state. This requires that
  the transaction be a correct program.
● Isolation: Even though transactions execute concurrently, it
  appears to each transaction T, that others executed either
  before T or after T, but not both.
● Durability: Once a transaction completes successfully
  (commits), its changes to the state survive failures.
14
[Gray 1993] Jim Gray and Andreas Reuter,
Transaction Processing: Concepts and
Techniques, Morgan Kaufmann, San
Mateo, CA (1993).
                                           15
In 1985, Jim and a number of other
senior leaders in the field of transaction
processing started the HPTS (High
Performance Transaction Systems)
Workshop [HPTS]. This is a biennial
gathering of folks interested in
transaction systems (and things related
to scalable systems). It includes people
from competing companies in industry
and also from academia. Over the last
22 years, it has evolved to include many
different topics as high-end computing
morphed from the mainframe to the
Internet.
                                         16
The early years …
● Born January 12, 1944

● 1961 graduated from Westmoor
  High School in San Francisco.

● 1966 graduated from the
  University of California at
  Berkeley with a bachelor's degree
  in mathematics and engineering.

                                    17
James Nicholas Gray was born in San
Francisco, California on 12 January
1944.
● In 1961 Gray graduated from Westmoor High School in San
  Francisco.
● He graduated from the University of California at Berkeley
  with a bachelor's degree in mathematics and engineering in 1966.
● After spending a year in New Jersey working at Bell
  Laboratories in Murray Hill and attending classes at the
  Courant Institute in New York City, he returned to Berkeley and
  enrolled in the newly-formed computer science department,
  earning a Ph.D. in 1969 for work on context-free grammars and
  formal language theory.
                                                                18
5-minute rule
for Memory vs. Disk Access (1987)
When does it make economic sense to
hold pages in memory versus doing IO
every time data from the page is
accessed?

    THE FIVE MINUTE RULE
     Pages referenced every
     five minutes should be
        memory resident.
                                       19
From Tandem Report 1987:
Jim Gray and Gianfranco Putzolu
● The argument goes as follows: A Tandem disc, and half a
  controller comfortably deliver 15 accesses per second and are
  priced at 15K$ for a small disc and 20K$ for a large disc (180Mb
  and 540Mb respectively).
● So the price per access per second is about 1K$. The extra CPU
  and channel cost for supporting a disc is 1K$/a/s. So one disc
  access per second costs about 2K$ on a Tandem system.
● A megabyte of Tandem main memory costs 5K$, so a kilobyte
  costs 5$.

                                                                20
● If making a 1Kb record resident saves 1 a/s, then it saves
  about 2K$ worth of disc accesses at a cost of 5$, a good
  deal. If it saves 0.1 a/s then it saves about 200$, still a
  good deal. Continuing this, the break even point is an
  access every 2000/5 = 400 seconds.
● So, any 1KB record accessed more frequently than every
  400 seconds should live in main memory. 400 seconds is
  "about" 5 minutes, hence the name: the Five Minute Rule.

                                                            21
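
The arithmetic in the Tandem report quoted above can be replayed in a few lines. The prices are the ones given in the report excerpt; the variable and function names are illustrative only.

```python
# Figures quoted from the 1987 Tandem report excerpt above.
cost_per_access_per_second = 2000.0  # $ per (disc access / second), disc plus CPU and channel
cost_per_megabyte = 5000.0           # $ per megabyte of main memory
cost_per_kilobyte = cost_per_megabyte / 1000  # 5 $ per 1KB record, as in the report


def savings(accesses_per_second):
    """Dollar value of the disc accesses avoided by keeping the record in memory."""
    return accesses_per_second * cost_per_access_per_second


# Break-even: the access interval at which the disc savings equal the memory cost.
break_even_seconds = cost_per_access_per_second / cost_per_kilobyte

print(savings(1.0), savings(0.1), break_even_seconds)
```

A record accessed more often than once per break-even interval saves more in disc cost than it costs in memory, which is the rule as stated in the report.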
5-minute rule
● The five-minute rule is based on the tradeoff between the
  cost of RAM and the cost of disk accesses.




                                                          22
1997 – Ten years later




                         24
New Storage Metrics:
   Kaps, Maps, SCAN
● Kaps: How many kilobyte objects served per second
  – The file server, transaction processing metric
  – This is the OLD metric.
● Maps: How many megabyte objects served per sec
  – The Multi-Media metric
● SCAN: How long to scan all the data
  – the data mining and utility metric
● And
  – Kaps/$, Maps/$, TBscan/$
                                                      25
Disk Changes
● Disks got cheaper: 20k$ -> 1K$ (or even 200$)
   – $/Kaps etc improved 100x (Moore's law!) (or even 500x)
   – One-time event (went from mainframe prices to PC prices)
● Disk data got cooler (10x per decade):
   – 1990 disk ~ 1GB and 50Kaps and 5 minute scan
   – 2000 disk ~70GB and 120Kaps and 45 minute scan
● So
   – 1990: 1 Kaps per 20 MB
   – 2000: 1 Kaps per 500 MB
   – disk scans take longer (10x per decade)
● Backup/restore takes a long time (too long)
                                                                26
Storage Ratios Changed
● 10x better access time
● 10x more bandwidth
● 100x more capacity
● Data 25x cooler
  (1Kaps/20MB vs 1Kaps/500MB)
● 4,000x lower media price
● 20x to 100x lower disk price
● Scan takes 10x longer (3
  min vs 45 min)
● DRAM/disk media price ratio changed
   – 1970-1990: 100:1
   – 1990-1995: 10:1
   – 1995-1997: 50:1
   – today (~0.03$/MB disk, 3$/MB dram): 100:1
                                                                27
The Five Minute Rule
●   Trade DRAM for Disk Accesses
●   Cost of an access (DriveCost / Access_per_second)
●   Cost of a DRAM page ( $/MB / pages_per_MB)
●   Break even has two terms:
●   Technology term and an Economic term

● Grew page size to compensate for changing ratios.
● Still at 5 minute for random, 1 minute sequential
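
Dividing the cost of an access by the cost of a DRAM page gives the break-even interval, and regrouping the factors separates the technology term from the economic term. The sketch below does that. The function names and the sample prices (a 2,000$ drive delivering 64 accesses per second, DRAM at 15$/MB) are assumptions chosen for illustration and do not come from the slides.

```python
def break_even_seconds(pages_per_mb, accesses_per_second, drive_cost, dram_cost_per_mb):
    """Interval at which keeping a page in DRAM costs the same as re-reading it."""
    cost_of_access = drive_cost / accesses_per_second  # $ per access per second
    cost_of_page = dram_cost_per_mb / pages_per_mb     # $ per DRAM page
    return cost_of_access / cost_of_page


def split_terms(pages_per_mb, accesses_per_second, drive_cost, dram_cost_per_mb):
    """The same quantity, regrouped as (technology term, economic term)."""
    technology = pages_per_mb / accesses_per_second
    economic = drive_cost / dram_cost_per_mb
    return technology, economic


# Illustrative inputs (assumptions, not figures from the slides).
prices = dict(accesses_per_second=64, drive_cost=2000.0, dram_cost_per_mb=15.0)

interval_8kb = break_even_seconds(pages_per_mb=128, **prices)   # 8KB pages
interval_1kb = break_even_seconds(pages_per_mb=1024, **prices)  # 1KB pages
technology, economic = split_terms(pages_per_mb=128, **prices)

print(round(interval_8kb), round(interval_1kb), technology, round(economic, 1))
```

With the prices held fixed, moving from 1KB to 8KB pages cuts the break-even interval by a factor of eight, which is the sense in which larger pages compensate for changing ratios.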

From his presentations in 2000                          28
Data on Disk
Can Move to RAM in 10 years
[Figure: Storage Price vs Time. Megabytes per kilo-dollar (MB/k$), log scale from 0.1 to 10,000, plotted by Year from 1980 to 2000; annotated "100:1" and "10 years".]
                                                           29
Storage Hierarchy :
            Speed & Capacity vs Cost Tradeoffs
[Figure: two panels plotted against Access Time (seconds), 10^-9 to 10^3. Size vs Speed: Typical System (bytes), 10^3 to 10^15. Price vs Speed: $/MB, 10^-6 to 10^2. Levels shown in each panel: Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, Offline Tape.]
                                                                                                              30
5-minute rule holds in 1997
● In summary, the five-minute rule still seems to apply to
  randomly accessed pages, primarily because page sizes
  have grown from 1KB to 8KB to compensate for changing
  technology ratios.




                                                         31
Storage Latency:
 How Far Away is the Data?
     10^9     Tape/Optical Robot     Andromeda      2,000 Years
     10^6     Disk                   Pluto          2 Years
     100      Memory                 Olympia        1.5 hr
     10       On Board Cache         This Hotel     10 min
     2        On Chip Cache          This Room
     1        Registers              My Head        1 min
                                                                                     32
From Jim Gray's Rules of Thumb in Data Engineering Presentation
What's a TeraByte?
      ● 1 Terabyte:
             –   1,000,000,000 business letters: 150 miles of book shelf
             –   100,000,000 book pages: 15 miles of book shelf
             –   50,000,000 FAX images: 7 miles of book shelf
             –   10,000,000 TV pictures (mpeg): 10 days of video
             –   4,000 LandSat images: 16 earth images (100m)
             –   100,000,000 web pages: 10 copies of the web HTML

      ● Library of Congress (in ASCII) is 25 TB
             – 1980: $200 million of disc (10,000 discs)
             –       $5 million of tape silo (10,000 tapes)
             – 1997: 200 k$ of magnetic disc (48 discs)
             –       30 k$ nearline tape (20 tapes)

Jim Gray's presentations 1995
Terror Byte!
                                                                                     33
How Much Information Is there?
     ● Soon everything can be
       recorded and indexed
     ● Most data will never be seen by
       humans
     ● Precious Resource:
       Human attention
            – Auto-Summarization
            – Auto-Search
              is key technology.
              http://www.lesk.com/mlesk/ksg97/ksg.html
[Figure: scale of data sizes, Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers from smallest to largest: A Book, A Photo, Movie, All LoC books (words), All Books MultiMedia, Everything Recorded]
24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
                                                                                                     34
2007: Twenty Years Later




                           35
The 5-minute rule holds in 2007
● The old five-minute rule for RAM and disk now applies to
  64KB page sizes (334 seconds).
   – Five minutes had been the approximate break-even interval for
     1KB in 1987 and for 8KB in 1997.
● The five-minute break-even interval also applies to RAM
  and the expensive flash memory of 2007 for page sizes of
  64KB and above (365 seconds and 339 seconds).
   – As the price premium for flash memory decreases, so does the
     break-even interval (146 seconds and 136 seconds).

                                                                     36
Flash memory falls between
traditional RAM and
persistent mass storage
based on rotating disks in
terms of acquisition cost,
access latency, transfer
bandwidth, spatial density,
power consumption, and
cooling costs.

                              37
20 years out:
Summary and Conclusion
● The 20-year-old five-minute rule for RAM and disks still
  holds, but for ever-larger disk pages.
● It should be augmented by two new five-minute rules:
   – for small pages moving between RAM and flash memory and
   – for large pages moving between flash memory and traditional
     disks.
● For small pages moving between RAM and disk, Gray and
  Putzolu were amazingly accurate in predicting a five-hour
  break-even point 20 years into the future.

                                                                   38
Data Cube




            41
Aggregates in SQL
● The SQL standard [Melton, Simon] provides five
  aggregate functions:
  COUNT, SUM, MIN, MAX, AVG
       SELECT [DISTINCT] AVG(Temp)
       FROM     Weather;
● Aggregate functions return a single value. In addition,
  SQL allows aggregation over distinct values.
● Using GROUP BY, SQL can create a table of aggregate
  values indexed by a set of attributes.
       SELECT     Time, Altitude, AVG(Temp)
       FROM       Weather
       GROUP BY Time, Altitude;
[Figure: SUM() reduces a table to a single value; with GROUP BY, SUM() over an attribute with values A, B, C, D yields one result per value]
                                                                                   42
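
The two queries above can be tried against Python's standard sqlite3 module. The Weather rows are made-up sample data, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (Time TEXT, Altitude INTEGER, Temp REAL)")
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [
        ("noon", 0, 20.0),
        ("noon", 0, 22.0),
        ("noon", 1000, 12.0),
        ("midnight", 0, 10.0),
    ],
)

# A plain aggregate collapses the whole table to a single value.
(overall_avg,) = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()

# GROUP BY yields one aggregate row per distinct (Time, Altitude) pair.
grouped = conn.execute(
    "SELECT Time, Altitude, AVG(Temp) FROM Weather "
    "GROUP BY Time, Altitude ORDER BY Time, Altitude"
).fetchall()

print(overall_avg)
for row in grouped:
    print(row)
```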
Problems With This Design
● Users Want Histograms
● Users want sub-totals and totals
   – drill-down & roll-up reports
● Users want CrossTabs
● Conventional wisdom
   – These are not relational operators
   – They are in many report writers and
     query engines
[Figure: functions F(), G(), H() feeding a sum; a cross-tab of AIR, HOTEL, FOOD, MISC by day of week (M T W T F S S)]


                                                                        43
Other Variants – Illustra
● init(&handle):
  – Allocates the handle and initializes the aggregate computation.

● iter(&handle, value):
  – Aggregates the next value into the current aggregate.

● value = final(&handle):
  – Computes and returns the resulting aggregate by using data
    saved in the handle. This invocation deallocates the handle.
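
The three-call protocol can be mimicked in a few lines of Python. This is a hypothetical analogue for computing an average, not Illustra's actual API: the agg_ prefix, the dict used as a handle, and the sample values are assumptions made for illustration.

```python
def agg_init():
    """init: allocate the handle and initialize the aggregate computation."""
    return {"count": 0, "total": 0.0}


def agg_iter(handle, value):
    """iter: aggregate the next value into the current aggregate."""
    handle["count"] += 1
    handle["total"] += value


def agg_final(handle):
    """final: compute the result from the handle, then release its state."""
    result = handle["total"] / handle["count"] if handle["count"] else None
    handle.clear()  # stands in for deallocating the handle
    return result


handle = agg_init()
for temp in [20.0, 22.0, 12.0, 10.0]:
    agg_iter(handle, temp)
average = agg_final(handle)
print(average)
```

Because all intermediate state lives in the handle, the same three calls can serve any aggregate whose running state fits in a fixed-size record, such as COUNT, SUM, MIN, MAX, or AVG.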

                                                                      44
DATA CUBE and ROLLUP

SELECT Model, Year, Color,
       SUM(Sales) AS total,
       SUM(Sales) / total(ALL,ALL,ALL)
FROM Sales
WHERE Model IN {'Ford', 'Chevy'}
  AND Year Between 1990 AND 1992
GROUP BY CUBE(Model, Year, Color);

[Figure: progression from Aggregate (Sum), to Group By (with total) By Color (RED, WHITE, BLUE), to Cross Tab By Color and By Make (Chevy, Ford), to The Data Cube and The Sub-Space Aggregates: By Make, By Year, By Color, By Make & Year, By Make & Color, By Color & Year, and Sum]
                                                                                                                                             45
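
What GROUP BY CUBE computes can be sketched in plain Python: one sum for every subset of the grouping columns, with ALL standing in for each rolled-up column. The cube function and the three sample rows are hypothetical; this illustrates the semantics and is not how a database engine would implement the operator.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"


def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims`, using ALL for rolled-up columns."""
    totals = defaultdict(int)
    for row in rows:
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


# Made-up sample rows, shaped like the Sales table in the query.
sales = [
    {"Model": "Chevy", "Year": 1990, "Color": "RED", "Sales": 5},
    {"Model": "Chevy", "Year": 1990, "Color": "BLUE", "Sales": 3},
    {"Model": "Ford", "Year": 1991, "Color": "RED", "Sales": 4},
]

result = cube(sales, ["Model", "Year", "Color"], "Sales")
grand_total = result[(ALL, ALL, ALL)]
# Counterpart of SUM(Sales) / total(ALL,ALL,ALL): each cell's share of the total.
share = {key: total / grand_total for key, total in result.items()}

for key in sorted(result, key=str):
    print(key, result[key], round(share[key], 2))
```

With three grouping columns there are eight subsets, so the result holds the detail rows, every sub-total, and the grand total in a single relation.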
A Dozen Information Technology Research Goals

1. Scalability: Devise a software and hardware architecture that scales
   up by a factor of 10^6. That is, an application's storage and
   processing capacity can automatically grow by a factor of a million,
   doing jobs faster (10^6 x speedup) or doing larger jobs in the same
   time (10^6 x scale-up), just by adding more resources.
2. The Turing Test: Build a computer system that wins the imitation
   game at least 30% of the time.
3. Speech to text: Hear as well as a native speaker.
4. Text to speech: Speak as well as a native speaker.
5. See as well as a person: Recognize objects and motion.
                                                                      48
A Dozen Information Technology Research Goals

6. Personal Memex: Record everything a person sees and hears and
   quickly retrieve any item on request.
7. World Memex: Build a system that given a text corpus, can answer
   questions about and summarize the text as precisely and quickly as
   a human expert in that field. Do the same for music, images, art and
   cinema.
8. Telepresence: Simulate being some other place retrospectively as
   an observer (Teleobserver): hear and see as well as actually being
   there and as well as a participant. Simulate being some other place
   as a participant (Telepresent): interacting with others and with the
   environment as though you are actually there.
                                                                       49
A Dozen Information Technology Research Goals

9. Trouble-Free Systems: Build a system used by millions of people
    each day and yet administered and managed by a single part-time
    person.
10. Secure System: Assure that the system of problem 9 services only
    authorized users, service cannot be denied by unauthorized users
    and information cannot be stolen (and prove it).
11. Always Up: Assure that the system is unavailable for less than one
    second per hundred years – eight 9s of availability (and prove it).



                                                                         50
A Dozen Information Technology Research Goals

12. Automatic Programmer: Devise a specification language or user
    interface that
   – Makes it easy for people to express designs (1,000x easier),
   – Computer can compile, and
   – Can describe all applications (is complete).
   The system should reason about the application, asking questions about
   exception cases and incomplete specification. But it should not be onerous
   to use.




                                                                            51
Computer Industry Laws
(Rules of thumb)
●   Metcalfe's law
●   Moore's first law
●   Bell's computer classes (7 price tiers)
●   Bell's platform evolution
●   Bell's platform economics
●   Bill's law
●   Software economics
●   Grove's law
●   Moore's second law
●   Is info-demand infinite?
●   The death of Grosch's law
                                                  52
Gordon Bell’s Seven Price Tiers
       10$:            wrist watch computers
      100$:            pocket/ palm computers
     1,000$:           portable computers
    10,000$:           personal computers (desktop)
   100,000$:           departmental computers (closet)
  1,000,000$:          site computers (glass house)
 10,000,000$:          regional computers (glass castle)


              Super server: costs more than $100,000
             “Mainframe”: costs more than $1 million
      Must be an array of processors, disks, tapes, comm ports   53
Information at your fingertips.
Bill Gates is known for his long-standing
belief that, as he once put it, "any piece of
information you want should be available
to you. -- Putting Information at Your
Fingertips."

Gates championed it as early as 1989,
and he was in a position to do something
about it. It remained his overriding goal
for the next two decades.

                                                54
The Vision: Global Data Federation
 ● Massive datasets live near their owners:
    – Near the instrument's software pipeline
    – Near the applications
    – Near data knowledge and curation
 ● Each Archive publishes a (web) service
    – Schema: documents the data
    – Methods on objects (queries)
 ● Scientists get "personalized" extracts
 ● Uniform access to multiple Archives
    – A common global schema
 [Figure: Federation]
                                                         55
Gray and Bell
worked closely at
Digital and at
Microsoft’s Bay
Area Research
Center since 1994
● MyLifeBits

● Terra Server
                    56
Gordon Bell’s: MyLifeBits
● MyLifeBits is a lifetime store of everything.
  It is the fulfillment of Vannevar Bush's 1945
  Memex vision including full-text search,
  text and audio annotations, and hyperlinks.
● The experiment:
   Gordon Bell has captured a lifetime's worth of
   articles, books, cards, CDs, letters, memos,
   papers, photos, pictures, presentations, home
   movies, videotaped lectures, and voice
   recordings and stored them digitally. He is now
   paperless, and is beginning to capture phone
   calls, IM transcripts, television, and radio.
                                                     57
TerraServer
In late spring of 1996, Paul Flessner, the General Manager of the
SQL Server team, asked our lab to build a database application
that would test and demonstrate the scalability of the next release
of SQL Server, code named "Sphinx".

One of Jim's greatest abilities was to clearly define and articulate
the problem. The SQL team gave us two goals:
1. Test SQL's ability to scale up to support a database of one
    terabyte or larger.
2. Build an internet application where SQL marketing could
    demonstrate Windows and SQL Server's scalability.
                                                                       59
About moving research to production



"ideas don't transfer, people transfer…"




                                           60
TerraServer Requirements
●   BIG – 1 TB of data including catalog, temporary space, etc.
●   PUBLIC – available on the world wide web
●   INTERESTING – to a wide audience
●   ACCESSIBLE – using standard browsers (IE, Netscape)
●   REAL – a LOB application (users can buy imagery)
●   FREE – cannot require an NDA or payment from a user for access
●   FAST – usable on low-speed (56kbps) and high-speed (T-1+) connections
●   EASY – we do not want a large group to develop, deploy, or
    maintain the application

● CHEAP – An unwritten requirement
  (1) because TerraServer was only a prototype, test, and free
  demonstration; and (2) Jim Gray was a very frugal person!      61
● SOVINFORMSPUTNIK (the Russian Space Agency) and Aerial Images
● United States Geological Survey (USGS)
● An Interesting Internet Server
http://msdn.microsoft.com/en-us/library/aa226316(v=sql.70).aspx
                                                                      62
Thesis: Scaleable Servers
● Scaleable Servers
   – Commodity hardware allows new applications
   – New applications need huge servers
   – Clients and servers are built of the same "stuff"
    • Commodity software and
    • Commodity hardware
● Servers should be able to
   – Scale up (grow node by adding CPUs, disks, networks)
   – Scale out (grow by adding nodes)
   – Scale down (can start small)
● Key software technologies
   – Objects, Transactions, Clusters, Parallelism
                                                            63
Scaleable Servers
  BOTH SMP And Cluster
● Grow up with SMP; 4xP6 is now standard
● Grow out with cluster
● Cluster has inexpensive parts
Diagram labels: Personal system, Departmental server, SMP super server, Cluster of PCs

                                                        65
SMPs Have Advantages
● Single system image easier to
  manage, easier to program
  threads in shared memory,
  disk, Net
● 4x SMP is commodity
● Software capable of 16x
● Problems:
    – >4 not commodity
    – Scale-down problem (starter
      systems expensive)
● There is a BIGGEST one
Diagram labels: Personal system, Departmental server, SMP super server
                                               66
Grow UP and OUT
● 1 Terabyte DB
● Cluster:
   – a collection of nodes
   – as easy to program and manage as a single node
● 1 billion transactions per day
Diagram labels: Personal system, Departmental server, SMP super server
                                                               67
Clusters Have Advantages
● Clients and servers made from the same stuff
● Inexpensive:
   – Built with commodity components
● Fault tolerance:
   – Spare modules mask failures
● Modular growth
   – Grow by adding small modules
● Unlimited growth:
             no biggest one
                                                 68
Windows NT Clusters
● Microsoft & 60 vendors defining NT clusters
   – Almost all big hardware and software vendors involved
● No special hardware needed - but it may help
● Fault-tolerant first, scaleable second
   – Microsoft, Oracle, SAP giving demos today
● Enables
   – Commodity fault-tolerance
   – Commodity parallelism (data mining, virtual reality…)
   – Also great for workgroups!


                                                             69
Parallelism
The OTHER aspect of clusters
● Clusters of machines allow two
  kinds of parallelism
   – Many little jobs: online transaction
     processing
       • TPC-A, B, C…
   – A few big jobs: data search and
     analysis
       • TPC-D, DSS, OLAP
● Both give automatic parallelism
                                            70
Kinds of Parallel Execution
    Pipeline:  Any Sequential Program -> Any Sequential Program
    Partition: Any Sequential Program | Any Sequential Program
       outputs split N ways
       inputs merge M ways
                                                                                               71
Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
Data Rivers
       Split + Merge Streams
       N producers -> River (N X M Data Streams) -> M Consumers
       Producers add records to the river,
       Consumers consume records from the river
       Purely sequential programming.
       River does flow control and buffering
               does partition and merge of data records

       River = Split/Merge in Gamma = Exchange operator in Volcano.
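
The river idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Gamma or Volcano implementation: the name `run_river`, the use of threads, bounded queues for flow control, and hash partitioning on a string key are all choices made here for the example. Each producer and consumer remains an ordinary sequential loop; the river alone handles splitting, buffering, and merging.

```python
import queue
import threading
from zlib import crc32


def run_river(producers, consumer_fn, m_consumers, capacity=64):
    """Feed N producer iterables of (key, value) records to M consumers.

    Hypothetical helper: returns a list with one result per consumer.
    Keys are assumed to be strings.
    """
    # One bounded queue per consumer. A full queue blocks the producer,
    # which gives simple flow control and buffering.
    lanes = [queue.Queue(maxsize=capacity) for _ in range(m_consumers)]
    done = object()  # sentinel marking the end of a lane
    results = [None] * m_consumers

    def produce(records):
        # Sequential producer code; the split is a hash of the key.
        for key, value in records:
            lane = crc32(key.encode()) % m_consumers
            lanes[lane].put((key, value))

    def consume(i):
        # The consumer sees one merged, purely sequential stream.
        def stream():
            while True:
                item = lanes[i].get()
                if item is done:
                    return
                yield item

        results[i] = consumer_fn(stream())

    consumer_threads = [
        threading.Thread(target=consume, args=(i,)) for i in range(m_consumers)
    ]
    producer_threads = [
        threading.Thread(target=produce, args=(p,)) for p in producers
    ]
    for t in consumer_threads + producer_threads:
        t.start()
    for t in producer_threads:
        t.join()
    for lane in lanes:
        lane.put(done)  # all producers finished: close every lane
    for t in consumer_threads:
        t.join()
    return results
```

With two producers and three consumers, `run_river([[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]], lambda s: sum(v for _, v in s), 3)` returns three partial sums that add up to 10, and all records with the same key reach the same consumer.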
                                                                                         72
Partitioned Execution
                Spreads computation and IO among processors
                                                                    Count


                                 Count             Count            Count     Count      Count




                                                           A Table

                                 A...E            F...J            K...N    O...S     T...Z



                    Partitioned data gives
                                      NATURAL parallelism
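
As a rough sketch of the partitioned count pictured above: the table is split by the first letter of a key into the five ranges shown on the slide, each partition is counted independently, and a final step adds the partial counts. The function names and the thread pool are assumptions made for this example. In CPython, threads illustrate the structure rather than deliver a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Key ranges taken from the slide: A...E, F...J, K...N, O...S, T...Z
RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]


def partition_of(name):
    """Return the index of the range that holds the name's first letter."""
    first = name[0].upper()
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= first <= hi:
            return i
    raise ValueError(f"no partition for {name!r}")


def partitioned_count(names, predicate):
    """Count matching names per partition, then combine the partial counts."""
    partitions = [[] for _ in RANGES]
    for name in names:
        partitions[partition_of(name)].append(name)

    def local_count(part):
        # Each partition is scanned by ordinary sequential code.
        return sum(1 for n in part if predicate(n))

    with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
        partial = list(pool.map(local_count, partitions))
    return partial, sum(partial)
```

For example, `partitioned_count(["Alice", "Bob", "Frank", "Kim", "Oscar", "Tom", "Zed"], lambda n: True)` gives the partial counts `[2, 1, 1, 1, 2]` and a total of 7.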
                                                                                                 73
N x M way Parallelism

                                                     Merge          Merge      Merge


                                    Sort              Sort           Sort       Sort       Sort

                                     Join              Join           Join      Join       Join




                                  A...E            F...J           K...N     O...S     T...Z


                                N inputs, M outputs, no bottlenecks.
                                Partitioned Data
                                Partitioned and Pipelined Data Flows
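
One way to picture the N-input, M-output flow is a partitioned sort followed by a range-split merge. The sketch below is an assumption-laden illustration, not the slide's actual operators: it omits the join step, and the function names and boundary values are invented for the example. Each of the N partitions is sorted on its own, each sorted run is cut at the same boundaries, and each of the M outputs merges its slices independently of the others.

```python
import bisect
import heapq


def sort_partitions(partitions):
    """Sort each of the N input partitions independently."""
    return [sorted(p) for p in partitions]


def merge_to_outputs(sorted_runs, boundaries):
    """Split every sorted run at the boundaries and merge into M outputs.

    M is len(boundaries) + 1. Output j receives values v with
    boundaries[j-1] <= v < boundaries[j].
    """
    m = len(boundaries) + 1
    pieces = [[] for _ in range(m)]
    for run in sorted_runs:
        cuts = [0] + [bisect.bisect_left(run, b) for b in boundaries] + [len(run)]
        for j in range(m):
            pieces[j].append(run[cuts[j]:cuts[j + 1]])
    # Each output merges its N slices; no output waits on another.
    return [list(heapq.merge(*pieces[j])) for j in range(m)]
```

With three input partitions `[[5, 1, 9], [4, 8, 2], [7, 3, 6]]` and boundaries `[4, 7]`, the three outputs are `[1, 2, 3]`, `[4, 5, 6]`, and `[7, 8, 9]`.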
                                                                                                  74
Year 2000
       The Year 2000 commodity PC (4B Machine)
       ● Billion Instructions/Sec
       ● .1 Billion Bytes RAM
       ● Billion Bits/s Net
       ● 10 B Bytes Disk
       ● Billion Pixel display
               – 3000 x 3000 x 24
       ● 1,000 $
       Diagram labels: 1 Bips Processor, .1 B byte RAM, 10 GB byte Disk
                                                                          75
Jim Gray & Gordon Bell: 1997 presentations
Super Server: 4T Machine
       ●     Array of 1,000 4B machines
               –    1 Bips processors
               –    1 BB DRAM
               –    10 BB disks
               –    1 Bbps comm lines
               –    1 TB tape robot
       ●     A few megabucks
       ●     Challenge:
               –    Manageability
               –    Programmability
               –    Security
               –    Availability
               –    Scaleability
               –    Affordability
       ●     As easy as a single system
       Future servers are CLUSTERS of processors, discs
       Distributed database techniques make clusters work
       Diagram: Cyber Brick, a 4B machine (CPU, 5 GB RAM, 50 GB Disc)
                                                                                              76
Jim Gray's quest for real problems and
real data … led to a collaboration with
astronomers.
Why Astronomy Data?
●   It has no commercial value
     –   No privacy concerns
     –   Can freely share results with others
     –   Great for experimenting with algorithms
●   It is real and well documented
     –   High-dimensional data (with confidence intervals)
     –   Spatial data
     –   Temporal data
●   Many different instruments from many different
    places and many different times
●   Federation is a goal
●   There is a lot of it (petabytes)
(Photo: Alex Szalay)
                                                                               77
The availability of, and the ability to handle,
very large volumes of storage and complex computing
are redefining how we do Science
                          78
Galileo and his telescope

First Paradigm:
For thousands of years, Science was about
empirically describing natural phenomena
                            79
Second Paradigm:
Theoretical Science using models and
generalization
(Pictured: Newton, Kepler, Maxwell)




                      80
Third Paradigm:
Computational Science: Simulating
Complex Phenomena
Over the last 25 years, scientists have used computer simulation to
validate theories.
(Image: a hurricane computer simulation.)
Fourth Paradigm:
Data Intensive Science
The scientific method was traditionally driven by hypothesis.

First, scientists predict an outcome, then collect experimental
data to test that prediction.

However, in the new data-driven approach, researchers start
by collecting data and analyze it later.


                              82
Scientists are collecting data
How to codify data and extract insights and
knowledge?
 Diagram labels: Experiments and Instruments, Simulations, Literature, Other Archives
 Question -> Answer


                       83
Astronomy
● Help build world-wide telescope
   – All astronomy data and literature online
     and cross indexed
   – Tools to analyze the data
● Built SkyServer.SDSS.org
● Built Analysis system
   – MyDB
   – CasJobs (batch job)
● Results:
   –   It works and is used every day
   –   Spatial extensions in SQL 2005
   –   A good example of Data Grid
   –   Good examples of Web Services.
World Wide Telescope
Virtual Observatory
http://www.us-vo.org/             http://www.ivoa.net/

● Premise: Most data is (or could be) online
● So, the Internet is the world's best telescope:
    – It has data on every part of the sky
    – In every measured spectral band: optical, x-ray, radio...
    – As deep as the best instruments (2 years ago).
    – It is up when you are up.
      The "seeing" is always great
       (no working at night, no clouds, no moons, no...).
    – It's a smart telescope:
          links objects and data to literature on them.
SkyServer.SDSS.org
● A modern archive
   – Access to Sloan Digital Sky Survey
     Spectroscopic and Optical surveys
   – Raw Pixel data lives in file servers
   – Catalog data (derived objects) lives in Database
   – Online query to any and all
● Also used for education
   – 150 hours of online Astronomy
   – Implicitly teaches data analysis
● Interesting things
   –   Spatial data search
   –   Client query interface via Java Applet
   –   Query from Emacs, Python, ….
   –   Cloned by other surveys (a template design)
   –   Web services are core of it.
SkyServer
  SkyServer.SDSS.org
● Like the TerraServer,
  but looking the other way:
  a picture of ¼ of the universe
● Sloan Digital Sky Survey Data:
  Pixels + Data Mining
● About 400 attributes per
  "object"
● Spectrograms for 1% of
  objects
SkyQuery (http://skyquery.net/)
● Distributed Query tool using a set of web services
● Many astronomy archives from
  Pasadena, Chicago, Baltimore, Cambridge (England)
● Has grown from 4 to 15 archives,
  now becoming
  international standard
● WebService Poster Child
● Allows queries like:
   SELECT o.objId, o.r, o.type, t.objId
   FROM SDSS:PhotoPrimary o,
        TWOMASS:PhotoPrimary t
   WHERE XMATCH(o,t)<3.5
        AND AREA(181.3,-0.76,6.5)
        AND o.type=3 and (o.I - t.m_j)>2
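
To give a feel for what a cross-match between two catalogs involves, here is a deliberately simplified sketch. The slide does not define XMATCH, so this example swaps in a plain positional match: two objects pair up when their angular separation is below a threshold in arcseconds. That interpretation, the dictionary field names (`objId`, `ra`, `dec`), and the brute-force double loop are all assumptions for illustration, not SkyQuery's actual algorithm.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation of two sky positions given in degrees (haversine)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((d2 - d1) / 2) ** 2
        + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def cross_match(cat_a, cat_b, max_sep_arcsec):
    """Return (idA, idB) pairs closer than the threshold.

    Brute force over all pairs: fine for a sketch, far too slow for
    real survey catalogs, which need a spatial index.
    """
    matches = []
    for a in cat_a:
        for b in cat_b:
            sep = angular_sep_arcsec(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep < max_sep_arcsec:
                matches.append((a["objId"], b["objId"]))
    return matches
```

In a federated setting such as the one the slide describes, each archive would evaluate its own part of the query and only candidate matches would travel between services; the sketch above shows only the matching test itself.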
SkyServer/SkyQuery Evolution
 MyDB and Batch Jobs
Problem: need multi-step data analysis (not just single
  query).
Solution: Allow personal databases on portal

Problem: some queries are monsters
Solution: "Batch schedule" on portal. Deposits answer in
  personal database.
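
The two solutions above can be sketched together with Python's built-in `sqlite3` module. This is a toy model under stated assumptions, not how CasJobs or MyDB are built: the `Portal` class, the `mydb_<user>_<table>` naming scheme, and the single shared in-memory database are invented for the example. A submitted query waits in a queue; when the batch runs, its answer is deposited as a table in the user's personal area, where a later step can query it again.

```python
import sqlite3
from collections import deque


class Portal:
    """Hypothetical portal: queues long queries, stores answers per user."""

    def __init__(self, conn):
        self.conn = conn
        self.jobs = deque()

    def submit(self, user, result_table, query):
        # Names are spliced into SQL below, so accept plain identifiers only.
        if not (user.isidentifier() and result_table.isidentifier()):
            raise ValueError("user and table names must be identifiers")
        self.jobs.append((user, result_table, query))

    def run_pending(self):
        """Run queued jobs in order, depositing each answer in 'MyDB'."""
        while self.jobs:
            user, table, query = self.jobs.popleft()
            self.conn.execute(f"CREATE TABLE mydb_{user}_{table} AS {query}")
        self.conn.commit()
```

A multi-step analysis then becomes a chain of jobs: the first writes `mydb_alice_galaxies`, and a second query can select from that table instead of rescanning the shared catalog.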
Ecosystem Sensor Net
   LifeUnderYourFeet.Org
● Small sensor net monitoring soil
● Sensors feed to a database
● Helping build system to
  collect & organize data.
● Working on data analysis tools
● Prototype for other LIMS
  (Laboratory Information Management Systems)
RNA Structural Genomics
● Goal: Predict secondary and
  tertiary structure
  from sequence.
  Deduce tree of life.
● Technique: Analyze
  sequence variations sharing
  a common structure
  across tree of life
● Representing
  structurally aligned sequences
  is a key challenge
● Creating a database-driven alignment
  workbench accessing public and private
  sequence data
VHA Health Informatics
● VHA: largest standardized electronic medical records system in US.
● Design, populate and tune a ~20 TB Data Warehouse and Analytics
  environment
● Evaluate population health and treatment outcomes
● Support epidemiological studies
   – 7 million enrollees
   – 5 million patients
   – Example Milestones:
      • 1 Billionth Vital Sign loaded
        in April '06
      • 30-minutes to population-wide
        obesity analysis (next slide)
      • Discovered seasonality in
        blood pressure -- NEJM fall '06
HDR Vitals Based Body Mass Index Calculation on VHA FY04 Population
 Source: VHA Corporate Data Warehouse
VHA Patients in BMI Categories (Based upon vitals from FY04)
Wt/Ht  5ft0in  5ft1in  5ft2in  5ft3in  5ft4in  5ft5in  5ft6in  5ft7in  5ft8in  5ft9in 5ft10in 5ft11in  6ft0in  6ft1in  6ft2in  6ft3in  6ft4in  6ft5in
100       230     211     334     276     316     364     346     300     244     172     114      73      58      16      11       3       1       1
105       339     364     518     532     558     561     584     515     436     284     226     144     102      25      13       4       4       1
110       488     489     836     815     955     972   1,031     899     680     521     395     256     161      70      23      10       6       4
115       526     614   1,018   1,098   1,326   1,325   1,607   1,426   1,175     903     598     451     264      84      59      17       6       4
120       644     714   1,419   1,583   1,964   2,153   2,612   2,374   1,933   1,450   1,085     690     501     153      95      38      13       9
125       672     855   1,682   1,933   2,628   3,005   3,521   3,405   2,929   2,197   1,538   1,144     756     253     114      46      32       8
130       753     944   1,984   2,392   3,462   3,968   5,039   4,827   4,285   3,223   2,378   1,765   1,182     429     214      81      41      12
135       753   1,062   2,173   2,852   4,105   4,912   6,535   6,535   5,797   4,500   3,393   2,467   1,668     596     309     108      70      15
140       754   1,073   2,300   3,177   4,937   6,286   8,769   8,750   7,939   6,303   4,837   3,493   2,534     977     513     144     106      22
145       748   1,053   2,254   3,389   5,412   7,334  10,485  11,004  10,576   8,084   6,511   4,686   3,344   1,207     680     221     140      41
150       730   1,077   2,361   3,596   6,152   8,665  12,772  14,335  13,866  11,255   9,250   6,545   4,796   1,792     979     350     162      48
155       683     923   2,178   3,391   6,031   8,891  14,181  15,899  16,594  13,517  11,489   8,056   5,741   2,155   1,203     472     249      70
160       671     872   2,106   3,532   6,184   9,580  15,493  18,869  19,939  17,046  14,650  10,366   7,708   2,831   1,618     615     341     100
165       627     772   1,894   3,074   5,773   9,549  16,332  20,080  22,507  19,692  17,729  12,588   9,558   3,548   2,032     716     399     117
170       596     750   1,716   2,900   5,428   9,080  16,633  21,550  25,051  22,568  21,198  15,552  12,093   4,548   2,626     944     489     124
175       493     674   1,521   2,551   4,816   8,417  15,900  21,420  26,262  24,277  23,756  18,194  13,817   5,361   3,178   1,152     586     144
180       486     599   1,411   2,323   4,584   7,855  15,482  20,873  26,922  26,067  26,313  20,358  16,459   6,451   3,848   1,441     737     207
185       420     546   1,195   1,985   3,905   6,918  13,406  19,362  25,818  25,620  27,037  21,799  18,172   7,206   4,458   1,548     867     247
190       424     495   1,073   1,729   3,383   5,909  11,918  17,640  24,277  25,263  27,398  22,697  19,977   8,344   4,937   1,858     963     287
195       341     463     913   1,474   2,803   5,207  10,584  15,727  22,137  23,860  26,373  22,513  20,163   8,754   5,683   2,178   1,120     309
200       315     384     763   1,338   2,602   4,551   9,413  14,149  20,608  22,541  25,452  23,358  21,548   9,284   6,221   2,294   1,295     372
205       265     338     633   1,026   1,993   3,736   7,765  11,940  17,501  19,944  23,065  21,094  20,354   9,270   6,350   2,597   1,322     376
210       275     284     543     853   1,794   3,148   6,804  10,540  15,647  18,129  21,862  20,540  20,271   9,566   6,816   2,786   1,509     418
215       205     244     501     746   1,389   2,645   5,747   8,712  13,064  15,560  19,089  18,191  19,063   9,019   6,675   2,798   1,509     454
220       168     208     415     652   1,231   2,326   4,950   7,751  11,645  13,900  17,577  17,239  17,583   8,896   6,818   2,948   1,635     484
225       156     160     325     522     968   1,873   4,015   6,340   9,794  11,890  14,898  15,097  15,741   8,332   6,441   2,915   1,647     452
230       141     160     259     486     880   1,653   3,334   5,410   8,657  10,500  13,532  13,488  14,815   7,901   6,258   2,859   1,701     496
235       115     119     244     373     738   1,251   2,795   4,570   7,192   8,784  11,489  11,857  12,796   7,113   5,544   2,744   1,617     465
240        72     116     214     313     562   1,099   2,422   3,861   6,044   7,652   9,982  10,692  11,825   6,496   5,392   2,606   1,581     449
245        71      76     169     253     509     888   1,858   3,167   5,076   6,446   8,312   8,647   9,910   5,638   4,742   2,263   1,479     469
250        70      55     152     226     452     753   1,647   2,826   4,505   5,509   7,569   8,064   8,900   5,183   4,319   2,177   1,451     469
255        59      61     128     174     316     599   1,289   2,130   3,468   4,540   5,957   6,451   7,438   4,320   3,741   1,903   1,271     443
260        50      64     117     167     281     493   1,107   1,929   2,963   3,947   5,190   5,797   6,725   3,900   3,429   1,828   1,218     481
265        37      34      88     122     234     454     894   1,449   2,457   3,152   4,374   4,818   5,729   3,350   2,984   1,539   1,028     406
270        47      42      67     119     203     367     800   1,291   2,110   2,740   3,878   4,133   5,075   2,934   2,685   1,468     918     403
275        22      34      44      85     184     291     662   1,064   1,767   2,235   3,113   3,412   4,267   2,598   2,362   1,247     837     334
280        21      20      51      69     139     286     548     903   1,513   1,955   2,770   3,126   3,604   2,273   2,020   1,152     763     300
285        12      12      36      68     118     201     451     720   1,318   1,613   2,208   2,394   3,132   1,924   1,780     994     677     241
290        16      14      47      38      92     182     387     667   1,050   1,301   1,904   2,150   2,655   1,749   1,529     881     688     252
295         9      12      22      53      92     127     341     493     838   1,162   1,577   1,823   2,338   1,445   1,333     813     533     202
300        12      10      30      43      59     117     309     434     764     988   1,428   1,588   1,989   1,255   1,212     709     479     205

Legend: BMI < 18 Underweight; BMI 18-24.9 Healthy Weight; BMI 25-29.9 Overweight; BMI 30+ Obese
Total Patients: 23,876 (0.7%); 701,089 (21.6%); 1,177,093 (36.2%)
DRAFT
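
For readers who want to see how a weight and height pair maps to the legend's categories, here is a minimal sketch. It assumes the table's weights are in pounds and heights in feet and inches, which the slide does not state, and it uses the standard imperial BMI formula (703 times pounds, divided by inches squared). The function names are invented for the example, and the cut-offs follow the slide's legend rather than any clinical guideline.

```python
def bmi_from_imperial(weight_lb, height_in):
    """Body mass index from pounds and inches (standard 703 conversion)."""
    return 703.0 * weight_lb / (height_in ** 2)


def bmi_category(bmi):
    """Map a BMI value to the four categories named in the slide's legend."""
    if bmi < 18:
        return "Underweight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def classify_cell(weight_lb, feet, inches):
    """Classify one table cell, e.g. the 160 row under the 5 ft 7 in column."""
    return bmi_category(bmi_from_imperial(weight_lb, feet * 12 + inches))
```

Under these assumptions, `classify_cell(160, 5, 7)` falls just over the BMI 25 line and is labeled "Overweight", while `classify_cell(100, 5, 0)` is labeled "Healthy Weight".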
Jim Gray’s work on Fourth Paradigm
and eScience has had a profound
impact on the scientific community.

This work continues …


                                      95
Jim Gray eScience Award
Each year, Microsoft Research presents the Jim Gray eScience
Award to a researcher who has made an outstanding
contribution to the field of data-intensive computing. The
award recognizes innovators whose work truly makes science
easier for scientists.




                                                          96
Jim Gray’s Legacy
● The Prolific Writer
   – Jim Gray's two rules for authorship:
       • The person who types puts their name first, and
       • It's easier to add a name to the list of authors
         than deal with someone's hurt feelings.
● The Masterful Presenter
● The Sense of Community
● The Patient Listener
Diagram labels: Ideas, Community, People


                                                                           98
Jim’s Life was a
Text Book on Mentoring

●   Making time
●   Simply Listening
●   Inspiring Self-Confidence
●   Lighting the Way
●   Nurturing and Pushing
●   Following the Muse
●   Connecting Good People and
    Good Ideas Without
    Boundaries
●   Promoting the Young
●   Sharing Knowledge Selflessly
●   Displaying Professional
    Integrity
●   Advocating for the Field
●   Keeping things in Perspective
●   Being a friend
                                                                   99
Lost at Sea …. January 28, 2007




                                  101
The Search for Jim Gray




                          102
The University of
California, Berkeley and
Gray's family hosted a
tribute to him on May
31, 2008.
http://www.youtube.com/user/UCBerkeleyEvents/videos?query=jim+gray
                                     103
Good references
● Microsoft Faculty Summit 2011
   – http://research.microsoft.com/en-us/events/fs2011/
   – Tony Hey's presentations at the event
   – http://research.microsoft.com/en-us/events/fs2011/welcome_introduction_hey_faculitysummit_071811.pdf
● The Fourth Paradigm book
   – http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf
● Jim Gray's work
   – http://research.microsoft.com/en-us/um/people/gray/
● Alex Szalay's work on Large Databases and Science
   – http://www.sdss.jhu.edu/~szalay/servers.html



Life and Work of Jim Gray | Turing100@Persistent

  • 1. Life and Work of Jim Gray January 5, 2013 1
  • 2. 2
  • 3. JAMES ("JIM") NICHOLAS GRAY United States – 1998 CITATION For fundamental contributions to database and transaction processing research and technical leadership in system implementation from research prototypes to commercial products. The transaction is the fundamental abstraction underlying database system concurrency and failure recovery. Gray’s work [defined] the key transaction properties: atomicity, consistency, isolation and durability, and his locking and recovery work demonstrated how to build … systems that exhibit these properties. 3
  • 4. E. F. Codd invented relational databases in 1970, creating what is today a 100+ billion dollar per year industry.
  • 5. Codd’s Relational Model ● Simple model ● Data stored in relational tables ● Data Independence – separation of data storage and data access ● Declarative Queries ● Algebra to mathematically reason about data objects – made query optimization possible ● Ad-hoc queries through SQL. ● Embedded in operational systems. 5
  • 6. ACID properties are fundamental to Relational Systems and necessary for on-line transaction processing (OLTP) systems ● Jim Gray defined ACID properties to guarantee database transactions are processed reliably: Atomicity, Consistency, Isolation, Durability
  • 7. From Transactions to Transaction Processing Systems - II [Figure: Reality and its Change on one side; the Abstraction on the other, where a Transaction takes DB to DB' and a Query on the DB returns an Answer] The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.
  • 8. Gray defined Data Manipulation Actions as ● Unprotected: transient and internal state ● Protected: grouped into transactions and reflected in the state of transaction outcome ● Real: involve sensors, actuators etc. They cannot be undone; they can be compensated.
  • 9. Definitions ● A transaction is a sequence of operations that form a single unit of work ● A transaction is often initiated by an application program – begin a transaction START TRANSACTION – end a transaction COMMIT (if successful) or ROLLBACK (if errors) ● Either the whole transaction must succeed or the effect of all operations has to be undone (rollback) ● To achieve durable transaction atomicity, the transition to the "committed" state must be accomplished by the single write to non-volatile storage.
  • 10. Structure of a Transaction Program BEGIN WORK () ROLL BACK WORK () WORK ROLL BACK WORK () COMMIT WORK () 10
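The BEGIN WORK / COMMIT WORK / ROLLBACK WORK structure can be sketched with Python's built-in `sqlite3` module. This is only an illustration, not code from the talk: the `account` table, the `transfer` function, and the balances are made up for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as a single unit of work."""
    cur = conn.cursor()
    cur.execute("BEGIN")  # BEGIN WORK
    try:
        cur.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                    (amount, src))
        cur.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                    (amount, dst))
        (lowest,) = cur.execute("SELECT MIN(balance) FROM account").fetchone()
        if lowest < 0:
            raise ValueError("insufficient funds")
        cur.execute("COMMIT")  # COMMIT WORK: both updates become visible together
        return True
    except Exception:
        cur.execute("ROLLBACK")  # ROLLBACK WORK: neither update survives
        return False

def demo():
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN / COMMIT / ROLLBACK statements above control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    first = transfer(conn, "alice", "bob", 60)
    second = transfer(conn, "alice", "bob", 60)  # would overdraw alice
    balances = dict(conn.execute("SELECT name, balance FROM account"))
    return first, second, balances
```

The second transfer fails its check after both updates have run, and the rollback leaves the balances exactly as the first transfer committed them.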
  • 11. While at IBM San Jose Research Laboratory October 1972 to December 1980 ● Jim Gray developed three key ideas related to transaction concurrency control: – The notion of transaction – Serializability; degrees of consistency; – Multi-granularity locking. ● There are two main transaction issues – concurrent execution of multiple transactions – recovery after hardware failures and system crashes 11
  • 12. Write Ahead Log (WAL) protocol ● The WAL protocol records the old and new states induced by protected actions separately from the actual state changes. ● The logged changes are written to stable storage before the actual changes are written back to stable storage (that's the "Write Ahead" part). ● Transactions are committed by simply appending and writing a 'commit' record to the recovery log. Logged changes are used to undo protected actions of aborted transactions and of transactions in progress at the time of a system failure.
  • 13. Write Ahead Log (WAL) protocol ● Log records are also used to redo committed actions whose actual changes have not been written back to stable storage at the time of a system failure. ● The WAL protocol allows changed data to be written to their stable storage home at any time after the log records describing the changes have been written into the stable log. ● This gives the Database Manager great flexibility in managing the contents of its volatile data buffer pools. 13
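A toy model may help make the undo and redo roles of the log concrete. The sketch below is an assumption-laden illustration, not Gray's implementation: `ToyWAL` and its method names are invented here, `None` stands for "key absent", and it assumes locking keeps two live transactions from writing the same key.

```python
class ToyWAL:
    """Toy write-ahead log: stable_log and stable_db survive a crash, buffer does not."""

    def __init__(self):
        self.stable_log = []  # append-only log records on stable storage
        self.stable_db = {}   # data at its stable-storage home
        self.buffer = {}      # volatile buffer pool

    def write(self, txn, key, new):
        old = self.buffer.get(key, self.stable_db.get(key))
        # Write ahead: the old and new values are logged before the change
        # is allowed to reach stable_db.
        self.stable_log.append(("update", txn, key, old, new))
        self.buffer[key] = new

    def flush(self, key):
        # Permitted at any time once the describing log record is stable.
        self.stable_db[key] = self.buffer[key]

    def commit(self, txn):
        self.stable_log.append(("commit", txn))

    def crash(self):
        self.buffer = {}  # volatile state is lost

    def recover(self):
        committed = {rec[1] for rec in self.stable_log if rec[0] == "commit"}
        # Redo committed updates in log order.
        for rec in self.stable_log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, _, new = rec
                self.stable_db[key] = new
        # Undo uncommitted updates in reverse log order.
        for rec in reversed(self.stable_log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old, _ = rec
                if old is None:
                    self.stable_db.pop(key, None)
                else:
                    self.stable_db[key] = old

def demo():
    wal = ToyWAL()
    wal.write("T1", "x", 1)
    wal.commit("T1")         # committed, but x never reached stable_db
    wal.write("T2", "y", 2)
    wal.flush("y")           # uncommitted change did reach stable_db
    wal.crash()
    wal.recover()
    return wal.stable_db
```

After recovery the committed write of T1 is redone and the flushed but uncommitted write of T2 is undone.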
  • 14. ACID Properties: First Definition ● Atomicity: A transaction's changes to the state are atomic: either all happen or none happen. These changes include database changes, messages, and actions on transducers. ● Consistency: A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program. ● Isolation: Even though transactions execute concurrently, it appears to each transaction T, that others executed either before T or after T, but not both. ● Durability: Once a transaction completes successfully (commits), its changes to the state survive failures.
  • 15. [Gray 1993] Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, San Mateo, CA (1993). 15
  • 16. In 1985, Jim and a number of other senior leaders in the field of transaction processing started the HPTS (High Performance Transaction Systems) Workshop [HPTS]. This is a biennial gathering of folks interested in transaction systems (and things related to scalable systems). It includes people from competing companies in industry and also from academia. Over the last 22 years, it has evolved to include many different topics as high-end computing morphed from the mainframe to the Internet. 16
  • 17. The early years … ● Born January 12, 1944 ● 1961 graduated from Westmoor High School in San Francisco. ● 1966 graduated from the University of California at Berkeley with a bachelor's degree in mathematics and engineering.
  • 18. James Nicholas Gray was born in San Francisco, California on 12 January 1944. ● In 1961 Gray graduated from Westmoor High School in San Francisco. ● He graduated from the University of California at Berkeley with a bachelor's degree in mathematics and engineering in 1966. ● After spending a year in New Jersey working at Bell Laboratories in Murray Hill and attending classes at the Courant Institute in New York City, he returned to Berkeley and enrolled in the newly-formed computer science department, earning a Ph.D. in 1969 for work on context-free grammars and formal language theory.
  • 19. 5-minute rule for Memory vs. Disk Access (1987) When does it make economic sense to hold pages in memory versus doing IO every time data from the page is accessed? THE FIVE MINUTE RULE Pages referenced every five minutes should be memory resident. 19
  • 20. From Tandem Report 1987: Jim Gray and Gianfranco Putzolu ● The argument goes as follows: A Tandem disc, and half a controller comfortably deliver 15 accesses per second and are priced at 15K$ for a small disc and 20K$ for a large disc (180Mb and 540Mb respectively). ● So the price per access per second is about 1K$. The extra CPU and channel cost for supporting a disc are 1K$/a/s. So one disc access per second costs about 2K$ on a Tandem system. ● A megabyte of Tandem main memory costs 5K$, so a kilobyte costs 5$.
  • 21. ● If making a 1Kb record resident saves 1 a/s, then it saves about 2K$ worth of disc accesses at a cost of 5$, a good deal. If it saves 0.1 a/s then it saves about 200$, still a good deal. Continuing this, the break even point is an access every 2000/5 = 400 seconds. ● So, any 1KB record accessed more frequently than every 400 seconds should live in main memory. 400 seconds is "about" 5 minutes, hence the name: the Five Minute Rule.
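The 1987 arithmetic can be replayed in a few lines. The prices are the ones quoted from the Tandem report; the function and constant names are only labels chosen for this sketch.

```python
DISC_ACCESS_COST = 2000.0  # $ per access/second: about 1K$ of disc plus 1K$ of CPU and channel
RAM_COST_PER_KB = 5.0      # $ per kilobyte, from 5K$ per megabyte

def saving(accesses_per_second):
    """Dollars of disc access capacity freed by keeping one 1KB record in memory."""
    return DISC_ACCESS_COST * accesses_per_second

def break_even_seconds(cost_per_access_per_second, cost_per_resident_record):
    """Seconds between accesses at which memory residency and disc access cost the same."""
    return cost_per_access_per_second / cost_per_resident_record
```

With these inputs `break_even_seconds(DISC_ACCESS_COST, RAM_COST_PER_KB)` gives 400 seconds, and a record touched only 0.1 times per second still frees about 200$ of disc capacity for 5$ of memory.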
  • 22. 5-minute rule ● The five-minute rule is based on the tradeoff between the cost of RAM and the cost of disk accesses. 22
  • 23. 5-minute rule ● The five-minute rule is based on the tradeoff between the cost of RAM and the cost of disk accesses. 23
  • 24. 1997 – Ten years later 24
  • 25. New Storage Metrics: Kaps, Maps, SCAN ● Kaps: How many kilobyte objects served per second – The file server, transaction processing metric – This is the OLD metric. ● Maps: How many megabyte objects served per sec – The Multi-Media metric ● SCAN: How long to scan all the data – the data mining and utility metric ● And – Kaps/$, Maps/$, TBscan/$ 25
  • 26. Disk Changes ● Disks got cheaper: 20k$ -> 1K$ (or even 200$) – $/Kaps etc improved 100x (Moore's law!) (or even 500x) – One-time event (went from mainframe prices to PC prices) ● Disk data got cooler (10x per decade): – 1990 disk ~ 1GB and 50Kaps and 5 minute scan – 2000 disk ~70GB and 120Kaps and 45 minute scan ● So – 1990: 1 Kaps per 20 MB – 2000: 1 Kaps per 500 MB – disk scans take longer (10x per decade) ● Backup/restore takes a long time (too long)
  • 27. Storage Ratios Changed ● 10x better access time ● 10x more bandwidth ● 100x more capacity ● Data 25x cooler (1Kaps/20MB vs 1Kaps/500MB) ● 4,000x lower media price ● 20x to 100x lower disk price ● Scan takes 10x longer (3 min vs 45 min) ● DRAM/disk media price ratio changed – 1970-1990 100:1 – 1990-1995 10:1 – 1995-1997 50:1 – today ~ 0.03$/MB disk, 3$/MB dram: 100:1
  • 28. The Five Minute Rule ● Trade DRAM for Disk Accesses ● Cost of an access (DriveCost / Access_per_second) ● Cost of a DRAM page ( $/MB / pages_per_MB) ● Break even has two terms: ● Technology term and an Economic term ● Grew page size to compensate for changing ratios. ● Still at 5 minute for random, 1 minute sequential From his presentations in 2000 28
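Dividing the cost of an access per second by the cost of a DRAM page gives the break-even interval, and regrouping shows the two terms the slide mentions. The symbol names below simply reuse the slide's own labels; the grouping is a sketch of the reasoning rather than a formula quoted from the talk.

```latex
\[
\text{BreakEven (seconds)}
  = \frac{\text{DriveCost} / \text{Access\_per\_second}}
         {(\$/\text{MB}) / \text{pages\_per\_MB}}
  = \underbrace{\frac{\text{pages\_per\_MB}}{\text{Access\_per\_second}}}_{\text{technology term}}
    \times
    \underbrace{\frac{\text{DriveCost}}{\$/\text{MB}}}_{\text{economic term}}
\]
```

As a check against the 1987 figures: an access per second costs about 2000$ and a 1KB page of memory about 5$, so the ratio is 400 seconds.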
  • 29. Data on Disk Can Move to RAM in 10 years [Figure: Storage Price vs Time, in megabytes per kilo-dollar (MB/k$) on a log scale from 0.1 to 10,000, by year from 1980 to 2000; annotations: 100:1, 10 years]
  • 30. Storage Hierarchy : Speed & Capacity vs Cost Tradeoffs [Figure: two panels, Size vs Speed (typical system size in bytes against access time in seconds) and Price vs Speed ($/MB against access time in seconds), each plotting Cache, Main, Secondary Disc, Online Tape, Nearline Tape, and Offline Tape]
  • 31. 5-minute rule holds in 1997 ● In summary, the five-minute rule still seems to apply to randomly accessed pages, primarily because page sizes have grown from 1KB to 8KB to compensate for changing technology ratios. 31
  • 32. Storage Latency: How Far Away is the Data? ● Registers (1): My Head, 1 min ● On Chip Cache (2): This Room ● On Board Cache (10): This Hotel, 10 min ● Memory (100): Olympia, 1.5 hr ● Disk (10^6): Pluto, 2 Years ● Tape/Optical Robot (10^9): Andromeda, 2,000 Years. From Jim Gray's Rules of Thumb in Data Engineering Presentation
  • 33. What's a TeraByte? ● 1 Terabyte: – 1,000,000,000 business letters (150 miles of book shelf) – 100,000,000 book pages (15 miles of book shelf) – 50,000,000 FAX images (7 miles of book shelf) – 10,000,000 TV pictures (mpeg) (10 days of video) – 4,000 LandSat images (16 earth images at 100m) – 100,000,000 web pages (10 copies of the web HTML) ● Library of Congress (in ASCII) is 25 TB – 1980: $200 million of disc (10,000 discs), $5 million of tape silo (10,000 tapes) – 1997: 200 k$ of magnetic disc (48 discs), 30 k$ nearline tape (20 tapes) ● Terror Byte! Jim Gray's presentations 1995
  • 34. How Much Information Is there? ● Soon everything can be recorded and indexed ● Most data will never be seen by humans ● Precious Resource: Human attention – Auto-Summarization – Auto-Search is key technology. http://www.lesk.com/mlesk/ksg97/ksg.html [Figure: scale of prefixes Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, marked in ascending order with A Book, A Photo, A Movie, All LoC books (words), All Books MultiMedia, Everything Recorded; small prefixes: 3 milli, 6 micro, 9 nano, 12 pico, 15 femto, 18 atto, 21 zepto, 24 yocto]
  • 35. 2007: Twenty Years Later 35
  • 36. The 5-minute rule holds in 2007 ● The old five-minute rule for RAM and disk now applies to 64KB page sizes (334 seconds). – Five minutes had been the approximate break-even interval for 1KB in 1987 [15] and for 8KB in 1997 [14]. ● The five-minute break-even interval also applies to RAM and the expensive flash memory of 2007 for page sizes of 64KB and above (365 seconds and 339 seconds). – As the price premium for flash memory decreases, so does the break-even interval (146 seconds and 136 seconds).
  • 37. Flash memory falls between traditional RAM and persistent mass storage based on rotating disks in terms of acquisition cost, access latency, transfer bandwidth, spatial density, power consumption, and cooling costs. 37
  • 38. 20 years out: Summary and Conclusion ● The 20-year-old five-minute rule for RAM and disks still holds, but for ever-larger disk pages. ● It should be augmented by two new five-minute rules: – for small pages moving between RAM and flash memory and – for large pages moving between flash memory and traditional disks. ● For small pages moving between RAM and disk, Gray and Putzolu were amazingly accurate in predicting a five-hour break-even point 20 years into the future. 38
  • 39. 39
  • 40. 40
  • 41. Data Cube 41
  • 42. Aggregates in SQL ● The SQL standard [Melton, Simon] provides five aggregate functions: COUNT, SUM, MIN, MAX, AVG. SELECT [DISTINCT] AVG(Temp) FROM Weather; ● Aggregate functions return a single value. In addition, SQL allows aggregation over distinct values. ● Using GROUP BY, SQL can create a table of aggregate values indexed by a set of attributes. SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude; [Figure: SUM() reducing a table attribute to a single value; GROUP BY partitioning rows into groups A, B, C, D, each with its own SUM()]
  • 43. Problems With This Design ● Users Want Histograms ● Users want sub-totals and totals – drill-down & roll-up reports ● Users want CrossTabs ● Conventional wisdom – These are not relational operators – They are in many report writers and query engines [Figure: a sum built from F(), G(), H(); a cross-tab of AIR, HOTEL, FOOD, MISC against the days M T W T F S S]
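The two queries on this slide can be tried against an in-memory SQLite database. The `Weather` rows below are invented sample data, and `weather_aggregates` is just a name for this sketch.

```python
import sqlite3

def weather_aggregates():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Weather (Time TEXT, Altitude INTEGER, Temp REAL)")
    conn.executemany(
        "INSERT INTO Weather VALUES (?, ?, ?)",
        [("noon", 0, 20.0), ("noon", 0, 22.0),
         ("noon", 1000, 10.0), ("midnight", 0, 12.0)],
    )
    # A plain aggregate collapses the whole table to a single value.
    overall = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()[0]
    # Aggregation over distinct values.
    distinct_times = conn.execute(
        "SELECT COUNT(DISTINCT Time) FROM Weather").fetchone()[0]
    # GROUP BY yields one aggregate row per (Time, Altitude) combination.
    grouped = conn.execute(
        "SELECT Time, Altitude, AVG(Temp) FROM Weather "
        "GROUP BY Time, Altitude ORDER BY Time, Altitude").fetchall()
    return overall, distinct_times, grouped
```

The grouped result has one row per distinct (Time, Altitude) pair, which is the "table of aggregate values indexed by a set of attributes" the slide describes.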
  • 44. Other Variants – Illustra ● init(&handle): – Allocates the handle and initializes the aggregate computation. ● iter(&handle, value): – Aggregates the next value into the current aggregate. ● value = final(&handle): – Computes and returns the resulting aggregate by using data saved in the handle. This invocation deallocates the handle. 44
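The init / iter / final protocol can be mimicked in Python to show how a user-defined aggregate carries its state in a handle. This is not Illustra's API: the `Average` class and the `aggregate` driver are hypothetical, and Python's garbage collector stands in for the explicit deallocation in `final`.

```python
class Average:
    """A user-defined aggregate written in the init / iter / final style."""

    def init(self):
        # Allocate the handle that holds the running state.
        return {"sum": 0.0, "count": 0}

    def iter(self, handle, value):
        # Fold the next value into the running aggregate.
        handle["sum"] += value
        handle["count"] += 1

    def final(self, handle):
        # Compute the result from the data saved in the handle.
        if handle["count"] == 0:
            return None
        return handle["sum"] / handle["count"]

def aggregate(agg, values):
    """Drive any aggregate object through the three calls."""
    handle = agg.init()
    for value in values:
        agg.iter(handle, value)
    return agg.final(handle)
```

For example, `aggregate(Average(), [10, 20, 60])` returns `30.0`, and an empty input returns `None`.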
  • 45. DATA CUBE and ROLLUP: SELECT Model, Year, Color, SUM(Sales) AS total, SUM(Sales) / total(ALL,ALL,ALL) FROM Sales WHERE Model IN {'Ford', 'Chevy'} AND Year BETWEEN 1990 AND 1992 GROUP BY CUBE(Model, Year, Color); [Figure: Aggregate (Sum); Group By (with total) by Color; Cross Tab by Make and Color; the Data Cube and the Sub-Space Aggregates by Make, by Year, by Color, by Make & Year, by Make & Color, by Color & Year]
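What GROUP BY CUBE produces can be emulated in plain Python: one group-by for every subset of the dimensions, with `ALL` marking a rolled-up dimension. The `cube` function and the three `SALES` rows are made up for illustration; a real engine would compute this far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away

def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims`, as GROUP BY CUBE would."""
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)

SALES = [
    {"Model": "Chevy", "Year": 1990, "Color": "red", "Sales": 5},
    {"Model": "Chevy", "Year": 1990, "Color": "blue", "Sales": 3},
    {"Model": "Ford", "Year": 1991, "Color": "red", "Sales": 4},
]

RESULT = cube(SALES, ["Model", "Year", "Color"], "Sales")
```

The key `(ALL, ALL, ALL)` holds the grand total, keys with one or two `ALL` entries hold the sub-totals and cross-tab cells, and keys with no `ALL` are the ordinary GROUP BY rows.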
  • 46. 46
  • 47. 47
  • 48. A Dozen Information Technology Research Goals 1. Scalability: Devise a software and hardware architecture that scales up by a factor of 10^6. That is, an application's storage and processing capacity can automatically grow by a factor of a million, doing jobs faster (10^6 x speedup) or doing larger jobs in the same time (10^6 x scale-up), just by adding more resources. 2. The Turing Test: Build a computer system that wins the imitation game at least 30% of the time. 3. Speech to text: Hear as well as a native speaker. 4. Text to speech: Speak as well as a native speaker. 5. See as well as a person: Recognize objects and motion.
  • 49. A Dozen Information Technology Research Goals 6. Personal Memex: Record everything a person sees and hears and quickly retrieve any item on request. 7. World Memex: Build a system that, given a text corpus, can answer questions about and summarize the text as precisely and quickly as a human expert in that field. Do the same for music, images, art and cinema. 8. Telepresence: Simulate being some other place retrospectively as an observer (Teleobserver): hear and see as well as actually being there, and as well as a participant. Simulate being some other place as a participant (Telepresent): interacting with others and with the environment as though you are actually there.
  • 50. A Dozen Information Technology Research Goals 9. Trouble-Free Systems: Build a system used by millions of people each day and yet administered and managed by a single part-time person. 10. Secure System: Assure that the system of problem 9 services only authorized users, service cannot be denied by unauthorized users and information cannot be stolen (and prove it). 11. Always Up: Assure that the system is unavailable for less than one second per hundred years – eight 9s of availability (and prove it).
  • 51. A Dozen Information Technology Research Goals 12. Automatic Programmer: Devise a specification language or user interface that – Makes it easy for people to express designs (1,000x easier), – Computer can compile, and – Can describe all applications (is complete). The system should reason about application, asking questions about exception cases and incomplete specification. But it should not be onerous to use.
  • 52. Computer Industry Laws (Rules of thumb) ● Metcalfe's law ● Moore's first law ● Bell's computer classes (7 price tiers) ● Bell's platform evolution ● Bell's platform economics ● Bill's law ● Software economics ● Grove's law ● Moore's second law ● Is info-demand infinite? ● The death of Grosch's law
  • 53. Gordon Bell’s Seven Price Tiers ● 10$: wrist watch computers ● 100$: pocket/palm computers ● 1,000$: portable computers ● 10,000$: personal computers (desktop) ● 100,000$: departmental computers (closet) ● 1,000,000$: site computers (glass house) ● 10,000,000$: regional computers (glass castle) Super server: costs more than $100,000 “Mainframe”: costs more than $1 million Must be an array of processors, disks, tapes, comm ports
  • 54. Information at your fingertips. Bill Gates is known for his long-standing belief that, as he once put it, "any piece of information you want should be available to you. -- Putting Information at Your Fingertips." Gates championed it as early as 1989, and he was in a position to do something about it. It remained his overriding goal for the next two decades.
  • 55. The Vision: Global Data Federation ● Massive datasets live near their owners: – Near the instrument's software pipeline – Near the applications – Near data knowledge and curation ● Each Archive publishes a (web) service – Schema: documents the data – Methods on objects (queries) ● Scientists get "personalized" extracts ● Uniform access to multiple Archives – A common global schema [Figure: Federation]
  • 56. Gray and Bell worked closely at Digital and at Microsoft’s Bay Area Research Center since 1994 ● MyLifeBits ● Terra Server 56
  • 57. Gordon Bell’s: MyLifeBits ● MyLifeBits is a lifetime store of everything. It is the fulfillment of Vannevar Bush's 1945 Memex vision including full-text search, text and audio annotations, and hyperlinks. ● The experiment: Gordon Bell has captured a lifetime's worth of articles, books, cards, CDs, letters, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings and stored them digitally. He is now paperless, and is beginning to capture phone calls, IM transcripts, television, and radio.
  • 58. 58
  • 59. TerraServer In late spring of 1996, Paul Flessner, the General Manager of the SQL Server team asked our lab to build a database application that would test and demonstrate the scalability of the next release of SQL Server code named "Sphinx". One of Jim's greatest abilities was to clearly define and articulate the problem. The SQL team gave us two goals: 1. Test SQL's ability to scale up to support a database of one terabyte or larger. 2. An internet application where SQL marketing could demonstrate Windows and SQL Server's scalability.
  • 60. About moving research to production "ideas don't transfer, people transfer…"
  • 61. TerraServer Requirements ● BIG —1 TB of data including catalog, temporary space, etc. ● PUBLIC — available on the world wide web ● INTERESTING — to a wide audience ● ACCESSIBLE — using standard browsers (IE, Netscape) ● REAL — a LOB application (users can buy imagery) ● FREE —cannot require NDA or money to a user to access ● FAST — usable on low-speed (56kbps) and high speeds(T-1+) ● EASY — we do not want a large group to develop, deploy, or maintain the application ● CHEAP – An unwritten requirement (1) because TerraServer was only a prototype, test, and free demonstration; and (2) Jim Gray was a very frugal person! 61
  • 62. An Interesting Internet Server ● SOVINFORMSPUTNIK (the Russian Space Agency) and United States Geological Survey (USGS) ● Aerial Images http://msdn.microsoft.com/en-us/library/aa226316(v=sql.70).aspx
  • 63. Thesis: Scaleable Servers ● Scaleable Servers – Commodity hardware allows new applications – New applications need huge servers – Clients and servers are built of the same "stuff" • Commodity software and • Commodity hardware ● Servers should be able to – Scale up (grow node by adding CPUs, disks, networks) – Scale out (grow by adding nodes) – Scale down (can start small) ● Key software technologies – Objects, Transactions, Clusters, Parallelism
  • 64. Thesis: Scaleable Servers ● Scaleable Servers – Commodity hardware allows new applications – New applications need huge servers – Clients and servers are built of the same "stuff" • Commodity software and • Commodity hardware ● Servers should be able to – Scale up (grow node by adding CPUs, disks, networks) – Scale out (grow by adding nodes) – Scale down (can start small) ● Key software technologies – Objects, Transactions, Clusters, Parallelism
  • 65. Scaleable Servers: BOTH SMP And Cluster ● Grow up with SMP; 4xP6 is now standard ● Grow out with cluster ● Cluster has inexpensive parts [Figure: SMP line growing from Personal system to Departmental server to SMP super server, beside a Cluster of PCs]
  • 66. SMPs Have Advantages ● Single system image easier to manage, easier to program threads in shared memory, disk, Net ● 4x SMP is commodity ● Software capable of 16x ● Problems: – >4 not commodity – Scale-down problem (starter systems expensive) ● There is a BIGGEST one [Figure: Personal system, Departmental server, SMP super server]
  • 67. Grow UP and OUT ● Cluster: a collection of nodes, as easy to program and manage as a single node ● 1 Terabyte DB ● 1 billion transactions per day [Figure: Personal system, Departmental server, SMP super server]
  • 68. Clusters Have Advantages ● Clients and servers made from the same stuff ● Inexpensive: – Built with commodity components ● Fault tolerance: – Spare modules mask failures ● Modular growth – Grow by adding small modules ● Unlimited growth: no biggest one
  • 69. Windows NT Clusters ● Microsoft & 60 vendors defining NT clusters – Almost all big hardware and software vendors involved ● No special hardware needed, but it may help ● Fault-tolerant first, scaleable second – Microsoft, Oracle, SAP giving demos today ● Enables – Commodity fault-tolerance – Commodity parallelism (data mining, virtual reality…) – Also great for workgroups!
  • 70. Parallelism: The OTHER aspect of clusters ● Clusters of machines allow two kinds of parallelism – Many little jobs: online transaction processing • TPC-A, B, C… – A few big jobs: data search and analysis • TPC-D, DSS, OLAP ● Both give automatic parallelism
  • 71. Kinds of Parallel Execution ● Pipeline: any sequential program feeds any sequential program ● Partition: outputs split N ways, inputs merge M ways, across copies of any sequential program. Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
  • 72. Data Rivers: Split + Merge Streams ● N producers, M consumers, N x M data streams ● Producers add records to the river; consumers consume records from the river ● Purely sequential programming ● River does flow control and buffering, and does partition and merge of data records ● River = Split/Merge in Gamma = Exchange operator in Volcano. Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
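The split-and-merge idea on this slide can be illustrated with a small, single-threaded sketch. It is only an assumption-laden toy: the function name `river` and the `route` callback are hypothetical, and the flow control and buffering that the slide attributes to the river are left out entirely.

```python
def river(producers, num_consumers, route):
    """Merge records from N producers and split them into M consumer streams.

    producers     -- iterable of N iterables, each yielding records
    num_consumers -- M, the number of output streams
    route         -- hypothetical callback mapping a record to a consumer
                     index in range(num_consumers)
    """
    streams = [[] for _ in range(num_consumers)]
    for producer in producers:            # merge: drain every producer
        for record in producer:
            target = route(record)        # split: choose one consumer
            streams[target].append(record)
    return streams


def example():
    # Two producers, three consumers, records routed by value modulo 3.
    producers = [range(0, 5), range(5, 10)]
    return river(producers, 3, lambda record: record % 3)
```

Each producer and each consumer only ever sees a plain sequence of records, which is the sense in which the programming stays purely sequential; a real river would run them concurrently and bound the buffers between them.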
  • 73. Partitioned Execution ● Spreads computation and IO among processors ● Partitioned data gives NATURAL parallelism [Diagram: a table partitioned into A...E, F...J, K...N, O...S, T...Z, with one Count operator per partition and one combining Count] Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
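As a rough illustration of the partitioned count on this slide, the sketch below splits names into the slide's five letter ranges, counts each partition in its own worker, and adds the partial counts. The function names and the use of a thread pool are assumptions made for the example, not something the slide specifies.

```python
from concurrent.futures import ThreadPoolExecutor

# Letter ranges taken from the slide: A...E, F...J, K...N, O...S, T...Z.
RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]


def partition(names):
    """Place each name in the partition whose range covers its first letter.

    Names that are empty or do not start with A-Z are skipped in this sketch.
    """
    parts = [[] for _ in RANGES]
    for name in names:
        first = name[:1].upper()
        for index, (low, high) in enumerate(RANGES):
            if low <= first <= high:
                parts[index].append(name)
                break
    return parts


def parallel_count(names):
    """Count every partition independently, then combine the partial counts."""
    parts = partition(names)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_counts = list(pool.map(len, parts))
    return partial_counts, sum(partial_counts)
```

No worker needs data from another partition, so the only shared step is the final sum.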
  • 74. N x M way Parallelism ● N inputs, M outputs, no bottlenecks ● Partitioned and Pipelined Data Flows [Diagram: partitioned data (A...E, F...J, K...N, O...S, T...Z) with five Join operators, five Sort operators, and three Merge operators] Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
  • 75. Year 2000 4B Machine: the Year 2000 commodity PC ● Billion Instructions/Sec (1 Bips processor) ● .1 Billion Bytes RAM ● Billion Bits/s Net ● 10 Billion Bytes Disk ● Billion Pixel display – 3000 x 3000 x 24 ● 1,000 $. Jim Gray & Gordon Bell: 1997 presentations
  • 76. Super Server: 4T Machine ● Array of 1,000 4B machines – 1 Bips processors – 1 BB DRAM – 10 BB disks – 1 Bbps comm lines – 1 TB tape robot ● A few megabucks ● Challenge: – Manageability – Programmability – Security – Availability – Scaleability – Affordability ● As easy as a single system ● Future servers are CLUSTERS of processors, discs ● Distributed database techniques make clusters work [Diagram: Cyber Brick, a 4B machine, with CPU, 5 GB RAM, 50 GB Disc] Jim Gray & Gordon Bell: 1997 presentations
  • 77. Jim Gray’s quest for real problems and real data … led to a collaboration with Astronomers. Why Astronomy Data? ● It has no commercial value – No privacy concerns – Can freely share results with others – Great for experimenting with algorithms ● It is real and well documented – High-dimensional data (with confidence intervals) – Spatial data – Temporal data ● Many different instruments from many different places and many different times ● Federation is a goal ● There is a lot of it (petabytes) [Photo: Alex Szalay]
  • 78. The availability of very large volumes of storage and complex computing, and the ability to handle them, are redefining how we do Science
  • 79. First Paradigm: For thousands of years, Science was about empirically describing natural phenomena [Image: Galileo and his telescope]
  • 80. Second Paradigm: Theoretical Science using models and generalization [Images: Newton, Kepler, Maxwell]
  • 81. Third Paradigm: Computational Science: Simulating Complex Phenomena. Over the last 25 years scientists have used computer simulation to validate theories. [Image: a hurricane computer simulation]
  • 82. Fourth Paradigm: Data Intensive Science. The scientific method was traditionally driven by hypothesis: scientists first make a prediction, then collect experimental data to test that prediction. In the new data-driven approach, however, researchers start by collecting data and analyze it later.
  • 83. Scientists are collecting data. How to codify data and extract insights and knowledge? [Diagram labels: Experiments and Instruments, Simulations, Literature, Other Archives, Question, Answer]
  • 84. Astronomy ● Help build world-wide telescope – All astronomy data and literature online and cross indexed – Tools to analyze the data ● Built SkyServer.SDSS.org ● Built Analysis system – MyDB – CasJobs (batch job) ● Results: – It works and is used every day – Spatial extensions in SQL 2005 – A good example of Data Grid – Good examples of Web Services.
  • 85. World Wide Telescope Virtual Observatory http://www.us-vo.org/ http://www.ivoa.net/ ● Premise: Most data is (or could be) online ● So, the Internet is the world's best telescope: – It has data on every part of the sky – In every measured spectral band: optical, x-ray, radio... – As deep as the best instruments (2 years ago). – It is up when you are up. The "seeing" is always great (no working at night, no clouds, no moons, no...). – It's a smart telescope: links objects and data to literature on them.
  • 86. SkyServer.SDSS.org ● A modern archive – Access to Sloan Digital Sky Survey Spectroscopic and Optical surveys – Raw Pixel data lives in file servers – Catalog data (derived objects) lives in Database – Online query to any and all ● Also used for education – 150 hours of online Astronomy – Implicitly teaches data analysis ● Interesting things – Spatial data search – Client query interface via Java Applet – Query from Emacs, Python, …. – Cloned by other surveys (a template design) – Web services are core of it.
  • 87. SkyServer SkyServer.SDSS.org ● Like the TerraServer, but looking the other way: a picture of ¼ of the universe ● Sloan Digital Sky Survey Data: Pixels + Data Mining ● About 400 attributes per "object" ● Spectrograms for 1% of objects
  • 88. SkyQuery 88
  • 89. SkyQuery (http://skyquery.net/) ● Distributed Query tool using a set of web services ● Many astronomy archives from Pasadena, Chicago, Baltimore, Cambridge (England) ● Has grown from 4 to 15 archives, now becoming international standard ● WebService Poster Child ● Allows queries like: SELECT o.objId, o.r, o.type, t.objId FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5) AND o.type=3 and (o.I - t.m_j)>2
  • 90. SkyServer/SkyQuery Evolution: MyDB and Batch Jobs. Problem: need multi-step data analysis (not just single query). Solution: Allow personal databases on portal. Problem: some queries are monsters. Solution: "Batch schedule" on portal. Deposits answer in personal database.
  • 91. Ecosystem Sensor Net LifeUnderYourFeet.Org ● Small sensor net monitoring soil ● Sensors feed to a database ● Helping build system to collect & organize data. ● Working on data analysis tools ● Prototype for other LIMS (Laboratory Information Management Systems)
  • 92. RNA Structural Genomics ● Goal: Predict secondary and tertiary structure from sequence. Deduce tree of life. ● Technique: Analyze sequence variations sharing a common structure across tree of life ● Representing structurally aligned sequences is a key challenge ● Creating a database-driven alignment workbench accessing public and private sequence data
  • 93. VHA Health Informatics ● VHA: largest standardized electronic medical records system in US. ● Design, populate and tune a ~20 TB Data Warehouse and Analytics environment ● Evaluate population health and treatment outcomes ● Support epidemiological studies – 7 million enrollees – 5 million patients – Example Milestones: • 1 Billionth Vital Sign loaded in April '06 • 30 minutes to population-wide obesity analysis (next slide) • Discovered seasonality in blood pressure (NEJM, fall '06)
  • 94. HDR Vitals Based Body Mass Index Calculation on VHA FY04 Population. Source: VHA Corporate Data Warehouse. [Table, marked DRAFT: VHA Patients in BMI Categories (based upon vitals from FY04). Rows: weight (Wt) from 100 to 300 in steps of 5. Columns: height (Ht) from 5 ft 0 in to 6 ft 5 in. Each cell is a patient count. Legend: BMI < 18 Underweight; BMI 18-24.9 Healthy Weight; BMI 25-29.9 Overweight; BMI 30+ Obese. Total Patients figures legible in the extraction: 23,876 (0.7%); 701,089 (21.6%); 1,177,093 (36.2%). Individual cell counts are not reproduced here.]
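The slide does not show how each weight and height cell is assigned to a category. The sketch below is one plausible reading, assuming weights in pounds, heights in inches, and the standard imperial formula BMI = 703 × weight / height². The cut-offs come from the slide's legend (note that it uses 18, not the more common 18.5, as the underweight threshold); the function names are hypothetical.

```python
def bmi(weight_lb, height_in):
    """Body mass index from pounds and inches (standard imperial formula)."""
    return 703.0 * weight_lb / (height_in ** 2)


def bmi_category(value):
    """Map a BMI value to the categories named in the slide's legend.

    The legend lists 18-24.9 and 25-29.9; this sketch assumes values in
    the gaps (for example 24.95) belong to the lower category.
    """
    if value < 18:
        return "Underweight"
    if value < 25:
        return "Healthy Weight"
    if value < 30:
        return "Overweight"
    return "Obese"


def classify(weight_lb, feet, inches):
    """Classify one table cell given its weight row and height column."""
    return bmi_category(bmi(weight_lb, feet * 12 + inches))
```

For instance, `classify(150, 5, 8)` evaluates a 150 lb patient at 5 ft 8 in, which works out to a BMI of roughly 22.8.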
  • 95. Jim Gray’s work on Fourth Paradigm and eScience has had a profound impact on the scientific community. This work continues …
  • 96. Jim Gray eScience Award: Each year, Microsoft Research presents the Jim Gray eScience Award to a researcher who has made an outstanding contribution to the field of data-intensive computing. The award recognizes innovators whose work truly makes science easier for scientists.
  • 97. 97
  • 98. Jim Gray’s Legacy ● The Prolific Writer – Jim Gray's two rules for authorship: • The person who types puts their name first, and • It's easier to add a name to the list of authors than deal with someone's hurt feelings. ● The Masterful Presenter ● The Sense of Community ● The Patient Listener [Diagram labels: Ideas, Community, People]
  • 99. Jim’s Life was a Textbook on Mentoring ● Making time ● Simply Listening ● Promoting the Young ● Inspiring Self-Confidence ● Sharing Knowledge Selflessly ● Lighting the Way ● Displaying Professional Integrity ● Nurturing and Pushing ● Following the Muse ● Advocating for the Field ● Connecting Good People and Good Ideas Without Boundaries ● Keeping things in Perspective ● Being a friend
  • 100. 100
  • 101. Lost at Sea …. January 28, 2007
  • 102. The Search for Jim Gray
  • 103. The University of California, Berkeley and Gray's family hosted a tribute to him on May 31, 2008. http://www.youtube.com/user/UCBerkeleyEvents/videos?query=jim+gray
  • 104. 104
  • 105. Good references ● Microsoft Faculty Summit 2011 – http://research.microsoft.com/en-us/events/fs2011/ – Tony Hey's presentations at the event – http://research.microsoft.com/en-us/events/fs2011/welcome_introduction_hey_faculitysummit_071811.pdf ● The Fourth Paradigm book – http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf ● Jim Gray's work – http://research.microsoft.com/en-us/um/people/gray/ ● Alex Szalay's work on Large Databases and Science – http://www.sdss.jhu.edu/~szalay/servers.html

Editor's notes

  1. Jim Gray refined the notion of a Database Transaction. He explained that application initiated data manipulation actions can be classified as “unprotected”, “protected”, and “real” actions [Gray 1981b]. Unprotected actions involve transient and internal state, such as temporary files. Protected actions, on the other hand, are grouped into transactions and are reflected in the state of the transaction outcome. The outcome of a transaction must be to either commit the effects of its protected actions to the system state, or to abort and remove the protected actions’ effects from the system state. This means that protected actions must be undone on transaction failure or abort and their effects must be ensured in the case of transaction commit. Real actions involve sensors, actuators, and messages outside the DBMS. While real actions cannot be “undone”, they can be compensated. For example, if the missile is fired, the compensation could be “debit quantity on hand and send apologies”.

     In order to achieve durable transaction atomicity (all or nothing for protected actions) in the presence of processor, memory, storage, communication, or environmental failures, multiple copies of the stored data must be maintained and a record of the protected action sequence is needed to complete or undo transactions interrupted by system failures. To achieve durable transaction atomicity, the transition to the “committed” state must be accomplished by a single write to nonvolatile storage. To these ends Jim Gray defined the Write Ahead Log (WAL) protocol [Gray 1978, Gray 1981a] while at IBM Research. The WAL protocol records the old and new states induced by protected actions separately from the actual state changes. The logged changes are written to stable storage before the actual changes are written back to stable storage (that’s the “Write Ahead” part). Transactions are committed by simply appending and writing a ‘commit’ record to the recovery log. Logged changes are used to undo protected actions of aborted transactions and of transactions in progress at the time of a system failure. Log records are also used to redo committed actions whose actual changes have not been written back to stable storage at the time of a system failure. The WAL protocol allows changed data to be written to their stable storage home at any time after the log records describing the changes have been written into the stable log. This gives the Database Manager great flexibility in managing the contents of its volatile data buffer pools.

     The recovery techniques developed by Jim Gray and the System R team have been instrumental to the deployment of on-line transaction processing applications. With the ability to recover from equipment and environmental failures, without loss of committed, protected actions, along with atomic (all-or-nothing) transaction completion, on-line business critical applications become reliable enough to replace batch and paper-based transaction processing. The impact of Dr. Gray’s recovery technologies for transaction reliability cannot be overstated: without adequate reliability and durability for transactional applications, the transition to on-line transaction processing would not have been possible.
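The write-ahead rule described in this note can be illustrated with a toy in-memory model. This is a minimal sketch, not Gray's or System R's implementation: the class `WalStore`, its method names, and the record layout are all hypothetical, the "stable" log and disk are ordinary Python objects, and it assumes that locking (not shown) keeps two in-flight transactions from updating the same key.

```python
class WalStore:
    """Toy write-ahead-log store; names and layout are illustrative only."""

    def __init__(self):
        self.log = []     # stands in for the stable log (survives a crash)
        self.disk = {}    # stands in for the data's stable home
        self.buffer = {}  # volatile buffer pool (lost on a crash)

    def read(self, key):
        return self.buffer.get(key, self.disk.get(key))

    def write(self, txn, key, new):
        old = self.read(key)
        # Write ahead: log old and new values before changing any data.
        self.log.append(("update", txn, key, old, new))
        self.buffer[key] = new

    def commit(self, txn):
        # Commit is a single append to the log.
        self.log.append(("commit", txn))

    def flush(self, key):
        # Allowed at any time, because the log record already exists.
        if key in self.buffer:
            self.disk[key] = self.buffer[key]

    def crash(self):
        self.buffer = {}  # volatile state disappears

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        aborted = {rec[1] for rec in self.log if rec[0] == "abort"}
        # Redo: reapply committed updates in log order.
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                self.disk[rec[2]] = rec[4]
        # Undo: roll back unfinished transactions, newest update first.
        losers = set()
        for rec in reversed(self.log):
            if rec[0] != "update":
                continue
            txn, key, old = rec[1], rec[2], rec[3]
            if txn in committed or txn in aborted:
                continue
            losers.add(txn)
            if old is None:          # the key did not exist before
                self.disk.pop(key, None)
            else:
                self.disk[key] = old
        # Record the rollbacks so a later recovery does not repeat them.
        for txn in sorted(losers):
            self.log.append(("abort", txn))
```

A short scenario: transaction T1 sets `a` to 100 and commits without its change ever being flushed; T2 sets `a` to 50, that uncommitted value is flushed, and the system crashes. After `recover()`, reading `a` yields 100, because T1's update is redone from the log and T2's is undone.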