SlideShare une entreprise Scribd logo
1  sur  12
Télécharger pour lire hors ligne
System Support for OpenGL Direct Rendering
                              Mark J. Kilgard David Blythe Deanna Hohn
                                           Silicon Graphics, Inc.
                                             2011 N. Shoreline
                                      Mountain View, CA 94043-1389
                          email: mjk@sgi.com blythe@sgi.com hohn@sgi.com




                        Abstract                             tion (as opposed to running over the network). Direct
                                                             rendering is not available if the OpenGL process is con-
   OpenGL’s window system support for the X Window           nected to a remote X server.
System explicitly allows implementations to support di-
rect rendering of OpenGL commands to the graphics               For interactive 3D applications, the maximum possi-
hardware. Rendering directly to the hardware avoids the      ble rendering performance is critical to the success of the
overhead of packing and relaying protocol requests to the    application. When available, direct rendering has a sub-
X server inherent in indirect rendering.                     stantial performance advantage over rendering indirectly
   The OpenGL implementation available for Silicon           via the X server, i.e., indirect rendering. Instead of using
Graphics workstations supports direct rendering using        the X server as a proxy for rendering, rendering com-
virtualizable graphics hardware in conjunction with the      mands are sent directly to the graphics hardware. Direct
kernel and the X server. The techniques described provide    rendering can be thought of as a means of “cutting out
“maximum performance” rendering for OpenGL. Some             the middle man.” Indirect rendering is still useful (and
of the issues are specific to OpenGL, but most of the         required by GLX) because it allows the same network
techniques described are appropriate for the implemen-       extensibility and inter-operability of traditional X clients.
tation of any high-performance direct rendering graphics        This paper discusses how Silicon Graphics, Inc. (SGI)
interface.                                                   implements direct rendering in its OpenGL implemen-
                                                             tation through a combination of hardware features, op-
Keywords: OpenGL, Virtual Graphics, Direct Rendering.
                                                             erating system support, X server support, and X and
                                                             OpenGL library support. SGI implements the described
1 Introduction
                                                                                         




                                                             facilities in IRIX 5.3. The next section discusses the
                                                             goal of direct rendering, SGI’s approach for supporting
   The OpenGL graphics system [14, 11] is a window sys-      virtualized direct rendering, and support for direct ren-
tem independent software interface to graphics hardware      dering by OpenGL’s predecessor and other direct ren-
for 3D rendering. GLX [8] is the OpenGL extension to         dering graphics systems. Section 3 describes OpenGL’s
the X Window System that specifies how OpenGL inte-           implementation model and the requirements implied for
grates with X. The GLX specification explicitly allows        implementing direct rendering. Section 4 presents how
(but does not require) implementations to support direct     SGI virtualizes access to the graphics hardware to sup-
rendering of OpenGL commands to the graphics hard-           port direct rendering. Section 5 addresses other issues
ware. Direct rendering allows OpenGL commands to             not strictly related to virtualized direct access rendering
bypass the normal X protocol encoding, transport, and        but still important for supporting direct rendering.
X server dispatch. Through sufficient hardware and sys-
tem software support, OpenGL rendering can achieve the
maximum rendering performance from the hardware.
   Direct rendering naturally implies that the direct ren-      ¡




                                                                  IRIX is the SGI version of the Unix operating system. Most of the
dering process is running on the local graphics worksta-     facilities described were originally developed for IRIX 5.2.
2 Background                                                              example (whereby OpenGL sends a 3D vertex to the hard-
                                                                          ware; a very common OpenGL operation) is handled in
   Support for direct rendering was purposefully designed                 this way. Even with streaming, the indirect case incurs
into OpenGL’s GLX specification. While use of the GLX                      the overhead of encoding, transport, and decoding GLX
extension protocol permits interoperability and network-                  protocol for requests and replies.
extensibility of OpenGL rendering, forcing the X server                      The direct case can eliminate the overhead associated
as an intermediary for OpenGL rendering imposes inher-                    with context switching between the OpenGL program and
ent limitations on OpenGL rendering performance. Per-                     the X server, protocol encoding, transport, and decoding
mitting direct rendering avoids the unacceptable situation                when performing OpenGL rendering. Direct rendering
where expensive, high-performance graphics hardware                       also improves cache and TLB behavior by avoiding fre-
subsystems designed to support OpenGL [1, 6] have their                   quent context switches and multiple active contexts [3].
graphics performance potential starved by the overhead                       The examples in Table 1 assume the “fast path” of
incurred by indirect rendering.                                           SGI’s OpenGL implementation is taken. Being on the
   Using OpenGL display lists (non-editable sequences of                  “fast path” means the system resources for direct render-
OpenGL commands that can be downloaded into the X                         ing (discussed in detail later) are already made available.
server and later executed) can ease the burden of indirect                Unless resources are in contention or resources are being
rendering since it minimizes the GLX protocol needed                      used for the first time, the “fast path” is the norm.
for rendering. However, use of display lists is often inap-
propriate for many applications, particularly applications                2.2 “Maximum Performance” 3D Rendering
with very dynamic scenes. Such applications favor us-
ing immediate mode rendering that requires much higher                       SGI’s OpenGL implementation seeks to achieve “max-
bandwidth for the OpenGL command stream. Measure-                         imum performance” OpenGL rendering. For our pur-
ments of the IRIX 5.2 OpenGL implementation show                          poses, “maximum performance” means that when there
indirect immediate mode rendering has inferior perfor-        
                                                                          is no contention for rendering resources, and once utilized
mance to direct rendered immediate mode [10]. Even                        resources are made available, graphics rendering perfor-
programs heavily reliant on display lists are slower when                 mance is limited only by the system’s raw graphics per-
rendering indirectly.                                                     formance and the graphics software efficiency. The max-
                                                                          imum performance potential of the workstation should be
                                                                          achievable.
2.1    Direct Rendering Benefits
                                                                             In practice, this means no locks need to be acquired
   Table 1 breaks down the overhead of indirect rendering                 and released when rendering. Even so, multiple OpenGL
relative to direct rendering for the extreme and common                   programs should be able to run concurrently. But the
cases. The extreme example demonstrates the high-level                    overhead from concurrent use of graphics should only
steps involved in executing the OpenGL glReadPix-                         be introduced when multiple processes are concurrently
els command used for reading pixels from a window.                        using the graphics hardware; performance of the sin-
The command in question requires data to be returned                      gle renderer case should not be compromised. Graphics
to the application. In the indirect case, this requires a                 programs should not be burdened with overhead from
context switch to the X server and back to the OpenGL                     window clipping; in particular, multi-pass rendering for
program. And the pixel data returned is copied three                      correct clipping should not be necessary. And the possi-
times, as opposed to a single copy in the direct render-                  bility of asynchronous window management operations
ing case. While glReadPixels requires a round-trip                        such as changing window clipping or changing the win-
to the X server to return the pixel data, most OpenGL                     dow origin should not add any overhead to the normal
commands return no data.                                                  case when clips and origins are not changing.
   For most OpenGL commands, the context switch over-
head can be amortized over multiple GLX requests by                       2.3 Virtual Graphics
streaming protocol requests. The table’s glVertex3f
                                                                             SGI workstations implement virtual graphics [17] to
                                                                          achieve the goal of “maximum performance” rendering.
   ¡




     The presented results showed immediate mode OpenGL graphics
performance using indirect rendering ranging from 34% to 68% of the       Virtual graphics means that every graphics process has
direct rendering performance depending on the model of SGI graphics
                                                                          the illusion of exclusive access to the graphics rendering
workstation. The faster the graphics hardware, the higher the relative
penalty for using indirect rendering versus direct rendering due to the   engine. Many systems allow direct access to the graph-
higher relative overhead of indirect rendering.                           ics hardware. Virtual graphics not only allows direct
Indirect glReadPixels              Direct glReadPixels         Indirect glVertex3f               Direct glVertex3f
          1.   glReadPixels(...)             1.   glReadPixels(...)      1.   glVertex3f(...)              1.   glVertex3f(...)
          2.   pack GLX protocol request                                 2.   pack GLX protocol request    2.   send vertex to HW
          3.   write request to kernel                                   3.   return                       3.   return
          4.   block client waiting
               to read reply                                                  additional non-reply
          5.   deschedule OpenGL client                                       OpenGL requests can be
          6.   schedule X server                                              batched in protocol buffer
          7.   read request from kernel
          8.   request decoded and                                       4.   eventually, flush
               dispatched to GLX handler                                      protocol buffer
          9.   read pixels from              2.   read pixels from       5.   deschedule OpenGL client
               screen to a buffer                                        6.   schedule X server
         10.   write reply to kernel                                     7.   read request from kernel
         11.   deschedule X server                                       8.   request decoded and
         12.   schedule OpenGL client                                         dispatch to GLX handler
         13.   decode reply header                                       9.   send vertex to HW
         14.   copy reply data from kernel
               to final buffer
         15.   return                        3.   return

Table 1: A comparison of the “fast path” steps involved implementing glReadPixels for the indirect and direct fast
path cases. The extra steps involve data copying; protocol packing, unpacking, and dispatching; and context switch
overhead.


access, but treats graphics as a virtual system resource.               gle display mode. The displayed buffer for the window’s
This virtual view of graphics allows the system to con-                 display mode can be instantaneously changed. But the
tend with simultaneous direct access by multiple graphics               number of display modes is a limited hardware resource
processes. Virtual graphics also arbitrates the contention              so a virtual graphics system must be ready to virtualize
for graphics resources such as screen real estate.                      display modes.
   There are three classes of contention that virtualized                 Each of these classes of contention are dealt with by
graphics for a window system must arbitrate:                            SGI’s virtualized, direct access rendering for OpenGL.
   Concurrent access contention.           When different
graphics processes are using a single graphics engine,                  2.4 SGI-style Graphics Hardware
this hardware state must be context switched. With vir-
tual graphics, this context switching is transparent to the                Unlike most low-end workstation and PC graphics
graphics process, much the same way processor context                   hardware, SGI graphics hardware does not expose a mem-
switches are transparent to Unix processes.                             ory mapped frame buffer. Instead, graphics commands
                                                                        are issued to a graphics engine through the manipulation
   Screen real estate contention. Window systems ar-
                                                                        of memory mapped device registers. This interface to
bitrate how windows are arranged on the screen. Vir-
                                                                        the graphics hardware is often called the graphics pipe
tualized graphics must ensure that rendering is properly
                                                                        or simply the pipe. Because all rendering operations are
clipped to the drawable region of the rendering window.
                                                                        done through the graphics pipe, the pipe’s virtual mem-
Clipping should be correct even in the face of asyn-
                                                                        ory mapping can be used to control access to graphics
chronous window management operation by the X server.
                                                                        rendering, i.e., virtualize graphics.
Because window systems like X allow arbitrary overlap-
                                                                           There is substantial variation in the extent of geome-
ping of windows, clipping to arbitrary regions must be
                                                                        try and rasterization processing implemented within vari-
possible.
                                                                        ous SGI graphics hardware configurations. The high-end
   Non-visible resource contention. Modern graph-                       SGI graphics hardware [1] implements almost the en-
ics hardware supports features like double buffering, in                tire OpenGL state machine within the graphics hardware.
which a front buffer is displayed while the next anima-                 Most pipe commands (called tokens) sent to the graphics
tion frame is generated in a non-visible back buffer. When              hardware have a fairly direct mapping to the OpenGL
the frame is complete, a buffer swap effectively copies the             API. The high-end hardware is micro-coded and makes
back buffer contents into the front visible buffer. Graphics            heavy use of pipelining and parallelism in its various
hardware can efficiently perform buffer swaps by tagging                 stages.
all the pixels belonging to a swapping window with a sin-                  The low-end SGI graphics hardware [15] implements
only the back-end of the OpenGL state machine. Most          case, the hardware can support a set of clip rectangles to
high-level operations such as vertex transformation, poly-   quickly clip rendered pixels not belonging to the window
gon rasterization, texturing, and lighting are performed     [12]. The number of clip rectangles varies with graph-
on the host processor inside the OpenGL library. But         ics hardware but is typically less than eight. These clip
low-level operations such as line drawing, shading, span     rectangles are generally part of the graphics hardware
rendering, dithering, etc. are implemented in the hard-      context. The context for each direct renderer can use the
ware rendering engine.                                       same rectangular clipping hardware.
   Despite the variations in hardware across SGI’s prod-        Arbitrary window clips will require far more clip rect-
uct line, a direct rendering OpenGL program will work        angles than is reasonable to support in hardware. A sec-
(though perhaps not find the same frame buffer capabil-       ond method of clipping is used for such windows. SGI
ities or achieve the same performance) across the entire     hardware also supports clipping planes. Clipping planes
range of SGI OpenGL-capable graphics hardware. All           are non-visible frame buffer planes used to encode per-
the rendering code for OpenGL that directly accesses the     pixel clipping IDs (often called CIDs). If the pixels of a
hardware is isolated in the OpenGL library which is im-      window all have the same CID (and only pixels belonging
plemented as a shared library. The shared library on the     to the window have that CID), the graphics hardware can
system depends on the graphics hardware installed on the     clip to an arbitrary window by enabling CID testing. A
workstation, hiding all device dependencies when direct      pixel rendered into the window is drawn only if the CID
rendering.                                                   of the pixel being modified matches the CID that is set in
                                                             the graphic hardware context.
2.4.1   Virtualized, Direct Rendering Needs                     The number of planes set aside for maintaining CIDs
                                                             can be rather small. On low-end SGI hardware, the clip-
   While the degree to which the OpenGL state machine        ping planes are only 2 bits deep. This means three CIDs
is supported in hardware varies across the product line,     are possible (2 bit planes provide 4 CID values but one
virtualized direct access rendering requires functionality   CID value must be used for screen real estate not belong-
across all SGI graphics hardware configurations. The          ing to the CID assigned windows). As will be explained
requirements are:                                            later, CIDs may need to be virtualized because they are a
   A context-switchable graphics engine. The state of        limited resource.
the graphics hardware must be context-switchable and the
context switch must be able to be performed preemptively,    2.4.3   Display Mode Support
including in the middle of a command.
                                                                Display mode IDs (often called DIDs) are like CIDs,
   Window relative rendering. A process using virtual
                                                             but instead of providing clipping information, DIDs de-
graphics renders using window relative coordinates, so
                                                             termine on a per-pixel basis how each pixel value on the
that the window location need not be tracked if the win-
                                                             screen should be displayed. The DID for a pixel is looked
dow is moved.
                                                             up in the hardware display mode table to determine how
   Arbitrary window clipping. As mentioned earlier, a        the pixel should be displayed. The DID mode indicates
window system supporting arbitrarily overlapping win-        whether the pixel is RGB or color index (i.e., the color is
dows, can result in arbitrary window clipping regions, so    determined by a colormap); if color index, what hardware
the hardware must support arbitrary clipping. And the        colormap to use if there is more than one; the depth of the
window’s clip and location can change asynchronously         pixel (how many bits of the pixel value are significant);
to the direct rendering process so the window clip must      if the pixel is double buffered, and if so, what buffer (A
be able to change without the renderer’s knowledge.          or B) should be displayed. Figure 1 gives an example of
  Double buffering. Support must exist for per-window        how a DID determines how a pixel should be displayed.
double buffering. Also, OpenGL’s front/back relative            Logically, DIDs are stored in a set of non-visible frame
naming of buffers must be supported. A direct renderer       buffer planes. Usually 16 or 32 DIDs are available. In
should be unaware of the absolute hardware buffer it is      low-end hardware, devoting 4 to 5 bitplanes per pixel to
rendering to.                                                store the DID is too expensive. In this case, the DID val-
                                                             ues are run-length encoded. This encoding is convenient
                                                             becauses the frame buffer is scanned out in horizontal
2.4.2   Window Clipping Hardware
                                                             lines. But logically, there is still a DID per pixel. In
  Most windows have rather simple clip regions, consist-     principle, a complex arrangement of display modes on
ing of a small number of rectangles. For this common         the screen might be too complex to represent with a run-
Pixel Composition                                          ware design trade off.
                                            Pixel value
                                            sent to
                        Buffer A            Color Controller   2.5 Previous IRIS GL Support
 Single
 Buffer
                                       Display mode table         The predecessor to OpenGL is SGI’s proprietary IRIS
                        Buffer B        0
                                        1                      GL. While OpenGL leaves window system operations to
                                        2                      the native window system (for example, the X Window
 Display                                                       System or Windows NT), IRIS GL provides its own win-
                                             DblBuf,CI,
   ID        5                          5                      dow management routines. OpenGL has a similar 3D
                                             display A buf
 (DID)                                                         rendering philosophy to IRIS GL, but OpenGL is a dis-
                                                               tinctly new interface. The OpenGL state machine is well
  Other                                                        defined and the OpenGL API has a cleaner design and
  pixel                                                        regular name space. One of the most important changes
  bits                                  29                     in OpenGL from IRIS GL is the clean separation of ren-
                                        30                     derer state from window state. In IRIS GL, the renderer
                                        31
                                                               and window state were coupled.
Figure 1: Example of display ID hardware for supporting           Locally running IRIS GL programs use virtualized,
double buffering. The pixel shown is assigned display ID       direct access rendering. In fact, most of the experi-
5 meaning the pixel should be treated as a double buffered,    ence in supporting virtualized, direct access rendering
color index pixel with buffer A being the displayed buffer.    for OpenGL was a result of experience with IRIS GL.
                                                               The current IRIS GL direct rendering support actually
                                                               uses the same support OpenGL uses.
length encoded table. In practice, run-length encoded
DID tables work extremely well.
   Note that many windows can all share the same DID if
                                                               2.6 Other Approaches
their pixels all use the same display mode. For example,
all non-double buffered 8-bit color index windows using
the same colormap can share the same DID.                         Direct rendering in the manner SGI describes in this
   When a process requests a buffer swap for a window (in      paper is not the only option for implementing direct ren-
OpenGL, glXSwapBuffers would be called), a dou-                dering. And direct rendering is not a necessity for pro-
ble buffer window must have an unshared or swappable           duction OpenGL implementations. The IBM OpenGL
DID. This is because the buffer swap is accomplished           implementation described in [7] does not utilize direct
by toggling the displayed buffer for the window. If the        rendering. Most currently available OpenGL implemen-
pixels for the window (and only the pixels for the win-        tations do not support direct rendering.
dow) are all in the same DID, the buffer swap happens             Previous to IBM’s support for the OpenGL standard,
cleanly. What this means is a double buffered window           IBM licensed IRIS GL from SGI for 3D hardware in the
that needs to swap must be placed on an unshared DID           original RS/6000. Their IRIS GL implementation uses
before the swap can happen. Like CIDs, the number of           virtualized, direct rendering much like SGI’s IRIS GL
DIDs is limited by hardware so DIDs may also need to           implementation [16, 5].
be virtualized.                                                   Hewlett-Packard provides direct rendering support for
   While DIDs and CIDs are discussed here as distinct          their Starbase Graphics Library [2] with an approach
entities, it is possible to combine display mode and clip-     that is different from SGI’s direct rendering mechanism.
ping information into a single set of bitplanes. This may      Hewlett-Packard’s approach acquires a fast lock to be held
be useful because often the clipping ID planes and dis-        during rendering to the graphics engine. This locking al-
play mode ID planes both have the same regions each            lows clipping to be coordinated in software via shared
assigned a CID and DID. Combining CIDs and DIDs can            memory window clip serial numbers and proprietary X
make better use of frame buffer memory. If 32 DIDs             requests to query the current clip of a window. While
were supported and 4 CIDs were supported, a combined           such a system avoids the complex hardware and operat-
scheme would support 128 combined DID/CID values!              ing system support involved in SGI’s virtualized, direct
But combining DIDs and CIDs means DID information              access rendering mechanism, it forces explicit, fine-grain
cannot be run-length encoded. This is a graphics hard-         locking to arbitrate access to the graphics hardware.
3 OpenGL Requirements                                          3.3 Per Window Double Buffering

   The OpenGL GLX specification provides the model                 OpenGL supports per-window double buffering using
used to integrate OpenGL with the X Window System.             the glXSwapBuffers call. A side effect of calling
Understanding the GLX model motivates how SGI im-              glXSwapBuffers on a window that the calling thread
plements virtualized, direct access rendering specifically      is currently bound to is that further rendering to the win-
for OpenGL and X.                                              dow will not execute until the buffer swap completes.
                                                               Double buffer hardware usually times the buffer swap to
                                                               occur during vertical retrace. The glXSwapBuffers
3.1   Context Model                                            may return before the buffer swap completes, but the
                                                               OpenGL implementation is then responsible for delaying
   An OpenGL rendering context (or GLXContext) is              any further OpenGL rendering to the window until the
logically an instance of an OpenGL state machine. When         buffer swap actually occurs.
a context is created using glXCreateContext, the
creator has the option of requesting a direct rendering con-
text. If the program is running locally and the OpenGL         4 Virtualizing SGI Graphics
implementation supports direct rendering, a direct render-
ing context will be created. Everything that can be done          When an OpenGL process creates a direct OpenGL
with a direct context can be done with an indirect context     rendering context, the process opens the graphics device.
(the reverse is not true) so requesting a direct context but   The process allocates an IRIX kernel resource known as
being returned an indirect context is acceptable.              a rendering node. A rendering node is a virtual graph-
   Once a context is created, that context can be bound        ics hardware context and permits the graphics pipe to be
or “made current” to a drawable (either a window or            mapped into or “attached to” the process’s address space
pixmap) supporting OpenGL rendering by calling glX-            so a process can directly access the graphics hardware.
MakeCurrent. Not only is the context bound to the              Every direct rendering OpenGL context has an associated
drawable, but also to the thread calling glXMakeCur-           rendering node. Note that rendering nodes are completely
rent. Once bound, any OpenGL calls issued by the               hidden from OpenGL programs. The allocation and use
thread are issued using the current context and affect the     of rendering nodes is purposefully not made available for
current drawable. Only one thread can be bound to a            use by applications. They exist only to support imple-
given context at a time; but multiple contexts (bound to       menting the OpenGL and IRIS GL APIs.
different threads) can be bound to a single drawable. Sub-        The SGI X server [9] also uses a rendering node to
sequent calls to glXMakeCurrent rebind the thread to           access the graphics pipe. But the X server’s rendering
the newly specified drawable and context.                       node is marked as being the board manager rendering
                                                               node. The board manager rendering node is allowed to
                                                               call a number of special board manager ioctls used
3.2   Sharing of Window State
                                                               for validating and invalidating resources associated with
   GLX explicitly allows the sharing of window state.          other rendering nodes. By acting as the board manager,
For example, all OpenGL renderers bound to a double            the X server must process messages sent by the kernel
buffered window share the same notion of front and back        indicating the needs of rendering nodes to have their vir-
buffer state. This means if one client calls glXSwap-          tual graphics resources validated. The X server receives
Buffers on a window bound to by other OpenGL ren-              the messages through a shared memory input queue (or
derers, the other renderers maintain the same view of          shmiq) also used by the kernel to efficiently pass input
which buffer is front and which is back.                       device events to the X server. The purpose of the kernel
   One requirement of GLX that proves difficult to meet is      messages and how the X server responds to them are dis-
the sharing and management of ancillary buffer contents        cussed shortly. The X server also uses its rendering node
for multiple renderers bound to the same window. An-           for standard X server rendering.
cillary buffers are non-visible buffers used by rendering
operations. Examples are stencil, depth, and accumula-         4.1 Making Current to a Window
tion buffers. Sharing ancillary buffers is straightforward
if they are supported in hardware, but sharing of buffers        Before a direct rendering process can begin referenc-
implemented via software is more difficult to correctly         ing the graphics pipe through the rendering node’s pipe
support.                                                       memory mapping, the rendering node must be bound to
a window. For OpenGL, this happens at glXMakeCur-             pipe access.
rent time via a graphics driver ioctl.                           Each rendering node has a clip resource which is either
   The kernel manages a cache of bound and recently           valid or invalid. When valid, the graphics kernel driver
bound windows. The cache, known as the pane cache,            has up-to-date window clipping parameters in the pane
allows OpenGL threads to quickly bind and rebind to           cache for the window that can be loaded into the rendering
windows. If the window being bound to is not found            node’s graphics hardware context (either a set of clip
in the pane cache, a currently unbound pane cache entry       rectangles or a CID value to use). When the clip resource
is selected and reused for the window, and a message is       is valid and there is no other reason for the pipe to be
sent to the X server notifying it that a direct renderer is   invalid, the kernel graphics page fault handler validates
interested in rendering to the specified X window ID. This     the graphics pipe pages after loading the correct clipping
message allows the X server to initialize data structures     information into the graphics hardware context.
for the window to keep track of direct rendering.                If the rendering node’s clip resource is invalid, the
   A new entry in the pane cache will have two important      faulting process is suspended and a clip validate mes-
resources marked invalid. The first resource is the clip re-   sage is sent to the X server. The X server receives the
source. When valid, this means the X server has properly      message through the shmiq and issues a board manager
informed the kernel of the proper window origin and clip      reserved “clip validate” ioctl to the graphics driver in-
rectangles or CID to be used for clipping rendering to the    forming the kernel of the correct clipping parameters for
window. The second resource indicates if the window is        the window. When the pane cache entry for the win-
assigned a swappable DID.                                     dow is updated with the new clip information, then any
                                                              processes waiting for the clip resource of the window to
4.2   Context Switching                                       be validated are awakened. The result is that a graphics
                                                              process can asynchronously have its clip updated without
   Access to the graphics pipe is mediated using virtual      any knowledge on the part of the process. An example of
memory techniques. The graphics pipe is physically            the clip validation process is described in Figure 2.
mapped to only one process at a time. Other processes            Clip resources are possibly invalidated by the X server
using the graphics pipe will have invalid virtual mem-        whenever the window tree is manipulated by X window
ory mappings for their pages corresponding to the pipe.       management operations. The X server tracks which win-
If a process without valid mappings accesses a pipe ad-       dows are being used for direct rendering, and if the clips
dress, a page fault is generated. The kernel graphics         of any of these windows change the X server uses a board
driver handles this page fault. If the graphics pipe pages    manager reserved “clip invalidate” ioctl to tell the ker-
can be physically mapped immediately (there are reasons       nel to invalidate the clip resource of any rendering nodes
access to the pipe might temporarily be denied that are       bound to the windows. The invalidation of a clip happens
discussed later), the kernel will save the graphics context   asynchronously to the direct rendering program and can
of the current process and mark that process’s graphics       happen at any time the X server needs it to. The next
pipe pages invalid. Then the kernel restores the graphics     time a direct rendering program attempts to render to the
hardware context for the faulting process and validates       window, the clip validation process ensures a new, correct
that process’s memory mapping for the pipe.                   clip is loaded into the direct renderer’s graphics hardware
   The context switching sequence is transparent to the       context.
processes involved. If a single process is using the pipe,       The other reason the clip resource of a window might
the pipe does not need to be context switched. Graphics       be invalidated is if too many windows that need CIDs as-
pipe context switching only occurs when there is con-         signed to them are being directly rendered to. Remember
tention for the pipe. Preemptive scheduling by the kernel     there are a limited number of CIDs in the hardware. The
ensures graphics processes get a fair allocation of time to   X server can virtualize clips by invalidating the clip of
access the graphics pipe. On multi-processor machines,        one window assigned a CID, reusing the CID by repaint-
the scheduler needs special modifications to prevent two       ing another window with the newly freed CID, and then
simultaneously scheduled graphics processes from con-         validating the clip of a window and assigning it the newly
tinually stealing the graphics pipe from each other.          reassigned CID. The X server makes no attempt to handle
                                                              thrashing or starvation due to repeated CID invalidations
4.3   Clip Validation                                         and validations, but in practice, because most windows
                                                              use clip rectangles, CID thrashing is not a problem.
  As hinted above, the graphics pipe mapping is not              Notice that clip validation happens lazily. Not until
always immediately validated when a process faults on a       a direct renderer actually touches the pipe does a clip
GL                   IRIX                X
                                                          A) GL program with invalid clip resource faults when accessing
  program              kernel              server             graphics registers.
                                                          B) Kernel trap handler determines fault caused by invalid clip for the
                             B                                rendering node.
  A                                 C
                                                          C) Message put in shmiq telling X server to validate the rendering
                                                              node’s clip.
                                                          D) X server generates a clip list for the rendering node’s window.
                                                    D     E) X server performs ioctl to inform kernel of new valid clip list.
                                                          F) Kernel updates the rendering node to reflect its new clip, validates
                                                             the node’s clip resource, maps in the graphics registers, and
  G                                   E                      restarts the program where it stopped.
                             F                            G) GL program continues running with no knowledge of the
                                                             interruption.


                                 Figure 2: RRM clip validation assisted by the X server.


validation begin. Often when clips on the screen change,         this, the graphics driver invalidates the “allow rendering”
they change repeatedly (a window manager using opaque            resource and invalidates the virtual memory mapping to
move is a good example of this). So it makes sense to            the graphics pipe. Any further access to the pipe by the
validate clips only on demand.                                   rendering node will stall the process until the buffer swap
   The pane cache minimizes clip validation costs when           completes. When the buffer swap completes, the “allow
an OpenGL process repeatedly binds and rebinds to dif-           rendering” resource will be revalidated so rendering can
ferent windows (an operation expected to be common for           continue.
OpenGL programs). A rendering node can be bound to a                Because which buffer is front and which is back is win-
previously bound window, and if the window is still in the       dow state shared by all OpenGL direct renderers bound to
pane cache and the clip is still valid, immediately have a       the window, the kernel will also update the absolute sense
valid clip resource.                                             of what buffer is front and back for any other rendering
                                                                 nodes bound to the window being swapped. The graphics
4.4   Fast Buffer Swaps                                          hardware provides a means to switch what buffer is the
                                                                 front and back buffer without the knowledge of the direct
   A continuously animated application swaps buffers fre-        renderer.
quently enough that the operation should be optimized.              The advantage of not immediately stalling the process
As explained earlier, SGI implements a buffer swap by            until the buffer swap completes is that most animation
toggling the display mode table entry for the unshared           applications have a certain amount of computation to do
DID assigned to the window during vertical retrace.              before the next image (typically called a frame) can be
   If the glXSwapBuffers command is called on the                rendered. By not immediately stalling the process, this
currently bound window, the buffer swap is considered to         computation can be overlapped with waiting for the buffer
be in the OpenGL command stream. This allows a direct            to swap.
renderer to provide a buffer swap without contacting the
X server. The graphics driver provides a “buffer swap”           4.5 Window Display Mode Validation
ioctl which can be issued by direct renderers. The
result is to schedule a buffer swap at the next vertical            In the discussion of buffer swapping so far, it was
retrace for the thread’s currently bound window. If the          assumed that the window to be swapped was indeed on
glXSwapBuffers command is not for the currently                  an unshared DID. Since DIDs are a limited hardware
bound window, the OpenGL library generates a GLX                 resource, this may not necessarily be true. In this case,
protocol request to swap the buffers and lets the X server       DIDs must be virtualized.
perform the buffer swap.                                            Similar to a rendering node’s clip resource, rendering
   The ioctl returns immediately. This is good because           nodes also have a “swappable window” resource. The
waiting for the vertical retrace could cause a delay equal       resource is valid if the X server has placed the window
to the vertical retrace interval (typically at the rate of 60    on an unshared DID. It is invalid if the window’s DID
times per second). But further drawing to the window             is shared by other windows or the X server has revoked
must be held off until the buffer swap completes. To do          the ability to swap (this need is made clear when mixing
 ¡ ¡ ¢ ¡ ¡ ¡
          ¡¡¢¡¡¡  
         ¡ ¡ ¢ ¡ ¡¡                           ¡
                                              ¡  
       £ ¡ ¢ ¡£¡£¢ ¡¤¢¤¡ ¡ ¤¤   £
               ¢ ¡ ¢¡¢ ¡  
                         ¡££  ¤¡   £        £¡£       DID 0 8−bit ColorIndex              port these buffers in software by allocating host memory.
      1
       £¡ ¤£¡ ¤£ £¡£¡ £
           2
       ¡¡¢¡¢¡ 
                   ¤¢¤¡£¡£¡¡
                    3
               ¢¡¢¡¢¡ 
              £¢£  ¤    £¡  ¤¡ ¤
               ¢ £¡ £ £¡£¡ £4
                                         ¡
                                         £¤¡£¤¤       DID 1 12−bit RGB double buffered,
                                                            buffer A visible (shared)
                                                                                             GLX requires the contents of ancillary buffers to be
                                                                                          shared between renderers binding to a window and the
             £                         ¤ ¡            DID 2 24−bit RGB                    contents of these buffers to be retained even when no
     ¡¡¢¡¡¡                                                                               renderers are bound to the window. For hardware buffers,
                                                                                          these requirements are typically straightforward to meet
 Now a glXSwapBuffers happens on window 2...
                                                                                          since the ancillary buffers exist in the hardware frame
   ¡ ¡ ¢ ¡ ¡ ¡  
    ¡¡¢¡¡¡                                     ¡
                                               ¡  £
             The X server must:
                                                                                          buffer.
  ¡ ¡ ¢¥ ¡¥ ¡ ¡
  ¡ ¡ ¢¡¢ ¡  
      1     £¢¤¢¤¡¥ ¡¥ ¡ ¡
             ¢ ¢ ¡¡¢ ¡
                  ¤¢¤¡ ¡ ¤¤¤  ¥¥
                    3   ¡££ ¤¡
                      ££ ¤                  ¡£
                                            ££¡       DID 0 8−bit ColorIndex
                                                      DID 1 12−bit RGB double buffered,
                                                                                             OpenGL indirect rendering could easily allow ancillary
£ ¡¡¢¡¢¡   2
£¡ ¡ £ ¥ ¡¥ ¡ ¥  ¤ ¡ ¤£ ¥¡¥¡ ¥
            ¢¡¢¡¢¡   £¡  ¡  4
                               ¤       ¤¡ ¤¡¤¤¥
                                        ¥¥¡
                                        ¡¥                  buffer A visible              buffers to be shared between renderers since all the buffers
                                                      DID 2 24−bit RGB                    would exist in the X server’s address space and the X
 ¢£       £                                           DID 3 12−bit double buffered,
                                                                                          server has immediate knowledge of the changing state of
                                                            buffer A visible (unshared)   windows.
                                                                                             Combining direct rendering with retained, shared soft-
Figure 3: Example of a glXSwapBuffers being per-
                                                                                          ware ancillary buffers is difficult to achieve without
formed on window 2 which shares the same display ID
                                                                                          compromising performance. The SGI direct rendering
(DID 1)with window 4. Before the buffer swap can be
                                                                                          OpenGL implementation does not currently support the
performed, the X server must rewrite the display IDs such
                                                                                          correct sharing of ancillary buffers between renderers in
that window 2 is on an unshared display ID (DID 3).
                                                                                          different address spaces. Each OpenGL library instance
                                                                                          allocates software ancillary buffers for its own address
OpenGL and X rendering is discussed).                                                     space. These buffers can be shared between renderers
   If a process issues the “swap buffer” ioctl and the                                    in the same address space. Also, the contents of these
process’s current rendering node does not have its “swap-                                 buffers are retained only for the lifetime of the address
pable window” resource valid, a message is sent to the                                    space.
X server requesting the X server place the window on an                                      Incorrectly supporting software buffers is strictly
unshared DID and validate the rendering node’s “swap-                                     speaking a violation of what OpenGL requires. But few
pable window” resource. The X server complies with                                        programs rely on sharing buffers across address spaces.
the request and validates the “swappable window” for                                      Sharing software buffers is an area where SGI’s OpenGL
the window via a board manager reserved ioctl. Once                                       implementation does not properly isolate window state
validated, the buffer swap can then be scheduled.                                         from rendering state. Further work needs to be done to
   As in the virtualizationof CIDs, unshared DIDs may be                                  support ancillary buffer sharing.
stolen from windows already on an unshared DID to val-
                                                                                             One problem that must be solved is the de-allocation of
idate the resources of rendering nodes attempting buffer
                                                                                          software ancillary buffers when windows are destroyed.
swaps. When a DID is stolen, the window previously
                                                                                          The OpenGL library has no obvious way to find out when
on an unshared DID will have to find another window
                                                                                          an X window it is maintaining software ancillary buffers
to share a DID with that has the identical display mode
                                                                                          for is destroyed so it can know to deallocate those buffers.
(because a window should never be assigned a DID with
                                                                                          Since the buffers tend to be quite large, leaking ancillary
a display mode not matching the correct display mode
                                                                                          buffers is extremely expensive.
for the window). Forcing a window to share a DID with
another window may force that other window to have its                                       SGI solves the problem by adding a private extension
“swap buffer” resource invalidated since it might have                                    to the X server that requests the X server to generate a
previously had an unshared DID allocated to it. Figure                                    SpecialDestroyNotify when a specified X win-
3 is an example of a window needing to be assigned an                                     dow is destroyed. The first time an OpenGL rendering
unshared DID.                                                                             context is bound to a window, this request is made for the
                                                                                          new window. Hooks in the X extension library (libX-
4.6        Shared Software Buffers                                                        ext) allow SpecialDestroyNotify events to trig-
                                                                                          ger a callback into OpenGL to deallocate the associated
   OpenGL’s GLX specification requires implementa-                                         software buffers. The event is never seen by an X pro-
tions to support various types of ancillary buffers. When                                 gram. Other mechanisms such as using the standard X
there is no hardware support for these various types of                                   DestroyNotify event proved unreliable since the X
buffers, OpenGL implementations are expected to sup-                                      client might not be selecting for that event.
4.7   Cursors                                                                                   ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡
                                                                                                ¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡  
                                                                                                                           
   Logically, a window system cursor “floats” above the             Normal                    ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ 
                                                                                              ¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡  
                                                                                            ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ 
windows on the screen. Standard dumb frame buffer
graphics hardware for window systems requires special
                                                                      ¡¢¡¢¡£¢¡¢£¡£¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ 
                                                                   plane
                                                                     £¡£¢£¡£¢££¡¢£¡££¢¡£¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡
                                                                    £¡£¢£¡££¢¡£¢££¡¢£¡¢ ££¡ ££¢ ££¡ ££¢ ££¡ ¡ £££                
                                                                   window
                                                                                                                     £¡
software support for managing the window system cursor.            £ £ £ £ £ £ £ £ ¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡  £ £ £ £ £ £¡   £
                                                                                            ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ 
   The standard “software cursor” technique [13] used to
                                                                   ¡¢£¡¢¡¢£¡¢¡¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ 
                                                                  £¡££¢¡£¢£¡£¢£¡£¢£¡£¢ ¡ ¢ ¡ ¢ ¡ ¡ ££                
                                                                ££¡¢£¡£¢£¡£¢¡£¢£¡£¢£¡£¢£¡£¢£¡£¡£   £¡£¢£¡£¢£¡£¡
                                                                                                             Overlay
manage the window system cursor is to save the pixels                                                        plane
under the cursor when the cursor is rendered. When the         £¡£¢£¡£¢£¡£¢£¡£¢£¡£¢¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡  
                                                                                                 ££ ££ ££ ££ ££ ££
                                                                        £¢£¡£¢£¡£¢£¡£¢£¡£¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡
                                                                                            ¡ ¢ ¡ ¢ ¡ ¡ ££                
                                                                                                             window
cursor’s image might interfere with rendering or frame      £ ¡£ £ £ £ £ £ £ £ £ £¡£¢£¡£¢£¡£¡
                                                              £¡£¢£¡£¢£¡£¢£¡£¢£¡£¢¡¢¡¢¡¡£       £ £ £ £ £ £
buffer read back, the cursor must be undrawn (restor-
ing the saved pixels) and redrawn on completion of the                  viewable clip                           drawable clip
rendering or read back.
   The undraw/redraw technique described above is rea-      Figure 4: The difference between the drawable clip and
sonable if the window system server is the only ren-        visible clip of a window occluded by a window in the
derer and the window system server manages the cursor.      overlay planes.
But using direct rendering, asynchronous direct renderers
need to render into windows containing the cursor but do
not have immediate knowledge of the cursor to utilize       underlays can be thought of as having multiple, stacked
the undraw/redraw technique. Techniques for integrating     frame buffer layers.
software cursors with direct rendering have been imple-        As mentioned previously, OpenGL treats windows in
mented [2], but they require a graphics hardware locking    the overlay and underlay planes as first class windows
strategy that is incompatible with SGI’s goal of “maxi-     in the X window hierarchy which is the convention for
mum performance” rendering.                                 handling overlays in X [4]. IRIS GL had a simpler notion
   The alternative to a software cursor is hardware sup-    where frame buffer layers all existed in a single win-
port for a cursor. Normally, this consists of support in    dow spanning each frame buffer layer. Windows in non-
the video back end that merges in the cursor image into     normal layers are just like other windows excepting trans-
the video output. Using a hardware cursor eliminates        parency effects and potentially fewer expose events being
both the rendering overhead and flicker of the software      generated.
technique. All SGI graphics hardware supports hardware         The most important insight into the support for frame
cursors, thereby decoupling direct renderering from win-    buffer layers in a window system is that the drawable clip
dow system cursor management.                               and the visible region of a window are no longer always
                                                            identical. Figure 4 demonstrates this point.
5 Other Issues                                                 SGI found that its older hardware which supported a
                                                            single layer of integrated CIDs and DIDs made it impos-
   There are other issues that do not relate directly to    sible to support direct OpenGL rendering into the overlay
supporting virtualized, direct access rendering, but that   planes. This old hardware is sufficient to implement IRIS
still are important to the implementation of SGI’s direct   GL’s simpler model for layered frame buffers, but a single
rendering support.                                          layer of DIDs combined with CIDs makes it impossible
                                                            to perform CID clipping for overlay plane windows while
                                                            keeping the display modes correct for the normal plane
5.1   Overlays and Underlays
                                                            windows.
   OpenGL supports overlay and underlay planes. Over-          The current high-end SGI hardware supports separate
lays are frame buffer image planes that are displayed       DID information per frame layer to solve this problem (to
preferentially to the normal frame buffer image planes.     support multiple display modes for the overlays). With
A special transparent pixel value can be used to “show      separate CIDs and DIDs, the CID planes can generally be
through” to the contents of the normal planes. Underlay     used for both overlay and normal planes clipping. Using
planes are like overlays but are displayed deferentially    CIDs for clipping does not change how the display modes
to the normal planes. Overlays and underlays are useful     are arranged. But using CIDs for both overlay and normal
for text annotation, rubber banding, transient menus, and   planes clipping could contribute to CID thrashing since
animation effects. While a simple frame buffer has a sin-   the drawable region for an overlay window might overlap
gle layer, graphics hardware supporting overlays and/or     the drawable region of a normal plane window. Separate
clipping planes for each frame buffer layer could remedy       Ensuring X rendering always goes into the front buffer
the problem, but CID thrashing due to sharing clipping         cannot be done using the relative access allowed through
planes between layers has not proven to be a problem in        rendering nodes. Instead, the X server makes sure it
practice.                                                      renders into the correct absolute buffer (buffer A or B, as
                                                               opposed to front or back).
5.2    Synchronization Issues                                     But as discussed before, buffer swaps can be scheduled
                                                               by direct renderers without the attention or knowledge of
   GLX treats the OpenGL command stream and the X re-          the X server. If the X server has validated the swappable
quest stream as two independent sequences of commands.         DID resource of a window, it can no longer ensure render-
These streams may execute at different rates. GLX sup-         ing goes into the front buffer. The X server could query
ports glXWaitX and glXWaitGL that allow the X and              the hardware to determine state of the buffer. Query-
OpenGL command streams to be explicitly synchronized.          ing the current front buffer would require a query per-
glXWaitGL prevents subsequent X requests from exe-             rendering operation. Worse yet, the query’s information
cuting until any outstanding OpenGL commands have              is only valid for the instant of the query.
completed. glXWaitX prevents subsequent OpenGL                    If the X server wants to render into the window of
commands from executing until any outstanding X re-            a double buffered window that has its swappable DID
quests have completed. When using indirect rendering,          resource validated, the X server invalidates the unshared
these calls force their appropriate sequentiality without      DID resource for the window. A graphics driver ioctl
the cost of a round-trip. These calls can be thought of as     is used to perform the invalidation. This invalidation
synchronization tokens actually embedded in the X pro-         accomplishes two things:
tocol stream. When direct rendering, a glXWaitX does
require a XSync to ensure an X requests have completed.
                                                                  




                                                                     Revokes the permission of direct renderers to swap
                                                                     the buffer. Further attempts to swap the window
                                                                     will be held off and result in a message to the X
5.3    Correct Front Buffer Rendering for X                          server to revalidate the unshared DID resource. The
   Support for double buffering introduces a new compli-             swap will be delayed until the X server revalidates
cation to mixing non-OpenGL X rendering with OpenGL                  the resource.
rendering for SGI. When X rendering is done to an X win-          




                                                                     And, returns the current state of the buffer.
dow, the render operations should affect the front buffer
of the window. So X rendering must be directed into the        While this operation is heavy-handed, it lets the X server
correct relative buffer, i.e., the front buffer.               render correctly into the front buffer. With stable knowl-
                                                               edge of which absolute buffer is being displayed, the X
   As discussed earlier, rendering nodes virtualize the rel-
                                                               server can render correctly.
ative access to front and back buffers when using double
                                                                  The overhead of the scheme is minimized because
buffering. But the X server renders all non-OpenGL ren-
                                                               the X server only checks if it needs to revoke a dou-
dering to all windows through a single rendering node.
                                                               ble buffered window’s shared DID resource during the
The X server does not rely on its rendering node for cor-
                                                               validation of a graphics context (GC) and window. Se-
rect window clipping or to determine correctly the front
                                                               rial numbers determine when the window and a given
or back buffer. In part this is because the X server does
                                                               GC are validated with respect to each other. When the
not bind to particular windows the way OpenGL does.
                                                               shared DID resource is validated, the serial number of the
   Because the X server has full knowledge of the window
                                                               window is updated with a unique value, forcing any pre-
tree, it does its clipping in software (and sometimes with
                                                               viously valid GCs to become invalid. Any X rendering
hardware assist). The use of virtualized clipping within
                                                               will then force a GC validation, at which time the shared
the X server is difficult for two reasons:
                                                               DID resource can be revoked.
   




      The X server renders to many more windows than
      OpenGL programs do. Using virtualized clipping           5.4 X Server Multi-rendering for Indirect Ren-
      would quickly exhaust the hardware clipping re-              dering
      sources.                                                   Indirect rendering is still required by the GLX spec-
   




      Because the same thread in the X server does core        ification so even the best direct rendering support still
      rendering as does the validation of virtual clip re-     requires that indirect rendering be supported. SGI’s
      sources, use of virtualized clipping would deadlock      OpenGL implementation treats indirect rendering as a
      the X server.                                            special case of direct rendering.
Independently scheduled threads within the X server           [6] Chandlee Harrell, Farhad Fouladi, “Graphics Ren-
execute indirectly rendered OpenGL commands. A                       dering Architecture for a High Performance Desk-
distinct thread is created for each X connection using               top Workstation,” SIGGRAPH 93 Conference Pro-
OpenGL indirect rendering. This thread within the X                  ceedings, August 1993.
server’s address space executes the same OpenGL ren-
dering code that direct renderers use and utilizes the same      [7] Chandrasekhar Narayanawami, et.al., “Software
system support for direct rendering. One can think of the            OpenGL: Architecture and Implementation,” IBM
OpenGL rendering threads within the X server as prox-                RISC System/6000 Technology: Vol. II, 1993.
ies that execute OpenGL commands on behalf of an X               [8] Phil Karlton, OpenGL Graphics with the X Window
client using indirect OpenGL rendering. The OpenGL                   System, Ver. 1.0, Silicon Graphics, April 30, 1993.
rendering threads within the server only coordinate with
the main X server thread to hand off commands to exe-            [9] Mark J. Kilgard, “Going Beyond the MIT Sample
cute and return results. Otherwise, these threads do not             Server: The Silicon Graphics X11 Server,” The X
manipulate any X server data structures. This technique              Journal, SIGS Publications, January 1993.
is called multi-rendering and is discussed in greater detail
in [10].                                                        [10] Mark J. Kilgard, Simon Hui, Allen A. Leinwand,
                                                                     Dave Spalding, “X Server Multi-rendering for
                                                                     OpenGL and PEX,” The X Resource: Proceedings
Acknowledgments                                                      of the 8th X Technical Conference, O’Reilly & As-
                                                                     sociates, Issue 9, January 1994.
   Much of the understanding for supporting direct ren-
dering of OpenGL came from the implementation of its            [11] OpenGL Architecture Review Board, OpenGL Ref-
predecessor, the IRIS GL. The work of Jeff Doughty, John             erence Manual: The official reference document for
Giannandrea, Luther Kitahata, Paul Mielke, Jeff Wein-                OpenGL, Release 1, Addison Wesley, 1992.
stein, and others on IRIS GL’s resource management laid         [12] Desi Rhoden, Chris Wilcox, “Hardware Accelera-
the basis for the facilities described here. We would like to        tion for Window Systems,” Computer Graphics, As-
acknowledge all those at Silicon Graphics who helped en-             sociation for Computing Machinery, Vol. 23, Num.
sure OpenGL’s successful implementation, notably Kurt                3, July 1989.
Akeley, Peter Daifuku, Simon Hui, Phil Karlton, Mark
Stadler, Paula Womack, and David Yu.                            [13] David Rosenthal, Adam de Boor, Bob Scheifler,
                                                                     “Godzilla’s Guide to Porting the X V11 Sample
                                                                     Server,” X11R5 Documentation, Massachusetts In-
References                                                           stitute of Technology, April 12, 1990.
                                                                                                             ¡ ¢ 
 [1] Kurt Akeley, “RealityEngine Graphics,” SIG-                [14] Mark Segal, Kurt Akeley, The OpenGL           Graph-
     GRAPH 93 Conference Proceedings, August 1993.                   ics System: A Specification, Ver. 1.0, Silicon Graph-
                                                                     ics, April 30, 1993.
 [2] Jeff Boyton, et.al., “Sharing Access to Display                                                           ¡ £ 
     Resources in the Starbase/X11 Merge System,”               [15] Silicon Graphics, “Indy Graphics,” Indy          Techni-
     Hewlett-Packard Journal, December 1989.                         cal Report, Ver. 1.0, 1993.
                                                                [16] C. H. Tucker, C. J. Nelson, “Extending X for High
 [3] Bradley Chen, “Memory Behavior of an X11 Win-
                                                                     Performance 3D Graphics,” Xhibition 91 Proceed-
     dow System,” USENIX Conference Proceedings,
                                                                     ings, 1991.
     January 1994.
                                                                [17] Doug Voorhies, David Kirk, Olin Lathrop, “Virtual
 [4] Peter Daifuku, “A Fully Functional Implementa-                  Graphics,” Computer Graphics, Vol. 22, Num. 4,
     tion of Layered Windows,” The X Resource: Pro-                  August 1988.
     ceedings of the 7th Annual X Technical Conference,
     O’Reilly & Associates, Issue 5, January 1993.

 [5] Edward Haletky, Linas Vepstas, “Integration of GL
     with the X Window System,” Xhibition 91 Proceed-
     ings, 1991.

Contenu connexe

Tendances

Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMark Kilgard
 
Gallium3D - Mesa's New Driver Model
Gallium3D - Mesa's New Driver ModelGallium3D - Mesa's New Driver Model
Gallium3D - Mesa's New Driver ModelChia-I Wu
 
OpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsOpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsMark Kilgard
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsWT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsAMD Developer Central
 
Avoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL PitfallsAvoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL PitfallsMark Kilgard
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerAMD Developer Central
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauAMD Developer Central
 
OpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingOpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingAndreas Schreiber
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & FutureOfer Rosenberg
 
GPU accelerated path rendering fastforward
GPU accelerated path rendering fastforwardGPU accelerated path rendering fastforward
GPU accelerated path rendering fastforwardMark Kilgard
 
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...AMD Developer Central
 
EclipseCon 2011: Deciphering the CDT debugger alphabet soup
EclipseCon 2011: Deciphering the CDT debugger alphabet soupEclipseCon 2011: Deciphering the CDT debugger alphabet soup
EclipseCon 2011: Deciphering the CDT debugger alphabet soupBruce Griffith
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...AMD Developer Central
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterAMD Developer Central
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...AMD Developer Central
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoAMD Developer Central
 
Smt kingdomwhatsnew85
Smt kingdomwhatsnew85Smt kingdomwhatsnew85
Smt kingdomwhatsnew85smtmarketing
 

Tendances (20)

Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to Vulkan
 
Gallium3D - Mesa's New Driver Model
Gallium3D - Mesa's New Driver ModelGallium3D - Mesa's New Driver Model
Gallium3D - Mesa's New Driver Model
 
NvFX GTC 2013
NvFX GTC 2013NvFX GTC 2013
NvFX GTC 2013
 
OpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsOpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUs
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsWT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
 
Avoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL PitfallsAvoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL Pitfalls
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
 
OpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingOpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel Programming
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & Future
 
GPU accelerated path rendering fastforward
GPU accelerated path rendering fastforwardGPU accelerated path rendering fastforward
GPU accelerated path rendering fastforward
 
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
 
EclipseCon 2011: Deciphering the CDT debugger alphabet soup
EclipseCon 2011: Deciphering the CDT debugger alphabet soupEclipseCon 2011: Deciphering the CDT debugger alphabet soup
EclipseCon 2011: Deciphering the CDT debugger alphabet soup
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben Gaster
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
 
Smt kingdomwhatsnew85
Smt kingdomwhatsnew85Smt kingdomwhatsnew85
Smt kingdomwhatsnew85
 

En vedette

Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...Mark Kilgard
 
NV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked QuestionsNV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked QuestionsMark Kilgard
 
A Follow-up Cg Runtime Tutorial for Readers of The Cg Tutorial
A Follow-up Cg Runtime Tutorial for Readers of The Cg TutorialA Follow-up Cg Runtime Tutorial for Readers of The Cg Tutorial
A Follow-up Cg Runtime Tutorial for Readers of The Cg TutorialMark Kilgard
 
Improving Shadows and Reflections via the Stencil Buffer
Improving Shadows and Reflections via the Stencil BufferImproving Shadows and Reflections via the Stencil Buffer
Improving Shadows and Reflections via the Stencil BufferMark Kilgard
 
Mixing Path Rendering and 3D
Mixing Path Rendering and 3DMixing Path Rendering and 3D
Mixing Path Rendering and 3DMark Kilgard
 
(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개
(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개
(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개nemoux
 
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated RenderingPractical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated RenderingMark Kilgard
 
GTC 2009 OpenGL Barthold
GTC 2009 OpenGL BartholdGTC 2009 OpenGL Barthold
GTC 2009 OpenGL BartholdMark Kilgard
 
A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)Mark Kilgard
 
GPU-accelerated Path Rendering
GPU-accelerated Path RenderingGPU-accelerated Path Rendering
GPU-accelerated Path RenderingMark Kilgard
 
Geometry Shader-based Bump Mapping Setup
Geometry Shader-based Bump Mapping SetupGeometry Shader-based Bump Mapping Setup
Geometry Shader-based Bump Mapping SetupMark Kilgard
 
Ccv shadow volumes_supporting_slides
Ccv shadow volumes_supporting_slidesCcv shadow volumes_supporting_slides
Ccv shadow volumes_supporting_slidesMark Kilgard
 
GTC 2009 OpenGL Gold
GTC 2009 OpenGL GoldGTC 2009 OpenGL Gold
GTC 2009 OpenGL GoldMark Kilgard
 
An Introduction to NV_path_rendering
An Introduction to NV_path_renderingAn Introduction to NV_path_rendering
An Introduction to NV_path_renderingMark Kilgard
 
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)Mark Kilgard
 
Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well Mark Kilgard
 
GTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path RenderingGTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path Rendering Mark Kilgard
 

En vedette (20)

Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...
 
NV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked QuestionsNV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked Questions
 
A Follow-up Cg Runtime Tutorial for Readers of The Cg Tutorial
A Follow-up Cg Runtime Tutorial for Readers of The Cg TutorialA Follow-up Cg Runtime Tutorial for Readers of The Cg Tutorial
A Follow-up Cg Runtime Tutorial for Readers of The Cg Tutorial
 
Improving Shadows and Reflections via the Stencil Buffer
Improving Shadows and Reflections via the Stencil BufferImproving Shadows and Reflections via the Stencil Buffer
Improving Shadows and Reflections via the Stencil Buffer
 
Mixing Path Rendering and 3D
Mixing Path Rendering and 3DMixing Path Rendering and 3D
Mixing Path Rendering and 3D
 
Cg in Two Pages
Cg in Two PagesCg in Two Pages
Cg in Two Pages
 
(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개
(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개
(NEMO-UX) WAYLAND 기반 윈도우 매니저 소개
 
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated RenderingPractical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
 
GTC 2009 OpenGL Barthold
GTC 2009 OpenGL BartholdGTC 2009 OpenGL Barthold
GTC 2009 OpenGL Barthold
 
A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (paper)
 
GPU-accelerated Path Rendering
GPU-accelerated Path RenderingGPU-accelerated Path Rendering
GPU-accelerated Path Rendering
 
Realizing OpenGL
Realizing OpenGLRealizing OpenGL
Realizing OpenGL
 
Geometry Shader-based Bump Mapping Setup
Geometry Shader-based Bump Mapping SetupGeometry Shader-based Bump Mapping Setup
Geometry Shader-based Bump Mapping Setup
 
Ccv shadow volumes_supporting_slides
Ccv shadow volumes_supporting_slidesCcv shadow volumes_supporting_slides
Ccv shadow volumes_supporting_slides
 
GTC 2009 OpenGL Gold
GTC 2009 OpenGL GoldGTC 2009 OpenGL Gold
GTC 2009 OpenGL Gold
 
An Introduction to NV_path_rendering
An Introduction to NV_path_renderingAn Introduction to NV_path_rendering
An Introduction to NV_path_rendering
 
OpenGL 4 for 2010
OpenGL 4 for 2010OpenGL 4 for 2010
OpenGL 4 for 2010
 
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
 
Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well
 
GTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path RenderingGTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path Rendering
 

Similaire à System Support for OpenGL Direct Rendering

OpenGL ES EGL Spec&APIs
OpenGL ES EGL Spec&APIsOpenGL ES EGL Spec&APIs
OpenGL ES EGL Spec&APIsJungsoo Nam
 
Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Prabindh Sundareson
 
VisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_FinalVisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_FinalMasatsugu HASHIMOTO
 
Ha4 displaying 3 d polygon animations
Ha4   displaying 3 d polygon animationsHa4   displaying 3 d polygon animations
Ha4 displaying 3 d polygon animationsJordanSmith96
 
13th kandroid OpenGL and EGL
13th kandroid OpenGL and EGL13th kandroid OpenGL and EGL
13th kandroid OpenGL and EGLJungsoo Nam
 
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axelaparuma
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Intel® Software
 
Displaying 3 d polygon animations
Displaying 3 d polygon animationsDisplaying 3 d polygon animations
Displaying 3 d polygon animationshalo4robo
 
OpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI PlatformsOpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI PlatformsPrabindh Sundareson
 
Dynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdfDynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdfAgneshShetty
 
JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdf
JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdfJIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdf
JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdfSamiraKids
 
Gpu computing-webgl
Gpu computing-webglGpu computing-webgl
Gpu computing-webglVisCircle
 
Cloud Graphical Rendering: A New Paradigm
Cloud Graphical Rendering:  A New ParadigmCloud Graphical Rendering:  A New Paradigm
Cloud Graphical Rendering: A New ParadigmJoel Isaacson
 

Similaire à System Support for OpenGL Direct Rendering (20)

OpenGL ES EGL Spec&APIs
OpenGL ES EGL Spec&APIsOpenGL ES EGL Spec&APIs
OpenGL ES EGL Spec&APIs
 
Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011
 
VisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_FinalVisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_Final
 
Ha4 displaying 3 d polygon animations
Ha4   displaying 3 d polygon animationsHa4   displaying 3 d polygon animations
Ha4 displaying 3 d polygon animations
 
13th kandroid OpenGL and EGL
13th kandroid OpenGL and EGL13th kandroid OpenGL and EGL
13th kandroid OpenGL and EGL
 
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe
 
Computer graphics workbook
Computer graphics workbookComputer graphics workbook
Computer graphics workbook
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
Angel
AngelAngel
Angel
 
Opengl basics
Opengl basicsOpengl basics
Opengl basics
 
Displaying 3 d polygon animations
Displaying 3 d polygon animationsDisplaying 3 d polygon animations
Displaying 3 d polygon animations
 
Introduction to OpenGL.ppt
Introduction to OpenGL.pptIntroduction to OpenGL.ppt
Introduction to OpenGL.ppt
 
OpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI PlatformsOpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI Platforms
 
Graphics Libraries
Graphics LibrariesGraphics Libraries
Graphics Libraries
 
Regal
RegalRegal
Regal
 
Slideshare
SlideshareSlideshare
Slideshare
 
Dynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdfDynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdf
 
JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdf
JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdfJIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdf
JIT Spraying Never Dies - Bypass CFG By Leveraging WARP Shader JIT Spraying.pdf
 
Gpu computing-webgl
Gpu computing-webglGpu computing-webgl
Gpu computing-webgl
 
Cloud Graphical Rendering: A New Paradigm
Cloud Graphical Rendering:  A New ParadigmCloud Graphical Rendering:  A New Paradigm
Cloud Graphical Rendering: A New Paradigm
 

Plus de Mark Kilgard

Computers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School StudentsComputers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School StudentsMark Kilgard
 
NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016Mark Kilgard
 
Virtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUsVirtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUsMark Kilgard
 
EXT_window_rectangles
EXT_window_rectanglesEXT_window_rectangles
EXT_window_rectanglesMark Kilgard
 
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...Mark Kilgard
 
NV_path rendering Functional Improvements
NV_path rendering Functional ImprovementsNV_path rendering Functional Improvements
NV_path rendering Functional ImprovementsMark Kilgard
 
SIGGRAPH Asia 2012: GPU-accelerated Path Rendering
SIGGRAPH Asia 2012: GPU-accelerated Path RenderingSIGGRAPH Asia 2012: GPU-accelerated Path Rendering
SIGGRAPH Asia 2012: GPU-accelerated Path RenderingMark Kilgard
 
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondSIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondMark Kilgard
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingMark Kilgard
 
SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012Mark Kilgard
 
GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012Mark Kilgard
 
CS 354 Final Exam Review
CS 354 Final Exam ReviewCS 354 Final Exam Review
CS 354 Final Exam ReviewMark Kilgard
 
CS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR GraphicsCS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR GraphicsMark Kilgard
 
CS 354 Performance Analysis
CS 354 Performance AnalysisCS 354 Performance Analysis
CS 354 Performance AnalysisMark Kilgard
 
CS 354 Acceleration Structures
CS 354 Acceleration StructuresCS 354 Acceleration Structures
CS 354 Acceleration StructuresMark Kilgard
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global IlluminationMark Kilgard
 
CS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingCS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingMark Kilgard
 
CS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path RenderingCS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path RenderingMark Kilgard
 

Plus de Mark Kilgard (20)

Computers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School StudentsComputers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School Students
 
NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016
 
Virtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUsVirtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUs
 
EXT_window_rectangles
EXT_window_rectanglesEXT_window_rectangles
EXT_window_rectangles
 
OpenGL for 2015
OpenGL for 2015OpenGL for 2015
OpenGL for 2015
 
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...
 
NV_path rendering Functional Improvements
NV_path rendering Functional ImprovementsNV_path rendering Functional Improvements
NV_path rendering Functional Improvements
 
SIGGRAPH Asia 2012: GPU-accelerated Path Rendering
SIGGRAPH Asia 2012: GPU-accelerated Path RenderingSIGGRAPH Asia 2012: GPU-accelerated Path Rendering
SIGGRAPH Asia 2012: GPU-accelerated Path Rendering
 
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondSIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
 
SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012
 
GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012
 
CS 354 Final Exam Review
CS 354 Final Exam ReviewCS 354 Final Exam Review
CS 354 Final Exam Review
 
CS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR GraphicsCS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR Graphics
 
CS 354 Performance Analysis
CS 354 Performance AnalysisCS 354 Performance Analysis
CS 354 Performance Analysis
 
CS 354 Acceleration Structures
CS 354 Acceleration StructuresCS 354 Acceleration Structures
CS 354 Acceleration Structures
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global Illumination
 
CS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingCS 354 Ray Casting & Tracing
CS 354 Ray Casting & Tracing
 
CS 354 Typography
CS 354 TypographyCS 354 Typography
CS 354 Typography
 
CS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path RenderingCS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path Rendering
 

Dernier

UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 

Dernier (20)

20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 

System Support for OpenGL Direct Rendering

  • 1. System Support for OpenGL Direct Rendering Mark J. Kilgard David Blythe Deanna Hohn Silicon Graphics, Inc. 2011 N. Shoreline Mountain View, CA 94043-1389 email: mjk@sgi.com blythe@sgi.com hohn@sgi.com Abstract tion (as opposed to running over the network). Direct rendering is not available if the OpenGL process is con- OpenGL’s window system support for the X Window nected to a remote X server. System explicitly allows implementations to support di- rect rendering of OpenGL commands to the graphics For interactive 3D applications, the maximum possi- hardware. Rendering directly to the hardware avoids the ble rendering performance is critical to the success of the overhead of packing and relaying protocol requests to the application. When available, direct rendering has a sub- X server inherent in indirect rendering. stantial performance advantage over rendering indirectly The OpenGL implementation available for Silicon via the X server, i.e., indirect rendering. Instead of using Graphics workstations supports direct rendering using the X server as a proxy for rendering, rendering com- virtualizable graphics hardware in conjunction with the mands are sent directly to the graphics hardware. Direct kernel and the X server. The techniques described provide rendering can be thought of as a means of “cutting out “maximum performance” rendering for OpenGL. Some the middle man.” Indirect rendering is still useful (and of the issues are specific to OpenGL, but most of the required by GLX) because it allows the same network techniques described are appropriate for the implemen- extensibility and inter-operability of traditional X clients. tation of any high-performance direct rendering graphics This paper discusses how Silicon Graphics, Inc. (SGI) interface. implements direct rendering in its OpenGL implemen- tation through a combination of hardware features, op- Keywords: OpenGL, Virtual Graphics, Direct Rendering. erating system support, X server support, and X and OpenGL library support. SGI implements the described 1 Introduction   facilities in IRIX 5.3. The next section discusses the goal of direct rendering, SGI’s approach for supporting The OpenGL graphics system [14, 11] is a window sys- virtualized direct rendering, and support for direct ren- tem independent software interface to graphics hardware dering by OpenGL’s predecessor and other direct ren- for 3D rendering. GLX [8] is the OpenGL extension to dering graphics systems. Section 3 describes OpenGL’s the X Window System that specifies how OpenGL inte- implementation model and the requirements implied for grates with X. The GLX specification explicitly allows implementing direct rendering. Section 4 presents how (but does not require) implementations to support direct SGI virtualizes access to the graphics hardware to sup- rendering of OpenGL commands to the graphics hard- port direct rendering. Section 5 addresses other issues ware. Direct rendering allows OpenGL commands to not strictly related to virtualized direct access rendering bypass the normal X protocol encoding, transport, and but still important for supporting direct rendering. X server dispatch. Through sufficient hardware and sys- tem software support, OpenGL rendering can achieve the maximum rendering performance from the hardware. Direct rendering naturally implies that the direct ren- ¡ IRIX is the SGI version of the Unix operating system. Most of the dering process is running on the local graphics worksta- facilities described were originally developed for IRIX 5.2.
  • 2. 2 Background example (whereby OpenGL sends a 3D vertex to the hard- ware; a very common OpenGL operation) is handled in Support for direct rendering was purposefully designed this way. Even with streaming, the indirect case incurs into OpenGL’s GLX specification. While use of the GLX the overhead of encoding, transport, and decoding GLX extension protocol permits interoperability and network- protocol for requests and replies. extensibility of OpenGL rendering, forcing the X server The direct case can eliminate the overhead associated as an intermediary for OpenGL rendering imposes inher- with context switching between the OpenGL program and ent limitations on OpenGL rendering performance. Per- the X server, protocol encoding, transport, and decoding mitting direct rendering avoids the unacceptable situation when performing OpenGL rendering. Direct rendering where expensive, high-performance graphics hardware also improves cache and TLB behavior by avoiding fre- subsystems designed to support OpenGL [1, 6] have their quent context switches and multiple active contexts [3]. graphics performance potential starved by the overhead The examples in Table 1 assume the “fast path” of incurred by indirect rendering. SGI’s OpenGL implementation is taken. Being on the Using OpenGL display lists (non-editable sequences of “fast path” means the system resources for direct render- OpenGL commands that can be downloaded into the X ing (discussed in detail later) are already made available. server and later executed) can ease the burden of indirect Unless resources are in contention or resources are being rendering since it minimizes the GLX protocol needed used for the first time, the “fast path” is the norm. for rendering. However, use of display lists is often inap- propriate for many applications, particularly applications 2.2 “Maximum Performance” 3D Rendering with very dynamic scenes. Such applications favor us- ing immediate mode rendering that requires much higher SGI’s OpenGL implementation seeks to achieve “max- bandwidth for the OpenGL command stream. Measure- imum performance” OpenGL rendering. For our pur- ments of the IRIX 5.2 OpenGL implementation show poses, “maximum performance” means that when there indirect immediate mode rendering has inferior perfor-   is no contention for rendering resources, and once utilized mance to direct rendered immediate mode [10]. Even resources are made available, graphics rendering perfor- programs heavily reliant on display lists are slower when mance is limited only by the system’s raw graphics per- rendering indirectly. formance and the graphics software efficiency. The max- imum performance potential of the workstation should be achievable. 2.1 Direct Rendering Benefits In practice, this means no locks need to be acquired Table 1 breaks down the overhead of indirect rendering and released when rendering. Even so, multiple OpenGL relative to direct rendering for the extreme and common programs should be able to run concurrently. But the cases. The extreme example demonstrates the high-level overhead from concurrent use of graphics should only steps involved in executing the OpenGL glReadPix- be introduced when multiple processes are concurrently els command used for reading pixels from a window. using the graphics hardware; performance of the sin- The command in question requires data to be returned gle renderer case should not be compromised. Graphics to the application. In the indirect case, this requires a programs should not be burdened with overhead from context switch to the X server and back to the OpenGL window clipping; in particular, multi-pass rendering for program. And the pixel data returned is copied three correct clipping should not be necessary. And the possi- times, as opposed to a single copy in the direct render- bility of asynchronous window management operations ing case. While glReadPixels requires a round-trip such as changing window clipping or changing the win- to the X server to return the pixel data, most OpenGL dow origin should not add any overhead to the normal commands return no data. case when clips and origins are not changing. For most OpenGL commands, the context switch over- head can be amortized over multiple GLX requests by 2.3 Virtual Graphics streaming protocol requests. The table’s glVertex3f SGI workstations implement virtual graphics [17] to achieve the goal of “maximum performance” rendering. ¡ The presented results showed immediate mode OpenGL graphics performance using indirect rendering ranging from 34% to 68% of the Virtual graphics means that every graphics process has direct rendering performance depending on the model of SGI graphics the illusion of exclusive access to the graphics rendering workstation. The faster the graphics hardware, the higher the relative penalty for using indirect rendering versus direct rendering due to the engine. Many systems allow direct access to the graph- higher relative overhead of indirect rendering. ics hardware. Virtual graphics not only allows direct
  • 3. Indirect glReadPixels Direct glReadPixels Indirect glVertex3f Direct glVertex3f 1. glReadPixels(...) 1. glReadPixels(...) 1. glVertex3f(...) 1. glVertex3f(...) 2. pack GLX protocol request 2. pack GLX protocol request 2. send vertex to HW 3. write request to kernel 3. return 3. return 4. block client waiting to read reply additional non-reply 5. deschedule OpenGL client OpenGL requests can be 6. schedule X server batched in protocol buffer 7. read request from kernel 8. request decoded and 4. eventually, flush dispatched to GLX handler protocol buffer 9. read pixels from 2. read pixels from 5. deschedule OpenGL client screen to a buffer 6. schedule X server 10. write reply to kernel 7. read request from kernel 11. deschedule X server 8. request decoded and 12. schedule OpenGL client dispatch to GLX handler 13. decode reply header 9. send vertex to HW 14. copy reply data from kernel to final buffer 15. return 3. return Table 1: A comparison of the “fast path” steps involved implementing glReadPixels for the indirect and direct fast path cases. The extra steps involve data copying; protocol packing, unpacking, and dispatching; and context switch overhead. access, but treats graphics as a virtual system resource. gle display mode. The displayed buffer for the window’s This virtual view of graphics allows the system to con- display mode can be instantaneously changed. But the tend with simultaneous direct access by multiple graphics number of display modes is a limited hardware resource processes. Virtual graphics also arbitrates the contention so a virtual graphics system must be ready to virtualize for graphics resources such as screen real estate. display modes. There are three classes of contention that virtualized Each of these classes of contention are dealt with by graphics for a window system must arbitrate: SGI’s virtualized, direct access rendering for OpenGL. Concurrent access contention. When different graphics processes are using a single graphics engine, 2.4 SGI-style Graphics Hardware this hardware state must be context switched. With vir- tual graphics, this context switching is transparent to the Unlike most low-end workstation and PC graphics graphics process, much the same way processor context hardware, SGI graphics hardware does not expose a mem- switches are transparent to Unix processes. ory mapped frame buffer. Instead, graphics commands are issued to a graphics engine through the manipulation Screen real estate contention. Window systems ar- of memory mapped device registers. This interface to bitrate how windows are arranged on the screen. Vir- the graphics hardware is often called the graphics pipe tualized graphics must ensure that rendering is properly or simply the pipe. Because all rendering operations are clipped to the drawable region of the rendering window. done through the graphics pipe, the pipe’s virtual mem- Clipping should be correct even in the face of asyn- ory mapping can be used to control access to graphics chronous window management operation by the X server. rendering, i.e., virtualize graphics. Because window systems like X allow arbitrary overlap- There is substantial variation in the extent of geome- ping of windows, clipping to arbitrary regions must be try and rasterization processing implemented within vari- possible. ous SGI graphics hardware configurations. The high-end Non-visible resource contention. Modern graph- SGI graphics hardware [1] implements almost the en- ics hardware supports features like double buffering, in tire OpenGL state machine within the graphics hardware. which a front buffer is displayed while the next anima- Most pipe commands (called tokens) sent to the graphics tion frame is generated in a non-visible back buffer. When hardware have a fairly direct mapping to the OpenGL the frame is complete, a buffer swap effectively copies the API. The high-end hardware is micro-coded and makes back buffer contents into the front visible buffer. Graphics heavy use of pipelining and parallelism in its various hardware can efficiently perform buffer swaps by tagging stages. all the pixels belonging to a swapping window with a sin- The low-end SGI graphics hardware [15] implements
  • 4. only the back-end of the OpenGL state machine. Most case, the hardware can support a set of clip rectangles to high-level operations such as vertex transformation, poly- quickly clip rendered pixels not belonging to the window gon rasterization, texturing, and lighting are performed [12]. The number of clip rectangles varies with graph- on the host processor inside the OpenGL library. But ics hardware but is typically less than eight. These clip low-level operations such as line drawing, shading, span rectangles are generally part of the graphics hardware rendering, dithering, etc. are implemented in the hard- context. The context for each direct renderer can use the ware rendering engine. same rectangular clipping hardware. Despite the variations in hardware across SGI’s prod- Arbitrary window clips will require far more clip rect- uct line, a direct rendering OpenGL program will work angles than is reasonable to support in hardware. A sec- (though perhaps not find the same frame buffer capabil- ond method of clipping is used for such windows. SGI ities or achieve the same performance) across the entire hardware also supports clipping planes. Clipping planes range of SGI OpenGL-capable graphics hardware. All are non-visible frame buffer planes used to encode per- the rendering code for OpenGL that directly accesses the pixel clipping IDs (often called CIDs). If the pixels of a hardware is isolated in the OpenGL library which is im- window all have the same CID (and only pixels belonging plemented as a shared library. The shared library on the to the window have that CID), the graphics hardware can system depends on the graphics hardware installed on the clip to an arbitrary window by enabling CID testing. A workstation, hiding all device dependencies when direct pixel rendered into the window is drawn only if the CID rendering. of the pixel being modified matches the CID that is set in the graphic hardware context. 2.4.1 Virtualized, Direct Rendering Needs The number of planes set aside for maintaining CIDs can be rather small. On low-end SGI hardware, the clip- While the degree to which the OpenGL state machine ping planes are only 2 bits deep. This means three CIDs is supported in hardware varies across the product line, are possible (2 bit planes provide 4 CID values but one virtualized direct access rendering requires functionality CID value must be used for screen real estate not belong- across all SGI graphics hardware configurations. The ing to the CID assigned windows). As will be explained requirements are: later, CIDs may need to be virtualized because they are a A context-switchable graphics engine. The state of limited resource. the graphics hardware must be context-switchable and the context switch must be able to be performed preemptively, 2.4.3 Display Mode Support including in the middle of a command. Display mode IDs (often called DIDs) are like CIDs, Window relative rendering. A process using virtual but instead of providing clipping information, DIDs de- graphics renders using window relative coordinates, so termine on a per-pixel basis how each pixel value on the that the window location need not be tracked if the win- screen should be displayed. The DID for a pixel is looked dow is moved. up in the hardware display mode table to determine how Arbitrary window clipping. As mentioned earlier, a the pixel should be displayed. The DID mode indicates window system supporting arbitrarily overlapping win- whether the pixel is RGB or color index (i.e., the color is dows, can result in arbitrary window clipping regions, so determined by a colormap); if color index, what hardware the hardware must support arbitrary clipping. And the colormap to use if there is more than one; the depth of the window’s clip and location can change asynchronously pixel (how many bits of the pixel value are significant); to the direct rendering process so the window clip must if the pixel is double buffered, and if so, what buffer (A be able to change without the renderer’s knowledge. or B) should be displayed. Figure 1 gives an example of Double buffering. Support must exist for per-window how a DID determines how a pixel should be displayed. double buffering. Also, OpenGL’s front/back relative Logically, DIDs are stored in a set of non-visible frame naming of buffers must be supported. A direct renderer buffer planes. Usually 16 or 32 DIDs are available. In should be unaware of the absolute hardware buffer it is low-end hardware, devoting 4 to 5 bitplanes per pixel to rendering to. store the DID is too expensive. In this case, the DID val- ues are run-length encoded. This encoding is convenient becauses the frame buffer is scanned out in horizontal 2.4.2 Window Clipping Hardware lines. But logically, there is still a DID per pixel. In Most windows have rather simple clip regions, consist- principle, a complex arrangement of display modes on ing of a small number of rectangles. For this common the screen might be too complex to represent with a run-
  • 5. Pixel Composition ware design trade off. Pixel value sent to Buffer A Color Controller 2.5 Previous IRIS GL Support Single Buffer Display mode table The predecessor to OpenGL is SGI’s proprietary IRIS Buffer B 0 1 GL. While OpenGL leaves window system operations to 2 the native window system (for example, the X Window Display System or Windows NT), IRIS GL provides its own win- DblBuf,CI, ID 5 5 dow management routines. OpenGL has a similar 3D display A buf (DID) rendering philosophy to IRIS GL, but OpenGL is a dis- tinctly new interface. The OpenGL state machine is well Other defined and the OpenGL API has a cleaner design and pixel regular name space. One of the most important changes bits 29 in OpenGL from IRIS GL is the clean separation of ren- 30 derer state from window state. In IRIS GL, the renderer 31 and window state were coupled. Figure 1: Example of display ID hardware for supporting Locally running IRIS GL programs use virtualized, double buffering. The pixel shown is assigned display ID direct access rendering. In fact, most of the experi- 5 meaning the pixel should be treated as a double buffered, ence in supporting virtualized, direct access rendering color index pixel with buffer A being the displayed buffer. for OpenGL was a result of experience with IRIS GL. The current IRIS GL direct rendering support actually uses the same support OpenGL uses. length encoded table. In practice, run-length encoded DID tables work extremely well. Note that many windows can all share the same DID if 2.6 Other Approaches their pixels all use the same display mode. For example, all non-double buffered 8-bit color index windows using the same colormap can share the same DID. Direct rendering in the manner SGI describes in this When a process requests a buffer swap for a window (in paper is not the only option for implementing direct ren- OpenGL, glXSwapBuffers would be called), a dou- dering. And direct rendering is not a necessity for pro- ble buffer window must have an unshared or swappable duction OpenGL implementations. The IBM OpenGL DID. This is because the buffer swap is accomplished implementation described in [7] does not utilize direct by toggling the displayed buffer for the window. If the rendering. Most currently available OpenGL implemen- pixels for the window (and only the pixels for the win- tations do not support direct rendering. dow) are all in the same DID, the buffer swap happens Previous to IBM’s support for the OpenGL standard, cleanly. What this means is a double buffered window IBM licensed IRIS GL from SGI for 3D hardware in the that needs to swap must be placed on an unshared DID original RS/6000. Their IRIS GL implementation uses before the swap can happen. Like CIDs, the number of virtualized, direct rendering much like SGI’s IRIS GL DIDs is limited by hardware so DIDs may also need to implementation [16, 5]. be virtualized. Hewlett-Packard provides direct rendering support for While DIDs and CIDs are discussed here as distinct their Starbase Graphics Library [2] with an approach entities, it is possible to combine display mode and clip- that is different from SGI’s direct rendering mechanism. ping information into a single set of bitplanes. This may Hewlett-Packard’s approach acquires a fast lock to be held be useful because often the clipping ID planes and dis- during rendering to the graphics engine. This locking al- play mode ID planes both have the same regions each lows clipping to be coordinated in software via shared assigned a CID and DID. Combining CIDs and DIDs can memory window clip serial numbers and proprietary X make better use of frame buffer memory. If 32 DIDs requests to query the current clip of a window. While were supported and 4 CIDs were supported, a combined such a system avoids the complex hardware and operat- scheme would support 128 combined DID/CID values! ing system support involved in SGI’s virtualized, direct But combining DIDs and CIDs means DID information access rendering mechanism, it forces explicit, fine-grain cannot be run-length encoded. This is a graphics hard- locking to arbitrate access to the graphics hardware.
  • 6. 3 OpenGL Requirements 3.3 Per Window Double Buffering The OpenGL GLX specification provides the model OpenGL supports per-window double buffering using used to integrate OpenGL with the X Window System. the glXSwapBuffers call. A side effect of calling Understanding the GLX model motivates how SGI im- glXSwapBuffers on a window that the calling thread plements virtualized, direct access rendering specifically is currently bound to is that further rendering to the win- for OpenGL and X. dow will not execute until the buffer swap completes. Double buffer hardware usually times the buffer swap to occur during vertical retrace. The glXSwapBuffers 3.1 Context Model may return before the buffer swap completes, but the OpenGL implementation is then responsible for delaying An OpenGL rendering context (or GLXContext) is any further OpenGL rendering to the window until the logically an instance of an OpenGL state machine. When buffer swap actually occurs. a context is created using glXCreateContext, the creator has the option of requesting a direct rendering con- text. If the program is running locally and the OpenGL 4 Virtualizing SGI Graphics implementation supports direct rendering, a direct render- ing context will be created. Everything that can be done When an OpenGL process creates a direct OpenGL with a direct context can be done with an indirect context rendering context, the process opens the graphics device. (the reverse is not true) so requesting a direct context but The process allocates an IRIX kernel resource known as being returned an indirect context is acceptable. a rendering node. A rendering node is a virtual graph- Once a context is created, that context can be bound ics hardware context and permits the graphics pipe to be or “made current” to a drawable (either a window or mapped into or “attached to” the process’s address space pixmap) supporting OpenGL rendering by calling glX- so a process can directly access the graphics hardware. MakeCurrent. Not only is the context bound to the Every direct rendering OpenGL context has an associated drawable, but also to the thread calling glXMakeCur- rendering node. Note that rendering nodes are completely rent. Once bound, any OpenGL calls issued by the hidden from OpenGL programs. The allocation and use thread are issued using the current context and affect the of rendering nodes is purposefully not made available for current drawable. Only one thread can be bound to a use by applications. They exist only to support imple- given context at a time; but multiple contexts (bound to menting the OpenGL and IRIS GL APIs. different threads) can be bound to a single drawable. Sub- The SGI X server [9] also uses a rendering node to sequent calls to glXMakeCurrent rebind the thread to access the graphics pipe. But the X server’s rendering the newly specified drawable and context. node is marked as being the board manager rendering node. The board manager rendering node is allowed to call a number of special board manager ioctls used 3.2 Sharing of Window State for validating and invalidating resources associated with GLX explicitly allows the sharing of window state. other rendering nodes. By acting as the board manager, For example, all OpenGL renderers bound to a double the X server must process messages sent by the kernel buffered window share the same notion of front and back indicating the needs of rendering nodes to have their vir- buffer state. This means if one client calls glXSwap- tual graphics resources validated. The X server receives Buffers on a window bound to by other OpenGL ren- the messages through a shared memory input queue (or derers, the other renderers maintain the same view of shmiq) also used by the kernel to efficiently pass input which buffer is front and which is back. device events to the X server. The purpose of the kernel One requirement of GLX that proves difficult to meet is messages and how the X server responds to them are dis- the sharing and management of ancillary buffer contents cussed shortly. The X server also uses its rendering node for multiple renderers bound to the same window. An- for standard X server rendering. cillary buffers are non-visible buffers used by rendering operations. Examples are stencil, depth, and accumula- 4.1 Making Current to a Window tion buffers. Sharing ancillary buffers is straightforward if they are supported in hardware, but sharing of buffers Before a direct rendering process can begin referenc- implemented via software is more difficult to correctly ing the graphics pipe through the rendering node’s pipe support. memory mapping, the rendering node must be bound to
  • 7. a window. For OpenGL, this happens at glXMakeCur- pipe access. rent time via a graphics driver ioctl. Each rendering node has a clip resource which is either The kernel manages a cache of bound and recently valid or invalid. When valid, the graphics kernel driver bound windows. The cache, known as the pane cache, has up-to-date window clipping parameters in the pane allows OpenGL threads to quickly bind and rebind to cache for the window that can be loaded into the rendering windows. If the window being bound to is not found node’s graphics hardware context (either a set of clip in the pane cache, a currently unbound pane cache entry rectangles or a CID value to use). When the clip resource is selected and reused for the window, and a message is is valid and there is no other reason for the pipe to be sent to the X server notifying it that a direct renderer is invalid, the kernel graphics page fault handler validates interested in rendering to the specified X window ID. This the graphics pipe pages after loading the correct clipping message allows the X server to initialize data structures information into the graphics hardware context. for the window to keep track of direct rendering. If the rendering node’s clip resource is invalid, the A new entry in the pane cache will have two important faulting process is suspended and a clip validate mes- resources marked invalid. The first resource is the clip re- sage is sent to the X server. The X server receives the source. When valid, this means the X server has properly message through the shmiq and issues a board manager informed the kernel of the proper window origin and clip reserved “clip validate” ioctl to the graphics driver in- rectangles or CID to be used for clipping rendering to the forming the kernel of the correct clipping parameters for window. The second resource indicates if the window is the window. When the pane cache entry for the win- assigned a swappable DID. dow is updated with the new clip information, then any processes waiting for the clip resource of the window to 4.2 Context Switching be validated are awakened. The result is that a graphics process can asynchronously have its clip updated without Access to the graphics pipe is mediated using virtual any knowledge on the part of the process. An example of memory techniques. The graphics pipe is physically the clip validation process is described in Figure 2. mapped to only one process at a time. Other processes Clip resources are possibly invalidated by the X server using the graphics pipe will have invalid virtual mem- whenever the window tree is manipulated by X window ory mappings for their pages corresponding to the pipe. management operations. The X server tracks which win- If a process without valid mappings accesses a pipe ad- dows are being used for direct rendering, and if the clips dress, a page fault is generated. The kernel graphics of any of these windows change the X server uses a board driver handles this page fault. If the graphics pipe pages manager reserved “clip invalidate” ioctl to tell the ker- can be physically mapped immediately (there are reasons nel to invalidate the clip resource of any rendering nodes access to the pipe might temporarily be denied that are bound to the windows. The invalidation of a clip happens discussed later), the kernel will save the graphics context asynchronously to the direct rendering program and can of the current process and mark that process’s graphics happen at any time the X server needs it to. The next pipe pages invalid. Then the kernel restores the graphics time a direct rendering program attempts to render to the hardware context for the faulting process and validates window, the clip validation process ensures a new, correct that process’s memory mapping for the pipe. clip is loaded into the direct renderer’s graphics hardware The context switching sequence is transparent to the context. processes involved. If a single process is using the pipe, The other reason the clip resource of a window might the pipe does not need to be context switched. Graphics be invalidated is if too many windows that need CIDs as- pipe context switching only occurs when there is con- signed to them are being directly rendered to. Remember tention for the pipe. Preemptive scheduling by the kernel there are a limited number of CIDs in the hardware. The ensures graphics processes get a fair allocation of time to X server can virtualize clips by invalidating the clip of access the graphics pipe. On multi-processor machines, one window assigned a CID, reusing the CID by repaint- the scheduler needs special modifications to prevent two ing another window with the newly freed CID, and then simultaneously scheduled graphics processes from con- validating the clip of a window and assigning it the newly tinually stealing the graphics pipe from each other. reassigned CID. The X server makes no attempt to handle thrashing or starvation due to repeated CID invalidations 4.3 Clip Validation and validations, but in practice, because most windows use clip rectangles, CID thrashing is not a problem. As hinted above, the graphics pipe mapping is not Notice that clip validation happens lazily. Not until always immediately validated when a process faults on a a direct renderer actually touches the pipe does a clip
  • 8. GL IRIX X A) GL program with invalid clip resource faults when accessing program kernel server graphics registers. B) Kernel trap handler determines fault caused by invalid clip for the B rendering node. A C C) Message put in shmiq telling X server to validate the rendering node’s clip. D) X server generates a clip list for the rendering node’s window. D E) X server performs ioctl to inform kernel of new valid clip list. F) Kernel updates the rendering node to reflect its new clip, validates the node’s clip resource, maps in the graphics registers, and G E restarts the program where it stopped. F G) GL program continues running with no knowledge of the interruption. Figure 2: RRM clip validation assisted by the X server. validation begin. Often when clips on the screen change, this, the graphics driver invalidates the “allow rendering” they change repeatedly (a window manager using opaque resource and invalidates the virtual memory mapping to move is a good example of this). So it makes sense to the graphics pipe. Any further access to the pipe by the validate clips only on demand. rendering node will stall the process until the buffer swap The pane cache minimizes clip validation costs when completes. When the buffer swap completes, the “allow an OpenGL process repeatedly binds and rebinds to dif- rendering” resource will be revalidated so rendering can ferent windows (an operation expected to be common for continue. OpenGL programs). A rendering node can be bound to a Because which buffer is front and which is back is win- previously bound window, and if the window is still in the dow state shared by all OpenGL direct renderers bound to pane cache and the clip is still valid, immediately have a the window, the kernel will also update the absolute sense valid clip resource. of what buffer is front and back for any other rendering nodes bound to the window being swapped. The graphics 4.4 Fast Buffer Swaps hardware provides a means to switch what buffer is the front and back buffer without the knowledge of the direct A continuously animated application swaps buffers fre- renderer. quently enough that the operation should be optimized. The advantage of not immediately stalling the process As explained earlier, SGI implements a buffer swap by until the buffer swap completes is that most animation toggling the display mode table entry for the unshared applications have a certain amount of computation to do DID assigned to the window during vertical retrace. before the next image (typically called a frame) can be If the glXSwapBuffers command is called on the rendered. By not immediately stalling the process, this currently bound window, the buffer swap is considered to computation can be overlapped with waiting for the buffer be in the OpenGL command stream. This allows a direct to swap. renderer to provide a buffer swap without contacting the X server. The graphics driver provides a “buffer swap” 4.5 Window Display Mode Validation ioctl which can be issued by direct renderers. The result is to schedule a buffer swap at the next vertical In the discussion of buffer swapping so far, it was retrace for the thread’s currently bound window. If the assumed that the window to be swapped was indeed on glXSwapBuffers command is not for the currently an unshared DID. Since DIDs are a limited hardware bound window, the OpenGL library generates a GLX resource, this may not necessarily be true. In this case, protocol request to swap the buffers and lets the X server DIDs must be virtualized. perform the buffer swap. Similar to a rendering node’s clip resource, rendering The ioctl returns immediately. This is good because nodes also have a “swappable window” resource. The waiting for the vertical retrace could cause a delay equal resource is valid if the X server has placed the window to the vertical retrace interval (typically at the rate of 60 on an unshared DID. It is invalid if the window’s DID times per second). But further drawing to the window is shared by other windows or the X server has revoked must be held off until the buffer swap completes. To do the ability to swap (this need is made clear when mixing
  • 9.  ¡ ¡ ¢ ¡ ¡ ¡ ¡¡¢¡¡¡     ¡ ¡ ¢ ¡ ¡¡    ¡ ¡   £ ¡ ¢ ¡£¡£¢ ¡¤¢¤¡ ¡ ¤¤   £ ¢ ¡ ¢¡¢ ¡   ¡££  ¤¡   £  £¡£ DID 0 8−bit ColorIndex port these buffers in software by allocating host memory. 1 £¡ ¤£¡ ¤£ £¡£¡ £ 2   ¡¡¢¡¢¡  ¤¢¤¡£¡£¡¡ 3 ¢¡¢¡¢¡  £¢£  ¤   £¡  ¤¡ ¤ ¢ £¡ £ £¡£¡ £4 ¡ £¤¡£¤¤ DID 1 12−bit RGB double buffered, buffer A visible (shared) GLX requires the contents of ancillary buffers to be shared between renderers binding to a window and the £           ¤ ¡ DID 2 24−bit RGB contents of these buffers to be retained even when no  ¡¡¢¡¡¡  renderers are bound to the window. For hardware buffers, these requirements are typically straightforward to meet Now a glXSwapBuffers happens on window 2... since the ancillary buffers exist in the hardware frame  ¡ ¡ ¢ ¡ ¡ ¡    ¡¡¢¡¡¡    ¡ ¡  £ The X server must: buffer.   ¡ ¡ ¢¥ ¡¥ ¡ ¡ ¡ ¡ ¢¡¢ ¡   1 £¢¤¢¤¡¥ ¡¥ ¡ ¡ ¢ ¢ ¡¡¢ ¡ ¤¢¤¡ ¡ ¤¤¤  ¥¥ 3 ¡££ ¤¡ ££ ¤  ¡£ ££¡ DID 0 8−bit ColorIndex DID 1 12−bit RGB double buffered, OpenGL indirect rendering could easily allow ancillary £ ¡¡¢¡¢¡  2 £¡ ¡ £ ¥ ¡¥ ¡ ¥ ¤ ¡ ¤£ ¥¡¥¡ ¥ ¢¡¢¡¢¡  £¡  ¡  4 ¤ ¤¡ ¤¡¤¤¥ ¥¥¡ ¡¥ buffer A visible buffers to be shared between renderers since all the buffers DID 2 24−bit RGB would exist in the X server’s address space and the X  ¢£ £ DID 3 12−bit double buffered, server has immediate knowledge of the changing state of buffer A visible (unshared) windows. Combining direct rendering with retained, shared soft- Figure 3: Example of a glXSwapBuffers being per- ware ancillary buffers is difficult to achieve without formed on window 2 which shares the same display ID compromising performance. The SGI direct rendering (DID 1)with window 4. Before the buffer swap can be OpenGL implementation does not currently support the performed, the X server must rewrite the display IDs such correct sharing of ancillary buffers between renderers in that window 2 is on an unshared display ID (DID 3). different address spaces. Each OpenGL library instance allocates software ancillary buffers for its own address OpenGL and X rendering is discussed). space. These buffers can be shared between renderers If a process issues the “swap buffer” ioctl and the in the same address space. Also, the contents of these process’s current rendering node does not have its “swap- buffers are retained only for the lifetime of the address pable window” resource valid, a message is sent to the space. X server requesting the X server place the window on an Incorrectly supporting software buffers is strictly unshared DID and validate the rendering node’s “swap- speaking a violation of what OpenGL requires. But few pable window” resource. The X server complies with programs rely on sharing buffers across address spaces. the request and validates the “swappable window” for Sharing software buffers is an area where SGI’s OpenGL the window via a board manager reserved ioctl. Once implementation does not properly isolate window state validated, the buffer swap can then be scheduled. from rendering state. Further work needs to be done to As in the virtualizationof CIDs, unshared DIDs may be support ancillary buffer sharing. stolen from windows already on an unshared DID to val- One problem that must be solved is the de-allocation of idate the resources of rendering nodes attempting buffer software ancillary buffers when windows are destroyed. swaps. When a DID is stolen, the window previously The OpenGL library has no obvious way to find out when on an unshared DID will have to find another window an X window it is maintaining software ancillary buffers to share a DID with that has the identical display mode for is destroyed so it can know to deallocate those buffers. (because a window should never be assigned a DID with Since the buffers tend to be quite large, leaking ancillary a display mode not matching the correct display mode buffers is extremely expensive. for the window). Forcing a window to share a DID with another window may force that other window to have its SGI solves the problem by adding a private extension “swap buffer” resource invalidated since it might have to the X server that requests the X server to generate a previously had an unshared DID allocated to it. Figure SpecialDestroyNotify when a specified X win- 3 is an example of a window needing to be assigned an dow is destroyed. The first time an OpenGL rendering unshared DID. context is bound to a window, this request is made for the new window. Hooks in the X extension library (libX- 4.6 Shared Software Buffers ext) allow SpecialDestroyNotify events to trig- ger a callback into OpenGL to deallocate the associated OpenGL’s GLX specification requires implementa- software buffers. The event is never seen by an X pro- tions to support various types of ancillary buffers. When gram. Other mechanisms such as using the standard X there is no hardware support for these various types of DestroyNotify event proved unreliable since the X buffers, OpenGL implementations are expected to sup- client might not be selecting for that event.
  • 10. 4.7 Cursors  ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡                                 Logically, a window system cursor “floats” above the Normal  ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡   ¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡   ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡  windows on the screen. Standard dumb frame buffer graphics hardware for window systems requires special ¡¢¡¢¡£¢¡¢£¡£¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡  plane £¡£¢£¡£¢££¡¢£¡££¢¡£¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ £¡£¢£¡££¢¡£¢££¡¢£¡¢ ££¡ ££¢ ££¡ ££¢ ££¡ ¡ £££                 window £¡ software support for managing the window system cursor. £ £ £ £ £ £ £ £ ¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡  £ £ £ £ £ £¡ £  ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡  The standard “software cursor” technique [13] used to ¡¢£¡¢¡¢£¡¢¡¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡  £¡££¢¡£¢£¡£¢£¡£¢£¡£¢ ¡ ¢ ¡ ¢ ¡ ¡ ££                 ££¡¢£¡£¢£¡£¢¡£¢£¡£¢£¡£¢£¡£¢£¡£¡£ £¡£¢£¡£¢£¡£¡ Overlay manage the window system cursor is to save the pixels plane under the cursor when the cursor is rendered. When the £¡£¢£¡£¢£¡£¢£¡£¢£¡£¢¡¢¡¢¡¢¡¢¡¢¡¢¡¢¡   ££ ££ ££ ££ ££ ££ £¢£¡£¢£¡£¢£¡£¢£¡£¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡  ¡ ¢ ¡ ¢ ¡ ¡ ££                 window cursor’s image might interfere with rendering or frame £ ¡£ £ £ £ £ £ £ £ £ £¡£¢£¡£¢£¡£¡ £¡£¢£¡£¢£¡£¢£¡£¢£¡£¢¡¢¡¢¡¡£ £ £ £ £ £ £ buffer read back, the cursor must be undrawn (restor- ing the saved pixels) and redrawn on completion of the viewable clip drawable clip rendering or read back. The undraw/redraw technique described above is rea- Figure 4: The difference between the drawable clip and sonable if the window system server is the only ren- visible clip of a window occluded by a window in the derer and the window system server manages the cursor. overlay planes. But using direct rendering, asynchronous direct renderers need to render into windows containing the cursor but do not have immediate knowledge of the cursor to utilize underlays can be thought of as having multiple, stacked the undraw/redraw technique. Techniques for integrating frame buffer layers. software cursors with direct rendering have been imple- As mentioned previously, OpenGL treats windows in mented [2], but they require a graphics hardware locking the overlay and underlay planes as first class windows strategy that is incompatible with SGI’s goal of “maxi- in the X window hierarchy which is the convention for mum performance” rendering. handling overlays in X [4]. IRIS GL had a simpler notion The alternative to a software cursor is hardware sup- where frame buffer layers all existed in a single win- port for a cursor. Normally, this consists of support in dow spanning each frame buffer layer. Windows in non- the video back end that merges in the cursor image into normal layers are just like other windows excepting trans- the video output. Using a hardware cursor eliminates parency effects and potentially fewer expose events being both the rendering overhead and flicker of the software generated. technique. All SGI graphics hardware supports hardware The most important insight into the support for frame cursors, thereby decoupling direct renderering from win- buffer layers in a window system is that the drawable clip dow system cursor management. and the visible region of a window are no longer always identical. Figure 4 demonstrates this point. 5 Other Issues SGI found that its older hardware which supported a single layer of integrated CIDs and DIDs made it impos- There are other issues that do not relate directly to sible to support direct OpenGL rendering into the overlay supporting virtualized, direct access rendering, but that planes. This old hardware is sufficient to implement IRIS still are important to the implementation of SGI’s direct GL’s simpler model for layered frame buffers, but a single rendering support. layer of DIDs combined with CIDs makes it impossible to perform CID clipping for overlay plane windows while keeping the display modes correct for the normal plane 5.1 Overlays and Underlays windows. OpenGL supports overlay and underlay planes. Over- The current high-end SGI hardware supports separate lays are frame buffer image planes that are displayed DID information per frame layer to solve this problem (to preferentially to the normal frame buffer image planes. support multiple display modes for the overlays). With A special transparent pixel value can be used to “show separate CIDs and DIDs, the CID planes can generally be through” to the contents of the normal planes. Underlay used for both overlay and normal planes clipping. Using planes are like overlays but are displayed deferentially CIDs for clipping does not change how the display modes to the normal planes. Overlays and underlays are useful are arranged. But using CIDs for both overlay and normal for text annotation, rubber banding, transient menus, and planes clipping could contribute to CID thrashing since animation effects. While a simple frame buffer has a sin- the drawable region for an overlay window might overlap gle layer, graphics hardware supporting overlays and/or the drawable region of a normal plane window. Separate
  • 11. clipping planes for each frame buffer layer could remedy Ensuring X rendering always goes into the front buffer the problem, but CID thrashing due to sharing clipping cannot be done using the relative access allowed through planes between layers has not proven to be a problem in rendering nodes. Instead, the X server makes sure it practice. renders into the correct absolute buffer (buffer A or B, as opposed to front or back). 5.2 Synchronization Issues But as discussed before, buffer swaps can be scheduled by direct renderers without the attention or knowledge of GLX treats the OpenGL command stream and the X re- the X server. If the X server has validated the swappable quest stream as two independent sequences of commands. DID resource of a window, it can no longer ensure render- These streams may execute at different rates. GLX sup- ing goes into the front buffer. The X server could query ports glXWaitX and glXWaitGL that allow the X and the hardware to determine state of the buffer. Query- OpenGL command streams to be explicitly synchronized. ing the current front buffer would require a query per- glXWaitGL prevents subsequent X requests from exe- rendering operation. Worse yet, the query’s information cuting until any outstanding OpenGL commands have is only valid for the instant of the query. completed. glXWaitX prevents subsequent OpenGL If the X server wants to render into the window of commands from executing until any outstanding X re- a double buffered window that has its swappable DID quests have completed. When using indirect rendering, resource validated, the X server invalidates the unshared these calls force their appropriate sequentiality without DID resource for the window. A graphics driver ioctl the cost of a round-trip. These calls can be thought of as is used to perform the invalidation. This invalidation synchronization tokens actually embedded in the X pro- accomplishes two things: tocol stream. When direct rendering, a glXWaitX does require a XSync to ensure an X requests have completed.   Revokes the permission of direct renderers to swap the buffer. Further attempts to swap the window will be held off and result in a message to the X 5.3 Correct Front Buffer Rendering for X server to revalidate the unshared DID resource. The Support for double buffering introduces a new compli- swap will be delayed until the X server revalidates cation to mixing non-OpenGL X rendering with OpenGL the resource. rendering for SGI. When X rendering is done to an X win-   And, returns the current state of the buffer. dow, the render operations should affect the front buffer of the window. So X rendering must be directed into the While this operation is heavy-handed, it lets the X server correct relative buffer, i.e., the front buffer. render correctly into the front buffer. With stable knowl- edge of which absolute buffer is being displayed, the X As discussed earlier, rendering nodes virtualize the rel- server can render correctly. ative access to front and back buffers when using double The overhead of the scheme is minimized because buffering. But the X server renders all non-OpenGL ren- the X server only checks if it needs to revoke a dou- dering to all windows through a single rendering node. ble buffered window’s shared DID resource during the The X server does not rely on its rendering node for cor- validation of a graphics context (GC) and window. Se- rect window clipping or to determine correctly the front rial numbers determine when the window and a given or back buffer. In part this is because the X server does GC are validated with respect to each other. When the not bind to particular windows the way OpenGL does. shared DID resource is validated, the serial number of the Because the X server has full knowledge of the window window is updated with a unique value, forcing any pre- tree, it does its clipping in software (and sometimes with viously valid GCs to become invalid. Any X rendering hardware assist). The use of virtualized clipping within will then force a GC validation, at which time the shared the X server is difficult for two reasons: DID resource can be revoked.   The X server renders to many more windows than OpenGL programs do. Using virtualized clipping 5.4 X Server Multi-rendering for Indirect Ren- would quickly exhaust the hardware clipping re- dering sources. Indirect rendering is still required by the GLX spec-   Because the same thread in the X server does core ification so even the best direct rendering support still rendering as does the validation of virtual clip re- requires that indirect rendering be supported. SGI’s sources, use of virtualized clipping would deadlock OpenGL implementation treats indirect rendering as a the X server. special case of direct rendering.
  • 12. Independently scheduled threads within the X server [6] Chandlee Harrell, Farhad Fouladi, “Graphics Ren- execute indirectly rendered OpenGL commands. A dering Architecture for a High Performance Desk- distinct thread is created for each X connection using top Workstation,” SIGGRAPH 93 Conference Pro- OpenGL indirect rendering. This thread within the X ceedings, August 1993. server’s address space executes the same OpenGL ren- dering code that direct renderers use and utilizes the same [7] Chandrasekhar Narayanawami, et.al., “Software system support for direct rendering. One can think of the OpenGL: Architecture and Implementation,” IBM OpenGL rendering threads within the X server as prox- RISC System/6000 Technology: Vol. II, 1993. ies that execute OpenGL commands on behalf of an X [8] Phil Karlton, OpenGL Graphics with the X Window client using indirect OpenGL rendering. The OpenGL System, Ver. 1.0, Silicon Graphics, April 30, 1993. rendering threads within the server only coordinate with the main X server thread to hand off commands to exe- [9] Mark J. Kilgard, “Going Beyond the MIT Sample cute and return results. Otherwise, these threads do not Server: The Silicon Graphics X11 Server,” The X manipulate any X server data structures. This technique Journal, SIGS Publications, January 1993. is called multi-rendering and is discussed in greater detail in [10]. [10] Mark J. Kilgard, Simon Hui, Allen A. Leinwand, Dave Spalding, “X Server Multi-rendering for OpenGL and PEX,” The X Resource: Proceedings Acknowledgments of the 8th X Technical Conference, O’Reilly & As- sociates, Issue 9, January 1994. Much of the understanding for supporting direct ren- dering of OpenGL came from the implementation of its [11] OpenGL Architecture Review Board, OpenGL Ref- predecessor, the IRIS GL. The work of Jeff Doughty, John erence Manual: The official reference document for Giannandrea, Luther Kitahata, Paul Mielke, Jeff Wein- OpenGL, Release 1, Addison Wesley, 1992. stein, and others on IRIS GL’s resource management laid [12] Desi Rhoden, Chris Wilcox, “Hardware Accelera- the basis for the facilities described here. We would like to tion for Window Systems,” Computer Graphics, As- acknowledge all those at Silicon Graphics who helped en- sociation for Computing Machinery, Vol. 23, Num. sure OpenGL’s successful implementation, notably Kurt 3, July 1989. Akeley, Peter Daifuku, Simon Hui, Phil Karlton, Mark Stadler, Paula Womack, and David Yu. [13] David Rosenthal, Adam de Boor, Bob Scheifler, “Godzilla’s Guide to Porting the X V11 Sample Server,” X11R5 Documentation, Massachusetts In- References stitute of Technology, April 12, 1990. ¡ ¢  [1] Kurt Akeley, “RealityEngine Graphics,” SIG- [14] Mark Segal, Kurt Akeley, The OpenGL Graph- GRAPH 93 Conference Proceedings, August 1993. ics System: A Specification, Ver. 1.0, Silicon Graph- ics, April 30, 1993. [2] Jeff Boyton, et.al., “Sharing Access to Display ¡ £  Resources in the Starbase/X11 Merge System,” [15] Silicon Graphics, “Indy Graphics,” Indy Techni- Hewlett-Packard Journal, December 1989. cal Report, Ver. 1.0, 1993. [16] C. H. Tucker, C. J. Nelson, “Extending X for High [3] Bradley Chen, “Memory Behavior of an X11 Win- Performance 3D Graphics,” Xhibition 91 Proceed- dow System,” USENIX Conference Proceedings, ings, 1991. January 1994. [17] Doug Voorhies, David Kirk, Olin Lathrop, “Virtual [4] Peter Daifuku, “A Fully Functional Implementa- Graphics,” Computer Graphics, Vol. 22, Num. 4, tion of Layered Windows,” The X Resource: Pro- August 1988. ceedings of the 7th Annual X Technical Conference, O’Reilly & Associates, Issue 5, January 1993. [5] Edward Haletky, Linas Vepstas, “Integration of GL with the X Window System,” Xhibition 91 Proceed- ings, 1991.