A presentation by Teradata's Dan Graham from the 2010 Teradata User Group meetings, covering Active Data Warehousing over the last five years.
For more information on Active Data Warehousing, please visit Teradata.com
3. Six Active Elements Active Access Active Events Active Workload Management Active Data Warehouse Active Availability Active Enterprise Integration Active Load
4. Active Access Web speed inquiries by front line employees, consumers, partners
7. Active Load and Web Services for Self Service Staging Tables Active Data Warehouse ETL/ELT Hourly Mini batch “ Where’s my check?” SQL Server Main frame WebSphereMQ CSR Consumer
8. Applying for Credit: Before — diagram of the paper-based process: customer data gathering and paper application; credit committee decision makers issue paper decisions with terms; contract prepared, paper contract sent, proposal accepted; signed contract archived and money released. Total: 6 working days, 240 Euros.
13. Near Real-Time Data Load Staging Hourly Multi-load Claims Administration 800/hour WebSphereMQ CSS Mart 24/7 Data Warehouse Nightly Batch loads Web macros Claims Payments 250K requests/day .01 second Admin dept
26. Track and Trace Architecture Data Warehouse TPump Access Module Transportation Systems External Portal Internal Portal Application Server Postal System: Item collection, posting, delivery information (27 province centers, 64 city centers) Sort Dispatch Centers Continuous Load Internet Intranet TIBCO Enterprise Message Service
28. Active Requires HA — Some add FT or DR. Diagram: an MPP system with application servers; High Availability (standard DW), Fault Tolerant (a second DW kept in sync at the primary site), Disaster Recovery (a DW kept in sync off site).
29. When to Invest in DR or FT — applications plotted by $ lost during downtime. High Availability covers the majority of active applications: call centers, partner report portals, track and trace, fraud detection, labor scheduling, out-of-stock alerts, IVR routing, claims triage, defect monitoring alerts. Fault Tolerant or Disaster Recovery is warranted where many cash transactions are at stake: eCommerce, passenger rebooking, online mortgages, eBanking.
Teradata prescribes six areas the architect or DBA should consider when building an Active Data Warehouse. Some may need a lot of attention, others very little, depending on the state of your data warehouse and the application being built. Three of these elements, when implemented, cause the enterprise data warehouse to become active:
Active Access – tactical queries with 1-5 second response times
Active Load – loading fresh data into the data warehouse within seconds, minutes, or hours of data capture in the operational systems
Active Events – the use of business activity monitoring, complex event processing, or simple triggers to respond to an "alert" event in near real time
For the Active Data Warehouse to function correctly, the DBA or architect should review and enhance one or more of the following functional areas of the data warehouse:
Active Workload Management – the key technology that enables active data warehousing. It's mandatory for ensuring reliable response times for inquiries while loading data and running reports.
Active Integration – the architect and programmer may have to connect various pieces of existing middleware to the Teradata Database software to enable the activating elements.
Active Availability – operational applications often demand higher levels of availability because so many more people are using them. A review and enhancement of the configuration or operational procedures is mandatory.
This started out as a proof-of-concept project: a complex OLAP analysis connected to the call center. The company is an office supply retailer with a huge B2B business. The active application helps the call center agent understand the caller's buying behaviors; the objective was to quickly understand customer share of wallet so the CSR can sell more goods. The call center application needs to show a customer profile, risk profile, and a few other complex analyses on screen, but no display section depends on the others. The displays were built like portlets, mashups, or a dashboard, where each block of screen space holds a different analytic insight. What the customer wanted was clearly real-time response against up-to-date incoming data. Creating just one display screen of information required about 10 pages of SQL, and they were using Cognos ReportNet to handle these queries. Having OLAP in an active environment is unusual but, when you think about it, highly useful. Their database is very well designed, and their AJIs were doing a lot of the OLAP work. Originally there were 5 queries to bring up a screen, doing ranking and windowing. Run this way, reports took 2 minutes when the response was needed in seconds, and the middle-tier BI server was a big part of the performance problem. The customer's IT architect also wanted to integrate the queries directly into the website application. Teradata PS installed a rough prototype in mid-December 2009, and it ran the functions in sub-second response time. The next step was to convert the prototype into web services. We developed a design pattern for access that allows multiple SQL statements to run in parallel instead of serially; multi-statement SQL over JDBC is a key differentiator here.
These could be macros, but some OO interface activities and result-set complexities drove us to build a "utility" capability for creating and running the SQL in this application. This can be a big differentiator for ADW applications where you are displaying 3-7 different portlet/mashup/dashboard blocks on the user's screen. Teradata has the unique ability to put multiple SQL statements into an open session or macro and run them all at once. Normally, each statement is submitted and processed serially. But with the Teradata Database, they can all arrive in the same transmission packet, and the parsing engine will work on all of them at the same time on behalf of the single requestor. In this case, 5 complex requests began execution almost simultaneously as the PE finished optimizing them, and each SQL statement then ran in parallel. So there is STATEMENT parallelism and, within each statement, AMP parallelism at work. As a result, the call center agent does not have to wait for one SQL statement to complete before the application submits the next one. Three things improved performance in this case: removal of the BI reporting middle-tier server, 5-statement parallelism, and the reduction in network packet exchanges (TCP/IP costs are not trivial).
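A minimal sketch of the multi-statement idea. The table names and queries below are invented for illustration; the point is that the individual statements travel to the database as one semicolon-separated request, so the parsing engine can work on all of them together. The actual submission (shown in comments) would require a live Teradata session and its JDBC driver, so only the request-building part runs here.

```java
import java.util.List;

public class MultiStatementRequest {
    // Join individual statements into one multi-statement request string.
    // Sent in a single transmission, the parsing engine can optimize and
    // begin executing all of them on behalf of the one requestor.
    public static String build(List<String> statements) {
        return String.join(";", statements) + ";";
    }

    public static void main(String[] args) {
        // Three hypothetical portlet queries for one call-center screen.
        String request = build(List.of(
            "SELECT * FROM cust_profile WHERE cust_id = 42",
            "SELECT * FROM risk_score WHERE cust_id = 42",
            "SELECT * FROM share_of_wallet WHERE cust_id = 42"));
        System.out.println(request);

        // Against a live system, the request would be submitted once and the
        // result sets walked with getMoreResults() -- sketched, not run here:
        //
        // try (Connection con = DriverManager.getConnection(url, user, pass);
        //      Statement st = con.createStatement()) {
        //     boolean more = st.execute(request);      // one round trip
        //     while (more || st.getUpdateCount() != -1) {
        //         try (ResultSet rs = st.getResultSet()) { /* render portlet */ }
        //         more = st.getMoreResults();
        //     }
        // }
    }
}
```

This is why the application could paint 3-7 dashboard blocks without paying a serial round trip per block.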
08/25/10 When we first put this in, the VP of the contact center was predicting a drop in head count. But his call volume went up over the first 3-4 months after the web services were added, and we didn't know why. We figured out that call volumes in general were going up. We implemented this in March of 2006; in 2007, we saw call volume trending down as web service usage increased. So it took 12-18 months for the claimants' interaction with the insurer to evolve. Remember, these are employees who know Unum only via their HR department, so direct interaction with the insurer is often a first encounter, and many claimants want to talk to someone the first time. Even when we went to online billing many years ago, a similar reluctance to let go of a human voice was evident at first. But claimants eventually find the convenience of the web or IVR helpful. Here we see: mini-batch loads; WebSphere MQ feeding the mini-batches; staging tables used throughout the day in ELT mode; and web services delivering information to the call center and web site. WebSphere MQ runs on MVS as well as AIX. There are queues from the operational systems that also connect to TPump; a named pipe access module connects to MQSeries, reads the rows off the queues, and puts them into the ADW. One application updated 9 different tables and had performance problems, so now that data is put into a Teradata queue table through TPump, and a stored procedure empties the queue table. Fallback is enabled on the queue table for HA. This design can survive many failure scenarios; MQSeries sometimes has its own challenges, and the queue table helps compensate for MQ. Latency with TPump is in seconds or minutes.
This was the process for handling 900,000 loans annually before automating with Teradata. Notice that the "snail mail" could add several days to the 6 working days the bank required to follow the process.
Mini-batch is the most popular technique because it's easily deployed and the most efficient use of resources. By loading small amounts of data – let's say 5,000 records or 10 megabytes – throughout the business day, we can use FastLoad, MultiLoad, or even BTEQ to keep fresh data moving into the data warehouse. It's highly efficient since we use bulk data loading instead of row-at-a-time methods. There are two keys to making this work: paying attention to database locking mechanisms so a mini-batch does not disrupt other activities, and the use of Teradata Active System Management. In this design, it is possible to use traditional ETL tools for transformations. Replication is increasingly popular. Teradata Database is well integrated with GoldenGate Software, which was acquired by Oracle in 2009. Replication tools do log sniffing – they watch the OLTP database redo journal for updates and, when they see them, capture the changes and propagate the updates to other databases. GoldenGate can capture updates from many different vendor products and propagate them to multiple downstream data repositories. Note that replication tools do not usually have much capability to transform the data they capture and distribute. It is possible to maintain latencies measured in seconds – a few Teradata sites do this now. But it's better to set the expectation of latencies in the 10-120 second range, since many factors affect replication server speed. Generally, you should not use replication unless you have a solid business reason for up-to-the-minute updates. Streams are continuous feeds of data from OLTP systems and devices, typically sent over message-oriented middleware (MOM). The more common MOM software is Java Message Service (an open standard) or IBM's WebSphere MQ. When loading these message queues of data into Teradata, we use the Stream operator in Teradata Parallel Transporter, which is the modern version of TPump.
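The mini-batch pattern above – let rows accumulate, then flush a bounded batch to the staging table – can be sketched with an in-memory queue standing in for the MQ feed. The batch size and row type are illustrative; in production the drained batch would be handed to FastLoad, MultiLoad, or BTEQ rather than returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MiniBatchLoader {
    static final int BATCH_SIZE = 5_000;   // flush at most this many rows per mini-batch

    // Drain whatever has accumulated, up to BATCH_SIZE rows, into one batch.
    // Bulk-loading the batch is far cheaper than row-at-a-time inserts.
    public static List<String> drainBatch(BlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, BATCH_SIZE);
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        for (int i = 0; i < 7; i++) queue.add("row-" + i);   // simulated MQ feed
        List<String> batch = drainBatch(queue);
        System.out.println("loaded " + batch.size() + " rows in one mini-batch");
    }
}
```

A scheduler would call `drainBatch` every few minutes throughout the business day, keeping each load small enough to avoid disruptive locking.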
In this design, the source system software agent can do some transformations before sending the updated data to the queue or a programmer can add an “access module” to Teradata Parallel Transporter to do transformations. In this way data cleansing and adjustments to the raw data can be done. Generally, you should not use streams unless you have a solid business reason for up-to-the-minute updates. Typically these three techniques load their fresh data into staging tables. This allows us to control record locking and to apply transformations to reformat and clean up the data before putting it into the production tables. The ELT design has become popular very rapidly for many reasons. The primary reason is that the parallelism inside the Teradata SQL processing can reformat and load the data dramatically faster than traditional ETL. When using these techniques, be aware that they consume from 1%-5% of system resources depending on the incoming data volumes, frequency, and data transformation complexity.
JMS Access Module
- Reads/writes messaging providers via the JMS standard messaging API
- Same capabilities as the MQSeries AXSMod, plus: MQSeries, TIBCO EMS, Oracle AQ, BEA JMS, SAP XI
- Supports Topics, including durable subscribers
- Enables dual load
- Performance enhancement: control of commit frequency
The data warehouse has evolved into an ELT architecture: data is loaded to the Teradata platform, where it is transformed and loaded into integrated tables. The data architecture is a 3-tier design consisting of:
A Staging layer that acts as a landing place for legacy source data. The format and content of the data are identical to the source attributes.
An Integrated Data layer. The integrated layer is data-domain oriented and source-system neutral – the tables within a domain store the data from all of the operational sources for that domain. Staging data is cleansed, integrated, and rationalized before being loaded to the Integrated layer, which also contains the key structures to connect data across data domains.
A Presentation layer: a collection of dimensionally modeled facts and dimensions, plus some non-dimensional presentation assets.
As claimant employees visit the web site, they issue macro requests via web services to the operational data store. The CSS Mart (ODS) is inside the Teradata system, so it requires the same level of TASM administration as a single data warehouse would. The key difference is that this design eliminates all row- and table-locking considerations. As the claimant employees issue macro requests, their updates are captured in the operational tables (ICLMNT); the claimant never directly updates the CSS Mart. When this happens, a message is sent to the claims administration system notifying them that the consumer has an update that should be reviewed. Both the claimant updates and the work of the claims administration department cause updates to propagate into the CSS Mart itself. This ensures that payments or status changes are available to the online visitor within an hour of any status change. The hourly mini-batch using MultiLoad updates 8% of the columns in the database on any given day.
The integrated-layer model (EDW) would have required extensive traversing to get answers for CSS; insurance data models are incredibly complex, especially one as normalized as this. Linking claims together would also have hurt the SLAs for the marts. The integration layer is loaded nightly. There is also a direct load into the marts from Navlink via MQ, which collects data from the doctors or from actions like adding a member to a policy. Some of this new data gets fed back into the EDW layer. The data is there only to support the web application, so the application sends the updates to MQ as well as to the Teradata CSS Mart, which acts as a cache. A lot of the updates have to go back through Navlink for approvals, clean-up, and posting in the correct places before feeding back into the EDW a few days later; while this is pending, the CSS Mart continues to show the consumer the new data. The mart also provides a mechanism to keep the data always available. The data in the mart is refreshed daily by a batch load job, and a dual database is used so that clients can still access their data while the batch job is running. That is, there are two CSS Marts: the one being used by the iServices application and the one being refreshed by FastLoad. Once the load process completes, database access views are redefined and users are seamlessly given access to the freshly loaded data. This is an old DBA trick: load the new database, then flip all the users over to it in a second.
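The view-flip trick can be sketched as a tiny DDL generator. The view and table names here are invented stand-ins for the two CSS Mart copies; in practice the load job would issue the generated statement after FastLoad finishes on the idle copy.

```java
public class ViewFlip {
    // After a refresh completes on the idle table copy, repoint the access
    // view at it; all users flip to the fresh data in one statement.
    public static String flipDdl(String view, String freshTable) {
        return "REPLACE VIEW " + view + " AS SELECT * FROM " + freshTable + ";";
    }

    public static void main(String[] args) {
        // Hypothetical names: iServices users read CSS_MART_V; CSS_B was just loaded.
        System.out.println(flipDdl("CSS_MART_V", "CSS_B"));
    }
}
```

Because the flip is a single metadata change, users never see a partially loaded table and the switchover takes about a second.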
Events are actions that happen in time and space. They appear to the programmer as messages, usually zooming by on the enterprise service bus. When a truck or airplane breaks down, a message is sent that an event-driven application can use to determine next steps. There are many event-driven applications in every industry. Banks have to watch settlements for timeliness as well as normalcy. Retailers are always searching for out-of-stock conditions that prevent them from selling. Purchasing departments and stock traders are always monitoring the internet for big price swings in their commodity or equity. Event-driven applications depend on sensors or applications to send out messages whenever something happens. It does no good to hear a fire alarm two days after the building burns down, so timely detection of an event is the first requirement. Event-driven applications tend to search for anomalies – for outliers. They don't usually get applied to business-as-usual events, which are normally handled by existing operational applications. Instead, they watch for odd or unusual events. This means filtering out the business-as-usual events and those within normal tolerances. For example, if we monitor a manufacturing machine for temperature, there is a range of degrees that is acceptable, a warning area, and then a temperature threshold that indicates malfunction. So, as with all event processing, thresholds, filters, and normalcy tests are applied to sift out the events that need special attention. The vast majority of events are discarded because they are not unusual. Once it is known that an event is unusual and significant, it's time to decide what to do about it. This is often accomplished with a business rules engine coupled with the Active Data Warehouse: the rules engine has the business process logic, while the ADW contains the context and history of similar events.
If the event happens once a week, on Thursday at the same time every week, maybe it's not unusual even though it's out of normal tolerances. So the rules engine coupled with historical data helps determine the context of the event and the best next step to take. The next step can be as simple as sending alerts or emails to people who subscribe to that alert, or it can be invoking an application business process to handle the event. Most events come from the enterprise service bus as messages. These are often fed into BAM and CEP tools, where the filtering and rules are applied and alerts sent. The data warehouse provides the facts these tools run on – normalcy ranges, thresholds, KPIs, etc. – and it provides history when an event response is needed. Some events are detected only inside the data warehouse, specifically when data is added or changed. In that situation, triggers fire inside the data warehouse, and alert messages may be sent out to invoke an application.
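The threshold-and-filter step described above can be sketched as a tiny classifier. The temperature bands are invented for illustration; a real system would pull normalcy ranges and thresholds from the data warehouse rather than hard-coding them.

```java
public class EventFilter {
    enum Severity { NORMAL, WARNING, ALARM }

    // Illustrative thresholds for a monitored machine temperature, degrees C.
    // In practice these would come from the ADW (normalcy ranges, KPIs).
    static final double WARN_ABOVE  = 80.0;
    static final double ALARM_ABOVE = 95.0;

    // Sift business-as-usual readings from the few that need attention.
    public static Severity classify(double tempC) {
        if (tempC > ALARM_ABOVE) return Severity.ALARM;
        if (tempC > WARN_ABOVE)  return Severity.WARNING;
        return Severity.NORMAL;   // the vast majority -- discarded upstream
    }

    public static void main(String[] args) {
        double[] readings = {72.0, 84.5, 97.2};
        for (double t : readings)
            System.out.println(t + " -> " + classify(t));
    }
}
```

Only WARNING and ALARM events would flow on to the rules engine, which consults the ADW's history to decide whether the outlier is truly unusual.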
This is an example of an event-driven application using gambling machines and an Enterprise Service Bus to propagate event messages. In this customer example, a casino tracks consumers using Teradata Relationship Manager. TRM handles segmentation, campaigns, next-best offers, etc. TRM knows how often the gambler visits, their spending threshold, their preferences, dining preferences, and of course all past transactions. Reaching out to them online or via mail is just the first step in building loyalty. When the gambler enters the casino, they use their loyalty card to play a game – in this case a slot machine. Each time the gambler plays, the information is sent via the ESB to a business rules engine (BRE). The BRE uses the Teradata information to determine if the gambler is approaching their spend limit for the visit. With the current spend rate and the history, it can determine if the gambler is having a bad-luck day. If so, an alert is sent to the "luck ambassador" in the casino, who is given some "offers" personalized to the gambler and their market segment. The luck ambassador goes to the gambler's station – advanced visualization pinpoints the gambler – and interrupts them, trying to get the gambler to "give up" gambling for a while with a free dinner, theater tickets, anything the casino has a no-cost supply of. The objective is to let the gambler spend all their money yet always go away happy; happy gamblers come back. However, if the gambler won $20,000 just a month ago, perhaps we should not interrupt their losing streak. Having both the historical and current events in the same database and application helps make the right decision in real time. This custom application is now installed in multiple casinos using Teradata Active Data Warehouses.
Teradata Active System Management (TASM) is the key technology that makes it possible to handle the mixed workloads found in Active Data Warehouses. TASM helps ensure priority is given to the short 1-5 second queries so they do not get delayed by long-running complex reports or data loading. By organizing the workloads in the system, a category of work can be given priority access to system resources so it performs in accordance with the needs of the corporation. Without this mixed-workload management, randomly arriving queries and data loading tasks simply compete for resources inefficiently, often causing severe congestion and delays. This is a sample of TASM workload prioritization by business function. Note that the call center and web site get high priority from TASM, which ensures they are prioritized above everything else. More concretely, you can imagine that the call center and web applications get 6 tries out of 10 to use the CPU and memory of the server. TASM is also organized to ensure that all tasks get a "fair share" of the system: long-running reports always make a little progress, even when higher-priority tasks are arriving quickly, which prevents long-running tasks from stalling as short, fast tasks enter the system. The call center and web site get high priority so they can meet the fast response times of the front-line user. Short, medium, and long reports get decreasing priorities. The DBA staff always gets a high priority so they can troubleshoot problems. Notice that the Active Load workload gets a moderately high priority of 8 so that replication and TPump streams can get their job done without too much delay. TASM has the unique ability to flip all the priorities and controls based on time of day, day of the month, or some external event like a server node failure.
This allows the priorities and controls to be changed automatically for day and night shift, end of month or end of quarter tasks, and to respond to system or external triggers. Again, TASM is the key enabling technology for Active Data Warehousing.
This presentation from Dell at Partners 2008 shows the benefits of adding throttles to their production workloads. Concurrent tasks dropped 70%, AMP worker tasks 51%, and query elapsed time 28%. All this adds up to faster queries and better use of the existing system.
Data Access Object (DAO) is an architectural design pattern for an abstract interface to a database or persistence mechanism. The DAO design pattern bridges Object Oriented and Relational semantics, ensuring the proper protocols, data structure conversions, and interactions execute. DAOs isolate application programs from the underlying physical implementation such that, in theory, you should be able to replace an Oracle database with Teradata by simply regenerating all the DAOs. DAOs also insulate an application programmer from the numerous and varied Java persistence technologies such as JDBC, JDO, EJB CMP, TopLink, Hibernate, iBATIS, Microsoft LINQ, and others. Use of these various DAO tools may insulate the programmer so thoroughly that they do not need to learn SQL at all. So a DAO can be built in many different ways, but it is still an architectural design pattern for accessing the RDBMS and bridging the OO-RDBMS chasm. DAOs are a best practice according to IBM, Sun, Microsoft, and others; Microsoft originally defined them in the early 1990s, and Sun later expanded their function and promoted them heavily to the Java community. The DAO design pattern typically requires a database key as input. It executes these steps:
- Define the return object, PreparedStatement object, and ResultSet object
- Create the prepared SQL statement
- Map the parameters passed in to the primary key and other attributes
- Execute the prepared statement via JDBC
- Check the result set for a returned row
- Map the result set into a business object
- Identify an empty result set (object not found) – if so, throw a DataAccessException
- Catch and handle unrecoverable database errors
- Close the SQL statement and result set to prevent memory leaks
- Return the business object to the business service
Data Transfer Objects (DTOs) are a design pattern used by DAOs to transfer data between software application subsystems.
The difference between DTOs and Business Objects or Data Access Objects is that DTOs have no behavior except storage and retrieval of their own data, whereas DAOs and business objects can contain a lot of logic and functionality. Some practitioners feel DTOs are no longer necessary and should be avoided.
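The DAO/DTO split can be sketched in a few lines. The customer names and fields are invented, and an in-memory map stands in for the database so the pattern is visible without a live connection; a real implementation would prepare the SQL, bind the key, execute via JDBC, map the ResultSet, and close resources, as enumerated above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// DTO: carries data between layers; no behavior beyond get/set.
class CustomerDTO {
    private final int custId;
    private final String name;
    CustomerDTO(int custId, String name) { this.custId = custId; this.name = name; }
    int getCustId() { return custId; }
    String getName() { return name; }
}

// DAO interface: the business service codes against this, not against JDBC.
interface CustomerDao {
    Optional<CustomerDTO> findById(int custId);
}

// In-memory stand-in for the JDBC-backed implementation. Swapping this
// class for a Teradata- or Oracle-backed one leaves callers untouched --
// the isolation the DAO pattern promises.
class InMemoryCustomerDao implements CustomerDao {
    private final Map<Integer, CustomerDTO> rows = new HashMap<>();
    void put(CustomerDTO dto) { rows.put(dto.getCustId(), dto); }
    public Optional<CustomerDTO> findById(int custId) {
        return Optional.ofNullable(rows.get(custId));
    }
}

public class DaoDemo {
    public static void main(String[] args) {
        InMemoryCustomerDao dao = new InMemoryCustomerDao();
        dao.put(new CustomerDTO(42, "Acme Office Supply"));
        // The business service asks the DAO, never the persistence layer.
        System.out.println(dao.findById(42).map(CustomerDTO::getName).orElse("not found"));
    }
}
```

An empty `Optional` plays the role of the "object not found" case; a production DAO would instead throw a DataAccessException per the steps above.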
In this simplified view, we see the objects that interact to access data. In beige are the components within each architecture layer. Immediately below these are the attributes (data) exchanged between the objects, if any. Below this are the functions or "methods" used to manipulate the data within the object. You will often hear of getters and setters, which tend to be included in objects that contain data. A getter is a method that fetches the value of a specific property (like a column value); a setter is a method that sets the value of a specific property (updating the object's hidden data). Eclipse – a powerful open source programmer workbench – has a wizard that lets you pick which properties (roughly the same as column attributes) to create getters and setters for; once identified, Eclipse automatically generates the method (the executable code) for you. By examining the attributes and functions in each object, we get a clearer idea of the tasks the object performs. The business service object shown has attributes (parameters) identifying the session it needs and two data objects it may or may not ask to be executed. The business logic here is shown as function1 and function2. The business objects have multiple attributes (data fields) and are accessed by the getter and setter functions. Note that the query band data can come from the business service as a security parameter for logins. A better design is to use the ThreadLocalContext object as the holder of the query band information; methods like beginTransaction() can then call upon that object to acquire the query band in real time, rather than have the developer manually set the query band on the Session Manager. The ThreadLocalContext object is a public class that can be initialized within a ServletFilter managed by the web application server running the sessions.
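A minimal sketch of such a ThreadLocalContext, assuming the query band is carried as a simple string. The band contents and class shape are illustrative; the point is that each request-handling thread sees only its own band, so beginTransaction() can fetch it without the developer setting it manually on the Session Manager.

```java
public class ThreadLocalContext {
    // Each request-handling thread gets its own query band. A ServletFilter
    // would set it on the way into the request and clear it on the way out.
    private static final ThreadLocal<String> QUERY_BAND = new ThreadLocal<>();

    public static void set(String qb) { QUERY_BAND.set(qb); }
    public static String get()        { return QUERY_BAND.get(); }
    public static void clear()        { QUERY_BAND.remove(); }

    public static void main(String[] args) throws InterruptedException {
        // Two "requests" on different threads see different bands.
        Thread other = new Thread(() -> {
            set("ApplicationName=web;UserId=claimant42;");   // hypothetical band
            System.out.println("worker sees: " + get());
            clear();
        });
        set("ApplicationName=callcenter;UserId=csr7;");
        other.start();
        other.join();
        System.out.println("main still sees: " + get());  // unaffected by the worker
    }
}
```

beginTransaction() would then call `ThreadLocalContext.get()` and issue the corresponding SET QUERY_BAND statement on the session it is about to use.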
Keep in mind that at any given time, there are tens to thousands of object instances running in the system, some with the same class characteristics. The business service logic may request one or more DAOs to access the same database. For each data source – typically an RDBMS or file – there will be one data source object and one JDBC driver associated with it. And for each data source, there will typically be many login sessions, whether pooled or not.
Here we see the Eclipse developer "workbench". Many Eclipse displays look like this one, and since a majority of IDE tools are based on Eclipse, it should be no surprise that they all look like this display as well. With menus and icons across the top, there are three major sections of white space: one is the Main Editor (top middle) and two are called views. That's right, views – which can be confusing when talking about "using an Eclipse view to create a database view". Eclipse views contain their own menus and toolbars that only affect the items within that view. Typically the "view" panel on the left navigates through directories of projects, Java objects, and classes; in this case it is navigating a hierarchy of database objects. The panel at the top is called the "Main Edit Area", where source code is normally displayed. Multiple files can be edited simultaneously – for example, 3 Java files and one XML file. The panel at the bottom is for debugging and the results of a Java object execution. Often there are "view" areas on the right side as well. Using the "view" panel on the left, DTP (Data Tools Platform) provides a DSE (Data Source Explorer) that displays a hierarchical view of database objects. The Teradata IDE plug-in "rides on top of" the Eclipse DTP plug-in to help navigate inside the Teradata server. Like all RDBMS vendors, we leverage the DTP functions in Eclipse, adding our own dialects and extensions. One of the first tasks the programmer does with DTP is set up a connection from Eclipse to the Teradata JDBC driver. When doing this, the Eclipse connection profile will ask for the user ID, password, database name, and JDBC driver name. Once connected to the Teradata Database, the user can view the list of schemas and, within a schema, the list of macros, stored procedures, tables, user-defined functions and types, and database views. The user can browse even farther within a table to view the list of columns, constraints, indexes, and triggers.
The Teradata IDE plug-in extension to the DSE uses the DBC Views to obtain information about the database objects.
The Teradata IDE plug-in lets developers browse the Teradata "DBC" catalog of objects and manipulate many of them within Eclipse; this capability supports basic SQL programming tasks.
Here is a simple use of Active Data Warehouse data. While most of the value is derived from actively loading data in near real time, there is also value in combining this data with history to find trends or operational problems. China Post Express Mail Service (EMS) built a sophisticated "Track and Trace" application with Teradata, TIBCO, and other suppliers. It handles approximately 800,000 pieces of mail per day while servicing more than 200 countries and regions as well as 2,000 cities within China. The objective of the system is to provide up-to-the-minute tracking of packages through the various pickup, handling, and delivery systems. The shipper as well as the receiver of a package can go to the EMS portal and, using their waybill, determine where the package is at any moment. Beyond self-service, this also allows China Post employees to trace packages when problems occur. And of course, numerous complex reports are pulled from the Teradata system by back-office employees, concurrent with the tactical queries, to improve efficiency in various functional areas. Imagine a shipper calls and complains that a shipment is not being delivered on time and is causing financial problems. A quick lookup of today's update on that waybill shows the shipment is indeed stalled. But further analysis shows that several dozen shipments are similarly stalled. This combination of current and historical data shows the problem to be of high magnitude, reprioritizing the employee's objectives for the day and possibly leading them to the resolution. Otherwise, the complaint might have been treated in the business-as-usual fashion. EMS's system integrates data and business information from its branches, transportation and logistics network, and international co-operation and production systems to provide near real-time and more detailed delivery information to customers.
In addition, it provides internal workers with accurate and on-time information to efficiently handle internal inquiries and processes external inquiries faster. The application provides an effective monitoring and supervision role of the production processes. It also helps standardize internal operation processes.
Once the data warehouse is delivering insights to front-line users, it's common for the end-to-end availability of the application to need review, enhancement, and possibly investment. The front-line user often cannot wait a few minutes for a system or application outage to recover – the customer or opportunity may vanish in that time. The Active Data Warehouse and BI/ETL infrastructure must step up to providing availability commensurate with the application's criticality. Not all applications and users need 99.99% uptime; in fact, very few require it. But most EDW-BI sites have lagged behind their OLTP peer applications in basic HA. So understanding the real requirements for availability, and the consequences of a failure, is a mandatory step when activating a data warehouse. Skipping this step is dangerous. There are dozens of technologies, software features, and best practices provided by Teradata to ensure high availability regardless of which system you purchase. Unlike commodity blade servers, HA is built into every Teradata system. The majority of active implementations operate well with the standard Teradata HA capabilities; more than 90% of all Active Data Warehouses operate in the HA configuration. One caveat: front-line users perceive availability as an end-to-end experience. Just because the Teradata system provides good HA does not mean the ETL server, BI platforms, application code, staff, and web application servers will do the same. In most cases, an IT organization already has good systems and procedures in place for the web application servers and networks. However, a thorough review of the active application, ETL process, and BI tools is often needed when activating a data warehouse, and in most cases the result is numerous changes to operational procedures. Some active applications are truly mission critical – a lot of money is at stake if an outage occurs.
If your intent is to build an eCommerce web site, for example, downtime means lost money and poor customer satisfaction. When the outage is shown to reach a serious threshold of cash losses, Teradata Active Data Warehouse sites have invested in duplicate Teradata Systems. Fault Tolerant configurations occur when two Teradata Systems support the same application from one physical installation: if one system fails and goes into restart recovery, the other handles incoming requests. Note that the data must be synchronized between the two systems in real time as well as in batch processing. If the redundant system is instead placed in a remote location, the availability requirement emphasizes business continuity after a disaster rather than real time performance. A Disaster Recovery design is similar to the Fault Tolerant design but has performance implications for synchronizing over long-distance networks. It's important to repeat that most Active Availability implementations are of the HA variety, where the IT organization "hardens" its operational procedures, BI platforms, and ETL platforms. Activating the data warehouse does not have to be expensive.
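The fault-tolerant routing described above can be sketched in a few lines. This is a hedged illustration only: the endpoint names, the `SystemDown` exception, and the hand-rolled routing loop are hypothetical stand-ins, not Teradata APIs. Real deployments handle this in connection pools, query directors, or load balancers.

```python
# Sketch of client-side failover between a primary and a standby system.
# All names here are illustrative assumptions, not a real Teradata interface.

class SystemDown(Exception):
    """Raised when a system is in restart recovery and cannot serve requests."""

def query_with_failover(request, systems):
    """Try each system in priority order; return the first successful answer."""
    failures = []
    for name, handler in systems:
        try:
            return name, handler(request)
        except SystemDown as exc:
            failures.append((name, str(exc)))  # note the failure, try the next system
    raise RuntimeError(f"all systems unavailable: {failures}")

# Simulated endpoints: the primary is restarting, the standby answers.
def primary(request):
    raise SystemDown("node restart in progress")

def standby(request):
    return f"answer to {request!r}"

served_by, result = query_with_failover(
    "where's my check?", [("primary", primary), ("standby", standby)]
)
print(served_by, result)  # the standby absorbs the request transparently
```

The front line user never sees the primary's restart; the added cost is the real-time synchronization that keeps the standby's data current enough to answer.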
If cost were no barrier, all applications would be 100% fault tolerant with a disaster recovery backup. But since these solutions tend to operate duplicate hardware and software while introducing labor and operational complexity, it's best to first consider the effects of downtime on an application. Simply said: how much money is lost if the system is offline for 5 minutes? For an hour? For a day? We never like the answer, but it gives you a guideline for whether or not your Active Application needs a Fault Tolerant or Disaster Recovery implementation. The vast majority of Active Data Warehouses are basic High Availability configurations, and HA is embedded in Teradata Systems, so there is no additional cost. Consider a call center that does cross-selling and up-selling on inbound customer calls. If an agent handles 20 calls per hour and can inject a cross-sell offer on every 5th call (a common average), then 4 calls per hour get offers – about 32 offers per agent over a shift – and 10-15% of those offers convert to a sale, roughly 3 sales per agent per day. An outage of 5 minutes could lose a few hundred to a few thousand dollars in a big call center with hundreds of CSR agents. With a good design, operations staff, and procedures, the annual loss might be $5,000-$10,000. That is not a justification for a one- or two-million-dollar disaster recovery site or fault tolerant configuration. Similarly, for many basic lookup applications – track and trace, partner portals, yield and cycle time dashboards, etc. – an outage is an inconvenience to the user but not a serious loss of money. Notice that eCommerce applications are often deployed without FT/DR, yet some do use it. For some eCommerce and banking applications, downtime is unthinkable; in those cases, near-instantaneous failover to an alternate system is needed to ensure smooth, continuous availability of data.
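The call-center arithmetic above can be made explicit with a short back-of-the-envelope calculation. The inputs below (300 agents, $50 margin per sale) are illustrative assumptions, not figures from the presentation:

```python
# Back-of-the-envelope downtime cost for the call-center cross-sell example.
# All inputs are illustrative assumptions, not Teradata or customer figures.

def downtime_loss(agents, calls_per_hour, offer_every_nth,
                  conversion_rate, revenue_per_sale, outage_minutes):
    """Expected revenue lost while cross-sell offers cannot be made."""
    offers_per_hour = agents * calls_per_hour / offer_every_nth
    sales_per_hour = offers_per_hour * conversion_rate
    return sales_per_hour * revenue_per_sale * outage_minutes / 60.0

# 300 agents, 20 calls/hour each, an offer every 5th call, 10% conversion,
# $50 of margin per sale, 5-minute outage:
loss = downtime_loss(agents=300, calls_per_hour=20, offer_every_nth=5,
                     conversion_rate=0.10, revenue_per_sale=50.0,
                     outage_minutes=5)
print(f"${loss:,.0f} lost in a 5-minute outage")  # → $500
```

A $500 hit per 5-minute outage lands squarely in the "few hundred dollars" range cited above; run the same function with an hour or a day of downtime, and with your own margins, to decide whether the exposure justifies FT or DR hardware.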
Only the line-of-business executives, working with the IT organization, can decide the criticality of the application. In most cases, the application will not require FT or DR investment. But regardless of the application, the IT organization should always raise the discussion with the business executives: it must be a proactive decision, thoroughly understood by business management and preferably documented, so that everyone knows the exposure of not investing.
Teradata started the "real-time BI" or "operational BI" movement with the publication of the article "Active Warehousing" in the spring 1999 issue of Teradata Magazine. Since then, there has been a steady stream of functionality in the database, utilities, and stand-alone products to support the Active Data Warehouse. The list shown is not all-inclusive; if every piece of "active"-relevant functionality were shown, the display would be too cluttered to read.
These sound-bite summaries are worth remembering. They will help organize your plans around the important elements of designing and implementing an Active Data Warehouse. Of course, nothing as sophisticated and powerful as an active application is truly simple: there are numerous techniques in each category, many of them already common knowledge in a Teradata installation. Reviewing these basics and digging deeper into the best-practices Orange Books provided by Teradata on these topics can help you activate a data warehouse with much less risk and cost. Other vendors do not provide these best practices; they expect the IT organization to figure it out on its own. At Teradata, we knew our decision support audience would need reminders and help to implement an active data warehouse. Most of these techniques are simple common sense. In some cases you may want to rely on Teradata Professional Services to help sort out the best path for your organization.