The Architecture for Rapid Decisions
The management of the architecture for business intelligence, and of related information
management solutions, has undergone a paradigm shift as business drivers have come to
dominate its design. In the past, much of architectural design was an arcane field focused on
the logical and physical layers and was the preserve of CIOs. Today, executives increasingly
look for service-oriented architectures that are shorn of the complexity of the underlying IT
systems and that allow company strategists to respond flexibly to market opportunities. They
want to use SOA, XML, and web services to reduce the latencies in information processing and
decision-making so that they can respond in real time.
Service-oriented architecture is the culmination of a long progression toward reusable
programming: it began with object-oriented programming, continued with components, and has
concluded with services that allow applications and processes to be built on the fly. Objects
enable the reuse of programming code as long as the operating environment, such as Linux or
Windows, does not change. Components extend that flexibility and allow software to be reused
even when the process changes. Services are the most flexible of all, as they enable
developers to cross both process and operating system boundaries.
Real time decision-making is widely recognized as a potent source of competitive advantage.
According to a survey conducted by Optimize magazine, 88% of senior executives want to see
lower latencies in the availability of decision-relevant information. Currently, according to
research by Outsell, an analyst firm in the information processing industry, as much as 16% of
executives' time is lost on accessing, reviewing, and evaluating information, at a total cost of
$107 billion, besides the delays it causes in decision-making.
The challenge before companies is to choose from the assortment of architectural options that
vendors offer. At this point, most companies are wary of trying new technologies and would
prefer to extract increasing value from the enormous investments in IT sunk in the late 1990s.
Emerging technologies such as web services and SOA enable them to extract greater value
from those existing investments.
Undoubtedly, incremental investments in architectural changes bring about disproportionate
benefits, and data integration and business intelligence tools play a critical role in reaping
greater returns from existing investments. An example of how smart improvements in
architecture raise the profitability of existing investments is Telus Corp., a Canadian
telecommunications company based in Vancouver, British Columbia. It deployed an enterprise
business-intelligence suite and integration software to rapidly route relevant information from its
transactional databases to its field-service staff all over Canada, and lowered its operational
costs by $1 million per month.
Business intelligence software can also help to lower latencies in the infrastructure itself and
extract higher value from existing investments. All too often, companies invest in an array of
servers and storage devices to lower downtime and increase the speed of response; the excess
capacity helps to spread the load so that there is no interruption. People's Bank, instead, decided
to lower the downtime of its applications rather than incur extra costs on spare capacity. It
installed a monitoring tool that allows it to compare the utilization rates of its infrastructure with
response times. The software allows it to anticipate, via signals on dashboards, when demand
exceeds installed capacity, or to investigate the root causes of underutilization and find ways to
add capacity.
IT managers almost unanimously agree that business intelligence and integration technologies
account for much of the gains in speed of response, while new investments in IT infrastructure
are much less effective. In one survey of IT managers, only 16% of executives reported that
new infrastructure had achieved a faster response. On the other hand, 54% of respondents
agreed that web services and data warehouses/data marts have accelerated responses much
more effectively. By contrast, few executives believe that the more mature products in the
enterprise software industry are as effective; 22% credit CRM software, while the corresponding
figure for supply-chain management tools and call-center software was 20%.
The reality on the ground is that a relatively small proportion of companies have taken advantage
of the opportunities available to lower the latencies in decision-making. Most companies (75%)
are only moderately effective in achieving real-time decision-making, while just 19% are very
effective. The major reported barrier to a high level of effectiveness is the absence of a
business-process architecture that can accommodate the software that lowers latencies.
The impact of business intelligence on real time response to events is illustrated by Chase-
Pitkin’s success in lowering the incidence of theft in its stores by using predictive analytics. It
decided to conduct inventory checks on items that were more susceptible to theft at weekly
intervals instead of waiting for the lean season to arrive. The data was fed into SPSS Predictive
Analytics tools to anticipate the next case of shoplifting and take preemptive action to lower the
losses from theft. This would not have been possible without the ability to rapidly aggregate
data and feed it into its business intelligence software.
A framework for governance of information
Data mining for reporting purposes and strategic management is a generation behind the needs
of a real time enterprise. The purpose of data mining was never operational management, much
less steering business processes to respond to events in real time. Enterprises cannot react
quickly unless information flows to a single point from all their far corners, is digested rapidly,
and is communicated to operational staff just in time to deploy resources and master the
situation. In the past, this was not possible because individual departments frequently had
distinct data sources and applications. Even when consolidated information was available in a
data warehouse, the data was not refreshed fast enough to respond to events, and business
processes could not adjust in time to cope with contingencies.
An enterprise governance model is an aid to managing an enterprise cohesively. Every
information asset of a company has a functional purpose and an administrative component to
manage resources. A database, for example, is a means to store data, and its administrative
layer helps to manage formats, the allocation of addresses, and the extraction and presentation
of data. Similarly, an application such as a spreadsheet is a means to store data that can be
manipulated with its menu functions. The relational database and the spreadsheet can work
together using middleware, such as ODBC (Open Database Connectivity), which has the ability
to mediate between the distinct administrative tools of the two information assets. A real time
enterprise, however, needs more than a point-to-point interconnection of information assets. It
has to be able to link all its assets so that sequential activities can flow from one end to another
without a break.
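ODBC itself is a C-level API, but the mediation it performs can be suggested in a few lines. The following Python sketch, using only the standard-library sqlite3 and csv modules and invented table and column names, shows a relational store and a spreadsheet-style format exchanging rows through a neutral intermediary:

```python
import csv
import io
import sqlite3

# Middleware-style mediation in miniature: rows kept in a relational
# database are exported into a spreadsheet-friendly format (CSV)
# without either side knowing the other's administrative details.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "West", 80.5)],
)

# The database's own tools produce a neutral rowset ...
rows = conn.execute("SELECT id, region, amount FROM orders").fetchall()

# ... which the "spreadsheet" side consumes through its own tools (csv).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "region", "amount"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The point is the separation of concerns: neither the SQL layer nor the CSV layer needs to understand the other's format, because the intermediary translates between them.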
An enterprise governance model is the means to centralize the administrative tools of all
information assets and business processes. It separates the management of information assets
and processes from the physical activity of completing tasks; individual functions are invoked as
a task is completed, from a remote center that has an overall picture of the objectives to be
achieved. In technical terms, metadata is the information that governs individual databases,
applications, or business processes. While much of the metadata in the past was hard-coded to
use resources within an application or database, or monitored business processes selectively, a
real time enterprise needs more fluid resources.
The enterprise governance model allows metadata of distinct applications or databases to talk to
each other. It is also a means to administer all information resources from a single point of
control. An enterprise governance model maps the functional, analytical and resource flows of a
company and interlinks the metadata of individual assets to work as a single entity.
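As a rough illustration of the idea, the sketch below (all asset and entity names are invented) models a central registry that records each asset's metadata and interlinks assets that describe the same business entity:

```python
# A minimal central metadata registry: each information asset registers
# a description of itself, and the registry can interlink assets that
# share a business entity (here, "customer").
registry = {}

def register(asset, kind, entities):
    """Record an asset's metadata in the central registry."""
    registry[asset] = {"kind": kind, "entities": set(entities)}

def linked_assets(entity):
    """Find every asset whose metadata mentions a business entity."""
    return sorted(a for a, m in registry.items() if entity in m["entities"])

register("crm_db", "database", ["customer", "order"])
register("billing_app", "application", ["customer", "invoice"])
register("inventory_db", "database", ["product"])

customer_assets = linked_assets("customer")
```

Because the registry, not the individual applications, holds the metadata, a query about "customer" spans every asset at once, which is the single point of control the governance model aims for.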
Several benefits follow when the information assets operate as a continuous process instead of
discrete units. For one, the assets do not have to be present in the single physical location
where a task is being done. Also, the data required to conduct a task is not reproduced in
several different applications, which eliminates the risk of inconsistent definitions. Finally,
business processes operate independently of applications and can be invoked by several
different applications.
Industries such as health care and financial services have a plethora of systems to manage
their varied information stores. The health sector, for example, has information on clinical trials,
FDA regulations, academic literature, and product information scattered across several
repositories. When doctors need information, they must be able to find all of it before they can
begin to prescribe drugs. Companies supplying drugs have to package information in response
to specific queries, a task that can often be extremely time consuming.
Aventis Pharmaceuticals deployed a medical information system that its call center staff could
use to retrieve information to respond to queries from doctors. It needed to link its Siebel CRM
system to the document databases holding its medical data. Aventis installed integration
middleware from RWD Technologies, deployed on a server, with additional capabilities to write
rules for packaging information so that inter-related data can be presented in a cohesive manner.
Fragmentation of data sources and applications also takes place when companies build their
business intelligence infrastructure incrementally. A big-bang approach, i.e., a centralized data
warehouse, is risky and requires a large investment at the outset, so companies often prefer
data marts for individual departments. When their data marts prove profitable, companies want
to migrate to a data warehouse or to create a federation of data marts. When they opt for a
federation, they need to find a way to integrate their data sources and applications, because
once investments have been sunk in one infrastructure, companies prefer to continue using it
rather than start all over again with a data warehouse.
An example of an integration product is IBM’s WebSphere, which has business process
management capabilities in addition to coordinating application-integration and workflow
technologies. The software IBM acquired from Holosofx affords modeling of business processes,
simulation of the outcomes of the chosen processes, and comparison of actual results with
expected outcomes.
Pointers to information stores
In the past, metadata, the descriptions of data structures, was inconspicuous because it was
incorporated into the applications that drove business intelligence functions. When metadata
was tied to a particular application, its definitions were also hard to generalize. For example, one
database would hold information on a company's products, but not on the related applications,
technologies, or solutions. When information is used in a business context, information about
products is rarely useful unless it is seen together with information on their uses. For example,
consumers choosing among several cell phones would need to know which ones can access
the internet, which support pre-paid cards, and so on.
The available metadata, as long as it is embedded in particular applications, cannot help in the
centralized management of an enterprise. In the new world of business intelligence, companies
have to be able to associate categories such as products with customers for them to identify
relationships. Consumers are interested in the attributes of products; they would look for luxury
cars or fuel-efficient cars rather than any specific product. Companies have to move beyond
metadata to taxonomies to provide information in the language people use in everyday life. The
rub is that perspectives differ and definitions vary with them. As a result, the corporate world has
been struggling to define categories in a manner that is acceptable to all.
A real time enterprise business intelligence infrastructure is centrally managed and needs
cohesive administration to be able to manage the heterogeneous applications common in
enterprises today. In order to work with existing applications and databases, companies have to
migrate to open-standards metadata. In its absence, companies spend about 35-40% of their
programming budgets on transferring data from legacy systems or from one type of information
asset to another.
A data model underpins the metadata that governs a network of applications. The data model
helps to remove any superfluous metadata that exists in the enterprise and adds definitions that
help to govern information assets more effectively. It thereby prepares the ground for sharing
information across applications and databases, and for collaboration based on agreed
definitions.
Data warehouses are so yesterday
Data warehouses have been the widely accepted means of consolidating data for analytical
purposes. While the data stored in a warehouse is being analyzed, it cannot be refreshed with
new data; updates can be made only at intervals outside working hours, which contributes to
data latency. Transactional data stores, on the other hand, have been reserved for mission-
critical operational purposes, with analysis accorded a lower priority, if any.
The conflicting goals of gaining access to current data and preserving operational efficiency are
reconciled by intermittently sending trickles of data from the source operational data store to a
data warehouse. Not all data is required for analytical purposes; it has to be selected depending
on the metrics chosen for performance management or another specific purpose. Any change of
data is transferred to a separate partition in the source database and enters a message queue.
The data is then streamed into a separate “active” partition in the receiving database and
consolidated periodically. The steady outflow and inflow of data ensure that any performance
degradation is inconsequential.
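The trickle-feed mechanism described above can be sketched as follows. The store, queue, and partition names are invented; a real implementation would use database change-capture facilities and messaging middleware rather than in-memory structures:

```python
from collections import deque

# Trickle-feed change data capture in miniature: each change in the
# operational store is placed on a message queue, and the warehouse
# drains the queue into a separate "active" partition so the
# historical partitions are never locked by the feed.
operational_store = {}
change_queue = deque()
warehouse = {"historical": {}, "active": {}}

def record_sale(sale_id, amount):
    """Write to the operational store and capture the change."""
    operational_store[sale_id] = amount
    change_queue.append((sale_id, amount))

def trickle_into_warehouse():
    """Drain queued changes into the warehouse's active partition."""
    while change_queue:
        sale_id, amount = change_queue.popleft()
        warehouse["active"][sale_id] = amount

def consolidate():
    """Periodic move of active data into the historical partition."""
    warehouse["historical"].update(warehouse["active"])
    warehouse["active"].clear()

record_sale("s1", 250.0)
record_sale("s2", 99.0)
trickle_into_warehouse()
```

The queue decouples the two stores, which is why the operational side sees no more than a steady, small overhead.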
Data warehouses will continue to be useful to store historical data which has been scrubbed for
quality defects, inconsistencies in definitions and other errors. The emerging scenario is one of a
federated information infrastructure that includes access to real time data. A typical case is that
of the 17th Judicial Court of Florida, which acquired an events management system from Neon.
Any updates to the data are stored in a real time warehouse, which case managers can access
for analysis. Eventually, the same data can be transferred to a data warehouse.
A heterogeneous and integrated infrastructure needs additional information assets to manage
data access, queries and data storage. It needs an integration server which stores the master
data to manage the conversions of information from one source to another. A heterogeneous
architecture has to be able to manage queries both in the familiar SQL format and in the
emerging XML standards. At a more advanced level, the integration can include the
management of workflows across a variety of information systems.
A typical application of heterogeneous systems, co-existing in a federated infrastructure, is the
case of the patient information system that hospitals in Massachusetts are creating for access to
patient prescription information for use in emergency rooms. The data is accessed from a variety
of databases of insurance companies and presented on a dashboard which doctors can use
when patients are treated in emergency rooms. None of the latencies, common in data
warehouses, inhibit the use of data.
Extraction of information from a variety of sources, including relational databases and
documents such as text files, e-mails, instant messages, and spreadsheets, requires a universal
system of queries that the familiar SQL does not provide. The alternative XML-based query
system, executed by XQuery, is able to access information from a wider range of sources.
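XQuery itself is beyond a short example, but the contrast with SQL can be suggested with Python's built-in ElementTree module, whose limited XPath support pulls fields out of a semi-structured XML fragment (the message archive below is invented):

```python
import xml.etree.ElementTree as ET

# SQL cannot reach inside semi-structured documents; XML query
# languages can. Here an XPath-style expression selects only the
# e-mail messages from a mixed archive and extracts their subjects.
document = """
<messages>
  <message channel="email"><subject>Q3 forecast</subject></message>
  <message channel="im"><subject>lunch?</subject></message>
  <message channel="email"><subject>Invoice overdue</subject></message>
</messages>
"""

root = ET.fromstring(document)
email_subjects = [
    m.findtext("subject")
    for m in root.findall(".//message[@channel='email']")
]
```

A relational query would first need this data flattened into tables; the path expression navigates the document's own hierarchy directly.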
A web of services
While a great deal of skepticism is routinely expressed about the maturity of Service-Oriented
Architecture and web services, the early experience suggests a high level of satisfaction with
these technologies especially when it comes to the task of integration of legacy infrastructure.
According to a very recent survey, about 18% of a thousand companies are “SOA-ready,” i.e.,
their infrastructure uses standardized, loosely coupled services while 35% of them expect to have
a fully functional web services infrastructure in place soon. A majority of 55% of the respondents
recognize that web services yield substantial gains in terms of integration of their IT infrastructure.
Minimalist integration, i.e., the inter-connection of a few applications, is not uncommon, but the
real benefits of SOA are realized when a network is created. Individual applications or data
repositories are reduced to services that can be accessed on the network, much the way
telecom devices such as telephone instruments, DSL modems, or hosted services are plugged
into a telecom network. The network then becomes instrumental in achieving business objectives; a
group of companies might want to collaborate and use devices such as conferencing tools to talk
to each other. Similarly, the SOA creates a network whereby individual applications provide
services while an overarching network helps to achieve objectives such as providing real time
analytics for coping with an emergency.
The lightweight integration made possible by SOA enables companies to evolve as their business
reality changes without being hamstrung by their infrastructure. They can add new applications
that are complementary to their existing applications and synchronize them with business
processes that can be adapted for new needs. A typical example is that of Owens & Minor, a
Glen Allen, Va., distributor of medical and surgical supplies, which needed to build a supply
chain network while retaining the functionality of its legacy infrastructure installed in the 1970s
and 1980s. The core functionality of these systems was exposed and linked to an SOA network.
In addition, the company added business process management to the network.
An audacious attempt at introducing SOA is Siebel’s $100 million investment in the Siebel
Universal Application Network (UAN). Siebel UAN departs from an applications-oriented
architecture to one that makes business processes the bedrock of an IT system. Individual units
of business processes, such as quote-to-cash, campaign-to-lead, and so on, are defined using
the BPEL4WS (Business Process Execution Language for Web Services) standard.
The levers of an agile enterprise
Real time responses to events will be feasible when enterprises are designed to be
maneuverable and their flow of activity is not disrupted by a breakdown in any one component in
the chain of business processes that enable the completion of an activity. An analogous situation
is the case of airports which represent a network of processes that have to be completed before
the activity of flying passengers from one end to another is completed. The completion of the
activity involves applications such as the technology for flying airplanes, the communication
technology that enables airplanes to receive information about the safety about their flight path
and the technology for the management of landing of aircraft. All these processes are seamlessly
interconnected and the breakdown of any radar, defects in the control systems of any aircraft or
closure of an airport does not necessarily disrupt air transportation. Airlines have maneuverable
systems in that they routinely cope with fluctuations in traffic and have to be able to route aircraft
to alternative paths.
By contrast, software applications embed business processes that are rarely interconnected to
complete the flow of an activity. These applications can complete some components of the
tasks in a value chain, analogous to flying an aircraft or managing communications. A real time
enterprise, on the other hand, seeks to mimic the entire network, such as the management of
air traffic.
A business process oriented system helps to automate the management of business processes,
integrates them with functionally relevant applications and creates a framework for collaboration
within a team. In addition, it helps to organize work flows and leverages the existing IT
infrastructure for completion of tasks.
One example of business process oriented architecture is Lloyd’s of London, which needs an
integrated business process to inter-connect its many branches and activities spread over
several countries. It installed a BI reporting tool to keep track of money inflows from premiums
paid and outflows from the payment of claims, which can happen in any country or office.
Typically, employees work with a network of banks, underwriters, and other professional
services companies or claims processors. Consequently, they need to report on data as well as
collaborate and communicate with their partners in real time before they can come to decisions
about risk management. A typical task is to reconcile data on transactions conducted in eight
different currencies. Lloyd’s needs data on all these dispersed transactions at a portal interface
to be able to estimate its net position. This is only possible when its entire IT infrastructure,
including transactional databases, Windows XP operating systems, and file servers, is
inter-connected.
In the past, business processes or workflows were wrapped up with the monolithic applications
that governed their operations. In an environment of heterogeneous applications, similar and
often potentially inter-related business processes lie fragmented, divided by their disparate
environments. The emerging design for business processes seeks to mimic the value chain of a
business and constructs seamlessly integrated business processes that complete the flow of
activity needed to achieve a task. A prerequisite is to separate the management of business
processes from their specific functional applications and to manage the individual units
cohesively in accordance with the desired workflow of the enterprise. The tool for centrally
managing individual units of business processes is a platform that, assisted by a graphical user
interface, spells out their semantics and executes their logic consistent with the desired workflow.
An example of the benefits of reducing business processes to their components and then
interlinking and managing them as a single workflow is inventory management at Owens &
Minor. In the past, the company had to manually complete a series of tasks before it could
return inventory that was close to its expiry date to the manufacturers. The staff had to trace
every item in the warehouses, check the return policy of the manufacturer, contact the
manufacturer to obtain a return authorization, create return orders, and then inform accounts
receivable to expect a refund. This series of repetitive functions is represented by a well-defined
process, which can be completed with the help of business process management software.
These functions are inter-linked with related applications, such as financial software, to ensure
that all the related information is also available.
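The return-inventory steps above can be sketched as a declarative process run by a minimal engine. Step names, identifiers, and data are invented; a real BPM product would add routing, error handling, and an audit trail:

```python
# Each task in the return-inventory process becomes a step function
# that reads and enriches a shared context dictionary.
def trace_item(ctx):
    ctx["location"] = "warehouse-7"
    return ctx

def check_return_policy(ctx):
    ctx["returnable"] = True  # pretend the manufacturer accepts returns
    return ctx

def obtain_authorization(ctx):
    ctx["rma"] = "RMA-001"
    return ctx

def create_return_order(ctx):
    ctx["order"] = f"RET-{ctx['rma']}"
    return ctx

def notify_accounts_receivable(ctx):
    ctx["refund_expected"] = True
    return ctx

# The process is data, not code: a list of steps the engine walks.
RETURN_PROCESS = [
    trace_item,
    check_return_policy,
    obtain_authorization,
    create_return_order,
    notify_accounts_receivable,
]

def run_process(steps, ctx):
    """Run each step in order, passing the shared context along."""
    for step in steps:
        ctx = step(ctx)
    return ctx

result = run_process(RETURN_PROCESS, {"item": "expiring-stock-42"})
```

Because the sequence is data rather than hard-coded control flow, the process can be reordered or extended without touching the individual functions, which is the essence of managing process units centrally.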
In addition, enterprises should be able to loosely couple their business processes with
applications whenever they need to respond to unexpected events, which an SOA makes
possible. When business processes mirror the value chain of an enterprise, managements can
take impromptu actions to respond to contingencies or unexpected events that might roil their
tactical and strategic plans.
The emergence of a variety of middleware enables companies to manage inter-related business
processes and to couple them with applications. Message brokers, transactional queue
managers, and publish/subscribe mechanisms are the means to automate processes; each of
the inter-connected component applications has the ability to send alerts about events and to
receive information about events that need its response. The platforms managing business
processes invoke a message broker, transactional queue manager, or publish/subscribe
middleware layer to tie applications together, detect business-process-related events, and
ensure the routing of events and messages to applications.
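A publish/subscribe layer of the kind described can be reduced to a few lines. The topic name and handlers below are invented for illustration:

```python
from collections import defaultdict

# A minimal publish/subscribe broker: applications subscribe to event
# topics, and the broker routes each published event to every
# subscriber, decoupling senders from receivers.
subscribers = defaultdict(list)
received = []

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    for handler in subscribers[topic]:
        handler(event)

# Two applications register interest in the same business event.
subscribe("order.delayed", lambda e: received.append(("alerting", e)))
subscribe("order.delayed", lambda e: received.append(("dashboard", e)))

publish("order.delayed", {"order_id": 17})
```

The publisher never learns who is listening; new applications can react to the same events simply by subscribing, without any change to the producers.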
One classic case of events triggering alerts and driving business processes is the application
installed at Jet Travel Intelligence. It uses information on natural disasters, political
disturbances, and a variety of other metrics to assess the risk of travel for executives in
multinational companies and other organizations. The business process engine it acquired from
Fujitsu embeds units of intelligence, i.e., units of validated information, with metadata that
associates them with traveler profile and itinerary data as well as information sources and
content. Customers receive alerts when their travel plans are affected. The task of verifying the
information is divided into units of work; the data flows in and is reviewed by individuals with
expertise in particular regions, subjects, and legal matters. Each time one step of the work is
done, it passes on to the next stage.
One example of a product that integrates business processes with services is SAP’s NetWeaver
platform which composes applications, called xApps, from several services. SAP's Enterprise
Services Architecture creates enterprise services, synchronizes them with processes using the
SAP Composite Application Framework, and provides a platform on the NetWeaver application
server to execute them.
Measuring progress
The optimization of business processes would be impossible without the ability to collect
intelligence about activities. Companies need to be able to estimate metrics of performance at
each level of their business processes. For example, they need to be able to estimate the time
taken to complete a task, or its cost, and compare it to the desired level of performance. This
task of measurement is undertaken by Business Activity Monitoring (BAM) software, which
measures, for example, the flow of traffic to a call center. The data feeds from BAM are fed into
a real time data warehouse.
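The measurement step can be illustrated with a short sketch: given start and finish events of the kind a process engine logs (task names and timestamps invented), it computes the elapsed time of each task for later comparison with a target:

```python
from datetime import datetime

# Events as a process engine might log them: (task, kind, timestamp).
events = [
    ("task-1", "start",  "2024-01-05T09:00:00"),
    ("task-1", "finish", "2024-01-05T09:45:00"),
    ("task-2", "start",  "2024-01-05T09:10:00"),
    ("task-2", "finish", "2024-01-05T11:10:00"),
]

# Pair each finish with its start to get the elapsed time per task.
starts = {}
durations_minutes = {}
for task, kind, stamp in events:
    ts = datetime.fromisoformat(stamp)
    if kind == "start":
        starts[task] = ts
    else:
        durations_minutes[task] = (ts - starts[task]).total_seconds() / 60
```

A BAM product would stream such figures continuously into a real time store instead of computing them in a batch, but the derivation from raw events is the same.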
The availability of data, flowing from BAM, enables corporate performance management for
aligning a company’s strategy with its resources. Enterprises have found in balanced scorecards
a precise way to translate their strategies into measurable performance parameters. For example,
the strategic goal of a company could well be to increase its market share. After studying the
market, a company could come to the conclusion that it can increase its market share if the
quality of its products is improved and prices are lowered. In terms of operations, this would
imply that the company has to reduce defects, change its technology, use fewer materials, raise
labor productivity, improve functionality and usability, improve training, and so on.
The log data of business process management software throws up a wealth of information
about labor use, time spent on each process, materials consumed, and the like. Corporate
performance management features within business intelligence tools can pick up this
information and analyze it to improve the metrics.
The data received from business processes would not be actionable unless it can be compared
with the desired performance. Furthermore, any anomaly in performance has to be
communicated to participants in the workforce so that decisions can be taken. This is the task of
a rule engine, which compares actual performance with the required standards and sends out
an alert when there is a gap between the two.
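A rule engine of this kind can be reduced to a comparison loop; the metric names and thresholds below are invented for illustration:

```python
# Each rule pairs a metric with its required standard; an alert is
# emitted whenever actual performance falls short of it.
rules = {
    "order_processing_minutes": {"max": 60},
    "defect_rate_percent": {"max": 2.0},
}

def evaluate(actuals):
    """Compare actual figures with the standards and collect alerts."""
    alerts = []
    for metric, standard in rules.items():
        value = actuals.get(metric)
        if value is not None and value > standard["max"]:
            alerts.append(f"{metric} is {value}, exceeds {standard['max']}")
    return alerts

alerts = evaluate({"order_processing_minutes": 95, "defect_rate_percent": 1.1})
```

Keeping the standards in data rather than code means business users can tighten or relax them without redeploying the software.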
One application of Business Activity Monitoring is the case of Motorola, which used the data
from its business processes to reduce the time involved in order processing. It needed to
reduce the time lapse between the occurrence of a problem and its detection, and other
inefficiencies caused by human processing of data as well as by the use of call centers instead
of on-line channels. The implementation of BAM helped Motorola to reduce the inefficiencies in
problem identification and resolution by 85% and hold-ups of orders by 75%.
The more advanced versions of business process management offer modeling and simulation
features that help to optimize processes and resource allocation. One instance of the use of
business process data for optimization is Cannondale Bicycle Corp. of Bethel, Conn. The
company launches new products frequently, which affects expected revenue, the demand for
raw materials, the scheduling of production processes, and the costs of production. The
simulation of alternative scenarios helps to determine the optimal way to manage the
uncertainties created by new products. The availability of real time data helps the company to
respond to events faster and to explore alternative decisions as the data is received.
Warm-ups for an agile enterprise
Services-oriented architecture disentangles the software components that constitute enterprise
applications. This includes separating workflow management software from application
software. The autonomous workflow management software is then linked to the network of
resources available in a services-oriented architecture.
Performance improvement and optimization are best achieved when the resources invested in
workflows can be modeled, measured, and streamlined. Workflow management software begins
where business process management software leaves off; it manages resources, including
applications and data, after the process design has been spelt out. What is more, workflow
management software allows companies to reallocate resources and change the associated
applications and data sources as needs change. Without an audit of the resources expended
as work is completed, it is hard to explore avenues for efficiency gains. Workflow management
systems also record events such as the initiation and completion of the numerous activities in
business processes; they keep a record of the outcome of an activity, the resources invested in
it, and so on.
Workflow Management Systems begin with the modeling of workflows in a way that completes an
entire process. This is followed by an effort to optimize the series of tasks that complete a
process much like project management techniques such as PERT/CPM do. Finally, workflow
management software manages the resources required to complete tasks and monitors their use.
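The PERT/CPM idea mentioned above can be shown in miniature: given invented task durations and dependencies, the earliest finish time of each task is determined by its longest chain of predecessors, and the longest chain overall sets the project length:

```python
# Task durations (in days) and dependency edges, all invented.
durations = {"model": 2, "approve": 1, "build": 4, "test": 2}
depends_on = {"approve": ["model"], "build": ["approve"], "test": ["build"]}

earliest_finish = {}

def finish_time(task):
    """Earliest finish = latest predecessor finish + own duration."""
    if task not in earliest_finish:
        preds = depends_on.get(task, [])
        start = max((finish_time(p) for p in preds), default=0)
        earliest_finish[task] = start + durations[task]
    return earliest_finish[task]

# The critical path length is the largest earliest-finish time.
project_length = max(finish_time(t) for t in durations)
```

Tasks not on the critical path have slack, and that slack is precisely where a workflow system can reallocate resources without delaying the process as a whole.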
An example of the use of workflow software and process monitoring is the way Citigroup uses
them to keep track of the value of a variety of assets after their price information has been
received from all its myriad sources. If marked changes in the values of assets are observed, the
matter is escalated to a manager. The more advanced modules in the software have risk
management options which prompt contingency plans to guard against grievous losses.
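The escalation logic described above can be sketched in a few lines. This is a minimal illustration with a hypothetical five-percent threshold and invented asset names, not Citigroup's actual system.

```python
# A minimal sketch of threshold-based monitoring and escalation.
# Asset names and the 5% threshold are illustrative assumptions.

def check_asset_moves(previous, current, threshold=0.05):
    """Compare two price snapshots and flag assets whose value moved
    by more than the threshold, so they can be escalated to a manager."""
    escalations = []
    for asset, old_price in previous.items():
        new_price = current.get(asset)
        if new_price is None or old_price == 0:
            continue
        change = abs(new_price - old_price) / old_price
        if change > threshold:
            escalations.append((asset, round(change, 4)))
    return escalations

previous = {"bond_a": 100.0, "equity_b": 50.0, "fund_c": 200.0}
current = {"bond_a": 101.0, "equity_b": 42.0, "fund_c": 200.0}
print(check_asset_moves(previous, current))  # equity_b moved 16%
```

A real risk-management module would attach contingency actions to each escalation; here the output is simply a list the caller can route to a manager.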
Cleaning the dirty numbers
The transformation of data from transactional databases involves several steps before it can be
used for analysis; the tasks can include consolidating data from several sources and reconciling
semantics and data structures. The volume of work can be enormous; the British Army, for example,
had to extract data from 850 different information systems, and integrate three inventory
management systems and 15 remote systems in order to move supplies to the scene of war in
Iraq.
One of the first tasks is to consolidate the data that is scattered in several different sources. For
example, a financial institution such as Citibank does business with a large corporation like
IBM which has subsidiaries and strategic business units in several different geographical
locations. As autonomous units, they will make purchase decisions of their own accord often
without the knowledge of the parent company. For its business intelligence purposes, Citibank will
look for data from customers and will need to aggregate data from all the independent units so
that it can determine the sales to all units of IBM.
When the data is extracted from several different databases, it is very likely that duplicate data
would be obtained. For example, individual units of a company record names of customers in a
variety of ways. Typically, some units will record both the first and the last name of a person
in a single line. Other units within the company could well record the first name in one line
followed by the last name in another line or vice versa. When the data is consolidated, the huge
number of duplicate records cannot be corrected manually and companies have to find a way to
automate this task. The duplications will also happen because the name has been misspelled or
some records have the initials for the middle name or the full name. Companies have to write
rules for correcting the errors; for example, they could try to match addresses and e-mails to
remove the superfluous records.
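Rule-based deduplication of the kind described can be sketched as follows; the field names and matching rules (e-mail first, normalized name as a fallback) are illustrative assumptions, not a production matching engine.

```python
# A hedged sketch of rule-based deduplication: normalize name fields and
# prefer matching on e-mail address. Field names are illustrative.

def normalize_name(record):
    """Join name fields (recorded in either order) into one lowercase key."""
    parts = [record.get("first_name", ""), record.get("last_name", "")]
    full = record.get("full_name", " ".join(parts))
    return " ".join(sorted(full.lower().split()))

def deduplicate(records):
    seen = {}
    for rec in records:
        # Prefer the e-mail as the match key; fall back to the normalized name.
        key = rec.get("email", "").lower() or normalize_name(rec)
        seen.setdefault(key, rec)  # keep the first record for each key
    return list(seen.values())

records = [
    {"full_name": "John Smith", "email": "j.smith@example.com"},
    {"first_name": "Smith", "last_name": "John", "email": "J.Smith@example.com"},
    {"full_name": "Jane Doe", "email": "jane@example.com"},
]
print(len(deduplicate(records)))  # 2 unique customers
```

Real tools layer many more rules on top (misspellings, initials versus full middle names, address matching), but the structure is the same: normalize, choose a match key, collapse.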
Similarly, the supply databases have to be able to reconcile the identification numbers of a large
variety of parts, products and materials. This is rarely standardized across the sources of supply.
In the case of the British Army, for example, the identification numbers for a cold-climate ration
pack and an electronic radio valve were identical which would have upset many people if the data
was not reconciled.
Once the data has been collated, it has to be categorized in a consistent manner. A financial
institution, for example, would like to distinguish its customers; a typical classification would be
retail customer and commercial or a corporate customer. For most data, it would not be hard to
tell when a customer is from the retail sector or the commercial sector. The rub is that some
persons belong to both categories; a managing partner of a consulting company, for example,
is both an individual and a commercial entity when the company is not a limited liability company.
Often, individual stakeholders in the company will come to their own conclusions about the
definitions or the metadata when they consolidate the data. When the analysis is done, however,
the company could find itself coming to invalid conclusions.
Finally, companies need to reconcile differences in data structures across several different
applications. Data extracted from a CRM database is not likely to be consistent with similar
information from a supply chain database. In the case of the British Army, for example, the supply
database defined a unit of supply as a 250-litre can. On the demand side, it was not
uncommon to request one-litre cans. If a request for 250 one-litre cans were read as 250
supply-side units, the supply chain would end up with a logistical spike that it could not handle.
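Unit-of-measure reconciliation of this kind can be sketched as a conversion to a common base unit. The conversion table below is hypothetical, echoing the 250-litre can from the British Army example.

```python
import math

# A minimal sketch of unit-of-measure reconciliation between supply and
# demand records; the conversion table is an illustrative assumption.
CONVERSIONS = {"can_250l": 250.0, "can_1l": 1.0}  # litres per unit

def to_litres(quantity, unit):
    """Convert a quantity into a common base unit (litres)."""
    return quantity * CONVERSIONS[unit]

def supply_units_needed(quantity, unit, supply_unit="can_250l"):
    """Translate a demand-side request into supply-side units, rounding up."""
    litres = to_litres(quantity, unit)
    return math.ceil(litres / CONVERSIONS[supply_unit])

# A request for 250 one-litre cans is one 250-litre can, not 250 of them.
print(supply_units_needed(250, "can_1l"))  # 1
```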
Data, as stored in transactional databases, does not carry the related contextual
information that helps in its analysis. For example, customer orders are gathered by
CRM databases, but these are not adequate unless related information on the geographical
region, demographic characteristics and the time of the order can also be correlated with them.
Data profiling technology employs a range of analytical techniques to cross-check information
across columns and rows to ferret out the flawed data. For example, it can compare the name in
one column and the gender in another to check for the accuracy of the names. Similarly, it could
scan the range of the data values to check for their accuracy. Information on the employment
status of people could be compared with their age; people with ages exceeding sixty-five have a
low probability of being employed. The reported benefits from data profiling are high; British
Telecommunications realized a saving of £600 million during the last eight years.
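Cross-column profiling rules such as the age and employment check can be sketched as follows; the specific thresholds and field names are illustrative.

```python
# A sketch of cross-column data-profiling rules of the kind described above.
# The rules themselves (age range, age versus employment) are illustrative.

def profile(rows):
    """Return (row_index, rule) pairs for rows that fail a consistency check."""
    flagged = []
    for i, row in enumerate(rows):
        age = row.get("age")
        if age is None or not (0 <= age <= 120):
            flagged.append((i, "age out of range"))
        elif age > 65 and row.get("employed"):
            # Not impossible, but improbable enough to warrant review.
            flagged.append((i, "employed past 65: verify"))
    return flagged

rows = [
    {"age": 34, "employed": True},
    {"age": 150, "employed": False},   # impossible value
    {"age": 72, "employed": True},     # improbable combination
]
print(profile(rows))
```

Commercial profilers apply hundreds of such rules and score records by how many they fail; the principle is the same pairwise consistency check shown here.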
The quality of data profiling depends on the number and range of relationships that are examined
in the process of cleaning the data. For example, a vendor could be studying the procurement
behavior of its clients. It could do a simple check of the size of the order against the dimensions
of the package. The checks could be more advanced and correlate information from invoices,
frequency and the revenue base of the company to validate the data based on its consistency.
Some of the tools that do the data profiling are Trillium Discovery, SAS Dataflux and Oracle's
Enterprise Data Hubs. One of the functions of Oracle’s Data Hub is to help data managers
reconcile differences among source systems with the help of an overarching master data
management solution which converts the data definitions from a variety of sources to a single
repository of universally applicable descriptions of data.
The increasing adoption of EII (enterprise information integration) and EAI (enterprise
application integration) requires real time data improvement that has to be built
into business processes. When disparate applications and data sources are consolidated,
without the benefit of transformation available with data warehouses, the combined data will
bristle with inconsistencies and other inaccuracies.
Considerations about the future
XBRL and Textual Financial Information
Today, most companies largely process their internal data on customers, inventories, labor use
and financial information, while the potential for using external data, such as regulatory filings, has
been mostly unexploited. This kind of data is extremely valuable for competitive intelligence,
benchmarking, trend analysis and for anticipating the impact of macroeconomic policy. The rub is
that this data is voluminous and hard to categorize. With the advent of XML and its tags, it is
much easier to index information at a microscopic level. The indexing of unstructured data paves
the way for text mining and for extracting insights from unstructured data. XBRL (Extensible
Business Reporting Language) is focused on using XML for indexing of business documents.
One of the applications of XBRL is the categorization of SEC reports which can be used for
financial comparisons. Edgar On-Line has a product I-Metrix, which offers XBRL-tagged
information from all 12,000 or so SEC filers, including widely held private companies and private
firms with public debt instruments. Beginning with 80% of the line-item information contained in
the 10-K annual and 10-Q quarterly filings, the company will cover all line-item information as well
as the analytically critical footnotes and other textual information that helps to understand the
quantitative information in context. OneSource Applink is another company that has developed
search engines to access information from its XBRL-tagged database of some 780,000 public
and private companies.
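Reading XBRL-style tagged line items reduces to parsing XML. The sketch below uses Python's standard library with simplified, invented element names rather than the actual XBRL taxonomy used by products such as I-Metrix.

```python
# A hedged sketch of reading XBRL-style tagged line items; the element
# names are simplified stand-ins, not the real XBRL taxonomy.
import xml.etree.ElementTree as ET

doc = """
<filing company="ExampleCo">
  <Revenues>1200</Revenues>
  <NetIncome>150</NetIncome>
</filing>
"""

def line_items(xml_text):
    """Extract tagged line items into a dict keyed by the tag name."""
    root = ET.fromstring(xml_text)
    return {child.tag: float(child.text) for child in root}

items = line_items(doc)
# Because the line items are tagged, ratios are computable across filers.
print(items["NetIncome"] / items["Revenues"])  # net margin of 0.125
```

This tagging is what makes cross-company comparison mechanical: the same tag means the same line item in every filing, so a net-margin query runs unchanged across all filers.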
XBRL, in its initial years, saw a sluggish pace of adoption, but lately Business Intelligence vendors
have recognized the potential in the technology. Business Objects and Ipedo have set the pace
by announcing a partnership to use Edgar Online data to deliver XBRL-enabled BI capabilities.
Companies need both external and internal corporate data to make comparisons. Ipedo
provides the querying capability: its XQuery- and SQL-enabled XIP 4.0 information integration
engine is used to access external XBRL data from Edgar Online's Web-based I-Metrix service
and bring it together with internal corporate data on the Business Objects XI
platform for making comparisons with competitors. The standardization of the definitions helps in
making valid comparisons.
Growing support from the government has improved the prospects for adoption of XBRL. Federal
Deposit Insurance Corp. uses XBRL to collect, process and distribute the quarterly financial
reports that it receives from banks.
Visualization
Large datasets present intimidating challenges when it comes to extracting patterns in a way that
is intelligible for those who have to take decisions in real time. The volume of data continues to
grow exponentially as a variety of sensors, such as those placed to monitor traffic in cities,
overwhelm known methods of analysis such as statistical analysis and data mining. An example
would be emergency response in the face of the monstrous hurricanes that frequently strike the
southern regions of the USA. Governments have to be able to absorb information about the
movement of a hurricane, which often changes direction, velocity and intensity. A variety of
variables about atmospheric pressure and temperature, topography and population data have to
be taken into account for assessing potential damage. All this would be an impossible task if the
entire data is not visualized.
Business intelligence, with its multi-dimensional focus, pushes the envelope for visualization of
data. A plain SQL query can aggregate information by a few of the dimensions while a cube helps
to look at data from several angles. When it is visualized in a multi-dimensional space, the data
can bring into relief outliers, clusters and patterns that help to classify the data. Auto theft data
rendered on a map, with additional information on neighborhoods and demographic information
and income status of the population, helps to extract patterns from colossal databases to help
forewarn car owners about their risks and to deploy police in locations where occurrence is
concentrated.
Geography lends itself well to 3D visualization as location provides a context in which several
different types of data can be overlaid on a map and their interactions can be simulated in a
graphical form. One application of this kind of advanced visualization of large data sets is the
solution Vizible Corporation developed for a California municipality. In an emergency situation,
several different resources of a city administration are deployed such as the police, fire fighters,
medical services and they need information on neighborhoods, people, traffic, etc. to come to
impromptu decisions as the situation evolves. Vizible Corporation developed a virtual control
center using GIS data to simulate graphically the situation on the ground so that an emergency
response can be coordinated efficiently.
The major departure that next generation visualization tools make from the familiar world of Excel
charts and graphs is in the interactivity of the diagrams. When people look at diagrams, they want
to test their hypotheses about how outcomes will change as the actionable variables are
altered. They want to be able to see how the results will change as some of the values are
excluded. For example, people could be looking at the academic performance of men and women,
and males are likely to be tickled if they find that females are performing better. They may want to
find out whether the academic performance of women is also higher in the more quantitative
sciences like physics and chemistry. The emerging technology for visualization is more decision-oriented
and enables graphical simulation. These visualization tools allow for querying, animation,
dynamic data content and linking of several diagrams.
The decline in decision latencies, as a result of investment in data visualization, is illustrated by
the experience of Briggs and Stratton, a manufacturer of gasoline engines. After installing a
business intelligence server and web-enabled visualization tools, the company was able to
predict the failure rate of its engines and to track the metrics influencing its quality and
operations. The chief benefit of the application was that it allowed the company to
anticipate the costs incurred on warranty and to take pre-emptive action rather than wait for
several months before the weight of cumulative evidence of failures perforce led to changes in its
processes. Real time monitoring of the metrics influencing the quality of its engines helped the
company react to any adverse turn of events.
Predictive Modeling for operations
Data mining is not an activity that is normally synonymous with operations. Instead, it is reserved
for the strategists of companies who are aided by statisticians, market research analysts and data
mining specialists. The Predictive Modeling Markup Language brings the benefits of data mining
to operations. While model design is still found in the exclusive world of data mining specialists
with advanced degrees, the model is executed on data received in real time.
In the first generation of applications of predictive analytics, companies used limited data from
their CRM or other transactional databases to rate customers or detect fraud. All the complex
analysis of statistical models is reduced to scores that operations staff can use: credit scores for
estimating credit-worthiness, or rankings built from CRM data that determine the level of service
customers should be provided.
In the future, the applications will go further and profile customers and provide the information to
call center staff to make offers. This second generation of CRM tools mine data and estimate the
customer's lifetime value, credit risk or probability of defection. Some solutions are more
sophisticated and help in differentiating good prospects from casual enquiries. Call center staff
will receive enough information to engage customers and offer them compelling deals they find
hard to refuse.
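Deploying a fitted model as an operational score, in the spirit of PMML, can be sketched as follows. The logistic-regression coefficients below are invented for illustration, not drawn from any real credit model.

```python
import math

# A minimal sketch of model deployment: analysts fit the model offline;
# operations apply it to each record in real time. Coefficients are invented.
INTERCEPT = -2.0
WEIGHTS = {"years_as_customer": 0.3, "late_payments": -0.8}

def score(record):
    """Probability-like score that call-center staff can act on directly."""
    z = INTERCEPT + sum(WEIGHTS[k] * record[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

customer = {"years_as_customer": 10, "late_payments": 1}
print(round(score(customer), 3))
```

PMML's contribution is that the model (intercept, weights, link function) travels as an XML document from the statistician's tool to the operational system, so the scoring code never has to be rewritten by hand.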
64-bit architecture
New hardware technologies, such as 64-bit processors, are expanding the technological ability
to efficiently process colossal databases. While the currently popular 32-bit processor can only
address 2^32 bytes, or 4 gigabytes, of data, 64-bit processors can address 2^64 bytes, or 16
exabytes, of data.
The superior processing capability of computers with 64-bit processors can help to run databases
and other business applications faster as they can access more data from the cache rather than a
disk, manage larger data files and databases on fewer servers, allow more users to use
applications concurrently and reduce software-licensing fees as fewer processors are required for
the same amount of data processing.
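The address-space arithmetic above can be verified directly:

```python
# 2^32 bytes is 4 gigabytes; 2^64 bytes is 16 exabytes
# (1 GB = 2^30 bytes, 1 EB = 2^60 bytes).
GB = 2 ** 30
EB = 2 ** 60
print(2 ** 32 // GB)  # 4  GB addressable by a 32-bit processor
print(2 ** 64 // EB)  # 16 EB addressable by a 64-bit processor
```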
One example of the application of 64 bit architecture is in crunching large volumes of health data.
Apex Management Group uses SAS Enterprise BI Server to understand the complex interplay of
plan offerings, disability benefits, medical system utilization and cost. By using a 64-bit
processing platform, Apex could run through 17 million rows and test hypotheses on the fly.
An increasing number of applications will be possible with 64-bit architecture. The advent of RFID
will substantially increase the volume of data that needs to be crunched and that would be
possible with the added memory available. Similarly, city administrations are using sensors to
keep track of crime, and this data too can be processed with 64-bit architecture.