Managing dirty data in organizations using ERP: lessons from a case study
Jodi Vosburg
The University of Wisconsin-Whitewater, Wisconsin, USA
Anil Kumar
The University of Wisconsin-Whitewater, Wisconsin, USA
Keywords
Data, Data integrity, Enterprise resource planning, Systems management

Abstract
The integrity of the data used to operate and make decisions about a business affects the relative efficiency of operations and quality of decisions made. Protecting that integrity can be difficult and becomes more difficult as the size and complexity of the business and its systems increase. Recovering data integrity may be impossible once it is compromised. Stewards of transactional and planning systems must therefore employ a combination of procedures including systematic safeguards and user-training programs to counteract and prevent dirty data in those systems. Users of transactional and planning systems must understand the origins and effects of dirty data and the importance of and means of guarding against it. This requires a shared understanding within the context of the business of the meaning, uses, and value of data across functional entities. In this paper, we discuss issues related to the origin of dirty data, associated problems and costs of using dirty data in an organization, the process of dealing with dirty data in a migration to a new system: enterprise resource planning (ERP), and the benefits of an ERP in managing dirty data. These issues are explored in the paper using a case study.

Industrial Management & Data Systems 101/1 [2001] 21–31
© MCB University Press [ISSN 0263-5577]
The current issue and full text archive of this journal is available at http://www.emerald-library.com/ft

1.0 Introduction
Daily operations, planning, and decision-making functions in organizations are increasingly dependent on transaction data. This data is entered electronically and manually and then organized, managed and extracted for decision-making. The same data entered and used to facilitate building, shipping, and invoicing goods is also extracted and manipulated to evaluate factory and sales force performance in the short term. In the long term this data is used to chart the course of the business in terms of manufacturing facilities, products, and marketing. The integrity of the data used to operate and make decisions about a business affects the relative efficiency of operations and quality of decisions made. Protecting data integrity is a challenging task. Redman (1995) comments that "many managers are unaware of the quality of data they use and perhaps assume that IT ensures that data are perfect. Although poor quality appears to be the norm, rather than the exception, they have largely ignored the issue of quality". Other scholars (Greengard, 1998; Kilbane, 1999; Tayi and Ballou, 1998; Wallace, 1999) also point out the importance of data quality for organizations.

Maintaining the quality of the data that is used in an organization is becoming an increasingly high priority for businesses. In a recent survey of 300 IT executives conducted by Information Week (Wallace, 1999), a majority of the respondents (81 per cent) said, "improving customer data quality was the most important post-year 2000 technology priority". The respondents further stated that there would be "significantly increased spending" on data quality in their organizations. Companies that manage their data effectively are able to achieve a competitive advantage in the marketplace (Sellar, 1999). On the other hand, "bad data can put a company at a competitive disadvantage" comments Greengard (1998). A recent study (Ferriss, 1998) found that "Canadian automotive insurers are taking a major hit from organized and computer-literate criminals who are staging crashes and taking advantage of dirty data in corporate databases". The study found that in one case several insurance firms lost $56 million to one fraud ring.

How does a company end up with dirty data and what can be done to prevent this? Disparate data stores (individual, departmental, and organizational) that have been developed and used by organizational users over the years lead to dirty data problems. Examples include dissimilar data structures for the same customer data (spelling discrepancies, multiple account numbers, address variations), incomplete or missing data, lack of legacy data standards, actual data values being different from meta-labels, use of free-form fields, etc. (Kay, 1997; Knowles, 1997; Weston, 1998). These problems can be compounded by the volume of data that is stored and used in organizations. One way of overcoming this problem is to use technologies that integrate the disparate data stores for an organization and help companies clean up their data. Enterprise resource planning (ERP) systems (SAP, Peoplesoft, Baan, J.D. Edwards, etc.) are examples of such systems. "A good ERP system offers an integrated option, implementing browser and client-server modes while maintaining consistent data and function within the enterprise and out to the supply chain" (Stankovic, 1998). In recent years, ERP vendors have gone beyond providing the traditional integrated applications, such as manufacturing, financials, and human resources. Newer applications that have emerged include supply chain management, customer-relationship management, data mining and
data warehousing (Caldwell and Stein, 1998; Stankovic, 1998) and browser modes that enable organizations to reach out to customers and the supply chain. Caldwell and Stein (1998) also point out that "most important, ERP forces discipline and organization around processes, making the alignment of IT and business goals more likely in the post-ERP era". Aligning IT and business goals has always been a top priority for senior management. Thus it might be helpful for a company to implement an ERP system.

In this paper, we discuss the experiences of a company that implemented an ERP system. The discussion is focussed primarily on the data aspect of the implementation. The paper is organized as follows. In the next section we describe the case-study organization. Section 3 defines the concept of dirty data and its impact on the integrity of organizational data. In Section 4 we list the costs incurred by organizations as a result of using dirty data. Section 5 highlights several lessons learnt from the case-study organization and, finally, in Section 6 we summarize the guidelines for companies planning to implement ERP solutions to overcome dirty data problems.

2.0 The case study
The organization where this case study was conducted is a $650 million division of a Fortune 500 company located in the Midwest. This company is a manufacturer of electrical, lighting, and automotive equipment. The products of this company are marketed domestically and internationally. The company employs approximately 1,600 people in manufacturing and sales facilities located both domestically and internationally. There are 17 manufacturing facilities located in North America and Asia. The case study was used to understand the implications of dirty data at the company before and after the implementation of an ERP system. The ERP implementation in the company replaced a number of independent mainframe legacy systems used for order and quotation processing, manufacturing, transportation, billing, and finance applications. One of the co-authors of the study works at the company as the system/support supervisor for the Customer Support Center (CSC). In this role, the author was directly involved in identifying, trouble-shooting, and training for dirty data concerns in data entry and with specifying, testing, and distributing customer and sales-force reports. In addition, we interviewed several other employees in the organization who were involved with this project. These employees included the manager of the CSC and marketing services, an information analyst in the marketing services group, and a customer support representative (CSR). The manager of the CSC is responsible for managing domestic order processing and sales and marketing reporting for the division. The information analyst works with users and programmers to specify report requirements and does much of the testing and trouble-shooting for those reports. The CSR is the data entry point, analyzing and translating customer purchase orders into ERP documents. This study will look primarily at issues relating to the CSC.

3.0 Dirty data defined
At first, the abbreviation for black was blk. Then it was changed to bck. We didn't discover this change until someone said the color mix didn't look right (Horwitz, 1998).

Dirty data exists when there are inaccuracies or inconsistencies within a collection of data or when data extraction is inconsistent with intent. Inclusion of dirty data in a data source may pollute the entire data source, making it difficult or unwise to use the data for analysis. Dirty data in a transactional system can mean incorrect order taking, products not built to specification, or errors in packaging, documentation, or billing. The result is dissatisfied customers, loss of shareholder confidence, unnecessary material and labor costs, and the real and opportunity costs of time spent correcting errors resulting from dirty data. Those interviewed define dirty data as follows:

The GIGO (garbage in, garbage out) theory applies to dirty data. If you don't have checks in the system that prevents human error, you will have errors in your data. Data integrity refers to data that is systematically edited or edited by "experts" after data entry to remove errors (Manager, CSC).

Duplicate data or data that is incomplete or extraneous (Information Analyst, Marketing Services).

Anything that is entered incorrectly (CSR).

The definitions used reflect each one's experience with dirty data. Awareness of this problem is growing within the organization as users, systems people, and management uncover and deal with problems resulting from dirty data.

Data integrity requires awareness and control of dirty data. A collection of data has integrity if the data are logically consistent and accurate. Data integrity requires that data additions or changes be reflected in each
of the locations where that data is stored and that data is consistent across the storage medium(s) used. Data integrity also requires that the users of that data understand the meaning of the data within the context of the business. Maintaining data integrity requires a systematic approach to data processing, storage, sharing, manipulation, and reporting.

4.0 Cost of using dirty data
"Errors in data can cost a company millions of dollars, alienate customers, and make implementing new strategies difficult or impossible" (Redman, 1995). The manager of CSC commented that:

Any business that has to issue debits and credits or that throws out surplus, unusable inventory, understands the costs of dirty data. Each credit or debit is estimated to cost the company $75 for the clerical efforts of analyzing, generating and disseminating the document. Added to that are the following: production errors from erroneous bills of material or misinterpretation of a customer's specifications; freight costs for shipping and returning product; inventory scrapping charges where the product cannot be resold; financial penalties charged by the customer for our error; ordering of unneeded materials; scrapping of raw materials; wasted labor charges at the organization and its customer; warranty charges to fix the product, if it can be modified; and unknown cost of the customer not ordering additional product from you because of your data problems. The managers and people involved in warranty, credit and collection and finance understand the ramifications. The rest of the organization understands what their managers or supervisors have shared with them. Our quality program emphasizes feedback to the person involved with a quality problem. It is up to the management team to insure that all people understand the problems dirty data can cause as well as prevention.

The information analyst for marketing services was of the opinion that "most of the costs associated with dirty data cannot be measured in terms of dollars. If these costs could be quantified the management would be shocked". She stressed the cost of the endless number of consultants required to configure the system to prevent a particular data problem or to determine or correct the results of one.

The CSR focused on "costs associated with customer dissatisfaction – lost confidence and business are hard to measure and harder to win back". She pointed out the frustration and time lost at the factory, in the marketing departments, and at the CSC in correcting problems resulting from dirty data.

Each person's perspective is culled from that person's training and experience. The CSR indicated that she had little understanding of the way in which the data she enters is used in peripheral departments and how it becomes part of reporting. For that reason, it is important to examine the data and rationalize it. Data rationalization involves determining what data is important to which department and prioritizing the value of those data sets. Once this determination is made, plans to correct and prevent dirty data can be laid.

5.0 The ERP implementation: lessons learned
The start of data integrity problems is really a failure to treat data as a strategic business resource. Scholars (Redman, 1995; Tayi and Ballou, 1998) point out that data is a key organizational resource. However, as pointed out by Kilbane (1999), "Many companies who use data contained in legacy systems are not leveraging it as a strategic company asset." The primary challenge to maintaining data integrity is the lack of resources allocated to it. To maintain data integrity, people with an understanding of the origins and results of dirty data and the ways to prevent and correct it must be dedicated to the task. Redman (1995) says that: "Due largely to the organizational politics, conflicts, and passions that surround data, only a corporation's senior executives can address many data quality issues. Only senior management can recognize data (and the processes that produce data) as a basic corporate asset and implement strategies to proactively improve them." Where data integrity is one of many responsibilities of people with no understanding of the concepts surrounding data integrity, dirty data is the result. Integrity issues receive attention in times of crisis, but as soon as the crisis is over, those with responsibilities other than data integrity turn to the pressing deadlines or daily tasks that they are responsible for. In a complex ERP environment, this can result in perpetual crisis management. In the following paragraphs we discuss the lessons learned from this case study.

5.1 Understanding and communicating new demands of an ERP system
Before the move from legacy applications to an ERP system takes place, considerable thought should be given to how the system change will change the roles of the users. The conversion to an ERP system is not just a data extraction, cleansing, transformation, and populate process to effectively
implement an ERP system. An organization needs a strategy and a plan. Atre (1998) points out that "legacy data is invariably in worse condition than you realize". Caldwell and Stein (1998) comment that "ultimately, by feeling their way through the initial shock of an ERP implementation – new business processes, new job roles, new management structures, and new technologies – companies are transforming themselves".

In this company there are 48 CSRs in the CSC. These CSRs are responsible for entering orders taken from domestic customers. Now, with the ERP, the items on these orders not only initiate the manufacturing, shipping, and invoicing functions, but also are the raw data used to generate the sales and marketing reports. The sales and marketing reports feed the decision-making processes that steer the business. The correct and consistent entering of these orders is critical to preventing dirty data.

Most CSRs believe that the order entry process has increased in complexity with the implementation of the ERP. Some estimate that the time required to enter an order has increased two to four times. The reasons for this widely-held perception are threefold. First, the ERP is still quite new – system glitches can mean several unsuccessful attempts at entering a single order and the eventual involvement of system support personnel in processing. Second, there are more steps to the order entry processes, and greater variation across product lines. Legacy systems were used for narrowly-defined transaction sets. For example, each of the four product groups in the company had their own manufacturing system. The homogeneity of the transactions and of the users meant that the legacy systems could be customized to accommodate those tasks without affecting the ability of other users to perform other tasks. Now that all users share a single system, transactions must be generalized to fit all tasks. Where customization cannot be automated, it becomes a manual part of user work processes – the order entry process varies greatly from product line to product line. Greater expertise is required on the part of the user, not only in the performance of their assigned tasks but also in those of others that are affected by their system transactions. The learning curve has been steeper than anyone imagined. Third, data entry skills are no longer enough to successfully enter orders – the ERP requires system savvy and an analytical approach. It has become critical that CSRs understand the logic behind the processes and the ramifications of their actions on-line. Many are inexperienced in this way of working. The combination of these factors has increased the occurrence of inaccurate, inconsistent data being entered on the ERP via sales orders, as CSRs attempt to complete their complex and time-consuming data entry work in the same amount of time they did prior to the ERP implementation and without a clear understanding of how that data is to be used by other functional areas in the business and by upper management for analysis and business decisions.

Lesson: Organizational users need to be educated and prepared for the changes that will take place as a result of ERP implementation.

5.2 Developing shared understanding of data
The lack of a shared understanding of the uses and value of data among those performing the same tasks and among those performing different tasks can lead to creation of dirty data. Tayi and Ballou (1998) point out that "the data gatherer and initial user may be fully aware of the nuances regarding the meaning of the various data items, but that will not be true for all of the other users". Where those performing the same tasks have a different understanding of the data being processed, inconsistencies are inevitable. For example, if the marketing services department members differ on whether abbreviations are to be used in customer master data, inconsistent entry is the result. Locating this data becomes difficult for CSRs because they cannot be sure if they are using the wrong abbreviation, or if the data has not been entered. The result of this lack of shared understanding is duplicate records – when the CSR cannot find the record that they are looking for, a new record is requested. Even if marketing services is able to locate the record and corrects the abbreviation before creating a duplicate record, both the CSR and marketing services have spent unnecessary time.

A lack of a shared understanding is common among data generators and report writers. A CSR knows that the promised ship date on an order with a production block is not valid, but a consultant writing a backlog report probably does not. As a result, the invalid date is published on the report. Geographical distances and functional barriers exacerbate this complexity. The further an employee is from another employee, and the less that employee understands what is required in the other's position, the less likely they are to share a common understanding of the importance of
the data each deals with. According to the CSC manager:

In the business right now, those entering the data and those using the data are so confused that there is little understanding of the data in the system. We are working with users AND the IT departments to share the knowledge about the entered, calculated AND extracted data. Without this, we are, and have been, subject to interpretation of a field with a title meaning different things to those entering versus using the data. We are finding how difficult it is to deal with a program written in another language, as field translations have always assisted users and IT people in the past. In our ERP, there is no such extra help available for those looking for field definitions and understandings.

Lesson: Champions of the ERP implementation project should ensure that all users understand the organizational data in a manner that is consistent throughout the organization.

5.3 Ownership of data and responsibilities
Responsibility for ensuring data integrity belongs to all employees. Tayi and Ballou (1998) comment: "The capability of judging the reasonableness of the data is lost when users have no responsibility for the data's integrity and when they are removed from the gatherers." Atre (1998) points out: "IT staff need help and cooperation from business users to identify and cleanse operational data. Users should be primarily responsible for determining the business value of data. Don't rely on systems integrators – they don't understand the business value of the data." One has also to consider the "politics" which play an important role. Often managers may agree that they own the data, but may want everybody to be involved in cleaning it. The manager of the CSC commented:

I believe that data integrity is the responsibility of every company employee. All positions, all departments are responsible for insuring the data they are entering, reviewing or utilizing is error free. It is the responsibility of every manager to make sure the tools are in place to insure data integrity for the data they are responsible for. ... In the past, users relied on the IT departments to make sure the edits were in place to make the data correct. With ERP systems and more user controlled systems and input, it is a joint responsibility. Users must understand systems better and IT personnel must understand business problems better in order for them to work together to achieve the highest level of data integrity. Too many IT people are good programmers but not good business analysts.

The marketing services staff is responsible for maintaining customer and salesforce master data, for testing and maintaining the ERP data structures that define transactional data, and for authoring and generating sales and marketing reports. Marketing services has been successful in guarding against duplicated records, misspelled names, inverted text, missing fields, outdated area codes and ZIP codes, and other kinds of dirty data in customer and salesforce master data by employing a combination of user training, well-defined procedures, and tight control and auditing of additions, changes, and deletes. Every CSR received at least four hours of training on the use and import of this master data. During that training, CSRs were asked to review the master data for their assigned customers and to advise marketing services of any necessary changes. New master data requires the completion of a form to ensure all necessary information is provided. Only two people in the marketing services department do the actual addition of the new data to ERP. In addition, an audit report is run regularly to identify changes made to the data. This report helps to catch mistakes and identify where additional training is required. A data steward, who is responsible expressly for protecting data integrity, should support the efforts of the CSRs and the marketing services department. This data steward would be responsible for raising awareness about data issues and implementing systematic procedures for data auditing and user training.

Lesson: Ensuring that all stakeholders of an ERP system understand their responsibilities with respect to maintaining data integrity will lead to a better quality system. Data that is a part of an ERP system belongs to an organization and not to any individual department or user.

5.4 Migrating legacy data
Ruber (1999) comments, "Migrating information from departmental databases and transaction-processing systems ... is a daunting task." He goes on to say that the "hardest part is cleansing the data, yet people tend to underestimate that part of the process." Legacy systems in corporations, which were created in different generations, create major stumbling blocks for migrating data to integrated application systems. Quick fixes that become embedded in the case of legacy systems create complexities that are difficult to overcome. Most of these systems are usually lenient with the data that is maintained, resulting in lack of data standards or documentation in the form of metadata. Before this data is migrated there is a need to clean it. An effective strategy for companies planning to implement integrated applications, such as ERP, may be to use automated tools for cleaning legacy data
before integrating it. Tools provided by vendors such as id.Centric, Vality Technology, HarteHanks, etc. (Knowles, 1997) may benefit organizations significantly. Sales of such tools, used for data extraction, refining and loading, were expected to reach $210 million by the end of 1999 (Kay, 1997).

The initial ERP implementation involved a programmatic load of legacy sales order backlog onto the ERP. The order load program was developed and tested over a period of months by a programmer familiar with the organization's business practices and a team of users. The load was simplified by the fact that the legacy system was well supported. That support meant not only that data to be converted was relatively clean, but also that the data in the legacy system was well-defined and understood – a program could be written to capture only relevant data. Unfortunately, the idiosyncrasies of the order entry process for the various product lines and of the ways in which CSRs entered the orders meant that no program could convert the data without some errors. Because the data integrity of the orders was so important, each converted order was reviewed on the ERP by the responsible CSR. Many data errors were caught and corrected during this review, including item quantity, material number, and ship-to errors. But some were missed. A tremendous amount of time has been spent and is still being spent to correct these errors. One of the most common errors involved contract release orders. The program designed to convert the data somehow selected and input the wrong material number into the converted order. The CSRs, possibly tired after consecutive 12-hour days of data verification, missed many material errors. These kinds of errors, though, are always found eventually – usually by the customer. The results of these errors were: shipment of the wrong materials; angry customers; time spent investigating the error; cost of processing credit orders and replacement orders; expedited production of the correct materials (resulting in late shipments of other orders); transportation costs for returning the wrong units; and/or cost of scrap or storage. No attempt has been made to assign a dollar value, though, to the results of this dirty data. Overall, though, this data migration was a success – the inevitable data errors were identified, some sooner, some later, and corrected. The success was due in large part to the fact that the legacy system was well supported, the migration process was well tested and documented, and those closest to the data verified the data after the migration.

Customer master data was loaded programmatically initially. Customer master data includes addressing for billing and shipping, tax identification numbers and designations of customer type and pricing levels. Migration from legacy systems to the ERP has allowed marketing services an opportunity to scrutinize and clean data maintained about our customers and sales force. More stringent master data requirements in the ERP, in fact, made this a necessity. For example, the legacy system had used an address in Varnons, Georgia, for one customer for years. This address was kicked out in the programmatic load of customer master data. On investigation, it was discovered that Varnons was not a city, but a stop on a railway line. The ERP will determine the zip code and county given a city and state. This not only ensures that the city and state are entered accurately, but also that the customer has provided a valid city and state combination. Customer data moved from the legacy to the ERP system were relatively cleaner than they had been on the well-maintained legacy system.

Migrating poorly-maintained legacy data
Atre (1998) comments that "you are likely to run into problems such as incompatible data formats, codes that no one can decipher, data that's embedded in long text fields, overlapping customer records from multiple systems, some with redundant data and others with conflicting or outdated data and even chunks of mystery data of long-forgotten provenance and uncertain ownership". Weston (1998) suggests using flags for dirty data that is migrated. As a result, a decision-maker can decide if he/she wants to use the information or leave it out during data analysis activities. The customer and salesforce master load for the migration from a much less well-maintained system was a more tedious and difficult procedure. Keeping that data clean on this legacy system was never a priority. The data was entered by the order entry group, as there was no position assigned to the management and control of master data for this system. Misspellings, duplicate records, and inconsistencies were the result of a lack of control over who could add, change, or delete customer master data, of instructions for proper management of the data, and of auditing procedures. The problems were exacerbated by the fact that, when the company purchased this facility, a completely new group of users began to enter this data. A lack of shared definitions of the components of the master data and their uses increased the number of discrepancies and errors. Where the original group might
define a salesperson as a customer or a vendor or an agent, the company group defined the salesperson as an agent only. Subsequently, where there was no agent-type record for a particular salesperson, one was created, thereby creating the potential for inaccurate reporting of sales data. Thus, before any master data could be moved to the ERP, each record had to be manually reviewed. The marketing services group again handled this process. Spelling errors, duplicate records, and incomplete data were addressed before the data was loaded to the ERP.
The sales order and production data on this system had been subject to inexplicable changes. For example, in November of 1998, the order entry group started to notice that some items on orders were being closed by the system for no apparent reason. Thus, they would never be built or shipped. The in-house support could not identify the cause or propose a solution, nor could the manufacturer of the software. The in-house support group advised CSRs to address these system-generated cancellations as they happened, a virtually impossible task. After much discussion, the support team agreed to write a report to locate these items.
The data on this legacy system was not well supported or understood. The data was in such poor condition that sales and marketing reports generated from system data were virtually useless. For example, the Canadian order entry location might enter orders using the same customer master record for different customer locations by overwriting the sold-to-address text to reflect the different location addresses. The domestic location would add new customer master records for each customer location. Existing reports could not accurately reflect these contradictory approaches.
These factors combined to make a programmatic migration of the sales order data to the ERP infeasible. Instead, sales orders were manually loaded onto the ERP by CSRs using an expanded backlog report. The lack of understanding of the way the system stores data, coupled with inaccuracies and inconsistencies in order entry and processing, made the writing of this report very difficult. For example, the initial run of the report included thousands of freight items. Freight items are added to sales orders by the shipping department to indicate carrier and shipment date of materials on the order; they are not backlogged. These were difficult to suppress in the report because of the inconsistent ways in which they have been added to the sales orders: some were loaded as text items, some as freight items, and some in a closed status, some in an open status. Attempts to suppress this data on the conversion order might have, without extensive testing, resulted in inadvertent suppression of materials that should be converted: the Miami order entry location uses freight items not to communicate shipment information, but to charge the customer.
This project, though, was also a success. While the manual conversion presented an opportunity for entry error, the process was largely error free. This can be attributed to the extensive testing of the backlog report serving as the basis for the conversion, simple comprehensive check-list style instructions for the CSRs in the use of the backlog report, and, most importantly, a group of CSRs now more comfortable and experienced in the use of the current ERP. Again, migration to the new ERP was a boon, because it drove the process of examining and cleaning current data.
Lesson: Migrating dirty data is a challenging task. Use of automated tools is a good strategy for organizations planning to implement integrated application systems. The most important factor is that the data needs to be cleaned before it is migrated to an ERP system.
5.5 Recognizing the complexity of integrated data
The integration of several business functions on a single system holds tremendous potential for reporting. All transactional data is now available from one source. Reporting that was difficult or not feasible in the past is now possible. This consolidation of functions onto one system has forced the various units of the business to develop a greater understanding of the work done by other units of the business and their interpretation of the data. With this potential, though, comes increased complexity. Tayi and Ballou (1998) point out "personnel databases situated in different divisions of a company may be correct but unfit for use if the desire is to combine the two and they have incompatible formats". Kilbane (1999) says "the problem is that data is, too often, in different formats and companies don't know how to properly bring it together and turn it into actionable information".
Locating data tables within the ERP system appropriate for the intended reporting has turned out to be more tedious and difficult than anyone imagined. Reports used by the salesforce and in manufacturing to describe sales order backlog have been found to be so error-ridden that they have been totally scrapped and rebuilt. Reports meant to describe incoming business took months to
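The pre-migration review of customer master data described in this section (spelling variants, duplicate records, incomplete data) was done manually. Part of such a review could be supported by a small script. The following Python sketch is an illustration only, not the procedure used in the case study; the record keys (`id`, `name`, `city`) and the normalization rule are assumptions made for the example.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace (assumed rule)."""
    kept = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(kept.split())


def find_review_candidates(records):
    """Return (duplicate_groups, incomplete_ids) to hand to a manual reviewer.

    records: list of dicts with the hypothetical keys 'id', 'name', 'city'.
    """
    groups = defaultdict(list)
    incomplete = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        city = (rec.get("city") or "").strip()
        if not name or not city:
            # Incomplete master data: flag it instead of guessing a value.
            incomplete.append(rec["id"])
            continue
        groups[(normalize(name), normalize(city))].append(rec["id"])
    duplicates = [ids for ids in groups.values() if len(ids) > 1]
    return duplicates, incomplete


# Hypothetical sample records, invented for this illustration.
customers = [
    {"id": 1, "name": "Acme Corp.", "city": "Miami"},
    {"id": 2, "name": "ACME  Corp", "city": "miami"},
    {"id": 3, "name": "Beta Supply", "city": ""},
    {"id": 4, "name": "Gamma Ltd", "city": "Toronto"},
]
duplicates, incomplete = find_review_candidates(customers)
print(duplicates)  # groups of record ids sharing a normalized name and city
print(incomplete)  # record ids with a missing name or city
```

Exact matching on normalized keys catches only formatting variants such as case, spacing, and punctuation. Genuine misspellings would still need manual review or a fuzzy-matching step.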
write. Several iterations of these reports were developed before the set currently in circulation was completed.
The information analyst describes an error that she stumbled across while researching another reporting data discrepancy. It seems that the same incoming business report was run for the month of March on April 1 and then again on April 3. She noticed that the totals were different. This should never occur: once the month is closed, no updating should occur. She indicates that locating the cause of a problem like this is difficult and time consuming and sometimes proves to be impossible. In this case, though, they were able to locate the source of the problem: the reporting structure was identifying the wrong date field as the determinant for which month a particular type of order would be allocated to. The correction of this structure error is perhaps more tedious than finding the cause of it: the field reference must be changed in more than 100 places in each of the several data structures. These and other integrity problems detected in the early going have meant that several manual adjustment schedules must be published with each run of this report; the data cannot be cleaned. The information analyst attributes these errors to a lack of comprehensive testing of the updating that occurs when these orders are processed. She cites a lack of communication between those that understand the way the company accrues and processes data and those responsible for building the data definition structures. As a result, some basic assumptions were made in the definition of data that were incorrect.
The complexity entailed by system integration is compounded by the marketing services staff's inexperience with the selected reporting bolt-on, the ERP data structures, and the architecture of the data itself. Basic reporting requirements to operate the business, coupled with this inexperience, have resulted in an inordinate reliance on consultants for report writing. While these consultants are skilled in report writing and the integration of ERP, their lack of understanding of company business and the transactional data and processes, and subsequent ERP configuration changes, has impeded accurate reporting.
Lesson: It takes time for users to comprehend and use integrated data as a result of using ERP packages. Care should be taken to ensure that all users understand the concept of integrated corporate data and use it accordingly.
5.6 Testing the new system
The costs of insufficient testing prior to implementation can be very high. Months of suspicions were confirmed in March, when it was discovered why incoming business numbers seemed too high. CSRs had been entering the sales credit designation on orders more than once. Whenever a new item is added to an existing ERP sales order, the ERP returns an error indicating that sales credit is missing. The correct action is to activate the existing sales credit designation on the order for the new items. This problem was not anticipated or clearly understood. Thus the correct handling was never made part of the ERP training for CSRs. So, CSRs generally entered an additional sales credit designation with each addition to an order. Some orders showed a sales credit allocation of 400 per cent or more of the net value of the order. The sales credit numbers are also used to report incoming business. In total, this data entry error resulted in an eight million-dollar overstatement of incoming business. Because this affected incoming business and not shipments or production, the cost was minimal financially. However, sales managers were forced to adjust sales engineer bonuses downward as a result of the discovery.
This data cannot be corrected on the ERP system. All adjustments had to be handled manually. Some preventative measures were immediately put in place. In the short term, additional training was provided to the people who enter orders to make them aware of the impact of this error. A daily report is being run to identify these errors as they are made, allowing on-line corrections. In the long term, ERP configuration changes have been requested to eliminate the misleading error message and to add messages when more than 100 per cent of the value of the order is allocated as sales credit.
According to the manager of the CSC: "The problem might have been prevented if we all knew how to test wrong. In all the massive testing done on order entry and reporting on it, not enough was done to try to enter bad data. Some of the edits seemed so self-evident, that their lack was almost impossible to comprehend. I think we are just now learning how important understanding and testing for dirty data is in a truly integrated system."
Lesson: Test, test and test again. Testing is a crucial aspect of implementing ERP solutions. There should be no short-cuts in testing. Different user groups should be involved in the testing process to ensure that all possible scenarios are used for testing the ERP system before the conversion to ERP is implemented.
5.7 Training
Lack of proper training can frustrate users when they begin using an ERP system in an
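The daily report mentioned in Section 5.6, which identifies orders whose sales credit allocation exceeds 100 per cent of order value, could be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the company's actual report: the order ids, the data layout, and the function name `over_allocated_orders` are hypothetical. The sample data deliberately includes a bad order, in the spirit of the CSC manager's remark about testing with wrong data.

```python
def over_allocated_orders(orders, limit_pct=100.0):
    """Return (order_id, total_pct) for orders whose sales credit exceeds limit_pct.

    orders: dict mapping a hypothetical order id to a list of sales credit
    percentages, one entry per sales credit designation keyed on the order.
    """
    flagged = []
    for order_id, credits in sorted(orders.items()):
        total = sum(credits)
        if total > limit_pct:
            flagged.append((order_id, total))
    return flagged


# Hypothetical data: on order "A-102" the designation was re-entered
# each time an item was added, as described in the case study.
orders = {
    "A-101": [100.0],
    "A-102": [100.0, 100.0, 100.0, 100.0],
    "A-103": [60.0, 40.0],
}
flagged = over_allocated_orders(orders)
for order_id, total in flagged:
    print(f"{order_id}: {total:.0f} per cent of order value allocated as sales credit")
```

The same check could also run at entry time as a warning, which is the kind of configuration change the company requested for the long term.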
organization. Caldwell and Stein (1998) point out the example of Amoco, where "managers found SAP so unfriendly they refused to use it. Few [of our] people use SAP directly because you have to be an expert". The authors further comment that in the case of Owens Corning, the organization found out that "the cultural and organizational impact on IT organizations is a little short of revolutionary". The entry and extraction of dirty data can be prevented with greater dedication to initial and on-going training for those responsible for entering and extracting data. A lack of time is typically cited as the reason for inadequate training. The time required to investigate, understand, correct, and prevent problems due to dirty data is considerably more, though, than that required simply to understand and prevent those problems. The additional cost of this reactive approach is the loss of shareholder confidence in the system, employees, and data. A significant training effort was put into teaching those that would be using and entering data in the system. Each CSR received in excess of 50 hours of training in the meaning and population of the various fields comprising the order entry screens. Order entry procedures are documented in detail and available to all CSRs. The difficulty lies in knowing how much training is enough, a difficult question to answer at conversion time, given the consultants' lack of understanding of the particular business and the employees' lack of understanding of the new system and the potential problem areas. There is no question, though, that additional training will be required after implementation to address the numerous unanticipated problems that will arise.
Lesson: On-going training is a prerequisite for success in implementing ERP systems. Organizations should plan ahead of time to train all users before and after the implementation. Periodic exchange of ERP experiences by users in an organization from their work environment will go a long way.
5.8 Prioritizing data maintenance
According to the CSC manager: "Data integrity is assigned a high priority at the management and IT level. It is not as high a priority at the middle manager and lower levels, as worrying about data integrity can slow down production, order entry, shipping, etc." The information analyst and the CSR expressed similar opinions when asked about the prioritization of data integrity at CPS. The information analyst indicated that data integrity was critical in the marketing services department, but prioritized much lower in departments dealing with the day-to-day operations.
Even marketing services, though, does not have a system in place to check data regularly for problems. The information analyst indicates that the department spends so much time "putting out fires" that there is little time left over for carrying out systematic data checks. The problem is exacerbated by a lack of tools for auditing. The information analyst attributes this to the newness of the implementation.
At present, data integrity is protected through a combination of system safeguards, user training, and data entry procedures. System safeguards are the result of building data integrity rules into ERP. For example, ERP will prevent a CSR from entering a ship-to address in a sold-to field. This is a hard error, preventing saving of the data. Soft error messages give the CSR the opportunity to review potentially erroneous data. Additionally, many fields are populated from drop-down boxes, eliminating the chance for misspelled entries or entries outside the acceptable domain for the field.
Lesson: Organizations should emphasize that maintaining data integrity is an on-going process and everybody needs to play an active role. Maintaining data integrity does not stop with the implementation of the ERP system.
5.9 Using consultants
Care must be taken to ensure that if consultants are hired for the transition project, the internal stewards of the system understand their work. For example, in this company, consultants were responsible in large part for developing data structures for the new system, and form the system metadata. These structures are used in conjunction with raw data to define the context of the data and to ensure that data reported is consistent with what is intended or required. For example, a structure may define incoming business as the value of the selling prices on sales orders not including items that have been cancelled. Thus reports providing incoming business data will not include cancelled items. Direct involvement by the manager of the CSC and the marketing services staff throughout the development ensured that data structures defined by the consultants matched data the way users of that data defined it. This prevents the possibility that, once the consultants leave the project, the users of the system do not understand the data that is being processed by the system.
Lesson: Hiring consultants to assist with the ERP implementation is an effective strategy if organizations ensure that all work done by the consultants is understood and documented. The ERP implementation
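The data structure example in Section 5.9, which defines incoming business as the value of selling prices on sales orders excluding cancelled items, can be written as a short calculation. The Python sketch below is illustrative only; the field names (`selling_price`, `status`) and the status values are assumptions, since the paper does not describe the ERP's actual tables.

```python
def incoming_business(order_items):
    """Sum the selling prices of sales order items, skipping cancelled items.

    order_items: list of dicts with the hypothetical keys
    'order', 'selling_price', and 'status'.
    """
    return sum(
        item["selling_price"]
        for item in order_items
        if item["status"] != "cancelled"
    )


# Hypothetical order items, invented for this illustration.
items = [
    {"order": "A-101", "selling_price": 1200.0, "status": "open"},
    {"order": "A-101", "selling_price": 300.0, "status": "cancelled"},
    {"order": "A-102", "selling_price": 500.0, "status": "closed"},
]
total = incoming_business(items)
print(total)  # the cancelled item is left out of the sum
```

Writing the definition down in one place, in a form that users can read and check, is one way to keep the consultants' structures aligned with how the business defines the measure.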
knowledge should not leave the organization after the consultants' work is completed.
5.10 Post-ERP implementation
Counteracting and preventing dirty data: current perceptions and practices
Data entry procedures have been created to control the potential damage accruing from dirty data. For example, CSRs are required to place a production block on orders for some material types. This gives the product group marketing departments an opportunity to review the order and correct any errors before production begins. Taken individually, this procedure seems like a reasonable safeguard. Taken together with all of the other exceptions and qualifiers to the basic order entry procedure based on material type, or product line, and order type, though, the procedures begin to seem like the source of the errors rather than the way to avoid them. As the procedures grow more complex, the likelihood of entering data accurately and consistently drops.
In almost every department within the business, the increased complexity of performing the job has meant more time required to do the same work. This means even less time and attention to data integrity issues and more dirty data: where someone may have taken the time to find out what an error message means and to address the data entry error prior to the implementation of ERP, now they may pass the error without addressing it because of the work backlog.
Out of necessity, though, where data integrity is compromised, user involvement in testing and reporting procedures is increasing. The correction will begin with an investigation by system/support and/or marketing services of the problem. Then, reporting tools will be generated to find all instances of the error. Finally, users will be enlisted to implement corrections. CSR involvement in the corrections is critical because of their intimacy with the data and as a training tool: those repeating the error most frequently will have the most corrections to make. The more corrections that the CSR is required to make, the greater the likelihood that they will be able to avoid the mistake in the future.
Counteracting and preventing dirty data: areas for improvement
A systematic approach should replace the more reactive crisis management approach to data integrity. Data audits should include daily integrity checks within the system and regular audits performed by user groups. Problems uncovered in those audits should be shared with all affected parties. The causes, effects, and resolutions of those problems should be systematically documented and stored so as to be easily accessible to interested users. If a similar problem occurs, documentation of other similar instances would be readily available. Where necessary, the communication should be followed up by training.
Regular training sessions should also be scheduled to ensure that users understand data integrity concepts and methods. These sessions would not only build a shared interpretation of data and preferred processing methods, but would also foster a more global perspective on the part of the users: instead of seeing only their own role, users would see their role in the context of the business. This perspective would assist in paring down some of the current procedural complexity. Simpler procedures would further increase data accuracy.
Equipped with an understanding of the impact of their work on other areas of the business, users can be analysts rather than data entry clerks. Analysts can make good decisions in a complex and dynamic work environment. Broadly-trained analysts would also be in a position to work effectively with consultants, thus reducing our reliance on them.
Performance measures should be taken regularly to gauge the effectiveness of and to improve on the system and training initiatives. All of these measures would directly improve data integrity and would serve to underline the importance of data integrity to all users. These measures would reduce errors in carrying out tasks throughout the business and all their associated costs and help to draw a sharper picture of the business to improve long- and short-term decision-making.
6.0 Conclusion
Implementing ERP systems requires reinventing the business. Several legacy systems are integrated in the process into a single integrated system for managing operations across the organization. Data that resided in dozens of disparate sources is now available through one integrated system for all users in an organization. To achieve success in ERP systems implementation, project champions should make sure that they address the relevant issues. Some of the key lessons from this study include, among others, the following issues:
• The champion of the ERP implementation project should ensure that the transformation is not viewed as an IT initiative, but rather as a business necessity.
This requires educating the stakeholders about the transition to an ERP.
• The champion for the change to ERP should recognize the value of data as an organizational resource and educate users about it. The issue of sharing corporate data and assigning responsibilities for managing it should be handled with a view to avoiding any political issues arising from owners of disparate data sources.
• The ERP implementation should be planned to prepare users for the change. The expectations based on new responsibilities should be outlined upfront to avoid any conflicts.
• The user community should be given time to accept the changes in their work environment to minimize the impact on organizational culture, such as overcoming comments like "we've always done it this way". Users should be encouraged to use the new system by providing incentives.
• All data that is migrated to an ERP system should be cleaned before the migration. Automated tools for data migration can be very useful for companies.
• Training users on a continual basis is very important. It is important that users do not get bogged down by activities that take up too much of their time.
• Extensive testing is required for implementing ERP systems. A good strategy would be to phase in the implementation rather than making a direct conversion.
• Consultants experienced with ERP implementation can be very helpful. Care must be taken to ensure that all the work done by consultants is documented for future use.
In this paper, we listed and discussed issues pertaining to ERP implementation. Though implementation in different organizations can vary based on the organizational culture and business needs, we feel that the lessons from this case study would be valuable for organizations planning to implement ERP systems.
References
Atre, S. (1998), "Beware dirty data", Computerworld, Vol. 32 No. 38, pp. 67-9.
Caldwell, B. and Stein, T. (1998), "New IT agenda", Information Week, No. 711, November, p. 30.
Ferriss, P. (1998), "Insurers victims of DBMS fraud", Computing Canada, Vol. 24 No. 36, 28 September, pp. 13-15.
Greengard, S. (1998), "Don't let dirty data derail you", Workforce, Vol. 77 No. 11, November, pp. 107-8.
Horwitz, A.S. (1998), "Ensuring the integrity of your data", Beyond Computing, Vol. 7 No. 4, May.
Kay, E. (1997), "Dirty data challenges warehouses", Software Magazine, October, pp. S5-S8.
Kilbane, D. (1999), "Are we overstocked with data", Automatic I.D. News, Cleveland, OH, Vol. 15 No. 11, October, pp. 75-9.
Knowles, A. (1997), "Dump your dirty data for added profits", Datamation, Vol. 43 No. 9, September, pp. 80-2.
Redman, T.C. (1995), "Improve data quality for competitive advantage", Sloan Management Review, Cambridge, Vol. 36 No. 2, Winter.
Ruber, P. (1999), "Migrating data to a warehouse", Beyond Computing, November/December, pp. 16-20.
Sellar, S. (1999), "Dust off that data", Sales and Marketing Management, New York, NY, Vol. 151 No. 5, May, pp. 71-3.
Stankovic, N. (1998), "Dual access: lower costs, tighten integration", Computing Canada, Vol. 24 No. 27, July, p. 30.
Tayi, G.K. and Ballou, D.P. (1998), "Examining data quality", Communications of the ACM, New York, NY, Vol. 41 No. 2, February, pp. 54-7.
Wallace, B. (1999), "Data quality moves to the forefront", Information Week Online, 30 September.
Weston, R. (1998), "Using dirty data", Computerworld, Vol. 32 No. 22, 1 June, p. 54.