Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Data Mining

422 vues

Publié le

  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

Data Mining

  1. 1. Information Is Your Business Volume 15, Number 10 www.dmreview.com data mining How to Buy Data Mining A Framework for Avoiding Costly Project Pitfalls in Predictive Analytics
  2. 2. data mining How to Buy H A Framework How does someone purchase an intangible, cryptic, seemingly immeasurable technology? Beyond the inher- for Avoiding ent up-front risks of engaging in what is essentially a discovery process, just identifying a starting point can be intimidating and mystifying. Despite its elusive nature, Costly Project data mining technology has surpassed the flash-in-the- pan “miracle tool” stigma with widespread and sustained success stories highlighted in mainstream publications, Pitfalls in Predictive along with recurring case studies of improved opera- tional efficiencies, enhanced business intelligence and Analytics residual payback. For any organization with annual rev- enues more than $50 million, employing data mining technology is not a matter of whether, but when. By Eric A. King Data mining has been seeping into mainstream business applications for more than two decades. Numerous case studies may be quickly referenced via a simple Internet or publication search. Its progress is unstoppable, propelled by sustained value justifications - yet stinted by the complexities of development, interpretation, integration and adoption. This article will suggest how to properly approach the starting line and
  3. 3. Data Mining how to implement a purposefully flexible (OLAP) or SQL queries. An example of between July 1 and August 15. For this framework for establishing an efficient OLAP or SQL queries would entail min- query, we know the exact question to ask of and effective organizational data mining ing a large repository to identify females the database. This practice typically explores process. between the ages of 28 and 45 from just 5 to 15 percent of a large database. New York, New Jersey and Delaware For the purposes of this article, data Cutting Through The Buzz with incomes between $65,000 to mining shall refer to computer-aided pat- Let’s first make sure that we’re on the $90,000 who purchased blue slacks tern discovery of previously unknown same page when talking about data min- interrelationships and recurrences across ing. It is not wholly incorrect to label seemingly unrelated attributes in order to data mining as retrospective searches predict actions, behaviors and outcomes. on a large database for specific Simply put, when referring to data mining criteria, otherwise known as in this article, we are looking at predic- online analytical processing tion derived from information hidden within large volumes of data rather than retrospection drawn from an OLAP or SQL query. It is important to recognize and relate much of the popular terminology that is thrown about in order to provide context going forward. Data mining tech- nology is not new. Methods for automating pattern discovery and pre- diction have existed for decades. Despite a considerable level of hype and strategic misuse, data mining has not only perse- vered but also matured and adapted for practical use in the business world. How could a community that is so data-rich, yet information-poor and profit-driven abandon a tool that can validate its own ability to predict customer behavior? Alongside the technology, terminol- ogy has evolved over the last four decades. Names from 40 years ago are still recognizable as common phrases today. In the ‘70s and ‘80s, names such as artificial intelligence and machine learn- ing that implied the computer had its own consciousness were somewhat oversold (perhaps even “over-souled”). The names of various data mining build- ing blocks such as neural networks, genetic algorithms and evolutionary computing deservedly carry Darwinist
  4. 4. tones of natural selection, as the underly- a blend of their perception of what data ing presented in this article has a perfect ing mathematics emulates biological mining is with a standard corporate prac- track record of matching performance to processes. From a mathematician’s per- tice for evaluating and purchasing prod- expectations. Data mining is essentially a dis- spective, these processes may be viewed ucts and services. The result is a popular covery process, which requires a purpose- as statistics on steroids. yet doomed approach: fully flexible framework with numerous In the ‘90s through the early ‘00s, the 1. Collect product literature from data checkpoints for assessment and adjustment. technology has been commonly referred mining tool vendors at industry events Be wary of any vendor who proposes to to as data mining and knowledge discov- or as advertised in journals. deliver a fully implemented data mining ery. However, due to the duality of the 2. Invite vendors whose retail price of their solution without early decision points. term data mining often referring to both flagship product fits within available Consulting firms with reputable names will OLAP and pattern discovery, a shift is discretionary budgets to visit on site. often win sizable data mining contracts and rapidly moving toward far more descrip- 3. Gain a free education in data mining proceed with a weak strategy free of tive and accurate nomenclature such as through subjective presentations at the checkpoints and adjustable stages. The predictive modeling and predictive ana- vendor’s expense (too many are anx- project quickly migrates into an exercise of lytics. In fact, this will probably be one of ious to chase any sales bait, qualified post-justifications, blame casting, contract- the last articles I write using the label data or otherwise). wiggling and backpedaling. mining - which is too mainstream to 4. Purchase a data mining tool from the The following five stages provide a abandon just yet. vendor who presented last. foundation to drive a successful data 5. Throw some data at the tool and await mining strategy and implementation. Just What is Data Mining? magical results. Is data mining considered a service? Is it 6. Stare at the numbers or even visualiza- 1. Training hardware? Software? A scored file? A sys- tions thereof, wondering why an angelic The best results in data mining are tem or a process? A customized solution? chorus did not accompany the results. achieved when a data mining expert com- There does not seem to be a consensus, 7. Without knowing whether the results bines experience with an organizational which makes data mining all the harder are useless or phenomenal, data min- domain expert. While neither needs to be to visualize, define, manage ... and pur- ing is dismissed as hyped and/or pie- fully proficient in the other’s field, it is cer- chase. Two people may discuss data min- in-the-sky technology. tainly beneficial to have a basic ground- ing and have entirely different concepts in The ultimate cost of a failed first pass ing across areas of focus. Even if a data mind. Of course, all of the previously can be tremendous. Not only will the mining project is entirely outsourced, mentioned descriptions are technically organization suffer opportunity costs from substantial advantages await the organi- correct. While the business community value never realized, but competitors will zation whose principals are trained to rec- may appropriately view data mining as a also have a greater window to capitalize on ognize elusive pitfalls, speak confidently productive, value-driven solution, that the benefits. Furthermore, morale will be about data mining methods, appreciate perspective focuses on the destination, adversely affected, which can wreak untold tradeoffs between accuracy and explain- not the journey. If credit were given to the havoc on any organization. ability, collaborate more effectively for best definition of data mining, process Ultimately, data mining will be uti- data preparation and interpret more would score the point. lized by all medium and large accurately the model’s results. Such Viewing data mining as organizations in one form knowledge can also serve well toward a process encompasses all or another. Not evaluating vendors, interacting with proj- the hard and soft employing predictive ect managers and effectively questioning resources, and implies analytics against a any suspect results or methods. a structured yet large repository of Numerous data mining conferences ongoing approach data (which all and public training courses exist. Many to an evolving medium and tool vendors have excellent instructors and optimization prob- large businesses worthwhile courses, particularly for their lem. When viewed have) would be customers. Most times, however, courses as a process, data analogous to offered by tool vendors restrict the scope of mining projects may building drilling content to highlight the capabilities of their be planned and p l a t f o r m s , product(s). Because tools should not be implemented in a pro- pipelines and stor- considered until later in the process, try to cedural way that all but age tanks with no identify vendor-neutral conferences and ensures success. As well, intentions for a refinery. courses to receive an unbiased, broad and expectations should be inher- Although the term data min- nonpromotional presentation. ently leveled to never expect a “final ing may fade, the technology will If staff or time simply does not exist to answer” nor anticipate a single pass. not. If a company makes a failed train internally, consider hiring an inde- When implemented properly, productive approach now, it will only need to repeat pendent data mining expert who may act results should be expected early and con- the attempt later. The question is whether as a liaison and third-party project advo- tinually improved. the organization will repeat its mistakes. cate between your organization and the main project vendor. The consultant should How Not to Buy Data Mining A Best-Practice Approach hold three qualities in combination: It is far too common for organizations to to Data Mining 1. He or she should be well-steeped in adapt their data mining project design to The recommended approach for data min- the data mining process with a strong
  5. 5. track record of application success. implementation. To name a few: do not pass the “so what” test strategically. 2. He or she should be multilingual - G Data Certification: A topical survey In this situation, the client and consultant able to converse fluently with analysts, of the structure and nature of the data stare at each other, wondering whether IT staff, users, directors and executive to support predictive analytics. they arrived at outstanding results or management. G Existing Resources: Additional overall failure. 3. Most importantly, he or she should be tools may be recommended to sup- The DMPA should be created inde- business-oriented, not rushing to ana- port or replace existing products. Are pendently, allowing the organization to lyze the data, but focusing first on the skills available in house to sup- freely choose how the resulting plan will be amassing a comprehensive under- port the modeling process after developed and implemented - whether by standing and assessment of the client’s deployment? What other technolo- the same services company who submitted business model and all available gies or methods have been used in it, another third-party vendor or the client resources, as well as any applicable the past? Are previous performance itself. The consultant who conducts the history, benchmarks and objectives. benchmarks available? DMPA should not incorporate proprietary Whether conducting your project G Stakeholder Objectives: Are components or aspects that internally or outsourcing, it pays to incor- the questions to which subjectively commit the porate the direction of a data mining executives seek answers resulting build-out expert. Working in concert with an orga- aligned with the exclusively to the nizational domain expert, the data mining resources amassed DMPA author. The consultant will form a symbiotic relation- in the findings? value of the ship that provides the benefits of knowl- Are there desired DMPA is all in edge transfer, redundancy and inherent and/or required the strategy, not reinforcement training. This accumulated performance the tactics. knowledge drives well-informed choices, levels? Are the The recom- validates sound judgment and combines benchmarks mendations the perspective that practically assures a realistic from the report from the solid definition of data mining project consultant’s DMPA will produce success, and then achieves it. experience? an overarching G F u n c t i o n a l project plan. Early 2. Assessment Managers: There are stages may be firmly This is the stage in which the true buy for many situations in which priced. However, later data mining occurs. Unfortunately, many companies are either unable or stages may only be estimated organizations are reluctant to engage in a unwilling to take the actions recom- because it cannot be known in advance data mining project assessment (DMPA), mended by the model. (In the words what information will be derived from the because they have been burned on assess- of Jack Nicholson in A Few Good Men, it data and how it should be leveraged. ments by services companies who basi- should be determined in advance if Newly discovered information can drive cally charge to exploit opportunity from “You can’t handle the truth!”) the remaining project in slightly different their client. When done properly, however, G Constraints: Are there hard bound- directions. Most times, there is not a the DMPA is an essential component of a aries that must be identified and built significant departure from the overarching successful data mining project. into the decision process - either before plan, but it is not realistic to ever fix the From the client’s perspective, any or after the model’s implementation? price of a data mining project beyond a few assessment is risky. The value of the results Because virtually all data mining near-term tasks. This does not make data is unknown in advance. A full data mining methods present a tradeoff between mining any easier to buy, but risk cannot implementation cannot be estimated in accuracy and explainability, a point be effectively managed without an dollars or time prior to this exercise. There on the scale should be defined. What adjustable, staged approach to a project are far too many unknown factors that can are tolerable levels of false positives that by its nature is about discovery of the dramatically affect the approach and scale or negatives from the model? unknown. A flexible structure and repeat- of a data mining project. Further, a DMPA G User Buy-in: If they won’t adopt it, able process with sound guidelines must be may reveal that an organization is not even why build it? How may the system be designed to effectively manage discovery. at the starting line - thus saving substantial designed to encourage dedicated use? time and money resources by preventing a G IT Support: While usually not a deal- 3. Strategy premature project. When performed by a killer, IT is typically far more willing to Data mining strategy is far too often over- reputable services company, this aspect of support the model’s function when looked or retrofitted to a resulting model. the assessment can arrive at the precise they are included in the strategy and Nearly all neophytes to the data mining opposite of exploiting opportunity by sav- are invited to become data mining process are anxious to run straight to the ing needless effort and expense in advance. advocates. If IT is going to support data and push whatever is readily avail- The DMPA should offer a compre- another project that requires data able into an analytical tool. While modern hensive situational report of findings that access, it helps if they can also appre- tools may help to some degree in data support a draft overarching plan (later ciate the high-level vision and benefits preparation, exploration and visualiza- described as the recommendations report). to the organization. tion, even the best tools on the market The findings report should manifest the Without the DMPA exercise, some cannot anticipate, interpret or implement readiness of numerous factors that need to modeling projects can be carried to com- around environmental and political be present for a successful data mining pletion tactically, but the results ultimately aspects of model integration. Moreover,
  6. 6. the modeler may take poor results and outset, too much industry expertise can sidered at this milestone, encouraging a proceed as though they were superior, or introduce subjectivity and preconceived progression of knowledge transfer while obtain great results (thanks to modern notions that may skew the way models pushing model innovation. software’s automation and wizards) and are developed and interpreted. While this article has presented not know it - essentially building excellent Models by their nature are objective, frameworks for both failing and succeed- models that answer the wrong questions. and the consultant should be, too. The ing in data mining, the most critical phase Most of the strategy framework is best results are achieved when the data is the data mining project assessment. All established during the DMPA. As discover- mining expert drives the model-building other aspects of the process are forgiving ies unfold and unforeseen information from process but not the results. The data min- and easily repairable. Foregoing a data the data is interpreted, the strategic direction ing consultant should then work with mining expert’s comprehensive situational may adjust somewhat, but usually not dra- your organization’s domain expert to and goal-driven assessment is a costly matically. This is why planning for a flexible mutually interpret the results, validate way to arrive at the false conclusion that framework is such a critical component for them and determine the most effective predictive analytics is overrated. a successful data mining implementation. way to make them useful. Once you have gained a base educa- Any vendor who claims to have a complete tion in data mining strategy and methods framework to fit your organization’s overall 5. Iteration and have commissioned a thorough project situation and forego a DMPA should be Many industry standard and best practice assessment, you will be well on your way regarded with some suspicion. The purpose process diagrams show data mining as a toward data mining success. When the of the DMPA is to assess the overall situa- linear process, ending with deployment. starting line is measured and the right tion and resources for data mining and Rather, an ongoing process toward ana- framework is established, the rest of the draft the overarching strategy to direct the lytic enlightenment is a more realistic journey is relatively straightforward. You project to completion - and beyond. expectation and effective mind-set from may proceed through the discovery process which to work. with a structured yet flexible plan for nearly 4. Implementation As part of the assessment stage, a feed- any scenario, confident that you will reap Thanks to automated software with effec- back strategy should be derived to capture tremendous rewards for having taken the tive wizards, the implementation is valuable performance results (in marketing, right approach to data mining. DMR arguably the easiest and least risky part of this is referred to as a solicitation file). Not a full-scale data mining project. It is far only is the results data used for model val- Eric A. King is president and founder better to have a mediocre model with idation, but it is also valuable fuel for the of The Modeling Agency (TMA). solid strategy than the inverse. next iteration of model building. The model TMA is a structured team of senior One misconception about selecting should be updated and enhanced with the consultants that provides training and consulting in an external data mining consultant is that latest performance data, perhaps even predictive modeling for those who are data-rich, yet information-poor. King holds a BS in computer science the consultant should also be an expert in weighting the latest feedback more heavily from the University of Pittsburgh and has focused on data your industry. It may be helpful for the to encourage a greater recency effect for mining business development and project management consultant to have background in order novel behavioral patterns. since 1990. Prior to TMA, King worked for NeuralWare, a to speak and interpret industry lingo The closing of this loop from results neural network tools company, and American Heuristics Corporation, an artificial intelligence consulting firm. while appreciating the competitive envi- interpretation to model update is an He may be reached at eric@the-modeling-agency.com or ronment and primary drivers, but unlike excellent opportunity for reinforcement (281) 667-4200 x210. building a knowledge base, it is actually training. Reconvene with your data min- preferable not to have the industry’s ing trainer or consultant to review the Editor’s Note: The Modeling Agency offers on-line, on-site, and public vendor-neutral courses in strongest domain expert who also hap- first pass and prepare for the next itera- predictive modeling for practitioners and managers pens to do some data mining. While the tion. Advanced or alternative approaches who are ready to implement data mining solutions. For consultant may appear impressive at the for the next model update should be con- info: www.the-modeling-agency.com/dmr-special ©2008 SourceMedia, Inc. and DM Review. All rights reserved. SourceMedia, One State Street Plaza, New York, N.Y. 10004 (800) 367-3989