A critical discussion on the statement "Enterprises today have access to large amounts of information from internal as well as external sources. The information typically comes in either structured or less structured forms. However, enterprises generally do not make the best use of the information they have access to, tending instead to focus on just internal structured data generated by core transactional systems"
Unstructured BI in pharmaceutical company
1. K6221 Business Intelligence Mini Assignment
K6221 Business Intelligence 2011-2012
Mini Assignment
Sesagiri Raamkumar Aravind (G1101761F)
Mane Shivaji Dilip Kumar (G1101841A)
“Enterprises today have access to large amounts of information from internal as well as external sources.
The information typically comes in either structured or less structured forms. However, enterprises
generally do not make the best use of the information they have access to, tending instead to focus on just
internal structured data generated by core transactional systems.”
Statement Elucidation
According to the problem statement, even though enterprises have access to a plethora of relevant information around
them, they make good use only of the structured data coming from traditional OLTP systems.
Internal and external unstructured data is not leveraged for making business decisions. Wittles (n.d.) asserts that only
20% of an organization’s data is structured and ready for use in BI data analysis. The remaining 80% is unstructured
data. The significance of unstructured data is therefore highly underestimated in most enterprises.
Scenario
The authors opt to critically discuss the problem statement based on a particular scenario. The scenario is
“'Marketing Director' of a major pharmaceutical company monitoring the performance of a newly launched
potential blockbuster drug in the Asia Pacific region (excluding Japan).”
Discussion
Large enterprises today rely on enormous and complicated information systems to fuel their growth and support
their daily operations and sustainability. In certain companies the amount spent on such systems even reaches
billions. In our scenario, pharmaceutical companies inevitably rely on unstructured data to lead the race
against competitors, as studies show that the average company makes decisions based on data that is 14 months old.
It has become clear that companies that can make faster decisions will lead their particular market. Strategic
adoption of IT systems is critical as it has a direct impact on the research, development and sales of
drugs (Dave). Enterprises have reached a stable stage with respect to setting up BI infrastructure that can handle
internal data extracted from different sources such as ERP and CRM systems. Enterprise data warehouses (EDW) are
updated daily with transactional data coming from different regions. Data from the EDW cascades to
region- or domain-specific data marts and operational data stores (ODS) to meet local reporting needs. Overall, the
EDW provides a good canvas for supporting the transactional and historical reporting needs of MIS, ESS and DSS systems.
A product launch is a major make-or-break event for a pharmaceutical company, which is under pressure to generate
revenue through short-term and long-term strategies so as to fund further R&D activities. A marketing
director cannot afford to rely entirely on transactional data for making sound business decisions. These decisions are
made to increase the visibility and saleability of the new drug in a particular market. As part of the job, the
marketing director would expect to get information about different aspects. Table 1.1 provides the details.
Sl. No | Information | Source | Type | Readily Available | Remarks
1 | Sales of drug in each market (split-up by day, region, distributor etc.) | Internal | Structured | Y | Assumption that internal DSS has data from all markets at required frequency
2 | Marketing cost in each market (by media); this includes free samples | Internal | Structured | Y | Assumption that internal DSS is integrated with CRM systems
3 | Perception about the drug from doctors, sales personnel, marketing staff, other internal staff and general public | Internal and External | Unstructured | N | Can be got only after collation from different sources
4 | Market share of new drug by value and volume by each market in comparison to other competitor drugs from same therapeutic area | External | Structured | N | Can be got at end of every quarter from market intelligence firms such as IMS
5 | Actuals vs Budget and Actuals vs Forecast comparison by each market | Internal | Structured | Y | Assumption that internal DSS has data from all markets at required frequency
6 | Details about dept level decisions recorded in documents | Internal | Unstructured | Y and N | 'Y' because readily available in repository and 'N' because not in integrated state
Table 1.1: Valuable information for pharmaceutical company during drug launch
It is clear that information about some important aspects is unstructured. Examples of unstructured data in
an enterprise are HTML content (e.g. web chat, blogs and web pages), documents (e.g. memos, research papers,
minutes of meetings and articles), forms (e.g. patent applications), emails, SMS content and multimedia content
(audio, video, images) (Ferguson, 2011; McCallum, 2005; SPSS, 2003).
Decision makers in a company have to rely on facts to make sound business decisions. The availability of sufficient
and timely facts can help in the process. In this case, the Marketing Director should be able to pull the required data
and the system should have the mechanism to push specific information as well. A distinction is made between data
and information because only information should be pushed to a user as he/she will not have time to analyze plain
facts without any context. Typical examples applicable to this case are listed below.
Pull data: sales and expenses data, market share, and supply chain inventory data.
Push information: supply chain deficiencies, summarized content delivery from analytics systems pertaining to
sentiment and opinion about the new drug from internal and external social media platforms, flash updates on sales,
and libel cases on the new drug from the FDA and other sources.
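The pull/push distinction above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DailySales` record, the sample figures, the function names and the 75% shortfall threshold do not come from any real system. Pull returns plain facts when the user asks for them, while push emits a short message that already carries context, and only when a condition is met.

```python
from dataclasses import dataclass


@dataclass
class DailySales:
    market: str
    units: int      # units actually sold
    forecast: int   # units forecast for the same period


# Hypothetical figures, for illustration only.
SALES = [
    DailySales("Singapore", 950, 1000),
    DailySales("Thailand", 400, 800),
]


def pull_sales(market, sales=SALES):
    """Pull: return the raw records for one market on request."""
    return [s for s in sales if s.market == market]


def push_alerts(sales=SALES, shortfall_ratio=0.75):
    """Push: build contextualised messages for markets whose sales
    fall below shortfall_ratio of the forecast (assumed threshold)."""
    alerts = []
    for s in sales:
        if s.units < shortfall_ratio * s.forecast:
            pct = round(100 * s.units / s.forecast)
            alerts.append(f"{s.market}: sales at {pct}% of forecast")
    return alerts


if __name__ == "__main__":
    print(pull_sales("Singapore"))
    print(push_alerts())
```

In this sketch the user still has to interpret what `pull_sales` returns, whereas `push_alerts` delivers a finished statement, which mirrors the data versus information distinction made above.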
The push type of information is mostly unstructured, which justifies its importance. The characteristics of
unstructured data are visibly and intrinsically different from those of transactional data. The differentiating
factors mainly relate to representation, source, context, understandability, timeliness and shelf-life. In general,
the characteristics of unstructured data are:
Does not reside in relational database tables.
Has no predefined structure or format.
Not arranged in any order.
Difficult to categorize for use in BI.
Resides in several documents over multiple sources:
    Internal (data within an organization)
    External (data outside the organization)
These characteristics make it difficult for technical personnel to store and catalog unstructured data in an Enterprise
Data Warehouse (EDW), in addition to the inherent difficulty of capturing the required data. The heterogeneous nature
of the sources adds to the complexity. Typical sources of unstructured data include email archives, call center
transcripts, customer feedback databases, enterprise intranets, enterprise content management systems, file
systems, document management systems, social networking sites and RSS newsfeeds (Ferguson 2011:6).
There are techniques for capturing and utilizing unstructured data. Crawlers can be used to capture relevant
information from the enterprise data ecosystem, social media sites and the WWW. The captured information is then
tagged and indexed for retrieval. The final stage is knowledge discovery, which involves text mining and
web mining (popularly called content analytics) to derive insight for business benefit.
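The tagging and indexing stages described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a description of any particular product: the sample documents stand in for crawled content, and the tag vocabulary, document ids and function names are all hypothetical. It applies keyword-based tags and builds a simple inverted index that supports retrieval by term.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for crawled content (not real data).
DOCS = {
    "email-001": "Doctors report good results with the new drug in Singapore",
    "blog-017": "Patients complain about side effects of the new drug",
    "memo-042": "Sales team asks for more free samples in Thailand",
}

# Hypothetical tag vocabulary; a real system would use a curated taxonomy.
TAG_KEYWORDS = {
    "sentiment": {"good", "complain", "side"},
    "supply": {"samples", "inventory"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag(tokens):
    """Return the sorted tags whose keywords appear among the tokens."""
    present = set(tokens)
    return sorted(t for t, kws in TAG_KEYWORDS.items() if present & kws)


def build_index(docs):
    """Build an inverted index mapping each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, *terms):
    """Return the sorted ids of documents that contain every query term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, "new", "drug"))
    for doc_id, text in DOCS.items():
        print(doc_id, tag(tokenize(text)))
```

Text mining proper (for example, sentiment scoring) would run on top of such an index; keyword tags are used here only to keep the sketch short.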
An ideal BI system should provide the ability to create Enterprise Mashups. Mashups integrate information and
functionality from different sources to create new services. These kinds of applications are
well suited to agile development projects and therefore to our scenario, in which data from different sources
must be viewed together to support decisions. However, there are a few challenges. Choosing the right information
sources amongst unstructured data and devising content sifting mechanisms are some known challenges. Mashups are an
emerging trend that is here to stay, as they provide a one-stop shop for decision makers.
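As a rough illustration of the mashup idea, the following Python sketch merges several independent sources into one view keyed by market. The source names, the figures and the `mashup` function are assumptions made for this example; a real enterprise mashup would draw on live services rather than in-memory dictionaries.

```python
# Hypothetical per-market sources, for illustration only.
sales_by_market = {"Singapore": 950, "Thailand": 400}           # structured, internal
sentiment_by_market = {"Singapore": 0.6, "Australia": -0.2}     # derived from unstructured content


def mashup(**sources):
    """Merge named {market: value} sources into one view per market.

    Markets missing from a source get None for that source, so gaps
    stay visible instead of being silently dropped.
    """
    markets = sorted(set().union(*(src.keys() for src in sources.values())))
    return {
        market: {name: src.get(market) for name, src in sources.items()}
        for market in markets
    }


if __name__ == "__main__":
    view = mashup(sales=sales_by_market, sentiment=sentiment_by_market)
    for market, row in view.items():
        print(market, row)
```

Keeping missing values as `None` reflects the source-selection challenge noted above: the combined view shows at a glance which sources cover which markets.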
Future considerations for handling unstructured data
Ensure that user content is accurately tagged.
Ensure that content is up-to-date and relevant.
Validate content sources.
Identify business drivers to get the best solution.
Allocate adequate processing power to analytics to address scalability issues.
Figure 1 gives a pictorial representation of the current usage of BI in pharmaceutical companies and the neglected
blue ocean segment of unstructured data BI.
Fig 1: Usage of Business Intelligence in a pharmaceutical company
Conclusion
Enterprises today are aware of the importance of unstructured data, but they fail to leverage it due
to technical (capturing and storing) and logical (classification and integration) constraints. This situation is bound
to improve with best practices and simpler technical processes. Returns on investment in Content Analytics and
Enterprise Mashups will definitely be realized in the long run.
References
Wittles, G. (n.d.). Unstructured data offers a vast store of untapped BI value. Retrieved from
http://www.themanager.org/strategy/Unstructured_data.htm
Dave, W. (n.d.). Unstructured data in life sciences. Retrieved from
http://blogs.hds.com/storagestat/2011/11/unstructured-data-in-life-sciences.html
Ferguson, M. (2011). Integrating and analyzing unstructured data. Info 360 BI Conference. Washington DC.
McCallum, A. (2005). Information extraction. Retrieved 17 February 2011 from
http://people.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf
SPSS. (2003). Meeting the challenge for text: Making text ready for predictive analysis. Chicago.
Grimes, S. (n.d.). Nimble intelligence: Enterprise BI mashup best practices. Retrieved from
http://www.jackbe.com/downloads/nimblebi_grimes.pdf