Semantic Web Investigation within Big Data
Context
Murad Daryousse
Damascus University-FITE
Abstract
Data is everywhere; nearly everything can be represented
by a number. In addition, in its simple form, data is pure,
a collection of measured information that, when analyzed
and processed, tells a story backed by numerical truth. On the other hand, challenges associated with the volume, variety, velocity, veracity, and value (the 5 V's) of this data need to be addressed when we process, analyze, and ultimately derive insight from it. Data characterized by these 5 V's is called "Big Data", and in this research we discuss how the Semantic Web, as a platform, can be utilized to address the challenges associated with each of the Big Data characteristics. We organize our work as a state-of-the-art survey of work and research in this context.
Keywords: Big Data, semantic web, linked data, state of the art.
1 Introduction
Recently, Big Data has made its appearance in the shared
mindset of researchers, practitioners, and funding
agencies, driven by the awareness that concerted efforts
are needed to address 21st century data collection,
analysis, management, ownership, and privacy issues.
While there is no generally agreed understanding of what
exactly is (or more importantly, what is not) Big Data, an
increasing number of V’s has been used to characterize
different dimensions and challenges of Big Data: volume,
velocity, variety, value, and veracity. Interestingly,
different (scientific) disciplines highlight certain
dimensions and neglect others. For instance,
supercomputing seems to be mostly interested in the
volume dimension while researchers working on sensor
webs and the internet of things seem to push on the
velocity front. The social sciences and humanities, in
contrast, are more interested in value and veracity. The
variety dimension seems to be the most intriguing one
for the Semantic Web and the one where we can
contribute most as a research community (Hitzler, et al.,
2013).
In the end, all V's have to be addressed in an interdisciplinary effort to substantially advance on the Big Data front. The 4th Paradigm of Science is yet another notion that has emerged in recent years and can be
understood as the scientific view on how Big Data
changes the very fabric of science. With the omnipresence
and availability of data from different times, locations,
perspectives, topics, cultures, resolutions, qualities, and
so forth, exploration becomes an additional (4th)
paradigm of science. This raises synthesis to a new level.
In other words, we can gain new insights by creatively
combining what is already there – an idea that seems to
align very well with Linked Data and Semantic Web
technologies as drivers of integration (Hitzler, et al.,
2013).
2 Characteristics of Big Data
We discuss the primary characteristics of the Big Data
problem as they pertain to the 5 V's.
2.1 Volume
The volume dimension of Big Data relates to the size of data from one or more data sources, measured in terabytes, petabytes, or exabytes (Anjomshoaa, et al., 2014). The sheer volume of data
being stored today is exploding. Of course, a lot of the
data that’s being created today isn’t analyzed at all
(Eaton, et al., 2012). Some estimates expect the volume of stored data to reach 35 zettabytes (ZB) by 2020. Twitter alone generates more than 7 terabytes (TB) of data every day, Facebook 10 TB, and some enterprises generate
terabytes of data every hour of every day of the year
(Eaton, et al., 2012). We are going to stop right there with
the factoids: Truth is, these estimates will be out of date
by the time you read this paper. However, the availability of fine-grained raw data is not sufficient unless we can analyze, summarize, or abstract it in meaningful ways
that are actionable (Thirunarayan, et al., 2014).
However, we still need to investigate how to effectively
translate large amounts of raw data into a few human
comprehensible nuggets of information necessary for
decision-making. Furthermore, privacy and locality
considerations require moving computations closer to the
data source, leading to powerful applications on
resource-constrained devices. In the latter situation, even
though the amount of data is not large by normal
standards, the resource constraints negate the use of
conventional data formats and algorithms, and instead
necessitate the development of novel encoding, indexing,
and reasoning techniques (Thirunarayan, et al., 2014). In
summary, the volume of data to be processed on available
resources creates the following challenges: (1) Ability to
abstract the data in a form that summarizes the situation
and is actionable, that is, semantic scalability (Sheth,
2011)(Sheth, 2013) to transcend from fine-grained
machine-accessible data to coarse-grained human
comprehensible and actionable abstractions; and (2)
Ability to scale computations to take advantage of
distributed processing infrastructure and to reason
efficiently on mobile devices where appropriate.
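To make the first of these challenges concrete, the following minimal Python sketch illustrates semantic abstraction in its simplest form: fine-grained sensor readings are mapped to coarse-grained, actionable labels. The thresholds and label names are hypothetical illustrations, not drawn from any cited ontology.

```python
# A minimal sketch of semantic scalability: raw, fine-grained readings
# are abstracted into coarse-grained, human-actionable labels.
# Thresholds and label names are hypothetical.

def abstract_heart_rate(bpm: float) -> str:
    """Map a raw heart-rate measurement to an actionable abstraction."""
    if bpm < 40:
        return "bradycardia-alert"   # coarse label a clinician can act on
    if bpm <= 100:
        return "normal"
    return "tachycardia-alert"

readings = [38, 72, 64, 118, 95]     # fine-grained machine data
print({abstract_heart_rate(r) for r in readings})
# {'bradycardia-alert', 'normal', 'tachycardia-alert'}
```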
2.2 Variety
Data today exists in various formats: texts, images, videos, audio, relational data, and so on. Quite simply, variety represents all types of data (Eaton, et al., 2012). With the explosion of sensors and smart devices, as well as social collaboration technologies, data has become more complex, because it includes not only traditional relational data, but also raw, semi-structured, and unstructured data from web pages, web log files (including clickstream data), search indexes, social media, e-mail, documents, sensor data from active and passive systems, and so on. This represents a fundamental shift in analysis requirements, from traditional structured data to raw, semi-structured, and unstructured data as part of the decision-making and insight process. Traditional analytic platforms cannot handle this variety because they were designed to handle only traditional structured (mostly relational) data. The truth of the matter is that 80% of the world's data is unstructured, or semi-structured at best (Eaton, et al., 2012). However, the value of Big Data can be realized only when we are able to draw insights from the various kinds of data available to us, both traditional and nontraditional. On the other hand, the knowledge that can be drawn from data has a mix of declarative and statistical flavors, capturing both qualitative and quantitative aspects that, when integrated, can provide complementary and corroborative information (Sheth, et al., 2012). In summary, the variety
in data formats and the nature of available knowledge
creates the following challenges: (1) Ability to integrate
and interoperate with heterogeneous data (to bridge
syntactic diversity, local vocabularies and models, and
multimodality); and (2) Semantic scalability
(Thirunarayan, et al., 2014).
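To illustrate the first of these challenges, the sketch below (assuming the rdflib Python library) lifts a relational-style record and a social media post into one RDF graph, so that the two heterogeneous sources can be queried uniformly. The example.org namespace, entities, and property names are hypothetical.

```python
# A minimal sketch of bridging variety with RDF (assumes rdflib).
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

# Structured (relational) source: a patient row
patient = EX["patient/42"]
g.add((patient, RDF.type, EX.Patient))
g.add((patient, EX.heartRate, Literal(118)))

# Unstructured source: a tweet mentioning the same entity
tweet = EX["tweet/9001"]
g.add((tweet, RDF.type, EX.Tweet))
g.add((tweet, EX.mentions, patient))
g.add((tweet, EX.text, Literal("Feeling dizzy again today")))

print(g.serialize(format="turtle"))  # both sources, one uniform model
```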
2.3 Velocity
The conventional understanding of velocity typically
considers how quickly the data is arriving and stored, and
its associated rates of retrieval (Eaton, et al., 2012). By this definition, velocity is little more than one of the causes of the data volumes we face, making it just another facet of the volume characteristic. We believe the idea of velocity, within the Big Data context, is actually far more compelling than this conventional definition. We agree that today's enterprises are dealing with petabytes of data instead of terabytes, and that the increase in sensors and other information streams has led to a constant flow of data at a pace that has made it impossible for traditional systems to handle.
Sometimes, getting an edge over your competition can
mean identifying a trend, problem, or opportunity only
seconds, or even microseconds, before someone else
(Eaton, et al., 2012). In addition, more and more of the
data being produced today has a very short shelf life, so
we must be able to analyze this data in near real-time if
we hope to find insights in it. After all, velocity does not refer only to the speed at which data is generated and stored, but also to the time required to exploit it. The
importance lies in the speed of the feedback loop
(Dumbill, 2012), taking data from input through to
decision. To accommodate velocity, a new way of
thinking about a problem must start at the inception point
of the data. This requires online algorithms to efficiently
crawl and filter relevant data sources, detect and track events and anomalies, and collect and update relevant
background knowledge (Thirunarayan, et al., 2014).
Another key challenge is quickly creating a relevant domain model or domain ontology on demand, so that it is useful for semantic searching, browsing, and analysis of real-time content. In summary, the rapid change in data
and trends creates the following challenges: (1) Ability to
focus on and rank the relevant data; (2) Ability to process
data quickly (such as incrementally) and respond; and (3)
Ability to cull, evolve, and hone in on relevant
background knowledge.
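As a toy illustration of the second of these challenges, the following sketch processes a stream in a single pass: a sliding-window running mean flags spikes as values arrive, with no store-first, reduce-later step. The window size and threshold are hypothetical tuning knobs.

```python
# A minimal sketch of an online (single-pass) algorithm: flag values
# that spike above a running mean over a sliding window, as the
# stream arrives. Window size and factor are hypothetical.
from collections import deque

def detect_spikes(stream, window=5, factor=2.0):
    recent = deque(maxlen=window)
    for value in stream:
        if len(recent) == window:
            mean = sum(recent) / window
            if value > factor * mean:    # incremental decision, no replay
                yield value
        recent.append(value)

sensor_stream = [10, 11, 9, 10, 12, 48, 11, 10]
print(list(detect_spikes(sensor_stream)))  # [48]
```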
2.4 Veracity
Generally, Big Data is characterized by the previous three V's (volume, variety, and velocity), but we think that Big Data can be better explained and characterized by adding a few more V's. These V's capture important aspects of Big Data and Big Data strategy that we cannot ignore. One of them is veracity: having a lot of data in different volumes coming in at high speed is worthless if that data is incorrect or incomplete. Incorrect data can cause many problems for organizations as well as for consumers. Veracity therefore refers to the degree to which we can be sure of the correctness and trustworthiness of data coming from many different heterogeneous sources. Statistical methods can be applied in the context of homogeneous data, while semantic models are necessary for heterogeneous data (Thirunarayan, et al., 2014). In
summary, determination of veracity of data creates the
following challenges: (1) Ability to detect anomalies and
inconsistencies in data that can be due to defective
sensors or anomalous situations; and (2) Ability to reason
about and with trustworthiness that exploits temporal
history, collective evidence, context, and conflict
resolution strategies for decision-making.
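As a minimal illustration of the second of these challenges, the sketch below scores each source's report by its agreement with the consensus (median) of all sources, a crude stand-in for gleaning trustworthiness from collective evidence. The source names, readings, and tolerance are hypothetical.

```python
# A minimal sketch of trust from collective evidence: a source is
# trusted if its report agrees with the consensus of all sources.
# Sources, readings, and tolerance are hypothetical.
from statistics import median

reports = {"sensor-A": 21.4, "sensor-B": 21.6, "sensor-C": 35.0}

consensus = median(reports.values())
tolerance = 2.0
trust = {src: abs(val - consensus) <= tolerance
         for src, val in reports.items()}
print(consensus, trust)
# 21.6 {'sensor-A': True, 'sensor-B': True, 'sensor-C': False}
```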
2.5 Value
Another V that characterizes Big Data is value: the ultimate goal of Big Data is to derive value from it. Of course, data in itself is not valuable at all. The value lies in the analyses done on that data, in how the data is turned into information, and eventually into knowledge. The value lies in how we use that data to turn our organization into an information-centric company that relies on insights derived from data analyses for its decision-making. A key challenge in getting this value is the acquisition, identification (e.g., of relevant knowledge in Linked Open Data (LOD)), construction, and application of the relevant background knowledge needed for data analytics and prediction (Thirunarayan, et al., 2014). This does not mean ignoring statistical techniques as part of the value-extraction process; in fact, semantic and statistical approaches are complementary and mutually beneficial. For example, we can combine statistical techniques and declarative knowledge in a hybrid approach in many situations (Perera, et al., 2013): statistical techniques can fill gaps in existing declarative knowledge, and conversely, declarative knowledge can be used for error detection and correction, and for compensating for incomplete data. In summary,
extracting value using data analytics creates the
following challenges: (1) Ability to acquire and apply
knowledge from data and integrate it with domain
ontology; and (2) Ability to learn and apply domain
models from novel data streams for classification,
prediction, decision-making, and personalization.
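The hybrid idea can be sketched in a few lines: a statistical step imputes a missing value, while a declarative domain rule vetoes implausible ones. This is only an illustration of the pattern, with a hypothetical rule and data, not the method of (Perera, et al., 2013).

```python
# A minimal sketch of the hybrid statistical/declarative pattern.
# The temperature data and plausibility rule are hypothetical.
from statistics import mean

observed = [36.5, 36.8, None, 37.0]   # a missing body-temperature reading

# Statistical step: impute the gap from the observed values
estimate = mean(v for v in observed if v is not None)

# Declarative step: domain knowledge about plausible human temperatures
def plausible(celsius: float) -> bool:
    return 30.0 <= celsius <= 43.0

filled = [v if v is not None else estimate for v in observed]
assert all(plausible(v) for v in filled)  # knowledge vetoes bad imputations
print(filled)
```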
3 Role of the semantic web in the
creation of value
As mentioned previously, the ultimate goal of Big Data is creating value by processing and analyzing this data. We have noticed a strong relationship between the challenges organized around the Big Data 5 V's and the need to deal with knowledge and semantics to address them. The question here is how these 5 V's and their related challenges are reflected in the process of value creation, and therefore in the data analysis process, and how we can apply semantic web concepts and technologies to overcome these challenges and ultimately obtain the desired value. To answer this question, we must recognize that there is currently a wide gap between the potential of Big Data analysis and its realization. Below we explain all phases of the pipeline that can create value from data, trying to address the related challenges with a semantic web mindset.
3.1 Big Data analysis pipeline
Before we can get value from data, it must pass through a number of distinct phases, as shown in Figure 1 below. Each phase has its own challenges, and
there are common challenges between all these phases.
Heterogeneity, scale, timeliness, complexity, and privacy
problems with Big Data impede progress at all phases of
the pipeline that can create value from data (Agrawal, et
al., 2012).
Figure 1: The Big Data analysis pipeline.
The problems start right away during data acquisition,
when the data tsunami requires us to make decisions,
currently in an ad hoc manner, about what data to keep
and what to discard, and how to store what we keep
reliably with the right metadata. Much data today is not
natively in structured format; for example, tweets and
blogs are weakly structured pieces of text, while images
and video are structured for storage and display, but not
for semantic content search (Agrawal, et al., 2012).
Transforming such content into a structured format for
later analysis is a major challenge. The value of data
explodes when it can be linked with other data, thus data
integration is a major creator of value. The semantic web
plays a big role here, where we can use its concepts and
standards like ontology and linked data principles to
realize this data integration and linkage. Below we discuss this in more detail.
3.1.1 Data acquisition and recording
Big Data does not arise out of a vacuum: it is recorded from data-generating sources (Agrawal, et al.,
2012). For example, consider our ability to sense and
observe the world around us, from the heart rate of an
elderly citizen, and the presence of toxins in the air we breathe, to the planned Square Kilometre Array telescope,
which will produce up to 1 million terabytes of raw data
per day. Similarly, scientific experiments and simulations
can easily produce petabytes of data today. These are the volume and velocity characteristics mentioned previously. However, much of this data is of no interest, and it
can be filtered and compressed by orders of magnitude.
One challenge is to define these filters in such a way that
they do not discard useful information. We need research
in the science of data reduction that can intelligently
process this raw data to a size that its users can handle
while not missing the needle in the haystack.
Furthermore, we require “on-line” analysis techniques
that can process such streaming data on the fly, since we
cannot afford to store first and reduce afterward. The
second big challenge is to automatically generate the
right metadata to describe what data is recorded and how
it is recorded and measured (Agrawal, et al., 2012). By defining or curating domain ontologies as conceptual coverage for the generated data, we can define such semantic filters. Furthermore, to address volume issues, we can use these ontologies to raise the level of abstraction in data processing to information that is meaningful for human activity, actions, and decision-making. This is what is called semantic perception (Henson, et al., 2011). Similarly, generating the right metadata can be done by relying on these ontologies in further analysis steps.
Besides using manually curated ontologies and reasoners
as discussed above, Linked Open Data (LOD) and
Wikipedia can be harnessed to overcome syntactic and
semantic heterogeneity with applications from social
media to the Internet of Things. On the other hand, to address velocity we need to deal with continuous
semantics. Formal modeling of evolving, dynamic,
domains and events is hard (Thirunarayan, et al., 2014).
First, we do not have many existing ontologies to use as
a starting point. Second, diverse users will have difficulty committing to a shared worldview, a problem further exacerbated
by contentious topics. Building domain models for
consensus requires us to pull background knowledge
from trusted, uncontroversial sources. Here, we can harvest the wisdom of the crowds, or collective intelligence, to build a lightweight ontology (an informal domain model) for use in tracking unfolding events by classifying, annotating, and analyzing streaming data. Therefore, more research is needed on the dynamic creation and updating of ontologies from social-knowledge sources such as Wikipedia and LOD, which offer exciting new capabilities for making real-time social and sensor data more meaningful and useful for advanced situational awareness, analysis, and decision-making.
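As one concrete illustration, the sketch below (assuming the SPARQLWrapper Python package and the availability of DBpedia's public endpoint) harvests the categories of a seed concept; such results are the kind of raw material from which a lightweight, informal domain model could be bootstrapped.

```python
# A minimal sketch of harvesting background knowledge from LOD
# (assumes the SPARQLWrapper package and a reachable DBpedia endpoint).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?category WHERE {
        <http://dbpedia.org/resource/Earthquake> dcterms:subject ?category .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

# Each category URI is a candidate concept for a lightweight ontology
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["category"]["value"])
```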
3.1.2 Data analysis prerequisites
Data analysis requires information extraction, data integration, aggregation, representation, and cleaning before we can analyze data effectively. Frequently, the
information collected will not be in a format ready for
analysis. For example, consider the collection of
electronic health records in a hospital, comprising
transcribed dictations from several physicians, structured
data from sensors and measurements (possibly with some
associated uncertainty), and image data such as x-rays
(Agrawal, et al., 2012); this is the data variety characteristic at work. We cannot leave the data in this form and still
effectively analyze it. Rather we require an information
extraction process that pulls out the required information
from the underlying sources and expresses it in a
structured form suitable for analysis. Doing this correctly
and completely is a continuing technical challenge. Note
that this data also includes images and will in the future
include video; such extraction is often highly application
dependent (e.g., what you want to pull out of an MRI is
very different from what you would pull out of a picture
of the stars, or a surveillance photo). Furthermore, we are
used to thinking of Big Data as always telling us the truth,
but this is actually far from reality. Existing work on data
cleaning assumes well-recognized constraints on valid
data or well-understood error models; for many emerging Big Data domains, these do not exist. This is the data veracity problem. Given the heterogeneity of the flood
of data, it is not enough merely to record it and throw it
into a repository (Agrawal, et al., 2012). With adequate
metadata, there is some hope, but even so, challenges will
remain due to differences in information details and in
data record structure. Data analysis is considerably more
challenging than simply locating, identifying,
understanding, and citing data. This requires differences
in data structure and semantics to be expressed in forms
that are computer understandable, and then “robotically”
resolvable (Agrawal, et al., 2012). If we have high-quality semantic metadata from the data acquisition and recording phase, then we suggest investigating linked data principles for data representation. In other words, we can use the RDF formalism to represent, integrate, interoperate with, structure, and link data as a graph of <subject, predicate, object> triples. This formalism can help address the variety issues of Big Data by representing it in a highly structured, machine-readable format. We can do that by using domain ontologies as background knowledge, in addition to benefiting from linked open data (e.g., DBpedia) for linking, integrating, and disambiguating our triples. The
remaining cleaning issues can be addressed by gleaning trustworthiness; this may require exploring robust domain ontologies and other information, like context, history, correlations, and metadata, that can distinguish erroneous data from data caused by an abnormal situation. On the other hand, data provenance
tracking and representation can be the basis for gleaning
trustworthiness (Manuel, et al., 2010). Unfortunately,
there is neither a universal notion of trust that is
applicable to all domains nor a clear explication of its
semantics or computation in many situations. The Holy
Grail of trust research is to develop expressive trust
frameworks that have both declarative-axiomatic and
computational specification, and to devise
methodologies for instantiating them for practical use, by
justifying automatic trust inference in terms of
application-oriented semantics of trust (Anantharam, et
al., 2013).
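To illustrate the linking step described above, the following sketch (again assuming rdflib) ties a locally extracted entity to its DBpedia resource with owl:sameAs, so that later integration and disambiguation can lean on LOD. The example.org namespace and entity names are hypothetical.

```python
# A minimal sketch of linking local triples to LOD for disambiguation
# (assumes rdflib; the local namespace is hypothetical).
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")
g = Graph()

local_entity = EX["city/damascus"]                   # extracted locally
lod_entity = URIRef("http://dbpedia.org/resource/Damascus")
g.add((local_entity, OWL.sameAs, lod_entity))        # anchor to LOD

print(g.serialize(format="turtle"))
```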
3.1.3 Query processing, data modeling, and
analysis
We have to think about querying and mining Big Data with a different mindset, where traditional query languages (e.g., SQL, SPARQL, and even NoSQL interfaces) and statistical analysis methods are not enough to realize the desired value. Big Data, as we mentioned, is often noisy,
dynamic, heterogeneous, inter-related and untrustworthy.
Further, interconnected Big Data (in linked data style)
forms large heterogeneous information networks, with
which information redundancy can be explored to
compensate for missing data, to crosscheck conflicting
cases, to validate trustworthy relationships, to disclose
inherent clusters, and to uncover hidden relationships and
models. In fact, data mining and analysis form a cyclic
process, where mining requires integrated, cleaned,
trustworthy, and efficiently accessible data, declarative
query and mining interfaces, scalable mining algorithms,
and Big Data computing environments. At the same time,
data mining itself can also be used to help improve the quality and trustworthiness of the data, understand its semantics, and provide intelligent query functions. The value of Big Data analysis can only be realized if it is applied under these difficult conditions (Agrawal, et al.,
2012). On the flip side, knowledge developed from data
can help in correcting errors and removing ambiguity.
Big Data is also enabling the next generation of
interactive data analysis with real-time answers, where
scaling complex query processing techniques to terabytes
while enabling interactive response times is a major open
research problem today (Agrawal, et al., 2012). In the
context of RDF data representation, we need new
methods that enable a tight coupling between declarative
query languages and the functions of analysis and
mining; this will benefit both the expressiveness and performance of the analysis.
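The kind of coupling we have in mind can be sketched as follows (assuming rdflib): a declarative SPARQL query selects the relevant triples, and a statistical step aggregates the results in the host language. The graph contents and namespace are hypothetical.

```python
# A minimal sketch of coupling a declarative SPARQL query with a
# statistical analysis step (assumes rdflib; data is hypothetical).
from statistics import mean
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
for i, rate in enumerate([62, 74, 118]):
    g.add((EX[f"patient/{i}"], EX.heartRate, Literal(rate)))

rows = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?rate WHERE { ?p ex:heartRate ?rate }
""")
print(mean(row.rate.toPython() for row in rows))  # analysis over query results
```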
3.1.4 Interpretation
Having the ability to analyze Big Data is of limited value
if users cannot understand the analysis (Agrawal, et al.,
2012). Ultimately, a decision-maker, provided with the
result of analysis, has to interpret these results. This
interpretation cannot happen in a vacuum. Usually, it
involves examining all the assumptions made and
retracing the analysis. Furthermore, there are many
possible sources of error: computer systems can have
bugs, models almost always have assumptions, and
results can be based on erroneous data. For all of these
reasons, no responsible user will cede authority to the
computer. Rather she will try to understand, and verify,
the results produced by the computer. The computer
system must make it easy for her to do so. This is
particularly a challenge with Big Data due to its
complexity. In short, it is rarely enough to provide just
the results. Rather, one must provide supplementary
information that explains how each result was derived,
and based upon precisely what inputs. Such
supplementary information is called the provenance of
the (result) data. By studying how best to capture, store,
and query provenance, in conjunction with techniques to
capture adequate metadata, we can create an
infrastructure to provide users with the ability both to
interpret analytical results obtained and to repeat the
analysis with different assumptions, parameters, or data
sets. With our semantic mindset, we believe the best way to provide such provenance is to rely on domain ontologies and LOD as a proof framework. Representing domain knowledge as an ontology using semantic web standards can help us justify analysis results. In other words, analysis results must be justifiable by the domain knowledge used.
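As a small illustration, the sketch below (assuming rdflib, which ships the W3C PROV-O vocabulary) annotates an analysis result with the activity that generated it and the dataset it was derived from; querying such triples lets a decision-maker retrace how a result was obtained. The resource names are hypothetical.

```python
# A minimal sketch of recording provenance with PROV-O (assumes rdflib;
# the result, activity, and dataset names are hypothetical).
from rdflib import Graph, Namespace
from rdflib.namespace import PROV, RDF

EX = Namespace("http://example.org/")
g = Graph()

result = EX["result/readmission-risk"]
analysis = EX["activity/regression-run-17"]
dataset = EX["data/ehr-2014"]

g.add((result, RDF.type, PROV.Entity))
g.add((analysis, RDF.type, PROV.Activity))
g.add((result, PROV.wasGeneratedBy, analysis))   # which analysis produced it
g.add((result, PROV.wasDerivedFrom, dataset))    # from which inputs

print(g.serialize(format="turtle"))
```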
4 Conclusion
We have entered an era of Big Data. Through better
analysis of the large volumes of data that are becoming
available, there is the potential for making faster
advances in many scientific disciplines and improving
the profitability and success of many enterprises.
In this paper, we have investigated how the semantic web can be an enabler for addressing many
aspects of Big Data challenges like heterogeneity, lack of
structure, error-handling, timeliness, and provenance at
all stages of the analysis pipeline from data acquisition to
result interpretation. These challenges are common
across a large variety of application domains, and
therefore not cost-effective to address in the context of
one domain alone. There are many more challenges, outside the scope of this paper, that need to be addressed before the potential of Big Data can be fully realized.
As a result, semantic web concepts and technologies can play a major role as a mediation layer that ultimately transforms Big Data as it is into big value. Finally, we must support and encourage fundamental research toward addressing these challenges from different perspectives if we are to achieve the promised benefits of Big Data.
References
Agrawal, D., et al. (2012). Challenges and Opportunities with Big Data: A white paper prepared for the Computing Community Consortium. USA.
Anantharam, P., Thirunarayan, K., & Sheth, A. (2013). Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases. Proceedings of the 2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013). Ohio Center of Excellence in Knowledge-Enabled Computing.
Anjomshoaa, A., Tjoa, A. M., & Hendrik (2014). Towards Semantic Mashup Tools for Big Data. Bali: Springer Berlin Heidelberg.
Dumbill, E. (2012). Big Data Now. USA: O'Reilly Media, Inc.
Eaton, C., et al. (2012). Understanding Big Data. USA: McGraw-Hill.
Flood, M., et al. (2011). Using Data for Systemic Financial Risk Management. Proceedings of the Fifth Biennial Conference on Innovative Data Systems.
Henson, C., Thirunarayan, K., & Sheth, A. (2011). An ontological approach to focusing attention and enhancing machine perception on the Web. Applied Ontology, 6(4). Amsterdam: IOS Press.
Hitzler, P., & Janowicz, K. (2013). Linked Data, Big Data, and the 4th Paradigm. IOS Press.
Manuel, J., & Pérez Gómez (2010). Provenance and Trust [Online]. SlideShare, Jun 30, 2010. http://www.slideshare.net/jmgomez23/provenance-and-trust
Manyika, J., et al. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.
Perera, S., et al. (2013). Semantics Driven Approach for Knowledge Acquisition From EMRs. IEEE Journal of Biomedical and Health Informatics, pp. 515-524.
Sheth, A., & Thirunarayan, K. (2012). Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications. Morgan & Claypool.
Thirunarayan, K., et al. (2014). Comparative trust management with applications: Bayesian approaches emphasis. ScienceDirect, pp. 182-199.
Thirunarayan, K., & Sheth, A. (2014). Semantics-Empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications. Dayton: AAAI.
More Related Content

What's hot

An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 
A Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesA Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesDr. Amarjeet Singh
 
hariri2019.pdf
hariri2019.pdfhariri2019.pdf
hariri2019.pdfAkuhuruf
 
Semantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and OpportunitiesSemantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and OpportunitiesCSCJournals
 
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata PrivacyTwo-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacydbpublications
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...ijcseit
 
Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Mr.Sameer Kumar Das
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
 
big-data-analytics-and-iot-in-logistics-a-case-study-2018.pdf
big-data-analytics-and-iot-in-logistics-a-case-study-2018.pdfbig-data-analytics-and-iot-in-logistics-a-case-study-2018.pdf
big-data-analytics-and-iot-in-logistics-a-case-study-2018.pdfAkuhuruf
 
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...AnthonyOtuonye
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyEditor IJCATR
 

What's hot (19)

An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Business analytics
Business analyticsBusiness analytics
Business analytics
 
A Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesA Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: Challenges
 
Big data
Big dataBig data
Big data
 
Data science unit1
Data science unit1Data science unit1
Data science unit1
 
hariri2019.pdf
hariri2019.pdfhariri2019.pdf
hariri2019.pdf
 
Semantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and OpportunitiesSemantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and Opportunities
 
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata PrivacyTwo-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy
 
Big data road map
Big data road mapBig data road map
Big data road map
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 
Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53
 
ANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEWANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEW
 
How does big data impact you
How does big data impact youHow does big data impact you
How does big data impact you
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
big-data-analytics-and-iot-in-logistics-a-case-study-2018.pdf
big-data-analytics-and-iot-in-logistics-a-case-study-2018.pdfbig-data-analytics-and-iot-in-logistics-a-case-study-2018.pdf
big-data-analytics-and-iot-in-logistics-a-case-study-2018.pdf
 
Big Data Ethics
Big Data EthicsBig Data Ethics
Big Data Ethics
 
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A Survey
 
BIG DATA RESEARCH
BIG DATA RESEARCHBIG DATA RESEARCH
BIG DATA RESEARCH
 

Similar to Semantic Web Investigation within Big Data Context

Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesKaran Deep Singh
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...ijcseit
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...ijcseit
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Thingspateelhs
 
Communications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docxCommunications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docxmonicafrancis71118
 
Introduction to Data Science 1118.pptx
Introduction to Data Science 1118.pptxIntroduction to Data Science 1118.pptx
Introduction to Data Science 1118.pptxmark828
 
Unit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfUnit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfRanjeet Bhalshankar
 
Understand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioUnderstand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioAI Publications
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIIJCSEA Journal
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIIJCSEA Journal
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIIJCSEA Journal
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdfAkuhuruf
 
s40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdfs40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdfAkuhuruf
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data scienceJohnson Ubah
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAudrey Britton
 

Similar to Semantic Web Investigation within Big Data Context (20)

Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and Issues
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Things
 
Communications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docxCommunications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docx
 
Introduction to Data Science 1118.pptx
Introduction to Data Science 1118.pptxIntroduction to Data Science 1118.pptx
Introduction to Data Science 1118.pptx
 
Unit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfUnit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdf
 
Understand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioUnderstand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present Scenario
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AI
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AI
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AI
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdf
 
s40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdfs40537-015-0030-3-data-analytics-a-survey.pdf
s40537-015-0030-3-data-analytics-a-survey.pdf
 
Sample
Sample Sample
Sample
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
Big Data Challenges faced by Organizations
Big Data Challenges faced by OrganizationsBig Data Challenges faced by Organizations
Big Data Challenges faced by Organizations
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data Analytics
 

Recently uploaded

Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Recently uploaded (20)

Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

Semantic Web Investigation within Big Data Context

  • 1. Semantic Web Investigation within Big Data Context Murad Daryousse Damascus University-FITE Abstract Data is everywhere; nearly everything can be represented by a number. In addition, in its simple form, data is pure, a collection of measured information that, when analyzed and processed, tells a story backed by numerical truth. On other hand challenges associated with (5V’s) volume, variety, velocity, veracity, and value of this data need to be addressed when we process, analyze, and ultimately derive insight from data. Data that characterised by this 5V’s is called “Big Data”, and we discuss in this research how the Semantic Web - as a platform - can be utilized to address challenges that associated with each of Big Data characteristics. We organize our work as a state of the art of works and researches in the same context. Keywords: Big Data, semantic web, linked data, state of the art. 1 Introduction Recently, Big Data has made its appearance in the shared mindset of researchers, practitioners, and funding agencies, driven by the awareness that concerted efforts are needed to address 21st century data collection, analysis, management, ownership, and privacy issues. While there is no generally agreed understanding of what exactly is (or more importantly, what is not) Big Data, an increasing number of V’s has been used to characterize different dimensions and challenges of Big Data: volume, velocity, variety, value, and veracity. Interestingly, different (scientific) disciplines highlight certain dimensions and neglect others. For instance, supercomputing seems to be mostly interested in the volume dimension while researchers working on sensor webs and the internet of things seem to push on the velocity front. The social sciences and humanities, in contrast, are more interested in value and veracity. The variety dimensions seems to be the most intriguing one for the Semantic Web and the one where we can contribute most as a research community (Hitzler, et al., 2013). At the end, all V’s have to be addressed in an interdisciplinary effort to substantially advance on the Big Data front. The 4th Paradigm of Science is yet another notion that has emerged within the last years and can be understood as the scientific view on how Big Data changes the very fabric of science. With the omnipresence and availability of data from different times, locations, perspectives, topics, cultures, resolutions, qualities, and so forth, exploration becomes an additional (4th) paradigm of science. This raises synthesis to a new level. In other words, we can gain new insights by creatively combining what is already there – an idea that seems to align very well with Linked Data and Semantic Web technologies as drivers of integration (Hitzler, et al., 2013). 2 Characteristics of Big Data We discuss the primary characteristics of the Big Data problem as is pertain to the 5V’s. 2.1 Volume Volume dimension of Big Data relates to the size of data from one or more data sources in Tera, Peta, or Exabyte (Anjomshoaa, et al., 2014). The sheer volume of data being stored today is exploding. Of course, a lot of the data that’s being created today isn’t analyzed at all (Eaton, et al., 2012). Some expectations point to this number to reach 35 Zettabyte (ZB) by 2020. Twitter alone generate more than 7 terabytes (TB) of data every day, Facebook 10 TB, and some enterprises generate terabytes of data every hour of every day of the year (Eaton, et al., 2012). 
We are going to stop right there with the factoids: Truth is, these estimates will be out of date by the time you read this paper. However, availability of fine-grained raw data is not sufficient unless we can analyze, summarize or abstract them in meaningful ways
  • 2. that are actionable (Thirunarayan, et al., 2014). However, we still need to investigate how to effectively translate large amounts of raw data into a few human comprehensible nuggets of information necessary for decision-making. Furthermore, privacy and locality considerations require moving computations closer to the data source, leading to powerful applications on resource-constrained devices. In the latter situation, even though the amount of data is not large by normal standards, the resource constraints negate the use of conventional data formats and algorithms, and instead necessitate the development of novel encoding, indexing, and reasoning techniques (Thirunarayan, et al., 2014). In summary, the volume of data to be processed on available resources creates the following challenges: (1) Ability to abstract the data in a form that summarizes the situation and is actionable, that is, semantic scalability (Sheth, 2011)(Sheth, 2013) to transcend from fine-grained machine-accessible data to coarse-grained human comprehensible and actionable abstractions; and (2) Ability to scale computations to take advantage of distributed processing infrastructure and to reason efficiently on mobile devices where appropriate. 2.2 Variety Data today exists in various formats like texts, images, videos, audios, relational data, and so on. Quite simply, variety represents all types of data (Eaton, et al., 2012), with explosion of sensors, and smart devices, as well as social collaboration technologies, data has become more complex, because it includes not only traditional relational data, but also raw, semi structured, and unstructured data from web pages, web log files (including click stream data), search indexes, social media, e-mail, documents, sensor data from active and passive systems, and so on. A fundamental shift in analysis requirements from traditional structured data to include raw, semi structured, and unstructured data as a part of decision-making and insight process. So traditional analytic platforms cannot handle variety because it designed for only handle traditional structured (mostly relational) data. The truth of the matter is that 80% of world’s data is unstructured or semi structured at best (Eaton, et al., 2012). However, the value of Big Data can be realized when we able to draw insights from the various kinds of data available to us, which include both traditional and nontraditional. On the other hand, available knowledge that can be drawn from data has a mix of declarative and statistical flavor, capturing both qualitative and quantitative aspects that when integrated can provide complementary and Corroborative information (Sheth, et al., 2012). In summary, the variety in data formats and the nature of available knowledge creates the following challenges: (1) Ability to integrate and interoperate with heterogeneous data (to bridge syntactic diversity, local vocabularies and models, and multimodality); and (2) Semantic scalability (Thirunarayan, et al., 2014). 2.3 Velocity The conventional understanding of velocity typically considers how quickly the data is arriving and stored, and its associated rates of retrieval (Eaton, et al., 2012). This definition of velocity is nothing more than one of the reasons of data volumes that we are looking at, which make it as one of Big Data volume’s characteristics. We believe the idea of velocity, within Big Data context, is actually something far more compelling than this conventional definition. 
2.3 Velocity

The conventional understanding of velocity considers how quickly data arrives and is stored, and its associated rates of retrieval (Eaton, et al., 2012). Under this definition, velocity is little more than one of the causes of the data volumes we face, making it just another facet of the volume dimension. We believe the idea of velocity, in the Big Data context, is something far more compelling than this conventional definition. Today's enterprises are dealing with petabytes of data instead of terabytes, and the proliferation of sensors and other information streams has led to a constant flow of data at a pace that traditional systems cannot handle. Sometimes, getting an edge over the competition means identifying a trend, problem, or opportunity only seconds, or even microseconds, before someone else (Eaton, et al., 2012). In addition, more and more of the data being produced today has a very short shelf life, so we must be able to analyze it in near real time if we hope to find insights in it. Velocity, then, concerns not only the speed at which data is generated and stored but also the time required to exploit it; the importance lies in the speed of the feedback loop (Dumbill, 2012), taking data from input through to decision. To accommodate velocity, a new way of thinking about a problem must start at the inception point of the data. This requires online algorithms that efficiently crawl and filter relevant data sources, detect and track events and anomalies, and collect and update relevant background knowledge (Thirunarayan, et al., 2014). Another key challenge is the rapid, on-demand creation of a relevant domain model or ontology for semantic searching, browsing, and analysis of real-time content. In summary, the rapid change in data and trends creates the following challenges: (1) the ability to focus on and rank the relevant data; (2) the ability to process data quickly (for example, incrementally) and respond; and (3) the ability to cull, evolve, and hone in on relevant background knowledge. A sketch of such incremental, window-based processing follows.
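As a rough illustration of incremental processing, the following self-contained Python sketch flags anomalous readings against a short sliding window; the window size and the three-sigma rule are illustrative choices, not prescribed values:

```python
import collections
import statistics

class SlidingWindowDetector:
    """Incrementally flag anomalous readings against a short history window.

    A toy stand-in for the online filter-and-detect step; the defaults
    below are illustrative, not tuned values.
    """

    def __init__(self, window_size=100, sigmas=3.0):
        self.window = collections.deque(maxlen=window_size)
        self.sigmas = sigmas

    def observe(self, value):
        """Return True if the new reading looks anomalous, then record it."""
        anomalous = False
        if len(self.window) >= 2:
            mean = statistics.fmean(self.window)
            stdev = statistics.stdev(self.window)
            anomalous = stdev > 0 and abs(value - mean) > self.sigmas * stdev
        self.window.append(value)
        return anomalous

detector = SlidingWindowDetector()
for reading in [71, 70, 72, 69, 71, 140]:   # the last reading should be flagged
    if detector.observe(reading):
        print("anomaly:", reading)
```

The point is architectural: each reading is examined the moment it arrives and nothing outside the window is retained, which is what keeps the feedback loop from input to decision short.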
2.4 Veracity

Big Data is generally characterized by the previous three V's (volume, variety, and velocity), but we think it can be better explained by adding a few more. These additional V's capture important aspects of Big Data and of Big Data strategy that we cannot ignore. One of them is veracity: having a lot of data, in many forms, arriving at high speed is worthless if that data is incorrect or incomplete. Incorrect data can cause serious problems for organizations as well as for consumers. Veracity therefore concerns the degree to which we can be sure of the correctness and trustworthiness of data coming from many heterogeneous sources. Statistical methods can be applied in the context of homogeneous data, while semantic models are necessary for heterogeneous data (Thirunarayan, et al., 2014). In summary, determining the veracity of data creates the following challenges: (1) the ability to detect anomalies and inconsistencies in data that may be due to defective sensors or genuinely anomalous situations; and (2) the ability to reason about and with trustworthiness, exploiting temporal history, collective evidence, context, and conflict-resolution strategies for decision-making.

2.5 Value

Another V that characterizes Big Data is value: the ultimate goal of Big Data is to derive value from it. Data in itself is not valuable at all. The value lies in the analyses done on that data, in how the data is turned into information and eventually into knowledge, and in how we use it to turn our organisation into an information-centric company that relies on insights derived from data analyses for its decision-making. A key challenge in getting this value is the acquisition, identification (e.g., of relevant knowledge on Linked Open Data (LOD)), construction, and application of the background knowledge needed for data analytics and prediction (Thirunarayan, et al., 2014). This does not mean ignoring statistical techniques as part of the value-extraction process; in fact, semantic and statistical approaches are complementary and mutually beneficial. In many situations we can use statistical techniques and declarative knowledge as a hybrid approach (Perera, et al., 2013): statistical techniques can fill gaps in existing declarative knowledge, while, conversely, declarative knowledge can be used for error detection and correction and to compensate for incomplete data, as the sketch below illustrates. In summary, extracting value using data analytics creates the following challenges: (1) the ability to acquire and apply knowledge from data and integrate it with a domain ontology; and (2) the ability to learn and apply domain models from novel data streams for classification, prediction, decision-making, and personalisation.
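The following toy Python sketch shows one such hybrid in miniature: a statistical extractor proposes facts with confidences, and declarative range constraints of the kind one might draw from a domain ontology veto or flag the implausible ones. The ranges, thresholds, and property names are invented for illustration:

```python
# Declarative domain constraints: plausible value ranges per property.
# (Invented numbers standing in for knowledge from a domain ontology.)
PLAUSIBLE_RANGE = {"heart_rate": (30, 220), "body_temp_c": (30.0, 45.0)}

def reconcile(candidates):
    """candidates: list of (property, value, confidence) from a learner."""
    accepted = []
    for prop, value, conf in candidates:
        low, high = PLAUSIBLE_RANGE.get(prop, (float("-inf"), float("inf")))
        if low <= value <= high:
            accepted.append((prop, value, conf))
        elif conf < 0.9:
            # Declarative knowledge overrides a low-confidence extraction.
            print(f"rejected {prop}={value} (confidence {conf})")
        else:
            # High confidence but implausible: flag for human review instead.
            print(f"flagged {prop}={value} for review")
    return accepted

print(reconcile([("heart_rate", 71, 0.95),
                 ("heart_rate", 7100, 0.40),   # likely an extraction error
                 ("body_temp_c", 52.0, 0.97)]))
```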
3 Role of the semantic web in the creation of value

As mentioned previously, the ultimate goal of Big Data is to create value by processing and analysing the data. We have noticed a strong relationship between the challenges organized around the 5V's of Big Data and the need to deal with knowledge and semantics in order to address them. The question is how these 5V's and their related challenges are reflected in the process of value creation, and therefore in the data analysis process, and how we can use semantic web concepts and technologies to overcome these challenges and obtain the desired value. To answer this question, we must recognize that there is currently a wide gap between the potential of Big Data analysis and its realization. Below we explain the phases of the pipeline that creates value from data, addressing the related challenges with the mentality of the semantic web.

3.1 Big Data analysis pipeline

Before we can get value from data, the data must pass through a number of distinct phases, as shown in Figure 1 below. Each phase has its own challenges, and some challenges are common to all of them: heterogeneity, scale, timeliness, complexity, and privacy problems with Big Data impede progress at every phase of the pipeline (Agrawal, et al., 2012).

Figure 1: The Big Data analysis pipeline.

A sketch of the pipeline as a chain of composable stages follows.
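As a minimal sketch of the pipeline's shape, the following Python fragment chains placeholder stages named after the phases discussed in the subsections below (acquisition, extraction, integration, analysis, interpretation); the stage bodies are invented stand-ins, not real implementations:

```python
from functools import reduce

def acquire(source):        # acquisition and recording (3.1.1)
    return [r for r in source if r is not None]   # drop empty readings

def extract(records):       # information extraction and cleaning (3.1.2)
    return [{"value": r} for r in records]

def integrate(records):     # integration, aggregation, representation (3.1.2)
    return {"records": records, "count": len(records)}

def analyze(dataset):       # modeling and analysis (3.1.3); assumes count > 0
    return {"mean": sum(r["value"] for r in dataset["records"]) / dataset["count"]}

def interpret(result):      # interpretation by the decision-maker (3.1.4)
    return f"average reading: {result['mean']:.1f}"

PIPELINE = [acquire, extract, integrate, analyze, interpret]
print(reduce(lambda data, stage: stage(data), PIPELINE, [71, None, 69]))
```

The composition is the point: each phase's challenges, discussed below, are local to one stage, but the common problems (heterogeneity, scale, timeliness, privacy) cut across the whole chain.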
The problems start right away, during data acquisition, when the data tsunami requires us to make decisions, currently in an ad hoc manner, about what data to keep and what to discard, and how to store what we keep reliably with the right metadata. Much of today's data is not natively in a structured format; for example, tweets and blogs are weakly structured pieces of text, while images and video are structured for storage and display but not for semantic content search (Agrawal, et al., 2012). Transforming such content into a structured format for later analysis is a major challenge. The value of data explodes when it can be linked with other data, so data integration is a major creator of value. The semantic web plays a big role here: we can use its concepts and standards, such as ontologies and linked data principles, to realize this integration and linkage. Below we discuss this in more detail.

3.1.1 Data acquisition and recording

Big Data does not arise out of a vacuum: it is recorded from some data-generating source (Agrawal, et al., 2012). Consider, for example, our ability to sense and observe the world around us, from the heart rate of an elderly citizen and the presence of toxins in the air we breathe to the planned Square Kilometre Array telescope, which will produce up to 1 million terabytes of raw data per day. Similarly, scientific experiments and simulations can easily produce petabytes of data today. This is the volume and velocity we discussed previously, but much of this data is of no interest, and it can be filtered and compressed by orders of magnitude. One challenge is to define these filters in such a way that they do not discard useful information. We need research in the science of data reduction that can intelligently process raw data down to a size its users can handle without missing the needle in the haystack. Furthermore, we require "on-line" analysis techniques that can process such streaming data on the fly, since we cannot afford to store first and reduce afterward. The second big challenge is to automatically generate the right metadata to describe what data is recorded and how it is recorded and measured (Agrawal, et al., 2012). By defining or curating domain ontologies as conceptual coverage for the generated data, we can define such semantic filters. Furthermore, to address volume issues, we can use these ontologies to raise the level of abstraction from raw data to information that is meaningful for human activity, actions, and decision-making; this is what is called semantic perception (Henson, et al., 2011). Similarly, the right metadata can be generated by relying on the same ontology in further analysis steps. Besides manually curated ontologies and reasoners, Linked Open Data (LOD) and Wikipedia can be harnessed to overcome syntactic and semantic heterogeneity, with applications ranging from social media to the Internet of Things. To address velocity, on the other hand, we need to deal with continuous semantics. Formal modeling of evolving, dynamic domains and events is hard (Thirunarayan, et al., 2014). First, we do not have many existing ontologies to use as a starting point. Second, diverse users will have difficulty committing to a shared worldview, a problem further exacerbated by contentious topics. Building domain models for consensus requires pulling background knowledge from trusted, uncontroversial sources. Here we can harvest the wisdom of the crowds, or collective intelligence, to build a lightweight ontology (an informal domain model) for use in tracking unfolding events by classifying, annotating, and analyzing streaming data, as in the sketch below.
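A toy Python illustration of such annotation: the "lightweight ontology" here is just a label-to-concept map of the sort that could be bootstrapped from Wikipedia or LOD, and both the labels and the DBpedia URIs are illustrative choices:

```python
# A label -> concept-URI map standing in for a crowd-built lightweight
# ontology; entries would be harvested and updated as an event unfolds.
LIGHTWEIGHT_ONTOLOGY = {
    "flood":      "http://dbpedia.org/resource/Flood",
    "evacuation": "http://dbpedia.org/resource/Emergency_evacuation",
    "shelter":    "http://dbpedia.org/resource/Emergency_shelter",
}

def annotate(post):
    """Attach concept URIs to a streaming post by simple label matching."""
    text = post.lower()
    return [uri for label, uri in LIGHTWEIGHT_ONTOLOGY.items() if label in text]

for post in ["Flood waters rising near the bridge",
             "Evacuation ordered for the east side"]:
    print(post, "->", annotate(post))
```

A real system would use entity linking rather than substring matching and would update the label map as the event unfolds; the point is only that even an informal domain model makes otherwise opaque stream items comparable and aggregatable.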
More research is therefore needed on the dynamic creation and updating of ontologies from social-knowledge sources such as Wikipedia and LOD, which offer exciting new capabilities for making real-time social and sensor data more meaningful and useful for advanced situational awareness, analysis, and decision-making.

3.1.2 Data analysis prerequisites

Data analysis requires information extraction and data integration, aggregation, representation, and cleaning before we can analyse data effectively. Frequently, the information collected is not in a format ready for analysis. Consider, for example, the collection of electronic health records in a hospital, comprising transcribed dictations from several physicians, structured data from sensors and measurements (possibly with some associated uncertainty), and image data such as x-rays (Agrawal, et al., 2012); this is the data variety discussed earlier. We cannot leave the data in this form and still analyze it effectively. Rather, we require an information extraction process that pulls the required information out of the underlying sources and expresses it in a structured form suitable for analysis, as in the sketch below. Doing this correctly and completely is a continuing technical challenge.
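As a minimal illustration of such an extraction pass, the following Python sketch pulls two numeric fields out of a transcribed dictation with regular expressions; the patterns, field names, and sample note are invented, and a real clinical extractor would need far more sophisticated NLP:

```python
import re

# Toy patterns mapping free text to named numeric fields.
PATTERNS = {
    "heart_rate":  re.compile(r"heart rate (?:of )?(\d+)"),
    "temperature": re.compile(r"temperature (?:of )?(\d+(?:\.\d+)?)"),
}

def extract_fields(dictation):
    """Pull named numeric fields out of free text into a structured record."""
    record = {}
    text = dictation.lower()
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[field] = float(match.group(1))
    return record

note = "Patient presented with a heart rate of 71 and temperature of 36.8."
print(extract_fields(note))   # {'heart_rate': 71.0, 'temperature': 36.8}
```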
Note that this data also includes images and will in the future include video; such extraction is often highly application-dependent (e.g., what you want to pull out of an MRI is very different from what you would pull out of a picture of the stars or a surveillance photo). Furthermore, we are used to thinking of Big Data as always telling us the truth, but this is far from reality. Existing work on data cleaning assumes well-recognized constraints on valid data or well-understood error models; for many emerging Big Data domains these do not exist, and this is the data veracity issue. Given the heterogeneity of the flood of data, it is not enough merely to record it and throw it into a repository (Agrawal, et al., 2012). With adequate metadata there is some hope, but even so, challenges will remain due to differences in information detail and in data record structure. Data analysis is considerably more challenging than simply locating, identifying, understanding, and citing data; it requires differences in data structure and semantics to be expressed in forms that are computer-understandable and then "robotically" resolvable (Agrawal, et al., 2012). If we have high-quality semantic metadata from the data acquisition and recording phase, we suggest investigating linked data principles for data representation. In other words, we can use the RDF formalism to represent, integrate, interoperate, structure, and link data as a graph of <subject, predicate, object> triples. This formalism can help address the variety issues of Big Data by representing it in a highly structured, machine-readable format. We can do that by drawing on a domain ontology as background knowledge, in addition to benefiting from linked open data (e.g., DBpedia) for linking, integrating, and disambiguating our triples. The remaining cleaning issues can be addressed by gleaning trustworthiness, which may require exploiting robust domain ontologies and other information, such as context, history, correlations, and metadata, that can distinguish between erroneous data and data caused by a genuinely abnormal situation. Data provenance tracking and representation can also serve as the basis for gleaning trustworthiness (Manuel, et al., 2010). Unfortunately, there is neither a universal notion of trust applicable to all domains nor a clear explication of its semantics or computation in many situations. The Holy Grail of trust research is to develop expressive trust frameworks that have both a declarative-axiomatic and a computational specification, and to devise methodologies for instantiating them for practical use, justifying automatic trust inference in terms of application-oriented semantics of trust (Anantharam, et al., 2013).

3.1.3 Query processing, data modeling, and analysis

We have to think about querying and mining Big Data with a different mentality: traditional query languages (e.g., SQL, SPARQL, and even NoSQL interfaces) and statistical analysis methods are not, by themselves, enough to realise the desired value. Big Data, as we mentioned, is often noisy, dynamic, heterogeneous, interrelated, and untrustworthy. Further, interconnected Big Data (in linked data style) forms large heterogeneous information networks, in which information redundancy can be exploited to compensate for missing data, cross-check conflicting cases, validate trustworthy relationships, disclose inherent clusters, and uncover hidden relationships and models. Data mining and analysis is in fact a cyclic process: mining requires integrated, cleaned, trustworthy, and efficiently accessible data, declarative query and mining interfaces, scalable mining algorithms, and Big Data computing environments; at the same time, data mining itself can help improve the quality and trustworthiness of the data, aid in understanding its semantics, and provide intelligent query functions. The value of Big Data analysis can only be realized if it is applied under these difficult conditions (Agrawal, et al., 2012). On the flip side, knowledge developed from data can help in correcting errors and removing ambiguity. Big Data is also enabling the next generation of interactive data analysis with real-time answers, where scaling complex query processing techniques to terabytes while maintaining interactive response times is a major open research problem today (Agrawal, et al., 2012). In the context of RDF data representation, we need new methods that enable a tight coupling between declarative query languages and the functions of analysis and mining; this will benefit both the expressiveness and the performance of the analysis. The sketch below shows the kind of declarative access over RDF on which such a coupling could build.
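A minimal sketch of that declarative access, reusing the hypothetical `http://example.org/vocab#` vocabulary from the earlier variety example (rdflib again; the data and threshold are illustrative):

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/vocab#")
g = Graph()
g.bind("ex", EX)
for pid, rate in [(1, 71), (2, 140), (3, 68)]:
    patient = EX[f"patient/{pid}"]
    g.add((patient, RDF.type, EX.Patient))
    g.add((patient, EX.heartRate, Literal(rate)))

# A SPARQL query filters within the store; the analysis code only ever
# sees the bindings it asked for, not the raw heterogeneous records.
query = """
    PREFIX ex: <http://example.org/vocab#>
    SELECT ?patient ?rate
    WHERE {
        ?patient a ex:Patient ;
                 ex:heartRate ?rate .
        FILTER (?rate > 100)
    }
"""
for row in g.query(query):
    print(row.patient, row.rate)   # only patient/2 should match
```

The division of labor is the design point: the query language handles selection declaratively, while the mining or analysis code consumes the resulting bindings; a tighter coupling would push more of the analysis itself into such declarative interfaces.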
3.1.4 Interpretation

Having the ability to analyze Big Data is of limited value if users cannot understand the analysis (Agrawal, et al., 2012). Ultimately, a decision-maker, provided with the result of an analysis, has to interpret it, and this interpretation cannot happen in a vacuum. Usually it involves examining all the assumptions made and retracing the analysis. Furthermore, there are many possible sources of error: computer systems can have bugs, models almost always have assumptions, and results can be based on erroneous data. For all of these reasons, no responsible user will cede authority to the computer; rather, she will try to understand and verify the results it produces, and the system must make it easy for her to do so. This is particularly challenging with Big Data because of its complexity. In short, it is rarely enough to provide just the results. One must also provide supplementary information that explains how each result was derived and upon precisely which inputs it was based; such supplementary information is called the provenance of the (result) data. By studying how best to capture, store, and query provenance, in conjunction with techniques for capturing adequate metadata, we can create an infrastructure that gives users the ability both to interpret analytical results and to repeat the analysis with different assumptions, parameters, or data sets. With our semantic mentality, we believe the best way to provide such provenance is to rely on domain ontologies and LOD as a proof framework: representing domain knowledge as an ontology using semantic web standards can help us justify analysis results. In other words, analysis results must be justifiable in terms of the domain knowledge used.
4 Conclusion

We have entered an era of Big Data. Through better analysis of the large volumes of data becoming available, there is the potential for faster advances in many scientific disciplines and for improving the profitability and success of many enterprises. In this paper we have investigated how the semantic web can be an enabler for addressing many aspects of the Big Data challenges, such as heterogeneity, lack of structure, error handling, timeliness, and provenance, at all stages of the analysis pipeline from data acquisition to result interpretation. These challenges are common across a large variety of application domains and are therefore not cost-effective to address in the context of one domain alone. Many further challenges, which are out of the scope of this paper, also need to be addressed before the potential of Big Data can be fully realized. Semantic web concepts and technologies can play a major role as a mediation layer between Big Data as it is and its ultimate transformation into big value. Finally, we must support and encourage fundamental research towards addressing these challenges from different perspectives if we are to achieve the promised benefits of Big Data.

References

Agrawal, Divyakant, et al. (2012). Challenges and Opportunities with Big Data: A white paper prepared for the Computing Community Consortium. USA.

Anantharam, Pramod, Krishnaprasad Thirunarayan, and Amit Sheth (2013). Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases. In Proceedings of the 2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013). Ohio Center of Excellence in Knowledge-Enabled Computing.

Anjomshoaa, Amin, A Min Tjoa, and Hendrik (2014). Towards Semantic Mashup Tools for Big Data. Springer Berlin Heidelberg, Bali.

Dumbill, Edd (2012). Big Data Now. O'Reilly Media, Inc., United States of America.

Eaton, Chris, et al. (2012). Understanding Big Data. McGraw-Hill, USA.

Flood, Mark, et al. (2011). Using Data for Systemic Financial Risk Management. In Proc. Fifth Biennial Conf. on Innovative Data Systems.

Henson, Cory, Krishnaprasad Thirunarayan, and Amit Sheth (2011). An ontological approach to focusing attention and enhancing machine perception on the Web. Applied Ontology 6(4). Amsterdam.

Hitzler, Pascal, and Krzysztof Janowicz (2013). Linked Data, Big Data, and the 4th Paradigm. IOS Press.

Manuel, José, and Pérez Gómez (2010). Provenance and Trust. SlideShare, Jun 30, 2010. http://www.slideshare.net/jmgomez23/provenance-and-trust

Manyika, James, et al. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

Perera, Sujan, et al. (2013). Semantics Driven Approach for Knowledge Acquisition From EMRs. IEEE Journal of Biomedical and Health Informatics, pp. 515-524.

Sheth, A., and K. Thirunarayan (2012). Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications. Morgan & Claypool.

Thirunarayan, Krishnaprasad, et al. (2014). Comparative trust management with applications: Bayesian approaches emphasis. ScienceDirect, pp. 182-199.

Thirunarayan, Krishnaprasad, and Amit Sheth (2014). Semantics-Empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications. AAAI, Dayton.