The Briefing Room
As You Seek—How Search Enables Big Data Analytics
Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
JUNE: Database
July: CLOUD
August: HIGH PERFORMANCE ANALYTICS
September: ANALYTICS
Database
Better SEARCH
Faster INSIGHT
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
MarkLogic
• MarkLogic is an enterprise-class NoSQL database company
• Key features of its database include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, and government-grade security
• Its platform provides full-text query and search capabilities, application services, and big data analytics
David Gorbet
David Gorbet is Vice President of Engineering for
MarkLogic, where he also runs the Support
organization. Gorbet brings two decades of
experience delivering some of the highest-volume
applications and enterprise software in the world.
Prior to MarkLogic, Gorbet helped pioneer
Microsoft’s business online services strategy by
founding and leading the SharePoint Online
team. Gorbet holds a Bachelor of Applied
Science degree in Systems Design Engineering
with an additional major in Psychology from the
University of Waterloo, and an MBA from the
University of Washington Foster School of
Business.
MarkLogic: What it is, how it works
David Gorbet, VP Engineering
Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved.
WE ARE THE NEW GENERATION DATABASE
Any Structure Era
“For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Application services
• Faster time-to-results
Relational Era
“For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Hierarchical Era
“For your application data!”
• Application- and hardware-specific
Real Value From Big Data
Make The World More Secure
Provide Access To Valuable Information
Create New Revenue Streams
Gain Insights to Increase Market Share
Reduce Bottom Line Expense
The MarkLogic Advantage
Only Enterprise NoSQL Database
• ACID compliant
• Big data search
• High availability
• Replication
• Point-in-time recovery
• Government-grade security
• Real-time your Hadoop
• Proven customer success
How Does It Work?
Schema-agnostic design
Real-time indexing and query
Event processing and alerting
Scale-out shared-nothing cluster topology
Analytics and Visualization
High availability and disaster recovery
Hierarchical Data Model
• MarkLogic Server is a document-centric database
• Supports any-structured data via hierarchical data model
[Diagram: two example document trees. "Document" has the nodes Title, Author (First, Last), Section (with nested Sections), and Metadata. "Trade" has the nodes Cashflows, Party Identifier, Net Payment, Payment Date, Party Reference, Payer Party, trade ID, Payment Amount, and Receiver Party.]
MarkLogic is Schema Agnostic
JSON and XML are self-describing
<article>
  <title>MarkLogic Server:… </title>
  <author>
    <first-name>John</first-name>
    <last-name>Doe</last-name>
  </author>
  <abstract>
    . . . .<company>MarkLogic</company>. . . .
  </abstract>
  <body>
    <section>
      <section>. . . .</section>
    </section>
    <section>…index…</section>
  </body>
  <copyright>Copyright © … </copyright>
</article>
MarkLogic is Schema Agnostic
JSON and XML are self-describing
[Diagram: the same article drawn as a tree. <article> contains <title> (MarkLogic Server:…), <author> with <first-name> (John) and <last-name> (Doe), <abstract> containing <company> (MarkLogic), <body> with nested <section> elements, and <copyright> (Copyright © …).]
Universal Index
MarkLogic indexes…
• Words
• Phrases
• Stemming
• Structure
• Values
• Collections
• Security Permissions

Term                           Term List
“brown”                        123, 125, 129, 152, 344, 491, …
“mice”                         123, 125, 126, 129, 130, 152, …
“brown mice”                   125, 152, 516, 522, 765, 890, …
STEM “mouse”                   123, 125, 126, 129, 130, 152, …
STEM “brown mouse”             125, 152, 516, 522, 765, 890, …
<article>                      …
<article>/<abstract>           …
<section>/<paragraph>          …
<animal>mouse</animal>         …
<year>1950</year>              …
Collection:Draft               …
Role:Editor + Action:Read      …
…                              …

Which draft articles contain the phrase brown mice?
Document References: 125, 516, 890, …
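The question on this slide can be read as an intersection of term lists. The sketch below is a minimal illustration of that idea, not MarkLogic's implementation: the names `TermList`, `UniversalIndex`, `make_example_index`, and `draft_articles_with_phrase` are hypothetical, the "brown mice" list is copied from the slide with the trailing "…" dropped, and the Collection:Draft list is invented for the example because the slide elides it.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

// A term list: the sorted document IDs recorded for one indexed term.
using TermList = std::vector<int>;

// Toy universal index: one term list per term key (word, phrase, collection, ...).
using UniversalIndex = std::unordered_map<std::string, TermList>;

// Documents that appear in both sorted term lists.
TermList intersect(const TermList& a, const TermList& b) {
    TermList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Example data. The phrase list comes from the slide; the Draft collection
// list is an assumption made only so the example has something to intersect.
UniversalIndex make_example_index() {
    UniversalIndex index;
    index["phrase:brown mice"] = {125, 152, 516, 522, 765, 890};
    index["collection:Draft"] = {125, 300, 516, 890};  // hypothetical values
    return index;
}

// "Which draft articles contain the phrase brown mice?"
TermList draft_articles_with_phrase(const UniversalIndex& index) {
    return intersect(index.at("phrase:brown mice"),
                     index.at("collection:Draft"));
}
```

Because every indexed feature (words, structure, collections, permissions) resolves to the same kind of list, adding another constraint to the query is one more intersection in this model.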
Scalar Queries
Which draft articles that contain the phrase brown mice were written before 2010?
Document References: 125, 516, 890, …
Range Indexes
Map document IDs to values, and vice-versa in a compact in-memory representation

Value   ID        ID   Value
2002    3         1    2009
2003    10        3    2002
2004    5         4    2007
2004    11        5    2004
2007    4         8    2011
2007    17        10   2003
2009    1         11   2004
2011    8         17   2007
…       …         …    …
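A rough way to picture the two orderings is a pair of sorted arrays, one keyed by value and one keyed by document ID. The following sketch assumes that layout purely for illustration; `RangeIndex`, `ids_below`, `value_of`, and `slide_example` are hypothetical names, and the rows are the ones shown on the slide.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy range index over integer values (for example, publication years).
class RangeIndex {
public:
    // Build both orderings from (value, document ID) pairs.
    explicit RangeIndex(std::vector<std::pair<int, int>> value_id)
        : by_value_(std::move(value_id)) {
        std::sort(by_value_.begin(), by_value_.end());
        for (const auto& entry : by_value_) {
            by_id_.emplace_back(entry.second, entry.first);
        }
        std::sort(by_id_.begin(), by_id_.end());
    }

    // Document IDs whose value is strictly below `limit`, in ascending ID order.
    std::vector<int> ids_below(int limit) const {
        auto end = std::lower_bound(
            by_value_.begin(), by_value_.end(), limit,
            [](const std::pair<int, int>& entry, int v) { return entry.first < v; });
        std::vector<int> ids;
        for (auto it = by_value_.begin(); it != end; ++it) ids.push_back(it->second);
        std::sort(ids.begin(), ids.end());
        return ids;
    }

    // Look up the value stored for a document ID; returns false if absent.
    bool value_of(int id, int& value) const {
        auto it = std::lower_bound(
            by_id_.begin(), by_id_.end(), id,
            [](const std::pair<int, int>& entry, int key) { return entry.first < key; });
        if (it == by_id_.end() || it->first != id) return false;
        value = it->second;
        return true;
    }

private:
    std::vector<std::pair<int, int>> by_value_;  // sorted by value
    std::vector<std::pair<int, int>> by_id_;     // sorted by document ID
};

// The rows shown on the slide, as (value, document ID).
RangeIndex slide_example() {
    std::vector<std::pair<int, int>> rows = {
        {2002, 3}, {2003, 10}, {2004, 5}, {2004, 11},
        {2007, 4}, {2007, 17}, {2009, 1}, {2011, 8}};
    return RangeIndex(rows);
}
```

With the slide's rows, `slide_example().ids_below(2010)` yields every document except ID 8, which is the kind of "written before 2010" constraint that can then be intersected with term lists.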
Geospatial Index: A 2-Dimensional Range Index
Fully composable with all other indexes!
• Built-in support for:
  • Point
  • Box
  • Circle
  • Polygon
  • Complex Polygon
  • Polygon Intersection
  • Polygon Containment
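To make the region types concrete, here is a small sketch of point-in-box, point-in-circle, and point-in-polygon tests. It is an illustration only: it treats coordinates as a flat plane and ignores the Earth's curvature, uses the standard ray-casting test for polygons, and all names (`Point`, `in_box`, `in_circle`, `in_polygon`) are hypothetical rather than MarkLogic functions.

```cpp
#include <cstddef>
#include <vector>

// A point in a flat 2-D plane (planar simplification, not geodetic).
struct Point {
    double x;
    double y;
};

// True if p lies inside the axis-aligned box with corners lo and hi (inclusive).
bool in_box(Point p, Point lo, Point hi) {
    return p.x >= lo.x && p.x <= hi.x && p.y >= lo.y && p.y <= hi.y;
}

// True if p lies within distance `radius` of `center` (inclusive).
bool in_circle(Point p, Point center, double radius) {
    const double dx = p.x - center.x;
    const double dy = p.y - center.y;
    return dx * dx + dy * dy <= radius * radius;
}

// Ray casting: count how many polygon edges a horizontal ray from p crosses.
// An odd count means p is inside. Points exactly on an edge are not handled specially.
bool in_polygon(Point p, const std::vector<Point>& polygon) {
    bool inside = false;
    const std::size_t n = polygon.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Point& a = polygon[i];
        const Point& b = polygon[j];
        const bool straddles = (a.y > p.y) != (b.y > p.y);
        if (straddles &&
            p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x) {
            inside = !inside;
        }
    }
    return inside;
}
```

In the model the slide describes, a test like this would act as one more filter whose matching document IDs are combined with the other index results.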
Reverse Indexes (Alerting)
1. Load serialized queries as query documents
2. For a given data document, find all queries that match
• Can provide real-time alerts during loads
  • With no significant performance impact!
• Can let documents store values as "ranges"
  • Documents about cities self-defining their geo boundaries
  • Person documents defining birthdays as ranges, sequences
• Can power classifiers and "matchmaker" queries
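The two numbered steps above can be sketched as an index over the stored queries rather than over the documents. This is a simplified illustration under stated assumptions: each stored query is reduced to a set of terms that must all be present, and the class and method names (`ReverseIndex`, `add_query`, `matches`) are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy reverse index: stored queries are indexed by the terms they require,
// so an incoming document only touches queries that share a term with it.
class ReverseIndex {
public:
    // Step 1: store a query that matches documents containing all `terms`.
    // Returns the ID assigned to the stored query.
    int add_query(const std::set<std::string>& terms) {
        const int id = static_cast<int>(required_.size());
        required_.push_back(terms.size());
        for (const auto& term : terms) by_term_[term].push_back(id);
        return id;
    }

    // Step 2: for one document, return the IDs of all stored queries it satisfies.
    std::vector<int> matches(const std::set<std::string>& doc_terms) const {
        std::vector<std::size_t> hits(required_.size(), 0);
        for (const auto& term : doc_terms) {
            auto it = by_term_.find(term);
            if (it == by_term_.end()) continue;
            for (int query : it->second) ++hits[query];
        }
        std::vector<int> out;
        for (std::size_t q = 0; q < hits.size(); ++q) {
            // A query matches when every one of its required terms was seen.
            if (required_[q] > 0 && hits[q] == required_[q]) {
                out.push_back(static_cast<int>(q));
            }
        }
        return out;
    }

private:
    std::vector<std::size_t> required_;               // term count per stored query
    std::map<std::string, std::vector<int>> by_term_; // term -> stored query IDs
};
```

Calling `matches` once per loaded document is what an alerting pipeline would do in this model; the work depends on the document's terms rather than on a scan of every stored query.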
Range Indexes
Map document IDs to values, and vice-versa in a compact in-memory representation
Range Indexes work like a built-in in-memory column store
Facets and Aggregation
Interactive Visualization
In-database Analytic Functions
Leverage ready-made analytic built-ins for commonly-used numeric applications
• Variance
• Covariance
• Correlation
• Standard deviation
• Linear model
• Median
• Mode
• Percentile
• Rank
• Percent-rank
Benefits
• Faster analytics-based application development
• Supports more users & more data
• Eliminates costs associated with writing custom code
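For readers who want the definitions behind a few of these names, the sketch below computes variance, covariance, correlation, standard deviation, and median over plain vectors. It is not the database's code: the function names are hypothetical, the sample (n - 1) form is an assumption since the slide does not say which form the built-ins use, and inputs are assumed to be non-empty, of equal length, and to hold at least two values where a variance is needed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic mean of a non-empty vector.
double mean(const std::vector<double>& x) {
    double total = 0.0;
    for (double v : x) total += v;
    return total / static_cast<double>(x.size());
}

// Sample covariance (divides by n - 1); x and y must have equal length >= 2.
double covariance(const std::vector<double>& x, const std::vector<double>& y) {
    const double mx = mean(x);
    const double my = mean(y);
    double acc = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) acc += (x[i] - mx) * (y[i] - my);
    return acc / static_cast<double>(x.size() - 1);
}

// Sample variance is the covariance of a series with itself.
double variance(const std::vector<double>& x) { return covariance(x, x); }

double standard_deviation(const std::vector<double>& x) {
    return std::sqrt(variance(x));
}

// Pearson correlation coefficient.
double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    return covariance(x, y) / (standard_deviation(x) * standard_deviation(y));
}

// Median of a non-empty vector (takes a copy so the caller's data is not reordered).
double median(std::vector<double> x) {
    std::sort(x.begin(), x.end());
    const std::size_t n = x.size();
    return (n % 2 == 1) ? x[n / 2] : (x[n / 2 - 1] + x[n / 2]) / 2.0;
}
```

Running the equivalent of these inside the database, over values already held in range indexes, is what saves the application from pulling the raw values out first.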
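User-defined Functions
// Declaration of an aggregate UDF. Assumes the MarkLogic UDF SDK header,
// which declares AggregateUDF and the other types used below.
#include "MarkLogic.h"
using namespace marklogic;

class InfluenceRank : public AggregateUDF
{
public:
  // Running totals accumulated over the input values.
  struct Value {
    double sum, sum_sq, count;
    Value() : sum(0), sum_sq(0), count(0) {}
  } value;

public:
  AggregateUDF* clone() const { return new InfluenceRank(*this); }
  void close() { delete this; }

  void start(Sequence&, Reporter&) {}                       // no per-call setup needed
  void finish(OutputSequence& os, Reporter& reporter);      // emit the final result
  void map(TupleIterator& values, Reporter& reporter);      // fold input tuples into value
  void reduce(const AggregateUDF* _o, Reporter& reporter);  // merge another instance's value
  void encode(Encoder& e, Reporter& reporter);              // serialize value
  void decode(Decoder& d, Reporter& reporter);              // deserialize value
};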
In-database MapReduce
[Diagram: data flow through the UDF stages start, map, reduce, encode, decode, and finish.]
SQL and BI Tools
ODBC
SQL
Range Indexes
SQL and BI Tools
HA/DR Features of MarkLogic
MarkLogic 6

Powerful: Everything you need to deliver business value
• Flexible Indexes
• Full Text Search
• Schema-Agnostic
• Scalable
• Analytic Functions
• Hadoop Distribution
• Alerting & Event Processing
• Geospatial Query
• In-database MapReduce
• Visualization Widgets

Trusted: Enterprise-ready for mission-critical apps
• Transactions
• Role-based Security
• Automated Failover
• Replication
• Journal Archiving
• Point-in-time Recovery
• Database Rollback
• Backup/Restore
• Distributed Transactions
• Super-clusters

Accessible: Leverage existing tools, knowledge, skills
• REST & Java APIs
• JSON Storage
• Application Builder
• Information Studio
• Hadoop Connector
• Content Pump
• BI Integration
• SQL Support
• Monitoring & Management
• OS Support
Any Questions?
What is Semantics Technology?
Elasticity
Aligning infrastructure + demand, continually
• New tools to characterize and monitor the resource requirements of your applications and loads.
• Dynamic provisioning system that can add or subtract resources on-the-fly to match the loads.
• Distributed & virtualized environments including VMware, Amazon AWS and Hadoop are supported to scale-out.
• Make the cloud a first-class citizen: Use Hadoop HDFS or Amazon S3 for backup
Tiered storage
[Diagram labels: ML, SSD, local, HDFS, amzn s3]
Benefits
• Keep data on tiers appropriate to access needs = lower costs
• Detach and reattach storage when needed. Fewer compute nodes required = lower costs
• Leverage Hadoop HDFS investment
Choose infrastructure based on value of data stored.
• 100% online with different tiers at different SLAs/topologies
• On-line/near-line mix utilizing mount on-demand and dynamic node spin-up.
Tiered Storage New Constructs
• Range partitions by Date/Scalar manage group of forests by range (“Q1” or “1990-1995”)
• Super Databases federate queries across multiple databases
Tiered Storage
[Chart comparing the Operational, Compliance, and Analytic tiers]
Total Size (TB): 96, 504, 1,044
Total Cost ($000): 592, 2,066, 2,080
Effective Unit Cost ($/GB): $25, $4, $1.50
Perceptions & Questions
Analyst:
Robin Bloor
The Bloor Group
The Bloor Group
Database Innovation
Database used to be a “zero-innovation market.” Now it is the opposite.
• Traditional (relational) database is now seen (rightly) as inadequate in many respects
• Big Data is, mainly, new data posing new problems
• New products are emerging and some older products are being given a make-over (and gaining popularity)
• Hadoop has changed perceptions and thinking about database
Multiple Database Roles
HAVE INCREASED SIGNIFICANTLY…
The Analytics Issue
The Origin of Big Data
NoSQL Confusion
As the graph indicates, NoSQL is a very confusing descriptor.
The important question is: WHAT CAN A GIVEN DATABASE ACTUALLY DO?
The Joys and Sorrows of SQL
SQL:
• Very good for set manipulation
• Works for OLTP and many query environments
• Not good for nested data structures (documents, web pages, etc.)
• Not good for ordered data sets
• Not good for data graphs (networks of values)
• In my view we have reached a situation where there will be multiple “data engines.” Is that MarkLogic’s view?
• Specifically, are there data structures or database contexts for which MarkLogic is inappropriate?
• What new features or capabilities are on the MarkLogic roadmap?
• In your view, is the “age of the data warehouse” over?
• Which sectors/businesses are currently in MarkLogic’s “sweet spot”?
• Data analytics involves much more than having analytical functions in the database. It is more than 50% data prep (merging, cleansing, joining, transformation, etc.). How does MarkLogic accommodate that?
• What is MarkLogic’s attitude to the cloud? Specifically, where would it recommend cloud deployment?
Upcoming Topics
July: CLOUD
August: HIGH PERFORMANCE ANALYTICS
September: ANALYTICS
www.insideanalysis.com
Thank You for Your Attention

 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Dernier (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

As You Seek – How Search Enables Big Data Analytics

  • 1. The Briefing Room As You Seek—How Search Enables Big Data Analytics
  • 2. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com
  • 3. Mission: Reveal the essential characteristics of enterprise software, good and bad. Provide a forum for detailed analysis of today's innovative technologies. Give vendors a chance to explain their product to savvy analysts. Allow audience members to pose serious questions... and get answers!
  • 4. June: Database. July: Cloud. August: High Performance Analytics. September: Analytics.
  • 5. Database: Better SEARCH, Faster INSIGHT
  • 6. Analyst: Robin Bloor. Robin Bloor is Chief Analyst at The Bloor Group. robin.bloor@bloorgroup.com
  • 7. MarkLogic: MarkLogic is an enterprise-class NoSQL database company. Key features of its database include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, and government-grade security. Its platform provides full-text query and search capabilities, application services, and big data analytics.
  • 8. David Gorbet: David Gorbet is Vice President of Engineering for MarkLogic, where he also runs the Support organization. Gorbet brings two decades of experience delivering some of the highest-volume applications and enterprise software in the world. Prior to MarkLogic, Gorbet helped pioneer Microsoft’s business online services strategy by founding and leading the SharePoint Online team. Gorbet holds a Bachelor of Applied Science degree in Systems Design Engineering with an additional major in Psychology from the University of Waterloo, and an MBA from the University of Washington Foster School of Business.
  • 9. MarkLogic: What it is, how it works David Gorbet, VP Engineering
  • 10. Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved. WE ARE THE NEW GENERATION DATABASE. Any Structure Era (“For all your data!”): schema-agnostic; massive scale; query and search; analytics; application services; faster time-to-results. Relational Era (“For all your structured data!”): normalized, tabular model; application-independent query; user control. Hierarchical Era (“For your application data!”): application- and hardware-specific.
  • 11. Real Value From Big Data: Make The World More Secure. Provide Access To Valuable Information. Create New Revenue Streams. Gain Insights to Increase Market Share. Reduce Bottom Line Expense.
  • 12. The MarkLogic Advantage: Only Enterprise NoSQL Database. ACID compliant; big data search; high availability; replication; point in-time recovery; government-grade security; Real-time your Hadoop; proven customer success.
  • 13. How Does It Work? Schema-agnostic design. Real-time indexing and query. Event processing and alerting. Scale-out shared-nothing cluster topology. Analytics and Visualization. High availability and disaster recovery.
  • 14. Hierarchical Data Model: MarkLogic Server is a document-centric database. Supports any-structured data via hierarchical data model. [Diagram: two example document trees. Node labels of the first: Document, Title, Author, Section (repeated), First, Last, Metadata. Node labels of the second: Trade, Cashflows, Party Identifier, Net Payment, Payment Date, Party Reference, Payer Party, trade ID, Payment Amount, Receiver Party.]
  • 15. MarkLogic is Schema Agnostic. JSON and XML are self-describing: <article> <title>MarkLogic Server:… </title> <author> <first-name>John</first-name> <last-name>Doe</last-name> </author> <abstract> . . . .<company>MarkLogic</company>. . . . </abstract> <body> <section> <section>. . . .</section> </section> <section>…index…</section> </body> <copyright>Copyright © … </copyright> </article>
  • 16. MarkLogic is Schema Agnostic. JSON and XML are self-describing. [Diagram: the article document from the previous slide drawn as a tree of elements and values: <article>; <title> "MarkLogic Server:…"; <author> with <first-name> "John" and <last-name> "Doe"; <abstract> with <company> "MarkLogic"; <body> with nested <section> elements, one containing "…index…"; <copyright> "Copyright © …".]
  • 17. Universal Index. MarkLogic indexes: words, phrases, stemming, structure, values, collections, security permissions. Term → Term List (document references): “brown” → 123, 125, 129, 152, 344, 491, …; “mice” → 123, 125, 126, 129, 130, 152, …; “brown mice” → 125, 152, 516, 522, 765, 890, …; STEM “mouse” → 123, 125, 126, 129, 130, 152, …; STEM “brown mouse” → 125, 152, 516, 522, 765, 890, …; <article> → …; <article>/<abstract> → …; <section>/<paragraph> → …; <animal>mouse</animal> → …; <year>1950</year> → …; Collection:Draft → …; Role:Editor + Action:Read → …. Example query: Which draft articles contain the phrase brown mice? Resulting document references: 125, 516, 890, …
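The slide above resolves a query by combining term lists. A minimal sketch of that idea follows, assuming each term list is a sorted vector of document IDs and that combining means set intersection. The `TermList` alias, the function names, and the collection and permission lists are hypothetical illustrations (the slide elides those lists with "…"); this is not MarkLogic's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// A term list: sorted IDs of the documents that contain one indexed term.
using TermList = std::vector<std::uint32_t>;

// Intersect two sorted term lists; the result holds the IDs present in both.
TermList intersect(const TermList& a, const TermList& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    TermList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// The phrase list copies the IDs visible on the slide. The collection and
// permission lists are invented sample data (assumption), chosen only so the
// example has something to intersect.
const TermList kPhraseBrownMice = {125, 152, 516, 522, 765, 890};
const TermList kCollectionDraft = {125, 300, 516, 765, 890};
const TermList kEditorCanRead   = {125, 152, 516, 890, 901};

// "Which draft articles contain the phrase brown mice?" restricted to
// documents the Editor role may read.
TermList draftArticlesWithBrownMice() {
    return intersect(intersect(kPhraseBrownMice, kCollectionDraft),
                     kEditorCanRead);
}
```

With this sample data the function returns 125, 516, 890. Because phrases, collections, and permissions are all just term lists here, one intersection routine serves every part of the query.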
  • 18. Scalar Queries. (Same Term → Term List table as on the previous slide.) Example query: Which draft articles that contain the phrase brown mice were written before 2010? Resulting document references: 125, 516, 890, …
  • 19. Range Indexes: Map document IDs to values, and vice-versa, in a compact in-memory representation. Value → ID: 2002 → 3; 2003 → 10; 2004 → 5; 2004 → 11; 2007 → 4; 2007 → 17; 2009 → 1; 2011 → 8; … ID → Value: 1 → 2009; 3 → 2002; 4 → 2007; 5 → 2004; 8 → 2011; 10 → 2003; 11 → 2004; 17 → 2007; …
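The two tables on the range index slide can be sketched as a pair of sorted arrays, one ordered by value and one by document ID. The `RangeIndex` struct, its member names, and `slideExample` below are hypothetical; the sketch only illustrates how a "before 2010" style scalar constraint could be answered with a binary search, and makes no claim about MarkLogic's actual storage layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct RangeIndex {
    // (value, docId) pairs sorted by value: answers "which IDs have value < x?"
    std::vector<std::pair<int, std::uint32_t>> byValue;
    // (docId, value) pairs sorted by ID: answers "what is the value of doc N?"
    std::vector<std::pair<std::uint32_t, int>> byId;

    explicit RangeIndex(std::vector<std::pair<std::uint32_t, int>> docs)
        : byId(std::move(docs)) {
        std::sort(byId.begin(), byId.end());
        byValue.reserve(byId.size());
        for (const auto& entry : byId) {
            byValue.emplace_back(entry.second, entry.first);
        }
        std::sort(byValue.begin(), byValue.end());
        assert(byValue.size() == byId.size());
    }

    // IDs of documents whose value is strictly below `limit`, sorted by ID.
    std::vector<std::uint32_t> idsBelow(int limit) const {
        auto end = std::lower_bound(byValue.begin(), byValue.end(),
                                    std::make_pair(limit, std::uint32_t{0}));
        std::vector<std::uint32_t> ids;
        for (auto it = byValue.begin(); it != end; ++it) {
            ids.push_back(it->second);
        }
        std::sort(ids.begin(), ids.end());
        return ids;
    }

    // Value stored for `id`, or `fallback` when the document is not indexed.
    int valueOf(std::uint32_t id, int fallback) const {
        auto it = std::lower_bound(
            byId.begin(), byId.end(), id,
            [](const std::pair<std::uint32_t, int>& entry, std::uint32_t key) {
                return entry.first < key;
            });
        return (it != byId.end() && it->first == id) ? it->second : fallback;
    }
};

// The (ID, year) pairs shown on the slide.
RangeIndex slideExample() {
    std::vector<std::pair<std::uint32_t, int>> docs = {
        {1, 2009}, {3, 2002}, {4, 2007}, {5, 2004},
        {8, 2011}, {10, 2003}, {11, 2004}, {17, 2007}};
    return RangeIndex(docs);
}
```

Since `idsBelow` returns IDs in sorted order, its result could be intersected with term lists from the universal index to answer the combined phrase-plus-date question posed on the Scalar Queries slide.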
  • 20. Geospatial Index: A 2-Dimensional Range Index. Fully composable with all other indexes! Built-in support for: Point, Box, Circle, Polygon, Complex Polygon, Polygon Intersection, Polygon Containment.
  • 21. Reverse Indexes (Alerting): 1. Load serialized queries as query documents. 2. For a given data document, find all queries that match. Can provide real-time alerts during loads, with no significant performance impact! Can let documents store values as "ranges": documents about cities self-defining their geo boundaries; person documents defining birthdays as ranges, sequences. Can power classifiers and "matchmaker" queries.
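The two steps on the reverse index slide (store queries, then find the queries a new document matches) can be sketched as follows. This assumes, purely for illustration, that a stored query is a set of terms that must all appear in the document; the `AlertIndex` class and its methods are hypothetical and far simpler than real serialized queries.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

class AlertIndex {
public:
    // Step 1: store a query that matches when a document contains every term.
    // Returns the ID assigned to the stored query.
    std::size_t addQuery(const std::set<std::string>& terms) {
        const std::size_t id = required_.size();
        required_.push_back(terms.size());
        for (const std::string& term : terms) {
            queriesByTerm_[term].push_back(id);
        }
        return id;
    }

    // Step 2: for one incoming document, return the IDs of all stored queries
    // it satisfies, in ascending order.
    std::vector<std::size_t> match(const std::set<std::string>& docTerms) const {
        std::vector<std::size_t> hits(required_.size(), 0);
        for (const std::string& term : docTerms) {
            auto it = queriesByTerm_.find(term);
            if (it == queriesByTerm_.end()) continue;
            for (std::size_t id : it->second) {
                assert(id < hits.size());
                ++hits[id];  // one more of this query's terms is present
            }
        }
        std::vector<std::size_t> matched;
        for (std::size_t id = 0; id < hits.size(); ++id) {
            // Empty queries never match; others need all their terms present.
            if (required_[id] > 0 && hits[id] == required_[id]) {
                matched.push_back(id);
            }
        }
        return matched;
    }

private:
    std::vector<std::size_t> required_;  // number of terms per stored query
    std::map<std::string, std::vector<std::size_t>> queriesByTerm_;
};
```

The lookup is driven by the document's terms rather than by running every stored query, which is the point of indexing the queries. The final scan over all query IDs is a simplification kept for brevity.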
  • 22. Range Indexes: Map document IDs to values, and vice-versa, in a compact in-memory representation (same Value → ID and ID → Value tables as item 19 of this transcript). Range Indexes work like a built-in in-memory column store.
  • 23. Facets and Aggregation
  • 24. Interactive Visualization
  • 25. In-database Analytic Functions: Leverage ready-made analytic built-ins for commonly-used numeric applications: variance, covariance, correlation, standard deviation, linear model, median, mode, percentile, rank, percent-rank. Benefits: faster analytics-based application development; supports more users & more data; eliminates costs associated with writing custom code.
  • 26. User-defined Functions: class InfluenceRank : public AggregateUDF { public: struct Value { double sum, sum_sq, count; Value() : sum(0), sum_sq(0), count(0) {} } value; public: AggregateUDF* clone() const { return new InfluenceRank(*this); } void close() { delete this; } void start(Sequence&, Reporter&) {} void finish(OutputSequence& os, Reporter& reporter); void map(TupleIterator& values, Reporter& reporter); void reduce(const AggregateUDF* _o, Reporter& reporter); void encode(Encoder& e, Reporter& reporter); void decode(Decoder& d, Reporter& reporter); };
  • 27. In-database MapReduce [diagram of the UDF call flow; step labels: start, map, reduce, encode, decode, finish]
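The map, reduce, and finish steps named on the two slides above can be sketched without the MarkLogic headers. The example below reuses the `sum`, `sum_sq`, `count` fields from the slide's `Value` struct and computes a population variance; that choice of statistic is an assumption, since the slides do not show what `InfluenceRank` actually computes. `Partial`, `mapValues`, `reducePartials`, and `finishVariance` are hypothetical names, and the encode/decode steps (shipping partial state between hosts) are omitted.

```cpp
#include <cassert>
#include <vector>

// Partial aggregate state, mirroring the fields of the slide's Value struct.
struct Partial {
    double sum = 0;
    double sum_sq = 0;
    double count = 0;
};

// map: fold the values held by one partition into a partial state.
Partial mapValues(const std::vector<double>& values) {
    Partial p;
    for (double v : values) {
        p.sum += v;
        p.sum_sq += v * v;
        p.count += 1;
    }
    return p;
}

// reduce: merge two partial states; order of merging does not matter.
Partial reducePartials(const Partial& a, const Partial& b) {
    Partial p;
    p.sum = a.sum + b.sum;
    p.sum_sq = a.sum_sq + b.sum_sq;
    p.count = a.count + b.count;
    return p;
}

// finish: turn the merged state into the final answer, here the population
// variance E[x^2] - (E[x])^2. This formula is simple but can lose precision
// on large, tightly clustered values.
double finishVariance(const Partial& p) {
    assert(p.count >= 0);
    if (p.count == 0) return 0.0;
    const double mean = p.sum / p.count;
    return p.sum_sq / p.count - mean * mean;
}
```

For two partitions holding {1, 2} and {3, 4}, mapping each, reducing the two partials, and finishing gives 1.25, the same as computing the variance over all four values at once. That equivalence is what lets the work be split across partitions.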
  • 28. SQL and BI Tools [diagram labels: ODBC, SQL, Range Indexes]
  • 29. SQL and BI Tools
  • 30. HA/DR Features of MarkLogic
  • 31. MarkLogic 6. Powerful (Everything you need to deliver business value): Flexible Indexes, Full Text Search, Schema-Agnostic, Scalable, Analytic Functions, Hadoop Distribution, Alerting & Event Processing, Geospatial Query, In-database MapReduce, Visualization Widgets. Trusted (Enterprise-ready for mission-critical apps): Transactions, Role-based Security, Automated Failover, Replication, Journal Archiving, Point-in-time Recovery, Database Rollback, Backup/Restore, Distributed Transactions, Super-clusters. Accessible (Leverage existing tools, knowledge, skills): REST & Java APIs, JSON Storage, Application Builder, Information Studio, Hadoop Connector, Content Pump, BI Integration, SQL Support, Monitoring & Management, OS Support.
  • 32. Any Questions?
  • 33. What is Semantics Technology?
  • 34. Elasticity: Aligning infrastructure + demand, continually. New tools to characterize and monitor the resource requirements of your applications and loads. Dynamic provisioning system that can add or subtract resources on-the-fly to match the loads. Distributed & virtualized environments including VMWare, Amazon AWS and Hadoop are supported to scale-out. Make the cloud a first-class citizen: Use Hadoop HDFS or Amazon S3 for backup.
  • 35. Tiered Storage: Choose infrastructure based on value of data stored. [Diagram labels: ML, SSD, local, HDFS, amzn s3.] 100% online with different tiers at different SLAs/topologies. On-line/near-line mix utilizing mount on-demand and dynamic node spin-up. Benefits: Keep data on tiers appropriate to access needs = lower costs. Detach and reattach storage when needed. Fewer compute nodes required = lower costs. Leverage Hadoop HDFS investment. New Constructs: Range partitions by Date/Scalar manage group of forests by range (“Q1” or “1990-1995”). Super Databases federate queries across multiple databases.
  • 36. Tiered Storage [chart: Total Size (TB), Total Cost ($000), and Effective Unit Cost ($/GB) for the Operational, Compliance, and Analytic tiers; figures shown: 96, 504, 1,044, 592, 2,066, 2,080; unit costs shown: $25, $4, $1.50]
  • 37. Perceptions & Questions. Analyst: Robin Bloor
  • 39. The Bloor Group. Database Innovation: Database used to be a “zero-innovation market.” Now it is the opposite. Traditional (relational) database is now seen (rightly) as inadequate in many respects. Big Data is, mainly, new data posing new problems. New products are emerging and some older products are being given a make-over (and gaining popularity). Hadoop has changed perceptions and thinking about database.
  • 40. Multiple Database Roles HAVE INCREASED SIGNIFICANTLY…
  • 41. The Analytics Issue
  • 42. The Origin of Big Data
  • 43. NoSQL Confusion: As the graph indicates, NoSQL is a very confusing descriptor. The important question is: WHAT CAN A GIVEN DATABASE ACTUALLY DO?
  • 44. The Joys and Sorrows of SQL. SQL: Very good for set manipulation. Works for OLTP and many query environments. Not good for nested data structures (documents, web pages, etc.). Not good for ordered data sets. Not good for data graphs (networks of values).
  • 45. In my view we have reached a situation where there will be multiple “data engines.” Is that MarkLogic’s view? Specifically, are there data structures or database contexts for which MarkLogic is inappropriate? What new features or capabilities are on the MarkLogic roadmap? In your view, is the “age of the data warehouse” over?
  • 46. Which sectors/businesses are currently in MarkLogic’s “sweet spot”? Data analytics involves much more than having analytical functions in the database. It is more than 50% data prep (merging, cleansing, joining, transformation, etc.). How does MarkLogic accommodate that? What is MarkLogic’s attitude to the cloud? Specifically, where would it recommend cloud deployment?
  • 47. Twitter Tag: #briefr The Briefing Room
  • 48. Upcoming Topics: July: Cloud. August: High Performance Analytics. September: Analytics. www.insideanalysis.com
  • 49. Thank You for Your Attention