Apache Gora
● What is it?
● Gora – Nutch
● Supports
● Data Access
● APIs

www.semtech-solutions.co.nz
info@semtech-solutions.co.nz

Apache Gora – What is it?
● Provides for Big Data
  – In-memory data model
  – Persistence
  – Data store abstraction
● Supports persisting to
  – Column stores
  – Key/value stores
  – Document stores
  – RDBMSs
● Supports use of Hadoop

Apache Gora – What is it? (continued)
● Released under the Apache License 2.0
● Written in Java
● Offers a persistence framework
● Designed for big data applications
● Used by Nutch 2.x for web crawl data storage
● Used for
  – Persistence
  – Indexing
  – Analytics

Apache Gora – Nutch
● Nutch 2.x now uses Gora
  – Abstracted storage
  – Data store independence
  – Handles object to persistent mappings
  – Use various NoSQL solutions

Apache Gora – Supports
● Gora supports the following
  – Apache Accumulo
  – Apache Cassandra
  – Apache HBase
  – Amazon DynamoDB
  – Pig
  – Hive
  – Cascading
  – MapReduce

Apache Gora – Data Access
● Java API for data access
  – Independent of location
● Core Gora APIs
  – Store
  – Persistency
  – Query
  – MapReduce

Apache Gora – Store API
● Java API – org.apache.gora.store.*
  – DataStore handles object persistence
  – DataStore methods process objects
    ● Persist
    ● Fetch
    ● Query
    ● Delete

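The four DataStore operations listed above (persist, fetch, query, delete) can be pictured with a small in-memory stand-in. This is only a sketch: `ToyDataStore` and its method names are hypothetical and written for illustration; it is not Gora's `DataStore` interface and does not talk to any real backend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a data store; NOT the Gora DataStore interface.
class ToyDataStore<K extends Comparable<K>, V> {
    // A sorted map keeps keys ordered, which makes key-range queries simple.
    private final TreeMap<K, V> rows = new TreeMap<>();

    // Persist: store or overwrite the object under the given key.
    void put(K key, V value) {
        rows.put(key, value);
    }

    // Fetch: return the object for the key, or null if it is absent.
    V get(K key) {
        return rows.get(key);
    }

    // Query: return all objects whose keys lie in [startKey, endKey], in key order.
    List<V> query(K startKey, K endKey) {
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }

    // Delete: remove the object; returns true if something was removed.
    boolean delete(K key) {
        return rows.remove(key) != null;
    }
}
```

A caller would work only against these four methods, which is the sense in which the storage behind them stays abstracted.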
Apache Gora – Persistency API
● Java API – org.apache.gora.persistency.*
  – Core classes
    ● BeanFactory
      – Construct keys
    ● Persistent
      – Persist objects
    ● State
      – State managed through StateManager
        – NEW, CLEAN (UNMODIFIED)
        – DIRTY (MODIFIED), DELETED

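The four states named above can be read as a small state machine attached to each object. The sketch below is illustrative only: `RecordState` and `TrackedRecord` are hypothetical names, and the transition rules (a change makes a CLEAN record DIRTY, a write makes it CLEAN again) are assumptions, not a description of how Gora's StateManager behaves.

```java
// Hypothetical illustration of the four states; the transitions are assumed.
enum RecordState { NEW, CLEAN, DIRTY, DELETED }

class TrackedRecord {
    private String value;
    private RecordState state = RecordState.NEW;

    RecordState state() { return state; }

    String value() { return value; }

    // Changing a field marks a CLEAN record as DIRTY; a NEW record stays NEW.
    void setValue(String newValue) {
        if (state == RecordState.DELETED) {
            throw new IllegalStateException("record was deleted");
        }
        value = newValue;
        if (state == RecordState.CLEAN) {
            state = RecordState.DIRTY;
        }
    }

    // Called after the record has been written to the store.
    void markPersisted() {
        if (state != RecordState.DELETED) {
            state = RecordState.CLEAN;
        }
    }

    void markDeleted() {
        state = RecordState.DELETED;
    }

    // Only NEW and DIRTY records carry changes that still need writing.
    boolean needsWrite() {
        return state == RecordState.NEW || state == RecordState.DIRTY;
    }
}
```

Tracking state this way lets a store skip unmodified objects when flushing, which is one plausible reason to keep it.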
Apache Gora – Query API
● Java API – org.apache.gora.query.*
  – Core classes
    ● Query
      – Constructed via DataStore
    ● PartitionQuery
      – Divide results of Query into partitions
      – Run queries on data nodes
      – Generate Hadoop InputSplits
    ● Result

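The idea of dividing a query's results into partitions can be sketched without Hadoop. The helper below is hypothetical (`KeyPartitioner` is not a Gora class): it assumes the matching keys are already known and sorted, and cuts them into contiguous, near-equal slices, each of which could then be handed to a separate worker in the way an input split would be.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only illustrates splitting a result set into slices.
class KeyPartitioner {
    // Split a sorted key list into at most `parts` contiguous, near-equal slices.
    static <K> List<List<K>> partition(List<K> sortedKeys, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<K>> slices = new ArrayList<>();
        int n = sortedKeys.size();
        int base = n / parts;   // minimum slice size
        int extra = n % parts;  // the first `extra` slices get one more key
        int start = 0;
        for (int i = 0; i < parts && start < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            slices.add(new ArrayList<>(sortedKeys.subList(start, start + size)));
            start += size;
        }
        return slices;
    }
}
```

Keeping each slice contiguous in key order means a slice maps onto a single key range, so it can be expressed as one range query.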
Apache Gora – MapReduce API
● Java API – org.apache.gora.mapreduce.*
  – GoraMapper
  – GoraReducer
  – ALL Record Counter
  – Reader
  – Writer
  – Hadoop / Avro
    ● Serialise
    ● De-serialise
    ● Persistent

Contact Us
● Feel free to contact us at
  – www.semtech-solutions.co.nz
  – info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You can just pay for those hours that you need to solve your problems

An introduction to Apache Gora