An introduction to Apache Gora
1. Apache Gora
● What is it?
● Gora – Nutch
● Supports
● Data Access
● APIs
www.semtech-solutions.co.nz
info@semtech-solutions.co.nz
2. Apache Gora – What is it?
● Provides the following for big data:
  – In-memory data model
  – Data store abstraction
  – Persistence
● Supports persisting to:
  – Key/value stores
  – Document stores
  – Column stores
  – RDBMSs
● Supports use of Hadoop
3. Apache Gora – What is it?
● Released under the Apache License 2.0
● Written in Java
● Offers a persistence framework
● Designed for big data applications
● Used by Nutch 2.x for web crawl data storage
● Used for:
  – Persistence
  – Indexing
  – Analytics
4. Apache Gora – Nutch
● Nutch 2.x now uses Gora for:
  – Abstracted storage
  – Data store independence
  – Handling object-to-persistent-store mappings
  – Using various NoSQL solutions
5. Apache Gora – Supports
● Gora supports the following:
  – Apache Accumulo
  – Apache Cassandra
  – Apache HBase
  – Amazon DynamoDB
  – Pig
  – Hive
  – Cascading
  – MapReduce
6. Apache Gora – Data Access
● Java API for data access
  – Independent of location
● Core Gora APIs:
  – Store
  – Persistency
  – Query
  – MapReduce
7. Apache Gora – Store API
● Java API – org.apache.gora.store.*
  – DataStore handles object persistence
  – DataStore methods process objects:
    ● Persist
    ● Fetch
    ● Query
    ● Delete
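This is a minimal sketch of the four Store API operations. It assumes a hypothetical `SimpleDataStore` interface and an in-memory `InMemoryDataStore` backend, neither of which is part of Gora. In Gora itself, the corresponding calls live on `org.apache.gora.store.DataStore` (for example `put`, `get`, `delete`, and `newQuery` followed by `execute`). The sketch only paraphrases that idea: callers code against one interface while the backend can change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a Gora-style data store.
// It is NOT org.apache.gora.store.DataStore; it only illustrates the
// persist / fetch / query / delete operations listed on the slide.
interface SimpleDataStore<K extends Comparable<K>, T> {
    void put(K key, T value);             // persist
    T get(K key);                         // fetch (null if absent)
    List<T> query(K startKey, K endKey);  // range query, both bounds inclusive
    boolean delete(K key);                // true if a record was removed
}

// In-memory backend. Under the assumption above, an HBase or Cassandra
// backend would implement the same interface, so calling code stays unchanged.
class InMemoryDataStore<K extends Comparable<K>, T> implements SimpleDataStore<K, T> {
    // TreeMap keeps rows sorted by key, which makes range queries cheap.
    private final NavigableMap<K, T> rows = new TreeMap<>();

    @Override
    public void put(K key, T value) { rows.put(key, value); }

    @Override
    public T get(K key) { return rows.get(key); }

    @Override
    public List<T> query(K startKey, K endKey) {
        // subMap(from, true, to, true) covers [startKey, endKey] in key order;
        // it throws IllegalArgumentException if startKey > endKey.
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }

    @Override
    public boolean delete(K key) { return rows.remove(key) != null; }
}
```

The sorted map keeps the sketch short. A real backend would decide its own key ordering and query semantics.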
8. Apache Gora – Persistency API
● Java API – org.apache.gora.persistency.*
  – Core classes:
    ● BeanFactory – constructs keys and persistent objects
    ● Persistent – interface for persisted objects
    ● State – managed through StateManager:
      – NEW, CLEAN (UNMODIFIED)
      – DIRTY (MODIFIED), DELETED
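The four states can be illustrated with a small lifecycle tracker. This is a sketch under stated assumptions:

- `RecordState` and `TrackedRecord` are hypothetical names.
- The transition rules are our reading of the state names, not code taken from Gora's StateManager:
  - NEW until first persisted
  - CLEAN after persisting
  - DIRTY after a change to a stored record
  - DELETED after removal

```java
// Hypothetical lifecycle tracker modelled on the four states on the slide.
// The transition rules are an assumption for illustration; they are not
// taken from Gora's StateManager implementation.
enum RecordState { NEW, CLEAN, DIRTY, DELETED }

class TrackedRecord {
    private RecordState state = RecordState.NEW; // constructed, never stored
    private String value;

    TrackedRecord(String value) { this.value = value; }

    RecordState getState() { return state; }
    String getValue() { return value; }

    void setValue(String newValue) {
        if (state == RecordState.DELETED) {
            throw new IllegalStateException("record has been deleted");
        }
        value = newValue;
        // A NEW record stays NEW until first persisted; a stored record becomes DIRTY.
        if (state == RecordState.CLEAN) {
            state = RecordState.DIRTY;
        }
    }

    // Called once the record has been written to the store.
    void markPersisted() {
        if (state == RecordState.DELETED) {
            throw new IllegalStateException("record has been deleted");
        }
        state = RecordState.CLEAN;
    }

    void markDeleted() { state = RecordState.DELETED; }

    // Only NEW and DIRTY records need writing on the next flush.
    boolean needsWrite() {
        return state == RecordState.NEW || state == RecordState.DIRTY;
    }
}
```

Tracking state lets a store skip writes for CLEAN records, which is the usual motivation for this kind of bookkeeping.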
9. Apache Gora – Query API
● Java API – org.apache.gora.query.*
  – Core classes:
    ● Query – constructed via DataStore
    ● PartitionQuery
      – Divides the results of a Query into partitions
      – Runs queries on data nodes
      – Generates Hadoop InputSplits
    ● Result – iterates over the records returned by a Query
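PartitionQuery divides a query's results into partitions. That idea can be sketched as splitting a key range into contiguous sub-ranges. The sketch rests on these assumptions:

- Keys are numeric.
- `KeyRangePartition` and `RangePartitioner` are hypothetical classes, not Gora's API.
- Equal-sized ranges are a simplification. In Gora the partitions come from the DataStore (its `getPartitions` method) and follow how the backend actually distributes data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the PartitionQuery idea: a query over the key
// range [startKey, endKey) is divided into contiguous sub-ranges. Each one
// could be run close to the data and handed to Hadoop as one InputSplit.
final class KeyRangePartition {
    final long startKey; // inclusive
    final long endKey;   // exclusive

    KeyRangePartition(long startKey, long endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }

    @Override
    public String toString() { return "[" + startKey + ", " + endKey + ")"; }
}

final class RangePartitioner {
    private RangePartitioner() {}

    // Splits [startKey, endKey) into at most numPartitions non-empty ranges
    // whose sizes differ by at most one key.
    static List<KeyRangePartition> partition(long startKey, long endKey, int numPartitions) {
        if (numPartitions <= 0 || endKey < startKey) {
            throw new IllegalArgumentException("invalid range or partition count");
        }
        List<KeyRangePartition> parts = new ArrayList<>();
        long total = endKey - startKey;
        long base = total / numPartitions;      // minimum keys per partition
        long remainder = total % numPartitions; // first 'remainder' partitions get one extra key
        long cursor = startKey;
        for (int i = 0; i < numPartitions && cursor < endKey; i++) {
            long size = base + (i < remainder ? 1 : 0);
            parts.add(new KeyRangePartition(cursor, cursor + size));
            cursor += size;
        }
        return parts;
    }
}
```

For example, partitioning [0, 10) into three parts yields [0, 4), [4, 7) and [7, 10). Each range could then become one Hadoop InputSplit.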
10. Apache Gora – MapReduce API
● Java API – org.apache.gora.mapreduce.*
  – GoraMapper
  – GoraReducer
  – ALL Record Counter
  – Reader
  – Writer
  – Hadoop / Avro
    ● Serialise
    ● De-serialise
    ● Persistent
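GoraMapper and GoraReducer let Gora data stores take part in Hadoop's map, shuffle and reduce phases. The sketch below gives a rough, Hadoop-free picture of that flow by counting records per key in memory, the kind of work a record-counting job does. Two assumptions apply:

- `MiniMapReduce` is a hypothetical helper.
- No Gora or Hadoop classes are used. In a real Gora job, the input records would be Persistent objects read from a DataStore, and the output would be written back through Gora.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical, in-memory imitation of the map -> shuffle -> reduce flow
// that GoraMapper / GoraReducer plug into on Hadoop.
final class MiniMapReduce {
    private MiniMapReduce() {}

    // Map phase: emit one (key, 1) pair per input record.
    static <R, K> List<Map.Entry<K, Long>> map(List<R> records, Function<R, K> keyOf) {
        List<Map.Entry<K, Long>> emitted = new ArrayList<>();
        for (R record : records) {
            emitted.add(new AbstractMap.SimpleEntry<>(keyOf.apply(record), 1L));
        }
        return emitted;
    }

    // Shuffle + reduce: group pairs by key (sorted, as Hadoop's sort step does)
    // and sum the values for each key.
    static <K extends Comparable<K>> Map<K, Long> reduce(List<Map.Entry<K, Long>> pairs) {
        Map<K, Long> totals = new TreeMap<>();
        for (Map.Entry<K, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }
}
```

Counting crawled URLs per host, for instance, means mapping each URL to its host name and summing. With `http://a.com/1`, `http://b.org/2` and `http://a.com/3` as input, the result is `{a.com=2, b.org=1}`.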
11. Contact Us
● Feel free to contact us at:
  – www.semtech-solutions.co.nz
  – info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems