Get Shark Running Locally
Farzad Nozarian
4/25/15 @AUT
Purpose
This guide describes how to get Shark running locally. It creates a small Hive
installation on one machine and allows you to execute simple queries.
The only prerequisite for this guide is that you have Java and Scala
2.9.3 installed on your machine. If you don't have Scala 2.9.3, you can
download it by running:
$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz
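Before continuing, it can help to confirm that the prerequisites are reachable from your shell. The following is a minimal sketch, not part of the original guide: have_cmd is a hypothetical helper name, and the check assumes both tools are expected on PATH (a freshly unpacked Scala is not on PATH until you add its bin/ directory yourself).

```shell
# Return 0 if the named command can be found on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of the two prerequisites named in this guide.
for tool in java scala; do
  if have_cmd "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
  fi
done
```

A missing scala entry is not necessarily a problem here, because Shark is pointed at the Scala directory through SCALA_HOME later in the guide.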
Running Shark In Other Modes
• You can also run Shark in one of three other supported modes:
• Running Shark on EC2
• Running Shark on a Cluster
• Running Shark with Tachyon
Let’s Start…(1/3)
• Download the binary distribution of Shark 0.8.
• The package contains two folders, shark-0.8.0 and hive-0.9.0-shark-0.8.0-bin.
$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-hadoop1.tgz  # Hadoop 1/CDH3
# - or -
$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-cdh4.tgz  # Hadoop 2/CDH4
$ tar xvfz shark-*-bin-*.tgz
$ cd shark-*-bin-*/  # the trailing slash matches only the extracted directory, not the .tgz file
• The Shark code is in the shark-0.8.0/ directory.
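The choice between the two downloads above can be sketched as a small helper. This is an illustration only: shark_url is a hypothetical function name, and the two flavor names are simply the suffixes taken from the URLs in this guide.

```shell
# Print the Shark 0.8.0 download URL for a build flavor: hadoop1 or cdh4.
shark_url() {
  case "$1" in
    hadoop1|cdh4)
      echo "https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-$1.tgz"
      ;;
    *)
      echo "unknown flavor: $1 (expected hadoop1 or cdh4)" >&2
      return 1
      ;;
  esac
}

# Show the URL for the Hadoop 1/CDH3 build.
shark_url hadoop1

# To download the Hadoop 2/CDH4 build instead:
#   wget "$(shark_url cdh4)"
```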
Let’s Start…(2/3)
• To set up your environment to run Shark locally, set the HIVE_HOME and
SCALA_HOME environment variables in the file shark-0.8.0/conf/shark-env.sh
to point to the folders you just downloaded.
• Shark comes with a template file shark-env.sh.template that you can
copy and modify to get started:
$ cp shark-0.8.0/conf/shark-env.sh.template shark-0.8.0/conf/shark-env.sh
• Now edit the following two lines in shark-env.sh:
export HIVE_HOME=/path/to/hive-0.9.0-shark-0.8.0-bin
export SCALA_HOME=/path/to/scala-2.9.3
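If you prefer not to edit the file by hand, the two exports can be appended from the shell. This is a rough sketch under assumptions: set_shark_env is a hypothetical helper, the paths are the same placeholders used above, and it assumes shark-env.sh is read as an ordinary shell script so that lines appended at the end take effect.

```shell
# Append the two exports to a shark-env.sh file.
#   $1 = path to shark-env.sh
#   $2 = Hive directory (value for HIVE_HOME)
#   $3 = Scala directory (value for SCALA_HOME)
set_shark_env() {
  printf 'export HIVE_HOME=%s\nexport SCALA_HOME=%s\n' "$2" "$3" >> "$1"
}

# Demonstration against a scratch file; replace the placeholder paths
# and pass shark-0.8.0/conf/shark-env.sh to apply it for real.
env_file=$(mktemp)
set_shark_env "$env_file" /path/to/hive-0.9.0-shark-0.8.0-bin /path/to/scala-2.9.3
cat "$env_file"
```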
Let’s Start…(3/3)
• Next, create the default Hive warehouse directory. This is where Hive will
store table data for native tables:
$ sudo mkdir -p /user/hive/warehouse
$ sudo chmod 0777 /user/hive/warehouse # Or make your username the owner
• You can now start the Shark CLI from the shark-0.8.0/ directory:
$ ./bin/shark
• In addition to the Shark CLI, there are several executables in shark-0.8.0/bin:
bin/shark-withdebug : Runs Shark CLI with DEBUG level logs printed to the console.
bin/shark-withinfo : Runs Shark CLI with INFO level logs printed to the console.
Lab Assignment
1. Launch the Shark shell.
2. Create a table called book … .
3. List all the columns of the table book.
4. Load the book table from the file books in the local filesystem.
5. Create a table called novel, containing those records from table book … .
6. Print out the list of available tables.
7. Count the number of records from the table book.
8. Print out the total cost of the books with authors who have the same last name.
9. Count the number of distinct last names.
10. Drop the tables.
Lab Assignment 5 (1/5)
1. Launch the Shark shell.
2. Create a table called book whose schema includes book's title,
description, author's first name, last name, and cost.
3. List all the columns of the table book.
$ shark
create table
book(title string, description string, firstname string, lastname string, cost int)
row format delimited fields terminated by '\t';
describe book;
Lab Assignment 5 (2/5)
4. Load the book table from the file books in the local filesystem. The books
file has the following format, with one record per line and the five fields
separated by tabs (shown here with spaces):
Speed love    Long book about love     Brian    Dog     10
Long day      Story about Monday       Emily    Blue    20
Flying Car    Novel about airplanes    Phil     High    5
Short day     Novel about a day        Phil     Dog     30
load data local inpath 'books' into table book;
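The sample rows lose their delimiters in this transcript. Assuming the file is meant to be tab-delimited so that it matches the table's field delimiter, one way to create it is sketched below; the file name books is taken from the load statement, and the field boundaries are inferred from the five-column schema.

```shell
# Write the four sample rows as tab-separated fields:
# title, description, firstname, lastname, cost
books_file=books
printf '%s\t%s\t%s\t%s\t%s\n' \
  'Speed love' 'Long book about love'  'Brian' 'Dog'  '10' \
  'Long day'   'Story about Monday'    'Emily' 'Blue' '20' \
  'Flying Car' 'Novel about airplanes' 'Phil'  'High' '5' \
  'Short day'  'Novel about a day'     'Phil'  'Dog'  '30' > "$books_file"

# Sanity check: every row should have exactly five tab-separated fields.
awk -F '\t' 'NF != 5 { bad = 1 } END { exit bad }' "$books_file" && echo "books: OK"
```

printf reuses its format string for each group of five arguments, so each group becomes one line of the file.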
Lab Assignment 5 (3/5)
As an alternative, you can create an external table. The external keyword
lets you create a table and provide a location, so that Hive does not use the
default location for this table. This is useful if you already have data
generated.
create external table
exbook(title string, description string, firstname string, lastname string, cost int)
row format delimited fields terminated by '\t'
location '<file location, excluding the name of the file>';
5. Create a table called novel, containing those records from table book
that have the keyword “Novel” in their description, and cache it in memory.
create table novel TBLPROPERTIES('shark.cache'='MEMORY_ONLY')
as select * from book where description like "%Novel%";
Lab Assignment 5 (4/5)
6. Print out the list of available tables.
show tables;
7. Count the number of records from the table book.
select count(*) from book;
8. Print out the total cost of the books with authors who have the same last
name.
select lastname, sum(cost) from book group by lastname;
9. Count the number of distinct last names.
select count(distinct lastname) from book;
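To check the query results by hand, the same aggregates can be computed directly on the four sample rows with standard shell tools. This is only a cross-check sketch, not how Shark evaluates the queries; it rebuilds the sample data in a scratch file and assumes tab-separated fields in the order title, description, firstname, lastname, cost.

```shell
# Rebuild the four sample rows in a scratch file (tab-separated).
data=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  'Speed love' 'Long book about love'  'Brian' 'Dog'  '10' \
  'Long day'   'Story about Monday'    'Emily' 'Blue' '20' \
  'Flying Car' 'Novel about airplanes' 'Phil'  'High' '5' \
  'Short day'  'Novel about a day'     'Phil'  'Dog'  '30' > "$data"

# Step 5: rows whose description (field 2) contains "Novel".
novel_count=$(awk -F '\t' '$2 ~ /Novel/' "$data" | wc -l | tr -d ' ')

# Step 7: number of records.
record_count=$(awk 'END { print NR }' "$data")

# Step 8: total cost per last name (field 4 = lastname, field 5 = cost).
cost_by_lastname=$(awk -F '\t' '{ sum[$4] += $5 }
  END { for (name in sum) print name "\t" sum[name] }' "$data" | sort)

# Step 9: number of distinct last names.
distinct_lastnames=$(cut -f 4 "$data" | sort -u | wc -l | tr -d ' ')

echo "novel rows: $novel_count"
echo "records: $record_count"
echo "$cost_by_lastname"
echo "distinct last names: $distinct_lastnames"
```

For the sample data this gives 2 novel rows, 4 records, totals of 20 for Blue, 40 for Dog, and 5 for High, and 3 distinct last names, which is what the queries in steps 5 to 9 should return.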
Lab Assignment 5 (5/5)
10. Drop the tables.
drop table book;
drop table novel;
References:
• https://github.com/amplab/shark/wiki/Running-Shark-Locally