XML TO HIVE
1. Loading multiple XML data files into Hive
Daily region-wise sales XML data files from the Unilever Group:
File name and path: /home/hadoop/Unilever_RAW_data/market.xml
<Table>
  <State>Telengana</State>
  <District>kurnool</District>
  <Market>Fruit-Market</Market>
  <Commodity>Bathing-Soap(Lux)</Commodity>
  <Variety>Bath-soap</Variety>
  <Arrival_Date>24/04/2016</Arrival_Date>
  <Min_x0020_Price>25</Min_x0020_Price>
  <Max_x0020_Price>28</Max_x0020_Price>
  <Modal_x0020_Price>29</Modal_x0020_Price>
</Table>
<Table>
  <State>Telengana</State>
  <District>kadapa</District>
  <Market>sanjay market</Market>
  <Commodity>Bathing-Soap(Margo)</Commodity>
  <Variety>Bath-soap</Variety>
  <Arrival_Date>25/04/2016</Arrival_Date>
  <Min_x0020_Price>24</Min_x0020_Price>
  <Max_x0020_Price>30</Max_x0020_Price>
  <Modal_x0020_Price>30</Modal_x0020_Price>
</Table>
<Table>
  <State>Andra Pradesh</State>
  <District>chitoor</District>
  <Market>TTD Market</Market>
  <Commodity>Laddu(Spl made with Ghee)</Commodity>
  <Variety>Ladoo</Variety>
  <Arrival_Date>25/04/2016</Arrival_Date>
  <Min_x0020_Price>25</Min_x0020_Price>
  <Max_x0020_Price>100</Max_x0020_Price>
  <Modal_x0020_Price>120</Modal_x0020_Price>
</Table>
<Table>
  <State>Andra Pradesh</State>
  <District>Anantapur</District>
  <Market>Vishal Market</Market>
  <Commodity>Tyres(MRF, CEAT, Zeal)</Commodity>
  <Variety>Tyre</Variety>
  <Arrival_Date>25/04/2016</Arrival_Date>
  <Min_x0020_Price>12000</Min_x0020_Price>
  <Max_x0020_Price>25000</Max_x0020_Price>
  <Modal_x0020_Price>c1200</Modal_x0020_Price>
</Table>
File name and path: /home/hadoop/Unilever_RAW_data/market1.xml
<Table>
  <State>Andra Pradesh</State>
  <District>Nellore</District>
  <Market>Big Theaters</Market>
  <Commodity>cinemas</Commodity>
  <Variety>cienmas(Hindi,English,Telegu)</Variety>
  <Arrival_Date>24/07/2015</Arrival_Date>
  <Min_x0020_Price>100</Min_x0020_Price>
  <Max_x0020_Price>150</Max_x0020_Price>
  <Modal_x0020_Price>BiG-C</Modal_x0020_Price>
</Table>
<Table>
  <State>Andra Pradesh</State>
  <District>Guntur</District>
  <Market>chilli Market</Market>
  <Commodity>chilli</Commodity>
  <Variety>clilli(Red)</Variety>
  <Arrival_Date>24/07/2015</Arrival_Date>
  <Min_x0020_Price>50</Min_x0020_Price>
  <Max_x0020_Price>100</Max_x0020_Price>
  <Modal_x0020_Price>RED-C</Modal_x0020_Price>
</Table>
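Some of the sample records will not load cleanly into numeric price columns: Modal_x0020_Price holds values such as c1200, BiG-C and RED-C, and in a few rows the modal price lies above the maximum (29 against 28 for Lux). The following is a minimal Python sketch, standard library only, that flags both kinds of issue before any trend analysis. The helper names find_issues and decode_tag are hypothetical, the sample is an abridged copy of two records above, and reading _x0020_ as an escaped space in the element names is an assumption about how the files were produced.

```python
import xml.etree.ElementTree as ET

# Two records copied from the sample files above, abridged to the
# fields this check uses.
SAMPLE = """
<Table><State>Andra Pradesh</State><Commodity>Tyres(MRF, CEAT, Zeal)</Commodity>
<Min_x0020_Price>12000</Min_x0020_Price><Max_x0020_Price>25000</Max_x0020_Price>
<Modal_x0020_Price>c1200</Modal_x0020_Price></Table>
<Table><State>Telengana</State><Commodity>Bathing-Soap(Lux)</Commodity>
<Min_x0020_Price>25</Min_x0020_Price><Max_x0020_Price>28</Max_x0020_Price>
<Modal_x0020_Price>29</Modal_x0020_Price></Table>
"""

PRICE_TAGS = ("Min_x0020_Price", "Max_x0020_Price", "Modal_x0020_Price")


def decode_tag(tag: str) -> str:
    """Assumption: _x0020_ stands for a space, so Min_x0020_Price -> 'Min Price'."""
    return tag.replace("_x0020_", " ")


def find_issues(xml_text: str) -> list[str]:
    """Return readable descriptions of bad or inconsistent price fields."""
    # The files contain several top-level <Table> elements, so wrap them in a
    # synthetic root element to get one well-formed document.
    root = ET.fromstring(f"<Rows>{xml_text}</Rows>")
    issues = []
    for row in root.iter("Table"):
        commodity = row.findtext("Commodity", default="?")
        prices = {}
        for tag in PRICE_TAGS:
            raw = (row.findtext(tag) or "").strip()
            try:
                prices[tag] = float(raw)
            except ValueError:
                issues.append(f"{commodity}: non-numeric {decode_tag(tag)} {raw!r}")
        lo = prices.get("Min_x0020_Price")
        hi = prices.get("Max_x0020_Price")
        modal = prices.get("Modal_x0020_Price")
        # Only range-check rows whose three prices all parsed as numbers.
        if None not in (lo, hi, modal) and not lo <= modal <= hi:
            issues.append(f"{commodity}: modal price {modal:g} outside [{lo:g}, {hi:g}]")
    return issues


for issue in find_issues(SAMPLE):
    print(issue)
```

Flagged rows can then be cleaned or excluded explicitly instead of silently turning into NULLs once the values reach a numeric Hive column.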
2. Objective:
The aim of the exercise is to analyse the data, generate sales trends such as 52-week highs/lows, month-by-month trends, state-wise trends and overall price fluctuations for the various products, and store the final output as JSON documents in the NoSQL database MongoDB.
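The objective does not define these measures precisely, so the Python sketch below makes its own assumptions: the 52-week high/low is the highest Max price and lowest Min price in the 52 weeks ending at an as-of date, fluctuation is the largest Max minus Min spread in that window, the month-by-month trend is the highest Max price per calendar month, and the state-wise trend is the average Max price per state. The function build_documents and all document field names are hypothetical. The rows are taken from the sample files above, and the sketch only builds JSON-ready documents; writing them to MongoDB (for example with pymongo's insert_many) is left out.

```python
import json
from collections import defaultdict
from datetime import date, datetime, timedelta

# Rows from the sample files above, reduced to
# (state, commodity, arrival date dd/mm/yyyy, min price, max price).
ROWS = [
    ("Telengana", "Bathing-Soap(Lux)", "24/04/2016", 25, 28),
    ("Telengana", "Bathing-Soap(Margo)", "25/04/2016", 24, 30),
    ("Andra Pradesh", "chilli", "24/07/2015", 50, 100),
    ("Andra Pradesh", "cinemas", "24/07/2015", 100, 150),
]


def build_documents(rows, as_of: date) -> list[dict]:
    """Summarise rows into one document per commodity and one per state."""
    window_start = as_of - timedelta(weeks=52)
    per_commodity = defaultdict(list)
    per_state = defaultdict(list)
    for state, commodity, day, lo, hi in rows:
        arrival = datetime.strptime(day, "%d/%m/%Y").date()
        per_commodity[commodity].append((arrival, lo, hi))
        per_state[state].append(hi)

    docs = []
    for commodity, entries in sorted(per_commodity.items()):
        # Keep only arrivals inside the trailing 52-week window.
        recent = [e for e in entries if window_start < e[0] <= as_of]
        monthly = defaultdict(list)
        for arrival, _, hi in entries:
            monthly[arrival.strftime("%Y-%m")].append(hi)
        docs.append({
            "type": "commodity",
            "commodity": commodity,
            "week52_high": max((hi for _, _, hi in recent), default=None),
            "week52_low": min((lo for _, lo, _ in recent), default=None),
            # Largest gap between max and min price inside the window.
            "max_spread": max((hi - lo for _, lo, hi in recent), default=None),
            "monthly_max": {m: max(v) for m, v in sorted(monthly.items())},
        })
    for state, highs in sorted(per_state.items()):
        docs.append({"type": "state", "state": state,
                     "avg_max_price": sum(highs) / len(highs)})
    return docs


print(json.dumps(build_documents(ROWS, as_of=date(2016, 4, 25)), indent=2))
```

Keeping one flat document per commodity and per state means each trend can be fetched with a single key lookup once the documents are in a MongoDB collection.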
Solution
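Step 1:
Create the target directory in HDFS (-p also creates any missing parent directories) and copy the XML files into it. Run the copy from the local directory that holds market.xml and market1.xml.
hadoop fs -mkdir -p /xml/data/commodity
hadoop fs -copyFromLocal *.xml /xml/data/commodity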
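Because *.xml copies every XML file in the current local directory, it can help to count the records in each file before uploading, so the total can later be compared with the row count Hive reports. A minimal Python sketch, standard library only; count_records is a hypothetical helper, and it assumes every record begins with the literal tag <Table>, as in the sample files.

```python
import glob
import os
import sys


def count_records(directory: str) -> dict[str, tuple[int, int]]:
    """Map each *.xml file in `directory` to (opening <Table> tags, closing </Table> tags)."""
    counts = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
        counts[os.path.basename(path)] = (text.count("<Table>"), text.count("</Table>"))
    return counts


if __name__ == "__main__":
    # Defaults to the current directory, where the copyFromLocal command is run.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    total = 0
    for name, (opened, closed) in count_records(target).items():
        status = "ok" if opened == closed else "UNBALANCED"
        print(f"{name}: {opened} records ({status})")
        total += opened
    print(f"total records to upload: {total}")
```

An UNBALANCED file usually means a record was cut off, which is worth fixing locally before the data reaches HDFS.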
Step 2:
Download hivexmlserde-1.0.5.3.jar. This jar is not available in the hive/lib folder by default, so download it from the link below, place it in the lib folder and add it to the Hive session:
http://mvnrepository.com/artifact/com.ibm.spss.hive.serde2.xml/hivexmlserde/1.0.5.3
hive> add jar /home/hadoop/apache-hive-1.2.1-bin/lib/hivexmlserde-1.0.5.3.jar;
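With this SerDe, the table definition normally marks record boundaries through the xmlinput.start and xmlinput.end table properties and maps each column to an XPath expression through a column.xpath.<column> property. The Python sketch below only imitates that idea to show why the files need no single root element: the text is cut into records between a start and an end marker, and each record is parsed on its own. It is not the SerDe's implementation; COLUMNS, split_records and to_row are hypothetical names, and the start marker <Table is an assumption chosen to match the sample files.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical column mapping, in the spirit of column.xpath.<column>:
# Hive-style column name -> child element of <Table>.
COLUMNS = {
    "state": "State",
    "district": "District",
    "commodity": "Commodity",
    "arrival_date": "Arrival_Date",
    "max_price": "Max_x0020_Price",
    "modal_price": "Modal_x0020_Price",
}


def split_records(text: str, start: str = "<Table", end: str = "</Table>") -> list[str]:
    """Return every substring that begins with `start` and ends at the next `end`."""
    records = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break
        finish = text.find(end, begin)
        if finish == -1:
            break  # an unterminated trailing record is ignored
        finish += len(end)
        records.append(text[begin:finish])
        pos = finish
    return records


def to_row(record: str) -> dict[str, Optional[str]]:
    """Parse one record and pick out each mapped column (None if the element is missing)."""
    element = ET.fromstring(record)
    return {column: element.findtext(tag) for column, tag in COLUMNS.items()}


# Two records from market1.xml, abridged; there is no enclosing root element.
SAMPLE = (
    "<Table><State>Andra Pradesh</State><District>Nellore</District>"
    "<Commodity>cinemas</Commodity><Arrival_Date>24/07/2015</Arrival_Date>"
    "<Max_x0020_Price>150</Max_x0020_Price><Modal_x0020_Price>BiG-C</Modal_x0020_Price></Table>\n"
    "<Table><State>Andra Pradesh</State><District>Guntur</District>"
    "<Commodity>chilli</Commodity><Arrival_Date>24/07/2015</Arrival_Date>"
    "<Max_x0020_Price>100</Max_x0020_Price><Modal_x0020_Price>RED-C</Modal_x0020_Price></Table>\n"
)

for row in map(to_row, split_records(SAMPLE)):
    print(row)
```

Every value comes out as a string here, which mirrors why loading the price columns as strings first and casting them later is a cautious choice given values like BiG-C.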