2. Agile project scope details – user stories, Scrum cycles.
Technology stack details.
Set up a virtual-machine-based Hadoop cluster.
Installation of Hadoop, Hive and Sqoop.
ONLINETRANING2011@GMAIL.COM
3. Analyze the payment data XML files.
Parse the XML data using a technology of your choice (DOM, JAXB, etc.).
Load the data into RDBMS tables in incremental mode.
Schedule the preprocessing job to run every 30 min (Java scheduler
Quartz - source 1: every 15 min; crontab - source 2: every 1 hour).
Add a multithreading / parallel processing model (to handle large
volumes).
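The parse, incremental load and parallel steps above can be sketched roughly as follows. This is only an illustration under assumptions: the XML layout (payment, amount, status), the payments table and the function names are hypothetical, SQLite stands in for the RDBMS, and Python's DOM module stands in for the Java DOM/JAXB options named on the slide.

```python
import sqlite3
import xml.dom.minidom
from concurrent.futures import ThreadPoolExecutor

# Hypothetical payment XML layout; the real feed's element names may differ.
SAMPLE_XML = """<payments>
  <payment id="1"><amount>10.50</amount><status>OK</status></payment>
  <payment id="2"><amount>99.00</amount><status>FAILED</status></payment>
</payments>"""


def parse_payments(xml_text):
    """Parse one XML document with DOM and return (id, amount, status) rows."""
    doc = xml.dom.minidom.parseString(xml_text)
    rows = []
    for node in doc.getElementsByTagName("payment"):
        amount = node.getElementsByTagName("amount")[0].firstChild.data
        status = node.getElementsByTagName("status")[0].firstChild.data
        rows.append((int(node.getAttribute("id")), float(amount), status))
    return rows


def parse_many(xml_texts, workers=4):
    """Parse several documents on a thread pool and flatten the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(parse_payments, xml_texts))
    return [row for batch in batches for row in batch]


def load_incremental(conn, rows):
    """Insert only rows whose id is above the highest id already loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, amount REAL, status TEXT)"
    )
    last_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM payments"
    ).fetchone()[0]
    new_rows = [r for r in rows if r[0] > last_id]
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", new_rows)
    conn.commit()
    return len(new_rows)
```

Running load_incremental twice with the same rows inserts nothing the second time, which is the point of incremental mode. In Python, a thread pool helps mainly when reading the files is I/O-bound; a process pool may suit CPU-heavy parsing better.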
4. Build the data migration flow from RDBMS into Hadoop/Hive
using Sqoop.
Create import tables in Hive.
Create the Sqoop - Hive data import script.
Verify the imported record counts and write an error for any record
mismatch.
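As a rough sketch of the import script and the verification step, the snippet below builds a Sqoop import command and compares row counts. The connection URL, user, table names and function names are hypothetical placeholders; only standard Sqoop 1 import options are used, and the command is assembled rather than executed.

```python
def build_sqoop_import(jdbc_url, user, password_file, table, hive_table, mappers=4):
    """Return the argument list for a Sqoop import from an RDBMS table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),
    ]


def verify_counts(table, source_count, hive_count):
    """Return an error message when the counts differ, otherwise None."""
    if source_count != hive_count:
        return f"{table}: source has {source_count} rows, Hive has {hive_count}"
    return None


# Hypothetical values, for illustration only.
cmd = build_sqoop_import(
    "jdbc:mysql://dbhost/paymentsdb", "etl_user", "/user/etl/.pw",
    "payments", "payments_import",
)
print(" ".join(cmd))
```

The argument list can be passed to subprocess.run on a machine where Sqoop is installed. The two counts would come from a SELECT COUNT(*) on each side, and any message returned by verify_counts can be written to the error log.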
5. Run Hive analytic queries and store the output data in a result table
(schedule the job to run).
Execute Hive joins for complex queries.
Write a UDF for data normalization.
Use Sqoop to send the data back from Hive to the RDBMS through a shell
script.
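Hive UDFs are usually written in Java; the sketch below only shows the kind of normalization logic such a UDF might hold, written in Python so it could instead be plugged into Hive's TRANSFORM streaming. The status mapping, the three-column record layout and the function names are assumptions made for illustration.

```python
# Hypothetical mapping of raw status spellings to canonical values.
CANONICAL_STATUS = {
    "ok": "SUCCESS",
    "success": "SUCCESS",
    "fail": "FAILED",
    "failed": "FAILED",
    "error": "FAILED",
}


def normalize_status(raw):
    """Map a raw status string to a canonical value, or UNKNOWN."""
    return CANONICAL_STATUS.get(raw.strip().lower(), "UNKNOWN")


def normalize_line(line):
    """Normalize one tab-separated (id, amount, status) record."""
    pid, amount, status = line.rstrip("\n").split("\t")
    return "\t".join([pid.strip(), f"{float(amount):.2f}", normalize_status(status)])


def run(stream_in, stream_out):
    """Normalize every record read from stream_in and write it to stream_out."""
    for line in stream_in:
        stream_out.write(normalize_line(line) + "\n")
```

For streaming use, a small wrapper script would call run(sys.stdin, sys.stdout), since Hive passes rows to a TRANSFORM script as tab-separated lines on standard input.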
6. Visualize the output data in the RDBMS table using open
source/commercial tools like Tableau.
Create a report using a bar graph to show the trends for issue rate.
Create a report using a pie chart for payment data distribution on
issues.
Use HiveServer2 to connect and generate live analytic results.
7. Email: onlinetraining2011@gmail.com
Skype: onlinetraining2011
Some live session videos:
Hadoop Installation: http://www.youtube.com/watch?v=i9yckEduQBE
HDFS File system Lab: http://www.youtube.com/watch?v=ZIpJ5LUWNUw
LinkedIn group:
http://www.linkedin.com/groups/Online-Hadoop-Training-4838165
Blog: http://onlinetraining2011.blogspot.com
Project duration: 20 hours
Training medium: Online via GoToMeeting