Get better results from your Big Data. Learn how to build a Hadoop business service with BMC Control-M for improved service quality and increased business agility.
Every industry segment is already using Big Data. If yours is not in the above list, it’s only because there’s not enough room.
This is a single-slide explanation and justification for Hadoop:
Companies have lots of data, both conventional structured data and new types. But the fundamental problem is that there is more data than can be economically processed with traditional approaches.
The traditional approach takes the ENTIRE data set, loads or copies it into a relational database or some other file structure, and processes it from top to bottom sequentially. This takes a long time and the hardware is VERY expensive. Even if money were no object, and when is THAT ever the case, buying bigger and bigger hardware would still give only incremental increases in capacity and speed.
Hadoop, on the other hand, uses a cluster of cheap (NOT merely inexpensive but cheap) hardware that is expected to fail and so is disposable. The cluster technology provides full redundancy, so when (not if) a component fails, the switch to an alternate copy of the data being processed is completely transparent.
The data is broken up into as many pieces as you have servers in the cluster (the largest Hadoop clusters have tens of thousands of nodes) and is processed in parallel. It is common for processing time to drop from hours or even days to minutes.
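To make that split-and-process-in-parallel idea concrete, here is a minimal word-count sketch using the standard Hadoop MapReduce Java API. It is purely illustrative and not part of the project described here; the input and output paths are hypothetical command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each mapper processes one split of the input data, in parallel across the cluster.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducers aggregate the partial counts produced by the mappers.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // hypothetical HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // hypothetical HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because each mapper works on its own block of the distributed file, adding nodes to the cluster adds both storage and processing capacity at the same time.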
So let’s talk about a project to deliver the first business service to take advantage of Big Data.
You’ll see in a moment why (Hadoop) is in brackets.
A business requirement or goal is established and Application Developers start designing. One of the first steps, and a requirement we have heard from almost every single customer we've talked to, is identifying the data that will be processed. Almost without exception, that data includes a whole bunch of traditional data sources like relational databases, data processed by ETL tools, and data transferred from various sources, plus perhaps some new data types like social, web click, or sensor data.
Then all the other phases of Application Development are performed. This sounds very similar to other projects and in fact, much of what I’m about to discuss applies to just about every application development project.
So this sounds easy, right? Just like the stuff you do all the time. Perhaps not.
The pressure to deliver Big Data applications is huge. Many organizations identify Big Data initiatives as key competitive differentiators, and time is not your friend.
Hadoop and Big Data are relatively new technologies so there are few experienced, seasoned practitioners. According to Gartner, in the next few years, the market will provide only 25% of the required staffing to fill Big Data positions.
These factors conspire to make Big Data/Hadoop projects particularly challenging to staff and deliver, and the pitfalls and delays that plague traditional projects make Big Data projects even harder for organizations to pull off successfully.
This “piece of cake” can make you very sick.
Let’s examine some of the more challenging problems, especially those where, if you go down the common path, it becomes really difficult to change course later.
We’ve identified data sources. Let’s quickly look at how each one is handled within the context of our first Big Data project.
So somebody in AppDev says “I know how to fix this” and scripts a bunch of this stuff, but not all of these tools can be eliminated, so you end up with a bunch of scripts AND a bunch of tools. Testing is done and the application is delivered to Operations to “run this stuff”.
So somebody in AppDev says “I know how to fix this” and scripts all this stuff. Testing is done and the application is delivered to Operations to “run this stuff”.
There are five Control-M Hadoop job types:
Java MapReduce
Pig script
Hive
Sqoop
HDFS File Watcher
All five are sub-types of the Hadoop job type. Select “Hadoop” from the Job Palette. Then select the “execution” type.
CLICK
You can add program and environment parameters for this specific execution.
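As a rough illustration, assuming the job runs a standard Java MapReduce program, parameters added to the job definition can be passed as generic "-D name=value" options and read from the Hadoop Configuration. The driver class and the "report.date" parameter below are hypothetical, not Control-M or Hadoop built-ins.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver showing how "-D name=value" program parameters reach the application.
public class ParamAwareDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // "report.date" is an illustrative parameter name only.
    String reportDate = conf.get("report.date", "unset");
    System.out.println("Running with report.date=" + reportDate);
    // ... set up and submit the MapReduce job here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses generic options such as: -D report.date=2014-06-30
    System.exit(ToolRunner.run(new Configuration(), new ParamAwareDriver(), args));
  }
}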
CLICK
It’s common to have to manipulate files before and after program execution, so the Pre/Post Commands feature lets you perform HDFS operations via the “Pre Commands” and “Post Commands” sections in the job definition. You can also choose whether the success of these Pre/Post actions affects the overall job status.
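The Pre/Post commands themselves are entered in the job form. Purely as an illustration of the kind of HDFS housekeeping they typically cover (clearing an old output directory before a run, archiving results afterwards), here is a hedged sketch using the Hadoop FileSystem Java API; the paths are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: typical pre/post HDFS housekeeping, expressed with the Hadoop FileSystem API.
public class HdfsHousekeeping {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // "Pre" step: remove a stale output directory so the job can recreate it.
    Path output = new Path("/data/daily/output"); // hypothetical path
    if (fs.exists(output)) {
      fs.delete(output, true); // recursive delete
    }

    // ... the Hadoop job itself would run between the pre and post steps ...

    // "Post" step: move the fresh results into an archive location.
    Path archive = new Path("/data/daily/archive/2014-06-30"); // hypothetical path
    fs.mkdirs(archive.getParent());
    fs.rename(output, archive);

    fs.close();
  }
}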
Once you have built your Hadoop jobs, building a flow is a simple drag-and-drop process, whether it contains only Hadoop jobs or connects them into an enterprise business process that includes ETL, RDBMS extracts, file transfers, and any other job types and applications that Control-M supports.
And when the workflow is finished, if you need to add an SLA, add a backup at the end, or start up a VM that may not always be powered on, that too is a simple drag-and-drop addition of a BIM, Backup, or VMware job.
The Connection Profile simplifies setup by collecting all environment info into a single object that is encrypted and managed by Control-M.
Monitoring Hadoop jobs is just like monitoring any other Control-M application: you can view job output, perform operational actions like Kill, and provide visibility via Self Service.
Control-M now provides huge value and great capabilities through the entire lifecycle of Hadoop applications. Developers can build Control-M jobs with Workload Change Manager (the simple, web-based, self-service job authoring component you will hear about very shortly) and submit requests to Production Control, which can then service the request quickly and get it into production. Once the application is in operation, Control-M Self Service provides access to all constituents in the business.
For Big Data, there seems to be a new project almost every day that will require integration. The challenges I’ve been discussing are destined to be encountered over and over again with these and other technologies.