Big Data in Azure: Demo and Hands-On Labs
Pre-requisites
• You will need an Azure subscription with available HDInsight cores.
• Power BI / Excel 2013
  o Download the Power Query add-in, choosing 32-bit or 64-bit to match your Office installation: http://www.microsoft.com/en-us/download/details.aspx?id=39379&CorrelationId=d8002172-0438-4ef5-b0fa-e635f8f17251
  o Enable Power Pivot and Power View in your Excel options (COM add-ins).
• Download the HOL labs from https://github.com/Azure-Readiness/CloudDataCamp. For April 30 only, use https://github.com/cindygross/CloudDataCamp instead. If you already have GitHub installed, choose "Clone in Desktop". Otherwise choose "Download ZIP" and unzip the files. Save the location to a Notepad file.
• Data movement (one or both):
  o GUI: Install CloudXplorer from http://clumsyleaf.com/products/downloads. I will be using v3. You can download the v3 trial or the free v1 (which has fewer features).
  o Command line: Install AzCopy (http://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/). Save the install location because you will need it later. It defaults to C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy (without the "(x86)" on a 32-bit OS).
• Install SQL Server 2014 Management Studio (SSMS): http://www.microsoft.com/en-gb/download/details.aspx?id=42299
• Today's slides: http://tinyurl.com/lxutdd4
Goal
Understand how to use some of the common pieces of an Azure-hosted Big Data and Analytics solution. These components are often part of an Internet of Things solution, which is a common Big Data and Analytics scenario.
• At the end of this hands-on lab you will have:
  o Created an Azure storage account and container, then loaded data into it. You will also use this account to store data generated in other steps.
  o Created a Hadoop on Azure instance (HDInsight), added structure (tables) stored in HCatalog, and queried the data on the storage account using Hive.
  o Connected an Azure ML experiment to Hive. Hadoop is "just another data source".
  o Created and run an Azure Stream Analytics job that reads data generated on the fly from your laptop through a Service Bus Event Hub. The job outputs aggregated data to a SQL Azure database.
  o Used Power BI to visualize and present the data.
Labs
We're going to use a modified version of the CloudDataCamp hands-on labs. Those labs have screenshots and more detailed instructions than what I have below. Please refer to the original docs if you need more detailed steps.
Guidelines
• Many names within Azure have to be globally unique, so try prefixing services with your initials or company name. Some service names must be all lowercase, so it's easier to make all names lowercase. For this lab, prefix all names with the same identifier. Open Notepad and type in the name of the prefix you will use.
• Let's pick a single datacenter and use it for all our work (though some services are not yet available in all regions). For Montreal, let's choose East US. Note that this is NOT the same as East US 2.
• I suggest you start a single file in a simple editor like Notepad. Keep all the links, names, and passwords/keys we use in that central location for the duration of the labs.
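Storage account names are the strictest case of the naming rules above. They must be 3 to 24 characters long and use only lowercase letters and digits. They must also be globally unique, because the name becomes part of the endpoint (`<name>.blob.core.windows.net`).

Below is a minimal sketch that applies the prefix convention and checks the name format locally before you type it into the portal. `make_storage_name` is a hypothetical helper. Global uniqueness can only be confirmed by Azure itself.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def make_storage_name(prefix: str, suffix: str) -> str:
    """Hypothetical helper: join the lab prefix and a suffix, lowercase, then validate."""
    name = (prefix + suffix).lower()  # lowercase everything, as the guidelines suggest
    if not _STORAGE_NAME.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


if __name__ == "__main__":
    print(make_storage_name("bddragon", "storage"))        # bddragonstorage
    print(make_storage_name("bddragon", "streammonitor"))  # bddragonstreammonitor
```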
HOL1: Intro to the Azure Portal
The detailed lab file is in the CloudDataCamp download under docs, or you can get it here: https://github.com/Azure-Readiness/CloudDataCamp/blob/master/HOL/HOL1-IntroductionToAzure.md
In Lab 1 we'll create a storage account and load data with AzCopy and/or CloudXplorer. Then we'll create a SQL Database, open the firewall to our client machine, and create some SQL tables for structured data. Next we'll generate some loosely structured data, simulating a "thing" or device that generates small chunks of data.
Portals
• Production management portal: https://manage.windowsazure.com/ (log in and choose your subscription)
• Preview portal: https://portal.azure.com/ (log in and choose your subscription)
Storage Account (creation takes 2-3 minutes)
• In the Preview portal https://portal.azure.com/ (resource groups are not available in the management portal), choose to create a new storage account: New -> Data + Storage -> Storage.
  o Name: your prefix + storage. Mine is bddragonstorage.
  o Pricing: Locally Redundant <Select>
  o Resource Group: New -> your prefix + rg. Mine is bddragonrg.
  o Subscription: use one subscription for all steps!
  o Location: East US
  o Diagnostics: Not configured
  o Pin to Startboard: Yes
  o <Create>
• Still in the preview portal, add a container to the storage account:
  o Name: data (this name is required due to the way the lab is set up)
  o Access type: Private
• Click on Settings -> Keys in the storage account and copy the name and primary key to your Notepad file.
Ingest data
Either AzCopy
• Open a command prompt and change directories (omit "(x86)" on a 32-bit OS):
  cd "C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy"
• Use your actual local directory, storage account name, and storage account key:
  azcopy /Source:"{your path}\CloudDataCamp\data" /Dest:https://[storage account name].blob.core.windows.net/data/input/ /DestKey:[storage account key] /S
• If you installed CloudXplorer, you can add the storage account and key with the "Accounts" button, then view the files there.
• You can also drag and drop small files from your local File Explorer to CloudXplorer. AzCopy is better for larger files or automated processes.
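With `/S`, AzCopy copies the source folder recursively. Each file's path relative to the source becomes part of its blob name under `data/input/`. Blob storage has no real folders, only `/`-delimited names, which is why CloudXplorer shows them as "directories".

Below is a minimal sketch of that mapping. It assumes the account name `bddragonstorage` and a made-up file name `sample.json`. It only computes the destination URLs and uploads nothing.

```python
import tempfile
from pathlib import Path


def planned_blob_urls(source_dir: str, account: str, container: str = "data",
                      prefix: str = "input/") -> list:
    """List the blob URLs a recursive (/S) copy of source_dir would produce; no upload."""
    root = Path(source_dir)
    urls = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # The relative path, with forward slashes, becomes the blob name under the prefix.
            rel = path.relative_to(root).as_posix()
            urls.append(f"https://{account}.blob.core.windows.net/{container}/{prefix}{rel}")
    return urls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sample.json").write_text("{}\n")  # hypothetical data file
        for url in planned_blob_urls(tmp, "bddragonstorage"):
            print(url)
```

Because Hive tables point at a whole "directory" prefix such as `input/`, keeping each kind of data under its own prefix matters later in the lab.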
Or CloudXplorer
• Add your storage account.
• Choose to add a "folder" called input to the data container.
• Drag the file from {your path}\CloudDataCamp\data to the input "directory" under the data container on your account.
Extra Credit
• Try both AzCopy and CloudXplorer.
• Load the data from Bill's talk yesterday to a DIFFERENT FOLDER. Create tables that refer to them, then query the tables. Hive points to directories, not to single files, so each type of data must be in its own folder!
Azure SQL DB
Create a new SQL database
• In the preview portal https://portal.azure.com/ choose New -> Data + Storage -> SQL Database.
• Name: cdcasa (this is unique within your server and is hardcoded for the demo)
• Server: "Create a new server"
  o Name: your prefix + sql. Mine is bddragonsql.
  o Server Admin Login: something you will remember; put it in your Notepad file.
  o Password: something you will remember; put it in your Notepad file. If you are going to use the same password for other services, make it 10+ characters with uppercase, lowercase, a number, and a special character.
  o Location: same as the rest (East US for Montreal)
  o Allow Azure Services to Access Server: Yes, check the box! (Very important!)
  o OK
• Select Source: Blank Database
• Pricing Tier: Standard (the cheapest is fine for the demo)
• Optional Configuration: leave at defaults
• Resource Group: the one we created above
• Subscription: the same one we've been using
• Choose to add it to the Startboard.
• <Create> (wait 3-4 minutes)
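Below is a minimal sketch of the password rule suggested above: 10 or more characters with uppercase, lowercase, a number, and a special character. It reflects this lab's advice for a password you plan to reuse across services. It is an assumption-level local check, not Azure's server-side password policy, and `meets_lab_password_rule` is a hypothetical name.

```python
import string


def meets_lab_password_rule(password: str) -> bool:
    """Hypothetical check of the lab's suggested rule (not Azure's own policy)."""
    checks = [
        len(password) >= 10,                               # 10+ characters
        any(c.isupper() for c in password),                # an uppercase letter
        any(c.islower() for c in password),                # a lowercase letter
        any(c.isdigit() for c in password),                # a number
        any(c in string.punctuation for c in password),    # a special character
    ]
    return all(checks)


if __name__ == "__main__":
    print(meets_lab_password_rule("Dragon#2015x"))  # True
    print(meets_lab_password_rule("dragon2015!"))   # False: no uppercase letter
```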
Configure the firewall
• Open the non-preview management portal https://manage.windowsazure.com/.
• Click on SQL Databases in the left pane.
• Highlight cdcasa and then choose Servers from the upper menu (not the database, the server).
• Click on the server you created earlier (bddragonsql is mine) and go to Configure.
• Where it says "Current Client IP Address", choose "Add to the allowed IP addresses".
• Double-check that "Windows Azure Services" is set to Yes.
• Choose Save in the bottom bar.
Create SQL schemas for ASA
• Open SQL Server Management Studio (SSMS). You can optionally do this from Visual Studio 2013 with Update 4 or later instead.
  o Server Type: Database Engine
  o Server Name: {your SQL server}.database.windows.net. For example, mine is bddragonsql.database.windows.net.
  o Authentication: SQL Server Authentication (in the real world, never log in with your sysadmin account for dbo activities)
    ▪ Login: the one you created earlier
    ▪ Password: the one you created earlier
• Choose the cdcasa database from the left menu (Object Explorer).
• Press Ctrl-O to open 1_CreateSQLTable.sql from C:\{your directory}\CloudDataCamp\scripts\ASA.
• Verify you are in the cdcasa database (there's a drop-down box above Object Explorer).
• Hit F5 or the Execute button to run it.
• Note: the table will be populated later by ASA.
Create Event Hub for Data Ingestion
• Open the non-preview management portal https://manage.windowsazure.com/.
• Click on Service Bus in the left menu.
• Choose New -> App Services -> Service Bus -> Event Hub -> Custom Create.
  o Event Hub Name: your prefix + eh. Mine is bddragoneh.
  o Region: the same one we've been using
  o Namespace: Create a new namespace
  o Namespace Name: your prefix + eh + -ns (it will default to this)
  o Choose Next using the arrow on the bottom right
  o Partition Count: 8
  o Message Retention: 2 (days)
  o Choose the check mark to finish
• Configure shared access:
  o Click on the new Service Bus namespace.
  o Choose Event Hubs from the top menu.
  o Click on the Event Hub.
  o Choose Configure from the top menu.
  o In the "shared access policies" section, add a policy:
    ▪ Name: mypolicy
    ▪ Permissions: Send, Listen
    ▪ Choose Save at the bottom.
  o Copy the policy name and its primary key to your Notepad file.
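The policy name and primary key you just copied are what a sender uses to authenticate. Event Hubs accepts a Shared Access Signature (SAS) token. The token is an HMAC-SHA256 signature over the URL-encoded resource URI and an expiry time, keyed with the policy key.

Below is a minimal sketch that follows that documented token format. The namespace, hub, and key values are placeholders. This document doesn't show whether the lab's DeviceSender builds its token exactly this way.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def make_sas_token(resource_uri: str, policy_name: str, policy_key: str,
                   ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Build a Shared Access Signature token for an Event Hub resource URI (sketch)."""
    issued = time.time() if now is None else now
    expiry = str(int(issued + ttl_seconds))  # Unix time after which the token is rejected
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # String to sign: URL-encoded resource URI, a newline, then the expiry.
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    # The policy key string itself (as UTF-8 bytes) is the HMAC-SHA256 key.
    digest = hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest))
    return (f"SharedAccessSignature sr={encoded_uri}&sig={signature}"
            f"&se={expiry}&skn={policy_name}")


if __name__ == "__main__":
    # Placeholder namespace, hub, and key: substitute the values from your Notepad file.
    uri = "https://bddragoneh-ns.servicebus.windows.net/bddragoneh"
    print(make_sas_token(uri, "mypolicy", "<primary key>"))
```

Because the token expires, a sender can keep the key private and hand out short-lived tokens. The Send permission on mypolicy is what allows such a token to publish events.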
Generate Data (Device Sender)
• Open a command prompt.
• cd {your directory}\CloudDataCamp\tools\DeviceSender
• Substitute your actual values in the command below:
  DeviceSender GenerateDataToEventHub -n <eventHubNamespace> -e <eventHubName> -p <policyName> -k <policyKey>
• Paste the edited command into the command prompt and hit Enter to execute it. You should see a series of "Messages fired onto the eventhub!" messages, which show that data is being sent from your machine to Azure.
• Do NOT close the window. This data will be used later.
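DeviceSender's source isn't reproduced here. However, the Stream Analytics query later in this lab reads the fields RoomNumber, Type, Temperature, Humidity, Kwh, and Lumens from each JSON event, and the Hive lab queries deviceId.

Below is a minimal sketch that generates similar JSON readings. Several details are assumptions: the exact schema, the value ranges, and the choice of one measurement per event. It prints messages instead of sending them.

```python
import json
import random

# Assumed mapping from the ASA query's reading type to the JSON field it averages.
FIELDS = {"Temperature": "Temperature", "Humidity": "Humidity",
          "Energy": "Kwh", "Light": "Lumens"}


def simulate_readings(count: int, seed: int = 42) -> list:
    """Return `count` JSON strings shaped like the events the lab's ASA query reads.

    The schema and value ranges are illustrative assumptions, not DeviceSender's output.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    events = []
    for _ in range(count):
        reading_type = rng.choice(list(FIELDS))
        event = {
            "deviceId": f"device-{rng.randint(1, 4)}",
            "Type": reading_type,
            "RoomNumber": rng.randint(100, 105),
            FIELDS[reading_type]: round(rng.uniform(0, 100), 2),
        }
        events.append(json.dumps(event))
    return events


if __name__ == "__main__":
    for message in simulate_readings(3):
        print(message)  # printed, not sent; DeviceSender does the real sending
```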
HOL9: Azure Stream Analytics
Create Streaming Job
• Open http://manage.windowsazure.com
• Click on New -> Data Services -> Stream Analytics -> Quick Create.
  o Job Name: prefix + stream
  o Region: East US 2 (East US isn't available yet)
  o Regional Monitoring Storage Account: Create new
  o New Storage Account Name: prefix + streammonitor
Configure Streaming Job
Inputs
• Click on the job you just created, choose Inputs from the top ribbon, and click "Add Input".
• Choose "Data stream", then "Event Hub".
• Event Hub Settings:
  o Input Alias: MyEventHubStream (must be exactly this)
  o Subscription: Current
  o Namespace: the one you created in the Event Hub step (prefix + -ns)
  o Event Hub Name: the one you created
  o Policy: mypolicy
  o Consumer Group: $Default
• Serialization settings:
  o Format: JSON
  o Encoding: UTF8
Output
• In the streaming job, choose Outputs from the upper ribbon and click "Add Output".
• Choose SQL Database.
• SQL Database Settings:
  o Output alias: output
  o Subscription: Current
  o SQL Database: cdcasa
  o Server Name: the one you created earlier, prefix + sql
  o Username/Password: the SQL admin account you created
  o Table: AvgReadings
Query
• Choose Query from the upper ribbon.
• Paste in the following query, then SAVE:
SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as
WinEndTime, Type = 'Temperature', RoomNumber, Avg(Temperature) as AvgReading,
Count(*) as EventCount
FROM MyEventHubStream
Where Temperature IS NOT NULL
GROUP BY TumblingWindow(minute, 1), RoomNumber, Type
UNION
SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as
WinEndTime, Type = 'Humidity', RoomNumber, Avg(Humidity) as AvgReading, Count(*) as
EventCount
FROM MyEventHubStream
Where Humidity IS NOT NULL
GROUP BY TumblingWindow(minute, 1), RoomNumber, Type
UNION
SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as
WinEndTime, Type = 'Energy', RoomNumber, Avg(Kwh) as AvgReading, Count(*) as
EventCount
FROM MyEventHubStream
Where Kwh IS NOT NULL
GROUP BY TumblingWindow(minute, 1), RoomNumber, Type
UNION
SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as
WinEndTime, Type = 'Light', RoomNumber, Avg(Lumens) as AvgReading, Count(*) as
EventCount
FROM MyEventHubStream
Where Lumens IS NOT NULL
GROUP BY TumblingWindow(minute, 1), RoomNumber, Type
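The query above averages each reading type per room over non-overlapping one-minute tumbling windows. `System.Timestamp` is the window's end, so `WinStartTime` is one minute earlier.

Below is a minimal Python sketch of the same computation over an in-memory list of (timestamp, room, type, value) tuples. It shows which events land in which window. It assumes windows aligned to whole minutes and simplifies boundary handling. It is not how Stream Analytics is implemented.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)


def tumbling_averages(events):
    """events: iterable of (timestamp, room, reading_type, value).

    Returns sorted rows of (win_start, win_end, reading_type, room, avg_reading,
    event_count), matching the column order of the ASA SELECT.
    """
    groups = defaultdict(list)
    for ts, room, reading_type, value in events:
        if value is None:  # mirrors the WHERE ... IS NOT NULL filters
            continue
        # Floor the timestamp to the minute to find its window (simplified boundaries).
        win_start = ts.replace(second=0, microsecond=0)
        groups[(win_start, reading_type, room)].append(value)
    rows = []
    for (win_start, reading_type, room), values in groups.items():
        rows.append((win_start, win_start + WINDOW, reading_type, room,
                     sum(values) / len(values), len(values)))
    return sorted(rows)


if __name__ == "__main__":
    t = datetime(2015, 4, 30, 10, 0, 15)
    sample = [(t, 101, "Temperature", 20.0),
              (t + timedelta(seconds=30), 101, "Temperature", 22.0),
              (t + timedelta(seconds=50), 101, "Temperature", 30.0)]
    for row in tumbling_averages(sample):
        print(row)
```

Each input event contributes to exactly one window. That property is what distinguishes a tumbling window from a hopping or sliding window.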
Start Streaming Job
• Click on Start in the bottom ribbon and choose the default (Job Start Time).
• Verify DeviceSender is running (or restart it).
View Data in SQL
• After a few minutes, you can query the SQL database from SSMS and see the data in AvgReadings.
• Stop the DeviceSender app if it's still running.
You have successfully ingested data from a "thing" (your laptop) to Azure! You pushed that data through a streaming query and sent the aggregated output to a destination in the cloud: Azure SQL Database.
---- Back to SLIDES -----
HOL2: Intro to HDInsight
In Lab 2 we create a Hadoop cluster in Azure using the HDInsight service. Then we RDP to the head node and see that it's truly Apache open source Hadoop running on Windows. HDInsight is also available on Linux, but we are using Windows for the lab.
Create an HDInsight Hadoop cluster
• Log in to https://manage.windowsazure.com/.
• Choose HDInsight (the elephant) from the left menu.
• Choose New -> Data Services -> HDInsight -> Custom Create.
• Page 1 / Cluster Details
  o Cluster Name: your prefix + hdi
  o Cluster Type: Hadoop
  o Operating System: Windows
  o Version: default
• Page 2 / Configure Cluster
  o Data Nodes: 1
  o Region: the same region you've been using (the storage account must be in the same region)
  o Head Node Size: default A3
  o Data Node Size: default A3
• Page 3 / Configure Cluster User
  o Name: your prefix + admin (you can use the same as the SQL db for the demo, but don't do that in production)
  o Password: (you can use the same as the SQL db for the demo, but don't do that in production)
  o Enable the remote desktop for cluster: Yes (you will generally choose No)
    ▪ RDP User Name: cluster name + 1 (don't do this in production)
    ▪ RDP Password: (you can use the same as the SQL db for the demo, but don't do that in production)
    ▪ Expires On: tomorrow
  o Enter the Hive/Oozie Metastore: No (you will generally choose Yes for production)
• Page 4 / Storage Account
  o Storage Account: Use existing storage
  o Account Name: the storage account we created earlier
  o Default Container: data
  o Additional Storage Accounts: 0
• Page 5 / Script Actions
  o Click the arrow to create the cluster, then wait about 15 minutes.
Use the Hadoop Distributed File System (HDFS)
• RDP to the head node.
• Get a listing of files:
  o hadoop fs -ls /
  o hadoop fs -ls /example/data
---- Back to SLIDES -----
HOL3: HDI Batch Analysis and Power BI
We'll do some batch analysis and create aggregations. Then we will view the data in Power BI.
Hive
• Navigate to CloudDataCamp\scripts\Hive in your file explorer.
• In the Azure management portal, click on your HDInsight instance. Click on Query Console at the bottom of the screen to open a query window. Log in with the cluster credentials (not the RDP credentials).
• Choose the Hive editor.
Create an External Table DeviceReadings
• Open CloudDataCamp\scripts\Hive\1_CreateDeviceReadings.txt in a text editor like Notepad. Update the location by replacing <storage account name> with the storage account you created in Hands-on Lab 1 (remove the brackets). Paste the edited query into the Hive editor and hit Submit to create a Hive table.
  LOCATION 'wasb://data@<storage account name>.blob.core.windows.net/input';
• View the job output, which opens in a new window. For a create schema statement, verify there are no errors (the messages about logging are not errors). The output also shows the time taken.
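The LOCATION clause uses the WASB scheme, `wasb://<container>@<account>.blob.core.windows.net/<path>`. This scheme lets Hive read the blobs in the storage account directly.

Below is a small sketch that builds the location string for this lab's `data` container and `input` folder, so the edited line can be pasted without typos. `hive_location` is a hypothetical helper.

```python
def hive_location(account: str, container: str = "data", path: str = "input") -> str:
    """Hypothetical helper: build the LOCATION clause for a Hive external table on WASB."""
    if not account or "<" in account or ">" in account:
        # Catch the common mistake of leaving the placeholder or its brackets in place.
        raise ValueError("replace <storage account name> with your real account name")
    return f"LOCATION 'wasb://{container}@{account}.blob.core.windows.net/{path.strip('/')}';"


if __name__ == "__main__":
    print(hive_location("bddragonstorage"))
```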
Query the table
• Copy the query below and run it from the Hive editor:
  SELECT deviceId FROM DeviceReadings LIMIT 100;
• View the job output.
Create External Tables for Averages
Create and populate tables that store aggregates.
• Open CloudDataCamp\scripts\Hive\2_CreateAverageReadingByType.txt.
• Edit the location and run the script from the Hive editor.
• Repeat for each remaining create/insert script, changing the location and executing it:
  o CloudDataCamp\scripts\Hive\3_CreateAverageReadingByMinute.txt
  o CloudDataCamp\scripts\Hive\4_CreateMaximumReading.txt
  o CloudDataCamp\scripts\Hive\5_CreateMinimumReading.txt
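The Hive scripts themselves aren't reproduced in this document. Their names suggest four aggregates over the device readings, as do the columns named later in the Power Query step (DeviceType, ReadingDateTime, RoomNumber, Reading).

Below is a minimal Python sketch of what such aggregates could compute on a few in-memory rows. The exact grouping keys in scripts 2-5 are assumptions, so check the .txt files for the real HiveQL.

```python
from collections import defaultdict
from statistics import mean


def aggregate_readings(rows):
    """rows: (device_type, reading_datetime 'YYYY-MM-DD HH:MM:SS', room, reading) tuples.

    Returns hypothetical equivalents of the four aggregate tables (grouping keys assumed).
    """
    by_type = defaultdict(list)
    by_minute = defaultdict(list)
    for device_type, reading_dt, room, reading in rows:
        by_type[device_type].append(reading)
        # Truncate the timestamp string to the minute, e.g. '2015-04-30 10:01'.
        by_minute[(device_type, reading_dt[:16], room)].append(reading)
    return {
        "AverageReadingByType": {k: mean(v) for k, v in by_type.items()},
        "AverageReadingByMinute": {k: mean(v) for k, v in by_minute.items()},
        "MaximumReading": {k: max(v) for k, v in by_type.items()},
        "MinimumReading": {k: min(v) for k, v in by_type.items()},
    }


if __name__ == "__main__":
    sample = [("energy", "2015-04-30 10:01:05", 101, 2.0),
              ("energy", "2015-04-30 10:01:40", 101, 4.0)]
    print(aggregate_readings(sample)["AverageReadingByMinute"])
```

Each aggregate lands in its own table and therefore its own folder. That is why the later Power Query step can filter on the folder path to find one aggregate.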
File Browser
The location of the data was specified in the table creation statements using LOCATION. The browser shows data on the default storage account for the cluster.
• View the original and the aggregated data in the File Browser tab of the console.
• If you have CloudXplorer, view the data in CloudXplorer (hit Refresh).
Extra Credit
• Write SELECT statements to view each table's dataset.
• Write more complex queries.
• show tables;
• describe formatted AverageReadingByType;
• Connect to Hive from Power Pivot using the Microsoft Hive ODBC driver and a DSN.
AzureML
Connect to Hadoop from Azure ML. This step is not in the CloudDataCamp: HOL10 in that series points to a flat file, while here we use a Hive query.
• From manage.windowsazure.com, click on Azure ML and choose to sign in to your Azure ML Studio.
• Choose a new blank experiment.
• Drag a Reader from the left to the designer.
• Highlight the Reader and view the options you have for connecting:
  o Data source: Hive Query
  o Hive database query: SELECT * FROM AverageReadingByType
  o HCatalog server URI: http://{your hdi cluster}.azurehdinsight.net
  o Hadoop user account name: your cluster admin (not RDP) account
  o Hadoop user account password: your password
  o Location of output data: Azure
  o Azure storage account name: {your storage account}
  o Azure storage key: {your key}
  o Azure container name: data
• Choose Save and Run from the bottom ribbon.
• When it completes, view the results dataset by right-clicking on the output circle and choosing either Visualize or Download.
Reference: https://andersspur.wordpress.com/2014/10/10/use-hive-to-read-data-into-azure-ml/
Cluster Cleanup
At this point we have new datasets based on aggregates of our first, static data file. We have two options. We could leave the cluster up and query it directly from tools like Power BI using Hive. Or we could drop the cluster and access the data directly in the flat files. We'll use the latter (flat files). This emphasizes that these are on-demand clusters; you don't need to pay to keep them up all the time.
• Drop the HDInsight cluster.
Power Query
• Open a new workbook in Excel 2013. Verify that you installed and enabled Power Query.
• Click on Power Query.
• Choose From Azure -> From Microsoft Azure HDInsight.
• Enter the storage account you created earlier and the key you saved in Notepad.
• In Navigator, expand your storage account and double-click on the container named data to open the query editor.
• Find the "Folder Path" column on the far right and choose the drop-down arrow.
• Enter output in the search box and you'll see the "directories" and files we have created today.
• If you chose OK, go to "Applied Steps" on the far right and click the red X next to "Filtered Rows" to remove this filter.
• Create a new filter on averageReadingByMinute and choose OK. This will show a single row. We had a small amount of data and only ran the insert once, so we only have one file in that directory.
• Scroll back to the left and, in the "Content" column, click on "Binary" to import the file.
• Name the columns: DeviceType, ReadingDateTime, RoomNumber, Reading
• Choose "Close & Load" from the upper left to create a new sheet called AverageReadingByMinute.
• Save the workbook to your desktop.
Power View
• Go to the workbook created in the last step.
• Choose the Insert tab at the top, then choose Power View in the middle of the ribbon.
• Power View is populated with the table from the worksheet. You can see the columns in "Power View Fields" on the right.
• Note that the numeric fields have a sum figure next to them. We don't want to summarize room number, so go to the "Fields" section at the bottom of "Power View Fields" and choose "Do Not Summarize" for RoomNumber.
• Click inside the table in the report designer pane (left). In the Design menu item in the ribbon to the right of "Power View", choose "Other Chart" -> "Line".
• In the Filters section, choose Chart.
• Expand DeviceType and put a check next to energy.
• Edit the title to "Energy Reading By Minute".
• Save the workbook and close it.
You have now done distributed processing with Hadoop on Azure (HDInsight). You used the power of WASB to access that same data outside of Hadoop. You then used Power BI to discover and visualize that data, opening up the possibilities for new data-driven insights.
Cleanup
• Verify that you have dropped your HDInsight cluster. You are charged for its existence whether you are running anything or not.
• Stop the DeviceSender app if it's still running.
• Drop the other resources we've created. They have minimal costs if you aren't actively using them.
  o Streaming Job
  o Event Hub (under Service Bus)
  o Service Bus namespace
  o Storage
  o SQL Azure Database cdcasa (and optionally the hosting SQL Server)
  o Azure ML Experiments
  o Resource Group
• Optionally delete the Excel workbook.
• Optionally remove some or all files and tools from this workshop:
  o CloudDataCamp folder and all files
  o CloudXplorer
  o AzCopy
  o DeviceSender

Hands-on Lab: Migrating Oracle to PostgreSQL
 
Install Oracle 12c Golden Gate On Oracle Linux
Install Oracle 12c Golden Gate On Oracle LinuxInstall Oracle 12c Golden Gate On Oracle Linux
Install Oracle 12c Golden Gate On Oracle Linux
 
Google Cloud Platform for DeVops, by Javier Ramirez @ teowaki
Google Cloud Platform for DeVops, by Javier Ramirez @ teowakiGoogle Cloud Platform for DeVops, by Javier Ramirez @ teowaki
Google Cloud Platform for DeVops, by Javier Ramirez @ teowaki
 
Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL Polybase
 
PVS-Studio: analyzing pull requests in Azure DevOps using self-hosted agents
PVS-Studio: analyzing pull requests in Azure DevOps using self-hosted agentsPVS-Studio: analyzing pull requests in Azure DevOps using self-hosted agents
PVS-Studio: analyzing pull requests in Azure DevOps using self-hosted agents
 
HDinsight Workshop - Prerequisite Activity
HDinsight Workshop - Prerequisite ActivityHDinsight Workshop - Prerequisite Activity
HDinsight Workshop - Prerequisite Activity
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualization
 
Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modules
 
OSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom FifieldOSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
 
Apache
ApacheApache
Apache
 
Apache
ApacheApache
Apache
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
 
Hortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on AzureHortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on Azure
 
BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
 
Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampRichard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
 
One-Man Ops
One-Man OpsOne-Man Ops
One-Man Ops
 
Lampstack (1)
Lampstack (1)Lampstack (1)
Lampstack (1)
 
Using Catalogic DPX with Microsoft Azure Cloud
Using Catalogic DPX with Microsoft Azure CloudUsing Catalogic DPX with Microsoft Azure Cloud
Using Catalogic DPX with Microsoft Azure Cloud
 
Microsoft Docker Meetup - Tutum Spring 2015
Microsoft Docker Meetup - Tutum Spring 2015Microsoft Docker Meetup - Tutum Spring 2015
Microsoft Docker Meetup - Tutum Spring 2015
 

Big datademo

Big Data in Azure: Demo and Hands-On Labs

Pre-requisites
• You will need an Azure subscription with available HDInsight cores.
• Power BI / Excel 2013
  o Download the Power Query add-in. Choose 32-bit or 64-bit to match your Office installation: http://www.microsoft.com/en-us/download/details.aspx?id=39379&CorrelationId=d8002172-0438-4ef5-b0fa-e635f8f17251
  o Enable Power Pivot and Power View in your Excel options (COM Add-ins).
• Download the HOL labs: https://github.com/Azure-Readiness/CloudDataCamp. For April 30 only, use https://github.com/cindygross/CloudDataCamp instead. If you already have GitHub installed, choose “Clone in Desktop”. Otherwise choose “Download ZIP” and unzip the files. Save the location to a Notepad file.
• Data movement (one or both):
  o GUI: Install CloudXplorer: http://clumsyleaf.com/products/downloads. I will be using v3. You can download the v3 trial or the free v1, which has fewer features.
  o Command line: Install AzCopy: http://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/. Save the install location because you will need it later. It defaults to C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy (without the “(x86)” on a 32-bit OS).
• Install SQL Server 2014 Management Studio (SSMS): http://www.microsoft.com/en-gb/download/details.aspx?id=42299
• Today’s slides: http://tinyurl.com/lxutdd4

Goal
Understand how to use some of the common pieces of an Azure-hosted Big Data and Analytics solution. These components are often part of an Internet of Things solution, which is a common Big Data and Analytics scenario.
• At the end of this hands-on lab you will have:
  o Created an Azure storage account and container, then loaded data into it. You will also use this account to store data generated in other steps.
  o Created a Hadoop on Azure instance (HDInsight), added structure (tables) stored in HCatalog, and queried the data on the storage account using Hive.
  o Connected an AzureML experiment to Hive, showing that Hadoop is “just another data source”.
  o Created and ran an Azure Stream Analytics job. The job reads data generated on the fly on your laptop and sent through a Service Bus Event Hub, and it writes aggregated data to a SQL Azure database.
  o Used Power BI to visualize and present the data.

Labs
We’re going to use a modified version of the CloudDataCamp hands-on labs. Those labs have screenshots and more detailed instructions than I give below. Refer to the original docs if you need more detail.

Guidelines
• Many names within Azure have to be globally unique, so try prefixing services with your initials or company name. Some service names must be all lowercase, so it’s easiest to make all names lowercase. For this lab, prefix all names with the same identifier. Open Notepad and type in the prefix you will use.
• Let’s pick a single datacenter and use it for all our work, though some services are not yet available in all regions. For Montreal let’s choose East US. Note that this is NOT the same as East US 2.
• I suggest you start a single file in a simple editor like Notepad. Keep all the links, names, and passwords/keys we use in that one place for the duration of the labs.
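The prefix convention can be sketched in a few lines of Python. This sketch is not part of the lab. `make_names` and `is_valid_storage_name` are hypothetical helpers, and the suffixes are the ones used in the lab steps. The only rule the sketch checks is the storage account naming rule: 3 to 24 characters, lowercase letters and digits only. Other services have their own rules.

```python
import re

# Azure storage account names: 3 to 24 characters, lowercase letters and digits only.
STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")

# Suffixes used in this lab; the helper functions below are hypothetical.
SUFFIXES = {
    "storage_account": "storage",
    "resource_group": "rg",
    "sql_server": "sql",
    "event_hub": "eh",
    "stream_job": "stream",
    "hdinsight_cluster": "hdi",
}


def make_names(prefix: str) -> dict:
    """Build lowercase resource names that all share one prefix."""
    prefix = prefix.lower()
    names = {service: prefix + suffix for service, suffix in SUFFIXES.items()}
    # The Service Bus namespace defaults to the Event Hub name plus "-ns".
    names["service_bus_namespace"] = names["event_hub"] + "-ns"
    return names


def is_valid_storage_name(name: str) -> bool:
    """True if `name` satisfies the storage account naming rule above."""
    return STORAGE_NAME.fullmatch(name) is not None


names = make_names("BDDragon")
print(names["storage_account"])                     # bddragonstorage
print(names["service_bus_namespace"])               # bddragoneh-ns
print(is_valid_storage_name("bd-dragon-storage"))   # False: hyphens are not allowed
```

Lowercasing the prefix once, up front, avoids the case where one service accepts a mixed-case name and another rejects it.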
HOL1: Intro to the Azure Portal
The detailed lab file is in the CloudDataCamp download under docs, or you can get it here: https://github.com/Azure-Readiness/CloudDataCamp/blob/master/HOL/HOL1-IntroductionToAzure.md
In Lab 1 we’ll create a storage account and load data with AzCopy and/or CloudXplorer. Then we’ll create a SQL Database, open the firewall to our client machine, and create some SQL tables for structured data. Next we’ll generate some loosely structured data, simulating a “thing” or device that produces small chunks of data.

Portals
• Production management portal: https://manage.windowsazure.com/ (log in and choose a subscription)
• Preview portal: https://portal.azure.com/ (log in and choose a subscription)

Storage Account (creation takes 2-3 minutes)
• In the Preview portal https://portal.azure.com/, choose New -> Data + Storage -> Storage to create a new storage account. Use the Preview portal because resource groups are not available in the management portal.
  o Name: your prefix + storage. Mine is bddragonstorage.
  o Pricing: Locally Redundant. <Select>
  o Resource Group: New -> your prefix + rg. Mine is bddragonrg.
  o Subscription: use one subscription for all steps!
  o Location: East US
  o Diagnostics: Not configured
  o Pin to Startboard: Yes
  o <Create>
• Still in the Preview portal, add a container to the storage account.
  o Name: data (this name is required because of the way the lab is set up)
  o Access type: Private
• Click on Settings -> Keys in the storage account and copy the name and primary key to your Notepad file.

Ingest data
Either AzCopy
• Open a command prompt and change directories (omit the “(x86)” on a 32-bit OS):
  cd "C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy"
• Run the following, using your actual local directory, storage account name, and storage account key:
  AzCopy /Source:"{your path}\CloudDataCamp\data" /Dest:https://[storage account name].blob.core.windows.net/data/input /DestKey:[storage account key] /S
• If you installed CloudXplorer, you can add the storage account and key with the “Accounts” button, then view the files there.
• Note that you can also drag and drop small files from your local File Explorer to CloudXplorer, but AzCopy is better for larger files or automated processes.
Or CloudXplorer
• Add your storage account.
• Choose to add a “folder” called input to the data container.
• Drag the file from {your path}\CloudDataCamp\data to the input “directory” under the data container on your account.

Extra Credit
• Try both AzCopy and CloudXplorer.
• Load the data from Bill’s talk yesterday to a DIFFERENT FOLDER. Create tables that refer to it, then query the tables. Hive points to directories rather than single files, so each type of data must be in its own folder!
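The following Python sketch shows how the parts of the AzCopy command fit together. It assembles the command above from the values saved in your Notepad file. It only builds a string and does not run AzCopy. It uses the legacy Windows AzCopy options /Source, /Dest, /DestKey and /S exactly as this lab does. The function names, the local path and the key placeholder are hypothetical.

```python
def blob_url(account: str, container: str, path: str = "") -> str:
    """Blob endpoint URL for a container, optionally with a virtual directory path."""
    url = f"https://{account}.blob.core.windows.net/{container}"
    return f"{url}/{path.strip('/')}" if path else url


def azcopy_upload_command(local_dir: str, account: str, key: str) -> str:
    """Assemble the lab's upload command (legacy Windows AzCopy syntax)."""
    dest = blob_url(account, "data", "input")
    # /S makes the copy recursive, so every file under local_dir is uploaded.
    return f'AzCopy /Source:"{local_dir}" /Dest:{dest} /DestKey:{key} /S'


cmd = azcopy_upload_command(r"C:\labs\CloudDataCamp\data", "bddragonstorage", "<key>")
print(cmd)
```

Because Hive tables point at directories, keeping the destination as a virtual directory (data/input) rather than a single blob is what lets each data type live in its own folder.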
Azure SQL DB
Create a new SQL database
• In the Preview portal https://portal.azure.com/ choose New -> Data + Storage -> SQL Database.
• Name: cdcasa (this is unique within your server and is hardcoded for the demo)
• Server: “Create a new server”
  o Name: your prefix + sql. Mine is bddragonsql.
  o Server Admin Login: something you will remember. Put it in your Notepad file.
  o Password: something you will remember. Put it in your Notepad file. If you are going to use the same password for other services, make it 10+ characters with uppercase and lowercase letters, a number, and a special character.
  o Location: same as the rest (East US for Montreal)
  o Allow Azure Services to Access Server: Yes, check the box! (Very important!)
  o OK
• Select Source: Blank Database
• Pricing Tier: Standard (the cheapest is fine for the demo)
• Optional Configuration: leave at defaults
• Resource Group: the one we created above
• Subscription: the same one we’ve been using
• Choose to add it to the Startboard.
• <Create> (wait 3-4 minutes)

Configure the firewall
• Open the non-preview management portal https://manage.windowsazure.com/.
• Click on SQL Databases in the left pane.
• Highlight cdcasa and then choose Servers from the upper menu (the server, not the database).
• Click on the server you created earlier (mine is bddragonsql) and go to Configure.
• Where it says “Current Client IP Address”, choose “add to the allowed IP addresses”.
• Double-check that “Windows Azure Services” is set to Yes.
• Choose Save in the bottom bar.

Create SQL schemas for ASA
• Open SQL Server Management Studio (SSMS). This step can optionally be done from Visual Studio 2013 with Update 4 or later.
  o Server Type: Database Engine
  o Server Name: {your SQL server}.database.windows.net. For example, mine is bddragonsql.database.windows.net.
  o Authentication: SQL Server Authentication (in the real world, never log in with your sysadmin account for dbo activities)
    - Login: the one you created earlier
    - Password: the one you created earlier
• Choose the cdcasa database from the left menu (Object Explorer).
• Press Ctrl-O to open 1_CreateSQLTable.sql from C:\{your directory}\CloudDataCamp\scripts\ASA.
• Verify you are in the cdcasa database (there’s a drop-down box above Object Explorer).
• Hit F5 or the Execute button to run it.
• Note: the table will be populated later by ASA.

Create Event Hub for Data Ingestion
• Open the non-preview management portal https://manage.windowsazure.com/.
• Click on Service Bus in the left menu.
• Choose New -> App Services -> Service Bus -> Event Hub -> Custom Create.
  o Event Hub Name: your prefix + eh. Mine is bddragoneh.
  o Region: the same one we’ve been using
  o Namespace: Create a new namespace
  o Namespace Name: your prefix + eh + -ns (it will default to this)
  o Choose next using the arrow on the bottom right.
  o Partition Count: 8
  o Message Retention: 2
  o Choose the check mark to finish.
• Configure shared access
  o Click on the new Service Bus namespace.
  o Choose Event Hubs from the top menu.
  o Click on the Event Hub.
  o Choose Configure from the top menu.
  o In the “shared access policies” section, add a policy:
    - Name: mypolicy
    - Permissions: Send, Listen
    - Choose Save at the bottom.
  o Copy the policy name and its primary key to your Notepad file.

Generate Data (Device Sender)
• Open a command prompt.
• cd {your directory}\CloudDataCamp\tools\DeviceSender
• Replace the placeholders in the command below with your actual values:
  DeviceSender GenerateDataToEventHub -n <eventHubNamespace> -e <eventHubName> -p <policyName> -k <policyKey>
• Paste the edited command into the command prompt and hit Enter to run it. You should see a series of “Messages fired onto the eventhub!” messages, which indicate that data is being sent from your machine to Azure.
• Do NOT close the window. This data will be used later.

HOL9: Azure Stream Analytics
Create Streaming Job
• Open http://manage.windowsazure.com
• Click on New -> Data Services -> Stream Analytics -> Quick Create.
  o Job Name: prefix + stream
  o Region: East US 2 (East US isn’t available yet)
  o Regional Monitoring Storage Account: Create new
  o New Storage Account Name: prefix + streammonitor

Configure Streaming Job
Inputs
• Click on the job you just created, choose Inputs from the top ribbon, and click “Add Input”.
• Choose “Data stream”, then “Event Hub”.
• Event Hub Settings:
  o Input Alias: MyEventHubStream (must be exactly this)
  o Subscription: Current
  o Namespace: the one you created in the Event Hub step (prefix + eh + -ns)
  o Event Hub Name: the one you created
  o Policy: mypolicy
  o Consumer Group: $Default
• Serialization settings
  o Format: JSON
  o Encoding: UTF8

Output
• In the streaming job, choose Outputs from the upper ribbon and then “Add Output”.
• Choose SQL Database.
• SQL Database Settings
  o Output alias: output
  o Subscription: Current
  o SQL Database: cdcasa
  o Server Name: the one you created earlier, prefix + sql
  o Username/Password: the SQL admin account you created
  o Table: AvgReadings

Query
• Choose Query from the upper ribbon.
• Paste in the following query and then SAVE:

  SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as WinEndTime, Type = 'Temperature', RoomNumber, Avg(Temperature) as AvgReading, Count(*) as EventCount
  FROM MyEventHubStream
  Where Temperature IS NOT NULL
  GROUP BY TumblingWindow(minute, 1), RoomNumber, Type
  UNION
  SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as WinEndTime, Type = 'Humidity', RoomNumber, Avg(Humidity) as AvgReading, Count(*) as EventCount
  FROM MyEventHubStream
  Where Humidity IS NOT NULL
  GROUP BY TumblingWindow(minute, 1), RoomNumber, Type
  UNION
  SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as WinEndTime, Type = 'Energy', RoomNumber, Avg(Kwh) as AvgReading, Count(*) as EventCount
  FROM MyEventHubStream
  Where Kwh IS NOT NULL
  GROUP BY TumblingWindow(minute, 1), RoomNumber, Type
  UNION
  SELECT DateAdd(minute,-1,System.TimeStamp) as WinStartTime, system.TimeStamp as WinEndTime, Type = 'Light', RoomNumber, Avg(Lumens) as AvgReading, Count(*) as EventCount
  FROM MyEventHubStream
  Where Lumens IS NOT NULL
  GROUP BY TumblingWindow(minute, 1), RoomNumber, Type

Start Streaming Job
• Click on Start in the bottom ribbon and choose the default (Job Start Time).
• Verify DeviceSender is running (or restart it).

View Data in SQL
• After a few minutes you can query the SQL database from SSMS and see the data in AvgReadings.
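The following Python sketch simulates the one-minute tumbling window for a single reading type, to show what the Stream Analytics query computes. It does not reproduce how ASA actually executes the query. The sample events and the (arrival time, payload) pairing are made up, and the real DeviceSender payload may differ. The field names RoomNumber and Temperature come from the query.

Each event lands in exactly one non-overlapping window. Following the query's output columns, WinEndTime is one minute after WinStartTime. Boundary handling is simplified here: an event goes to the minute it falls in.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

WINDOW = timedelta(minutes=1)


def tumbling_averages(events, field, type_name):
    """Per-room average of `field` over fixed, non-overlapping one-minute windows.

    Mimics one branch of the UNION: events without `field` are skipped,
    like `WHERE Temperature IS NOT NULL`.
    """
    groups = defaultdict(list)
    for timestamp, event in events:
        value = event.get(field)
        if value is None:
            continue
        # Floor the timestamp to the start of its minute to pick the window.
        window_start = timestamp.replace(second=0, microsecond=0)
        groups[(window_start, event["RoomNumber"])].append(value)

    rows = []
    for (window_start, room), values in sorted(groups.items()):
        rows.append({
            "WinStartTime": window_start,
            "WinEndTime": window_start + WINDOW,
            "Type": type_name,
            "RoomNumber": room,
            "AvgReading": mean(values),
            "EventCount": len(values),
        })
    return rows


# Made-up events: (arrival time, payload) pairs.
t0 = datetime(2015, 4, 30, 10, 0, 0)
events = [
    (t0 + timedelta(seconds=5), {"RoomNumber": 101, "Temperature": 20.0}),
    (t0 + timedelta(seconds=40), {"RoomNumber": 101, "Temperature": 22.0}),
    (t0 + timedelta(seconds=50), {"RoomNumber": 101, "Humidity": 40.0}),
    (t0 + timedelta(seconds=70), {"RoomNumber": 101, "Temperature": 25.0}),
]
for row in tumbling_averages(events, "Temperature", "Temperature"):
    print(row["WinStartTime"].time(), row["RoomNumber"], row["AvgReading"], row["EventCount"])
# 10:00:00 101 21.0 2
# 10:01:00 101 25.0 1
```

The Humidity-only event is ignored by the Temperature branch. The Humidity branch of the UNION would pick it up instead.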
• Stop the DeviceSender app if it’s still running.
You have successfully ingested data from a “thing” (your laptop) into Azure! You pushed that data through a streaming query and sent the aggregated output to a destination in the cloud: Azure SQL Database.

---- Back to SLIDES -----

HOL2: Intro to HDInsight
In Lab 2 we create a Hadoop cluster in Azure using the HDInsight service. Then we RDP to the head node and see that it’s truly Apache open source Hadoop running on Windows. HDInsight is also available on Linux, but we are using Windows for the lab.

Create an HDInsight Hadoop cluster
• Log in to https://manage.windowsazure.com/.
• Choose HDInsight (the elephant) from the left menu.
• Choose New -> Data Services -> HDInsight -> Custom Create.
• Page 1 / Cluster Details
  o Cluster Name: your prefix + hdi
  o Cluster Type: Hadoop
  o Operating System: Windows
  o Version: default
• Page 2 / Configure Cluster
  o Data Nodes: 1
  o Region: the same region you’ve been using (the storage account must be in the same region)
  o Head Node Size: default A3
  o Data Node Size: default A3
• Page 3 / Configure Cluster User
  o Name: your prefix + admin (you can use the same one as the SQL db for the demo, but don’t do that in production)
  o Password: you can use the same one as the SQL db for the demo, but don’t do that in production
  o Enable the remote desktop for cluster: Yes (you will generally choose No)
    - RDP User Name: cluster name + 1 (don’t do this in production)
    - RDP Password: you can use the same one as the SQL db for the demo, but don’t do that in production
    - Expires On: tomorrow
  o Enter the Hive/Oozie Metastore: No (you will generally choose Yes for production)
• Page 4 / Storage Account
  o Storage Account: Use existing storage
  o Account Name: the storage account we created earlier
  o Default Container: data
  o Additional Storage Accounts: 0
• Page 5 / Script Actions
  o Click the arrow to create the cluster, then wait about 15 minutes.

Use the Hadoop Distributed File System (HDFS)
• RDP to the head node.
• Get a listing of files:
  o hadoop fs -ls /
  o hadoop fs -ls /example/data

---- Back to SLIDES -----
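The cluster's default container is data on your storage account, so the paths you list with hadoop fs correspond to blobs in that container. The following Python sketch converts the wasb:// URI form used in Hive LOCATION clauses into the equivalent blob HTTPS URL. It illustrates the naming scheme only. The helper names are hypothetical, the default container "data" is this lab's setting, and the code does not contact Azure.

```python
from urllib.parse import urlparse


def wasb_to_https(uri: str) -> str:
    """wasb://<container>@<account>.blob.core.windows.net/<path> -> HTTPS blob URL."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri}")
    # The netloc carries both the container and the account host: container@host.
    container, _, host = parsed.netloc.partition("@")
    if not container or not host:
        raise ValueError(f"missing container or account host: {uri}")
    return f"https://{host}/{container}{parsed.path}"


def default_fs_path_to_https(path: str, account: str, container: str = "data") -> str:
    """Map a cluster path such as /example/data onto the default container."""
    return wasb_to_https(f"wasb://{container}@{account}.blob.core.windows.net{path}")


print(wasb_to_https("wasb://data@bddragonstorage.blob.core.windows.net/input"))
print(default_fs_path_to_https("/example/data", "bddragonstorage"))
```

This mapping is why tools outside Hadoop, such as CloudXplorer or Power Query, can read the same files after the cluster is gone.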
HOL3: HDI Batch Analysis and Power BI
We’ll do some batch analysis and create aggregations. Then we will view the data in Power BI.

Hive
• Navigate to CloudDataCamp\scripts\Hive in your file explorer.
• In the Azure management portal, click on your HDInsight instance. Click on Query Console at the bottom of the screen to open a query window. Log in with the cluster credentials (not the RDP credentials).
• Choose the Hive editor.

Create an External Table DeviceReadings
• Open CloudDataCamp\scripts\Hive\1_CreateDeviceReadings.txt in a text editor like Notepad. Update the location: replace <storage account name> with the storage account you created in Hands-on Lab 1, removing the brackets. Paste the edited query into the Hive editor and hit Submit to create a Hive table.
  LOCATION 'wasb://data@<storage account name>.blob.core.windows.net/input';
• View the job output, which opens in a new window. For a create schema statement, verify that there are no errors (the messages about logging are not errors). The output also shows the time taken.

Query the table
• Copy the following query and run it from the Hive editor:
  SELECT deviceId FROM DeviceReadings LIMIT 100;
• View the job output.

Create External Tables for Averages
Create and populate tables that store aggregates.
• Open CloudDataCamp\scripts\Hive\2_CreateAverageReadingByType.txt.
• Edit the location and run it from the Hive editor.
• Repeat for the remaining create/insert scripts, changing the location in each before running it:
  o CloudDataCamp\scripts\Hive\3_CreateAverageReadingByMinute.txt
  o CloudDataCamp\scripts\Hive\4_CreateMaximumReading.txt
  o CloudDataCamp\scripts\Hive\5_CreateMinimumReading.txt

File Browser
The location of the data was specified in the table creation statements using LOCATION. The browser shows data on the cluster's default storage account.
• View the original and the aggregated data in the File Browser tab of the console.
• If you have CloudXplorer, view the data in CloudXplorer (hit refresh).

Extra Credit
• Write SELECT statements to view each table’s data set.
• Write more complex queries.
• show tables;
• describe formatted AverageReadingByType;
• Connect to Hive from Power Pivot using the Microsoft Hive ODBC driver and a DSN.

AzureML
Connect to Hadoop from AzureML. This step is not in the CloudDataCamp. HOL10 in that series points to a flat file, whereas here we use a Hive query.
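Before pointing the Reader at Hive, it may help to picture what SELECT * FROM AverageReadingByType returns. The following Python sketch is a rough stand-in for that Hive aggregate, written as a GROUP BY with an average. It assumes that 2_CreateAverageReadingByType.txt averages Reading per DeviceType. That script's contents are not reproduced in this handout, so treat that assumption, the column names and the sample rows as hypothetical.

```python
from collections import defaultdict


def average_reading_by_type(readings):
    """GROUP BY DeviceType with AVG(Reading), done in memory."""
    totals = defaultdict(lambda: [0.0, 0])  # DeviceType -> [sum, count]
    for row in readings:
        acc = totals[row["DeviceType"]]
        acc[0] += row["Reading"]
        acc[1] += 1
    return {device_type: total / count for device_type, (total, count) in sorted(totals.items())}


sample = [
    {"DeviceType": "temperature", "Reading": 20.0},
    {"DeviceType": "temperature", "Reading": 24.0},
    {"DeviceType": "energy", "Reading": 3.5},
]
print(average_reading_by_type(sample))  # {'energy': 3.5, 'temperature': 22.0}
```

On the cluster the same aggregation runs as a Hive job over the files in the table's directory. The Reader only receives the small result set.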
• From manage.windowsazure.com, click on AzureML and choose to sign in to your AzureML studio.
• Choose a new blank experiment.
• Drag a Reader from the left to the designer.
• Highlight the Reader and view the options you have for connecting.
  o Data source: Hive Query
  o Hive database query: SELECT * FROM AverageReadingByType
  o HCatalog server URI: http://{your hdi cluster}.azurehdinsight.net
  o Hadoop user account name: your cluster admin account (not the RDP account)
  o Hadoop user account password: your password
  o Location of output data: Azure
  o Azure storage account name: {your storage account}
  o Azure storage key: {your key}
  o Azure container name: data
• Choose Save and Run from the bottom ribbon.
• When it completes, right-click on the output circle to view the results dataset, then either visualize or download it.
Reference: https://andersspur.wordpress.com/2014/10/10/use-hive-to-read-data-into-azure-ml/

Cluster Cleanup
At this point we have new datasets based on aggregates of our first, static data file. We could either leave the cluster up and query it directly with Hive from tools like Power BI, or drop the cluster and access the data directly in the flat files. We’ll use the flat files. This approach shows that these are on-demand clusters: you don’t need to pay to keep them up all the time.
• Drop the HDInsight cluster.

Power Query
• Open a new workbook in Excel 2013. Verify that you installed and enabled Power Query.
• Click on Power Query.
• Choose From Azure -> From Microsoft Azure HDInsight.
• Enter the storage account you created earlier and the key you saved in Notepad.
• In Navigator, expand your storage account and double-click on the container named data to open the query editor.
• Find the “Folder Path” column on the far right and choose the drop-down arrow.
• Enter output in the search box and you’ll see the ‘directories’ and files we have created today.
• If you chose OK, click the red X next to “Filtered Rows” in “Applied Steps” on the far right to remove this filter.
• Create a new filter on averageReadingByMinute. It shows a single row because we had a small amount of data and ran the insert only once, so there is only one file in that directory. Choose OK.
• Scroll back to the left and in the “Content” column click on “Binary” to import the file.
• Name the columns: DeviceType, ReadingDateTime, RoomNumber, Reading
• Choose “Close & Load” from the upper left to create a new sheet called AverageReadingByMinute.
• Save the workbook to your desktop.

Power View
• Go to the workbook created in the last step.
• Choose the Insert tab at the top, then choose Power View in the middle of the ribbon.
• Power View is populated with the table from the worksheet. You can see the columns in “Power View Fields” on the right.
• The numeric fields have a sum symbol next to them. We don’t want to summarize room number, so go to the “Fields” section at the bottom of “Power View Fields” and choose “Do Not Summarize” for RoomNumber.
• Click inside the table in the report designer pane (left). On the Design tab in the ribbon, to the right of “Power View”, choose “Other Chart” -> “Line”.
• In the Filters section choose Chart.
• Expand DeviceType and put a check next to energy.
• Edit the title to “Energy Reading By Minute”.
• Save the workbook and close it.
You have now done distributed processing with Hadoop on Azure (HDInsight), using WASB to access the same data outside of Hadoop. You then used Power BI to discover and visualize that data, opening up possibilities for new data-driven insights.

Cleanup
• Verify that you have dropped your HDInsight cluster. You are charged for its existence whether or not you are running anything.
• Stop the DeviceSender app if it’s still running.
• Drop the other resources we’ve created. They have minimal costs if you aren’t actively using them.
  o Streaming Job
  o Event Hub (under Service Bus)
  o Service Bus namespace
  o Storage
  o SQL Azure Database cdcasa (and optionally the hosting SQL Server)
  o AzureML Experiments
  o Resource Group
• Optionally delete the Excel workbook.
• Optionally remove some or all files and tools from this workshop:
  o CloudDataCamp folder and all files
  o CloudXplorer
  o AzCopy
  o DeviceSender
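The Power Query and Power View steps filter AverageReadingByMinute to one device type and plot Reading over ReadingDateTime. The following Python sketch reproduces that filter on in-memory rows. It uses the column names assigned in the Power Query step. The sample values are made up, the function name is hypothetical, and the charting itself is left out. RoomNumber is carried through as a label and never summed, which matches the "Do Not Summarize" setting.

```python
from datetime import datetime

# Column names assigned in the Power Query step.
COLUMNS = ("DeviceType", "ReadingDateTime", "RoomNumber", "Reading")


def series_for_type(rows, device_type="energy"):
    """Keep one DeviceType and return (time, room, reading) points sorted by time."""
    records = [dict(zip(COLUMNS, row)) for row in rows]
    points = [
        (r["ReadingDateTime"], r["RoomNumber"], r["Reading"])
        for r in records
        if r["DeviceType"].lower() == device_type
    ]
    # Sorting by time gives the x-axis order a line chart needs.
    return sorted(points)


rows = [
    ("energy", datetime(2015, 4, 30, 10, 1), 101, 3.0),
    ("temperature", datetime(2015, 4, 30, 10, 0), 101, 21.0),
    ("energy", datetime(2015, 4, 30, 10, 0), 101, 2.5),
]
for point in series_for_type(rows):
    print(point)
```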