SlideShare une entreprise Scribd logo
1  sur  22
Hadoop in a Windows Shop


            Abuna Demoz – Abuna@AdGooroo.com
            Brad Vah – Bvah@AdGooroo.com
            Mike Schiro – Mschiro@AdGooroo.com
            Twitter: @AdGooroo @abuna
Who Is AdGooroo?
• Founded in 2004
• We are the largest provider of Search Intelligence in the world
• Our customers include:
   –   Agencies
   –   CMOs
   –   Marketing Managers
   –   Digital Ad Sales
   –   Over 4,000 users
• Global Scale
   – 50 Countries
   – 14 Search Engines
   – 14 Ad Networks
AdGooroo Insight Suite™




©2012 AdGooroo, LLC. All Rights Reserved.
Paid Search




Natural Search
Why we deployed Hadoop
Hadoop Administration
Learning Curve

• Where is Hadoop going to fit?
• How do we leverage existing tools?
• Linux can be less forgiving
  – rm –rf /*
• Who names these things?
Integration Points

• Active Directory != LDAP
• Create a seamless user experience
• Domjoin in 30 simple steps
  – Tip: It’s usually safe to blame Kerberos
Integration Points – Data Transfer

• SMB works…mostly
  – Flaky connectivity
  – Relatively slow transfer for GigE
• NFS
  – Client Services for NFS
  – Much faster transfer speeds
Integration Points – Data Transfer

• MountableHDFS/HDFS_Fuse
  – Fuse -> NFS -> Windows
    • We tried it. You should not.
  – SCP (Windows) -> NFS -> Fuse
    • Messy, but it works.
    • Don’t often need to use it
Monitoring and Management

• Operations Manager (MOM/SCOM)
  – Native Linux monitoring
  – Custom Management packs for Hadoop
• Opalis
  – Workflow automation
• Configuration Manager (SCCM)
  – Quest Management Xtensions for *nix
Final Thoughts

• Hadoop and Windows can live together.
• Microsoft is starting to figure out this
  whole “open-source” thing.
  – MSSQL connectors for Hadoop
  – ODBC driver for Hive
  – Interop initiatives
• When in doubt; blame Kerberos.
• Roll your own repo.
Hadoop Development
Environments

• Windows
  – Visual Studio, SQL Server, etc
  – Physical workstations
• Linux
  – Getting reacquainted with an old friend
  – New suite of tools
  – Cloudera VM
     • RAMRAMRAMRAMRAMRAMRAMRAMRAM
Languages

• Java
  – Straightforward transition from the .NET world
  – Hmm…How do I create that JAR again?
• Python/Bash
  – Utilized a lot more than expected
• HiveQL
  – Simple transition from SQL
  – Custom UDFs
Unexpected Roadblocks - AVRO

• Assumption:
  – Works with .NET
     • Can serialize files to be read by Java Map/Reduce

• Reality:
  – .NET compatibility not fully baked
     • Any files written in .NET could not be read in Java.
        – C# side is not reading nor writing the header
        – JIRA: AVRO-823
Unexpected Roadblocks – Flume
• Assumption:
  – We’ll use Flume for Windows

• Reality:
  – Overkill for our needs
  – Implementation woes

• Solution:
  – Custom log collector service
  – Converts data to AVRO file
Unexpected Roadblocks – Thrift
• Assumption:
  – We’ll use Thrift to talk to HBase from .NET

• Reality:
  – HBase.thrift does not support C# yet

• Solution:
  – Convert Thrift Java code-gen to .NET
     • Some community work already done here
       (https://bitbucket.org/vadim/hbase-sharp)
As Advertised - Sqoop
• Simple
• Fast route to POC
  – Imports
  – Exports
• Minor “gotchas”
  – Delimiters
  – Large exports to SQL Server
     • Use “--batch” mode
As Advertised - Hive

• Very similar to SQL
• “Quick” data analysis
  – Results without crippling your existing RDBMS
• HBase storage handler
  – provides easy point of entry to data and data
    manipulation
Final Thoughts
• Don’t overthink it!
  – Just because you can doesn’t mean you should

• Modularity
  – Easy to be overwhelmed by all the moving parts
  – Flatten the learning curve by taking it one piece at
    a time
We’re Hiring


jobs@adgooroo.com

abuna@adgooroo.com
bvah@adgooroo.com
mschiro@adgooroo.com

Contenu connexe

Tendances

Cache all the things #DCLondon
Cache all the things #DCLondonCache all the things #DCLondon
Cache all the things #DCLondondigital006
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server sideHoward Marks
 
Hong Kong Drupal User Group - Sep 13th
Hong Kong Drupal User Group - Sep 13thHong Kong Drupal User Group - Sep 13th
Hong Kong Drupal User Group - Sep 13thWong Hoi Sing Edison
 
Drupal 7 performance and optimization
Drupal 7 performance and optimizationDrupal 7 performance and optimization
Drupal 7 performance and optimizationShafqat Hussain
 
RavenDB embedded at massive scales
RavenDB embedded at massive scalesRavenDB embedded at massive scales
RavenDB embedded at massive scalesOren Eini
 
Redis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHPRedis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHPRicard Clau
 
Redis everywhere - PHP London
Redis everywhere - PHP LondonRedis everywhere - PHP London
Redis everywhere - PHP LondonRicard Clau
 
Apache Content Technologies
Apache Content TechnologiesApache Content Technologies
Apache Content Technologiesgagravarr
 
Freebsd, the unknown giant
Freebsd, the unknown giantFreebsd, the unknown giant
Freebsd, the unknown giantGLC Networks
 
Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Ricard Clau
 
ChinaNetCloud - Zabbix Monitoring System Overview
ChinaNetCloud - Zabbix Monitoring System OverviewChinaNetCloud - Zabbix Monitoring System Overview
ChinaNetCloud - Zabbix Monitoring System OverviewChinaNetCloud
 
OGDC Datastorage Solution_Mr.Dung, Dinh Nguyen Anh
OGDC Datastorage Solution_Mr.Dung, Dinh Nguyen AnhOGDC Datastorage Solution_Mr.Dung, Dinh Nguyen Anh
OGDC Datastorage Solution_Mr.Dung, Dinh Nguyen AnhBuff Nguyen
 
Website performance optimization QA
Website performance optimization QAWebsite performance optimization QA
Website performance optimization QADenis Dudaev
 
Scaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsScaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsAchievers Tech
 

Tendances (18)

Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
 
Cache all the things #DCLondon
Cache all the things #DCLondonCache all the things #DCLondon
Cache all the things #DCLondon
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server side
 
Optimize drupal
Optimize drupalOptimize drupal
Optimize drupal
 
Hong Kong Drupal User Group - Sep 13th
Hong Kong Drupal User Group - Sep 13thHong Kong Drupal User Group - Sep 13th
Hong Kong Drupal User Group - Sep 13th
 
Drupal 7 performance and optimization
Drupal 7 performance and optimizationDrupal 7 performance and optimization
Drupal 7 performance and optimization
 
RavenDB embedded at massive scales
RavenDB embedded at massive scalesRavenDB embedded at massive scales
RavenDB embedded at massive scales
 
Redis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHPRedis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHP
 
Redis everywhere - PHP London
Redis everywhere - PHP LondonRedis everywhere - PHP London
Redis everywhere - PHP London
 
Apache Content Technologies
Apache Content TechnologiesApache Content Technologies
Apache Content Technologies
 
Freebsd, the unknown giant
Freebsd, the unknown giantFreebsd, the unknown giant
Freebsd, the unknown giant
 
Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015
 
Caching 101 - WordCamp OC
Caching 101 - WordCamp OCCaching 101 - WordCamp OC
Caching 101 - WordCamp OC
 
ChinaNetCloud - Zabbix Monitoring System Overview
ChinaNetCloud - Zabbix Monitoring System OverviewChinaNetCloud - Zabbix Monitoring System Overview
ChinaNetCloud - Zabbix Monitoring System Overview
 
A faster web
A faster webA faster web
A faster web
 
OGDC Datastorage Solution_Mr.Dung, Dinh Nguyen Anh
OGDC Datastorage Solution_Mr.Dung, Dinh Nguyen AnhOGDC Datastorage Solution_Mr.Dung, Dinh Nguyen Anh
OGDC Datastorage Solution_Mr.Dung, Dinh Nguyen Anh
 
Website performance optimization QA
Website performance optimization QAWebsite performance optimization QA
Website performance optimization QA
 
Scaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsScaling High Traffic Web Applications
Scaling High Traffic Web Applications
 

Similaire à Hadoop in a Windows Shop - CHUG - 20120416

Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Andrew Brust
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopOwen O'Malley
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api CompatibilityCloudera, Inc.
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Top 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudTop 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudRogue Wave Software
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudChris Dagdigian
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Marcel Krcah
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around HadoopDataWorks Summit
 
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...NashvilleTechCouncil
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCloudIDSummit
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNATomas Cervenka
 
PAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark TomlinsonPAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark TomlinsonNeotys
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopKai Sasaki
 
HyperDB, MySQL Performance, & Flavors of MySQL
HyperDB, MySQL Performance, & Flavors of MySQLHyperDB, MySQL Performance, & Flavors of MySQL
HyperDB, MySQL Performance, & Flavors of MySQLEvan Volgas
 
Midwest php 2013 deploying php on paas- why & how
Midwest php 2013   deploying php on paas- why & howMidwest php 2013   deploying php on paas- why & how
Midwest php 2013 deploying php on paas- why & howdotCloud
 
Infrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous DeliveryInfrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous DeliveryCarlo Bonamico
 

Similaire à Hadoop in a Windows Shop - CHUG - 20120416 (20)

Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api Compatibility
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Top 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudTop 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloud
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around Hadoop
 
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
 
PAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark TomlinsonPAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark Tomlinson
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoop
 
HyperDB, MySQL Performance, & Flavors of MySQL
HyperDB, MySQL Performance, & Flavors of MySQLHyperDB, MySQL Performance, & Flavors of MySQL
HyperDB, MySQL Performance, & Flavors of MySQL
 
Midwest php 2013 deploying php on paas- why & how
Midwest php 2013   deploying php on paas- why & howMidwest php 2013   deploying php on paas- why & how
Midwest php 2013 deploying php on paas- why & how
 
Infrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous DeliveryInfrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous Delivery
 

Plus de Chicago Hadoop Users Group

Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieChicago Hadoop Users Group
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopChicago Hadoop Users Group
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Chicago Hadoop Users Group
 

Plus de Chicago Hadoop Users Group (19)

Kinetica master chug_9.12
Kinetica master chug_9.12Kinetica master chug_9.12
Kinetica master chug_9.12
 
Chug dl presentation
Chug dl presentationChug dl presentation
Chug dl presentation
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Meet Spark
Meet SparkMeet Spark
Meet Spark
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Advanced Oozie
Advanced OozieAdvanced Oozie
Advanced Oozie
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Financial Data Analytics with Hadoop
Financial Data Analytics with HadoopFinancial Data Analytics with Hadoop
Financial Data Analytics with Hadoop
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416
 

Dernier

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Dernier (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Hadoop in a Windows Shop - CHUG - 20120416

  • 1. Hadoop in a Windows Shop Abuna Demoz – Abuna@AdGooroo.com Brad Vah – Bvah@AdGooroo.com Mike Schiro – Mschiro@AdGooroo.com Twitter: @AdGooroo @abuna
  • 2. Who Is AdGooroo? • Founded in 2004 • We are the largest provider of Search Intelligence in the world • Our customers include: – Agencies – CMOs – Marketing Managers – Digital Ad Sales – Over 4,000 users • Global Scale – 50 Countries – 14 Search Engines – 14 Ad Networks
  • 3. AdGooroo Insight Suite™ ©2012 AdGooroo, LLC. All Rights Reserved.
  • 7. Learning Curve • Where is Hadoop going to fit? • How do we leverage existing tools? • Linux can be less forgiving – rm –rf /* • Who names these things?
  • 8. Integration Points • Active Directory != LDAP • Create a seamless user experience • Domjoin in 30 simple steps – Tip: It’s usually safe to blame Kerberos
  • 9. Integration Points – Data Transfer • SMB works…mostly – Flaky connectivity – Relatively slow transfer for GigE • NFS – Client Services for NFS – Much faster transfer speeds
  • 10. Integration Points – Data Transfer • MountableHDFS/HDFS_Fuse – Fuse -> NFS -> Windows • We tried it. You should not. – SCP (Windows) -> NFS -> Fuse • Messy, but it works. • Don’t often need to use it
  • 11. Monitoring and Management • Operations Manager (MOM/SCOM) – Native Linux monitoring – Custom Management packs for Hadoop • Opalis – Workflow automation • Configuration Manager (SCCM) – Quest Management Xtensions for *nix
  • 12. Final Thoughts • Hadoop and Windows can live together. • Microsoft is starting to figure out this whole “open-source” thing. – MSSQL connectors for Hadoop – ODBC driver for Hive – Interop initiatives • When in doubt; blame Kerberos. • Roll your own repo.
  • 14. Environments • Windows – Visual Studio, SQL Server, etc – Physical workstations • Linux – Getting reacquainted with an old friend – New suite of tools – Cloudera VM • RAMRAMRAMRAMRAMRAMRAMRAMRAM
  • 15. Languages • Java – Straightforward transition from the .NET world – Hmm…How do I create that JAR again? • Python/Bash – Utilized a lot more than expected • HiveQL – Simple transition from SQL – Custom UDFs
  • 16. Unexpected Roadblocks - AVRO • Assumption: – Works with .NET • Can serialize files to be read by Java Map/Reduce • Reality: – .NET compatibility not fully baked • Any files written in .NET could not be read in Java. – C# side is not reading nor writing the header – JIRA: AVRO-823
  • 17. Unexpected Roadblocks – Flume • Assumption: – We’ll use Flume for Windows • Reality: – Overkill for our needs – Implementation woes • Solution: – Custom log collector service – Converts data to AVRO file
  • 18. Unexpected Roadblocks – Thrift • Assumption: – We’ll use Thrift to talk to HBase from .NET • Reality: – HBase.thrift does not support C# yet • Solution: – Convert Thrift Java code-gen to .NET • Some community work already done here (https://bitbucket.org/vadim/hbase-sharp)
  • 19. As Advertised - Sqoop • Simple • Fast route to POC – Imports – Exports • Minor “gotchas” – Delimiters – Large exports to SQL Server • Use “--batch” mode
  • 20. As Advertised - Hive • Very similar to SQL • “Quick” data analysis – Results without crippling your existing RDBMS • HBase storage handler – provides easy point of entry to data and data manipulation
  • 21. Final Thoughts • Don’t overthink it! – Just because you can doesn’t mean you should • Modularity – Easy to be overwhelmed by all the moving parts – Flatten the learning curve by taking it one piece at a time

Notes de l'éditeur

  1. Insight Suite all