Operationalizing Big Data as a Service

Ravi Kishore Valeti
Lead Member of Technical Staff
rvaleti@salesforce.com

Forward-Looking Statements
Statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any
of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking
statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or
service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for
future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer
contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible
mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our
employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com
products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of
salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most
recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information
section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not
be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available.
Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

Agenda
● What is a Service?
● Big Data as a Service (BDaaS)
● Operational Challenges
● Operational Excellence

Service
Service = Security + HA + DR + Monitoring

Big Data as a Service (BDaaS)
BDaaS = (Security + Multi-tenancy + HA + DR + Monitoring) at scale

Security using Kerberos
● Third-party authentication service
● Provides authentication; authorization is enforced separately by each service (for example, through ACLs)
● Authenticates users to applications and applications to each other
● Each cluster should be configured with multiple KDC servers in master/slave mode for HA
BDaaS = SECURITY + Multi-tenancy + HA + DR + MONITORING at Scale

Multi-Tenancy
● Security
    Authentication
    Authorization
    Auditing
    Data Protection - Data at Rest Encryption
● Resource Management
    User Quotas
    SLAs
● Role Based Access Control
Noisy neighbors: multi-tenant Hadoop clusters
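One way to keep noisy neighbors in check is to compare each tenant's actual usage against its quota. The following is a minimal sketch, not a Hadoop API: the `Quota` fields and the `violations` helper are hypothetical names chosen for illustration, and real usage numbers would come from the cluster's own reporting.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Quota:
    """Per-tenant limits; both fields are illustrative assumptions."""
    max_storage_bytes: int
    max_vcores: int


def violations(quota: Quota, used_storage_bytes: int, used_vcores: int) -> List[str]:
    """Return the names of the limits the tenant currently exceeds."""
    exceeded = []
    if used_storage_bytes > quota.max_storage_bytes:
        exceeded.append("storage")
    if used_vcores > quota.max_vcores:
        exceeded.append("vcores")
    return exceeded
```

An empty result means the tenant is within both limits; a non-empty one can feed an alert or a scheduler decision.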

HA - High Availability
● HA for all services in the stack
    Namenode, RM, JHS
    HBase
    Hive
    Hue
    Spark Master
● Fault tolerance & mean time to recover
● Multi-rack architecture, with rack-aware services
● Continuous synthetic tests
● Rolling restarts whenever possible
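A rolling restart can be sketched as: restart one host, confirm it is healthy, and only then move on. The sketch below assumes the caller supplies `restart` and `is_healthy` callables (both hypothetical); it stops at the first unhealthy host so that a bad change does not spread across the cluster.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart hosts one at a time; abort as soon as one fails its health check."""
    done: List[str] = []
    for host in hosts:
        restart(host)
        if not is_healthy(host):
            # Leave the remaining hosts untouched so the service keeps running.
            raise RuntimeError(f"{host} unhealthy after restart; completed: {done}")
        done.append(host)
    return done
```

In practice the health check would typically be one of the synthetic tests mentioned above, run against the restarted host.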

DR - Disaster Recovery
● Namenode Metadata Backups
● Namenode Snapshots
● Hive Metadata Backups
● HBase Backups
    Configure Replication to a Buddy Cluster
    Daily/Continuous Backups using Snapshots/WAL
● Switch to DR site when ready

DR - Disaster Recovery
Site Switching Checklist
● All the last known check-pointed data is available in the DR site
● For HBase, make sure the HBase replication queue is empty, i.e. everything is replicated to DR
● Make sure data checksums (fsck) & synthetic tests pass
● Enable traffic to DR site
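The checklist can be treated as a gate: traffic is enabled on the DR site only when every check passes. This is a sketch under stated assumptions; the `Check` type and `ready_to_switch` function are hypothetical, and each probe stands in for whatever tooling actually inspects checkpoints, the replication queue, fsck output, and synthetic tests.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    """One checklist item: a label plus a probe that returns True on success."""
    name: str
    probe: Callable[[], bool]


def ready_to_switch(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check and return (all_passed, names_of_failed_checks)."""
    failed = [check.name for check in checks if not check.probe()]
    return (not failed, failed)
```

Running all checks, instead of stopping at the first failure, gives the operator the full list of blockers in one pass.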

Monitoring & Alerting
● Centralized visualization & alerting
● Monitor user quotas
● Monitor resource utilization - memory/CPU
● Should be a mix of logs & metrics
● Should be extensible, so that newly added services can be onboarded with their own monitoring needs
● Ability to quickly incorporate new rules to alert on newly observed issues
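Extensibility here can be as simple as a rule registry, where adding a rule for a newly observed issue does not touch the evaluation loop. The sketch below is illustrative only: `RULES`, `rule`, `evaluate`, and the metric name `memory_used_pct` are all assumptions, not part of any specific monitoring product.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]
RULES: Dict[str, Callable[[Metrics], bool]] = {}


def rule(name: str):
    """Decorator that registers an alert rule under the given name."""
    def register(fn: Callable[[Metrics], bool]) -> Callable[[Metrics], bool]:
        RULES[name] = fn
        return fn
    return register


@rule("high_memory")
def high_memory(metrics: Metrics) -> bool:
    # The 90% cut-off is an arbitrary example value.
    return metrics.get("memory_used_pct", 0.0) > 90.0


def evaluate(metrics: Metrics) -> List[str]:
    """Return the sorted names of all rules that fire for this metrics sample."""
    return sorted(name for name, fn in RULES.items() if fn(metrics))
```

Onboarding a new service then amounts to registering additional rules with `@rule(...)`.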

Monitoring & Alerting
Monitoring success metrics include, but are not limited to:
● Resource utilization by jobs & trends
● Job waiting times, run times & amount of data processed
● Unique users per day (or week or month)
● Daily queries (HBase)
● Daily read bytes
● Daily written bytes, etc.

Operational Challenges
● Zero downtime
● Mean time to recover from failures
● Optimum utilization of resources
● Capacity Planning
● On-Demand capacity adds/removals

Operational Excellence - Shipping bits
● Maintain "light" forks for the key services that you run
● Choose an appropriate packaging model - Bigtop
● Make sure your production services are as close as possible to stable versions in open source

Operational Excellence - Shipping bits
● Continuous Integration & Deployment pipeline!
● Almost zero downtime* - rack-by-rack rolling upgrades
    Block placement policy - all replicas on different racks** can mitigate the risk of service disruptions during rack-by-rack rolling upgrades
● Auto-restart bots#
* - except some planned major upgrades where downtime cannot be avoided!
# - Caution! May cause more damage than healing if not configured properly
** - Faster network links are usually preferred to make sure service SLAs are not breached due to this special block placement policy. Extensive performance testing might be required.
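To see why placing all replicas on different racks helps, count how many replicas of each block survive when one rack is taken down for an upgrade. This is a simplified model, not HDFS code: the `block_racks` mapping, `replicas_left`, and `safe_to_upgrade` are hypothetical names, and the rack list per block is assumed to be known.

```python
from typing import Dict, List


def replicas_left(block_racks: Dict[str, List[str]], rack_down: str) -> Dict[str, int]:
    """For each block, count the replicas that are not on the rack being upgraded."""
    return {
        block: sum(1 for rack in racks if rack != rack_down)
        for block, racks in block_racks.items()
    }


def safe_to_upgrade(block_racks: Dict[str, List[str]], rack_down: str, min_live: int = 1) -> bool:
    """True if every block keeps at least `min_live` replicas while the rack is down."""
    return all(count >= min_live for count in replicas_left(block_racks, rack_down).values())
```

With three replicas on three distinct racks, any single rack can go down and two replicas remain. A placement that puts two replicas on one rack leaves only one when that rack is upgraded.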

Operational Excellence - Tuning
● Always keep an eye on the user resource requirements vs reality
    Update user quotas/resource configurations based on actual usage
● Automated daily reports on important events/metrics
● Dynamic thresholds for alerting & continuous tuning to make the alerts meaningful & non-noisy
● Performance testing & configuration tuning of all services
● Choosing the right GC settings
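A dynamic alert threshold can be derived from recent history instead of a fixed number. The sketch below uses mean plus `k` standard deviations, which is one common choice and an assumption here, not necessarily the method used in the talk; `dynamic_threshold` and `should_alert` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Sequence


def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Threshold = mean of recent samples + k population standard deviations."""
    if not history:
        raise ValueError("history must contain at least one sample")
    return mean(history) + k * pstdev(history)


def should_alert(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """Alert only when the new value exceeds the threshold derived from history."""
    return value > dynamic_threshold(history, k)
```

Lowering `k` makes alerts more sensitive and raising it makes them quieter, which is the knob for the continuous tuning mentioned above.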
