hbaseconasia2017: HBase on Beam


Jingcheng Du

Apache Beam is an open source, unified programming model for defining batch and streaming jobs that run on many execution engines. HBase on Beam is a connector that allows Beam to use HBase as a bounded data source and as a target data store for both batch and streaming data sets. With this connector, HBase can work directly with many batch and streaming engines, for example Spark, Flink, and Google Cloud Dataflow. In this session, I will introduce Apache Beam, the current implementation of HBase on Beam, and the future plans for it.

hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#


Apache Beam
• Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.
• It was initiated by Google, which contributed it to the Apache Software Foundation.
• The first stable release was published on May 17, 2017.

Apache Beam
https://beam.apache.org/images/beam_architecture.png

Apache Beam
• A unified model for batch and streaming applications.
• Runners for popular open-source batch and streaming engines, for instance Spark and Flink.
• SDKs in multiple languages let end users build their own pipelines; Java and Python are currently supported.
• Implement once, run almost everywhere.

Apache Beam
• Pipeline: the processing pipeline, which includes data input, transforms, and output.
• PCollection: the representation of both bounded and unbounded data.
• Transform
  • ParDo
  • GroupByKey
  • Combine
  • Flatten
  • …
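
The semantics of the core transforms can be illustrated without the Beam SDK. The following is a minimal plain-Python sketch on a bounded in-memory collection; the function names `par_do`, `group_by_key`, `combine_per_key`, and `flatten` are illustrative stand-ins chosen for this example, not Beam APIs, and a real runner would distribute the work rather than use lists.

```python
from collections import defaultdict
from itertools import chain


def par_do(elements, fn):
    """Apply fn to each element; fn may emit zero or more outputs."""
    return [out for element in elements for out in fn(element)]


def group_by_key(pairs):
    """Collect the values of (key, value) pairs under each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_per_key(pairs, combiner):
    """Reduce the values of each key with combiner (for example, sum)."""
    return {key: combiner(values) for key, values in group_by_key(pairs).items()}


def flatten(*collections):
    """Merge several collections into one."""
    return list(chain(*collections))


# Two input collections are merged, split into (word, 1) pairs, and summed per word.
lines = flatten(["hbase on beam"], ["beam on spark", "beam on flink"])
words = par_do(lines, lambda line: [(word, 1) for word in line.split()])
counts = combine_per_key(words, sum)
print(counts)
```

The usage at the end mirrors the word-count shape: a ParDo-style step emits key/value pairs, and a Combine-style step reduces them per key.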

Data Sources
• In-memory data: Array, Collection, Map
• Text
• HDFS
• Kafka
• HBase
• …

Windowing
• Fixed time windows
• Sliding time windows
• Session windows
• Single global window
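
How elements map to these window types can be sketched with simple arithmetic. The plain-Python example below assumes integer timestamps in seconds and half-open windows `[start, end)`; the function names are hypothetical and chosen for this illustration, not taken from the Beam SDK. The single global window needs no computation, since every element falls into the same window.

```python
def fixed_window(timestamp, size):
    """Return the [start, end) fixed window containing timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def sliding_windows(timestamp, size, period):
    """Return every [start, end) sliding window containing timestamp."""
    start = timestamp - (timestamp % period)  # latest window start at or before timestamp
    windows = []
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Give each timestamp a [ts, ts + gap) window and merge the overlapping ones."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Overlaps the current session, so extend it.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions


print(fixed_window(125, 60))
print(sliding_windows(125, 60, 30))
print(session_windows([10, 15, 40], 10))
```

With a sliding window of size 60 and period 30, each element belongs to two windows, which is why sliding windows multiply the data seen by downstream aggregations.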

Serialization
• Every Transform must be serializable!
• CustomCoder
• Register a coder for classes
• Register a coder for the output of a transform
• Serializable
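
The idea behind a custom coder is an explicit encode/decode pair registered for a class, so that elements can be shipped between workers. The sketch below shows that idea in plain Python; `CellValue`, `CellValueCoder`, and `coder_registry` are hypothetical names for this illustration and do not reproduce the Beam coder API.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class CellValue:
    row: str
    value: int


class CellValueCoder:
    """Encodes a CellValue as a length-prefixed UTF-8 row key plus a signed 64-bit value."""

    def encode(self, cell):
        row_bytes = cell.row.encode("utf-8")
        return struct.pack(">I", len(row_bytes)) + row_bytes + struct.pack(">q", cell.value)

    def decode(self, data):
        (row_len,) = struct.unpack_from(">I", data, 0)
        row = data[4:4 + row_len].decode("utf-8")
        (value,) = struct.unpack_from(">q", data, 4 + row_len)
        return CellValue(row, value)


# Hypothetical registry: maps a class to the coder used for its instances.
coder_registry = {CellValue: CellValueCoder()}

original = CellValue("row-1", 42)
coder = coder_registry[type(original)]
encoded = coder.encode(original)
restored = coder.decode(encoded)
print(len(encoded), restored)
```

A fixed binary layout like this one is deterministic, which matters when encoded elements are used as grouping keys.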

Example: Count the Words
https://beam.apache.org/images/wordcount-pipeline.png

Capability Matrix
https://beam.apache.org/documentation/runners/capability-matrix/

HBase + Beam
• Inspired by HBase + Spark.
• Similar functions; Beam SQL is not supported yet.
• Use HBase as a bounded data source and as a target data store in both batch and streaming applications.
• Customized Transforms for HBase bulk operations, with HBasePipelineFunctions as the entry point for starting the pipeline.

Operations
• Operations for both batch and streaming modes
  • Scan (already implemented in Beam)
  • BulkGet
  • BulkPut
  • BulkDelete
  • MapPartitions
  • ForeachPartition
  • BulkLoad
  • BulkLoadThinRows

Examples: Scan
• Read data from an HBase table with a scan.

Examples: BulkGet
• Implement MakeFunctions to convert each input to a Get and each Result to an output.
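
The two-function pattern for BulkGet can be sketched as follows. This is a plain-Python illustration only: a dictionary stands in for the HBase table, row keys stand in for Get objects, and `bulk_get`, `make_get`, `convert_result`, and `batch_size` are hypothetical names, not the connector's actual API. Batching is an assumption about how a bulk operation would amortize round trips.

```python
def bulk_get(inputs, table, make_get, convert_result, batch_size=2):
    """Look up inputs in batches and convert each result to an output."""
    outputs = []
    batch = []

    def flush():
        # A real client would send the whole batch in one multi-get call here.
        for row_key in batch:
            outputs.append(convert_result(row_key, table.get(row_key)))
        batch.clear()

    for element in inputs:
        batch.append(make_get(element))  # first user function: input -> Get
        if len(batch) == batch_size:
            flush()
    if batch:
        flush()  # send the final partial batch
    return outputs


# In-memory stand-in for an HBase table: row key -> {column: value}.
table = {"r1": {"cf:a": "1"}, "r2": {"cf:a": "2"}}

results = bulk_get(
    inputs=["1", "2", "3"],
    table=table,
    make_get=lambda n: "r" + n,
    # Second user function: Result -> output; missing rows map to None.
    convert_result=lambda key, row: (key, row["cf:a"] if row else None),
)
print(results)
```

Keeping the conversion logic in two small user-supplied functions lets the same batching code serve any input and output type.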

Examples: BulkPut
• Implement a MakeFunction to convert each input to a Put.

Examples: BulkDelete
• Implement a MakeFunction to convert each input to a Delete.

Examples: BulkLoad
• Implement a MakeFunction to convert each input into a Cell.

Example: BulkLoadThinRows
• Implement MakeFunctions to convert each input into a row key and its cells.
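
A thin-row bulk load has to gather the cells of each row and put everything in sorted order before store files can be written, because HBase store files keep cells sorted by row key and column. The sketch below illustrates that preparation step in plain Python; `bulk_load_thin_rows` and `make_row` are hypothetical names, cells are modeled as `(family, qualifier, value)` tuples, and string ordering stands in for HBase's byte ordering. It is not the connector's implementation.

```python
def bulk_load_thin_rows(inputs, make_row):
    """Convert inputs to (row_key, cells), merge the cells of each row, and sort."""
    rows = {}
    for element in inputs:
        row_key, cells = make_row(element)  # user function: input -> (row key, cells)
        rows.setdefault(row_key, []).extend(cells)
    # Sort rows by key and the cells inside each row by (family, qualifier, value).
    return [(row_key, sorted(rows[row_key])) for row_key in sorted(rows)]


# Each record is (row key, family, qualifier, value), arriving in arbitrary order.
records = [
    ("r2", "cf", "b", "2"),
    ("r1", "cf", "b", "1"),
    ("r1", "cf", "a", "0"),
]

sorted_rows = bulk_load_thin_rows(
    records,
    make_row=lambda rec: (rec[0], [(rec[1], rec[2], rec[3])]),
)
print(sorted_rows)
```

Handling a whole row as one unit is practical only when rows are thin, that is, when each row has few enough columns to hold and sort in memory.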

Future
• Contribute the code to Apache Beam
• Support Beam SQL in HBase


Rustici Software

Real Time Object Detection Using Open CV

Khem

Scalable LLM APIs for AI and Generative AI Application Development Ettikan Karuppiah, Director/Technologist - NVIDIA Apidays Singapore 2024: Connecting Customers, Business and Technology (April 17 & 18, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...

apidays

The Good, the Bad and the Governed - Why is governance a dirty word? David O'Neill, Chief Operating Officer - APIContext Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...

apidays

Whatsapp Number Escorts Call girls 8617370543 Available 24x7 Navi Mumbai Call Girls Service Offer Genuine VIP Model Escorts Call Girls in Your Budget. Navi Mumbai Call Girls Service Provide Real Call Girls Number. Make Your Sexual Pleasure Memorable with Our Navi Mumbai Call Girls at Affordable Price. Top VIP Escorts Call Girls, High Profile Independent Escorts Call Girls, Housewife Women Escorts Call Girl, College Girls Escorts Call Girls, Russian Escorts Call girls Service in Your Budget.

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model

Deepika Singh

This presentations targets students or working professionals. You may know Google for search, YouTube, Android, Chrome, and Gmail, but did you know Google has many developer tools, platforms & APIs? This comprehensive yet still high-level overview outlines the most impactful tools for where to run your code, store & analyze your data. It will also inspire you as to what's possible. This talk is 50 minutes in length.

Powerful Google developer tools for immediate impact! (2023-24 C)

wesley chun

Abhishek Deb(1), Mr Abdul Kalam(2) M. Des (UX) , School of Design, DIT University , Dehradun. This paper explores the future potential of AI-enabled smartphone processors, aiming to investigate the advancements, capabilities, and implications of integrating artificial intelligence (AI) into smartphone technology. The research study goals consist of evaluating the development of AI in mobile phone processors, analyzing the existing state as well as abilities of AI-enabled cpus determining future patterns as well as chances together with reviewing obstacles as well as factors to consider for more growth.

Exploring the Future Potential of AI-Enabled Smartphone Processors

debabhi2

Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows. We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases. This video focuses on the deployment of external web forms using Jotform for Bonterra Impact Management. This solution can be customized to your organization’s needs and deployed to support the common use cases below: - Intake and consent - Assessments - Surveys - Applications - Program registration Interested in deploying web form automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

Jeffrey Haguewood

MS Copilot expands with MS Graph connectors

Nanddeep Nachan

hbaseconasia2017: HBase on Beam

1. HBase on Beam

2. Apache Beam
• Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.
• It was initiated and contributed by Google.
• The first stable release was published on May 17, 2017.

3. Apache Beam https://beam.apache.org/images/beam_architecture.png

4. Apache Beam
• A unified model for batch and streaming applications.
• Runners for popular open-source batch and streaming engines, for instance Spark and Flink.
• Multiple languages are available for end users to build their own pipelines; Java and Python are currently supported.
• Implement once, run almost everywhere.

5. Apache Beam
• Pipeline: the processing pipeline, which includes data input, transforms, and output.
• PCollection: the representation of both bounded and unbounded data.
• Transform
  • ParDo
  • GroupByKey
  • Combine
  • Flatten
  • …
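
To make the four core transforms concrete, here is a minimal plain-Python sketch of what each one does to a collection. It does not use the Beam SDK: the function names (`par_do`, `group_by_key`, `combine_per_key`, `flatten`) and the sample data are hypothetical stand-ins, and ordinary lists play the role of bounded PCollections.

```python
from collections import defaultdict
from itertools import chain


def par_do(fn, pcoll):
    """ParDo-like step: fn maps one element to zero or more outputs."""
    return [out for element in pcoll for out in fn(element)]


def group_by_key(pairs):
    """GroupByKey-like step: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(combine_fn, grouped):
    """Combine-like step: reduce each key's values to a single result."""
    return {key: combine_fn(values) for key, values in grouped.items()}


def flatten(*pcolls):
    """Flatten-like step: merge several collections into one."""
    return list(chain(*pcolls))


# Hypothetical sample data: (user, clicks) pairs from two sources.
clicks_a = [("alice", 2), ("bob", 1)]
clicks_b = [("alice", 3), ("carol", 0)]

merged = flatten(clicks_a, clicks_b)
# Drop zero-click records; a ParDo may emit nothing for an element.
non_zero = par_do(lambda kv: [kv] if kv[1] > 0 else [], merged)
totals = combine_per_key(sum, group_by_key(non_zero))
print(totals)
```

In a real pipeline the runner distributes these steps across workers; the sketch only shows the per-element and per-key semantics.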

6. Data Sources
• In-memory data: Array, Collection, Map
• Text
• HDFS
• Kafka
• HBase
• …

7. Windowing
• Fixed time windows
• Sliding time windows
• Session windows
• Single global window
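
The window types above differ in how an element's timestamp is mapped to one or more windows. The following plain-Python sketch illustrates that mapping with integer timestamps and half-open `[start, end)` windows. The function names are hypothetical and this is not the Beam API; the single global window is omitted because every element simply falls into the same window.

```python
def fixed_window(ts, size):
    """Fixed windows: each timestamp belongs to exactly one window."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Sliding windows: a timestamp belongs to every window of length
    `size`, starting at a multiple of `period`, that contains it."""
    windows = []
    start = ts - ts % period
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Session windows: each element opens [ts, ts + gap); overlapping
    windows are merged into one session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions


print(fixed_window(125, 60))
print(sliding_windows(125, 60, 30))
print(session_windows([0, 10, 100, 105, 300], 30))
```

Note that a sliding window with `size` larger than `period` assigns the same element to several windows, which is why downstream aggregations see it more than once.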

8. Serialization
• Every Transform must be serializable!
• CustomCoder
• Register coder for classes
• Register coder for the output of transform
• Serializable
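
A coder turns an element into bytes and back so that it can be shipped between workers. The sketch below shows the idea of writing a custom coder and registering it for a class, in plain Python. `WordCount`, `WordCountCoder`, `CODERS`, and `register_coder` are hypothetical names for illustration, not the Beam coder registry.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class WordCount:
    word: str
    count: int


class WordCountCoder:
    """Hypothetical custom coder: an 8-byte big-endian count followed
    by the UTF-8 bytes of the word."""

    def encode(self, value):
        return struct.pack(">q", value.count) + value.word.encode("utf-8")

    def decode(self, data):
        (count,) = struct.unpack(">q", data[:8])
        return WordCount(data[8:].decode("utf-8"), count)


# Hypothetical registry: maps a class to the coder used for its instances.
CODERS = {}


def register_coder(cls, coder):
    CODERS[cls] = coder


def round_trip(value):
    """Encode and decode a value with the coder registered for its class."""
    coder = CODERS[type(value)]
    return coder.decode(coder.encode(value))


register_coder(WordCount, WordCountCoder())
original = WordCount("hbase", 3)
print(round_trip(original) == original)
```

A useful check for any custom coder is exactly this round trip: decoding the encoded bytes must give back an equal value.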

9. Example: Count the Words https://beam.apache.org/images/wordcount-pipeline.png

10. Examples: Count the Words
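
As a rough illustration of the word-count pipeline, the following plain-Python sketch walks through the same stages: split lines into words, count each word, and format the results. It runs on an in-memory list rather than through a Beam runner, and the function names and sample lines are assumptions made for this example.

```python
import re
from collections import Counter


def extract_words(line):
    """Per-element step: split one line into lowercase words."""
    return re.findall(r"[a-z']+", line.lower())


def count_words(lines):
    """Grouping and combining step: count occurrences of each word."""
    words = (word for line in lines for word in extract_words(line))
    return Counter(words)


def format_counts(counts):
    """Formatting step: one 'word: count' string per word, sorted by word."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


lines = ["To be or not to be", "that is the question"]
for row in format_counts(count_words(lines)):
    print(row)
```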

11. Capability Matrix https://beam.apache.org/documentation/runners/capability-matrix/

12. HBase + Beam
• Inspired by HBase + Spark.
• Similar functions; Beam SQL is not supported yet.
• Uses HBase as a bounded data source and as a target data store in both batch and streaming applications.
• Customized Transforms for HBase bulk operations, with HBasePipelineFunctions as the entry point to start the pipeline.

13. Operations
• Operations for both batch and streaming modes
• Scan (already implemented in Beam)
• BulkGet
• BulkPut
• BulkDelete
• MapPartitions
• ForeachPartition
• BulkLoad
• BulkLoadThinRows

14. Examples: Scan
• Read data from an HBase table by scan.

15. Examples: BulkGet
• Implement MakeFunctions to convert input to Get, and convert Result to output.
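
The BulkGet pattern can be sketched as two user-supplied functions, one building a Get from each input and one turning each Result into an output, with the lookups issued in batches. This is a plain-Python illustration only: the `Get` class, the in-memory `TABLE` dictionary standing in for an HBase table, the `cf:name` column, and the `bulk_get` signature are all hypothetical and are not the connector's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Get:
    """Hypothetical stand-in for an HBase Get request."""
    row: bytes


# In-memory stand-in for an HBase table: row key -> {column: value}.
TABLE = {
    b"row1": {b"cf:name": b"alice"},
    b"row2": {b"cf:name": b"bob"},
}


def make_get(user_id):
    """First make-function: convert an input element into a Get."""
    return Get(row=user_id.encode())


def make_output(row, result):
    """Second make-function: convert a lookup result into an output."""
    return (row.decode(), result.get(b"cf:name", b"").decode())


def bulk_get(inputs, make_get, make_output, table, batch_size=2):
    outputs = []
    for i in range(0, len(inputs), batch_size):
        gets = [make_get(x) for x in inputs[i:i + batch_size]]
        # A real client would send each batch as one multi-get call.
        results = [(g.row, table.get(g.row, {})) for g in gets]
        outputs.extend(make_output(row, res) for row, res in results)
    return outputs


print(bulk_get(["row1", "row2", "row9"], make_get, make_output, TABLE))
```

Batching is the point of the bulk variant: it amortizes the round trip to the region servers over many rows instead of paying it per element.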

16. Examples: BulkPut
• Implement MakeFunction to convert input to Put.

17. Examples: BulkDelete
• Implement MakeFunction to convert input to Delete.
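
BulkPut and BulkDelete follow the same shape: a single make-function converts each input element into a mutation, and the mutations are applied to the table. The sketch below illustrates this in plain Python against a dictionary. The `Put` and `Delete` classes, the `cf:name` column, and the function signatures are hypothetical stand-ins rather than the HBase client or the connector API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Put:
    """Hypothetical stand-in for an HBase Put (one column per Put)."""
    row: bytes
    column: bytes
    value: bytes


@dataclass(frozen=True)
class Delete:
    """Hypothetical stand-in for an HBase Delete of a whole row."""
    row: bytes


def make_put(record):
    """Make-function: convert a (user_id, name) record into a Put."""
    user_id, name = record
    return Put(user_id.encode(), b"cf:name", name.encode())


def make_delete(user_id):
    """Make-function: convert a user id into a Delete."""
    return Delete(user_id.encode())


def bulk_put(inputs, make_put, table):
    for put in map(make_put, inputs):
        table.setdefault(put.row, {})[put.column] = put.value


def bulk_delete(inputs, make_delete, table):
    for delete in map(make_delete, inputs):
        table.pop(delete.row, None)  # deleting a missing row is a no-op


table = {}  # in-memory stand-in for an HBase table
bulk_put([("u1", "alice"), ("u2", "bob")], make_put, table)
bulk_delete(["u1"], make_delete, table)
print(table)
```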

18. Examples: MapPartitions

19. Examples: MapPartitions

20. Examples: ForeachPartition
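
The difference between the two partition-level operations can be sketched as follows: both hand a user function an iterator over one partition together with a shared connection, but MapPartitions collects the function's outputs while ForeachPartition runs it only for its side effects. This plain-Python sketch assumes that design, which mirrors the HBase + Spark module the work is inspired by. `FakeConnection` and every function name here are hypothetical.

```python
class FakeConnection:
    """Stand-in for an HBase connection over an in-memory table."""
    opened = 0  # class-wide count of connections created

    def __init__(self, table):
        FakeConnection.opened += 1
        self.table = table

    def get(self, row):
        return self.table.get(row)

    def put(self, row, value):
        self.table[row] = value

    def close(self):
        pass


def map_partitions(partitions, fn, table):
    """Call fn(rows, connection) once per partition and collect outputs."""
    outputs = []
    for partition in partitions:
        conn = FakeConnection(table)
        try:
            outputs.extend(fn(iter(partition), conn))
        finally:
            conn.close()
    return outputs


def foreach_partition(partitions, fn, table):
    """Call fn(rows, connection) once per partition; no output is kept."""
    for partition in partitions:
        conn = FakeConnection(table)
        try:
            fn(iter(partition), conn)
        finally:
            conn.close()


def lookup(rows, conn):
    return [(row, conn.get(row)) for row in rows]


def write_zero(rows, conn):
    for row in rows:
        conn.put(row, 0)


table = {"a": 1, "b": 2}
partitions = [["a", "b"], ["c"]]
print(map_partitions(partitions, lookup, table))
foreach_partition(partitions, write_zero, table)
print(table)
```

Opening one connection per partition rather than per element is the main reason to prefer these operations for custom HBase access.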

21. Examples: BulkLoad
• Implement MakeFunction to convert each input into a Cell.

22. Examples: BulkLoad
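
HBase bulk loading writes store files directly, so cells must end up grouped by region and column family and sorted within each file. The sketch below shows that preparation step in plain Python, after a make-function has turned each input into a cell. The `Cell` tuple, the fixed `cf` family, the region start keys, and the `bulk_load` signature are assumptions for illustration, and the real connector's internals may differ.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical simplified cell; field order doubles as sort order."""
    row: bytes
    family: bytes
    qualifier: bytes
    value: bytes


def make_cell(record):
    """Make-function: convert a (row, qualifier, value) record to a Cell."""
    row, qualifier, value = record
    return Cell(row.encode(), b"cf", qualifier.encode(), value.encode())


def bulk_load(inputs, make_cell, region_start_keys):
    """Group cells by (region index, family) and sort each group."""
    files = defaultdict(list)
    for cell in map(make_cell, inputs):
        # The owning region is the last one whose start key <= row key.
        region = bisect_right(region_start_keys, cell.row) - 1
        files[(region, cell.family)].append(cell)
    return {key: sorted(cells) for key, cells in files.items()}


# Two hypothetical regions: ["", "m") and ["m", end of table).
region_start_keys = [b"", b"m"]
records = [("z", "q1", "3"), ("a", "q2", "2"), ("a", "q1", "1")]
for key, cells in sorted(bulk_load(records, make_cell, region_start_keys).items()):
    print(key, cells)
```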

23. Example: BulkLoadThinRows
• Implement MakeFunctions to convert each input into row keys and cells.

24. Example: BulkLoadThinRows
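
For thin rows (rows with relatively few columns), a whole row can be handled as one unit: the make-function returns a row key plus all of that row's cells, rows are ordered by key, and cells are sorted within each row. The plain-Python sketch below illustrates that idea under those assumptions. The function names, the `cf` family, and the sample records are hypothetical.

```python
def make_row(record):
    """Make-function: convert (user_id, {qualifier: value}) into a row key
    and a list of (family, qualifier, value) cells."""
    user_id, columns = record
    row = user_id.encode()
    cells = [(b"cf", q.encode(), v.encode()) for q, v in columns.items()]
    return row, cells


def bulk_load_thin_rows(inputs, make_row):
    """Merge cells per row key, then order rows and the cells in each row."""
    rows = {}
    for row, cells in map(make_row, inputs):
        rows.setdefault(row, []).extend(cells)
    return [(row, sorted(rows[row])) for row in sorted(rows)]


records = [
    ("u2", {"name": "bob"}),
    ("u1", {"name": "alice", "age": "30"}),
]
for row, cells in bulk_load_thin_rows(records, make_row):
    print(row, cells)
```

Compared with the per-cell variant, only row keys take part in the global ordering, which keeps the shuffle smaller when each row carries a handful of columns.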

25. Future u Contribute the code to Apache Beam u Support Beam SQL in HBase