Tom Harrison, Product and Delivery Manager at APNIC, presents at the Registration Protocols Extensions working group during IETF 119, held in Brisbane, Australia, from 16 to 22 March 2024.
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami... - Databricks
Aggregation-based features account for a quarter of the several thousand features used by the ML-based decisioning system of the Risk team at Uber. We observed the same repetitive, cumbersome steps every time a feature was onboarded. Therefore, to accelerate developer velocity and to enable feature engineering at scale, we decided to develop a generic Spark-based infrastructure that reduces the process to a simple spec file containing a parameterized query, along with some metadata on where the feature should be aggregated and stored.
In the presentation, we will describe the architecture of the final solution, highlighting advanced capabilities such as backfill support and self-healing for correctness. We will showcase how, using data stored in Hive and using Spark, we developed a highly scalable solution that carries out feature aggregation incrementally. By dividing aggregation responsibility between the real-time access layer and the batch computation components, we ensured that only entities whose values have actually changed are dispatched to our real-time access store (Cassandra). We will share how we modeled data in Cassandra using its native capabilities such as counters, and how we worked around some of Cassandra's limitations. We will also cover the access service and how we stitch different types of features together, and how, based on our data model, all the features for an entity with the same aggregation window can be fetched via a single query. Finally, we will cover how these incrementally aggregated features have enabled shorter turnaround times for the models that use them.
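The core of the incremental approach described above can be sketched in a few lines: batch-compute the new aggregate values, then dispatch to the real-time store only the entities whose values actually changed. This is a toy illustration with hypothetical names, not Uber's actual implementation.

```python
def incremental_dispatch(previous, current):
    """Return only the (entity, value) pairs whose aggregate changed
    since the last batch run; only these need to reach the real-time store."""
    changed = {}
    for entity, value in current.items():
        if previous.get(entity) != value:
            changed[entity] = value
    return changed

prev = {"user_1": 10, "user_2": 5}
curr = {"user_1": 10, "user_2": 7, "user_3": 1}
print(incremental_dispatch(prev, curr))  # {'user_2': 7, 'user_3': 1}
```

Dispatching only the delta keeps write traffic to the real-time store proportional to actual change, not to the total number of entities.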
AusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other Observations - Mark Smith
The document discusses issues encountered when testing and evaluating residential customer premises equipment (CPE) implementations of IPv6. Some key issues included CPE sending unsolicited router advertisements too frequently, not properly decrementing prefix lifetimes, setting an incorrect current hop limit value, using non-unique local IPv6 addresses, and not supporting newer transport protocols like SCTP. The document emphasizes the importance of thorough testing and RFC compliance for CPE in order to ensure stable and interoperable IPv6 connectivity and services.
This document provides instructions for a P4 tutorial being conducted using a virtual machine (VM). It outlines how to download and set up the VM, including logging in and pulling the latest tutorial files. It describes the overall goals of learning the P4 language, tools, and future technology trends through a series of presentations and exercises. Finally, it provides an agenda with topics that will be covered over the course of the tutorial.
Recently, massively parallel processing relational database systems (MPPDBs) have gained much momentum in the big data analytics market. With the advent of hosted cloud computing, we envision that the offering of MPPDB-as-a-Service (MPPDBaaS) will become attractive for companies with analytical tasks on only hundreds of gigabytes to a few tens of terabytes of data, because they can enjoy high-end parallel analytics at low cost. This paper presents Thrifty, a prototype implementation of MPPDB-as-a-Service. The major research issue is how to achieve a lower total cost of ownership by consolidating thousands of MPPDB tenants onto a shared hardware infrastructure, with a performance SLA that guarantees the tenants can obtain their query results as if they were executing their queries on dedicated machines. Thrifty achieves this goal with a tenant-driven design that includes (1) a cluster design that carefully arranges the nodes in the cluster into groups and creates an MPPDB for each group of nodes, (2) a tenant placement that assigns each tenant to several MPPDBs (for high-availability service through replication), and (3) a query routing algorithm that routes a tenant's query to the proper MPPDB at run time. Experiments show that in an MPPDBaaS with 5,000 tenants, where each tenant requests a 2- to 32-node MPPDB to query against 200 GB to 3.2 TB of data, Thrifty can serve all the tenants with a 99.9% performance SLA guarantee and a high-availability replication factor of 3, using only 18.7% of the nodes requested by the tenants.
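The routing component described above can be illustrated with a small sketch: each tenant is replicated across several MPPDBs, and an incoming query goes to the live replica carrying the least load. Names and the least-loaded policy are assumptions for illustration, not Thrifty's published algorithm.

```python
def route_query(tenant_replicas, node_load, down=frozenset()):
    """Pick the live replica MPPDB with the lowest current load.

    tenant_replicas: the MPPDBs this tenant was placed on (replication).
    node_load:       current load per MPPDB (missing entries count as 0).
    down:            MPPDBs currently unavailable.
    """
    live = [db for db in tenant_replicas if db not in down]
    if not live:
        raise RuntimeError("no live replica for tenant")
    return min(live, key=lambda db: node_load.get(db, 0))

# A tenant placed on 3 MPPDBs (replication factor 3):
replicas = ["mppdb_a", "mppdb_b", "mppdb_c"]
print(route_query(replicas, {"mppdb_a": 9, "mppdb_b": 2, "mppdb_c": 5}))   # mppdb_b
print(route_query(replicas, {"mppdb_a": 9, "mppdb_b": 2}, down={"mppdb_b"}))  # mppdb_c
```

Replication gives the router a choice of targets, which is what lets the service both balance load and survive node failures.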
The document provides an update on the OpenStack Manila project from the November 2018 OpenStack Summit in Berlin. It summarizes that Manila is the control plane for provisioning and managing shared filesystems across storage systems. The Rocky release added several new features including improved API changes and driver support. Priorities for the upcoming Stein release include running all gate jobs under Python 3 by default, adding pre-upgrade checks, and continuing work on access rule prioritization and JSON query validation from Rocky. The presentation encourages community feedback and contribution to the Manila project.
NumPy Roadmap presentation at NumFOCUS Forum - Ralf Gommers
This presentation is an attempt to summarize the NumPy roadmap and both technical and non-technical ideas for the next 1-2 years, aimed at users who rely heavily on NumPy as well as potential funders.
Capital One Delivers Risk Insights in Real Time with Stream Processing - Confluent
Speakers: Ravi Dubey, Senior Manager, Software Engineering, Capital One + Jeff Sharpe, Software Engineer, Capital One
Capital One supports interactions with real-time streaming transactional data using Apache Kafka®. Kafka helps deliver information to internal operations teams and bank tellers to assist with assessing risk and protecting customers in a myriad of ways.
Inside the bank, Kafka allows Capital One to build a real-time system that takes advantage of modern data and cloud technologies without exposing customers to unnecessary data breaches, or violating privacy regulations. These examples demonstrate how a streaming platform enables Capital One to act on their visions faster and in a more scalable way through the Kafka solution, helping establish Capital One as an innovator in the banking space.
Join us for this online talk on lessons learned, best practices and technical patterns of Capital One’s deployment of Apache Kafka.
-Find out how Kafka delivers on a 5-second service-level agreement (SLA) for inside branch tellers.
-Learn how to combine and host data in-memory and prevent personally identifiable information (PII) violations of in-flight transactions.
-Understand how Capital One manages Kafka Docker containers using Kubernetes.
Watch the recording: https://videos.confluent.io/watch/6e6ukQNnmASwkf9Gkdhh69?.
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform - Apache Apex
Internet of Things (IoT) devices are becoming more ubiquitous in consumer, business and industrial landscapes. They are being widely used in applications ranging from home automation to the industrial internet. They pose a unique challenge in terms of the volume of data they produce, the velocity with which they produce it, and the variety of sources they need to handle. The challenge is to ingest and process this data at the speed at which it is being produced, in a real-time and fault-tolerant fashion. Apache Apex is an industrial-grade, scalable and fault-tolerant big data processing platform that runs natively on Hadoop. In this deck, you will see how Apex is being used in IoT applications, and how enterprise features such as dimensional analytics, real-time dashboards and monitoring play a key role.
Presented by Pramod Immaneni, Principal Architect at DataTorrent and PPMC member Apache Apex, on BrightTALK webinar on Apr 6th, 2016
Building Enterprise Grade Applications in Yarn with Apache Twill - Cask Data
Speaker: Poorna Chandra, from Cask
Big Data Applications Meetup, 07/27/2016
Palo Alto, CA
More info here: http://www.meetup.com/BigDataApps/
Link to talk: https://www.youtube.com/watch?v=I1GLRXyQlx8
About the talk:
Twill is an Apache incubator project that provides a higher-level abstraction for building distributed applications on YARN. Developing distributed applications directly on YARN is challenging because it does not provide higher-level APIs, and lots of boilerplate code needs to be duplicated to deploy applications. Developing YARN applications is typically done by framework developers, like those familiar with Apache Flink or Apache Spark, who need to deploy the framework in a distributed way.
By using Twill, application developers need only be familiar with the basics of the Java programming model when using the Twill APIs, so they can focus on solving business problems. In this talk, I present how Twill can be leveraged, using the Cask Data Application Platform (CDAP), which relies heavily on Twill for resource management, as an example.
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data... - Dipti Borkar
Born at Facebook, Presto is an open source, high-performance, distributed SQL query engine. With the disaggregation of storage and compute, Presto was created to simplify querying of all data lakes - cloud data lakes like S3 and on-premises data lakes like HDFS. Presto's high performance and flexibility have made it a very popular choice for interactive query workloads on large Hadoop-based clusters as well as AWS S3, Google Cloud Storage and Azure Blob Storage. Today it has grown to support many users and use cases, including ad hoc query, data lakehouse analytics, and federated querying. In this session, we will give an overview of Presto, including its architecture and how it works, the problems it solves, and the most common use cases. We'll also share the latest innovation in the project as well as the future roadmap.
This document introduces TiDB, an open source distributed SQL database developed by PingCAP. It provides a 3-part summary:
1) TiDB is a hybrid transactional/analytical database inspired by Google Spanner/F1 that provides horizontal scalability, MySQL compatibility, and ACID transactions. It consists of TiDB, TiKV, and Placement Driver.
2) Mobike, a bike sharing platform with 200 million users, uses TiDB to power operations like bike locking/unlocking tracking and real-time analytics to handle high concurrency and permanent storage needs.
3) Over 200 companies use TiDB for two major use cases: MySQL scalability and hybrid OLTP/OLAP architectures.
This webinar recording introduces potential users in the energy industry to the communications specification defined by TROLIE, an LF Energy project aiming to establish an open conformance standard and cultivate a software ecosystem to accelerate the implementation of reliable, secure, and interoperable systems for the exchange of transmission facility ratings and related information. With FERC Order 881 being implemented next year in the United States, most organizations involved in the operation of the transmission system in North America now need to exchange ratings and related information in an automated, frequent manner. This project will help accelerate their implementation and simplify interoperability.
This webinar provides a technical introduction as well as "how to" content from the perspective of TROLIE's primary users - reliability coordinators and transmission owners.
The webinar was presented by Christopher Atkins of MISO and Tory McKeag of GE Vernova.
Learn more about the TROLIE project at https://lfenergy.org/projects/trolie/.
Accelerating Networked Applications with Flexible Packet ProcessingOpen-NFP
The recent surge of network I/O performance has put enormous pressure on memory and software I/O processing subsystems for many cloud and data center applications, such as key-value stores and real-time analytics frameworks. A major reason for the high memory and processing overheads is the inefficient use of these resources by network interface cards. Offloading functionality to a programmable NIC can help, but what to offload needs to be carefully chosen.
This presentation will cover a number of reusable offloading mechanisms that can help data center software processing efficiency. It will show how to implement these mechanisms in the P4 programming language and discuss their efficiency using experiments run on the Netronome Agilio-CX NIC.
SDN in the Management Plane: OpenConfig and Streaming TelemetryAnees Shaikh
The networking industry has made good progress in the last few years on developing programmable interfaces and protocols for the control plane to enable a more dynamic and efficient infrastructure. Despite this progress, some parts of networking risk being left behind, most notably network management and configuration. The state-of-the-art in network management remains relegated to proprietary device interfaces (e.g., CLIs), imperative, incremental configuration, and lack of meaningful abstractions.
We propose a framework for network configuration guided by software-defined networking principles, with a focus on developing common models of network devices, and common languages to describe network structure and policies. We also propose a publish/subscribe framework for next generation network telemetry, focused on streaming structured data from network elements themselves.
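The publish/subscribe telemetry model proposed above can be sketched minimally: collectors subscribe to structured data paths, and the device streams updates whose paths match. The class, path strings and matching rule here are illustrative assumptions, not the actual OpenConfig/streaming-telemetry wire protocol.

```python
class TelemetryBus:
    """Toy pub/sub bus keyed on structured data paths."""

    def __init__(self):
        self.subscriptions = []  # list of (path_prefix, callback)

    def subscribe(self, path_prefix, callback):
        """Register interest in all paths under a prefix."""
        self.subscriptions.append((path_prefix, callback))

    def publish(self, path, value):
        """Stream an update to every subscriber whose prefix matches."""
        for prefix, callback in self.subscriptions:
            if path.startswith(prefix):
                callback(path, value)

bus = TelemetryBus()
seen = []
bus.subscribe("/interfaces/eth0/", lambda p, v: seen.append((p, v)))
bus.publish("/interfaces/eth0/counters/in-octets", 12345)
bus.publish("/interfaces/eth1/counters/in-octets", 999)   # no matching subscriber
print(seen)  # [('/interfaces/eth0/counters/in-octets', 12345)]
```

The key contrast with CLI scraping or polling is that the device pushes structured, path-addressed data as it changes, and collectors express interest declaratively.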
The document summarizes a presentation about Apache Ratis, a Raft consensus library. It introduces Raft consensus and describes Ratis' features like leader election, log replication, pluggable components, and use cases in Hadoop projects like Ozone. It also outlines Ratis' development status and future work areas like performance, metrics, security, and documentation.
This document discusses using Pivotal's Big Data Suite to build a real-time analytics solution for processing taxi trip data streams. It presents an architecture that uses Spring XD for data ingestion, Spark Streaming for in-memory analytics on 10-second windows, Gemfire for fast data retrieval, and Pivotal HD for long-term storage. The solution demonstrates filtering inconsistent data, finding top traffic areas, and available taxis in real-time. The document highlights how the Big Data Suite provides a complete toolset for data-driven enterprises through its optimized Hadoop distribution, in-memory processing, stream processing, and low-latency data stores.
APNIC Product Manager Tom Harrison presents on the draft regext-rdap-rir-search functionality at IETF 118, held in Prague, Czech Republic from 4 to 10 November 2023.
This presentation provides an overview of the architecture and technology of TiDB, an open-source distributed NewSQL database, and how it helps Mobike, one of the largest dockless bikeshare platform, scale its infrastructure to achieve hyper-growth.
Building Pinterest Real-Time Ads Platform Using Kafka Streams - Confluent
Building Pinterest Real-Time Ads Platform Using Kafka Streams (Liquan Pei + Boyang Chen, Pinterest) Kafka Summit SF 2018
In this talk, we are sharing the experience of building Pinterest’s real-time Ads Platform utilizing Kafka Streams. The real-time budgeting system is the most mission-critical component of the Ads Platform, as it controls how each ad is delivered to maximize user, advertiser and Pinterest value. The system needs to handle over 50,000 queries per second (QPS) of impressions, requires less than five seconds of end-to-end latency, and must recover within five minutes during outages. It also needs to be scalable to handle the fast growth of Pinterest’s ads business.
The real-time budgeting system is composed of a real-time stream-stream joiner, a real-time spend aggregator and a spend predictor. At Pinterest’s scale, we need to overcome quite a few challenges to make each component work. For example, the stream-stream joiner needs to maintain terabyte-size state while supporting fast recovery, and the real-time spend aggregator needs to publish to thousands of ads servers while supporting over one million read QPS. We chose Kafka Streams as it provides millisecond-level latency guarantees, scalable event-based processing and easy-to-use APIs. In the process of building the system, we performed tons of tuning to RocksDB, Kafka Producer and Consumer, and pushed several open source contributions to Apache Kafka. We are also working on adding a remote checkpoint for Kafka Streams state to reduce cold-start time when adding more machines to the application. We believe that our experience can be beneficial to people who want to build real-time streaming solutions at large scale and deeply understand Kafka Streams.
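The spend aggregator's core operation - summing ad spend per advertiser per time window - can be illustrated with a toy stand-in for the Kafka Streams windowed aggregation. Field names and the 60-second tumbling window are assumptions for illustration only.

```python
from collections import defaultdict

def aggregate_spend(events, window_sec=60):
    """Sum spend per (advertiser, tumbling window).

    events: iterable of (timestamp_sec, advertiser_id, spend).
    Returns {(advertiser_id, window_start_sec): total_spend}.
    """
    totals = defaultdict(float)
    for ts, advertiser, spend in events:
        window_start = (ts // window_sec) * window_sec
        totals[(advertiser, window_start)] += spend
    return dict(totals)

events = [(0, "adv1", 1.0), (30, "adv1", 2.0), (70, "adv1", 4.0), (10, "adv2", 0.5)]
print(aggregate_spend(events))
# {('adv1', 0): 3.0, ('adv1', 60): 4.0, ('adv2', 0): 0.5}
```

In the real system this state lives in RocksDB-backed Kafka Streams stores and is continuously published to ads servers; the sketch only shows the aggregation semantics.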
Have you ever thought about how your site’s performance compares to the web as a whole? Or maybe you’re curious how popular a particular web feature is. How much is too much JavaScript? The HTTP Archive has been keeping track of how the web is built since 2010. It enables you to find answers to questions about the state of the web past and present. Rick Viscomi, developer relations at Google, and Paul Calvano, web performance architect at Akamai, will explore how the HTTP Archive works, some of the ways people are using this dataset, and discuss some ways that Akamai has leveraged data within the HTTP Archive to help our customers. The session will include: (1) an intro to the HTTP Archive (what it does, how it works, who uses it, and how to get started quickly), and (2) case studies on compression research, third-party research, server analysis, and how to identify common performance errors across sites
Akamai Edge: Tracking the Performance of the Web with HTTP Archive - Rick Viscomi
Rick Viscomi and Paul Calvano discuss tracking the performance of the web with HTTP Archive, including how the tool works and some deeply technical case studies that show off the powerful insights to be learned.
JCIS 2022 - Smart LAMA API: Automated Capacity Analysis of Limitation-Aware M... - Rafael Apellidos
The document describes the Smart LAMA API, which allows for automated capacity analysis of Limitation-Aware Microservices Architectures (LAMAs). The API accepts formal descriptions of LAMAs, which include internal/external services, requests, and pricing plans. It provides operations to analyze LAMA configurations to determine the maximum requests, minimum cost, or minimum time for a given scenario. The API was validated using examples in an online Jupyter notebook, with wrappers to simplify usage. While the current version only supports single metrics, endpoints, and entrypoints, the API automates constraint solving problems that were previously difficult to analyze at scale for LAMA configurations.
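A drastically simplified sketch of the kind of capacity question such an analysis answers: given an external service's pricing plans (a request quota and a cost per period), what is the minimum cost to serve a required request volume? The plans, numbers and single-metric simplification are hypothetical; the real Smart LAMA API solves richer constraint problems over full LAMA descriptions.

```python
import math

def min_cost(required_requests, plans):
    """Minimum cost to cover required_requests per period.

    plans: list of (quota_per_period, cost_per_period).
    Simplifying assumption: one plan type, purchasable in multiple instances.
    """
    return min(math.ceil(required_requests / quota) * cost
               for quota, cost in plans)

plans = [(100, 5.0), (1000, 30.0)]  # e.g. 100 req for $5, or 1000 req for $30
print(min_cost(250, plans))  # 15.0 - three small instances beat one large plan
print(min_cost(900, plans))  # 30.0 - one large plan beats nine small instances
```

Even this toy version shows why automation matters: the cheapest plan choice flips depending on the required volume, and real LAMAs compose many such plans across services.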
The document summarizes new features and improvements in Apache Spark 2.3 for machine learning. Key highlights include first-class support for loading image data, enhanced scalability of feature transformers by supporting multiple columns, parallelizing cross-validation for faster hyperparameter tuning, and a new scalable feature hashing transformer. Performance tests demonstrate that the multi-column transformers provide up to 2.7x speedup over the single-column approach. Parallel cross-validation also provides a 2-2.7x speedup using 3 threads. Future areas of focus include completing multi-column support, improving Python APIs, and enhancing techniques like gradient boosted trees.
This document discusses implementing R-trees in Datomic to enable geospatial queries. It provides an overview of Datomic and motivations for using it for spatial data. It then describes implementing R-trees in Datomic, including the schema, insertion and splitting transactions. It also discusses bulk loading R-trees using Hilbert curves to improve performance over single insertions. Future plans include supporting retractions, updates, additional queries and data types.
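The bulk-loading idea mentioned above can be sketched briefly: sort points along a space-filling curve so that spatially nearby points land in the same R-tree leaf. A Z-order (Morton) curve is used here as a simpler stand-in for the Hilbert curve; the packing principle is the same, and all names are illustrative rather than the talk's actual code.

```python
def z_order(x, y, bits=16):
    """Interleave the bits of x and y into a single Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits at even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits at odd positions
    return code

def bulk_load_order(points, leaf_size=2):
    """Sort points by curve position, then pack consecutive runs into leaves."""
    ordered = sorted(points, key=lambda p: z_order(*p))
    return [ordered[i:i + leaf_size] for i in range(0, len(ordered), leaf_size)]

pts = [(0, 0), (5, 5), (1, 0), (4, 5)]
print(bulk_load_order(pts))  # [[(0, 0), (1, 0)], [(4, 5), (5, 5)]]
```

Packing curve-adjacent points into the same leaf yields compact bounding boxes, which is why curve-ordered bulk loading outperforms one-at-a-time insertion.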
Outsourcing IT Projects to Managed Hosting of the Cloud - Rackspace
Is your organization looking to cut costs, reduce deployment time, or gain new capabilities that you find challenging to implement with traditional on-premises infrastructure? Then outsourcing IT may be right for you.
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024 - APNIC
Ellisha Heppner, Grant Management Lead, presented an update on the APNIC Foundation to the PNG DNS Forum, held from 6 to 10 May 2024 in Port Moresby, Papua New Guinea.
Registry Data Accuracy Improvements, presented by Chimi Dorji at SANOG 41 / I... - APNIC
Chimi Dorji, Internet Resource Analyst at APNIC, presented on Registry Data Accuracy Improvements at SANOG 41 jointly held with INNOG 7 in Mumbai, India from 25 to 30 April 2024.
Related content
Similar to draft-harrison-sidrops-manifest-number-01, presented at IETF 119
Building Enterprise Grade Applications in Yarn with Apache TwillCask Data
Speaker: Poorna Chandra, from Cask
Big Data Applications Meetup, 07/27/2016
Palo Alto, CA
More info here: http://www.meetup.com/BigDataApps/
Link to talk: https://www.youtube.com/watch?v=I1GLRXyQlx8
About the talk:
Twill is an Apache incubator project that provides higher level abstraction to build distributed systems applications on YARN. Developing distributed applications using YARN is challenging because it does not provide higher level APIs, and lots of boiler plate code needs to be duplicated to deploy applications. Developing YARN applications is typically done by framework developers, like those familiar with Apache Flink or Apache Spark, who need to deploy the framework in a distributed way.
By using Twill, application developers need only be familiar with the basics of the Java programming model when using the Twill APIs, so they can focus on solving business problems. In this talk I present how Twill can be leveraged and an example of Cask Data Application Platform (CDAP) that heavily uses Twill for resource management.
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Dipti Borkar
Born at Facebook, Presto is an open source high performance, distributed SQL query engine. With the disaggregation of storage and compute, Presto was created to simplify querying of all data lakes - cloud data lakes like S3 and on premise data lakes like HDFS. Presto's high performance and flexibility has made it a very popular choice for interactive query workloads on large Hadoop-based clusters as well as AWS S3, Google Cloud Storage and Azure blob store. Today it has grown to support many users and use cases including ad hoc query, data lake house analytics, and federated querying. In this session, we will give an overview on Presto including architecture and how it works, the problems it solves, and most common use cases. We'll also share the latest innovation in the project as well as the future roadmap.
This document introduces TiDB, an open source distributed SQL database developed by PingCAP. It provides a 3-part summary:
1) TiDB is a hybrid transactional/analytical database inspired by Google Spanner/F1 that provides horizontal scalability, MySQL compatibility, and ACID transactions. It consists of TiDB, TiKV, and Placement Driver.
2) Mobike, a bike sharing platform with 200 million users, uses TiDB to power operations like bike locking/unlocking tracking and real-time analytics to handle high concurrency and permanent storage needs.
3) Over 200 companies use TiDB for two major uses - MySQL scalability and hybrid OLTP/OLAP architecture
This webinar recording introduces potential users in the energy industry to the communications specification defined by TROLIE, an LF Energy project aiming to establish an open conformance standard and cultivate a software ecosystem to accelerate the implementation of reliable, secure, and interoperable systems for the exchange of transmission facility ratings and related information. With FERC Order 881 being implemented next year in the United States, most organizations involved in the operation of the transmission system in North America now need to exchange ratings and related information in an automated, frequent manner. This project will help accelerate their implementation and simplify interoperability.
This webinar provides a technical introduction as well as "how to" content from the perspective of TROLIE's primary users - reliability coordinators and transmission owners.
The webinar was presented by Christopher Atkins of MISO and Tory McKeag of GE Vernova.
Learn more about the TROLIE project at https://lfenergy.org/projects/trolie/.
Accelerating Networked Applications with Flexible Packet Processing (Open-NFP)
The recent surge of network I/O performance has put enormous pressure on memory and software I/O processing subsystems for many cloud and data center applications, such as key-value stores and real-time analytics frameworks. A major reason for the high memory and processing overheads is the inefficient use of these resources by network interface cards. Offloading functionality to a programmable NIC can help, but what to offload needs to be carefully chosen.
This presentation will cover a number of reusable offloading mechanisms that can help data center software processing efficiency. It will show how to implement these mechanisms in the P4 programming language and discuss their efficiency using experiments run on the Netronome Agilio-CX NIC.
SDN in the Management Plane: OpenConfig and Streaming Telemetry (Anees Shaikh)
The networking industry has made good progress in the last few years on developing programmable interfaces and protocols for the control plane to enable a more dynamic and efficient infrastructure. Despite this progress, some parts of networking risk being left behind, most notably network management and configuration. The state-of-the-art in network management remains relegated to proprietary device interfaces (e.g., CLIs), imperative, incremental configuration, and lack of meaningful abstractions.
We propose a framework for network configuration guided by software-defined networking principles, with a focus on developing common models of network devices, and common languages to describe network structure and policies. We also propose a publish/subscribe framework for next generation network telemetry, focused on streaming structured data from network elements themselves.
The document summarizes a presentation about Apache Ratis, a Raft consensus library. It introduces Raft consensus and describes Ratis' features like leader election, log replication, pluggable components, and use cases in Hadoop projects like Ozone. It also outlines Ratis' development status and future work areas like performance, metrics, security, and documentation.
This document discusses using Pivotal's Big Data Suite to build a real-time analytics solution for processing taxi trip data streams. It presents an architecture that uses Spring XD for data ingestion, Spark Streaming for in-memory analytics on 10-second windows, Gemfire for fast data retrieval, and Pivotal HD for long-term storage. The solution demonstrates filtering inconsistent data, finding top traffic areas, and available taxis in real-time. The document highlights how the Big Data Suite provides a complete toolset for data-driven enterprises through its optimized Hadoop distribution, in-memory processing, stream processing, and low-latency data stores.
APNIC Product Manager Tom Harrison presents on the draft regext-rdap-rir-search functionality at IETF 118, held in Prague, Czech Republic from 4 to 10 November 2023.
This presentation provides an overview of the architecture and technology of TiDB, an open-source distributed NewSQL database, and how it helps Mobike, one of the largest dockless bikeshare platforms, scale its infrastructure to achieve hyper-growth.
Building Pinterest Real-Time Ads Platform Using Kafka Streams (confluent)
Building Pinterest Real-Time Ads Platform Using Kafka Streams (Liquan Pei + Boyang Chen, Pinterest) Kafka Summit SF 2018
In this talk, we share the experience of building Pinterest's real-time Ads Platform using Kafka Streams. The real-time budgeting system is the most mission-critical component of the Ads Platform, as it controls how each ad is delivered to maximize user, advertiser, and Pinterest value. The system needs to handle impressions at over 50,000 queries per second (QPS), requires less than five seconds of end-to-end latency, and must recover within five minutes during outages. It also needs to scale to handle the fast growth of Pinterest's ads business.
The real-time budgeting system is composed of a real-time stream-stream joiner, a real-time spend aggregator, and a spend predictor. At Pinterest's scale, we needed to overcome quite a few challenges to make each component work. For example, the stream-stream joiner needs to maintain terabytes of state while supporting fast recovery, and the real-time spend aggregator needs to publish to thousands of ads servers while supporting over one million read QPS. We chose Kafka Streams as it provides millisecond-latency guarantees, scalable event-based processing, and easy-to-use APIs. In the process of building the system, we performed extensive tuning of RocksDB and the Kafka producer and consumer, and pushed several open source contributions to Apache Kafka. We are also working on adding remote checkpoints for Kafka Streams state to reduce cold-start time when adding more machines to the application. We believe our experience can benefit people who want to build large-scale real-time streaming solutions and deeply understand Kafka Streams.
Have you ever thought about how your site’s performance compares to the web as a whole? Or maybe you’re curious how popular a particular web feature is. How much is too much JavaScript? The HTTP Archive has been keeping track of how the web is built since 2010. It enables you to find answers to questions about the state of the web past and present. Rick Viscomi, developer relations at Google, and Paul Calvano, web performance architect at Akamai, will explore how the HTTP Archive works, some of the ways people are using this dataset, and discuss some ways that Akamai has leveraged data within the HTTP Archive to help our customers. The session will include: (1) an intro to the HTTP Archive (what it does, how it works, who uses it, and how to get started quickly), and (2) case studies on compression research, third-party research, server analysis, and how to identify common performance errors across sites
Akamai Edge: Tracking the Performance of the Web with HTTP Archive (Rick Viscomi)
Rick Viscomi and Paul Calvano discuss tracking the performance of the web with HTTP Archive, including how the tool works and some deeply technical case studies that show off the powerful insights to be learned.
JCIS 2022 - Smart LAMA API: Automated Capacity Analysis of Limitation-Aware M... (Rafael Apellidos)
The document describes the Smart LAMA API, which allows for automated capacity analysis of Limitation-Aware Microservices Architectures (LAMAs). The API accepts formal descriptions of LAMAs, which include internal/external services, requests, and pricing plans. It provides operations to analyze LAMA configurations to determine the maximum requests, minimum cost, or minimum time for a given scenario. The API was validated using examples in an online Jupyter notebook, with wrappers to simplify usage. While the current version only supports single metrics, endpoints, and entrypoints, the API automates constraint solving problems that were previously difficult to analyze at scale for LAMA configurations.
The document summarizes new features and improvements in Apache Spark 2.3 for machine learning. Key highlights include first-class support for loading image data, enhanced scalability of feature transformers by supporting multiple columns, parallelizing cross-validation for faster hyperparameter tuning, and a new scalable feature hashing transformer. Performance tests demonstrate that the multi-column transformers provide up to 2.7x speedup over the single-column approach. Parallel cross-validation also provides a 2-2.7x speedup using 3 threads. Future areas of focus include completing multi-column support, improving Python APIs, and enhancing techniques like gradient boosted trees.
This document discusses implementing R-trees in Datomic to enable geospatial queries. It provides an overview of Datomic and motivations for using it for spatial data. It then describes implementing R-trees in Datomic, including the schema, insertion and splitting transactions. It also discusses bulk loading R-trees using Hilbert curves to improve performance over single insertions. Future plans include supporting retractions, updates, additional queries and data types.
Outsourcing IT Projects to Managed Hosting of the Cloud (Rackspace)
Is your organization looking to cut costs, reduce deployment time, or gain new capabilities that you find challenging to implement with traditional on-premises infrastructure? Then outsourcing IT may be right for you.
Similar to draft-harrison-sidrops-manifest-number-01, presented at IETF 119 (20)
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024 (APNIC)
Ellisha Heppner, Grant Management Lead, presented an update on APNIC Foundation to the PNG DNS Forum held from 6 to 10 May, 2024 in Port Moresby, Papua New Guinea.
Registry Data Accuracy Improvements, presented by Chimi Dorji at SANOG 41 / I... (APNIC)
Chimi Dorji, Internet Resource Analyst at APNIC, presented on Registry Data Accuracy Improvements at SANOG 41 jointly held with INNOG 7 in Mumbai, India from 25 to 30 April 2024.
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E... (APNIC)
Sunny Chendi, Senior Advisor, Membership and Policy at APNIC, presents 'APNIC Policy Roundup' at the 5th ICANN APAC-TWNIC Engagement Forum and 41st TWNIC OPM in Taipei, Taiwan from 23 to 24 April.
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024 (APNIC)
Dave Phelan, Senior Network Analyst/Technical Trainer at APNIC, presents 'DDoS In Oceania and the Pacific' at NZNOG 2024 held in Nelson, New Zealand from 8 to 12 April 2024.
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op... (APNIC)
Geoff Huston, Chief Scientist at APNIC, delivers a keynote presentation on the 'Future Evolution of the Internet' at the Everything Open 2024 conference in Gladstone, Australia from 16 to 18 April 2024.
IP addressing and IPv6, presented by Paul Wilson at IETF 119 (APNIC)
Paul Wilson, Director General of APNIC, delivers a presentation on IP addressing and IPv6 to the Policymakers Program during IETF 119 in Brisbane, Australia from 16 to 22 March 2024.
Benefits of doing Internet peering and running an Internet Exchange (IX) pres... (APNIC)
Che-Hoo Cheng, Senior Director, Development at APNIC, presents on the "Benefits of doing Internet peering and running an Internet Exchange (IX)" at the Communications Regulatory Commission of Mongolia's IPv6, IXP, Datacenter - Policy and Regulation International Trends Forum in Ulaanbaatar, Mongolia on 7 March 2024.
APNIC Update and RIR Policies for ccTLDs, presented at APTLD 85 (APNIC)
APNIC Senior Advisor, Membership and Policy, Sunny Chendi presented on APNIC updates and RIR Policies for ccTLDs at APTLD 85 in Goa, India from 19-22 February 2024.
Lao Digital Week 2024: It's time to deploy IPv6 (APNIC)
APNIC Development Director Che-Hoo Cheng presents on the importance of deploying IPv6 at the Lao Digital Week 2024, held in Vientiane, Lao PDR from 10 to 14 January 2024.
Discover the benefits of outsourcing SEO to India (davidjhones387)
"Discover the benefits of outsourcing SEO to India! From cost-effective services and expert professionals to round-the-clock work advantages, learn how your business can achieve digital success with Indian SEO solutions.
Ready to Unlock the Power of Blockchain! (Toptal Tech)
Imagine a world where data flows freely, yet remains secure. A world where trust is built into the fabric of every transaction. This is the promise of blockchain, a revolutionary technology poised to reshape our digital landscape.
Toptal Tech is at the forefront of this innovation, connecting you with the brightest minds in blockchain development. Together, we can unlock the potential of this transformative technology, building a future of transparency, security, and endless possibilities.
Gen Z and the marketplaces - let's translate their needs (Laura Szabó)
The product workshop focused on exploring the requirements of Generation Z in relation to marketplace dynamics. We delved into their specific needs, examined the specifics of their shopping preferences, and analyzed their preferred methods for accessing information and making purchases within a marketplace. Through the study of real-life cases, we tried to gain valuable insights into enhancing the marketplace experience for Generation Z.
The workshop was held at the DMA Conference in Vienna in June 2024.
HijackLoader Evolution: Interactive Process Hollowing (Donato Onofri)
CrowdStrike researchers have identified a HijackLoader (aka IDAT Loader) sample that employs sophisticated evasion techniques to enhance the complexity of the threat. HijackLoader, an increasingly popular tool among adversaries for deploying additional payloads and tooling, continues to evolve as its developers experiment and enhance its capabilities.
In their analysis of a recent HijackLoader sample, CrowdStrike researchers discovered new techniques designed to increase the defense evasion capabilities of the loader. The malware developer used a standard process hollowing technique coupled with an additional trigger that was activated by the parent process writing to a pipe. This new approach, called "Interactive Process Hollowing", has the potential to make defense evasion stealthier.
What is this about? (1)
● Manifests include a field called manifestNumber
● RFC 6486: Manifests for the RPKI (February 2012)
– “This field is an integer that is incremented each time a new manifest is issued for a given publication point”
– “Manifest verifiers MUST be able to handle number values up to 20 octets”
● RFC 9286: Manifests for the RPKI (June 2022)
– “Each RP MUST verify that a purported "new" manifest contains a higher manifestNumber than previously validated manifests”
– “If the purported "new" manifest contains a manifestNumber value equal to or lower than manifestNumber values of previously validated manifests, the RP SHOULD use locally cached versions of objects”
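The RFC 9286 comparison rule quoted above is simple enough to sketch directly. A minimal illustration in Python (hypothetical names; `cached_number` stands for the highest manifestNumber the RP has previously validated for this publication point):

```python
from typing import Optional

def accept_new_manifest(new_number: int, cached_number: Optional[int]) -> bool:
    """RFC 9286 rule: a purported "new" manifest must carry a strictly
    higher manifestNumber than previously validated manifests."""
    if cached_number is None:
        return True                    # nothing validated yet
    return new_number > cached_number  # equal or lower: use cached objects

# Equal or lower numbers are rejected, so the RP falls back to its cache.
assert accept_new_manifest(5, None)
assert accept_new_manifest(6, 5)
assert not accept_new_manifest(5, 5)
assert not accept_new_manifest(4, 5)
```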
What is this about? (2)
● A strict reading of the text requires that relying parties reject new manifests once the largest signed 20-octet value (i.e. (1 << ((20 * 8) - 1)) - 1, called MN_MAX from here) is reached
– The CA can’t be used from that point onwards
● But MN_MAX is a very large number?
– Yes: if manifestNumber is incremented by one on reissuance, MN_MAX would not be reached in billions of years, even if issuing billions of manifests per second
– But the manifest number can be set to an arbitrary value by the issuer
● E.g. ARIN’s TA’s manifestNumber is currently ~1×10^45 (~1/128 of MN_MAX)
– And bugs or similar could cause manifestNumber to be set to a large value, or to increment by large values
– In particular, if a TA becomes unusable, recovery requires new root key issuance
● Time-consuming, plus a long tail of users still using the old TAL
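The scale argument above can be checked with a few lines of arithmetic (an illustrative sketch, not from the draft):

```python
# Largest positive value encodable in a signed 20-octet (160-bit) INTEGER.
MN_MAX = (1 << (20 * 8 - 1)) - 1   # 2**159 - 1, roughly 7.3e47

# Incrementing by one, at a billion reissuances per second:
SECONDS_PER_YEAR = 365 * 24 * 3600
years_to_exhaust = MN_MAX // (10**9 * SECONDS_PER_YEAR)

assert MN_MAX.bit_length() == 159
assert years_to_exhaust > 10**30   # far more than billions of years
```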
How should RPs handle this?
● Draft includes strawman proposal: if the manifest filename changes, reset the manifest number check
– This is the current behaviour in rpki-client
● See appendix A for a detailed description of the rpki-client implementation
● Based on reading 9286 as referring to manifest numbers on a per-manifest-filename basis, rather than a per-CA basis
– Pros:
● Simple to implement
● If there is consensus that this is required by 9286 as-is, then no further draft work required
– Cons:
● 9286 can be interpreted as though this is not required, or even not permitted
– See also RFC 8488 (RIPE NCC's Implementation of Resource Public Key Infrastructure (RPKI) Certificate Tree Validation)
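A toy sketch of the strawman behaviour (hypothetical code, not the actual rpki-client implementation): keying the highest validated manifestNumber by (CA, manifest filename) rather than by CA alone means a filename change naturally resets the check.

```python
# Hypothetical sketch: track the highest validated manifestNumber
# per (CA, manifest filename) instead of per CA.
cache = {}

def accept(ca, mft_filename, number):
    key = (ca, mft_filename)
    best = cache.get(key)
    if best is not None and number <= best:
        return False   # reuse/regression: keep locally cached objects
    cache[key] = number
    return True

assert accept("ca1", "abc.mft", 10**45)
assert not accept("ca1", "abc.mft", 1)   # regression under the same filename
# A filename change resets the check, letting the CA recover from a
# runaway manifestNumber without new root key issuance:
assert accept("ca1", "def.mft", 1)
```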
What are the other options?
● Remove the manifestNumber check
● Make the largest manifestNumber a function of the current time
● Use serial number arithmetic to facilitate rollover
● Leave as-is on the RP side
– Up to CAs/TAs to ensure manifestNumber makes sense on the server side before publication
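Of these, "serial number arithmetic" would compare manifest numbers the way DNS serials are compared under RFC 1982, letting a CA wrap past the maximum. A sketch over a 160-bit space (illustrative; RFC 1982 comparison is deliberately undefined when two values are exactly half the space apart):

```python
# RFC 1982-style serial comparison over a 160-bit space (sketch).
BITS = 160
HALF = 1 << (BITS - 1)
MOD = 1 << BITS

def serial_gt(a, b):
    """True if serial a is "greater than" b under RFC 1982 rules."""
    return a != b and ((a - b) % MOD) < HALF

assert serial_gt(1, 0)
assert serial_gt(0, MOD - 1)       # wrap-around: 0 follows the maximum value
assert not serial_gt(MOD - 1, 0)
```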
RP support for 9286

(Behaviour of four relying-party implementations; the column headings identifying each RP were not captured in this transcript.)

Behaviour                      RP 1       RP 2       RP 3       RP 4
Manifest number reuse          Accepted   Accepted   Accepted   Rejected
Manifest number regression     Accepted   Accepted   Accepted   Rejected
Manifest number of MN_MAX      Rejected   Accepted   Accepted   Accepted
Manifest number > MN_MAX       Rejected   Accepted   Rejected   Accepted
Manifest number of MN_MAX_2*   Rejected   Accepted   Rejected   Accepted
Manifest number > MN_MAX_2*    Rejected   Accepted   Rejected   Rejected
https://github.com/APNIC-net/rpki-mft-number-demo
* (1 << (20 * 8)) - 1 (i.e. the largest unsigned 160-bit value, rather than the largest signed 160-bit value)
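The two bounds differ only in whether the top bit of a 20-octet value is treated as a sign bit; a quick check of the footnote's definitions (illustrative):

```python
MN_MAX = (1 << (20 * 8 - 1)) - 1   # largest signed 160-bit value
MN_MAX_2 = (1 << (20 * 8)) - 1     # largest unsigned 160-bit value

assert MN_MAX_2 == 2 * MN_MAX + 1
# MN_MAX_2 needs the high bit set; a DER encoder would represent that
# positive INTEGER in 21 octets, with a leading 0x00.
assert MN_MAX_2.bit_length() == 160
```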