There are parallels between storing JSON data in PostgreSQL and storing vectors produced by AI/ML systems. This lightning talk briefly covers how the use cases for storing JSON and vectors in PostgreSQL resemble each other, shows some of the use cases developers have for querying vectors in Postgres, and outlines some roadmap items for improving PostgreSQL as a vector database.
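Watch the session recording: https://youtu.be/eQjkwhyOOmI
Building and managing a large-scale data lake is complex and time-consuming. AWS Lake Formation is a fully managed service that lets you set up a secure data lake in a matter of days. In this session, we look at how to use AWS Lake Formation to easily set up analytics tools such as Amazon S3, EMR, Redshift, and Athena for data ingestion, cataloging, cleaning, transformation, and security. (Launched in the Seoul Region in November 2019)
AWS Glue is a fully managed ETL (extract, transform, and load) service that helps customers easily prepare and load data for analytics. You can create and run ETL jobs with a few clicks in the AWS Management Console. When preprocessing a variety of data sources for big data analytics, there is no need to manage separate data-processing servers or infrastructure. This session gives a detailed introduction to the Glue service, which launched in the Seoul Region last May, along with a range of usage tips and demos.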
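How can you quickly improve your deployment process?
How can you push a git branch and get a separate test server for it?
This talk introduces a deployment approach using Kubernetes, GitOps, and Argo CD.
Presentation slides from Open Infrastructure & Cloud Native Days Korea 2019
Download the original slides - http://bit.ly/subicura-gitops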
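Use of MongoDB has recently been growing rapidly in services both in Korea and worldwide. Compared with traditional RDBMSs, however, less knowledge and experience has accumulated so far, which makes getting started and troubleshooting harder. This session briefly reviews the architecture of MongoDB and AWS's DocumentDB, compares the two, and highlights the key points to watch out for when using MongoDB and DocumentDB.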
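Using Glue and EMR for Efficient Big Data Analytics and Processing - 김태현, Solutions Architect, AWS :: AWS Summit Seoul 2019 - Amazon Web Services Korea
Using Glue and EMR for Efficient Big Data Analytics and Processing
김태현, Solutions Architect, AWS
AWS supports a variety of Big Data Framework services suited to different analytics goals. This session looks at the internals of AWS Big Data Frameworks such as AWS Glue and Amazon EMR, which are used to analyze and process ever-growing data, and introduces ways to use them efficiently for a range of analytics and ETL workloads, including machine learning.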
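Large-Scale Data Lake Migration Case Study [Kakao Games - Level 200] - 조은희, Team Lead, Kakao Games ::: Games on AWS ... - Amazon Web Services Korea
In our existing on-premises environment, it was difficult to scale flexibly as the business grew, so we carried out a project to build a more elastic environment on AWS. In this session, we share the data lake migration journey that Kakao Games undertook with AWS, along with the experience and tips gained from using technologies such as Amazon S3, EMR, Athena, and Redshift.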
AWS Summit Seoul 2023 | Real-Time CDC Data Processing! Building a Modern Transactional Data Lake - Amazon Web Services Korea
This session introduces how to build a Transactional Data Lake that supports CDC-based upserts using Apache Iceberg and AWS Glue. It presents a Transactional Data Lake architecture in which CDC data from an RDS database such as MySQL is stored in S3 in Apache Iceberg format in real time via Amazon Kinesis or MSK.
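As a rough illustration of what a CDC-based upsert means, the sketch below applies an ordered stream of change events to an in-memory table keyed by primary key. The event shape (`op`, `id`, `row`) and the function name `apply_cdc` are assumptions made for this example only; the code does not use Apache Iceberg, AWS Glue, Kinesis, or MSK APIs.

```python
from typing import Any

Row = dict[str, Any]


def apply_cdc(table: dict[int, Row], events: list[dict[str, Any]]) -> dict[int, Row]:
    """Apply insert/update/delete change events, in order, to a keyed table."""
    for event in events:
        key = event["id"]  # hypothetical primary-key field
        if event["op"] in ("insert", "update"):
            # Upsert: create the row if missing, otherwise merge the new
            # column values over the existing ones.
            table[key] = {**table.get(key, {}), **event["row"]}
        elif event["op"] == "delete":
            # Deleting a key that is not present is treated as a no-op.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {event['op']!r}")
    return table


if __name__ == "__main__":
    changes = [
        {"op": "insert", "id": 1, "row": {"name": "a", "qty": 1}},
        {"op": "update", "id": 1, "row": {"qty": 5}},
        {"op": "delete", "id": 1},
    ]
    print(apply_cdc({}, changes))
```

Events are applied strictly in arrival order here; a real pipeline would also need to handle out-of-order or duplicated events, which this sketch leaves out.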
by Mahesh Pakal, AWS
PostgreSQL is a powerful, enterprise-class, open-source object-relational database system with an emphasis on extensibility and standards compliance. PostgreSQL boasts many sophisticated features and runs stored procedures in more than a dozen programming languages. We’ll explore the advantages and limitations of PostgreSQL, examples of where it is best suited for use, and examples of who is using PostgreSQL to power their applications.
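OpenSearch is a distributed, open-source search and analytics suite used for a wide range of use cases such as real-time application monitoring, log analytics, and website search. Together with OpenSearch Dashboards, an integrated visualization tool that makes data exploration easy, OpenSearch provides a highly scalable system for fast access and response to large volumes of data. Based on an explanation of how it actually works, this session covers approaches to optimization and issues that can arise in operation.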
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud - Noritaka Sekiyama
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Hadoop / Spark Conference Japan 2019)
# English version #
http://hadoop.apache.jp/hcj2019-program/
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose - Nikolay Samokhvalov
Future database administration will be highly automated. Until then, we still live in a world where extensive manual interactions are required from a skilled DBA. This will change soon as more "autonomous databases" reach maturity and enter the production environment.
Postgres-specific monitoring tools and systems continue to improve at detecting and analyzing performance issues and bottlenecks in production databases. However, while these tools can detect current issues, they still require highly experienced DBAs to analyze the findings and recommend mitigations.
In this session, the speaker will present the initial results of the POSTGRES.AI project – Nancy CLI, a unified way to manage automated database experiments. Nancy CLI is an automated database management framework based on well-known open-source projects and incorporating major open-source tools and Postgres modules: pgBadger, pg_stat_kcache, auto_explain, pgreplay, and others.
Originally developed to simulate various SQL query use cases in various environments and collect data to train ML models, Nancy CLI turned out to be a very universal framework that can play a crucial role in CI/CD pipelines in any company.
Using Nancy CLI, casual DBAs and any engineers can easily conduct automated experiments today, either on AWS EC2 Spot instances or on any other servers. All you need to do is tell Nancy which database to use, specify the workload (synthetic or "real", generated from the Postgres logs), and say what you want to test: for example, check how a new index will affect the most expensive query groups from pg_stat_statements, or compare various values of "default_statistics_target". The collected information will give you a high-confidence understanding of how various queries and overall Postgres performance will be affected when you apply the change to production.
The Devsisters Data Lake Story: Data Lake architecture case study (박주홍, Data Analytics and Infrastructure Team... - Amazon Web Services Korea
The Devsisters Data Lake Story: Data Lake architecture case study
In this session, we use the Devsisters case study to share what is required to build and use a Data Lake. We talk about what led us to build a Data Lake on AWS in order to deliver data for a variety of purposes, and what we learned during the actual build. We also cover efficiency and cost compared with our previous infrastructure, and the architecture we used when segmenting big data by department.
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021 - StreamNative
You may be familiar with the Presto plugin, which runs fast interactive ANSI SQL queries over Pulsar and lets you join the results with other data sources. This plugin will soon be renamed to align with the rename of the PrestoSQL project to Trino. What is the purpose of this rename, and what does it mean for those using the Presto plugin? We cover the history of the community shift from PrestoDB to PrestoSQL, as well as the future plans for the Pulsar community to donate this plugin to the Trino project. One of the connector maintainers will then demo the connector and show what is possible when using Trino and Pulsar!
This one is about advanced indexing in PostgreSQL. It guides you through basic concepts as well as through advanced techniques to speed up the database.
All important PostgreSQL Index types explained: btree, gin, gist, sp-gist and hashes.
Regular expression indexes and LIKE queries are also covered.
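Optimizing Spark Workload Execution on the EMR Platform - 정세웅, AWS Solutions Architect :: AWS Summit Online Ko... - Amazon Web Services Korea
Watch the session recording: https://youtu.be/hPvBst9TPlI
Apache Spark is the most representative solution for transforming and processing large volumes of data in an S3-based data lake. This session introduces Apache Spark performance tips that are easy to apply on the EMR platform. It also looks at ways to optimize the management of data workloads, including record-level updates, resource scaling, permission management, and monitoring.
AWS supports a variety of Analytics services for Big Data analysis and processing. This session looks at the internals of AWS Glue, which is used to build a data lake catalog and run ETL for analyzing and processing ever-growing data, and introduces ways to use it efficiently.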
FOSSASIA - MySQL Cookbook 4e Journey APR 2023.pdf - Alkin Tezuysal
Presenting the newly released MySQL Cookbook 4th edition to help developers and administrators to understand simple to complex recipes.
MySQL Cookbook 4th edition was released this summer. We are the book's authors and will show you how to "cook" MySQL. We will show you a few tasks with different priorities, such as JSON in MySQL for those who need flexibility, modern SQL for analytics, and Group Replication for high availability. We will also show how to write programs using JavaScript and Python languages, X DevAPI, and MySQL Shell. We will touch on some of the exciting features of MySQL Spatial Indexes and Geographical Data, Using a Full-Text Search, and more. We're hoping this talk will interest developers and administrators of MySQL.
I'd like to share my authoring experience and my knowledge of MySQL, an open-source database product. I also want to touch on the technical and non-technical aspects of this journey, offering vision and inspiration to future authors. At the end of the talk, I will give away one printed copy of the MySQL Cookbook 4e to an audience member after a trivia question.
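Global services and fast-growing web services are now easy to find in Korea as well. Services that started out on an RDBMS reach a crossroads between sharding and NoSQL as they grow. Amazon DynamoDB is a fully managed key-value NoSQL database that can be used at any scale, but key design remains one of the hardest areas. This session covers key design approaches for large-scale services.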
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ... - Amazon Web Services
Come to this session to learn how Amazon DynamoDB was built as the hyper-scale database for internet-scale applications. In January 2012, Amazon launched DynamoDB, a cloud-based NoSQL database service designed from the ground up to support extreme scale, with the security, availability, performance, and manageability needed to run mission-critical workloads. This session discloses for the first time the underpinnings of DynamoDB, and how we run a fully managed nonrelational database used by more than 100,000 customers. We cover the underlying technical aspects of how an application works with DynamoDB for authentication, metadata, storage nodes, streams, backup, and global replication.
Apache Doris (incubating) is an MPP-based interactive SQL data warehouse for reporting and analysis. It was open-sourced by Baidu. Doris mainly integrates technology from Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Doris is designed to be a simple, single, tightly coupled system that does not depend on other systems. Doris provides both high-concurrency, low-latency point queries and high-throughput ad-hoc analytical queries. It supports both batch data loading and near-real-time mini-batch data loading. Doris also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of development, deployment, and use) and the ability to meet many data-serving requirements in a single system.
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes - SeungYong Oh
Session Video: https://youtu.be/7MPH1mknIxE
In this talk, we share Devsisters' journey of migrating its internal data platform components, including Spark, to Kubernetes, along with the benefits gained and the issues encountered.
Conference session page:
- English: https://sched.co/WIRK
- Korean: https://sched.co/WYRc
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc... - Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods are scheduled onto nodes based on their requests and are optionally limited in how much of each resource they can consume. Understanding and optimizing resource requests and limits is crucial both for reducing resource "slack" and for ensuring application performance and low latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost efficiency and reduce the impact on latency-critical applications. All tools shown are open source and can be applied to most Kubernetes deployments.
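To make the notion of resource "slack" concrete, here is a minimal sketch that treats slack as the gap between what a pod requests and what it actually uses. That definition, the `PodCpu` type, and the sample figures are assumptions for illustration; the talk's own tools and metrics are not shown in this abstract.

```python
from dataclasses import dataclass


@dataclass
class PodCpu:
    """Hypothetical per-pod CPU figures, in millicores."""
    name: str
    request_millicores: int
    usage_millicores: int


def cpu_slack(pods: list[PodCpu]) -> dict[str, int]:
    """Return requested-but-unused CPU per pod (never negative)."""
    return {
        pod.name: max(pod.request_millicores - pod.usage_millicores, 0)
        for pod in pods
    }


def total_slack(pods: list[PodCpu]) -> int:
    """Sum the slack over all pods, e.g. to estimate over-provisioning."""
    return sum(cpu_slack(pods).values())


if __name__ == "__main__":
    sample = [
        PodCpu("api", request_millicores=500, usage_millicores=120),
        PodCpu("worker", request_millicores=200, usage_millicores=250),
    ]
    print(cpu_slack(sample), total_slack(sample))
```

A pod that uses more than it requests (like `worker` above) contributes zero slack rather than a negative value, since it is not over-provisioned.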
Vectors are the new JSON in PostgreSQL (SCaLE 21x) - Jonathan Katz
Vectors are a centuries-old, well-studied mathematical concept, yet they pose many challenges around efficient storage and retrieval in database systems. The heightened ease of use of AI/ML has led to a surge of interest in storing vector data alongside application data, which brings some unique challenges. PostgreSQL has seen this story before with JSON, when JSON became the lingua franca of the web. So how can you use PostgreSQL to manage your vector data, and what challenges should you be aware of?
In this session, we'll review what vectors are, how they are used in applications, and what users are looking for in vector storage and search systems. We'll then see how you can search for vector data in PostgreSQL, including best practices for using pgvector, an extension that adds vector search capabilities to PostgreSQL. Finally, we'll review ongoing development in both PostgreSQL and pgvector that will make it easier and more performant to search vector data in PostgreSQL.
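For readers new to vector search, the sketch below shows exact nearest-neighbour search by cosine distance in plain Python. It is only an illustration of the underlying idea and does not use PostgreSQL or pgvector (where cosine distance is exposed through the `<=>` operator); the function names and the tiny sample data are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k stored vectors closest to the query."""
    # Exact search: score every stored vector, then keep the k smallest
    # distances. Index structures avoid this full scan at scale.
    ranked = sorted(items.items(), key=lambda kv: cosine_distance(query, kv[1]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(nearest([1.0, 0.1], embeddings, k=2))
```

The full scan is linear in the number of stored vectors, which is why approximate indexes matter once a table grows large.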
Revolutionizing API Development: Collaborative Workflows with Postman - Postman
There are many pitfalls of siloed API development processes, such as subpar APIs, delayed releases, and duplicated efforts. Join us to explore how Postman's collaborative workflows address these challenges head-on. We will look at how workspaces and collections allow API teams to work together effectively while also accelerating the onboarding process for new consumers of your API. The seamless integration with Amazon API Gateway further streamlines the process, fostering high-quality API development and expediting release cycles.
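[Keynote] Data Driven Organizations with AWS Data - Speaker: Agnes Panosian, Head... - Amazon Web Services Korea
Data is at the center of every application, process, and business decision. It is the cornerstone of digital transformation for almost every organization. Data fuels new experiences and leads to insights that drive innovation. Building a strategy that realizes the value of data for the entire organization is not an easy or simple journey. This session covers best practices for becoming a data-driven organization and how AWS can help along the way.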
Re:Cap Day 1 of AWS re:Invent 2023 - AWS UG Chile - Alvaro Garcia
Host and speaker of the first Re:Cap session of AWS re:Invent 2023 - January 16, 2024
We split the topics across two days; I will be uploading the next slide deck shortly.
This panel will discuss how Generative Artificial Intelligence can help strengthen the credibility of public policies, not only for governments but also for Fiscal Councils.
AWS Lambda Powertools is a developer toolkit for implementing Serverless best practices and increasing developer velocity. It started as an open-source project in 2020 focused on making Tracing, Logging, and Metrics easier. Fast-forward to today: Powertools has added 13 more features and grown a vibrant community that regularly contributes up to 60% of our releases. It now covers a plethora of use cases: REST and GraphQL APIs, Batch processing, Idempotency, Feature Flags, Data Validation, and more.
You’ll learn why this developer toolkit was created, key use cases, and find out how you can adopt common industry and AWS best practices in seconds. We’ll also cover two of the most anticipated new features coming in 2023, and live demo(s).
Get More from your Data: Accelerate Time-to-Value and Reduce TCO with Conflue... - HostedbyConfluent
In this talk, we will explore how Confluent and Amazon Web Services (AWS) work together to help you on the journey of data modernization and innovation.
We guide you through the migration journey to Confluent Cloud on AWS, delving into advanced features and capabilities for streamlined migration and business continuity. Gain insights from customer success stories, learn cloud modernization strategies, patterns, and best practices, and find AWS resources to kickstart your initiatives.
Explore modern app development on Confluent Cloud on AWS, alongside strategic ISV partners like MongoDB, and unlock the full potential of real-time streaming.
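[Keynote] Accelerating Business Outcomes with AWS Data - Speaker: Saeed Gharadagh... - Amazon Web Services Korea
Data plays a pivotal role in digital transformation focused on the success of the end consumer. Every company is using data as an asset to drive use cases and tackle demanding outcomes. With the power of AWS cloud technology and analytics solutions, customers can accelerate their transformation journey. This session covers how enterprise customers harness the power of data in the cloud to achieve their transformation goals and deliver the outcomes they need.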
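Building an Efficient Data Analytics Platform at Commerce Startups - 하지양, Data Engineer, 발란 / 강웅석, Data Engineer, 크로키닷컴 :: AWS... - Amazon Web Services Korea
Startups use AWS analytics services to stand up analytics capabilities quickly. In this session, we look at how to use Amazon Kinesis Firehose to capture, in real time, the important data flowing through a company from a commerce service's large data volumes, and how to put that data to a variety of uses. We introduce an in-house approach to data analysis built on reliably and robustly collecting tens of billions of user behavior logs every month. We also share machine learning use cases, including personalized recommendations with Amazon Personalize and image classification with Amazon SageMaker.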
Unveiling the Inner Workings of Apache Kafka® with Flamegraphs with Christo L... - HostedbyConfluent
Apache Kafka® is known for its scalability, reliability, and performance, but understanding how it works and identifying performance bottlenecks can be challenging. In this talk, we will explore the use of flamegraphs as a tool for understanding the internals of Apache Kafka and for identifying performance issues. Flamegraphs are a visualization technique that allows you to see the relative usage of CPU and memory by different functions in a program. They provide a clear and concise view of the call stack, making it easier to understand how different components of the system are interacting and where the hot spots are. We will cover the basics of how to generate and interpret flamegraphs, and we will walk through a number of real-world examples of using them to troubleshoot performance issues in Apache Kafka®. These examples will include cases where flamegraphs helped to identify issues such as CPU and memory contention, inefficient code paths, and suboptimal configuration choices. By the end of this talk, you will have a better understanding of how to use flamegraphs to understand the inner workings of Apache Kafka® and to identify performance bottlenecks. You will also have a set of practical tips and techniques for using flamegraphs to troubleshoot performance issues in your own Kafka deployments.
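As a small illustration of the data behind a flamegraph, the sketch below collapses sampled call stacks into per-stack counts, using the semicolon-joined "folded" text form that flamegraph renderers commonly take as input. The frame names and the `fold_stacks` helper are hypothetical; the profiler that would produce such samples from a Kafka broker is not shown.

```python
from collections import Counter


def fold_stacks(samples: list[list[str]]) -> list[str]:
    """Collapse stack samples (outermost frame first) into 'a;b;c N' lines."""
    # Identical stacks are merged; the count is how many samples hit that
    # stack, which determines the width of the frame in the rendered graph.
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical samples, each a call stack captured at one point in time.
    samples = [
        ["main", "poll", "read"],
        ["main", "poll", "read"],
        ["main", "send"],
    ]
    for line in fold_stacks(samples):
        print(line)
```

Stacks that appear in many samples end up with large counts, which is what makes hot code paths stand out visually.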
Better Together: Delivering Graph Value with AWS & Neo4j - Antony Prasad The... - Neo4j
Featurization is one of the most difficult problems in machine learning, just behind data wrangling in terms of the time it consumes. For many problems, featurization plays the largest role in determining model performance, greater even than choice of machine learning method. We’ll walk through how graph features engineered in Neo4j can be used in a supervised learning model trained with Amazon SageMaker. These novel graph features can improve model performance beyond what is possible with more traditional approaches.
AI, or artificial intelligence, is powering a massive shift in how engineers, scientists, and programmers develop and improve products and services. 85% of executives expect to gain or strengthen their competitive advantage through the use of AI, but is AI really poised to transform your research, products, or business?
Learn how an AI system can be designed to perceive its environment, make decisions, and take action. Get an overview of AI for engineers, and discover the ways in which it fits into an engineering workflow. You will also learn how MATLAB and Simulink® are giving engineers and scientists AI capabilities that were once available only to highly specialized software developers and data scientists.
Using observability effectively is essential for proving that your resilient system operates the way you planned. Well-applied observability helps you find early signs of problems before they impact customers, and react quickly to mitigate impact. In this session, learn how you can use observability best practices to improve your resilience posture in AWS. Dive deep into real-world failure modes, and see how you can use the right combination of instrumentation and observability tools to solve them quickly. This session includes a demo of these techniques and practices using AWS services like Amazon CloudWatch and AWS X-Ray.
Build Developer Experience Teams for Open SourceAll Things Open
Presented at All Things Open 2023
Presented by Arundeep Nagaraj - Amazon Web Services (AWS)
Title: Build Developer Experience Teams for Open Source
Abstract: Open Source has become the default strategy for many IT organizations and enterprises. However, Open Source leaders in these organizations keep running into the same questions:
How is my product's developer experience?
Is this the right metric to track?
How can I scale my team to support our products better?
How can I add automation to scale redundant workflows?
If my product involves working with developers, how can I scale to the complexity of the requests and reduce Engineering bandwidth?
The challenges of supporting open source products continue to grow depending on the end-user persona: consumers or contributors. Consumers use your product, SDKs, and APIs and get blocked or run into issues, whereas contributors are advanced users who understand the codebase well enough to make meaningful contributions back to the product.
The answer to the above is to treat Open Source support as a first-class citizen of your corporate support strategy. Employing the right level of developer-focused support, as opposed to traditional infrastructure-based support, is key to scaling to the number of developers using your product. Supporting customers in the open involves more than pure support: it means building customer and developer experiences (DX) in the open, across platforms and communities, that keep your product's users and developers focused on the end-to-end value add. This helps with active developer growth and user retention.
Key Takeaways:
- IT leaders of Open Source will learn to employ strategies to build a DX team that engages on multiple platforms
- Work on identifying accurate metrics for product and organization
- Innovate on platforms such as Discord to build a bot and a dashboard
- Ability to leverage customer feedback and iterate over the customer success flywheel
- Distinguish between DX and Developer Advocacy (DA)
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
From Insights to Action, How to build and maintain a Data Driven Organization...Amazon Web Services Korea
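Data is the foundation of innovation and transformation. The innovation that drives business transformation is not a point-in-time strategy or solution, but an iterative, collective plan for growth. Companies that adopt this approach to innovation are often data-driven in both their strategy and their business culture. Developing this approach requires leaders to treat data as an organizational asset and to empower the organization to use data for better business outcomes. Learn how AWS and Amazon have used data and analytics to create scalable business efficiencies and to develop mechanisms that solve customers' most complex problems.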
NEW LAUNCH! Introducing Amazon SageMaker - MCL365 - re:Invent 2017Amazon Web Services
Amazon SageMaker is a fully managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at scale. This session will introduce you to the features of Amazon SageMaker, including a one-click training environment, highly optimized machine learning algorithms with built-in model tuning, and deployment without engineering effort. With zero setup required, Amazon SageMaker significantly decreases your training time and the overall cost of building production machine learning systems. You'll also hear how and why Intuit is using Amazon SageMaker on AWS for real-time fraud detection.
Similar to Vectors are the new JSON in PostgreSQL (20)
This talk explores PostgreSQL 15 enhancements (along with some history) and looks at how they improve developer experience (MERGE and SQL/JSON), optimize support for backups and compression, logical replication improvements, enhanced security and performance, and more.
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
Congratulations: you've been selected to build an application that will manage reservations for rooms!
On the surface, this sounds simple, but you are building a system for managing a high traffic reservation web page, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the website checking to see what availability each room has.
Fortunately, PostgreSQL is prepared for this! And even better, we will be using Postgres 14 to make the problem even easier!
We will explore the following PostgreSQL features:
* Data types and their functionality, such as:
  * Date/Time types
  * Ranges / Multiranges
* Indexes such as:
  * GiST
* Common Table Expressions and Recursion (though multiranges will make things easier!)
* Set generating functions and LATERAL queries
* Functions and PL/pgSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primarily with SQL, though we will sneak in a little bit of Python and use Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all users made possible by the innovation of PostgreSQL!
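The core check behind a reservation system like this is whether two time ranges overlap. The sketch below shows that check in plain Python using half-open intervals, which is the default bound style of PostgreSQL range constructors; in PostgreSQL itself the same test is the range overlap operator `&&`. The function names and booking data here are assumptions for illustration, not code from the talk.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def is_available(bookings, start, end):
    """True if [start, end) does not overlap any existing (start, end) booking."""
    return not any(overlaps(s, e, start, end) for s, e in bookings)


# Hypothetical bookings for one room.
bookings = [(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 10))]

# Back-to-back bookings are allowed because the upper bound is exclusive.
back_to_back = is_available(
    bookings, datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 11)
)
conflicting = is_available(
    bookings, datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 10, 30)
)
```

Scanning every booking like this is linear in the number of reservations; the point of a GiST index on a range column is to answer the same overlap question without a full scan.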
Get Your Insecure PostgreSQL Passwords to SCRAMJonathan Katz
Passwords: they just seem to work. You connect to your PostgreSQL database and you are prompted for your password. You type in the correct character combination, and presto! you're in, safe and sound.
But what if I told you that all was not as it seemed. What if I told you there was a better, safer way to use passwords with PostgreSQL? What if I told you it was imperative that you upgraded, too?
PostgreSQL 10 introduced support for SCRAM (Salted Challenge Response Authentication Mechanism), defined in RFC 5802, as a way to securely authenticate with passwords. The SCRAM algorithm uses a series of cryptographic methods to let a client and server validate a password without ever sending the password to each other, whether as plaintext or in hashed form.
In this talk, we will look at:
* A history of the evolution of password storage and authentication in PostgreSQL
* How SCRAM works with a step-by-step deep dive into the algorithm (and convince you why you need to upgrade!)
* SCRAM channel binding, which helps prevent MITM attacks during authentication
* How to safely set and modify your passwords, as well as how to upgrade to SCRAM-SHA-256 (which we will do live!)
all of which will be explained by some adorable elephants and hippos!
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL drivers support it, how to upgrade your passwords to use SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
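The proof computation at the heart of SCRAM can be sketched with Python's standard library. This is a simplified illustration of the key derivation and client proof from RFC 5802 using SHA-256, not a full implementation: it skips password normalization (SASLprep), the message exchange, nonces, and channel binding, and the `auth_message`, salt, and passwords are placeholder values.

```python
import hashlib
import hmac


def hmac_sha256(key, msg):
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def derive_keys(password, salt, iterations):
    """Return (client_key, stored_key); the server keeps only stored_key."""
    # RFC 5802's Hi() is PBKDF2 with HMAC as the pseudorandom function.
    salted = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    client_key = hmac_sha256(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    return client_key, stored_key


def client_proof(password, salt, iterations, auth_message):
    """What the client sends: ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    client_key, stored_key = derive_keys(password, salt, iterations)
    return xor_bytes(client_key, hmac_sha256(stored_key, auth_message))


def server_verify(stored_key, auth_message, proof):
    """Recover ClientKey from the proof and check that its hash is StoredKey."""
    recovered = xor_bytes(proof, hmac_sha256(stored_key, auth_message))
    return hmac.compare_digest(hashlib.sha256(recovered).digest(), stored_key)


# Placeholder values for illustration only.
salt = b"example-salt-16b"
iterations = 4096
auth_message = b"client-first,server-first,client-final-without-proof"

_, stored_key = derive_keys("correct horse", salt, iterations)
good = server_verify(
    stored_key, auth_message,
    client_proof("correct horse", salt, iterations, auth_message),
)
bad = server_verify(
    stored_key, auth_message,
    client_proof("wrong password", salt, iterations, auth_message),
)
```

Neither the password nor the salted hash crosses the wire: the client sends only the proof, which depends on the per-session `auth_message` and so cannot simply be replayed in a later session.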
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
PostgreSQL 10 introduced support for SCRAM (Salted Challenge Response Authentication Mechanism), defined in RFC 5802, as a way to securely authenticate with passwords. The SCRAM algorithm uses a series of cryptographic methods to let a client and server validate a password without ever sending the password to each other, whether as plaintext or in hashed form.
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL drivers support it, how to upgrade your passwords to use SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
Operating PostgreSQL at Scale with KubernetesJonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies in deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
All this sounds great, but if you are new to the world of containers, it can be very overwhelming to find a place to start. In this talk, which centers around demos, we will see how you can get PostgreSQL up and running in a containerized environment with some advanced sidecars in only a few steps! We will also see how it extends to a larger production environment with Kubernetes, and what the future holds for PostgreSQL in a containerized world.
We will cover the following:
* Why containers are important and what they mean for PostgreSQL
* Create a development environment with PostgreSQL, pgadmin4, monitoring, and more
* How to use Kubernetes to create your own "database-as-a-service"-like PostgreSQL environment
* Trends in the container world and how it will affect PostgreSQL
At the conclusion of the talk, you will understand the fundamentals of how to use container technologies with PostgreSQL and be on your way to running a containerized PostgreSQL environment at scale!
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
Congratulations: you've been selected to build an application that will manage whether or not the rooms for PGConf.EU are being occupied by a session!
On the surface, this sounds simple, but we will be managing the rooms of PGConf.EU, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the PGConf.EU website checking to see what availability each of the PGConf.EU rooms has.
To do this, we will explore the following PostgreSQL features:
* Data types and their functionality, such as:
  * Date/Time types
  * Ranges
* Indexes such as:
  * GiST
  * SP-GiST
* Common Table Expressions and Recursion
* Set generating functions and LATERAL queries
* Functions and PL/pgSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primarily with SQL, though we will sneak in a little bit of Python and use Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all PGConf.EU attendees made possible by the innovation of PostgreSQL!
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies in deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing PostgreSQL along with pgadmin4 and monitoring
- Running PostgreSQL on Kubernetes with a Demo
- Trends in the container world and how it will affect PostgreSQL
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies in deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing a PostgreSQL container
- Extending your setup with a pgadmin4 container
- Container orchestration: What this means, and how to use Kubernetes to leverage database-as-a-service with PostgreSQL
- Trends in the container world and how it will affect PostgreSQL
Developing and Deploying Apps with the Postgres FDWJonathan Katz
I couldn't wait to use the Postgres Foreign Data Wrapper (postgres_fdw) in a project; imagine being able to read and write data to many databases all from a single database! I finally found a project where it made sense to use this amazing technology.
I mapped out my architecture and began to code, and realized there were some things that did not work as expected: I could not call remote functions or insert into a table with a serial primary key and have it autoupdate. I found workarounds (which I will share), so the project went on.
We tested the setup, everything seemed to work well, and then we went to deploy to production. And then the real fun began.
Despite the title, I still love the Postgres FDW but wanted to provide some cautionary tales from a hybrid developer/DBA perspective on how to properly use them in your working environment. This talk will cover:
* Basic Postgres FDW setup in a development environment vs. production environment
* Handling some common FDW use cases that you think are trivial but are not
* Working with advanced Postgres constructs such as schemas and sequences with FDWs
* Putting it all together to make sure your production application is safe with your FDWs
* ...and when you really, really need to make a remote call and it is not supported by an FDW, how to do that too!
What's the great thing about a database? Why, it stores data of course! However, one feature that makes a database useful is the different data types that can be stored in it, and the breadth and sophistication of the data types in PostgreSQL is second-to-none, including some novel data types that do not exist in any other database software!
This talk will take an in-depth look at the special data types built right into PostgreSQL version 9.4, including:
* INET types
* UUIDs
* Geometries
* Arrays
* Ranges
* Document-based Data Types:
  * Key-value store (hstore)
  * JSON (text [JSON] & binary [JSONB])
We will also have some cleverly concocted examples to show how all of these data types can work together harmoniously.
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
KNN-GiST indexes were added in PostgreSQL 9.1 and greatly accelerate some common queries in the geospatial and textual search realms. This presentation will demonstrate the power of KNN-GiST indexes on geospatial and text search queries, as well as their present limitations, based on my own experiments. I will also discuss some of the theory behind KNN (k-nearest neighbor) and some of the applications this feature can be applied to.
To see a version of the talk given at PostgresOpen 2011, please visit http://www.youtube.com/watch?v=N-MD08QqGEM
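For readers new to k-nearest neighbor search, the sketch below shows the brute-force version of the problem that an index is meant to speed up: compute the distance from the query point to every candidate and keep the k smallest. The points and the `k_nearest` helper are hypothetical; this is plain Python, not PostgreSQL, where the equivalent query is typically written by ordering on the distance operator `<->` with a `LIMIT`.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Return the k points closest to query by Euclidean distance (brute force)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Hypothetical 2D locations.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.0)]
nearest = k_nearest(points, (0.0, 0.0), 2)
```

The brute-force approach examines every point for every query, so its cost grows with the size of the table; a KNN-capable index lets the database return the nearest rows in distance order without computing every distance first.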
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
All data is relational and can be represented through relational algebra, right? Perhaps, but there are other ways to represent data, and the PostgreSQL team continues to work on making it easier and more efficient to do so!
With the upcoming 9.4 release, PostgreSQL is introducing the "JSONB" data type which allows for fast, compressed, storage of JSON formatted data, and for quick retrieval. And JSONB comes with all the benefits of PostgreSQL, like its data durability, MVCC, and of course, access to all the other data types and features in PostgreSQL.
How fast is JSONB? How do we access data stored with this type? What can it do with the rest of PostgreSQL? What can't it do? How can we leverage this new data type and make PostgreSQL scale horizontally? Follow along with our presentation as we try to answer these questions.
PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of the features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session, I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.