How to navigate the rich but confusing field of (Full) Text Search in PostgreSQL. A short introduction will explain the concepts involved, followed by a discussion of the functions, operators, indexes and collation support in Postgres as they relate to searching for text. Usage examples will be provided, along with some statistics demonstrating the differences.
This document provides an introduction and overview of Cassandra including:
- Cassandra's history as a NoSQL database created at Facebook and open sourced in 2008
- Key features of Cassandra including linear scalability, continuous availability, support for multiple data centers, operational simplicity, and analytics capabilities
- Details on Cassandra's architecture including its cluster layer based on Amazon Dynamo and data store layer based on Google BigTable
- Explanations of Cassandra's data distribution, token ranges, replication, coordinator nodes, tunable consistency levels, and write path
- Descriptions of Cassandra's data model, including last-write-wins conflict resolution, and examples of CRUD operations and table schemas
Full text search in PostgreSQL is a flexible and powerful facility for searching a collection of documents using natural language queries. We will discuss several new improvements to FTS in the PostgreSQL 9.6 release, such as phrase search, better dictionary support and tsvector editing functions. We will also present new features currently in development: RUM index support, which accelerates some important kinds of full text queries, a new and better ranking function for relevance search, loading dictionaries into shared memory, and support for searching multilingual content.
This document summarizes full text search capabilities in PostgreSQL. It begins with an introduction and overview of common full text search solutions. It then discusses reasons to use full text search in PostgreSQL, including consistency and no need for additional software. The document covers basics of full text search in PostgreSQL like to_tsvector, to_tsquery, and indexes. It also covers fuzzy full text search using pg_trgm and functions like similarity. Other topics mentioned include ts_headline, ts_rank, and the RUM extension.
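The trigram similarity mentioned above can be illustrated with a small Python sketch. It is only an approximation of what pg_trgm's `similarity` does, under the assumption that each word is lowercased, padded with two leading spaces and one trailing space, split into three-character sequences, and that the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names `trigrams` and `similarity` are chosen here for illustration, and the real extension may differ in details such as multibyte handling.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, roughly following pg_trgm's padding rule."""
    result = set()
    # Assumption: words are runs of letters and digits; everything else is a separator.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 when either side is empty)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(similarity("postgres", "postgresql"))
```

A score near 1 means the two strings share most of their trigrams, which is why this measure tolerates small typos better than exact matching does.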
QuestDB: The building blocks of a fast open-source time-series database, by javier ramirez
(talk delivered at OSA CON 23)
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed.
We will learn how it deals with data ingestion, and which SQL extensions it implements for working with time-series efficiently.
We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or data deduplication.
Pig Latin is a data flow language and execution framework for parallel computation. It allows users to express data analysis programs intuitively as a series of steps. Pig runs these steps on Hadoop for scalable processing. Key features include a simple declarative language, support for nested data types, user defined functions, and a debugging environment. The document provides an overview of Pig Latin concepts like loading and transforming data, filtering, joining, and outputting results. It also compares Pig Latin to MapReduce and SQL, highlighting Pig's advantages for iterative data analysis tasks on large datasets.
This document provides an overview of Pig Latin, a data flow language used for analyzing large datasets. Pig Latin scripts are compiled into MapReduce programs that can run on Hadoop. The key points covered include:
- Pig Latin allows expressing data transformations like filtering, joining, grouping in a declarative way similar to SQL. This is compiled into MapReduce jobs.
- It features a rich data model including tuples, bags and nested data to represent complex data structures from files.
- User defined functions (UDFs) allow custom processing like extracting terms from documents or checking for spam.
- The language provides commands like LOAD, FOREACH, FILTER, JOIN to load, transform and analyze data in parallel across
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf, by InSync2011
Thomas Kyte discusses effective techniques for writing PL/SQL code. Some key points:
1) Use PL/SQL for data manipulation as it is tightly coupled with SQL and most efficient.
2) Write as little code as possible by leveraging SQL and thinking in sets rather than loops.
3) Use static SQL where possible for compile-time checking and dependency tracking. Dynamic SQL should only be used when static SQL is impractical.
4) Leverage packages to reduce dependencies, increase modularity, and support overloading and encapsulation.
5) Employ bulk processing techniques like bulk collects to minimize round trips to the database.
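Points 2 and 5 above can be sketched in a few lines. The example below is not PL/SQL and is not taken from the talk: it swaps in Python's standard `sqlite3` module purely so that it is self-contained, and the `emp` table, its columns, and all function names are hypothetical. It contrasts a row-by-row loop, which issues one statement per row, with a single set-based `UPDATE` that produces the same result.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical emp table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO emp (dept, salary) VALUES (?, ?)",
        [("eng", 1000), ("eng", 1200), ("ops", 900)],
    )
    return conn


def raise_row_by_row(conn, dept, amount):
    """Procedural style: fetch the rows, then issue one UPDATE per row."""
    rows = conn.execute(
        "SELECT id, salary FROM emp WHERE dept = ?", (dept,)
    ).fetchall()
    for emp_id, salary in rows:
        conn.execute(
            "UPDATE emp SET salary = ? WHERE id = ?", (salary + amount, emp_id)
        )


def raise_set_based(conn, dept, amount):
    """Set-based style: one statement updates every matching row."""
    conn.execute(
        "UPDATE emp SET salary = salary + ? WHERE dept = ?", (amount, dept)
    )


def salaries(conn):
    """Return all salaries ordered by id."""
    return [row[0] for row in conn.execute("SELECT salary FROM emp ORDER BY id")]
```

Both functions leave the table in the same state; the set-based version simply does it with less code and a single statement, which is the habit the summary recommends.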
The document summarizes the D programming language compiler, including its organization into one front end and three back ends, and major changes such as converting the front end to D and using DWARF exception handling. It describes the source code organization, types of compiles, memory allocation, strings, arrays, parsing, semantic analysis, lowering, constant folding, templates, and inlining, as well as the challenges of improving encapsulation and reducing complexity and memory usage.
The document discusses best practices for writing PL/SQL code, including writing as little code as possible by favoring set-based operations over procedural loops, using packages to organize code and reduce dependencies, employing static SQL for improved performance and maintainability, and using bulk processing to reduce round trips to the database.
The document provides an overview of Elasticsearch including that it is easy to install, horizontally scalable, and highly available. It discusses Elasticsearch's core search capabilities using Lucene and how data can be stored and retrieved. The document also covers Elasticsearch's distributed nature, plugins, scripts, custom analyzers, and other features like aggregations, filtering and sorting.
The Building Blocks of QuestDB, a Time Series Database, by javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
The document is a presentation about new features in PostgreSQL 9.6. It discusses several major new features including parallel queries, avoiding VACUUM on all-frozen pages using freeze maps, monitoring the progress of VACUUM, phrase full text search, multiple synchronous replication, remote_apply synchronous commit, and improved capabilities of the postgres_fdw extension including pushing down sorts, joins, updates and deletes to remote servers.
This document provides a syllabus for Class 12 Computer Science. It outlines the following units:
1) Computational Thinking and Programming - Functions, file handling including text, binary, and CSV files. Data structures including lists, stacks, and queues.
2) Computer Networks - Network concepts, devices, topologies, protocols, mobile technologies, security concepts, and web services.
3) Database Management - Relational data model concepts, SQL commands for data definition, manipulation, and queries. Connecting SQL with Python and creating database applications.
New Full Text Search Features in PostgreSQL / Oleg Bartunov (Postgr..., by Ontico
I will talk about the new full text search features included in the latest PostgreSQL release: support for phrase search and a set of functions for manipulating the full text data type (tsvector). In addition, we improved support for morphological dictionaries, which significantly increased the number of supported languages, optimized dictionary handling, and developed a new index access method, RUM, which significantly sped up a number of queries that use full text operators.
This document discusses user-defined functions (UDFs) in MySQL. It describes three types of functions - built-in, SQL stored, and user-defined. It focuses on user-defined functions and discusses the C/C++ implementation including initialization functions, row functions, data types, and data structures like UDF_INIT and UDF_ARGS that are passed between the function and MySQL. It provides examples of writing UDFs and recommends resources for learning more.
SQL is a language used to interface with relational database systems. It was developed by IBM in the 1970s and is now an industry standard. SQL has three main sublanguages: DDL for defining database schemas, DML for manipulating data, and DCL for controlling access.
Some key points about SQL include:
- DDL commands like CREATE, ALTER, and DROP are used to define and modify database structures.
- DML commands like SELECT, INSERT, UPDATE, and DELETE are used to query and manipulate the data.
- DCL commands like COMMIT, ROLLBACK, GRANT and REVOKE control transactions and user privileges.
- SQL can be used
This document provides an overview of SQL analytic queries and tips and tricks, mostly related to PostgreSQL. It begins with an introduction to the topics to be covered, including SQL basics, advanced topics, and a conclusion. It then shares some lesser-known facts about SQL, including that it is standardized, Turing complete, and the only successful fourth-generation programming language. The document reviews the revision history of SQL standards from 1986 to the present. It provides examples of common table expressions, temporary tables, unnesting and aggregation, subqueries, and lateral joins in SQL.
Your Timestamps Deserve Better than a Generic Database, by javier ramirez
This document discusses the challenges of working with timestamped data in databases and introduces QuestDB as a time-series database designed to address these challenges. It highlights QuestDB's high performance for ingesting and querying large volumes of timestamped data. It also demonstrates several time-series focused query patterns in QuestDB like time range queries, sampling, filling missing data, retrieving the latest value, and approximate joins between tables. Finally, it outlines some areas QuestDB is exploring to further improve performance.
String Comparison Surprises: Did Postgres lose my data?, by Jeremy Schneider
Comparisons are fundamental to computing - and comparing strings is not nearly as straightforward as you might think. Come learn about the history, nuance and surprises of “putting words in order” that you never knew existed in computer science, and how that nuance impacts both general programming and SQL programming. Next, walk through a few actual scenarios and demonstrations using PostgreSQL as a user and administrator, which you can re-run yourself later for further study, including one way you could easily corrupt your self-managed PostgreSQL database if you aren't prepared. Finally we’ll dive into an explanation of the surprising behaviors we saw in PostgreSQL, and learn more about user and administrative features PostgreSQL provides related to localized string comparison.
This document provides an overview of Apache Cassandra including:
- What Cassandra is and how it differs from an RDBMS by not supporting joins, having an optional schema, and being transactionless.
- Cassandra's data model using keyspaces, column families, and static vs dynamic column families.
- How to integrate Cassandra with Java applications using the Hector client and ColumnFamilyTemplate for querying, updating, and deleting data.
- Additional topics covered include the CAP theorem, data storage and compaction, and using CQL via JDBC.
Search is an important part of informative web-sites, and there are many different ways to implement it. This session evaluates the options for integrating a search engine into your web-site, ranging from simple solutions such as MySQL's full text search to using an external engine to power search
PostGIS is a spatial database extender for PostgreSQL that allows geographic queries. OS MasterMap is a UK database with 450 million mapped features. Updates are distributed as Change Only Updates (CoU) containing 6 million changed features rather than full datasets. AstunLoader loads CoU into PostGIS. PostgreSQL audit triggers and hstore record changes to tables, enabling creation of snapshots to view past data states. Snapshots combine audit data with current data excluding changed records.
This presentation is by Bruce Momjian, co-founder of the Global PostgreSQL Development team and a Senior Architect at EDB. He demonstrates how to use arrays, geometry and JSON as NoSQL data types to overcome the restrictions of relational storage and support new, innovative applications, specifically by storing and indexing multiple values, even unrelated ones, in a single database field. Such storage allows for greater efficiency and simpler access, and can also avoid the drawbacks of entity-attribute-value (EAV) storage.
Postgres has always had strong support for relational storage. However, there are some cases where relational storage might be inefficient or overly restrictive.
This document provides an introduction and overview of Apache Spark. It discusses why in-memory computing is important for speed, compares Spark and Ignite, describes what Spark is and how it works using Resilient Distributed Datasets (RDDs) and a directed acyclic graph (DAG) model. It also provides examples of Spark operations on RDDs and shows a word count example in Java, Scala and Python.
Spark with Elasticsearch - umd version 2014, by Holden Karau
Holden Karau gave a talk on using Apache Spark and Elasticsearch. The talk covered indexing data from Spark to Elasticsearch both online using Spark Streaming and offline. It showed how to customize the Elasticsearch connector to write indexed data directly to shards based on partitions to reduce network overhead. It also demonstrated querying Elasticsearch from Spark, extracting top tags from tweets, and reindexing data from Twitter to Elasticsearch.
The document discusses several new features and enhancements in Oracle Database 11g Release 1. Key points include:
1) Encrypted tablespaces allow full encryption of data while maintaining functionality like indexing and foreign keys.
2) New caching capabilities improve performance by caching more results and metadata to avoid repeat work.
3) Standby databases have been enhanced and can now be used for more active purposes like development, testing, reporting and backups while still providing zero data loss protection.
The document discusses new features in Oracle Database 11g Release 1. Key points include:
1. Encrypted tablespaces allow encryption of data at the tablespace level while still supporting indexing and queries.
2. New caching capabilities improve performance by caching more results in memory, such as function results and query results.
3. Standby databases have enhanced capabilities and can now be used for more active purposes like development, testing and reporting for increased usability and value.
Based on the legendary "Don't Do This" PostgreSQL wiki page, this talk explores some of the common pitfalls and misconceptions that Postgres users can face, and shows possible ways to undo them or work around them.
Some of the things discussed:
- Bad SQL habits
- Correct types for data storage
- (Sub-)Partitioning (and how to get it wrong)
- Table inheritance (and how to undo it)
- Connections (number of, and properly handling)
- Security issues (unsafe configurations and usage)
Talk given at FOSDEM 2023
Slow things down to make them go faster [FOSDEM 2022], by Jimmy Angelakos
Talk from FOSDEM 2022
It's easy to get misled into overconfidence based on the performance of powerful servers, given today's monster core counts and RAM sizes. However, the reality of high concurrency usage is often disappointing, with less throughput than one would expect. Because of its internals and its multi-process architecture, PostgreSQL is very particular about how it likes to deal with high concurrency and in some cases it can slow down to the point where it looks like it's not performing as it should. In this talk we'll take a look at potential pitfalls when you throw a lot of work at your database. Specifically, very high concurrency and resource contention can cause problems with lock waits in Postgres. Very high transaction rates can also cause problems of a different nature. Finally, we will be looking at ways to mitigate these by examining our queries and connection parameters, leveraging connection pooling and replication, or adapting the workload.
Topics:
1. Understand what we mean by high concurrency.
2. Understand ACID & MVCC in Postgres.
3. Understand how high concurrency affects Postgres performance.
4. Understand how locks/latches affect Postgres performance.
5. Understand how high transaction rates can affect Postgres.
6. Mitigation strategies for high concurrency scenarios.
Related content
Similar to The State of (Full) Text Search in PostgreSQL 12
The document discusses best practices for writing PL/SQL code, including writing as little code as possible by favoring set-based operations over procedural loops, using packages to organize code and reduce dependencies, employing static SQL for improved performance and maintainability, and using bulk processing to reduce round trips to the database.
The document provides an overview of Elasticsearch including that it is easy to install, horizontally scalable, and highly available. It discusses Elasticsearch's core search capabilities using Lucene and how data can be stored and retrieved. The document also covers Elasticsearch's distributed nature, plugins, scripts, custom analyzers, and other features like aggregations, filtering and sorting.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
The document is a presentation about new features in PostgreSQL 9.6. It discusses several major new features including parallel queries, avoiding VACUUM on all-frozen pages using freeze maps, monitoring the progress of VACUUM, phrase full text search, multiple synchronous replication, remote_apply synchronous commit, and improved capabilities of the postgres_fdw extension including pushing down sorts, joins, updates and deletes to remote servers.
This document provides a syllabus for Class 12 Computer Science. It outlines the following units:
1) Computational Thinking and Programming - Functions, file handling including text, binary, and CSV files. Data structures including lists, stacks, and queues.
2) Computer Networks - Network concepts, devices, topologies, protocols, mobile technologies, security concepts, and web services.
3) Database Management - Relational data model concepts, SQL commands for data definition, manipulation, and queries. Connecting SQL with Python and creating database applications.
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...Ontico
Я расскажу про новые возможности полнотекстового поиска, которые вошли в последний релиз PostgreSQL - поддержку фразового поиска и набор функций для манипулирования полнотекстовым типом данных (tsvector). Помимо этого, мы улучшили поддержку морфологических словарей, что привело к значительному увеличению числа поддерживаемых языков, оптимизировали работу со словарями, разработали новый индексный метод доступа RUM, который значительно ускорил выполнение ряда запросов с полнотекстовыми операторами.
This document discusses user-defined functions (UDFs) in MySQL. It describes three types of functions - built-in, SQL stored, and user-defined. It focuses on user-defined functions and discusses the C/C++ implementation including initialization functions, row functions, data types, and data structures like UDF_INIT and UDF_ARGS that are passed between the function and MySQL. It provides examples of writing UDFs and recommends resources for learning more.
SQL is a language used to interface with relational database systems. It was developed by IBM in the 1970s and is now an industry standard. SQL has three main sublanguages: DDL for defining database schemas, DML for manipulating data, and DCL for controlling access.
Some key points about SQL include:
- DDL commands like CREATE, ALTER, and DROP are used to define and modify database structures.
- DML commands like SELECT, INSERT, UPDATE, and DELETE are used to query and manipulate the data.
- DCL commands like COMMIT, ROLLBACK, GRANT and REVOKE control transactions and user privileges.
- SQL can be used
This document provides an overview of SQL analytic queries and tips and tricks, mostly related to PostgreSQL. It begins with an introduction on the topics to be covered, including SQL basics, advanced topics, and a conclusion. It then shares some lesser known facts about SQL, including that it is standardized, turing complete, and the only successful 4th generation programming language. The document reviews the revision history of SQL standards from 1986 to the present. It provides examples of common table expressions, temporary tables, unnesting and aggregation, subqueries, and lateral joins in SQL.
Your Timestamps Deserve Better than a Generic Databasejavier ramirez
This document discusses the challenges of working with timestamped data in databases and introduces QuestDB as a time-series database designed to address these challenges. It highlights QuestDB's high performance for ingesting and querying large volumes of timestamped data. It also demonstrates several time-series focused query patterns in QuestDB like time range queries, sampling, filling missing data, retrieving the latest value, and approximate joins between tables. Finally, it outlines some areas QuestDB is exploring to further improve performance.
String Comparison Surprises: Did Postgres lose my data?Jeremy Schneider
Comparisons are fundamental to computing - and comparing strings is not nearly as straightforward as you might think. Come learn about the history, nuance and surprises of “putting words in order” that you never knew existed in computer science, and how that nuance impacts both general programming and SQL programming. Next, walk through a few actual scenarios and demonstrations using PostgreSQL as a user and administrator, which you can re-run yourself later for further study, including one way you could easily corrupt your self-managed PostgreSQL database if you aren't prepared. Finally we’ll dive into an explanation of the surprising behaviors we saw in PostgreSQL, and learn more about user and administrative features PostgreSQL provides related to localized string comparison.
This document provides an overview of Apache Cassandra including:
- What Cassandra is and how it differs from an RDBMS by not supporting joins, having an optional schema, and being transactionless.
- Cassandra's data model using keyspaces, column families, and static vs dynamic column families.
- How to integrate Cassandra with Java applications using the Hector client and ColumnFamilyTemplate for querying, updating, and deleting data.
The State of (Full) Text Search in PostgreSQL 12
1. https://www.2ndQuadrant.com
The State of (Full) Text Search in PostgreSQL 12
FOSDEM 2020
Jimmy Angelakos
Senior PostgreSQL Architect
Twitter: @vyruss 🏴 🇪🇺 🇬🇷
3. https://www.2ndQuadrant.com
FOSDEM
Brussels, 2020-02-02
Your attention please
● This presentation contains linguistics, NLP, Markov chains, Levenshtein distances, and various other confounding terms.
● These have been known to induce drowsiness and inappropriate sleep onset in lecture theatres.
Allergy advice
4. https://www.2ndQuadrant.com
What is Text?
(Baby don’t hurt me)
● PostgreSQL character types
– CHAR(n)
– VARCHAR(n)
– VARCHAR, TEXT
● Trailing spaces: significant (e.g. for LIKE / regex)
● Storage
– Character Set (e.g. UTF-8)
– 1+126 bytes → 4+n bytes
– Compression, TOAST
5. https://www.2ndQuadrant.com
What is Text Search?
● Information retrieval → Text retrieval
● Search on metadata
– Descriptive, bibliographic, tags, etc.
– Discovery & identification
● Search on parts of the text
– Matching
– Substring search
– Data extraction, cleaning, mining
6. https://www.2ndQuadrant.com
Text search operators in PostgreSQL
● LIKE, ILIKE (~~, ~~*)
● ~, ~* (POSIX regex)
● regexp_match(string text, pattern text)
● But are SQL/regular expressions enough?
– No ranking of results
– No concept of language
– Cannot be indexed
● Okay okay, can be somewhat indexed*
● SIMILAR TO → best forget about this one
7. https://www.2ndQuadrant.com
What is Full Text Search (FTS)?
● Information retrieval → Text retrieval → Document retrieval
● Search on words (on tokens) in a database (all documents)
● No index → Serial search (e.g. grep)
● Indexing → Avoid scanning whole documents
● Techniques for criteria-based matching
– Natural Language Processing (NLP)
● Precision vs Recall
– Stop words
– Stemming
8. https://www.2ndQuadrant.com
Documents? Tokens?
● Document: a chunk of text (a field in a row)
● Parsing of documents into classes of tokens
– PostgreSQL parser (or write your own… in C)
● Conversion of tokens into lexemes
– Normalisation of strings
● Lexeme: an abstract lexical unit representing related words (i.e. word root)
– SEARCH → searched, searcher
13. https://www.2ndQuadrant.com
Dictionaries in PostgreSQL
● Programs!
● Accept tokens as input
● Improve search quality
– Eliminate stop words
– Normalise words into lexemes
● Reduce size of tsvector
● CREATE TEXT SEARCH DICTIONARY name
(TEMPLATE = simple, STOPWORDS = english);
● Can be chained: most specific → more general
ALTER TEXT SEARCH CONFIGURATION name
ADD MAPPING FOR word WITH english_ispell, simple;
● ispell, myspell, hunspell, etc.
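The chaining idea can be modelled outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation: each "dictionary" is a function that returns a list of lexemes for a token it recognises, an empty list for a stop word, or None to pass the token on to the next dictionary. The names toy_ispell, simple, STOP_WORDS and KNOWN_FORMS are hypothetical, and the word lists are tiny stand-ins for real dictionary files.

```python
from typing import Callable, Dict, List, Optional

# A "dictionary" is modelled as a function that takes a token and returns:
#   a list of lexemes -> token recognised and normalised
#   an empty list     -> token recognised as a stop word, drop it
#   None              -> token not recognised, try the next dictionary
Dictionary = Callable[[str], Optional[List[str]]]

# Hypothetical toy data, far smaller than a real ispell or snowball dictionary.
STOP_WORDS = {"a", "am", "for", "i"}
KNOWN_FORMS = {"riding": "ride", "searched": "search", "searcher": "search"}


def toy_ispell(token: str) -> Optional[List[str]]:
    """More specific dictionary: knows stop words and a few inflected forms."""
    if token in STOP_WORDS:
        return []
    if token in KNOWN_FORMS:
        return [KNOWN_FORMS[token]]
    return None


def simple(token: str) -> Optional[List[str]]:
    """More general fallback: accepts every token, lowercased."""
    return [token.lower()]


def to_lexemes(text: str, chain: List[Dictionary]) -> Dict[str, List[int]]:
    """Map each surviving lexeme to the 1-based positions where it occurred."""
    positions: Dict[str, List[int]] = {}
    for pos, token in enumerate(text.lower().split(), start=1):
        for dictionary in chain:
            result = dictionary(token)
            if result is not None:
                for lexeme in result:
                    positions.setdefault(lexeme, []).append(pos)
                break  # the first dictionary that recognises the token wins
    return positions


vector = to_lexemes("A nice day for a car ride", [toy_ispell, simple])
print(vector)
```

Stop words still consume a position, which is why the surviving lexemes keep positions such as 6 and 7 even though only four of them remain.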
14. https://www.2ndQuadrant.com
Text matching example (1)
fts=# SELECT to_tsvector('A nice day for a car ride')
fts-# @@ plainto_tsquery('I am riding');
?column?
----------
t
(1 row)
fts=# SELECT to_tsvector('A nice day for a car ride');
to_tsvector
-----------------------------------
'car':6 'day':3 'nice':2 'ride':7
(1 row)
fts=# SELECT plainto_tsquery('I am riding');
plainto_tsquery
-----------------
'ride'
(1 row)
15. https://www.2ndQuadrant.com
Text matching example (2)
fts=# SELECT to_tsvector('A nice day for a car ride')
fts-# @@ plainto_tsquery('I am riding a bike');
?column?
----------
f
(1 row)
fts=# SELECT to_tsvector('A nice day for a car ride');
to_tsvector
-----------------------------------
'car':6 'day':3 'nice':2 'ride':7
(1 row)
fts=# SELECT plainto_tsquery('I am riding a bike');
plainto_tsquery
-----------------
'ride' & 'bike'
(1 row)
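The two matching examples come down to a membership test. This is a small Python sketch under the assumption that the query is a plain AND of lexemes, which is what plainto_tsquery produced in both examples; the helper name matches is hypothetical, and the real @@ operator supports much more (OR, NOT, phrase search, weights).

```python
from typing import Dict, List


def matches(vector: Dict[str, List[int]], query: List[str]) -> bool:
    """True when every lexeme of an AND-only query occurs in the vector."""
    return all(lexeme in vector for lexeme in query)


# Lexemes and positions copied from the to_tsvector output in the examples.
document = {"car": [6], "day": [3], "nice": [2], "ride": [7]}

print(matches(document, ["ride"]))          # query from example (1)
print(matches(document, ["ride", "bike"]))  # query from example (2)
```

The second query fails only because 'bike' is absent from the document; 'ride' still matches.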
17. https://www.2ndQuadrant.com
An example table
● pgsql-hackers mailing list archive subset
fts=# \d mail_messages
                         Table "public.mail_messages"
   Column   |            Type             | Collation | Nullable |
------------+-----------------------------+-----------+----------+-------------
 id         | integer                     |           | not null | nextval('mai
 parent_id  | integer                     |           |          |
 sent       | timestamp without time zone |           |          |
 subject    | text                        |           |          |
 author     | text                        |           |          |
 body_plain | text                        |           |          |
fts=# \dt+ mail_messages
                        List of relations
 Schema |     Name      | Type  |  Owner   |  Size  | Description
--------+---------------+-------+----------+--------+-------------
 public | mail_messages | table | postgres | 478 MB |
18. https://www.2ndQuadrant.com
Ranking results
ts_rank (and Cover Density variant ts_rank_cd)
fts=# SELECT subject, ts_rank(to_tsvector(coalesce(body_plain,'')),
fts(# to_tsquery('aggregate'), 32) AS rank
fts-# FROM mail_messages ORDER BY rank DESC LIMIT 5;
subject | rank
--------------------------------------------------------------+-------------
Re: Window functions patch v04 for the September commit fest | 0.08969686
Re: Window functions patch v04 for the September commit fest | 0.08940695
Re: [HACKERS] PoC: Grouped base relation | 0.08936066
Re: [HACKERS] PoC: Grouped base relation | 0.08931142
Re: [PERFORM] not using index for select min(...) | 0.08925897
19. https://www.2ndQuadrant.com
FTS Stats
ts_stat for verifying your TS configuration, identifying stop words
fts=# SELECT * FROM ts_stat(
fts(# 'SELECT to_tsvector(body_plain)
fts'# FROM mail_messages')
fts-# ORDER BY nentry DESC, ndoc DESC, word
fts-# LIMIT 5;
word | ndoc | nentry
-------+--------+--------
use | 173833 | 380951
wrote | 231174 | 350905
would | 157169 | 316416
think | 149858 | 256661
patch | 100991 | 226099
20. https://www.2ndQuadrant.com
Text indexing
Normal default:
● B-Tree
– with B-Tree text_pattern_ops for left, right anchored text
– CREATE INDEX name ON table (column varchar_pattern_ops);
For FTS we have:
● GIN
– Inverted index: one entry per lexeme
– Larger, slower to update → Better on less dynamic data
– On tsvector columns
● GiST
– Lossy index, smaller but slower (to eliminate false positives)
– Better on fewer unique items
– On tsvector or tsquery columns
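"One entry per lexeme" is the defining property of an inverted index. Here is a conceptual Python sketch of it, with hypothetical documents and function names; it says nothing about GIN's on-disk format, pending lists or compression.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical documents, already reduced to lexemes.
documents: Dict[int, List[str]] = {
    1: ["nice", "day", "car", "ride"],
    2: ["bike", "ride"],
    3: ["window", "function", "patch"],
}


def build_inverted_index(docs: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """One entry per lexeme, pointing at every document that contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, lexemes in docs.items():
        for lexeme in lexemes:
            index[lexeme].add(doc_id)
    return index


def search_all(index: Dict[str, Set[int]], lexemes: Iterable[str]) -> Set[int]:
    """Ids of documents containing every lexeme (AND semantics)."""
    postings = [index.get(lexeme, set()) for lexeme in lexemes]
    return set.intersection(*postings) if postings else set()


index = build_inverted_index(documents)
print(sorted(search_all(index, ["ride"])))
print(sorted(search_all(index, ["ride", "bike"])))
```

A lookup touches only the entries for the query's lexemes instead of every document. The cost is that each insert or update has to maintain one entry per distinct lexeme, which is consistent with the "slower to update" note above.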
22. https://www.2ndQuadrant.com
FTS indexing
CREATE INDEX ON mail_messages USING GIN
(to_tsvector('english',
subject ||' '|| body_plain));
● New in PG12: Generated columns (stored):
ALTER TABLE mail_messages
ADD COLUMN fts_col tsvector
GENERATED ALWAYS AS (to_tsvector('english',
coalesce(subject, '') ||' '||
coalesce(body_plain, ''))) STORED;
CREATE INDEX ON mail_messages USING GIN (fts_col);
23. https://www.2ndQuadrant.com
FTS, GiST indexed
fts=# EXPLAIN ANALYZE SELECT count(*) FROM mail_messages
fts-# WHERE to_tsvector('english',body_plain) @@ to_tsquery('aggregate');
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=7210.61..7210.62 rows=1 width=8) (actual time=5630.167..5630.
-> Bitmap Heap Scan on mail_messages (cost=330.46..7206.16 rows=1781 width
Recheck Cond: (to_tsvector('english'::regconfig, body_plain) @@ to_tsq
Rows Removed by Index Recheck: 4267
Heap Blocks: exact=7883
-> Bitmap Index Scan on mail_messages_to_tsvector_idx (cost=0.00..33
Index Cond: (to_tsvector('english'::regconfig, body_plain) @@ to
Planning Time: 0.620 ms
Execution Time: 5630.249 ms
● 26.99 seconds → 5.63 seconds! → ~4.8x faster
24. https://www.2ndQuadrant.com
FTS, GIN indexed
fts=# EXPLAIN ANALYZE SELECT count(*) FROM mail_messages
fts-# WHERE to_tsvector('english',body_plain) @@ to_tsquery('aggregate');
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=6873.60..6873.61 rows=1 width=8) (actual time=6.133..6.134 ro
-> Bitmap Heap Scan on mail_messages (cost=33.96..6869.18 rows=1769 width=
Recheck Cond: (to_tsvector('english'::regconfig, body_plain) @@ to_tsq
Heap Blocks: exact=4630
-> Bitmap Index Scan on mail_messages_to_tsvector_idx (cost=0.00..33
Index Cond: (to_tsvector('english'::regconfig, body_plain) @@ to
Planning Time: 0.433 ms
Execution Time: 5.684 ms
● 26.99 seconds → 5.684 milliseconds! → ~4700x faster
28. https://www.2ndQuadrant.com
Free text but not natural?
● One use case: identifying arbitrary strings
– e.g. keywords in device logs
● Dictionaries not very helpful here
● Arbitrary example: 10M * ~100 char “IoT device” log entries
– Some contain strings that are significant to user (but we don’t know these keywords)
– Populate table with random hex codes but 1% of log entries contain a keyword from /etc/dictionaries-common/words:
c4f2cede5da57f0ace6e669b51186cbaexcruciating9635d8a26a
efb2b4ee8b9845e89718577b3266f68dffa5ae12ebfebf1a508b21
29. https://www.2ndQuadrant.com
Free text but not natural?
fts=# SELECT message FROM logentries LIMIT 5 OFFSET 495;
message
--------------------------------------------------------------------------------------------------
da40c1006cd75105c1eb8ea70705828d195b264565f047c6d449e51cf99d01e901cf532f03018e793a394fdac9bb5d2a
aa88a5c43ec8b2a8578d44f924053e842584c0e6b8295b72230f7d19aa3ba2f2b9e1a4bffcf0f82e4d29344645b714ca
fe9731c39108a74714cad9fc8570b115howlingb9904fa4ad86544fb778ef5edfe362e02a94c66851c3c8d7fe47b26e5
b68430decf30085cc2e7810585c5d681source2b638d61c5972f25aa3fa5c35aa2be282f04843cfca007689cc6ecdbe3
5b7ba17108e416d04788dc9ac15121fad7625fa7c216666bf54c1b0ca21ab618829262dfd67a5cd40aefd66235cf9c7f
(5 rows)
fts=# \dt+ logentries
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------+-------+----------+---------+-------------
public | logentries | table | postgres | 1421 MB |
(1 row)
fts=# SELECT * FROM logentries WHERE message LIKE '%source%';
30. https://www.2ndQuadrant.com
How long?
fts=# EXPLAIN ANALYZE SELECT * FROM logentries WHERE message LIKE '%source%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..235029.95 rows=1000 width=109) (actual time=143.010..9654.769 rows=16 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on logentries (cost=0.00..233929.95 rows=417 width=109) (actual time=1017.442..
Filter: (message ~~ '%source%'::text)
Rows Removed by Filter: 3333594
Planning Time: 0.220 ms
JIT:
Functions: 6
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 18.918 ms, Inlining 0.000 ms, Optimization 41.736 ms, Emission 121.955 ms, Total 18
Execution Time: 9673.582 ms
(12 rows)
● 9.6 seconds!
31. https://www.2ndQuadrant.com
Trigrams
● n-gram model: probabilistic language model (Markov Chains)
● 3 characters → trigrams
● Similarity of alphanumeric text → number of shared trigrams
● CREATE EXTENSION pg_trgm;
● fts=# SELECT show_trgm('source');
show_trgm
-------------------------------------
{"  s"," so","ce ",our,rce,sou,urc}
● fts=# CREATE INDEX ON logentries
fts-# USING GIN (message gin_trgm_ops);
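To see where the seven trigrams for 'source' come from, here is a rough Python approximation of trigram extraction. It assumes the padding rule described in the pg_trgm documentation (each word is treated as having two spaces prefixed and one space suffixed) and, as a simplification, treats only ASCII letters and digits as word characters. The function name show_trgm_sketch is hypothetical.

```python
import re
from typing import Set


def show_trgm_sketch(text: str) -> Set[str]:
    """Approximate the trigram set of a string (ASCII-only simplification)."""
    trigrams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            trigrams.add(padded[i:i + 3])
    return trigrams


print(sorted(show_trgm_sketch("source")))
```

The padding is what produces the trigrams containing spaces: they mark the start and end of the word, so matches at word boundaries count towards similarity.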
32. https://www.2ndQuadrant.com
Did trigrams help?
fts=# EXPLAIN ANALYZE SELECT * FROM logentries WHERE message LIKE '%source%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on logentries (cost=87.75..3870.45 rows=1000 width=109) (actual time=0.152..0.206 rows
Recheck Cond: (message ~~ '%source%'::text)
Rows Removed by Index Recheck: 2
Heap Blocks: exact=18
-> Bitmap Index Scan on logentries_message_idx (cost=0.00..87.50 rows=1000 width=0) (actual time=0.1
Index Cond: (message ~~ '%source%'::text)
Planning Time: 0.222 ms
Execution Time: 0.258 ms
(8 rows)
● 0.258 milliseconds! → ~37000x faster
● Also work with regex
33. https://www.2ndQuadrant.com
This comes at a cost
fts=# \di+ logentries_message_idx
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+------------------------+-------+----------+------------+---------+-------------
public | logentries_message_idx | index | postgres | logentries | 1601 MB |
(1 row)
34. https://www.2ndQuadrant.com
Other neat trigram tricks
● similarity(text, text) → real
● text <-> text → Distance (1-similarity)
● text % text → true if over similarity_threshold
● Supported by indexes:
– GIN
– GiST is efficient: k-nearest neighbour (k-NN)
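The three operations above relate to each other as in the following Python sketch. It assumes similarity is the number of shared trigrams divided by the number of distinct trigrams in either string, and it uses 0.3 for the threshold, which is the documented default of pg_trgm.similarity_threshold. The function names are hypothetical, the trigram helper is an ASCII-only approximation, and results may differ from pg_trgm in edge cases.

```python
import re
from typing import Set

SIMILARITY_THRESHOLD = 0.3  # assumed: the documented pg_trgm default


def trigrams(text: str) -> Set[str]:
    """ASCII-only approximation of pg_trgm's trigram extraction."""
    result: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def distance(a: str, b: str) -> float:
    """Counterpart of the <-> operator: one minus the similarity."""
    return 1.0 - similarity(a, b)


def is_similar(a: str, b: str) -> bool:
    """Counterpart of the % operator, following the wording 'over' the threshold."""
    return similarity(a, b) > SIMILARITY_THRESHOLD


print(similarity("source", "sources"), is_similar("source", "sources"))
print(similarity("source", "window"), is_similar("source", "window"))
```

Sorting candidates by distance is the operation that an index with k-NN support can answer directly, without computing the distance for every row.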
36. https://www.2ndQuadrant.com
Collation in PostgreSQL
● Sort order and character classification
– Per-column: CREATE TABLE test1 (a text
COLLATE "de_DE" …
– Per-operation: SELECT a < b COLLATE "de_DE"
FROM test1;
– Not restricted by DB LC_COLLATE, LC_CTYPE
● New in PG12: Nondeterministic collations (case-insensitive, ignore accents)
37. https://www.2ndQuadrant.com
Other types of documents → JSON
● Also a real world use case
● JSONB supports indexing
(article ->> 'title' ||''||
article ->> 'author')::tsvector
● jsonb_to_tsvector()
SELECT jsonb_to_tsvector('english', column,
'["numeric","key","string","boolean"]') FROM table;
● New in PG12: SQL/JSON (SQL:2016) → jsonpath expressions
● JsQuery: JSONB query language with GIN support
– Equivalent to tsquery, JSON query as a single value
– https://github.com/postgrespro/jsquery
38. https://www.2ndQuadrant.com
Finally, maintenance
● VACUUM ANALYZE
– Keep your table statistics up-to-date
– Pending GIN entries
● ALTER TABLE SET STATISTICS
– Keep your table statistics accurate
● Number of distinct values
● Correlated columns
● EXPLAIN ANALYZE from time to time
– Your query works now – but a year from now?
● maintenance_work_mem