Everyone uses JSON files. Thankfully, most of the time the JSON files we use are small, and we can simply read and process everything in memory because it is convenient and easy. But most of the time is not all of the time. Sometimes you must process big JSON files, and the moment you try to do this the old-fashioned way you will soon meet the dreaded "java.lang.OutOfMemoryError". One search on the internet will turn up solutions to this problem. Concisely, you will see a variation of these answers:
Split your file into smaller ones
Increase the max memory used (yes, this is one of the answers)
Save the JSON in a temporary file and use the streaming capabilities of GSON or Jackson.
GSON and Jackson work well, but they require you to write a lot of boilerplate code and get your hands dirty with lots of tokens, if checks, path checks, and so on. We developed a fourth option: we abstracted away what Jackson can do and created an interface that is easy to understand and interact with. With its help we delivered better performance, reduced the memory our service needs by more than 50%, and can now translate any number of paragraphs, because we no longer hold the entire file in memory.
Hello & Welcome
A production issue fixed with the help of an open source library we built
Let’s start from the top
Why do we even have to process huge JSON files?
By process I mean actually manipulating these files, not just storing them for import/export flows or analytics
We at RWS do translations, a lot of them, for a lot of clients
In our system, whatever the customer uploads turns into a JSON that we call a BCM (Bilingual Content Model)
How does a simple translation flow look?
Upload files
Convert to BCMs (JSONs that hold both the original file details and the resulting translated document details)
Apply Machine Translation
Maybe someone will then rewrite parts of the machine-translated text
Finally we convert this JSON back to the original file type with the translated text
To give you a sense of how big these files can get, I will quickly show some books and tell you the BCM size for each of them
Keep in mind this is before applying any translation, which will of course make the files even bigger
About 50 thousand words
Gets to 3 MB
About 105 thousand words
Gets you to 5 MB
Couldn’t skip Uncle Bob’s book
About 68 thousand words
Gets you to an 8.5 MB JSON
Let’s jump to one of the longest novels
About 570 thousand words
Gives us a 33 MB JSON
The entire Lord of The Rings Trilogy
Slightly fewer words than the previous book
A 53 MB JSON
Finally, Introduction to Algorithms
About 450 thousand words
Almost a 150 MB JSON
Finally just to introduce an Inception moment
This presentation has about 2.5k words including notes
And the JSON we will process to translate is a little over 1 MB
- Now that you have a sense of how big some of our BCMs can be, let's zoom into the place where we first noticed problems
- First we receive a message that we have to apply translation to a file
- Then we download the file in memory
- Remove the already translated paragraphs
- Send the file to translation
The send-to-translation step might give us some problems
But when we were merging the translation back into the original file we had more problems
This is triggered by receiving another message that the translation for a given file is done
First we download the original file again
Then we download the translation
When we have both files in memory we merge them together
Then we upload the BCM to be used by other services
Where’s the problem?
Go back to the merging
It's here that we have two JSONs in memory to be able to merge them together
Let me show you how much memory this might use
What happens if we try to translate Introduction to Algorithms
The file without any translation is 150 MB
Assume the translation doubles the file, just to make the math easier
In memory those same files can be a lot bigger; for one big JSON I tested, it was 3 times bigger
Assuming the same multiplier, (150 MB + 300 MB) × 3 comes to almost 1.3 GB of memory being used
This is on top of everything else running on the service
What’s the immediate fix you can do?
Pay more for more memory
Reduce your throughput
This is not a long term solution
Here is where we started working on a different approach
Working in the Java ecosystem we looked around for a solution that fits our needs
Here is where I found out about Jackson's streaming capabilities
Say we have this JSON
We receive a request from a user specifying an action to perform on a list of numbers
The JSON is too big to deserialize in memory
How do we sum up all numbers with limited memory use using Jackson?
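To make the walkthrough concrete, here is a guess at the shape of that JSON; the numbers array and the requester object are mentioned on later slides, while the field values are purely illustrative:

```json
{
  "requester": {
    "username": "jdoe"
  },
  "action": "sum",
  "numbers": [123, 456, 789]
}
```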
First we add the Jackson dependency to our project
Then we will be creating a JsonParser
To do this we will need an InputSource from where the JSON will be read, let’s assume a file
- Then we will be using parser.nextToken() the most to process this JSON
If we keep calling the .nextToken() method, we advance through the document one token at a time
If you want to sum the numbers you will have a while block
Find the numbers field
After that iterate over all numbers and add them to a total
Doing this will ensure that you have at most a few KB of memory in use, for the bytes that Jackson buffers for better performance
This is how a version of the code might look:
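A minimal sketch of that token loop, assuming the JSON above sits in a file called numbers.json and the values fit in a long:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.File;

public class StreamingSum {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory();
        long total = 0;
        try (JsonParser parser = factory.createParser(new File("numbers.json"))) {
            // Walk the document token by token until we hit the "numbers" field
            while (parser.nextToken() != null) {
                if (parser.currentToken() == JsonToken.FIELD_NAME
                        && "numbers".equals(parser.currentName())) {
                    // Step onto the START_ARRAY token
                    if (parser.nextToken() != JsonToken.START_ARRAY) {
                        throw new IllegalStateException("expected an array under 'numbers'");
                    }
                    // Consume values one at a time until the array closes
                    while (parser.nextToken() != JsonToken.END_ARRAY) {
                        total += parser.getLongValue();
                    }
                }
            }
        }
        System.out.println("Sum: " + total);
    }
}
```

Only the current token is ever materialized, so memory use stays flat no matter how long the array is.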
We managed to sum the numbers
The only issue is that even for our simple example this is how much logic is needed; imagine what your own logic would look like mapped to tokens
Performance-wise this is the best option, but our use cases are too complex to implement with raw tokens
We built a new library
Backed by Jackson but without the token logic
Let’s try again to sum all the numbers with our new library
As of last night you can import the library into your own project
We initialize a processor to read from our file
We define the path where we can find the list of numbers
After that, using the processor and the path defined, we can say that at the given path we expect to find Integers and read them all
Using the iterator we can quickly sum up all the numbers
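The slides' code is not reproduced in this transcript, so what follows is only a hypothetical sketch of the shape the speaker describes; the class and method names (JsonStreamProcessor, JsonPath, readAll) are invented for illustration and are not the library's actual API:

```java
import java.io.File;
import java.util.Iterator;

// Hypothetical sketch: JsonStreamProcessor, JsonPath and readAll are invented
// names standing in for the library's real API.
long sumNumbers(File json) throws Exception {
    JsonStreamProcessor processor = JsonStreamProcessor.from(json);

    // The path where we can find the list of numbers
    JsonPath numbersPath = JsonPath.of("numbers", JsonPath.ANY_INDEX);

    // At that path we expect to find Integers; read them all lazily
    Iterator<Integer> numbers = processor.readAll(numbersPath, Integer.class);

    long total = 0;
    while (numbers.hasNext()) {
        total += numbers.next(); // only one number is deserialized at a time
    }
    return total;
}
```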
Most of our operations on the BCMs require us to rewrite them and apply different logic while we do that
We made sure that the library we wrote permits us to do that as well
Let’s look over another exercise and see how we can do it in a streamed way
I won't bother you with how you can do the rewrite and +1 using Jackson, because it's a lot
I'm just showing you a snippet of how one might do it, in over 20 lines of code
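For reference, here is roughly what that Jackson token-level snippet looks like: read from one file, write to another, add 1 to every element of the flat numbers array, and copy every other token through unchanged (file names are assumptions):

```java
import com.fasterxml.jackson.core.*;
import java.io.File;

public class StreamingRewrite {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(new File("numbers.json"));
             JsonGenerator generator =
                     factory.createGenerator(new File("out.json"), JsonEncoding.UTF8)) {
            boolean inNumbers = false;
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                if (token == JsonToken.FIELD_NAME && "numbers".equals(parser.currentName())) {
                    inNumbers = true;                       // entering the numbers array
                    generator.copyCurrentEvent(parser);
                } else if (inNumbers && token == JsonToken.VALUE_NUMBER_INT) {
                    generator.writeNumber(parser.getLongValue() + 1); // the +1 rewrite
                } else {
                    if (inNumbers && token == JsonToken.END_ARRAY) {
                        inNumbers = false;                  // leaving the numbers array
                    }
                    generator.copyCurrentEvent(parser);     // copy everything else as-is
                }
            }
        }
    }
}
```

And that is the simple flat case: the bookkeeping grows quickly once the paths nest.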
Using our library it’s a lot easier to do this
We create a processor builder and initialize it with the input source pointing to the numbers file and an output source where we write our JSON
Define numbers path again
Create what we call a transformer, telling it to look at the numbers array path, expect to find Integers, and apply the following function
Then we create a visitor using this transformer
Build our Visiting processor
And we call the visit method of the processor
We will iterate over the JSON and apply all the transformers specified while rewriting the entire JSON to the OutputStream
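Again a hypothetical sketch of those steps; VisitingProcessor, JsonTransformer and JsonVisitor are invented names for illustration, not the library's real API:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;

// Hypothetical sketch: VisitingProcessor, JsonTransformer and JsonVisitor are
// invented names standing in for the library's real API.
void rewritePlusOne() throws Exception {
    // The numbers path, defined again
    JsonPath numbersPath = JsonPath.of("numbers", JsonPath.ANY_INDEX);

    // A transformer: at this path, expect Integers, apply this function to each
    JsonTransformer<Integer> plusOne =
            JsonTransformer.at(numbersPath, Integer.class, n -> n + 1);

    // A visitor built from the transformer
    JsonVisitor visitor = JsonVisitor.of(plusOne);

    // A visiting processor wired to an input source and an output source
    VisitingProcessor processor = VisitingProcessor.builder()
            .input(new FileInputStream("numbers.json"))
            .output(new FileOutputStream("out.json"))
            .build();

    // Streams the document: untouched tokens are copied through to the output,
    // values under numbersPath are rewritten with n + 1
    processor.visit(visitor);
}
```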
- If we want to complicate the problem even more and say we want to extract the username from the JSON
Remember we had a requester field that has an object including the username
Using our library we can quickly achieve this
Define our path matchers
Define an object where we will be storing our username value
Define our JSON visitor specifying the two transformers
Then we visit the JSON
At the end we will have the username value stored and we can do whatever we want with it
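A hypothetical sketch of that combination, reusing the invented names (and the plusOne transformer and processor) from the previous snippet:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; plusOne and processor come from the previous snippet.
// The path of the username inside the requester object
JsonPath usernamePath = JsonPath.of("requester", "username");

// An object where we store the username value as it streams past
AtomicReference<String> username = new AtomicReference<>();
JsonTransformer<String> captureUsername =
        JsonTransformer.at(usernamePath, String.class, value -> {
            username.set(value); // remember the value
            return value;        // write it back unchanged
        });

// The visitor now carries the two transformers
JsonVisitor visitor = JsonVisitor.of(plusOne, captureUsername);
processor.visit(visitor);

// After the visit, username.get() holds the extracted value
```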
How does our library compare with Jackson and processing files in memory?
I did some benchmark tests just to give you a glimpse of how Jackson, our library, and in-memory processing compare
As you can see, from 10 MB files to 100 MB JSONs, if we want to sum the numbers Jackson is the fastest, followed closely by our library
Starting from about 60 MB, processing in memory becomes way slower
I don't have a graph to show you how much memory is being used, but it is KB versus MB
For my numbers JSON I've only used 3-digit numbers.
3 digits on disk take 3 bytes; the same number held as a boxed Integer in Java takes 16 bytes of heap, plus the reference pointing to it, so each number can easily be 5 to 8 times bigger in memory.
Comparing performance for rewriting JSONs shows us a similar story
Doing anything in memory is always slower than doing it in a streamed way
If you know your JSONs are too big for you to hold in memory, when is it a good idea to use our library over Jackson?
- This is a good tell-tale sign that you should be using our library
I've only shown you deserialization to Integers and Strings, but using our library you can deserialize to an entire object
Split your logic into smaller processable units
As I mentioned on the previous slide, you can deserialize to any object and process in smaller units
Deserialization has its performance penalty; you have to check how many deserializations you end up doing
Even for the simplest logic, building it using our library will be faster
For now we use the library in 4 distinct flows
What are our own personal results from processing files using the library we built
For our BCMs our processable units are paragraphs, and we were able to do everything we needed by deserializing only one paragraph at a time
- No matter the size of a BCM, a paragraph can't be so big that we can't hold it in memory
- Back to processing multiple files in parallel since we use less memory
- We simply don’t need as much memory added to these services because we don’t use as much now
On GitHub you can find the source code, all the other transformers you can use and tests that show you how they all work
I've got a list of refactorings, new APIs, and tests to add, so if you like the library please do follow the progress there
Would love to receive any feedback on it
If you have a use case where you think this library might work for you, try it out or contact me
If you use a different programming language than Java and need the same functionality, also contact me, because we can make something work