When it comes to data access performance of a .NET application, many people focus on tuning and optimizing the backend data sources. But there is a lot to gain from simply tuning your connectivity solution to the needs of your application.
This presentation focuses on how to tune your ADO.NET connectivity solution to get the best performance out of it, with examples.
Introduction of Presenter
10 Years of Experience in Database driver development
Product Owner of ODBC, JDBC and ADO.NET Drivers. Owning the ADO.NET products for the last 3 years.
The ADO.NET product line currently supports connectivity to Oracle, DB2, Microsoft SQL Server and Sybase ASE.
Entity Framework 6.0 Support for Oracle and DB2 for i.
What we are going to cover
We are not covering Entity Framework in this session; if you are interested, please get back to us and we will try to schedule something.
We will stay focused on core ADO.NET connectivity to relational databases for this session.
Different components of performance and how each of them impacts it.
Database – Database tuning, indexing and normalization are still an important part of performance improvement. CPU, memory, network bandwidth and sharing of resources all become important.
Network – This is one of the critical resources that impact performance. The fewer the network round trips, the better the performance.
Data provider – Provider architecture, configuration and how the provider handles system resources are key factors.
Data Access Code – Well-written data access code helps the data provider optimize executions and improve performance; we will touch on this.
Client Infrastructure – Not only the database server, but also the resources available on the client play an important part in the overall performance of the application.
Performance Strategy:
Usually we build an application to meet a business requirement, and then, once we notice a performance issue, we reactively try to fix the performance problems.
This is very costly as opposed to proactively designing the application with performance as an important design criterion. Here are some design guidelines on how to make the best use of your data adapter in your application...
Managed vs. Unmanaged
Managed architecture provides a lot more than just performance; the garbage collector, type safety and code access security are just a few advantages to mention.
Unmanaged calls (P/Invoke) require a security context switch, which has a high cost.
The data provider should be smart enough to utilize the resources on both the server and the client wisely to improve performance; each of these sub-items can impact the performance of your application.
Distributed-transaction-enabled connections take more time and server resources to create on some backends, e.g. DB2.
We will discuss each of them in the coming slides.
Connections – Opening a Connection is very expensive
Sets up the environment on both the client and server side with the required memory buffers.
Opens sockets/communication channels and performs certificate/key validation in the case of SSL/encryption.
Connection pooling is a mechanism to reuse an existing set of connections and avoid repeated open and close calls for better performance.
Fixed size of pool
Strategy 1: Fill up the pool with connections at application startup. Drawback: slow start.
Strategy 2: Open a connection only when needed; if a matching connection is found in the pool it is loaded from there, otherwise a new one is created. Close it as soon as the work is done so that it is available for the next request (see the sketch after this list).
More complex strategies such as FIFO, LIFO, etc.
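A minimal sketch of strategy 2, using the SQL Server provider for illustration; the server, database, table name and pool-size values are placeholders, and the pooling keywords differ between providers:

```csharp
using System;
using System.Data.SqlClient;

class PoolingSketch
{
    // Pooling is configured through the connection string; "Pooling",
    // "Min Pool Size" and "Max Pool Size" are the SQL Server provider's keywords.
    const string ConnStr =
        "Server=myServer;Database=myDb;Integrated Security=true;" +
        "Pooling=true;Min Pool Size=5;Max Pool Size=50";

    static int CountOrders()
    {
        // Open late, close early: the connection is taken from the pool on Open()
        // and returned to it on Dispose(), so it is free for the next request.
        using (var conn = new SqlConnection(ConnStr))
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Orders", conn))
        {
            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}
```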
The protocol packet size defines the number of bytes that can be transferred in a single network round trip. All database instances and data providers can be configured with a maximum value for this packet size.
This maximum packet size is used by the database server and data providers to initialize the internal arrays/structures required for communication.
The actual communication packet size is the minimum of the server's maximum packet size and the client's maximum packet size.
The larger the packet size, the fewer network round trips are needed to transfer the same data, and hence the better the performance.
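As an illustration, the SQL Server provider exposes this setting through the "Packet Size" connection string keyword (other providers expose a similar option under a different name); the server and database names below are placeholders:

```csharp
using System.Data.SqlClient;

// Request a larger packet size so large result sets travel in fewer round trips.
// The server may negotiate the value down to its own configured maximum.
var builder = new SqlConnectionStringBuilder
{
    DataSource = "myServer",          // placeholder
    InitialCatalog = "myDb",          // placeholder
    IntegratedSecurity = true,
    PacketSize = 32767                // SqlClient's maximum
};

using (var conn = new SqlConnection(builder.ConnectionString))
{
    conn.Open();
    // ... execute queries that return large result sets
}
```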
Distributed transaction: a transaction spanning multiple connections to the same database or across multiple databases.
Suppose a company has different databases for payroll and accounting. An employee salary operation needs to update both of them as a single atomic operation, as sketched below.
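A minimal sketch of such an atomic update using System.Transactions.TransactionScope; the connection strings, table and column names are illustrative placeholders:

```csharp
using System.Data.SqlClient;
using System.Transactions;

string payrollConnStr    = "Server=payrollServer;Database=Payroll;Integrated Security=true";
string accountingConnStr = "Server=acctServer;Database=Accounting;Integrated Security=true";

// Both updates commit or roll back together; with two different servers involved,
// the transaction is escalated to a distributed transaction.
using (var scope = new TransactionScope())
{
    using (var payroll = new SqlConnection(payrollConnStr))
    {
        payroll.Open();
        new SqlCommand("UPDATE Salaries SET Amount = Amount + 500 WHERE EmpId = 42", payroll)
            .ExecuteNonQuery();
    }

    using (var accounting = new SqlConnection(accountingConnStr))
    {
        accounting.Open();
        new SqlCommand("UPDATE Ledger SET Debit = Debit + 500 WHERE Account = 'PAYROLL'", accounting)
            .ExecuteNonQuery();
    }

    scope.Complete();   // omit this call and everything rolls back
}
```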
Auto-commit TRUE means a commit is performed after every SQL statement that is sent as a request to the server, i.e. INSERT, UPDATE, DELETE and SELECT.
Because database objects/rows/pages are locked by transactions, transactions have a negative impact on throughput. The choice of isolation level also determines this impact.
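For example, grouping several statements under one explicit transaction avoids a commit after every statement and lets you pick the least restrictive isolation level that is still correct; a minimal sketch using the SQL Server provider and placeholder names:

```csharp
using System.Data;
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

using (var conn = new SqlConnection(connStr))
{
    conn.Open();
    // One explicit transaction instead of an auto-commit after each statement.
    using (var tx = conn.BeginTransaction(IsolationLevel.ReadCommitted))
    {
        var cmd = conn.CreateCommand();
        cmd.Transaction = tx;

        cmd.CommandText = "INSERT INTO AuditLog(Msg) VALUES('step 1')";
        cmd.ExecuteNonQuery();

        cmd.CommandText = "INSERT INTO AuditLog(Msg) VALUES('step 2')";
        cmd.ExecuteNonQuery();

        tx.Commit();   // a single commit for the whole unit of work
    }
}
```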
The server can reuse these cached query plans for multiple executions of the same statement without extra overhead.
Benefit 1: It avoids the extra round trip of preparing and executing separately.
Benefit 2: You don’t have to keep track of which statements you have prepared for multiple executions and which you have not.
Dynamic queries (which use parameters) perform better than static queries (which use literals), because a dynamic query can be prepared once and executed multiple times with different parameter values.
If your application executes the same statements multiple times, statement caching/pooling can help a lot with performance improvement.
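A minimal sketch of the prepare-once, execute-many pattern with a parameterized statement (SQL Server provider; the table and connection string are placeholders):

```csharp
using System.Data;
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand(
    "UPDATE Employees SET Salary = Salary * 1.03 WHERE DeptId = @dept", conn))
{
    conn.Open();
    var dept = cmd.Parameters.Add("@dept", SqlDbType.Int);
    cmd.Prepare();                      // explicit prepare; statement pooling can also do this implicitly

    foreach (int deptId in new[] { 10, 20, 30 })
    {
        dept.Value = deptId;            // only the parameter value changes
        cmd.ExecuteNonQuery();          // the prepared statement/plan is reused
    }
}
```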
In array binding, all the parameter values are passed to the server in a single go.
It reduces network round trips compared to the one-row-at-a-time approach.
If you set the DataAdapter.UpdateBatchSize property, data providers will internally use the array-binding mechanism to speed up execution.
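A minimal sketch of batched updates through UpdateBatchSize (SQL Server provider; table, column and connection string values are placeholders):

```csharp
using System.Data;
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

var adapter = new SqlDataAdapter();
adapter.InsertCommand = new SqlCommand(
    "INSERT INTO Customers(Id, Name) VALUES(@Id, @Name)", new SqlConnection(connStr));
adapter.InsertCommand.Parameters.Add("@Id", SqlDbType.Int, 0, "Id");
adapter.InsertCommand.Parameters.Add("@Name", SqlDbType.VarChar, 100, "Name");
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;  // required when batching
adapter.UpdateBatchSize = 100;                                  // up to 100 rows per round trip

var table = new DataTable("Customers");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
for (int i = 0; i < 1000; i++)
    table.Rows.Add(i, "Customer " + i);

adapter.Update(table);   // the new rows are sent in batches instead of one by one
```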
Bulk load is done through specialized protocols exposed by the database vendor for fast uploading of data.
Not all database vendors or data providers support this functionality.
Microsoft tools like SSIS can also benefit from bulk-load functionality to insert large amounts of data.
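For SQL Server this surfaces as SqlBulkCopy; other vendors expose their own bulk-load APIs. A minimal sketch with placeholder names:

```csharp
using System.Data;
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

// Build (or stream) the data to load; any DataTable or IDataReader with matching columns will do.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
for (int i = 0; i < 100000; i++)
    table.Rows.Add(i, "Customer " + i);

using (var bulk = new SqlBulkCopy(connStr))
{
    bulk.DestinationTableName = "Customers";
    bulk.BatchSize = 10000;        // rows per batch sent over the bulk-load protocol
    bulk.WriteToServer(table);
}
```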
Writing SQL Statements to Fetch limited LOB data
Example: the "Read More…" pattern (as on Facebook) – fetch only the first portion of a large value and load the rest on demand.
Oracle - LOB prefetch: this prefetches the LOB contents, chunk size and actual size along with the statement execution, without additional round trips. It is especially beneficial when LOB sizes are small.
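A minimal sketch of the "Read More" approach, written here with SQL Server's SUBSTRING/DATALENGTH syntax; table and column names are placeholders, and only a preview plus the total size are fetched up front:

```csharp
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand(
    "SELECT Id, SUBSTRING(Story, 1, 200) AS Preview, " +
    "       CAST(DATALENGTH(Story) AS BIGINT) AS FullSize FROM Posts", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            string preview  = reader.GetString(1);   // first 200 characters only
            long   fullSize = reader.GetInt64(2);    // used to decide whether to show "Read More"
            // fetch the full LOB with a second, keyed query only when the user asks for it
        }
    }
}
```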
Boxing and unboxing are a performance hit compared to normal assignment operations. If you are fetching a high volume of data in your application, the performance hit from this can quickly become considerable.
The data types are listed in the order of the post-processing needed by the data providers after fetching them from the server.
If your application is fetching a huge amount of data from the server, choosing the right data types can save you valuable CPU cycles.
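For example, the typed accessors on a data reader return value types directly, while GetValue() returns object and therefore boxes every value; a minimal sketch with placeholder names:

```csharp
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand("SELECT Amount FROM Orders", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        decimal total = 0;
        while (reader.Read())
        {
            total += reader.GetDecimal(0);            // typed accessor: no boxing
            // total += (decimal)reader.GetValue(0);  // returns object: box + unbox per row
        }
    }
}
```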
ExecuteNonQuery
- Does not fetch a result set description, which saves valuable network round trips.
- Does not allocate/deallocate buffers to store result set descriptions on the client/application server side.
ExecuteScalar
- As it needs just the first column of the first row, it can limit the amount of data retrieved.
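A minimal sketch of picking the lightest execution method for the statement (SQL Server provider, placeholder table names):

```csharp
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

using (var conn = new SqlConnection(connStr))
{
    conn.Open();

    // No result set expected: ExecuteNonQuery skips the result set description entirely.
    new SqlCommand("DELETE FROM Sessions WHERE Expired = 1", conn).ExecuteNonQuery();

    // Single value expected: ExecuteScalar reads only the first column of the first row.
    int active = (int)new SqlCommand("SELECT COUNT(*) FROM Sessions", conn).ExecuteScalar();
}
```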
CommandBuilder
You can manually write far better-performing UPDATE, INSERT or DELETE commands than CommandBuilder can possibly generate.
Though CommandBuilder is tempting because it speeds up development, it hampers application performance.
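As an illustration, a hand-written UpdateCommand can set only the columns that actually change and filter on the key alone, whereas a CommandBuilder-generated UPDATE typically sets every column and compares every original value; the names below are placeholders:

```csharp
using System.Data;
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder
var conn = new SqlConnection(connStr);

var adapter = new SqlDataAdapter("SELECT Id, Name, Address FROM Customers", conn);

// Hand-written command: only the changed column, keyed on the primary key.
var update = new SqlCommand("UPDATE Customers SET Name = @Name WHERE Id = @Id", conn);
update.Parameters.Add("@Name", SqlDbType.VarChar, 100, "Name");
update.Parameters.Add("@Id", SqlDbType.Int, 0, "Id");
adapter.UpdateCommand = update;      // used instead of new SqlCommandBuilder(adapter)
```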
DataReader vs. DataSet
If your application is fetching a huge amount of read-only data, a DataReader is the better choice, since it is more memory efficient.
If your application needs to work in a disconnected environment, with non-sequential updates, then a DataSet is the better choice.
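A minimal sketch contrasting the two (SQL Server provider, placeholder names):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

string connStr = "Server=myServer;Database=myDb;Integrated Security=true";  // placeholder

// Connected, forward-only streaming: the DataReader keeps only one row in memory at a time.
using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand("SELECT Id, Name FROM Customers", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine("{0} {1}", reader.GetInt32(0), reader.GetString(1));
    }
}

// Disconnected, updatable snapshot: the DataSet holds all rows in memory and can be
// modified offline, then synchronized back later through a DataAdapter.
var adapter = new SqlDataAdapter("SELECT Id, Name FROM Customers", connStr);
var ds = new DataSet();
adapter.Fill(ds, "Customers");
// ... edit ds while disconnected; adapter.Update(ds, "Customers") pushes the changes
// back once insert/update/delete commands are configured on the adapter
```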