This document provides an overview of advanced Hibernate concepts including:
- The persistence lifecycle and context
- Object equality, flush modes, and the persistence manager
- Transactions, contextual sessions, concurrency, caching
- Bulk operations, filters, interceptors, the event system
- Fetching plans and strategies for loading related objects
Persistence lifecycle: the states an object goes through during its life. One of the key factors of your success with Hibernate is your understanding of state management.
- Transient: not associated with any database table row, so its state is lost as soon as it's no longer referenced by any other object. Modifications of a transient instance aren't known to a persistence context (they're not transactional). Objects referenced only by other transient instances are themselves transient.
- Persistent: an entity instance with a database identity. Persistent instances are always associated with a persistence context.
- Detached: its state is no longer guaranteed to be synchronized with database state (it's no longer attached to a persistence context). Hibernate offers two operations, reattachment and merging, to reattach a detached instance.
One Session has one internal persistence context (it isn't something you see in your application). It is useful because it can be extended to span a long conversation.
Automatic dirty checking: Hibernate propagates state changes to the database as late as possible but hides this detail from the application (to keep lock times in the database as short as possible). Hibernate can detect exactly which properties have been modified, so it's possible to include only the columns that need updating in the SQL UPDATE (this may bring some performance gain, but it's usually not a significant difference and, in theory, could even harm performance in some environments). The dynamic-update setting goes in the class mapping. Consider it when you have an extraordinarily large number of columns in a table (say, more than 50).
Using the database identifier we can know whether two objects represent the same record in the database; in most cases the database identifier is the record's primary key. The question: is it possible for two different objects to represent the same record, or will two objects always represent two different records? For the scope of object identity there are three common choices:
- No identity scope: reading the same record twice in the same transaction yields two different objects. If we make two different changes on the two objects, how should we decide which state is propagated to the database?
- Session-scoped identity: at most one in-memory object per database record within a persistence context (this is what Hibernate guarantees).
- Process-scoped identity: one object per record in the whole JVM. The problem is that we need to synchronize the objects in multithreaded systems, which is expensive.
Do not treat detached objects as identical in memory
A Set doesn't allow duplicate entries; to determine whether an object already exists it uses equals(). The default implementation of equals() uses ==. The answer: if equals() is properly implemented, the Set's size is one; otherwise it is two. This can lead to problems if you treat detached instances as equal.
The database identifier can't be used in equals() for transient objects (they have no identifier yet). Comparing all properties may make two objects representing the same record unequal, e.g. two objects loaded in two Sessions where one has been changed.
Business key using the database identifier of an associated object: for example, a candidate business key for the Bid class is the identifier of the Item it was made for together with the bid amount. Another example: user name and company id as the business key for User.
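The business-key idea above can be sketched in plain Java. This User class and its field names are illustrative (not from the source); the point is that equals()/hashCode() use the business key (username + companyId), never the generated database identifier:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative entity: equality is based on the business key,
// not on the (possibly null) generated database identifier.
public class User {
    private Long id;              // database identifier; null while transient
    private final String username;
    private final Long companyId;

    public User(String username, Long companyId) {
        this.username = username;
        this.companyId = companyId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof User)) return false;
        User that = (User) other;
        return username.equals(that.username) && companyId.equals(that.companyId);
    }

    @Override
    public int hashCode() {
        return 31 * username.hashCode() + companyId.hashCode();
    }

    public static void main(String[] args) {
        // Two detached instances loaded in two different Sessions represent
        // the same row; with a business-key equals() a Set keeps one element.
        Set<User> users = new HashSet<>();
        users.add(new User("anna", 7L));
        users.add(new User("anna", 7L));
        System.out.println(users.size()); // 1 (would be 2 with default ==)
    }
}
```

With the default Object.equals() the Set above would contain two elements, which is exactly the detached-instances pitfall described earlier.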
Usefulness of write-behind: it coalesces many changes into a minimal number of database requests, keeps lock durations inside the database shorter, and takes advantage of JDBC batching.
The persistence manager is exposed through several interfaces: Session, Query, Criteria, and Transaction. Its services: CRUD, query execution, transaction management, and management of the persistence context. SAVE operation: a call to save() doesn't always mean an immediate SQL INSERT statement execution; that depends on the identifier generator. It's better to fully initialize the object before managing it with a Session.
get(): returns null if the object doesn't exist; always returns an initialized object (it hits the database). load(): throws ObjectNotFoundException if the row doesn't exist; may return a proxy (a placeholder wrapping the identifier).
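A rough, Hibernate-free simulation of the get()/load() difference. Real Hibernate proxies are runtime-generated subclasses; here the placeholder is a hand-written subclass, and the class and method names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of get() vs. load(): load() returns a placeholder
// that wraps only the id and hits the "database" on first state access.
public class LoadVsGet {
    static class Item {
        final long id;
        String name;
        Item(long id) { this.id = id; }
        String getName() { return name; }
    }

    static class LazyItem extends Item {
        private final Map<Long, Item> db;
        private boolean initialized = false;
        LazyItem(long id, Map<Long, Item> db) { super(id); this.db = db; }
        @Override String getName() {
            if (!initialized) {                    // first access: hit the database
                Item row = db.get(id);
                if (row == null) throw new RuntimeException("ObjectNotFound: " + id);
                this.name = row.name;
                initialized = true;
            }
            return name;
        }
    }

    final Map<Long, Item> database = new HashMap<>();

    Item get(long id) {   // hits the database immediately; null if absent
        return database.get(id);
    }

    Item load(long id) {  // returns a placeholder without touching the database
        return new LazyItem(id, database);
    }

    public static void main(String[] args) {
        LoadVsGet session = new LoadVsGet();
        Item bike = new Item(1L);
        bike.name = "bike";
        session.database.put(1L, bike);
        System.out.println(session.get(42L));      // null: get() returns null if absent
        Item placeholder = session.load(1L);       // no database hit yet
        System.out.println(placeholder.getName()); // first access initializes the state
    }
}
```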
Here the automatic dirty-checking service determines whether the object has changed.
The item object is in removed state after you call delete(); you shouldn't continue working with it, and in most cases you should make sure any reference to it in your application is removed. Do I have to load an object to delete it? Yes: an instance has to be in persistent state, loaded into the persistence context, to be removed (note that a proxy is good enough). The reason is interceptors, which must be able to see the object. Otherwise, use bulk operations. hibernate.use_identifier_rollback configuration option: Hibernate sets the database identifier property of the deleted item to null after deletion and flushing; it's then a clean transient instance.
Replication: retrieve objects from one database and store them in another.
- ReplicationMode.IGNORE: ignores the object when there is an existing database row with the same identifier in the target database.
- ReplicationMode.OVERWRITE: overwrites any existing database row with the same identifier in the target database.
- ReplicationMode.EXCEPTION: throws an exception if there is an existing database row with the same identifier in the target database.
- ReplicationMode.LATEST_VERSION: overwrites the row in the target database if its version is earlier than the version of the object, or ignores the object otherwise. Requires enabled Hibernate optimistic concurrency control.
It doesn't matter whether the item object is modified before or after it's passed to update(). The important thing is that the call to update() reattaches the detached instance to the new Session (and persistence context). Hibernate always treats the object as dirty and schedules an SQL UPDATE, which will be executed during flush. With select-before-update="true" (mapping setting), Hibernate determines whether the object is dirty by executing a SELECT statement and comparing the object's current state to the current database state.
Changes made before the call to lock() aren’t propagated to the database
Database transactions have to be short (to keep resources available). In practice, you also need a concept that allows you to have long-running conversations. Conversations allow the user of your application to have think-time while still guaranteeing atomic, isolated, and consistent behavior.
Hibernate doesn’t roll back in-memory changes to persistent objects
The main benefit, however, is tight integration with persistence context management —for example, a Session is flushed automatically when you commit
Is it faster to roll back read-only transactions? Some developers have found this to be faster in special circumstances, but the book's authors tested this with the more popular database systems and found no difference, and there is no reason why a database system should have a suboptimal commit implementation. There is no source of real numbers showing a performance difference. Always commit your transaction, and roll back only if the commit fails.
If you configure Hibernate to use CMT, it knows that it should flush and close a Session that participates in a system transaction automatically.
Using the "jta" session context: if no current Session is associated with the current JTA transaction, one will be started and associated with that transaction. Note that for backwards compatibility, if hibernate.current_session_context_class is not set but an org.hibernate.transaction.TransactionManagerLookup is configured, Hibernate will use org.hibernate.context.JTASessionContext. Sessions retrieved via getCurrentSession() in the "jta" context will be set to automatically flush before the transaction completes, close after the transaction completes, and aggressively release JDBC connections after each statement.
These options work when Hibernate manages JDBC transactions itself, but not when connections come from a managed datasource.
From time to time, it's useful to specify a more restrictive lock for a particular transaction. Switching all database connections to an isolation level higher than read committed is a bad default when scalability of the application is a concern; you need better isolation guarantees only for a particular unit of work.
- Transaction-scope cache: attached to the current unit of work, which may be a database transaction or even a conversation. It's valid and used only as long as the unit of work runs. Every unit of work has its own cache; data in this cache isn't accessed concurrently.
- Process-scope cache: shared between many (possibly concurrent) units of work or transactions. This means that data in the process-scope cache is accessed by concurrently running threads, obviously with implications for transaction isolation.
- Cluster-scope cache: shared between multiple processes on the same machine or between multiple machines in a cluster. Here, network communication is an important point worth considering.
It's neither necessary nor desirable to have identical objects in two concurrent threads, and locks held in memory should be avoided for web and enterprise applications; in a cluster, network communication is required for consistency. A process-scoped cache (returning data by value) is what Hibernate uses as the second-level cache; a cluster-scoped cache may also serve as the second-level cache.
- Transactional: available in a managed environment only; it guarantees full transactional isolation up to repeatable read, if required. Use this strategy for read-mostly data where it's critical to prevent stale data in concurrent transactions, in the rare case of an update.
- Read-write: maintains read committed isolation using a timestamping mechanism; available only in nonclustered environments. Again, use it for read-mostly data where it's critical to prevent stale data in concurrent transactions, in the rare case of an update.
- Nonstrict-read-write: makes no guarantee of consistency between the cache and the database. If there is a possibility of concurrent access to the same entity, you should configure a sufficiently short expiry timeout; otherwise, you may read stale data from the cache. Use this strategy if data hardly ever changes (many hours, days, or even a week) and a small likelihood of stale data isn't of critical concern.
- Read-only: a concurrency strategy suitable for data that never changes. Use it for reference data only.
- Session.evict(object): removes one instance from the first-level cache.
- Session.clear(): removes all instances from the first-level cache.
- SessionFactory.evict(class): removes all instances of the given type from the second-level cache.
- SessionFactory.evict(class, id): removes one instance from the second-level cache.
VIPCustomer must be a subclass of Customer.
Instead of retrieving the result of the query completely into memory, you open a cursor. A cursor is a pointer to a result set that stays in the database. To avoid memory exhaustion, you flush() and clear() the persistence context before loading the next 100 objects into it.
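The flush-every-100 pattern can be sketched without Hibernate: below, a plain list stands in for the persistence context and an iterator stands in for the ScrollableResults cursor (all names are illustrative). The point is that the context buffer never grows beyond one batch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of batch processing with periodic flush()/clear(): the
// simulated persistence context never holds more than BATCH objects.
public class BatchUpdate {
    static final int BATCH = 100;

    public static int process(Iterator<String> cursor) {
        List<String> persistenceContext = new ArrayList<>();
        int maxContextSize = 0;
        int count = 0;
        while (cursor.hasNext()) {
            persistenceContext.add(cursor.next());   // object loaded into the context
            count++;
            if (count % BATCH == 0) {
                maxContextSize = Math.max(maxContextSize, persistenceContext.size());
                persistenceContext.clear();          // flush() pending UPDATEs, then clear()
            }
        }
        maxContextSize = Math.max(maxContextSize, persistenceContext.size());
        persistenceContext.clear();
        return maxContextSize;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) rows.add("item" + i);
        System.out.println(process(rows.iterator())); // 100: capped at one batch
    }
}
```

Without the periodic clear, all 250 objects would stay managed at once, which is exactly the memory-exhaustion risk the text warns about.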
Note that you should disable the second-level cache for any batch operations; otherwise, each modification of an object during the batch procedure must be propagated to the second-level cache for that persistent class. This is an unnecessary overhead.
A filter is an alternative to a dynamic database view, with parameterization at runtime. For example, the currently logged-in application user may not have the rights to see everything.
Interceptors are useful for an audit log of all object modifications. Don't use the current Session inside the interceptor; use a temporary Session instead. The Interceptor interface has many other methods. You can use a marker interface to mark all classes that should be intercepted. postFlush() is the correct place for logging (auditing), because it guarantees that the identifier exists.
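A toy model of why postFlush() is the right audit point. The real hook is Hibernate's Interceptor/EmptyInterceptor; here the callbacks, the Entity class, and the flush cycle are all invented stand-ins:

```java
// Toy flush cycle: only after the flush has run does an inserted
// object have an identifier, so only postFlush() can audit it safely.
public class AuditSketch {
    static class Entity {
        Long id;              // assigned during flush (e.g. by an identity column)
        final String data;
        Entity(String data) { this.data = data; }
    }

    // What an audit interceptor would observe at each callback.
    static Long idSeenAtOnSave;
    static Long idSeenAtPostFlush;

    static void onSave(Entity e)    { idSeenAtOnSave = e.id; }
    static void postFlush(Entity e) { idSeenAtPostFlush = e.id; }

    static void runFlushCycle() {
        Entity e = new Entity("new item");
        onSave(e);        // before flush: no identifier yet
        e.id = 1L;        // the flush executes the INSERT and assigns the id
        postFlush(e);     // after flush: the id exists, safe to write the audit record
    }

    public static void main(String[] args) {
        runFlushCycle();
        System.out.println(idSeenAtOnSave);     // null: too early to audit the id
        System.out.println(idSeenAtPostFlush);  // 1: audit record can reference the row
    }
}
```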
You can create a stack of listeners
The figure: load() doesn't execute any SQL statement until it's necessary, in contrast to get(), which always hits the database. load() returns a proxy. Proxies are placeholders generated at runtime. A proxy is a subclass of the actual bean class, so all Hibernate beans need a constructor that is not private and takes no parameters.
Sometimes an entity should always be loaded into memory and no placeholder should be returned instead. Disabling proxy generation for a particular entity: load() will try to load the userBean without initialization, but when it tries to create the company proxy it can't, because proxying is disabled, so it executes an immediate SELECT statement to fetch both the userBean and the companyBean. Sometimes you want to specify that a particular association should always be loaded.
The goal of a fetching strategy is to minimize the number of SQL statements and to simplify them, so that querying can be as efficient as possible.
In this example, you tell Hibernate to pre-fetch up to 10 uninitialized proxies: instead of n+1 selects, you now see n/10+1 selects to retrieve the required collections into memory.
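The select-count arithmetic above can be checked directly, assuming one initial SELECT for the owning entities plus one SELECT per batch of up to batchSize proxies:

```java
// Number of SELECTs needed to initialize n lazy collections when
// Hibernate pre-fetches them in batches of `batchSize`, plus the one
// initial query for the owning entities.
public class BatchFetchMath {
    static int selects(int n, int batchSize) {
        int batches = (n + batchSize - 1) / batchSize; // ceil(n / batchSize)
        return batches + 1;
    }

    public static void main(String[] args) {
        System.out.println(selects(100, 1));   // 101 -> the n+1 selects problem
        System.out.println(selects(100, 10));  // 11  -> n/10 + 1 with batch-size="10"
    }
}
```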
Obviously, the seller is no longer lazily loaded on demand, but immediately, with a join. With lazy="false", you see an immediate second SELECT; with fetch="join", you get the seller loaded in the same single SELECT. Max fetch depth: recommended values are 1-5 tables; a value of 0 disables join fetching. Don't use a global fetch join; prefer dynamic fetching per query.
- FlushMode.COMMIT: doesn't flush before query execution (use it when you don't need to flush your modifications to the database before executing the query, because conflicting results aren't a problem).
- CacheMode.IGNORE: objects retrieved by this query aren't put in the second-level cache.
- setReadOnly(): disables the automatic dirty check for the returned objects.
- setTimeout(): limits how long a query is allowed to run.
- setFetchSize(): can improve data retrieval if you execute the query with list().
- setLockMode(): requests a pessimistic lock (held until the end of the database transaction).
- setComment(): Hibernate adds your custom comment to each SQL statement it writes to the logs.
- list(): one or several SELECT statements execute immediately, depending on your fetch plan.
- iterate(): Hibernate retrieves only the primary key (identifier) values of the entity objects in a first SQL SELECT, then tries to find the rest of each object's state in the persistence-context cache and (if enabled) the second-level cache; otherwise it fetches each record with an additional SELECT. It's effective only if the second-level cache is enabled for the iterated entity; otherwise it produces the n+1 selects problem.
- scroll(): uses a cursor held in the database. The cursor points to a particular row of the query result, and the application can move it forward and backward. Use it when a query may return results too large to hold in memory but you want all the data, retrieved in several steps. Close the cursor before the database transaction ends.
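A back-of-the-envelope model of why iterate() only pays off with a warm cache. The select counter and the Map-based cache are invented for illustration; real Hibernate consults the persistence context and then the second-level cache:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Models iterate(): one SELECT for the ids, then a cache lookup per id,
// falling back to one extra SELECT for every cache miss.
public class IterateCost {
    static int selectCount(List<Long> ids, Map<Long, String> secondLevelCache) {
        int selects = 1;                          // the initial id-only SELECT
        for (Long id : ids) {
            if (!secondLevelCache.containsKey(id)) {
                selects++;                        // cache miss: extra SELECT per row
            }
        }
        return selects;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L);
        System.out.println(selectCount(ids, new HashMap<>()));  // cold cache: 4 (n+1)
        Map<Long, String> warm = new HashMap<>();
        ids.forEach(id -> warm.put(id, "cached row"));
        System.out.println(selectCount(ids, warm));             // warm cache: 1
    }
}
```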
Any function called in the WHERE clause of an HQL statement that isn't known to Hibernate is passed directly to the database as an SQL function call. You have to register a function in your org.hibernate.Dialect to enable it in the SELECT clause of HQL.
These are scalar values, not entity instances, so they aren't in any persistent state like an entity instance would be; they aren't transactional and obviously aren't checked automatically for dirty state. The Object[]s returned by this query contain a Long at index 0, a String at index 1, and an int at index 2. Demo: how to use aggregate functions in the WHERE and SELECT clauses. Demo: how to use SQL functions in the WHERE and SELECT clauses. The StudentSummary class is a JavaBean; it doesn't have to be a mapped persistent entity class. On the other hand, if you use the SELECT NEW technique with a mapped entity class, all instances returned by your query are in transient state.
Projection options: one property; a list of properties; aggregation functions; dynamic instantiation.
An implicit join can be used across many-to-one or one-to-one associations. Demo: how to query components in the WHERE clause. Demo: how to use components in the SELECT clause.
You have to assign an alias to a joined association to use it for restrictions (inner join). Query 1 returns a list of Object[], where object[0] is the user and object[1] is the company. Query 2 returns only a list of users, rather than user/company pairs (if you access the company from a user via user.getCompany(), the company is loaded lazily). Addresses are not initialized until they are used, and a user is duplicated for each of his addresses; using DISTINCT, no duplicates are returned.
createCriteria() on a *-to-one association fetches the associated entity eagerly (inner join), while associated collections are still lazily loaded.
Both queries return a list of users, each with an initialized company, in the same query (even if lazy loading is enabled, both entities are loaded immediately); DISTINCT prevents the duplicate data. They use an outer join. Don't use iterate(), and don't use setMaxResults(), with fetch joins. Don't fetch more than one collection in parallel (Cartesian product); you can fetch as many single-valued associated objects as you like. Dynamic fetching ignores any fetching strategy you have defined in the mapping metadata. Duplicate references may be returned, so use DISTINCT or a Set to remove duplicates.
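Why a fetch join duplicates parent references, and how a Set (the in-memory analogue of SELECT DISTINCT) fixes it, simulated with plain collections. The user/address shape and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// An outer-join fetch returns one row per (user, address) pair, so the
// same parent appears once per child row; deduplicating with a
// LinkedHashSet keeps one reference and preserves order.
public class FetchJoinDuplicates {
    static List<String> joinRows(String user, int addressCount) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < addressCount; i++) {
            result.add(user);                 // same parent repeated per child row
        }
        return result;
    }

    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<String> rows = joinRows("anna", 3);
        System.out.println(rows.size());            // 3 duplicated references
        System.out.println(distinct(rows).size());  // 1 after dedup
    }
}
```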
createCriteria() on a *-to-one association fetches the associated entity eagerly, using an inner join by default. setFetchMode() can be used like createCriteria(), but you cannot add restrictions on the associated entity. For associated collections you have to use setFetchMode() to initialize the collection; it uses a left outer join. Use CriteriaSpecification.INNER_JOIN if you want eager fetching with an inner join (note that there are duplicates). Remove any duplicates.
The query result consists of ordered pairs
Hibernate lets you externalize query strings to the mapping file.