2. Index
Current approach (sum of entries)
● Current approach explained.
● Performance analysis.
Proposal: “Precalculated period sums”
● Alternative 1: Accumulated values using triggers
– Proposed by Ferdinand Gassauer (Chricar)
● Alternative 2: Period totals using the ORM
– Proposed by Borja L.S. (@NeoPolus)
● Current approach vs Precalculated period sums
3. Current approach: Sum of entries
Currently, each time you read the
credit/debit/balance of an account, OpenERP has
to recalculate it from the account entries
(move lines).
The magic is done by the “_query_get()” method
of account.move.line, which selects the lines to
consider, and the “__compute()” method of
account.account, which does the sums.
4. Inside the current approach
_query_get() filters: builds the “WHERE” part
of the SQL query that selects all the account
move lines involving a set of accounts.
● Allows complex filters, but they usually look like
“include non-draft entries from these periods for these
accounts”.
__compute() sums: uses the filter to query for
the sums of debit/credit/balance for the current
account and its children.
● Does just one SQL query for all the accounts. (nice!)
● Has to aggregate the children values in Python.
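The Python-side aggregation step described above can be sketched roughly as follows. This is only an illustration, not OpenERP's actual code: the `aggregate_children` helper, the `children` mapping and the sample figures are assumptions made for the example.

```python
def aggregate_children(own_sums, children):
    """Roll per-account sums up the account tree.

    own_sums: {account_id: (debit, credit)}, as one grouped SQL query would return.
    children: {account_id: [child_account_id, ...]} (hypothetical tree layout).
    Returns {account_id: (debit, credit, balance)} including all descendants.
    """
    totals = {}

    def total(account_id):
        # Compute each account once, then reuse the cached result.
        if account_id not in totals:
            debit, credit = own_sums.get(account_id, (0.0, 0.0))
            for child_id in children.get(account_id, []):
                child_debit, child_credit, _ = total(child_id)
                debit += child_debit
                credit += child_credit
            totals[account_id] = (debit, credit, debit - credit)
        return totals[account_id]

    for account_id in set(own_sums) | set(children):
        total(account_id)
    return totals


# Hypothetical branch of a chart: 4 -> 43 -> 430 -> 4300 -> 430000.
# Only the leaf account holds move lines.
children = {4: [43], 43: [430], 430: [4300], 4300: [430000]}
own_sums = {430000: (650.0, 400.0)}
result = aggregate_children(own_sums, children)
print(result[4])
```

Every read of a parent account has to visit all of its descendants, which is why the number of children matters for performance.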
5. Sample query done by __compute
SELECT l.account_id as id,
       COALESCE(SUM(l.debit), 0) as debit,
       COALESCE(SUM(l.credit), 0) as credit,
       COALESCE(SUM(l.debit), 0) - COALESCE(SUM(l.credit), 0) as balance
FROM account_move_line l
-- Account + children = lot of ids!
WHERE l.account_id IN (2, 3, 4, 5, 6, ..., 1648, 1649, 1650, 1651)
  AND l.state <> 'draft'
  AND l.period_id IN (SELECT id FROM account_period WHERE fiscalyear_id IN (1))
  AND l.move_id IN (SELECT id FROM account_move WHERE account_move.state = 'posted')
GROUP BY l.account_id
6. Sample query plan
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=57.83..57.85 rows=1 width=18)
  -> Nested Loop Semi Join (cost=45.00..57.82 rows=1 width=18)
       Join Filter: (l.period_id = account_period.id)
       -> Nested Loop (cost=45.00..57.52 rows=1 width=22)
            -> HashAggregate (cost=45.00..45.01 rows=1 width=4)
                 -> Seq Scan on account_move (cost=0.00..45.00 rows=1 width=4)
                      Filter: ((state)::text = 'posted'::text)
            -> Index Scan using account_move_line_move_id_index on account_move_line l (cost=0.00..12.49 rows=1 width=26)
                 Index Cond: (l.move_id = account_move.id)
                 Filter: (((l.state)::text <> 'draft'::text) AND (l.account_id = ANY ('{2,3,4,5, ..., 1649,1650,1651}'::integer[])))
       -> Index Scan using account_period_fiscalyear_id_index on account_period (cost=0.00..0.29 rows=1 width=4)
            Index Cond: (account_period.fiscalyear_id = 1)
(Seq Scan on account_move) Ugh! A sequential scan on a table with (potentially) lots of records... :(
7. Performance Analysis
Current approach big O 1/2
“Selects all the account move lines”
The query complexity depends on l, the
number of move lines for that account and
(recursive) children:
O(query) = O(f(l))
“Has to aggregate the children values”
The complexity depends on c, the number of
children.
O(aggregate) = O(g(c))
8. Current approach big O 2/2
O(__compute) = O(query) + O(aggregate)
O(__compute) = O(f(l)) + O(g(c))
What kind of functions are f and g?
Let's do some empirical testing (more fun than
maths, isn't it?)...
9. Let's test this chart... 1/2
The official Spanish
chart of accounts, when
empty:
Has about 1600
accounts.
Has 5 levels.
(to test this chart of
accounts install the
l10n_es module)
10. Let's test this chart... 2/2
How many accounts below each level?
Account code                        Number of children (recursive)
Level 5 - 430000 (leaf account)     0
Level 4 - 4300                      1
Level 3 - 430                       6
Level 2 - 43                        43
Level 1 - 4                         192
Level 0 - 0 (root account)          1678
To get the balance of account “4” we need to sum the balance of 192 accounts!
11. OK, it looks like the number of children c has a
lot of influence, and the number of move lines l
has little or no influence: g(c) >> f(l)
Let's split them...
12. Now it is clear that g(c) is linear!
(note: the number of children grows exponentially)
O(g(c)) = O(c)
14. Big O - Conclusion
O(__compute) = O(l) + O(c)
c has an unexpectedly big influence on the
results
=> Bad performance on complex charts of
accounts!
c does not grow with time, but l does...
=> OpenERP accounting becomes slower and
slower with time! (though it's not as bad as expected)
15. Proposal: Precalculated sums
OpenERP recalculates the debit/credit/balance
from move lines each time.
Most accounting programs store the totals per
period (or the cumulative values) for each
account. Why?
● Reading the debit/credit/balance becomes much
faster.
● ...and reading is much more data intensive than
writing:
– Accounting reports read lots of accounts, lots of times.
– Accountants only update a few accounts at a time.
16. Is it really faster?
Precalculated sums per period means:
● An O(p) query (get the debit/credit/balance of each
period for that account) instead of an O(l) query, with
p being the number of periods, p << l.
Using opening entries, or cumulative totals, p
becomes constant => O(1)
● If aggregated sums (with children values) are also
precalculated, we don't have to do one
O(c) aggregation per read.
It's O(1) for reading!!
(but creating/editing entries is a bit slower)
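The trade-off above (constant-time reads, slightly slower writes) can be illustrated with a small sketch. It is a toy in-memory model under stated assumptions, not a proposal for the real schema: the `PrecalculatedSums` class and the `parent_of` mapping are hypothetical names.

```python
from collections import defaultdict


class PrecalculatedSums:
    """Toy model: every account stores its balance including children."""

    def __init__(self, parent_of):
        # parent_of: {account_id: parent_account_id or None} (hypothetical).
        self.parent_of = parent_of
        self.balance = defaultdict(float)

    def post(self, account_id, debit=0.0, credit=0.0):
        # Writing walks up the tree, so its cost grows with the chart depth.
        node = account_id
        while node is not None:
            self.balance[node] += debit - credit
            node = self.parent_of.get(node)

    def read(self, account_id):
        # Reading is a single lookup: no move lines, no children to sum.
        return self.balance[account_id]


parent_of = {430000: 4300, 4300: 430, 430: 43, 43: 4, 4: None}
sums = PrecalculatedSums(parent_of)
sums.post(430000, debit=400.0)
sums.post(430000, credit=150.0)
print(sums.read(4))
```

With a 5-level chart each write touches at most a handful of rows, while each read no longer depends on l or c.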
17. Alternative 1: Accumulated values
using triggers (I)
Proposed by Ferdinand Gassauer.
How does it work?
● New object to store the accumulated
debit/credit/balance per account and period (let's
call it account.period.sum).
                             Opening   1st         2nd   3rd    4th
Move line values in period   400       +200, +50   +25   -400   +25, +200
Value in table               400       650         675   275    500
● Triggers on Postgres (PL/pgSQL) update the
account_period_sum table each time an account
move line is created/updated/deleted.
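The arithmetic behind the accumulated values can be sketched in Python. The real proposal uses PL/pgSQL triggers; this snippet only mirrors what such a trigger would have to maintain, and the helper names (`accumulated_values`, `apply_new_line`) are made up for the example.

```python
from itertools import accumulate

# Period order and move line values taken from the example table above.
PERIODS = ["Opening", "1st", "2nd", "3rd", "4th"]
move_lines = {
    "Opening": [400],
    "1st": [200, 50],
    "2nd": [25],
    "3rd": [-400],
    "4th": [25, 200],
}


def accumulated_values(lines_by_period):
    """Running total of the per-period sums, in period order."""
    totals = [sum(lines_by_period[name]) for name in PERIODS]
    return dict(zip(PERIODS, accumulate(totals)))


def apply_new_line(table, period, amount):
    """Mimic an insert trigger: bump this period and every later one."""
    start = PERIODS.index(period)
    for name in PERIODS[start:]:
        table[name] += amount


initial = accumulated_values(move_lines)
updated = dict(initial)
apply_new_line(updated, "2nd", 100)
print(initial)
print(updated)
```

Note that a new line in an early period changes the stored value of every later period, which is why the periods need a well-defined order.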
18. Alternative 1: Accumulated values
using triggers (II)
How does it work?(cont.)
● The data is calculated accumulating the values from
previous periods. (Ferdinand's prototype requires a special naming of
periods for this).
● Creates SQL views based on the
account_period_sum table.
● For reports that show data aggregated by period:
– New reports can be created that either directly use the
SQL views, or use the account.period.sum object.
● The account.account.__compute() method could be
extended to optimize queries (modified to make use
of the account_period_sum when possible) in the
future.
19. Alternative 1: Accumulated values
using triggers (III)
Good points
● Triggers guarantee that the data is always in sync.
(even if somebody writes directly to the database!)
● Triggers are fast.
● Prototype available and working! - “used this method
already in very big installations - some 100 accountants
some millions moves without any problems” (Ferdinand)
Bad points
● Database dependent triggers.
● Triggers are harder to maintain than Python code.
● Makes some assumptions on period names.
(as OpenERP currently does not flag opening periods apart
from closing ones)
20. Alternative 2: Period totals using the
ORM (I)
Proposed by Borja L.S. (@NeoPolus).
How does it work?
● New object to store the debit/credit/balance sums
per account and period (and state):
                             Opening   1st         2nd   3rd    4th
Move line values in period   400       +200, +50   +25   -400   +25, +200
Value in table               400       250         25    -400   225
● Extends the account.move.line open object to
update the account.sum objects each time a line is
created/updated/deleted.
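A minimal sketch of the bookkeeping this alternative implies is shown below. It deliberately avoids the real OpenERP ORM API: `PeriodSums` is a hypothetical in-memory stand-in for the proposed account.sum object, and the `on_create` / `on_write` / `on_unlink` hooks are assumed names for wherever the move line's create/update/delete logic would be extended.

```python
from collections import defaultdict


class PeriodSums:
    """Hypothetical stand-in for per-account, per-period totals."""

    def __init__(self):
        self.sums = defaultdict(lambda: {"debit": 0.0, "credit": 0.0})

    def _apply(self, line, sign):
        entry = self.sums[(line["account_id"], line["period_id"])]
        entry["debit"] += sign * line["debit"]
        entry["credit"] += sign * line["credit"]

    def on_create(self, line):
        self._apply(line, +1)

    def on_unlink(self, line):
        self._apply(line, -1)

    def on_write(self, old_line, new_line):
        # An update is handled as "remove old values, add new values".
        self._apply(old_line, -1)
        self._apply(new_line, +1)

    def balance(self, account_id, period_ids):
        return sum(
            self.sums[(account_id, p)]["debit"] - self.sums[(account_id, p)]["credit"]
            for p in period_ids
        )


def line(period_id, debit=0.0, credit=0.0, account_id=430000):
    return {"account_id": account_id, "period_id": period_id,
            "debit": debit, "credit": credit}


# Same figures as the table above; periods 0..4 = Opening, 1st, 2nd, 3rd, 4th.
sums = PeriodSums()
for new_line in [line(0, debit=400), line(1, debit=200), line(1, debit=50),
                 line(2, debit=25), line(3, credit=400),
                 line(4, debit=25), line(4, debit=200)]:
    sums.on_create(new_line)

first_period = sums.balance(430000, [1])
year_total = sums.balance(430000, [0, 1, 2, 3, 4])
sums.on_write(line(3, credit=400), line(3, credit=300))
after_edit = sums.balance(430000, [0, 1, 2, 3, 4])
```

Unlike the accumulated values of Alternative 1, an edit only touches the total of its own period; the price is that a read over several periods has to add up one row per period.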
21. Alternative 2: Period totals using the
ORM (II)
How does it work?(cont.)
● Extends account.account.__compute() method to
optimize queries:
– If the query filters only by period/fiscal year/state, the
data is retrieved from the account.sum object.
– If the query filters by dates, and one or more fiscal
periods are fully included in that range, the data is
retrieved from the account.sum objects (for the range
covered by the periods) plus the account.move.lines (the
range not covered by periods).
– Filtering by every other field (for example partner_id)
causes a fallback into the normal __compute method.
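The date-range case above can be sketched as a helper that splits a range into fully covered periods plus leftover date ranges. This is an illustrative assumption about how the split might work, not existing code: `split_date_range` is a hypothetical name, and the sketch assumes periods are sorted, non-overlapping and contiguous.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def split_date_range(date_from, date_to, periods):
    """Split [date_from, date_to] into covered periods and leftover ranges.

    periods: list of (period_id, start, stop) with inclusive dates,
             sorted by start, non-overlapping and contiguous (assumption).
    Returns (covered_period_ids, leftover_date_ranges).
    """
    covered = [p for p in periods if date_from <= p[1] and p[2] <= date_to]
    if not covered:
        # No period fits entirely: everything comes from the move lines.
        return [], [(date_from, date_to)]
    leftovers = []
    first_start, last_stop = covered[0][1], covered[-1][2]
    if date_from < first_start:
        leftovers.append((date_from, first_start - ONE_DAY))
    if last_stop < date_to:
        leftovers.append((last_stop + ONE_DAY, date_to))
    return [p[0] for p in covered], leftovers


# Hypothetical quarterly periods for 2010.
quarters = [
    (1, date(2010, 1, 1), date(2010, 3, 31)),
    (2, date(2010, 4, 1), date(2010, 6, 30)),
    (3, date(2010, 7, 1), date(2010, 9, 30)),
    (4, date(2010, 10, 1), date(2010, 12, 31)),
]
period_ids, leftovers = split_date_range(date(2010, 2, 15), date(2010, 9, 30), quarters)
print(period_ids, leftovers)
```

The covered periods would be read from the precalculated sums, while only the leftover ranges would still need a query on the move lines.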
22. Alternative 2: Period totals using the
ORM (III)
Good points
● Database independent.
● Optimizes all the accounting.
● Flexible.
● No PL/pgSQL triggers required, just Python
=> Easier to maintain.
Bad points
● Does not guarantee that the sums are in sync with the
move lines.
(but nobody should directly alter the database in the first place...)
● Python is slower than using triggers.
● No prototype yet! :)
(But take a look at Tryton stock quantity computation)
23. Current approach VS Period sums
Current approach
Pros
● No redundant data.
● Simpler queries.
Cons
● Slow.
– Reports and dashboard charts/tables are
performance hungry.
● Becomes even slower with time.
Precalculated sums
Pros
● Fast, always.
● Drill-down navigation.
Cons
● Need to keep sums in sync with move lines.
● More complex (__compute) or specific queries to
make use of the precalculated sums.