In the following slides, I will share solutions that have worked for us with the Django ORM. Not all of them are Django ORM issues; some are approaches that have also proven effective with other techniques.
5. Case 1) Implementing an idempotent function
django.db.models.QuerySet.update_or_create()*
* https://docs.djangoproject.com/en/1.11/ref/models/querysets/#update-or-create
6. A P2P company settles the deposited amount every day.
[Diagram: Borrower → Repayment → P2P Company → Settlement → Investors]
7. And humans can always make mistakes.
[Diagram: the same flow, but the settlement to investors happens 2 times!?]
8. We prevented this problem with the idempotent update_or_create() function.
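def settle(loan_id, sequence, amount):
    # Looks up the row by (loan_id, sequence): creates it only if it is missing,
    # otherwise just updates 'amount'. Calling it twice does not settle twice.
    # Without a unique constraint on (loan_id, sequence), two concurrent calls
    # could still both create a row.
    settlement, created = Settlement.objects.update_or_create(
        loan_id=loan_id, sequence=sequence,
        defaults={'amount': amount}
    )
    if created:
        return 'Settlement completed!'
    return 'You have already settled this!'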
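The same idea can be sketched one level below the ORM. This is a minimal, hypothetical sketch using Python's built-in sqlite3, not the original code: a UNIQUE constraint on (loan_id, sequence) plus INSERT OR IGNORE turns a repeated settlement into a no-op. Unlike update_or_create(), it does not update the amount on the second call; the table and column names are assumptions.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint is what makes a second settlement impossible.
    conn.execute(
        "CREATE TABLE settlement ("
        " loan_id INTEGER, sequence INTEGER, amount INTEGER,"
        " UNIQUE (loan_id, sequence))"
    )
    return conn


def settle(conn, loan_id, sequence, amount):
    # INSERT OR IGNORE silently skips the row if (loan_id, sequence) exists.
    cur = conn.execute(
        "INSERT OR IGNORE INTO settlement (loan_id, sequence, amount) VALUES (?, ?, ?)",
        (loan_id, sequence, amount),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted, 0 when the insert was ignored.
    if cur.rowcount == 1:
        return "Settlement completed!"
    return "You have already settled this!"
```

Calling settle(conn, 1, 1, 100) twice returns "Settlement completed!" and then "You have already settled this!", and the table still holds one row.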
9. Case 2) Overriding Predefined Model Methods
django.db.models.Model.save()
https://docs.djangoproject.com/en/1.11/ref/models/instances/#django.db.models.Model.save
10. We needed to show the number of investors for each loan,
11. but running COUNT(*) for every loan adds overhead.
12. We solved this problem with denormalization and the Model.save() function.
[Diagram: instead of running count() over a Loan's Investment rows (Investment, Investment, Investment, …), the Loan keeps its own investor_count field.]
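class Investment(models.Model):
    ...
    def save(self, *args, **kwargs):
        # Bump the denormalized counter only when this investment is first
        # inserted; re-saving an existing investment must not count it again.
        if self._state.adding:
            self.loan.investor_count += 1
            self.loan.save()
        super().save(*args, **kwargs)
    ...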
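To see why the counter should only move on insert, here is a minimal sketch of the same denormalization in plain sqlite3 (our assumption, not the original code; table names are hypothetical). It also increments in SQL (investor_count = investor_count + 1) rather than reading, adding, and writing back in Python, which avoids lost updates; in Django, an update() with an F('investor_count') + 1 expression produces this kind of UPDATE.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE loan (id INTEGER PRIMARY KEY, investor_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE investment (id INTEGER PRIMARY KEY, loan_id INTEGER, amount INTEGER);
        INSERT INTO loan (id) VALUES (1);
    """)
    return conn


def save_investment(conn, loan_id, amount, investment_id=None):
    with conn:  # one transaction: commit on success, roll back on error
        if investment_id is None:
            # New investment: insert it and bump the cached counter in the same transaction.
            cur = conn.execute(
                "INSERT INTO investment (loan_id, amount) VALUES (?, ?)", (loan_id, amount))
            conn.execute(
                "UPDATE loan SET investor_count = investor_count + 1 WHERE id = ?", (loan_id,))
            return cur.lastrowid
        # Existing investment: update it without touching the counter.
        conn.execute("UPDATE investment SET amount = ? WHERE id = ?", (amount, investment_id))
        return investment_id


def investor_count(conn, loan_id):
    return conn.execute(
        "SELECT investor_count FROM loan WHERE id = ?", (loan_id,)).fetchone()[0]
```

After two new investments and one edit of the first, investor_count is 2, matching COUNT(*) on the investment table.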
15. Note 1: You can specify which fields to save with ‘update_fields’.
Note 2: If you use ‘update_fields’, a field with ‘auto_now’ is not updated unless you list it in ‘update_fields’ too.
https://code.djangoproject.com/ticket/22981
https://docs.djangoproject.com/en/1.11/ref/models/fields/#django.db.models.DateField.auto_now
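class Investment(models.Model):
    ...
    def save(self, *args, **kwargs):
        if self._state.adding:
            self.loan.investor_count += 1
            # Save only these columns. 'updated_at' has auto_now, so it must be
            # listed explicitly or it will not be refreshed.
            self.loan.save(update_fields=['investor_count', 'updated_at'])
        super().save(*args, **kwargs)
    ...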
17. Enum* has been available since Python 3.4,
but we couldn't use it directly in Django:
model field choices are plain (value, label) tuples, so an enum value is just a string.
* https://docs.python.org/3/library/enum.html
GRADES = (
    ('A1', 'A1'),
    ('A2', 'A2'),
    ('A3', 'A3'),
    ...
)
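To make the gap concrete, here is a small sketch (our illustration, not from the slides; the Grade enum is hypothetical) of what it takes to feed a Python Enum into Django-style choices: the enum has to be flattened into plain (value, label) tuples, and queries then compare against the plain string.

```python
from enum import Enum


class Grade(Enum):
    A1 = "A1"
    A2 = "A2"
    A3 = "A3"


# Django field choices expect (stored value, human-readable label) pairs,
# so the enum is flattened into plain tuples before it can be used.
GRADES = tuple((grade.value, grade.name) for grade in Grade)

# Filtering would then use the plain string, e.g. grade=Grade.A1.value ("A1").
```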
18. And we found the ‘django-model-utils*’ package.
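from django.db import models
from model_utils import Choices

class Loan(models.Model):
    GRADES = Choices(
        'A1', 'A2', 'A3', ...
    )
    # CharField requires max_length (the value here is for illustration).
    grade = models.CharField(max_length=2, choices=GRADES)
    ...

def use_choices():
    # Loan.GRADES.A1 evaluates to the stored value 'A1'.
    loans = Loan.objects.filter(
        grade=Loan.GRADES.A1
    )
    ...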
* https://github.com/jazzband/django-model-utils/
19. Note: You can also use integers (or other types) as choice values.
Then the integer values are stored in the database.
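from django.db import models
from model_utils import Choices

class Loan(models.Model):
    # Each triple is (database value, Python attribute name, human-readable label).
    GRADES = Choices(
        (1, 'CODE1', 1), (2, 'CODE2', 2), ...
    )
    # Integer values need an integer column, not a CharField.
    grade = models.IntegerField(choices=GRADES)
    ...

def use_choices():
    loans = Loan.objects.filter(
        grade=Loan.GRADES.CODE1  # evaluates to 1
    )
    ...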
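The mechanics behind Choices can be shown with a tiny stand-in. SimpleChoices below is hypothetical and is not model_utils' implementation; it only mimics the two behaviours used on these slides: attribute access returns the database value, and iterating yields (db_value, label) pairs for a field's choices.

```python
class SimpleChoices:
    """Hypothetical, minimal stand-in for model_utils.Choices."""

    def __init__(self, *choices):
        self._db_values = {}  # attribute name -> value stored in the database
        self._pairs = []      # (db_value, label) pairs for a field's `choices`
        for choice in choices:
            if isinstance(choice, tuple):
                db_value, attr, label = choice
            else:
                # A bare string is its own value, attribute name, and label.
                db_value = attr = label = choice
            self._db_values[attr] = db_value
            self._pairs.append((db_value, label))

    def __getattr__(self, attr):
        # Only called for names not found normally, e.g. GRADES.A1.
        db_values = self.__dict__.get("_db_values", {})
        if attr in db_values:
            return db_values[attr]
        raise AttributeError(attr)

    def __iter__(self):
        return iter(self._pairs)
```

With SimpleChoices('A1', 'A2'), GRADES.A1 is 'A1'; with SimpleChoices((1, 'CODE1', 'Code 1')), CODES.CODE1 is 1, which is the integer that ends up in the database.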
22. Our (internal) customers wanted to know the exact time of certain actions.
Unfortunately, they asked about them a year later!
So we always add created_at and updated_at fields to every model.
24. And we thought an abstract base model would make this better.
from django.db import models

class TimeStampedModel(models.Model):
    created_datetime = models.DateTimeField(auto_now_add=True)
    updated_datetime = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True  # no table of its own; the fields are copied into subclasses

class Loan(TimeStampedModel):
    amount = models.FloatField()
    interest = models.FloatField()
    ...
25. And we applied the same pattern further.
class Loan(TimeStampedModel):
    amount = models.FloatField()
    interest = models.FloatField()

    class Meta:
        abstract = True

class SecuredLoan(Loan):
    # on_delete is required from Django 2.0; CASCADE was the implicit default before.
    security = models.ForeignKey(Security, on_delete=models.CASCADE)
    ...

class UnsecuredLoan(Loan):
    credit_grade = models.IntegerField()
    ...
29. And we solved this problem with abstraction again.
First of all, we added verbose_name to every model field.
class Loan(TimeStampedModel):
    amount = models.FloatField(verbose_name='Amount')
    interest = models.FloatField(verbose_name='Interest')

    class Meta:
        abstract = True

class UnsecuredLoan(Loan):
    credit_grade = models.IntegerField(verbose_name='Credit Grade')
    ...
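31. Then we made a small program to get the raw data.
from functools import partial

import xlsxwriter


def get_field_names_by_verbose_name(model, verbose_name):
    # Hypothetical implementation of the helper the program relies on.
    for field in model._meta.get_fields():
        if getattr(field, 'verbose_name', None) == verbose_name:
            return field.name
    raise LookupError('No field with verbose_name %r' % verbose_name)


def export_to_xlsx(model, **conditions):  # hypothetical wrapper name
    verbose_names = ('Amount', 'Interest', 'Credit Grade')
    field_names = tuple(map(partial(get_field_names_by_verbose_name, model), verbose_names))
    loans = model.objects.filter(**conditions)
    with xlsxwriter.Workbook('/tmp/temp.xlsx') as workbook:
        worksheet = workbook.add_worksheet()
        # One spreadsheet row per loan, one column per requested field.
        for row_num, loan in enumerate(loans):
            row = [getattr(loan, field_name) for field_name in field_names]
            worksheet.write_row(row_num, 0, row)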
36. Our (internal) customers asked questions like these:
Could you calculate a loan's interest amount for this month?
Starting from today, how much can we earn from that loan?
How much principal remains on that loan?
…
And we found common ground:
Calculation based on the remaining principal
Summation of the remaining interest
Summation of the remaining principal
41. So we defined some filters
.filter(
    loan=loan,
    status__in=REPAYMENT_STATUS.COMPLETED
)
.filter(
    loan=loan
).exclude(
    status__in=REPAYMENT_STATUS.COMPLETED
)
42. And we moved them into a custom manager on the model.
from django.db import models

class RepaymentManager(models.Manager):
    def completed(self, loan):
        return self.filter(
            loan=loan,
            status__in=REPAYMENT_STATUS.COMPLETED
        )

    def not_completed(self, loan):
        return self.filter(
            loan=loan
        ).exclude(
            status__in=REPAYMENT_STATUS.COMPLETED
        )

class Repayment(models.Model):
    objects = RepaymentManager()
    ...
43. And we used it everywhere.
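from django.db.models import Sum
from django.db.models.functions import Coalesce
...
remaining_principal = Repayment.objects.not_completed(
    loan=loan
).aggregate(
    # Coalesce turns NULL (no matching rows) into 0.
    remaining_principal=Coalesce(Sum('principal'), 0)
)['remaining_principal']
...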
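For readers who want to see what the not_completed() + aggregate() chain boils down to, here is a rough SQL-level sketch in sqlite3. The table layout, the sample rows, and the COMPLETED tuple are assumptions; the point is the WHERE ... NOT IN filter and COALESCE, which turns SUM's NULL (no rows left) into 0.

```python
import sqlite3

COMPLETED = ("COMPLETED",)  # hypothetical set of "completed" status values

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repayment (loan_id INTEGER, status TEXT, principal INTEGER)")
conn.executemany(
    "INSERT INTO repayment VALUES (?, ?, ?)",
    [(1, "COMPLETED", 100), (1, "PLANNED", 200), (1, "PLANNED", 300), (2, "COMPLETED", 50)],
)


def remaining_principal(loan_id):
    placeholders = ", ".join("?" for _ in COMPLETED)
    (value,) = conn.execute(
        "SELECT COALESCE(SUM(principal), 0) FROM repayment "
        f"WHERE loan_id = ? AND status NOT IN ({placeholders})",
        (loan_id, *COMPLETED),
    ).fetchone()
    return value
```

Here loan 1 has 500 left, while loan 2 (fully repaid) returns 0 instead of None.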
46. We tried to show each investor a summary grouped by loan status.
47. So we first filtered by a few conditions.
schedules = Schedule.objects.filter(
    user_id=user_id,
    planned_date__gte=start_date,
    planned_date__lt=end_date
)
48. Then we grouped the rows with values().
If you call values() before annotate(), the annotations are computed per group of the listed fields, like a SQL GROUP BY.
from django.db.models import Count, Sum

# AbsoluteSum is the custom aggregate described in Case 9.
schedules = schedules.values('status').annotate(
    cnt=Count('loan_id', distinct=True),
    sum_principal=AbsoluteSum('principal'),
    sum_interest=Sum('interest'),
    sum_commission=Sum('commission'),
    sum_tax=Sum('tax')
)
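The values('status').annotate(...) chain corresponds roughly to a GROUP BY query. The sqlite3 sketch below uses made-up sample rows and a trimmed column set (no commission or tax) to show the filter plus grouping, including COUNT(DISTINCT loan_id); it is an illustration, not the SQL Django emits verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedule (user_id INTEGER, loan_id INTEGER, status TEXT,"
    " planned_date TEXT, principal INTEGER, interest INTEGER)"
)
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?, ?, ?, ?)",
    [
        (7, 1, "PLANNED", "2017-05-10", 100, 10),
        (7, 1, "PLANNED", "2017-05-20", 100, 10),
        (7, 2, "PLANNED", "2017-05-15", 50, 5),
        (7, 3, "COMPLETED", "2017-05-01", 70, 7),
        (7, 3, "COMPLETED", "2017-06-01", 70, 7),  # outside the date range
        (8, 4, "PLANNED", "2017-05-12", 999, 99),  # another user
    ],
)

# Filter first (user and date range), then aggregate once per status.
rows = conn.execute(
    "SELECT status, COUNT(DISTINCT loan_id) AS cnt,"
    " SUM(principal) AS sum_principal, SUM(interest) AS sum_interest"
    " FROM schedule"
    " WHERE user_id = ? AND planned_date >= ? AND planned_date < ?"
    " GROUP BY status ORDER BY status",
    (7, "2017-05-01", "2017-06-01"),
).fetchall()
```

Loan 1 appears twice under PLANNED but is counted once, so rows is [("COMPLETED", 1, 70, 7), ("PLANNED", 2, 250, 25)].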
54. That was not the end. They wanted the results sorted in a specific order.
So we used a trick: map each status to a label with a sortable prefix.
from django.db.models import Case, CharField, Value, When

custom_status_annotation = Case(
    When(status__in=(PLANNED, SETTLING), then=Value('02_PLANNED')),
    When(status__in=(DELAYED, OVERDUE,), then=Value('03_DELAYED')),
    When(status__in=(LONG_OVERDUE,), then=Value('04_LONG_OVERDUE')),
    When(status__in=(SOLD,), then=Value('05_SOLD')),
    default=Value('01_COMPLETED'),
    output_field=CharField(),
)
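Why the numeric prefixes work: once each status is mapped to a label like '02_PLANNED', an ordinary string sort yields the desired business order, and the mapping also merges several statuses into one group. The sqlite3 sketch below shows the equivalent CASE expression used for grouping and ordering; the status strings and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule (status TEXT, principal INTEGER)")
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?)",
    [("SOLD", 1), ("COMPLETED", 2), ("OVERDUE", 3), ("PLANNED", 4), ("DELAYED", 5)],
)

QUERY = """
SELECT CASE
         WHEN status IN ('PLANNED', 'SETTLING') THEN '02_PLANNED'
         WHEN status IN ('DELAYED', 'OVERDUE') THEN '03_DELAYED'
         WHEN status IN ('LONG_OVERDUE') THEN '04_LONG_OVERDUE'
         WHEN status IN ('SOLD') THEN '05_SOLD'
         ELSE '01_COMPLETED'
       END AS custom_status,
       SUM(principal) AS sum_principal
FROM schedule
GROUP BY custom_status
ORDER BY custom_status  -- plain string order now matches the business order
"""
result = conn.execute(QUERY).fetchall()
```

DELAYED and OVERDUE collapse into one '03_DELAYED' group, and the result comes back in prefix order.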
57. Case 9) Custom Functions
AbsoluteSum*
* https://gist.github.com/iandmyhand/b2c32311715113e8c470932a053a6732
58. We stored transaction values like this:
Category     Value
Deposit      ₩100,000
Investment   -₩100,000
Settlement   ₩100,100
Withdraw     -₩50,100
59. To get a user's balance, we needed to sum all the values:
₩100,000 - ₩100,000 + ₩100,100 - ₩50,100 = ₩50,000 (balance)
60. But our (internal) customers wanted to know the total transaction amount, i.e. the sum of the absolute values:
₩100,000 + ₩100,000 + ₩100,100 + ₩50,100 = ₩350,200 (total transaction amount)
61. So we created a custom ORM function.
from django.db.models import IntegerField, Sum

class AbsoluteSum(Sum):
    name = 'AbsoluteSum'
    # Renders SUM(ABS(<expression>)) instead of SUM(<expression>).
    template = '%(function)s(%(absolute)s(%(expressions)s))'

    def __init__(self, expression, **extra):
        super(AbsoluteSum, self).__init__(
            expression, absolute='ABS', output_field=IntegerField(), **extra)

    def __repr__(self):
        # str.format() needs {} placeholders; a '%s' would be left unfilled.
        return "SUM(ABS({}))".format(
            self.arg_joiner.join(str(arg) for arg in self.source_expressions)
        )
62. And we used it.
# aggregate() computes over the whole table and returns a dict;
# annotate() without values() would compute one result per row instead.
result = Statement.objects.aggregate(
    absolute_sum=AbsoluteSum('amount'),
    normal_sum=Sum('amount'),
)
# The generated SQL looks like this (MySQL quoting):
# SELECT
#     SUM(ABS(`statement`.`amount`)) AS `absolute_sum`,
#     SUM(`statement`.`amount`) AS `normal_sum`
# FROM `statement`
print(result['absolute_sum'])  # 350200
print(result['normal_sum'])  # 50000
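The two numbers can be checked against the transaction table from slide 58 with a few lines of sqlite3 (a sketch with an assumed table layout, not the production schema): SUM(ABS(amount)) gives the total transaction amount, while SUM(amount) gives the balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statement (category TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO statement VALUES (?, ?)",
    [("Deposit", 100000), ("Investment", -100000), ("Settlement", 100100), ("Withdraw", -50100)],
)

# Same shape as the SQL generated by AbsoluteSum and Sum above.
absolute_sum, normal_sum = conn.execute(
    "SELECT SUM(ABS(amount)), SUM(amount) FROM statement"
).fetchone()
print(absolute_sum, normal_sum)  # 350200 50000
```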
65. Every investor can invest in a loan at the same time,
but the sum of the investment amounts must match the loan amount.
[Diagram: Investors invest in a Borrower's loan concurrently; amounts shown: 100,000, 50,000, 30,000 and 150,000.]
67. So we used a transaction with a lock.
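from django.db import transaction

@transaction.atomic
def invest(loan_id, user_id, amount):
    # select_for_update() issues SELECT ... FOR UPDATE, locking these rows until
    # the transaction ends; a concurrent invest() call waits here.
    loan = Loan.objects.select_for_update().get(pk=loan_id)
    balance = Balance.objects.select_for_update().get(user_id=user_id)
    ...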
68. How can we check programmatically that the transaction and the locks work correctly?
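69. We don't know a perfect way,
so we tested it by eye.
import time

from django.db import transaction

@transaction.atomic
def invest(loan_id, user_id, amount):
    loan = Loan.objects.select_for_update().get(pk=loan_id)
    balance = Balance.objects.select_for_update().get(user_id=user_id)
    # Hold the locks for 60 seconds so that a second, simultaneous request
    # visibly waits until this transaction finishes.
    time.sleep(60)
    ...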
70. Then we sent requests simultaneously.
71. Yes, I know this is not a good way.
So if you have a better one, please share it.
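One possible way to replace checking by eye with an automated check, sketched under our own assumptions rather than as the method used here: start several invest calls at the same instant from threads, then assert the invariant (total investments never exceed the loan amount). SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE, a database-wide write lock, stands in for select_for_update(); the table and function names are hypothetical. With Django on MySQL or PostgreSQL, a similar thread-and-barrier pattern could call the real invest() from a TransactionTestCase.

```python
import os
import sqlite3
import tempfile
import threading


def setup():
    # File-backed so that several connections (threads) share one database.
    path = os.path.join(tempfile.mkdtemp(), "loans.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE loan (id INTEGER PRIMARY KEY, amount INTEGER, invested INTEGER)")
    conn.execute("INSERT INTO loan VALUES (1, 100000, 0)")
    conn.commit()
    conn.close()
    return path


def invest(path, loan_id, amount):
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        # Take the write lock before reading, so the check and the update
        # cannot interleave with another invest() call.
        conn.execute("BEGIN IMMEDIATE")
        total, invested = conn.execute(
            "SELECT amount, invested FROM loan WHERE id = ?", (loan_id,)
        ).fetchone()
        if invested + amount > total:
            conn.execute("ROLLBACK")
            return False
        conn.execute("UPDATE loan SET invested = invested + ? WHERE id = ?", (amount, loan_id))
        conn.execute("COMMIT")
        return True
    finally:
        conn.close()


def run_concurrently(path, amounts):
    barrier = threading.Barrier(len(amounts))
    results = [None] * len(amounts)

    def worker(i, amount):
        barrier.wait()  # release all threads at (roughly) the same moment
        results[i] = invest(path, 1, amount)

    threads = [threading.Thread(target=worker, args=(i, a)) for i, a in enumerate(amounts)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def invested_total(path):
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT invested FROM loan WHERE id = 1").fetchone()[0]
    finally:
        conn.close()
```

With five simultaneous 50,000 investments into a 100,000 loan, exactly two should succeed and the invested total should be exactly 100,000; removing the lock is what such a test is meant to catch.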
72. Note: The order in which queries are executed matters most.
@transaction.atomic
def invest(loan_id, user_id, amount):
    # A plain (non-locking) query runs first inside the transaction.
    a = AnotherModel.objects.all().first()
    loan = Loan.objects.select_for_update().get(pk=loan_id)
    balance = Balance.objects.select_for_update().get(user_id=user_id)
    ...
In our experience, the lock did not work as expected when a query without a lock was executed first.
74. We split the database into two instances:
one for a bank, and another for our product.
[Diagram: Bank and Customers, with a database for internal products and a database for the bank.]
75. But we needed to tie the two databases together in one transaction.
[Diagram: the database for internal products and the database for the bank wrapped in one transaction.]
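76. There is one tricky way to solve this problem:
nesting two transaction.atomic() blocks, one for each database.
from django.db import transaction
# Caveat: 'bank' commits when the inner block exits, before 'default' commits;
# this is not a true two-phase commit.
with transaction.atomic(using='default'):
    with transaction.atomic(using='bank'):
        peoplefund = PeoplefundModel.objects.select_for_update().get(pk=loan_id)
        bank = BankModel.objects.select_for_update().get(user_id=user_id)
        ...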
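Why this is only "tricky" and not a real distributed transaction: the inner block commits as soon as it exits, so a failure between the two commits leaves the databases out of sync. The sketch below reproduces that commit order with two independent sqlite3 databases standing in for 'default' and 'bank'; the tables, the function name, and the simulated failure are hypothetical.

```python
import sqlite3

# Two independent databases, standing in for 'default' and 'bank'.
main = sqlite3.connect(":memory:", isolation_level=None)
bank = sqlite3.connect(":memory:", isolation_level=None)
main.execute("CREATE TABLE investment (amount INTEGER)")
bank.execute("CREATE TABLE transfer (amount INTEGER)")


def invest_in_both(amount, fail_between_commits=False):
    main.execute("BEGIN")  # outer block: atomic(using='default')
    try:
        bank.execute("BEGIN")  # inner block: atomic(using='bank')
        try:
            main.execute("INSERT INTO investment VALUES (?)", (amount,))
            bank.execute("INSERT INTO transfer VALUES (?)", (amount,))
        except Exception:
            bank.execute("ROLLBACK")
            raise
        bank.execute("COMMIT")  # inner block exits: 'bank' is committed now
        if fail_between_commits:
            raise RuntimeError("failure before the outer commit")
    except Exception:
        main.execute("ROLLBACK")  # only 'default' can still be rolled back
        raise
    main.execute("COMMIT")


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

After one successful call and one call that fails between the commits, the bank side has 2 rows but the product side has only 1.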
79. We needed to show some information from both the Investment and the Loan models.
[Screenshot: some columns come from an investment model, others from a loan model.]
80. And we wrote code like this:
def get_investments(user_id):
    result = []
    investments = Investment.objects.filter(user_id=user_id)
    for investment in investments:
        element = {
            'investment_amount': investment.amount,
            'loan_title': investment.loan.title,
        }
        result.append(element)
    return result
81. And it got slower as time went by,
because we did not know how the Django ORM works.
def get_investments(user_id):
    result = []
    investments = Investment.objects.filter(user_id=user_id)
    # When the process reaches the loop for the first time,
    # the Django ORM fetches all of the investments.
    for investment in investments:
        element = {
            'investment_amount': investment.amount,
            # But here, the Django ORM fetches one loan per iteration!
            'loan_title': investment.loan.title,
        }
        result.append(element)
    return result
82. There is a simple fix: select_related().
It follows the foreign key in the same query (with a SQL JOIN), so all related loans are fetched at once.
def get_investments(user_id):
    result = []
    # One query with a JOIN on loan; investment.loan below needs no extra query.
    investments = Investment.objects.select_related('loan').filter(user_id=user_id)
    for investment in investments:
        element = {
            'investment_amount': investment.amount,
            'loan_title': investment.loan.title,
        }
        result.append(element)
    return result
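The difference between the two versions is the number of round trips to the database. The sqlite3 sketch below counts queries for both access patterns (the schema, sample rows, and the CountingConnection helper are hypothetical): the lazy version issues 1 + N queries, the JOIN version issues one, and both return the same data.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts how many queries reach the database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


db = CountingConnection()
db.conn.executescript("""
    CREATE TABLE loan (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE investment (id INTEGER PRIMARY KEY, user_id INTEGER, loan_id INTEGER, amount INTEGER);
    INSERT INTO loan VALUES (1, 'Loan A'), (2, 'Loan B'), (3, 'Loan C');
    INSERT INTO investment VALUES (1, 7, 1, 100), (2, 7, 2, 200), (3, 7, 3, 300);
""")


def get_investments_lazy(user_id):
    # 1 query for the investments, then 1 more per investment (N+1),
    # like accessing investment.loan without select_related().
    result = []
    for amount, loan_id in db.query(
        "SELECT amount, loan_id FROM investment WHERE user_id = ? ORDER BY id", (user_id,)
    ):
        [(title,)] = db.query("SELECT title FROM loan WHERE id = ?", (loan_id,))
        result.append({"investment_amount": amount, "loan_title": title})
    return result


def get_investments_joined(user_id):
    # A single JOIN, which is what select_related('loan') asks the ORM to do.
    rows = db.query(
        "SELECT i.amount, l.title FROM investment i JOIN loan l ON l.id = i.loan_id"
        " WHERE i.user_id = ? ORDER BY i.id",
        (user_id,),
    )
    return [{"investment_amount": a, "loan_title": t} for a, t in rows]
```

With three investments, the lazy version sends 4 queries and the joined version sends 1; the gap grows linearly with the number of investments, which matches the slowdown described on slide 81.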