Uladzimir Kalashnikau (EPAM Systems): Magento 2 Import/Export: Performance Challenges and Victories We Got at Open Source Ecommerce
2. How EPAM was engaged by Magento
History:
Magento was looking to outsource some product development work to extend
internal team capacity. EPAM’s “Developer’s Developer” reputation, and our
experience in product development for other eCommerce platforms helped
Magento to recognize us as partners.
Benefits for Magento:
• Outsource part of functionality to extend internal team capacity
• Deliver more new features with Merchant Beta and GA releases
Benefits for EPAM:
• Improve our knowledge of Magento 2.0
• Align development approaches and best practices with Magento core team
3. Goals of project
• Improve import-export functionalities for products/customers
• Implement new functionality to import/export prices
• Change obsolete file format for import/export purposes
• Optimize import/export performance
• Improve error processing for import/export operations
• All functionality should be covered by tests and conform to Magento
coding standards
4. Acceptance criteria
The import should be a linear process for the Magento framework: the number of
records in a single file should not increase processing time exponentially, up to
the point where the MySQL server itself becomes the bottleneck:
Run #1: 100k products, 30 min
• simple_products = 60000
• configurable_products = 20000 (each configurable has 3 simple products as options)
• bundle_products = 10000 (each bundle product has 3 simple products as options)
• grouped_products = 10000 (each grouped product has 3 simple products as options)
• categories = 1000
• categories_nesting_level = 3
• Each product has 2 images attached using local storage only
• Number of product attribute sets = 100
• Number of attributes per product = 10
• Total number of attributes = 1000
Run #2: 200k products, 1 hour
• simple_products = 120000
• configurable_products = 40000 (each configurable has 3 simple products as options)
• bundle_products = 20000 (each bundle product has 3 simple products as options)
• grouped_products = 20000 (each grouped product has 3 simple products as options)
• categories = 1000
• categories_nesting_level = 3
• Each product has 2 images attached using local storage only
• Number of product attribute sets (product templates) = 100
• Number of attributes per product = 10
• Total number of attributes = 1000
The import process should not increase frontend load time by more than 20% of the
average page load, as measured with JMeter
6. Why isn't it so simple?
• Product is a key entity for eCommerce
• The DB uses the EAV model for data storage
• A product has many linked entities: media images, categories, links to other
products, taxes, custom options, custom attributes, complex product attributes
• Product types: Simple, Virtual, Configurable, Bundle, Grouped, Gift cards (EE only)
8. One of the concepts for import optimization
Standard saving process: Append data to model → Prepare data for insert → Query to DB
Multi-insert process: Get imported data → Retrieve data ready to insert → Create multi-insert query
9. How it actually works
Standard saving process: Append data to model → Prepare data for insert → Query to DB
Multi-insert process: Get imported data → Create multi-insert query → Prepare data for insert
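The "create multi-insert query" step can be sketched in PHP roughly as follows. This is a minimal illustration, not the code of the Magento importer: the function name `buildMultiInsert`, the table `product_entity`, and the columns are hypothetical, and the statement is only built here, not executed.

```php
<?php
declare(strict_types=1);

/**
 * Build one multi-row INSERT statement with positional placeholders.
 * Table and column names are hypothetical and must come from trusted code,
 * not from user input, because they are interpolated into the SQL string.
 *
 * @param string   $table   target table name
 * @param string[] $columns column names, in insert order
 * @param array[]  $rows    rows as associative arrays keyed by column name
 * @return array{0: string, 1: array} SQL text and the flat list of bound values
 */
function buildMultiInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }
    // One "(?, ?, ...)" group per row.
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $group))
    );
    // Flatten the values in the same column order as the placeholders.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column] ?? null;
        }
    }
    return [$sql, $params];
}

[$sql, $params] = buildMultiInsert('product_entity', ['sku', 'type_id'], [
    ['sku' => 'A-1', 'type_id' => 'simple'],
    ['sku' => 'A-2', 'type_id' => 'simple'],
]);
// With a PDO connection this would run as: $pdo->prepare($sql)->execute($params);
```

The point of the sketch is that many rows cost a single round trip to the database instead of one query per saved model.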
10. Bunch import idea
• Sort products from simple to complex
• Divide the full pack into bunches of 50 products each
• Import the full bunch of products in one query
• Retrieve the IDs of inserted/updated products
• Import connected entities one by one
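The first two steps of the bunch idea (sorting and splitting) could look like the PHP sketch below. The `TYPE_ORDER` map, the `type` key, and the function name `makeBunches` are assumptions made for illustration; the actual ordering of product types in the importer is not given in the slides.

```php
<?php
declare(strict_types=1);

// Hypothetical ordering: a lower number means a simpler product type.
const TYPE_ORDER = ['simple' => 0, 'virtual' => 1, 'configurable' => 2, 'grouped' => 3, 'bundle' => 4];

/**
 * Sort rows from simple to complex and split them into bunches.
 *
 * @param array[] $rows      import rows, each with a 'type' key
 * @param int     $bunchSize products per bunch (50 in the slides)
 * @return array[][] list of bunches
 */
function makeBunches(array $rows, int $bunchSize = 50): array
{
    usort($rows, static function (array $a, array $b): int {
        // Unknown types are pushed to the end.
        return (TYPE_ORDER[$a['type']] ?? PHP_INT_MAX) <=> (TYPE_ORDER[$b['type']] ?? PHP_INT_MAX);
    });
    return array_chunk($rows, $bunchSize);
}

// 120 rows with alternating types become 3 bunches: 50, 50 and 20 rows.
$rows = [];
for ($i = 0; $i < 120; $i++) {
    $rows[] = ['sku' => "sku-$i", 'type' => $i % 2 === 0 ? 'bundle' : 'simple'];
}
$bunches = makeBunches($rows);
```

Sorting first means that the simple products a configurable, bundle, or grouped product refers to are already imported when the complex product is processed.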
11. Before optimization
• Importing 500k products on a cluster: about 4-5 h
• Creating URL rewrites for them: about 12 h
• Total time: 17 h
• Needs to be less than 2.5 h
12. T - Technology
DESCRIPTION
XHProf is a function-level hierarchical profiler for PHP with a simple HTML-based
navigational interface. The raw data collection component is implemented in C (as a PHP
extension). The reporting/UI layer is all in PHP. It can report function-level
inclusive and exclusive wall times, memory usage, CPU times, and the number of calls for each
function. It can also compare two runs (hierarchical DIFF reports) or
aggregate results from multiple runs.
MAIN ABILITIES
• More lightweight and faster than Xdebug
• Hierarchical reports that show memory and CPU usage
• Ability to create a call-graph image based on a report
• Ability to create a summary report based on several runs
13. How to implement XHProf
<?php
// Initialize XHProf; the flags are bit masks, so combine them with bitwise OR
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);
// Run our code
run();
// Stop the profiler and retrieve the profiling data
$xhprof_data = xhprof_disable();
// Save the run so that the XHProf UI can generate a report
include_once "/var/www/xhprof-0.9.4/xhprof_lib/utils/xhprof_lib.php";
include_once "/var/www/xhprof-0.9.4/xhprof_lib/utils/xhprof_runs.php";
$xhprof_runs = new XHProfRuns_Default();
$run_id = $xhprof_runs->save_run($xhprof_data, "test");
17. Bottlenecks, classification
• Static (one-time):
– Mostly affects small imports
– Hard to find on a large pack of imported products
• Linear:
– Hard to detect on small imports, because static bottlenecks
hide them
– Takes almost the same percentage of time on medium and big packs
• Exponential:
– Hard to find on small/medium import packs
– Can be detected on a big pack of products
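One way to apply this classification to profiler reports is to compare the share of total wall time a function takes in a small import against its share in a large one. The PHP sketch below is only a heuristic under stated assumptions: the function name `classifyBottleneck` and the 5-point tolerance are hypothetical and are not taken from the slides.

```php
<?php
declare(strict_types=1);

/**
 * Classify a bottleneck from its share of total run time in two profiling runs.
 *
 * @param float $shareSmall fraction of wall time (0..1) in the small import
 * @param float $shareLarge fraction of wall time (0..1) in the large import
 * @param float $tolerance  hypothetical threshold for "almost the same percentage"
 */
function classifyBottleneck(float $shareSmall, float $shareLarge, float $tolerance = 0.05): string
{
    $delta = $shareLarge - $shareSmall;
    if ($delta < -$tolerance) {
        return 'static';      // one-time cost: its share shrinks as the pack grows
    }
    if ($delta > $tolerance) {
        return 'exponential'; // its share grows with the pack size
    }
    return 'linear';          // roughly the same share on every pack size
}
```

For example, a function that takes 40% of a small run but 2% of a large one would be reported as static, while one that grows from 5% to 60% would be reported as exponential.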
18. Approaches to optimization
Implement multi-processing
― Generate a queue
― Create a number of workers
― Pray that it won’t affect frontend loading time
Pros:
• We could use several processor cores to increase data
processing speed
Cons:
• Thread/system functions may be disabled for security
reasons
• Potential risks to frontend loading time tests
• Quite a complex mechanism to implement
• Potential row/table lock delays due to parallel reads and
writes to a single DB
Find and fix bottlenecks
― Change attribute load process
― Change URL Rewrites save process
― Implement effective plugin cache
― Other small optimizations
Pros:
• We could deliver by iterations
• Less shit-code
Cons:
• We don’t know how much performance gain such fixes
can deliver
• These changes could affect tests and core processes
19. Difficulties in optimization
• Time or quality: which should we prefer on really dirty code?
• Import/export functionality is a part of MTF (Magento Testing Framework), so changing it
breaks tests
• Results are affected by the size of the import file
• Results vary with different DB data, and we didn’t have a reference DB
• It takes a long time to get a report
• To detect exponential bottlenecks we have to compare reports on different import files
• How to import related entities if we don’t have a unique key?
• Memory usage vs. queries to DB
• How to compare an elephant and a fly if we don’t know the real server configuration?
• XHProf lies: we cannot be sure of its results and should use it only as a guideline
21. Interceptors idea
Main class: Method 1, Method 2, Method 3
Interceptors covered class (extends Main class): Method 1, Method 2, Method 3
Method 1 call: Before plugin call → Around plugin call → After plugin call
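The call chain on this slide can be illustrated with a hand-written PHP sketch. It is a simplified stand-in, not the interceptor code that Magento generates: the classes `MainClass`, `LoggingPlugin`, and `MainClassInterceptor` are hypothetical, and only one plugin and one method are shown.

```php
<?php
declare(strict_types=1);

class MainClass
{
    public function method1(string $value): string
    {
        return "main($value)";
    }
}

// Hypothetical plugin: hooks are named before/around/after + method name.
class LoggingPlugin
{
    public function beforeMethod1(MainClass $subject, string $value): array
    {
        return ["before:$value"]; // may rewrite the arguments
    }

    public function aroundMethod1(MainClass $subject, callable $proceed, string $value): string
    {
        return 'around[' . $proceed($value) . ']'; // wraps the original call
    }

    public function afterMethod1(MainClass $subject, string $result): string
    {
        return $result . ':after'; // may rewrite the result
    }
}

// Hand-written stand-in for an interceptor class that extends the main class.
class MainClassInterceptor extends MainClass
{
    private LoggingPlugin $plugin;

    public function __construct(LoggingPlugin $plugin)
    {
        $this->plugin = $plugin;
    }

    public function method1(string $value): string
    {
        [$value] = $this->plugin->beforeMethod1($this, $value);
        $proceed = function (string $v): string {
            return parent::method1($v); // the original method of the main class
        };
        $result = $this->plugin->aroundMethod1($this, $proceed, $value);
        return $this->plugin->afterMethod1($this, $result);
    }
}

$result = (new MainClassInterceptor(new LoggingPlugin()))->method1('x');
```

Every intercepted call pays for this extra chain, which is why an inefficient plugin lookup becomes visible when a method is called once per imported product.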
24. Example of optimizing static bottleneck
Before:
• Start init
• Load list of product types
• Product left for init? If no, end init
• If yes: load attribute entities for the product
• Load data for the attribute
• Get next product type
After:
• Start init
• Load list of product types
• Product left for init? If no, end init
• If yes: load attribute Ids by product type
• Is every attribute in cache?
• If no: load absent attribute entities by Id, load data for the attributes,
add the attributes to cache
• If yes: get the attributes from cache by id
• Get next product type
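The "after" flow can be sketched in PHP as a cache that loads only the attribute ids it does not hold yet. This is a minimal sketch under assumptions: the class name `AttributeCache` and the loader callable (a stand-in for one DB query by a list of ids) are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Caches attribute data by id and loads only the ids that are still absent.
 * The loader callable is a hypothetical stand-in for one DB query by id list.
 */
final class AttributeCache
{
    /** @var array<int, array> */
    private array $attributes = [];
    /** @var callable */
    private $loader;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    /**
     * @param int[] $ids attribute ids used by one product type
     * @return array<int, array> attribute data keyed by id
     */
    public function getMany(array $ids): array
    {
        $absent = array_values(array_diff($ids, array_keys($this->attributes)));
        if ($absent !== []) {
            // One query for all missing attributes instead of one per attribute.
            $this->attributes += ($this->loader)($absent);
        }
        return array_intersect_key($this->attributes, array_flip($ids));
    }
}

$queries = [];
$cache = new AttributeCache(function (array $ids) use (&$queries): array {
    $queries[] = $ids; // record which ids were actually requested from the "DB"
    return array_combine($ids, array_map(fn (int $id): array => ['code' => "attr_$id"], $ids));
});
$cache->getMany([1, 2, 3]); // first product type: all three ids are loaded
$cache->getMany([2, 3, 4]); // second product type: only id 4 is loaded
```

Attributes shared between product types are loaded once, so the init cost no longer grows with the number of product types multiplied by the number of attributes.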
25. Cache reusability on URL Rewrite example
Creating categories:
• Start creating category
• Get category from DB
• Category exists? If no, create category
• Any categories left? If yes, repeat; if no, end creating category
Creating URL rewrites:
• Start creating URL rewrites
• Get category from DB
• Create URL rewrite
• Any categories left? If yes, repeat; if no, end creating URL rewrites
Cache steps shown on the diagram:
• Get all categories and place to cache
• Get category from cache (in both flows)
• Place to cache
26. Global URL Rewrite optimization
Before:
• Start creating URL rewrites
• Get IDs of all inserted/updated products
• Products exist? If no, end creating URL rewrites
• If yes: load next product by Id
• Load product attributes
• Load categories for the product
• Load categories attributes
• Generate URLs for the product
• Generate URLs for the categories
• Generate URLs for the websites
• Insert URLs for current product
After:
• Start creating URL rewrites
• Get array of products from bunch
• Products exist? If no, multi-insert URLs from the cache and end creating URL rewrites
• If yes: populate data for one product to object from array
• Get categories from the cache
• Generate URL for the product
• Generate URLs for the categories
• Generate URLs for the websites
• Store URLs in temporary cache
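The key change in the "after" flow is that generated URLs are kept in a temporary cache and written once per bunch. A minimal PHP sketch of that buffering follows; the class `UrlRewriteBuffer`, the writer callable (a stand-in for a multi-insert query), and the sample paths are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Collects URL rewrite rows in memory and writes them in one call per bunch.
 * The writer callable is a hypothetical stand-in for a multi-insert query.
 */
final class UrlRewriteBuffer
{
    /** @var array[] */
    private array $rows = [];
    /** @var callable */
    private $writer;

    public function __construct(callable $writer)
    {
        $this->writer = $writer;
    }

    public function add(string $requestPath, string $targetPath): void
    {
        $this->rows[] = ['request_path' => $requestPath, 'target_path' => $targetPath];
    }

    public function flush(): void
    {
        if ($this->rows !== []) {
            ($this->writer)($this->rows);
            $this->rows = [];
        }
    }
}

$writes = [];
$buffer = new UrlRewriteBuffer(function (array $rows) use (&$writes): void {
    $writes[] = count($rows); // record the size of each write
});
// Hypothetical bunch: product data is already in an array, categories come from cache.
$products = [['id' => 1, 'url_key' => 'red-shirt'], ['id' => 2, 'url_key' => 'blue-shirt']];
$categoryPaths = ['men', 'men/shirts'];
foreach ($products as $product) {
    $target = 'catalog/product/view/id/' . $product['id'];
    $buffer->add($product['url_key'] . '.html', $target);
    foreach ($categoryPaths as $path) {
        $buffer->add($path . '/' . $product['url_key'] . '.html', $target);
    }
}
$buffer->flush(); // one write for the whole bunch
```

Compared with the "before" flow, no product is reloaded by id and no insert is issued per product; the trade-off is the memory held by the buffer until the flush.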
27. • CPU 4 physical cores 3.5GHz (2 for VM)
• L2 cache 1 Mb
• L3 cache 6 Mb
• RAM 16GB
• SATA3 HDD (64 Mb buffer)
How to compare performance?
First config
• CPU 2 physical cores with Hyper-Thread 3.2GHz (2 for VM)
• L2 cache 512 Kb
• L3 cache 4 Mb
• RAM 8GB
• SATA1 HDD (16 Mb buffer)
Total time: ~50m
Total time: ~16.5m
Second config
29. Magento 2 Merchant Beta Release
We are tremendously excited to announce that today we reached another significant development
milestone with the release of the Magento 2 Merchant Beta. This release brings us to the last stage
before the general availability (GA) of Magento 2 in Q4 2015.
…
• The Enterprise Edition module includes updates to merchant features like import/export
functionality, configurable swatches, transactional emails and more.
• It demonstrates significant performance improvements for both the Magento Community
Edition and Enterprise Edition with holistic updates to both server-side and client-side architecture.
Server-side updates include out of box Varnish 4, full page caching, and support for HHVM3.6.
Client-side updates include static content caching in browser, image compression, use of jQuery,
and RequireJS for better management of JavaScript and bundling to reduce file download counts.
News link: http://magento.com/blog/technical/magento-2-merchant-beta-release
Our changes go into the release!