How to import 1 million SKUs in under 10 minutes
Building an import correctly is hard
2008: It all started with DataFlow
DataFlow
Stores CSV records in the database
Imports products one by one via AJAX calls
Uses the product model to save data
Closing the browser window stops the whole process
Speed
2-3 products per second
~ 20 minutes for 5k products
2011: Import/Export saved us all
ImportExport
Stores batches of data in the database during validation
Processes the stored data in one HTTP request
Validates product data without using the product model
Uses multi-row inserts to populate tables
Does not run indexers
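The multi-row insert idea can be illustrated with a small helper that turns one batch of rows into a single parameterized `INSERT ... ON DUPLICATE KEY UPDATE` statement. This is a sketch, not Magento's ImportExport code: `buildMultiRowInsert` is a hypothetical name, it assumes PDO-style positional placeholders and trusted table and column names, and it uses `VALUES(col)` in the update clause, which MySQL 8.0.20+ deprecates in favor of row aliases but still accepts.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds one multi-row INSERT ... ON DUPLICATE KEY UPDATE
 * statement with positional placeholders, plus the flattened parameter list.
 *
 * @param string $table table name (assumed trusted, not user input)
 * @param string[] $columns column names (assumed trusted)
 * @param array<int, array<int, mixed>> $rows each row holds one value per column
 * @return array{0: string, 1: array<int, mixed>}
 */
function buildMultiRowInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('At least one row is required');
    }

    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $params = [];
    foreach ($rows as $row) {
        if (count($row) !== count($columns)) {
            throw new InvalidArgumentException('Row width does not match column count');
        }
        // Flatten values in column order so they line up with the placeholders
        array_push($params, ...array_values($row));
    }

    $quoted = array_map(fn (string $c): string => "`$c`", $columns);
    // Overwrite every column on a duplicate key (a real importer might skip key columns)
    $updates = array_map(fn (string $c): string => "$c = VALUES($c)", $quoted);

    $sql = sprintf(
        'INSERT INTO `%s` (%s) VALUES %s ON DUPLICATE KEY UPDATE %s',
        $table,
        implode(', ', $quoted),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder)),
        implode(', ', $updates)
    );

    return [$sql, $params];
}
```

One statement per batch, instead of one per row, is what removes most of the per-query round trips and parsing overhead.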
Speed
41 products per second
~ 2 minutes for 5k products
But there are some drawbacks
High memory usage on large datasets
Slow primary key generation for new products
2015: Magento 2.x Import Export
ImportExport M2
Same base functionality as in M1
More complex file format to edit and parse
Slower on complex product data
Adds extra single-statement inserts
2019: I got an idea and a project to implement it on
Separate Feeds
Main entity (sku, type, set)
Attributes (sku, attribute, store, value)
Category (sku, category slug, position)
Configurable Options (sku, attribute, label)
Images (sku, image)
...
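Narrow feeds like these can be streamed row by row instead of being loaded whole. Below is a minimal PHP sketch of reading one feed, assuming a CSV file with a header row; `readFeed` is a hypothetical helper, and the column list is taken from the attributes feed above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reader for one separate feed: streams CSV lines as associative
 * rows keyed by the feed's columns, holding only one row in memory at a time.
 *
 * @param iterable<string> $lines e.g. an SplFileObject opened on the feed file
 * @param string[] $columns expected header, e.g. ['sku', 'attribute', 'store', 'value']
 * @return Generator<int, array<string, string>>
 */
function readFeed(iterable $lines, array $columns): Generator
{
    $header = null;
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines, including a trailing empty line at EOF
        }
        // Empty escape character: plain RFC 4180-style quoting only
        $fields = str_getcsv($line, ',', '"', '');
        if ($header === null) {
            if ($fields !== $columns) {
                throw new UnexpectedValueException('Unexpected feed header: ' . $line);
            }
            $header = $fields;
            continue;
        }
        if (count($fields) !== count($header)) {
            throw new UnexpectedValueException('Malformed feed row: ' . $line);
        }
        yield array_combine($header, $fields);
    }
}
```

Passing `new SplFileObject('attributes.csv')` (a hypothetical file name) as `$lines` keeps memory usage flat regardless of feed size.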
Lazy Entity Resolving
Reduces the memory requirements of the import
Cleaner, more readable feed processing
Makes it possible to acquire entity IDs in batches automatically
Lazy Entity Resolving
// Configure table lookup information
$resolver = $this->resolverFactory->createSingleValueResolver(
    'catalog_product_entity', 'sku', 'entity_id'
);

// Pass resolver into insert builder
$insert = InsertOnDuplicate::create(
    'catalog_product_entity_varchar',
    ['entity_id', 'attribute_id', 'store_id', 'value']
)->withResolver($resolver);

// Use resolver to create identifier containers;
// insert on duplicate will skip any unresolved entries
$insert
    ->withRow($resolver->unresolved('sku1'), 1, 0, 'some value')
    ->withRow($resolver->unresolved('sku2'), 1, 0, 'some value1')
    ->withRow($resolver->unresolved('sku3'), 1, 0, 'some value2');
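The slides show the resolver's API but not its internals. To make the idea concrete, here is a minimal in-memory sketch of lazy entity resolving (PHP 8.0+). `UnresolvedId`, `LazyResolver` and `resolveRows` are hypothetical names, not the tool's classes, and the lookup closure stands in for a single batched `SELECT sku, entity_id ... WHERE sku IN (...)` query.

```php
<?php
declare(strict_types=1);

/** Placeholder for an entity id that is not known yet (hypothetical). */
final class UnresolvedId
{
    public ?int $id = null;

    public function __construct(public string $key)
    {
    }
}

/** Hypothetical resolver: hands out placeholders now, resolves them in one batch later. */
final class LazyResolver
{
    /** @var array<string, UnresolvedId> one shared placeholder per key */
    private array $placeholders = [];

    /** @param Closure(string[]): array<string, int> $lookup maps keys to ids in one call */
    public function __construct(private Closure $lookup)
    {
    }

    public function unresolved(string $key): UnresolvedId
    {
        return $this->placeholders[$key] ??= new UnresolvedId($key);
    }

    /** Resolves every placeholder that has no id yet, using a single lookup call. */
    public function resolve(): void
    {
        $keys = [];
        foreach ($this->placeholders as $placeholder) {
            if ($placeholder->id === null) {
                $keys[] = $placeholder->key;
            }
        }
        if ($keys === []) {
            return;
        }
        foreach (($this->lookup)($keys) as $key => $id) {
            if (isset($this->placeholders[$key])) {
                $this->placeholders[$key]->id = $id;
            }
        }
    }
}

/**
 * Swaps placeholders for resolved ids and drops rows whose id is still unknown,
 * mirroring "insert on duplicate will skip any unresolved entries".
 *
 * @param array<int, array<int, mixed>> $rows
 * @return array<int, array<int, mixed>>
 */
function resolveRows(LazyResolver $resolver, array $rows): array
{
    $resolver->resolve();
    $result = [];
    foreach ($rows as $row) {
        foreach ($row as $i => $value) {
            if ($value instanceof UnresolvedId) {
                if ($value->id === null) {
                    continue 2; // unresolved: skip the whole row
                }
                $row[$i] = $value->id;
            }
        }
        $result[] = $row;
    }
    return $result;
}
```

Because all pending keys are collected before the lookup runs, three rows cost one query rather than three, and rows for unknown SKUs are silently dropped instead of failing the batch.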
Batch auto-increment generation
-- Start a transaction
START TRANSACTION;

-- Populate table with un-resolved keys
INSERT INTO catalog_product_entity (sku)
VALUES
  ('sku1'),
  ('sku2'),
  ('sku3'),
  ('sku4');

-- Retrieve new identifiers
SELECT entity_id, sku
FROM catalog_product_entity
WHERE
  sku IN ('sku1', 'sku2', 'sku3', 'sku4');

-- Rollback transaction
ROLLBACK;
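The slide does not say why the transaction is rolled back. The likely point, assuming MySQL with InnoDB, is that InnoDB does not give auto-increment values back on rollback: the placeholder rows disappear, but the ids read inside the transaction are never handed out to another insert, so the importer can later write full rows with those explicit ids. Before MySQL 8.0 the counter is reinitialized from the table's maximum id after a server restart, so reserved ids should be used promptly. Below is a PDO sketch of the sequence; `reserveIds` and the two SQL builders are hypothetical helpers, and table and column names are assumed to be trusted.

```php
<?php
declare(strict_types=1);

/** Hypothetical: multi-row INSERT that creates one placeholder row per key. */
function buildReservationInsert(string $table, string $keyColumn, int $count): string
{
    return sprintf(
        'INSERT INTO `%s` (`%s`) VALUES %s',
        $table,
        $keyColumn,
        implode(', ', array_fill(0, $count, '(?)'))
    );
}

/** Hypothetical: SELECT that reads back the generated ids for those keys. */
function buildReservationSelect(string $table, string $keyColumn, string $idColumn, int $count): string
{
    return sprintf(
        'SELECT `%s`, `%s` FROM `%s` WHERE `%s` IN (%s)',
        $idColumn,
        $keyColumn,
        $table,
        $keyColumn,
        implode(', ', array_fill(0, $count, '?'))
    );
}

/**
 * Reserves auto-increment ids for SKUs that do not exist yet: insert inside a
 * transaction, read the generated ids back, then always roll back.
 *
 * @param string[] $skus
 * @return array<string, int> sku => reserved entity_id
 */
function reserveIds(PDO $pdo, array $skus, string $table = 'catalog_product_entity'): array
{
    if ($skus === []) {
        return [];
    }
    $count = count($skus);
    $pdo->beginTransaction();
    try {
        $pdo->prepare(buildReservationInsert($table, 'sku', $count))->execute(array_values($skus));
        $select = $pdo->prepare(buildReservationSelect($table, 'sku', 'entity_id', $count));
        $select->execute(array_values($skus));
        $ids = [];
        foreach ($select->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $ids[(string) $row['sku']] = (int) $row['entity_id'];
        }
        return $ids;
    } finally {
        // Discard the placeholder rows; the consumed counter values stay consumed
        $pdo->rollBack();
    }
}
```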
Prepared Statements
Compile the query once for a constant batch size
Send only data instead of generating new queries
Reduces query processing on the MySQL side by half
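A sketch of the prepared-statement idea, assuming PDO (PHP 8.0+) against MySQL: the multi-row INSERT is prepared once per distinct row count and afterwards only data is sent, so one call to `write()` compiles at most two statements (the full batch size and the trailing remainder). `PreparedBatchWriter`, `batchSizes` and the `$sqlForRowCount` callback are hypothetical. Note that pdo_mysql emulates prepared statements by default, so `PDO::ATTR_EMULATE_PREPARES` must be `false` for MySQL to compile the statement server-side.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical writer. Assumes $pdo was created with PDO::ATTR_EMULATE_PREPARES
 * set to false and PDO::ERRMODE_EXCEPTION (the PHP 8 default).
 */
final class PreparedBatchWriter
{
    /** @var array<int, PDOStatement> prepared statements keyed by row count */
    private array $statements = [];

    /** @param Closure(int): string $sqlForRowCount builds the INSERT SQL for N rows */
    public function __construct(
        private PDO $pdo,
        private Closure $sqlForRowCount,
        private int $batchSize = 500
    ) {
    }

    /** @param array<int, array<int, mixed>> $rows */
    public function write(array $rows): void
    {
        foreach (array_chunk($rows, $this->batchSize) as $chunk) {
            $count = count($chunk);
            // Compile on first use only; later batches of the same size just send data
            $statement = $this->statements[$count]
                ??= $this->pdo->prepare(($this->sqlForRowCount)($count));
            $statement->execute(array_merge(...array_map('array_values', $chunk)));
        }
    }
}

/** Row counts write() executes, e.g. 1200 rows with batch size 500 => [500, 500, 200]. */
function batchSizes(int $totalRows, int $batchSize): array
{
    $sizes = array_fill(0, intdiv($totalRows, $batchSize), $batchSize);
    if ($totalRows % $batchSize !== 0) {
        $sizes[] = $totalRows % $batchSize;
    }
    return $sizes;
}
```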
Speed
450 products per second
~ 11 seconds for 5k products
But it was still not good enough
45 minutes to import 1 million SKUs
Sure, because it's a sequential process...
So I made it asynchronous as a proof of concept (PoC)
Under the hood
Each target table receives a separate connection to MySQL
The identity resolver is attached to the source table's connection
Each feed is processed concurrently using a round-robin strategy
While MySQL executes a query, PHP prepares the next batch
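The round-robin strategy can be illustrated with PHP generators; this mechanism is an assumption, since the slides only say that feeds are processed concurrently in round-robin order. In the sketch each feed is a `Generator` that yields once per batch, and the hypothetical `runRoundRobin` resumes the feeds in turn. Nothing runs in parallel here; in a real importer the yield point would be where that feed's query is in flight on its own connection, for example using mysqli's `MYSQLI_ASYNC` query flag together with `mysqli::poll()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical cooperative scheduler: resumes each feed generator in turn until
 * all are exhausted, returning a "feed:batch" log in execution order.
 *
 * @param array<string, Generator> $feeds
 * @return string[]
 */
function runRoundRobin(array $feeds): array
{
    $queue = new SplQueue();
    foreach ($feeds as $name => $feed) {
        if ($feed->valid()) { // runs the feed up to its first batch
            $queue->enqueue([$name, $feed]);
        }
    }

    $log = [];
    while (!$queue->isEmpty()) {
        [$name, $feed] = $queue->dequeue();
        $log[] = $name . ':' . $feed->current();
        $feed->next(); // let this feed prepare its next batch
        if ($feed->valid()) {
            $queue->enqueue([$name, $feed]); // back of the line: round robin
        }
    }
    return $log;
}

/** Example feed: yields one batch number per chunk of rows it would send. */
function exampleFeed(int $batches): Generator
{
    for ($i = 1; $i <= $batches; $i++) {
        yield $i;
    }
}
```

A long feed (such as attributes) therefore never blocks a short one (such as images); each gets a turn per round until it runs out of batches.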
Speed
1,850 products per second
~ 9 minutes for 1M products
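The speed figures can be sanity-checked with time = products / throughput. The helper names below are illustrative only.

```php
<?php
declare(strict_types=1);

/** Import duration in seconds for a number of products at a given throughput. */
function importSeconds(int $products, float $productsPerSecond): float
{
    return $products / $productsPerSecond;
}

/** Throughput needed to import $products within $seconds. */
function requiredRate(int $products, float $seconds): float
{
    return $products / $seconds;
}

// prints: 1m products at 1,850/s: 9.0 minutes
printf("1m products at 1,850/s: %.1f minutes\n", importSeconds(1_000_000, 1850) / 60);
// prints: 5k products at 450/s: 11.1 seconds
printf("5k products at 450/s: %.1f seconds\n", importSeconds(5_000, 450));
// prints: 1m products in 45 minutes: 370 products/s
printf("1m products in 45 minutes: %.0f products/s\n", requiredRate(1_000_000, 45 * 60));
```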
It is coming this fall as an open-source tool for everyone!
Questions
ivan@ecomdev.org
IvanChepurnyi.GitHub.io
26/26

Contenu connexe

Similaire à How to import 1 million SKUs in under 10 minutes

KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
Event sourcing w PHP (by Piotr Kacała)
Event sourcing w PHP (by Piotr Kacała)Event sourcing w PHP (by Piotr Kacała)
Event sourcing w PHP (by Piotr Kacała)GOG.com dev team
 
Microsoft Windows Server AppFabric
Microsoft Windows Server AppFabricMicrosoft Windows Server AppFabric
Microsoft Windows Server AppFabricMark Ginnebaugh
 
Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...
Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...
Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...Codemotion
 
Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016
Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016
Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016Luigi Dell'Aquila
 
MySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKMySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKYoungHeon (Roy) Kim
 
Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...
Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...
Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...Trivadis
 
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowDBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowAlex Zaballa
 
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowDBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowAlex Zaballa
 
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowDBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowAlex Zaballa
 
Oracle 10g Performance: chapter 00 sampling
Oracle 10g Performance: chapter 00 samplingOracle 10g Performance: chapter 00 sampling
Oracle 10g Performance: chapter 00 samplingKyle Hailey
 
Pieter De Baets - An introduction to React Native
Pieter De Baets - An introduction to React NativePieter De Baets - An introduction to React Native
Pieter De Baets - An introduction to React Nativetlv-ios-dev
 
What's new in Cassandra 2.0
What's new in Cassandra 2.0What's new in Cassandra 2.0
What's new in Cassandra 2.0iamaleksey
 
Tony jambu (obscure) tools of the trade for tuning oracle sq ls
Tony jambu   (obscure) tools of the trade for tuning oracle sq lsTony jambu   (obscure) tools of the trade for tuning oracle sq ls
Tony jambu (obscure) tools of the trade for tuning oracle sq lsInSync Conference
 
Introduction to Oracle Database.pptx
Introduction to Oracle Database.pptxIntroduction to Oracle Database.pptx
Introduction to Oracle Database.pptxSiddhantBhardwaj26
 
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowOTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowAlex Zaballa
 
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowOTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowAlex Zaballa
 
GraphQL - APIs The New Way
GraphQL - APIs The New WayGraphQL - APIs The New Way
GraphQL - APIs The New WayVladimir Tsukur
 

Similaire à How to import 1 million SKUs in under 10 minutes (20)

KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Event sourcing w PHP (by Piotr Kacała)
Event sourcing w PHP (by Piotr Kacała)Event sourcing w PHP (by Piotr Kacała)
Event sourcing w PHP (by Piotr Kacała)
 
Xml4js pentaho
Xml4js pentahoXml4js pentaho
Xml4js pentaho
 
Microsoft Windows Server AppFabric
Microsoft Windows Server AppFabricMicrosoft Windows Server AppFabric
Microsoft Windows Server AppFabric
 
Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...
Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...
Geospatial Graphs made easy with OrientDB - Luigi Dell'Aquila - Codemotion Mi...
 
Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016
Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016
Geospatial Graphs made easy with OrientDB - Codemotion Milan 2016
 
MySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKMySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELK
 
Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...
Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...
Trivadis TechEvent 2016 Useful Oracle 12c Features for Data Warehousing by Da...
 
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowDBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
 
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowDBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
 
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should KnowDBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
DBA Brasil 1.0 - DBA Commands and Concepts That Every Developer Should Know
 
Oracle 10g Performance: chapter 00 sampling
Oracle 10g Performance: chapter 00 samplingOracle 10g Performance: chapter 00 sampling
Oracle 10g Performance: chapter 00 sampling
 
Pieter De Baets - An introduction to React Native
Pieter De Baets - An introduction to React NativePieter De Baets - An introduction to React Native
Pieter De Baets - An introduction to React Native
 
What's new in Cassandra 2.0
What's new in Cassandra 2.0What's new in Cassandra 2.0
What's new in Cassandra 2.0
 
Tony jambu (obscure) tools of the trade for tuning oracle sq ls
Tony jambu   (obscure) tools of the trade for tuning oracle sq lsTony jambu   (obscure) tools of the trade for tuning oracle sq ls
Tony jambu (obscure) tools of the trade for tuning oracle sq ls
 
Overview of Oracle database12c for developers
Overview of Oracle database12c for developersOverview of Oracle database12c for developers
Overview of Oracle database12c for developers
 
Introduction to Oracle Database.pptx
Introduction to Oracle Database.pptxIntroduction to Oracle Database.pptx
Introduction to Oracle Database.pptx
 
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowOTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
 
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowOTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
 
GraphQL - APIs The New Way
GraphQL - APIs The New WayGraphQL - APIs The New Way
GraphQL - APIs The New Way
 

Plus de Ivan Chepurnyi

Optimizing Magento by Preloading Data
Optimizing Magento by Preloading DataOptimizing Magento by Preloading Data
Optimizing Magento by Preloading DataIvan Chepurnyi
 
Meet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
Meet Magento Sweden - Magento 2 Layout and Code Compilation for PerformanceMeet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
Meet Magento Sweden - Magento 2 Layout and Code Compilation for PerformanceIvan Chepurnyi
 
Varnish Cache and its usage in the real world!
Varnish Cache and its usage in the real world!Varnish Cache and its usage in the real world!
Varnish Cache and its usage in the real world!Ivan Chepurnyi
 
Making Magento flying like a rocket! (A set of valuable tips for developers)
Making Magento flying like a rocket! (A set of valuable tips for developers)Making Magento flying like a rocket! (A set of valuable tips for developers)
Making Magento flying like a rocket! (A set of valuable tips for developers)Ivan Chepurnyi
 
Hidden Secrets of Magento Price Rules
Hidden Secrets of Magento Price RulesHidden Secrets of Magento Price Rules
Hidden Secrets of Magento Price RulesIvan Chepurnyi
 
Magento 2.0: Prepare yourself for a new way of module development
Magento 2.0: Prepare yourself for a new way of module developmentMagento 2.0: Prepare yourself for a new way of module development
Magento 2.0: Prepare yourself for a new way of module developmentIvan Chepurnyi
 
Using of TDD practices for Magento
Using of TDD practices for MagentoUsing of TDD practices for Magento
Using of TDD practices for MagentoIvan Chepurnyi
 

Plus de Ivan Chepurnyi (8)

Optimizing Magento by Preloading Data
Optimizing Magento by Preloading DataOptimizing Magento by Preloading Data
Optimizing Magento by Preloading Data
 
Meet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
Meet Magento Sweden - Magento 2 Layout and Code Compilation for PerformanceMeet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
Meet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
 
Varnish Cache and its usage in the real world!
Varnish Cache and its usage in the real world!Varnish Cache and its usage in the real world!
Varnish Cache and its usage in the real world!
 
Making Magento flying like a rocket! (A set of valuable tips for developers)
Making Magento flying like a rocket! (A set of valuable tips for developers)Making Magento flying like a rocket! (A set of valuable tips for developers)
Making Magento flying like a rocket! (A set of valuable tips for developers)
 
Hidden Secrets of Magento Price Rules
Hidden Secrets of Magento Price RulesHidden Secrets of Magento Price Rules
Hidden Secrets of Magento Price Rules
 
Magento 2.0: Prepare yourself for a new way of module development
Magento 2.0: Prepare yourself for a new way of module developmentMagento 2.0: Prepare yourself for a new way of module development
Magento 2.0: Prepare yourself for a new way of module development
 
Magento Indexes
Magento IndexesMagento Indexes
Magento Indexes
 
Using of TDD practices for Magento
Using of TDD practices for MagentoUsing of TDD practices for Magento
Using of TDD practices for Magento
 

Dernier

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Dernier (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

How to import 1 million SKUs in under 10 minutes

  • 1. How to import 1 million SKUs in under 10 minutes 1/26
  • 2. Building an import correctly is hard 2/26
  • 3. 2008 it all started with DataFlow 3/26
  • 6. DataFlow Stores CSV records into database Imports product by product via AJAX call 4/26
  • 7. DataFlow Stores CSV records into database Imports product by product via AJAX call Uses product model to save data 4/26
  • 8. DataFlow Stores CSV records into database Imports product by product via AJAX call Uses product model to save data Closing the browser window stops the whole process 4/26
  • Speed: 2-3 products per second, ~20 minutes for 5k products
  • ImportExport: stores batches of data in the database during validation, processes the stored data in one HTTP request, validates product data without using the product model, populates tables with multi-row inserts, and does not run indexers
  • Speed: 41 products per second, ~2 minutes for 5k products
  • But there are some drawbacks: high memory usage on large datasets, and slow primary key generation for new products
  • ImportExport M2: same base functionality as in M1, but with a more complex file format to edit and parse, slower handling of complex product data, and additional single-statement inserts
  • 2019: I got an idea, and a project to implement it on
  • Separate Feeds: main entity (sku, type, set); attributes (sku, attribute, store, value); category (sku, category slug, position); configurable options (sku, attribute, label); images (sku, image); ...
  • Lazy Entity Resolving: reduces the memory requirements of the import, makes feed processing cleaner and more readable, and makes it possible to acquire entity ids in batches automatically
  • Lazy Entity Resolving in code:

        // Configure table lookup information
        $resolver = $this->resolverFactory->createSingleValueResolver(
            'catalog_product_entity',
            'sku',
            'entity_id'
        );

        // Pass resolver into insert builder
        $insert = InsertOnDuplicate::create(
            'catalog_product_entity_varchar',
            ['entity_id', 'attribute_id', 'store_id', 'value']
        )->withResolver($resolver);

        // Use resolver to create identifier containers;
        // insert on duplicate will skip any unresolved entries
        $insert
            ->withRow($resolver->unresolved('sku1'), 1, 0, 'some value')
            ->withRow($resolver->unresolved('sku2'), 1, 0, 'some value1')
            ->withRow($resolver->unresolved('sku3'), 1, 0, 'some value2');
  • Batch auto-increment generation:

        -- Start a transaction
        START TRANSACTION;

        -- Populate table with un-resolved keys
        INSERT INTO catalog_product_entity (sku) VALUES ('sku1'), ('sku2'), ('sku3'), ('sku4');

        -- Retrieve new identifiers
        SELECT entity_id, sku FROM catalog_product_entity WHERE sku IN ('sku1', 'sku2', 'sku3', 'sku4');

        -- Rollback transaction
        ROLLBACK;
  • Prepared Statements: compile the query once for a constant batch size and send only data instead of generating new queries, which halves query processing on the MySQL side
  • Speed: 450 products per second, ~11 seconds for 5k products
  • But it was still not good enough: 45 minutes to import 1 million SKUs
  • Sure, because it's a sequential process...
  • So I made it asynchronous as a proof of concept (PoC)
  • Under the hood: each target table receives a separate connection to MySQL; the identity resolver is attached to the source table's connection; feeds are processed concurrently using a round-robin strategy; and while MySQL executes a query, PHP prepares the next batch
  • Speed: 1,850 products per second, ~9 minutes for 1 million products
  • It is coming this fall as an open-source tool for everyone!
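The ImportExport slide lists multi-row inserts as one of the reasons it outpaced DataFlow. What follows is a minimal sketch in plain PHP (no Magento classes) of building one parameterised multi-row `INSERT`. `buildMultiRowInsert` is a hypothetical helper. Table and column names are assumed to be trusted identifiers, because they are interpolated into the SQL.

```php
<?php

/**
 * Builds a single multi-row INSERT with positional placeholders.
 * Hypothetical helper for illustration; not a Magento API.
 *
 * @param string  $table   target table name (trusted identifier)
 * @param array   $columns column names (trusted identifiers)
 * @param array[] $rows    list of rows, each a list of values in column order
 * @return array{0: string, 1: array} SQL text and flattened parameters
 */
function buildMultiRowInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('At least one row is required');
    }

    // One "(?, ?, ...)" group per row, one "?" per column.
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $group))
    );

    // Flatten all rows into one parameter list matching the placeholders.
    $params = [];
    foreach ($rows as $row) {
        if (count($row) !== count($columns)) {
            throw new InvalidArgumentException('Row width does not match column count');
        }
        foreach ($row as $value) {
            $params[] = $value;
        }
    }

    return [$sql, $params];
}

[$sql, $params] = buildMultiRowInsert(
    'catalog_product_entity_varchar',
    ['entity_id', 'attribute_id', 'store_id', 'value'],
    [[1, 1, 0, 'some value'], [2, 1, 0, 'some value1']]
);
echo $sql, PHP_EOL;
// INSERT INTO catalog_product_entity_varchar (entity_id, attribute_id, store_id, value) VALUES (?, ?, ?, ?), (?, ?, ?, ?)
```

Sending N rows in one statement replaces N round trips with one. The trade-off is that the whole batch has to be held in memory while the statement is built.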
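The Separate Feeds slide replaces one wide product row with several narrow feeds. The sketch below shows that split and rests on a few assumptions. The input shape is hypothetical: `sku`, `type`, `set`, `attributes`, `categories` as `[slug, position]` pairs, and `images`. It also uses a single store id of 0. `splitIntoFeeds` is not part of any published API, and the configurable-options feed is left out for brevity.

```php
<?php

/**
 * Splits flat product records into separate feeds, one narrow row shape per
 * feed, following the columns on the "Separate Feeds" slide.
 * Hypothetical function and input shape, for illustration only.
 */
function splitIntoFeeds(iterable $products, int $storeId = 0): array
{
    $feeds = ['entity' => [], 'attribute' => [], 'category' => [], 'image' => []];

    foreach ($products as $product) {
        $sku = $product['sku'];

        // Main entity: (sku, type, set)
        $feeds['entity'][] = [$sku, $product['type'], $product['set']];

        // Attributes: (sku, attribute, store, value)
        foreach ($product['attributes'] ?? [] as $code => $value) {
            $feeds['attribute'][] = [$sku, $code, $storeId, $value];
        }

        // Category: (sku, category slug, position)
        foreach ($product['categories'] ?? [] as [$slug, $position]) {
            $feeds['category'][] = [$sku, $slug, $position];
        }

        // Images: (sku, image)
        foreach ($product['images'] ?? [] as $image) {
            $feeds['image'][] = [$sku, $image];
        }
    }

    return $feeds;
}

$feeds = splitIntoFeeds([
    [
        'sku' => 'sku1',
        'type' => 'simple',
        'set' => 'Default',
        'attributes' => ['name' => 'Blue shirt', 'color' => 'blue'],
        'categories' => [['men/shirts', 1]],
        'images' => ['blue-shirt.jpg'],
    ],
]);
```

Each feed maps onto one target table, so it can be written independently with multi-row inserts. A product without images simply contributes no image rows.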
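The Lazy Entity Resolving slides pass placeholder containers into the insert builder instead of real ids. The deck does not show the library behind `resolverFactory` and `InsertOnDuplicate`, so the following is only a sketch of the idea, built from hypothetical classes (`Identifier`, `LazyResolver`, `resolvedRows`). Rows hold shared containers, and one batched lookup fills them in; a closure stands in for `SELECT sku, entity_id ... WHERE sku IN (...)`. Rows whose key stays unresolved are skipped.

```php
<?php

/** Hypothetical container for an entity id that may not be known yet. */
final class Identifier
{
    public ?int $id = null;

    public function __construct(public string $key)
    {
    }
}

/**
 * Hypothetical lazy resolver: hands out containers first and fills them
 * with one batched lookup, instead of one query per row.
 */
final class LazyResolver
{
    /** @var array<string, Identifier> */
    private array $containers = [];

    /** @param Closure $lookup receives string[] keys, returns array<string, int> key => id */
    public function __construct(private Closure $lookup)
    {
    }

    public function unresolved(string $key): Identifier
    {
        // One shared container per key, so every row for a SKU sees the same id.
        return $this->containers[$key] ??= new Identifier($key);
    }

    public function resolve(): void
    {
        $keys = [];
        foreach ($this->containers as $identifier) {
            if ($identifier->id === null) {
                $keys[] = $identifier->key;
            }
        }
        if ($keys === []) {
            return;
        }

        // A single batched lookup for every pending key.
        $ids = ($this->lookup)($keys);
        foreach ($keys as $key) {
            if (isset($ids[$key])) {
                $this->containers[$key]->id = $ids[$key];
            }
        }
    }
}

/** Resolves pending ids, then drops rows whose id is still unknown. */
function resolvedRows(LazyResolver $resolver, array $rows): array
{
    $resolver->resolve();
    $ready = [];
    foreach ($rows as [$identifier, $attributeId, $storeId, $value]) {
        if ($identifier->id === null) {
            continue; // unresolved entry: skipped, as the slide describes
        }
        $ready[] = [$identifier->id, $attributeId, $storeId, $value];
    }
    return $ready;
}

$lookupCalls = 0;
$resolver = new LazyResolver(function (array $skus) use (&$lookupCalls): array {
    $lookupCalls++;
    // Stand-in for: SELECT sku, entity_id FROM catalog_product_entity WHERE sku IN (...)
    $existing = ['sku1' => 10, 'sku2' => 11];
    return array_intersect_key($existing, array_flip($skus));
});

$ready = resolvedRows($resolver, [
    [$resolver->unresolved('sku1'), 1, 0, 'some value'],
    [$resolver->unresolved('sku2'), 1, 0, 'some value1'],
    [$resolver->unresolved('sku3'), 1, 0, 'some value2'],
]);
```

Every row for the same SKU shares one container, so the lookup runs once per batch rather than once per row. This sketch keeps every key it has seen. A real implementation would presumably release them after each batch to keep memory bounded.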
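The batch auto-increment generation steps on the slides can be wrapped in a small PDO helper. This is a sketch under two assumptions. First, the target is MySQL with InnoDB, where auto-increment values consumed by a transaction that is rolled back are not handed out again. Second, the caller later inserts the products with these explicit ids. `reserveEntityIds` is a hypothetical name. On MySQL 5.7 and earlier, the counter is re-initialised from the table's maximum id after a server restart, so reserved ids should be used promptly.

```php
<?php

/**
 * Reserves auto-increment ids for new SKUs: insert them inside a transaction,
 * read the generated ids back, then roll back. Hypothetical helper.
 *
 * @param string[] $skus new SKUs that are not in the table yet
 * @return array<string, int> sku => reserved entity_id
 */
function reserveEntityIds(PDO $pdo, array $skus): array
{
    if ($skus === []) {
        return [];
    }

    $params = array_values($skus);
    $placeholders = implode(', ', array_fill(0, count($params), '?'));
    $values = implode(', ', array_fill(0, count($params), '(?)'));

    $pdo->beginTransaction();
    try {
        // Populate the table with the un-resolved keys.
        $pdo->prepare("INSERT INTO catalog_product_entity (sku) VALUES $values")
            ->execute($params);

        // Retrieve the new identifiers as sku => entity_id.
        $select = $pdo->prepare(
            "SELECT sku, entity_id FROM catalog_product_entity WHERE sku IN ($placeholders)"
        );
        $select->execute($params);
        $ids = $select->fetchAll(PDO::FETCH_KEY_PAIR);
    } finally {
        // Roll back: the rows disappear; on InnoDB the consumed ids are not reused.
        $pdo->rollBack();
    }

    return array_map('intval', $ids);
}
```

The helper expects only SKUs that are not in the table yet. Existing SKUs would be resolved with a plain `SELECT` first.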
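The Prepared Statements slide compiles the query once for a constant batch size and then sends only data. The following is a minimal PDO sketch of that idea. Rows are chunked with `array_chunk`, and one prepared statement is cached per batch size, so at most two statements are ever compiled: the full-size one, plus one smaller statement for the tail. `insertInBatches` is hypothetical. PDO's MySQL driver emulates prepares by default (`PDO::ATTR_EMULATE_PREPARES`), in which case queries are still assembled client-side. Disabling emulation is what gives real server-side prepared statements.

```php
<?php

/**
 * Inserts rows in fixed-size batches, reusing one prepared statement per batch
 * size: the full-size statement is compiled once, then only data is sent.
 * Hypothetical helper; table and column names must be trusted identifiers.
 *
 * @param array[] $rows each row is a list of values in column order
 * @return int number of distinct statements that had to be prepared
 */
function insertInBatches(PDO $pdo, string $table, array $columns, array $rows, int $batchSize): int
{
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $statements = []; // rows per batch => prepared statement

    foreach (array_chunk($rows, $batchSize) as $batch) {
        $size = count($batch);
        // Only the last batch can be smaller, so at most two statements are compiled.
        $statements[$size] ??= $pdo->prepare(sprintf(
            'INSERT INTO %s (%s) VALUES %s',
            $table,
            implode(', ', $columns),
            implode(', ', array_fill(0, $size, $group))
        ));
        // Send only the flattened data for this batch.
        $statements[$size]->execute(array_merge(...$batch));
    }

    return count($statements);
}
```

Batch size is bounded by MySQL's limit of 65,535 placeholders per prepared statement, so batch size times column count has to stay below it.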
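The Under the hood slide describes feeds processed concurrently in round-robin order, with each target table on its own MySQL connection. The scheduling part can be sketched with generators. Each feed is a generator that yields batches, and the loop takes one batch from each feed in turn. The `$send` callback here is hypothetical and synchronous. In a real asynchronous version, it would start the query on that table's connection (mysqli offers `MYSQLI_ASYNC` together with `mysqli::poll()` and `mysqli::reap_async_query()` for this), so the generator could build its next batch while MySQL is still executing.

```php
<?php

/**
 * Round-robin scheduler over feeds: takes one batch from each feed in turn
 * and passes it to $send together with the feed's target table.
 * Hypothetical sketch; $send is called synchronously here.
 *
 * @param array<string, Generator> $feeds target table => generator yielding batches
 */
function roundRobin(array $feeds, callable $send): void
{
    while ($feeds !== []) {
        foreach ($feeds as $table => $feed) {
            if (!$feed->valid()) {
                unset($feeds[$table]); // this feed is exhausted
                continue;
            }
            $send($table, $feed->current());
            // Advance the generator: this is where it builds its next batch.
            $feed->next();
        }
    }
}

/** Yields fixed-size batches from a list of rows. */
function batches(array $rows, int $size): Generator
{
    foreach (array_chunk($rows, $size) as $batch) {
        yield $batch;
    }
}

$log = [];
roundRobin(
    [
        'catalog_product_entity' => batches(['sku1', 'sku2', 'sku3'], 2),
        'catalog_product_entity_varchar' => batches(['v1', 'v2', 'v3', 'v4', 'v5'], 2),
    ],
    function (string $table, array $batch) use (&$log): void {
        $log[] = $table . ':' . implode(',', $batch);
    }
);
```

Interleaving the feeds keeps every table's connection busy instead of finishing one table before starting the next. This sketch does not model the ordering constraint implied by the slide, where entity rows must exist before dependent rows can be resolved.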
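The speed slides can be cross-checked with simple arithmetic: import time is the product count divided by throughput. A trivial helper (hypothetical name `importSeconds`) reproduces the durations quoted for 41, 450, and 1,850 products per second.

```php
<?php

/** Seconds needed to import $products at $perSecond products per second. */
function importSeconds(int $products, float $perSecond): float
{
    return $products / $perSecond;
}

printf("%.1f s\n", importSeconds(5_000, 41));               // 122.0 s (~2 minutes)
printf("%.1f s\n", importSeconds(5_000, 450));              // 11.1 s
printf("%.1f min\n", importSeconds(1_000_000, 1_850) / 60); // 9.0 min
```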