Data migration into EAV model
by Oleg Kulik, Gorilla Group
The Problem
• Our task is importing ~1M entities into an EAV model.
• Standard imports add high overhead while processing each line item:
o the app validates the entity
o the app creates an import directive
o MySQL parses the query
o MySQL validates the row (constraints)
• The above works well for a small number of import items
(~10k). It works poorly for a large number of items (>100k).
What do we want?
• remove as much validation as possible without harming
database integrity
• minimize app usage, cutting out possible memory leaks
and the time required to assemble the import directive
• still have the app decide how to process our file, without
the need for manual pre-processing
How to achieve our goals
• Move away from the Resource save schema
• Use bulk data loading
• Trust our sources
• Create a mechanism for connecting data after bulk loads
(see the sketch below)
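A minimal sketch of what "connecting data after bulk loads" can look like. This is an assumption, not necessarily what the test app does: attribute rows are bulk-loaded into a staging table keyed by the unique uin, then joined to the entity table to pick up the generated entity ids. Table and column names are borrowed from the test app described later; actor_stage is a hypothetical staging table filled by a second LOAD DATA INFILE.

-- connect bulk-loaded attribute rows to their entities via the shared uin
INSERT INTO actor_data (entity_id, name, lastname, age, movie)
SELECT e.entity_id, s.name, s.lastname, s.age, s.movie
FROM actor_stage AS s
JOIN actor_entity AS e ON e.uin = s.uin;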
EAV Resource save
• App level: data validation, assembling insert queries
• MySQL level: insert query parsing, constraints validation
• Data is loaded on the row level
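For contrast, a row-level resource save boils down to something like the following pair of statements, each parsed and constraint-checked by MySQL once per line item. The column names are illustrative, borrowed from the test app described later; the values are made up.

INSERT INTO actor_entity (uin) VALUES ('A000001');
INSERT INTO actor_data (entity_id, name, lastname, age, movie)
VALUES (LAST_INSERT_ID(), 'John', 'Doe', 42, 'Some Movie');
-- ...repeated ~1M times: one round trip, one parse, one validation pass per row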
Loading data with "LOAD DATA INFILE"
• No validating or assembling layer
• Bulk data loads
• Less query parsing; constraints stay in place for data integrity
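A minimal sketch of the bulk load itself. The file path, delimiters, and column layout are assumptions based on the test file described on the next slide (LOCAL also requires local_infile to be enabled on the server):

LOAD DATA LOCAL INFILE '/var/www/migration/test.txt'
INTO TABLE actor_entity
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(uin, @name, @lastname, @age, @movie);  -- take uin, discard the attribute columns here

The whole file reaches the server in one statement: one parse and no per-row application round trips.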
Pros
• we use a tool that was designed for bulk imports from files
• it is tuned to work fast with large amounts of data
• we keep some control over data integrity at the MySQL
level
Cons
• no control over incoming data quality (it can be added
as a pre-processing step)
• high risk of duplicating data or losing integrity (again,
this can be handled as a post-processing step, but that adds
significant time; a sketch follows below)
• this puts the method into question if we have an
unpredictable data source
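If the source turns out to be less trustworthy, the post-processing step mentioned above can start with a couple of checks run after the load. These are hedged suggestions that assume the schema sketched on the next slide; duplicates are only possible if the unique key was dropped or ignored for load speed, and orphans only if FK checks were disabled during the load.

-- entities with a duplicated uin
SELECT uin, COUNT(*) AS cnt FROM actor_entity GROUP BY uin HAVING cnt > 1;

-- attribute rows pointing at no entity
SELECT d.entity_id FROM actor_data AS d
LEFT JOIN actor_entity AS e ON e.entity_id = d.entity_id
WHERE e.entity_id IS NULL;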
Getting your hands dirty
• test app @ https://github.com/SlayerBirden/migration.git
• 2 tables: actor_entity, actor_data; unique field "uin"
• foreign key from actor_data to actor_entity
• file columns: uin, name, lastname, age, movie
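The slides don't show the DDL, so here is a plausible reconstruction of the two tables matching the unique "uin" and the foreign key listed above. The exact column types and key names are guesses:

CREATE TABLE actor_entity (
  entity_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  uin VARCHAR(64) NOT NULL,
  UNIQUE KEY uniq_uin (uin)
) ENGINE=InnoDB;

CREATE TABLE actor_data (
  entity_id INT UNSIGNED NOT NULL,
  name VARCHAR(255),
  lastname VARCHAR(255),
  age INT,
  movie VARCHAR(255),
  CONSTRAINT fk_actor_entity FOREIGN KEY (entity_id)
    REFERENCES actor_entity (entity_id)
) ENGINE=InnoDB;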
Some test results
System info:
• memory: 2 banks of DIMM DDR3 Synchronous 1333 MHz (0.8 ns), 4GB
• cpu: Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
• MySQL version: 5.5.35-0ubuntu0.12.04.2
100k rows
oleg@oleg-Aspire-XC600:/var/www/migration$ php importer.php -h xxxx -u xxxx -p xxxx -db test -f test.txt
100000 Entity rows imported.
IMPORT ENTITY TIME: 6.5943 seconds
100000 Data rows imported.
IMPORT DATA TIME: 10.9832 seconds
PROCESS TIME: 24.3128 seconds
PHP MEMORY USED: 1.13 kB
PHP MEMORY PEAK: 294.98 kB
oleg@oleg-Aspire-XC600:/var/www/migration$
1M rows
oleg@oleg-Aspire-XC600:/var/www/migration$ php importer.php -h 172.20.3.227 -u oleg -p test123 -db test -f test.txt
1000000 Entity rows imported.
IMPORT ENTITY TIME: 141.5386 seconds
1000000 Data rows imported.
IMPORT DATA TIME: 168.1476 seconds
PROCESS TIME: 363.1716 seconds
oleg@oleg-Aspire-XC600:/var/www/migration$
5M rows was a fail :) mysqld started swapping.
Some more test results for a stronger machine
System info:
• memory: 3 banks of DIMM DDR3 1600 MHz, 8GB (2) and 4GB (1)
• cpu: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz
• MySQL version: 5.6.13-log
• SSD: OCZ-VECTOR
100k rows
c:\apache\htdocs\migration>php importer.php -h localhost -u root -db test -f test.txt
100000 Entity rows imported.
IMPORT ENTITY TIME: 1.1041 seconds
100000 Data rows imported.
IMPORT DATA TIME: 1.1321 seconds
PROCESS TIME: 5.7513 seconds
1M rows
c:\apache\htdocs\migration>php importer.php -h localhost -u root -db test -f test.txt
1000000 Entity rows imported.
IMPORT ENTITY TIME: 14.2068 seconds
1000000 Data rows imported.
IMPORT DATA TIME: 10.5776 seconds
PROCESS TIME: 60.2454 seconds
5M rows
c:\apache\htdocs\migration>php importer.php -h localhost -u root -db test -f test.txt
5000000 Entity rows imported.
IMPORT ENTITY TIME: 89.3361 seconds
5000000 Data rows imported.
IMPORT DATA TIME: 62.1726 seconds
PROCESS TIME: 325.9186 seconds
Playing with innodb_io_capacity
500k rows
innodb_io_capacity=200, innodb_io_capacity_max=2000
500000 Entity rows imported.
IMPORT ENTITY TIME: 18.9711 seconds
500000 Data rows imported.
IMPORT DATA TIME: 11.8517 seconds
PROCESS TIME: 48.3198 seconds
innodb_io_capacity=2000, innodb_io_capacity_max=20000
500000 Entity rows imported.
IMPORT ENTITY TIME: 7.6654 seconds
500000 Data rows imported.
IMPORT DATA TIME: 4.3602 seconds
PROCESS TIME: 29.8597 seconds
innodb_io_capacity=20000, innodb_io_capacity_max=30000
500000 Entity rows imported.
IMPORT ENTITY TIME: 7.6674 seconds
500000 Data rows imported.
IMPORT DATA TIME: 4.3112 seconds
PROCESS TIME: 29.6327 seconds
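The numbers above suggest that raising innodb_io_capacity from its default of 200 is the big win, while pushing it from 2000 to 20000 changes almost nothing. Both variables can be set at runtime (SUPER privilege required) or in my.cnf; innodb_io_capacity_max exists as of MySQL 5.6, which matches the version used in these runs:

-- the middle configuration from the tests above
SET GLOBAL innodb_io_capacity = 2000;
SET GLOBAL innodb_io_capacity_max = 20000;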
Tests for Resource-type save (for comparison)
System info:
• memory: 3 banks of DIMM DDR3 1600 MHz, 8GB (2) and 4GB (1)
• cpu: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz
• MySQL version: 5.6.13-log
• SSD: OCZ-VECTOR
50k rows
c:\apache\htdocs\migration>php resource.php -h localhost -u root -db test -f test.txt
All rows imported
PROCESS TIME: 196.1622 seconds
MEMORY USED: 0.80 kB
MEMORY PEAK: 186.94 kB
Conclusion
Use this method if:
• the data amount is huge (>100k rows)
• performance is a key requirement
• the data source is predictable
• data integrity is not an absolute requirement
(for EAV)
Thank you!