IDEAS Global A.I. Conference 2022.pdf

Self-Service Metadata-Driven Data Loader Framework
- Manimuthu Ayyannan

About Us
Manimuthu Ayyannan
manimuthu.ayyannan@walmart.com
LinkedIn: @manimuthuayyannan
• Senior Manager II, Software Engineering @Walmart Global Tech
• Data Enthusiast
• Data Platform Services (Big Data | Spark | Cloud | AI-ML)

Agenda
• Personalization @Walmart
• Challenges
• Solution Approaches
• High Level System Architecture
• Metadata Design and Connectors
• Orchestrator
• Schedule Optimizer
• Telemetry

Personalization @Walmart
• Our customers are becoming increasingly omnichannel
• ~220M customers and members visit ~10,500 stores and clubs under 46 banners in 24 countries, plus eCommerce websites, every week
• Billions of product impressions are served every week, generating petabytes of events
• We on the FE team run thousands of data applications to generate the features that power personalized recommendations for our customers
[Figure: Walmart General Merchandise + Walmart Grocery, Store Pickup & Delivery + Walmart Stores]

Challenges
• Data application onboarding requires a lot of manual hand coding, and developers need time to develop, integrate, and test code to solve the underlying complexities
• Building a functionality-rich application needs integration with various big data technologies and a wide array of data sources, sinks and data processors
• It is difficult to control resource allocation and usage, and to review them in retrospect
• Competing high and low priority applications introduce latency to the serving layers

Challenges | New App Onboarding | Cumbersome & Fragile
[Diagram: every application (Data App 1, Data App 2, ... Data App N) separately repeats the same steps: Integrate Source System, Integrate Target System, Develop Processor, Implement Security, Enable Telemetry, Test and Deploy, and Allocate Resource]

Data Loader Simplifies the onboarding
[Diagram: each application (Data App 1, 2, 3, 4, ... N) goes through Configure, then Test and Deploy; Source System, Target System, Processor, Security, Telemetry and Resource appear once, on the platform side]
Data Loader Platform: an abstract layer equipped with standard parsers and connectors

Solution Approach
• A centralized, metadata-driven data loading platform with plug-and-play onboarding capability
• An abstraction layer to build the workflow orchestration, which simplifies complex service integrations and shortens time to deployment
• A compelling UI that dramatically increases developer productivity by providing ready-to-use connectors to configure the business logic
• An intelligent system that provides optimized recommendations based on previous runs
• A smart run schedule pool that enqueues and dequeues run instances based on priority
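The slides do not say how recommendations are derived from previous runs. As a rough sketch only, assuming the system tracks peak memory per run, a recommendation could be the highest observed peak plus some headroom. The function name, the 20% headroom and the 4 GB default are all hypothetical.

```python
def recommend_memory_gb(previous_peaks_gb, headroom=1.2, default_gb=4.0):
    """Suggest a memory allocation from the peak usage of earlier runs.

    All names and defaults here are illustrative assumptions.
    """
    if not previous_peaks_gb:
        return default_gb  # no history yet: fall back to a default
    # Use the highest observed peak plus a safety margin.
    return round(max(previous_peaks_gb) * headroom, 1)


print(recommend_memory_gb([5.0, 6.5, 6.0]))
```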

High Level System Architecture

Connectors
• The framework can parse and handle data formats such as JSON, Avro, Parquet and CSV
• Users can pick existing connectors for different source and target systems such as Kafka, Cassandra and BQ
• Metadata stores the system- and application-specific resource configuration used to optimize resource allocation
• The abstract layer is bundled with custom UDFs that let users query systems such as Kafka and Cassandra with SQL
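To make the metadata-driven idea concrete, here is a minimal sketch of a parser registry selected by an application's metadata. The config layout, the `PARSERS` registry and `load_records` are hypothetical and are not the framework's actual schema or API. Only JSON lines and CSV are covered because they need nothing beyond the standard library.

```python
import csv
import io
import json

# Hypothetical registry mapping a format name to a parser function.
PARSERS = {
    "json": lambda text: [json.loads(line) for line in text.splitlines() if line.strip()],
    "csv": lambda text: list(csv.DictReader(io.StringIO(text))),
}


def load_records(app_config, raw_text):
    """Pick the parser named in the app's metadata and apply it."""
    fmt = app_config["source"]["format"]
    if fmt not in PARSERS:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return PARSERS[fmt](raw_text)


# Hypothetical metadata entry for one application.
app_config = {
    "app": "demo_features",
    "source": {"system": "kafka", "format": "json"},
    "target": {"system": "cassandra"},
    "resources": {"executors": 4, "executor_memory_gb": 8},
}

records = load_records(app_config, '{"cust_id": 1}\n{"cust_id": 2}')
print(records)
```

In this sketch, onboarding a new application means adding a config entry and leaving the code unchanged, which is the property the slides describe as plug and play.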

Sample Domain API call in SQL UDF
• Accessing new domain APIs requires a lot of engineering effort to integrate them into any data application
• Creating UDFs for domain APIs makes them usable in a parallel computation engine such as Spark, which accepts UDFs in SQL
spark.sql("select getAccountStatus('cust_id:xxxxxxxxx') as is_active from table limit 1").show(false)
+---------+
|is_active|
+---------+
|Y        |
+---------+
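The same pattern, a function registered once and then called from SQL, can be tried without a cluster. The sketch below uses Python's built-in `sqlite3` in place of Spark, so it illustrates the idea and is not the framework's code. `get_account_status` and the `ACTIVE_ACCOUNTS` set are hypothetical stand-ins for a real domain API call.

```python
import sqlite3

# Hypothetical stand-in for a domain API; a real one would call a service.
ACTIVE_ACCOUNTS = {"cust_id:1001"}


def get_account_status(cust_key):
    return "Y" if cust_key in ACTIVE_ACCOUNTS else "N"


conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL-callable name taking one argument.
conn.create_function("getAccountStatus", 1, get_account_status)

conn.execute("CREATE TABLE customers (cust_key TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("cust_id:1001",), ("cust_id:2002",)],
)

rows = conn.execute(
    "SELECT cust_key, getAccountStatus(cust_key) AS is_active "
    "FROM customers ORDER BY cust_key"
).fetchall()
print(rows)
```

Once registered, the function is available to anyone who can write SQL, which is the point of exposing domain APIs as UDFs.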

Orchestrator
• Builds the optimized execution plan based on the application configs from the metadata store
• Responsible for generating the run instances based on the app priority and source systems
• Executors pick up the optimized execution plan during execution
[Diagram: the Orchestrator (Read App Config, Job Optimizer, Generate Run Instance, Run Scheduler) sits between the Metadata Store and the Executors]
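One possible reading of the run-instance step, as a sketch: read the app configs, skip disabled apps, and emit run instances ordered by priority and source system. The `RunInstance` fields, the `enabled` flag and the rule that a lower number means higher priority are assumptions and are not taken from the slides.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RunInstance:
    run_id: int
    app: str
    priority: int  # assumption: lower number = more critical
    source: str


def generate_run_instances(app_configs):
    """Create one run instance per enabled app, most critical first."""
    ids = count(1)
    enabled = [c for c in app_configs if c.get("enabled", True)]
    # Order by priority, then by source system name so that apps on the
    # same source sit together within a priority level.
    ordered = sorted(enabled, key=lambda c: (c["priority"], c["source"]))
    return [RunInstance(next(ids), c["app"], c["priority"], c["source"]) for c in ordered]


# Hypothetical app configs as they might be read from the metadata store.
configs = [
    {"app": "clicks_daily", "priority": 2, "source": "kafka"},
    {"app": "reco_features", "priority": 1, "source": "cassandra"},
    {"app": "old_report", "priority": 3, "source": "kafka", "enabled": False},
]
runs = generate_run_instances(configs)
print([r.app for r in runs])
```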

Schedule Optimizer
• Smart priority groups are assigned to each loader for all applications, based on criticality
• Top priority jobs take precedence over already scheduled lower priority ones by dequeuing them
• Lower priority jobs resume automatically once all the top priority and SLA-bound jobs are complete
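The preempt-and-resume behaviour can be sketched with a priority heap. This is a simplified single-slot model under stated assumptions: one job runs at a time, a lower number means higher priority, and a preempted job goes back into the pool. The `SchedulePool` class and the job names are hypothetical.

```python
import heapq
from itertools import count


class SchedulePool:
    """Single-slot priority pool; lower number = higher priority (assumption)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a priority
        self.running = None  # (priority, job) or None

    def submit(self, priority, job):
        # Preempt the running job if the new one is more critical.
        if self.running is not None and priority < self.running[0]:
            paused_priority, paused_job = self.running
            heapq.heappush(self._heap, (paused_priority, next(self._seq), paused_job))
            self.running = (priority, job)
        else:
            heapq.heappush(self._heap, (priority, next(self._seq), job))
            if self.running is None:
                self._start_next()

    def finish(self):
        """Mark the running job complete and resume the next best job."""
        done = self.running[1]
        self.running = None
        self._start_next()
        return done

    def _start_next(self):
        if self._heap:
            priority, _, job = heapq.heappop(self._heap)
            self.running = (priority, job)


pool = SchedulePool()
pool.submit(5, "weekly_backfill")    # starts immediately
pool.submit(1, "sla_reco_features")  # preempts the backfill
order = [pool.finish(), pool.finish()]
print(order)
```

The backfill is dequeued when the SLA job arrives and resumes only after that job finishes, which mirrors the automatic resumption described above.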

Telemetry
• Real-time dashboards that provide run-time statistics for each application
• Insightful experience for deep dives into various metrics
• Alerting and notification mechanism to let app owners know about any errors or fault scenarios
• Consolidated view of all applications with the corresponding success/failure ratio
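As an illustration of the consolidated success/failure view and the alerting rule, the sketch below aggregates run events per application and flags apps whose success ratio falls below a threshold. The event shape, the `summarize_runs` name and the 0.5 threshold are assumptions made for the example.

```python
from collections import Counter, defaultdict


def summarize_runs(run_events, alert_threshold=0.5):
    """Return per-app success ratios and the apps that should raise an alert."""
    counts = defaultdict(Counter)
    for event in run_events:
        counts[event["app"]][event["status"]] += 1

    summary, alerts = {}, []
    for app, c in counts.items():
        total = c["success"] + c["failure"]
        ratio = c["success"] / total if total else 0.0
        summary[app] = ratio
        if ratio < alert_threshold:
            alerts.append(app)
    return summary, sorted(alerts)


# Hypothetical run events, one per completed run.
events = [
    {"app": "reco_features", "status": "success"},
    {"app": "reco_features", "status": "success"},
    {"app": "clicks_daily", "status": "failure"},
    {"app": "clicks_daily", "status": "success"},
    {"app": "clicks_daily", "status": "failure"},
]
summary, alerts = summarize_runs(events)
print(summary, alerts)
```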

Putting the pieces together
• Self Service
• Metadata Store
• Multiple Execution Engines
• E2E App Life Cycle Management
• Multiple Source & Target Systems
• Telemetry
• Version Control & CI/CD
• Cloud Native
• Plug & Play
• Low or No code

Outcome
• Turnaround time reduced from a few months to weeks
• Developer productivity expected to increase severalfold
• Non-engineering teams can also leverage this framework to build functional applications with basic knowledge of SQL
• Intelligent app execution based on app priority, with SLA-bound applications favored over non-SLA applications
