Natural Language to SQL Query conversion using Machine Learning Techniques on HPCC Systems


  1. RV College of Engineering ("Go, change the world")
     Natural Language to SQL Query Conversion using Machine Learning Techniques on HPCC Systems Platform
     Dr. G. Shobha, Professor, CSE Department, RV College of Engineering, Bengaluru - 59
  2. Presentation Contents
     • Introduction and Motivation
     • Components Involved in NLP for NL to SQL Conversion
     • Rule-Based Architecture for NL to SQL Conversion
     • Machine Learning Based Architecture to Enrich NL for SQL Conversion
     • HPCC Systems Architecture
     • Results & Conclusions
  3. Introduction and Motivation: Key Factors of NL to SQL
     • Databases are central to most systems today.
     • Structured Query Language (SQL) is used to access and manipulate the data stored in a relational database.
     • Most end users have limited knowledge of SQL and thus face difficulties in accessing such data.
     • Access to the data is critical.
     • Users otherwise have to learn the query language and understand its syntax.
  4. Components Involved in NLP for NL to SQL
     Components of NLP
     • NLP is the part of computer science and artificial intelligence that deals with human languages.
  5. Rule-Based Architecture for NL to SQL Conversion
  6. Rule-Based Architecture for NL to SQL Conversion
     Preprocessor
     • Tokenizes the natural language input.
     • Removes redundant tokens.
     • Its output is duplicated and supplied to two major components: the Entity Recognizer and the Intent Recognizer.
     Entity Recognizer, made up of:
     • an entity extractor
     • a classifier
     • a filter
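The preprocessor step can be illustrated with a minimal Python sketch. The stop-word list and the `preprocess` function are assumptions made for illustration; the slides do not say which tokens count as redundant or how tokenization is implemented.

```python
import re

# Hypothetical stop-word list; the slides do not specify which tokens are redundant.
STOP_WORDS = {"the", "a", "an", "of", "are", "is", "who", "which", "with"}


def preprocess(query):
    """Lowercase the query, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("Show all unmarried customers who are men")
# The same token list is handed to both downstream components.
entity_input = list(tokens)   # copy for the entity recognizer
intent_input = list(tokens)   # copy for the intent recognizer
```

Note that a real preprocessor would have to be careful about which words it drops: cue words such as "who" or "with" may still matter to the intent recognizer, which is one reason this list is only a placeholder.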
  7. Rule-Based Architecture for NL to SQL Conversion
     Entity Extractor
     • Uses part-of-speech tagging and a date parser to extract important keywords from the sentence.
     • These keywords are strong candidates for relation names, attribute names or data values.
     • They are then fed into a classifier along with the user-defined schema mappings of relation names and attribute names.
     Classifier
     • Uses checks such as direct match, concatenation, n-grams, hypernyms and synonyms to separate the keywords into relation names, attribute names and residual keywords.
     Filter
     • The residual words are filtered to extract the words that form the data items of the SQL query.
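A rough sketch of how a classifier of this kind might apply the direct, concatenation and synonym checks is given below. The schema mappings and the synonym table are hypothetical (the identifiers `t_cstmrs`, `FullName`, `MaritalStatus` and `Gender` are borrowed from the results slides), and the hand-written synonym table merely stands in for a WordNet lookup. The n-gram and hypernym checks are omitted.

```python
# Hypothetical user-defined schema mappings: user-facing name -> schema identifier.
RELATIONS = {"customers": "t_cstmrs", "products": "t_prds"}
ATTRIBUTES = {"fullname": "FullName", "maritalstatus": "MaritalStatus", "gender": "Gender"}
# Hand-written synonym table standing in for a WordNet lookup (assumption).
SYNONYMS = {"clients": "customers", "sex": "gender"}


def classify(keywords):
    """Split keywords into relation names, attribute names and residual words."""
    relations, attributes, residual = [], [], []
    i = 0
    while i < len(keywords):
        # Synonym check: map the keyword onto a known name if possible.
        word = SYNONYMS.get(keywords[i], keywords[i])
        # Concatenation check: "full" + "name" -> "fullname".
        if i + 1 < len(keywords) and word + keywords[i + 1] in ATTRIBUTES:
            attributes.append(ATTRIBUTES[word + keywords[i + 1]])
            i += 2
            continue
        # Direct check against relation and attribute names.
        if word in RELATIONS:
            relations.append(RELATIONS[word])
        elif word in ATTRIBUTES:
            attributes.append(ATTRIBUTES[word])
        else:
            residual.append(keywords[i])
        i += 1
    return relations, attributes, residual


result = classify(["full", "name", "clients", "single"])
```

The residual list (here the word "single") is what the filter stage would then inspect for data values.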
  8. Rule-Based Architecture for NL to SQL Conversion
     Intent Recognizer
     • Creates a template of the SQL query by performing checks for each SQL clause.
     • Techniques such as context identification, distance metrics, keyword spotting and grammar rules are applied to check whether a particular clause is present.
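Of the techniques listed, keyword spotting is the simplest to sketch. The example below builds a clause template from cue words; the cue lists, placeholder strings and the `build_template` function are all assumptions for illustration, not the rules used in the presented system.

```python
# Hypothetical cue words per SQL clause (assumption; the slides list no cues).
CLAUSE_CUES = {
    "WHERE": {"who", "which", "with", "whose", "from", "for"},
    "ORDER BY": {"sorted", "ordered", "ascending", "descending"},
    "GROUP BY": {"each", "per"},
}
# Hypothetical cue words that suggest an aggregate function in SELECT.
AGGREGATE_CUES = {"total": "SUM", "subtotal": "SUM", "average": "AVG", "count": "COUNT"}


def build_template(tokens):
    """Return a dict of SQL clauses with placeholders, based on spotted cue words."""
    template = {"SELECT": "<attributes>", "FROM": "<relations>"}
    present = set(tokens)
    for clause, cues in CLAUSE_CUES.items():
        if cues & present:
            template[clause] = "<conditions>" if clause == "WHERE" else "<attributes>"
    for tok in tokens:
        if tok in AGGREGATE_CUES:
            template["SELECT"] = AGGREGATE_CUES[tok] + "(<attribute>)"
            break
    return template


template = build_template("show subtotal of orders for helmet".split())
```

The placeholders would later be filled with the relation names, attribute names and data values found by the entity recognizer.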
  9. Rule-Based Architecture for NL to SQL Conversion
     Challenges faced
     • Specific schema
     • Identification of partial or implied data values
     • Identification of descriptive values
     Go-to solution: Machine Learning Techniques for NL to SQL
  10. Technologies Involved in Machine Learning for NLP to SQL
      Feedforward neural networks
      Recurrent Neural Networks (RNNs)
      • Networks with feedback loops (recurrent edges).
      • The output at the current time step depends on the current input as well as the previous state (via the recurrent edges).
      Training RNNs
      • Problem: RNNs cannot capture long-term dependencies because gradients vanish or explode during backpropagation.
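The recurrence and the gradient problem described on this slide can be written out for a standard (Elman-style) RNN. The symbols below are assumptions, since the slide gives no notation: $x_t$ is the input, $h_t$ the hidden state, $y_t$ the output, and $W_{xh}$, $W_{hh}$, $W_{hy}$, $b_h$, $b_y$ the learned parameters.

```latex
\begin{align}
h_t &= \tanh\left(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b_h\right) \\
y_t &= W_{hy}\,h_t + b_y \\
\frac{\partial h_T}{\partial h_k}
  &= \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}
   = \prod_{t=k+1}^{T} \operatorname{diag}\!\left(1 - h_t^{2}\right) W_{hh}
\end{align}
```

Here $h_t^{2}$ is taken element-wise and the matrix factors are ordered from $t = T$ down to $t = k+1$. Because the same matrix $W_{hh}$ appears in every factor, the product tends to shrink towards zero or grow without bound as $T - k$ increases, which is the vanishing/exploding gradient problem.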
  11. Technologies Involved in ML for NLP to SQL
      Go-to solution: Long Short-Term Memory (LSTM) model
      • A type of RNN architecture that addresses the vanishing/exploding gradient problem and allows long-term dependencies to be learned.
      • Has recently risen to prominence with state-of-the-art performance in speech recognition, language modeling, translation and image captioning.
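For reference, the commonly used LSTM cell formulation is sketched below. The slides do not state which variant the authors used, so the notation is an assumption: $\sigma$ is the logistic sigmoid, $\odot$ the element-wise product, $c_t$ the cell state and $h_t$ the hidden state.

```latex
\begin{align}
f_t &= \sigma\left(W_f\,x_t + U_f\,h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,x_t + U_i\,h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o\,x_t + U_o\,h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c\,x_t + U_c\,h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{align}
```

The additive update of $c_t$ gives gradients a path through time that is scaled by the forget gate rather than repeatedly multiplied by a weight matrix, which is what lets the cell retain information over long sequences.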
  12. Technologies Involved in Machine Learning for NLP to SQL
  13. Machine Learning Based Architecture to Enrich NL for SQL Conversion
  14. Data Set Extraction
      • Data is extracted from the RDBMS.
      • The Apache Commons CSV library is used to extract the dataset as a CSV file.
      • Attributes that contain descriptive values (e.g., Experience, Description) are also provided as input.
      • Three separate components work synchronously to extract as much latent information as possible from the dataset; this information can either be used to enrich the natural language query or be stored for use during conversion.
      Machine Learning for Implied Data Values: Partial and Implied Values
      • Pre-processing techniques
      • Embedding layer
      • Long Short-Term Memory
      • Classification of inputs
  15. Machine Learning for Implied Data Values: Pre-processing Techniques
  16. Machine Learning for Implied Data Values: Embedding Layer
  17. Machine Learning for Implied Data Values: LSTM Model
  18. Proposed Model: Implied Data Values
      Classification of Inputs
      • The input natural language query is tokenized and split into sequences.
      • Sequences from 1 word (1-gram) up to n words (n-gram, where n is the number of tokens) are considered for prediction.
      • Only the largest sequences and their classifications are kept (i.e., sub-sequences are ignored).
      The final, high-confidence classifications given by the LSTM model can be used in multiple ways, two of which are:
      • Enrich the natural language query
      • Store the data values and attribute names
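The sequence generation and the "largest sequence wins" rule can be sketched in Python as follows. Both function names are hypothetical, and the classified spans passed to `keep_largest` are hand-written stand-ins for the LSTM model's high-confidence predictions.

```python
def ngrams(tokens):
    """Yield (start, end, text) for every contiguous sequence of 1..n tokens."""
    n = len(tokens)
    for size in range(1, n + 1):
        for start in range(n - size + 1):
            yield start, start + size, " ".join(tokens[start:start + size])


def keep_largest(spans):
    """Drop any classified span that lies entirely inside a longer kept span.

    Each span is (start, end, label), with `end` exclusive.
    """
    kept = []
    for start, end, label in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        if not any(ks <= start and end <= ke for ks, ke, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


candidates = list(ngrams(["new", "south", "wales"]))
# Stand-in for model output: spans the classifier labelled with an attribute name.
classified = [(0, 1, "StateProvince"), (0, 3, "StateProvince"), (3, 4, "CountryRegion")]
final = keep_largest(classified)
```

In this example the 1-gram "new" is discarded because the 3-gram "new south wales" already covers it, while the non-overlapping span at position 3 survives.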
  19. Elasticsearch: Descriptive Values
      Components of Elasticsearch
      1. Analyzers
      • Stop analyzer: discards stop words.
        Input: Get the doctors with masters degree
        Analyzer: Get doctors masters degree
      • English language analyzer: converts the words of the input query to their root forms.
        Input: Show all products which are red bikes.
        Analyzer: Show all product which red bike
      • The extracted CSV file is used to create an index in Elasticsearch.
      • Elasticsearch's Bulk API can create and store a large number of documents in a single request.
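To illustrate the indexing step, the sketch below turns CSV text into the newline-delimited JSON body that the Elasticsearch Bulk API accepts (one action line followed by one document line per row, with a trailing newline). The index name and the sample row are made up, and the slides do not show how the authors actually called the API; sending the body to the `_bulk` endpoint is left out.

```python
import csv
import io
import json


def bulk_body(csv_text, index_name):
    """Build a Bulk API request body (NDJSON) that indexes every CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: index this document into `index_name`.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the row itself, keyed by the CSV header names.
        lines.append(json.dumps(row))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample data for illustration only.
sample_csv = "ProductName,Description\nRoad Tire Tube,Conventional all-purpose tube.\n"
body = bulk_body(sample_csv, "products")
```

The resulting string would be sent with the `application/x-ndjson` content type.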
  20. Proposed Model: Descriptive Values
      Components of Elasticsearch
      2. Searching through multiple attributes
      • Multiple columns can be searched in Elasticsearch by using the "multi_match" keyword:
        { "query": { "multi_match": { "query": "<input query>", "fields": [<list of descriptive column names>] } } }
      3. Generation of suitable fieldname-value pairs in the WHERE clause
      • WHERE fieldname1 = value1 AND fieldname2 = value2 AND ... AND fieldnameN = valueN
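Both steps can be sketched in Python. The helper names are hypothetical; the query body follows the `multi_match` shape shown on the slide, and the WHERE clause mirrors the `LOWER( field ) = 'value'` pattern that appears in the results slides.

```python
def multi_match_query(text, fields):
    """Build a multi_match request body that searches `text` across `fields`."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


def where_clause(pairs):
    """Join (fieldname, value) pairs into a case-insensitive WHERE clause."""
    if not pairs:
        return ""
    conditions = []
    for field, value in pairs:
        # Double any single quotes so the value is a valid SQL string literal.
        escaped = value.lower().replace("'", "''")
        conditions.append("LOWER( {} ) = '{}'".format(field, escaped))
    return "WHERE " + " AND ".join(conditions)


query = multi_match_query("mountain wheel entry-level rider", ["Description"])
clause = where_clause([("MaritalStatus", "single"), ("Gender", "male")])
```

Building SQL by string formatting is used here only to show the shape of the output; a production system would normally pass the values as bound parameters instead.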
  21. Proposed Model: Descriptive Values
  22. HPCC Systems Platform: Key Factors of HPCC Systems Platform
      Go-to solution: a synchronous combination of the hybrid machine learning model, Elasticsearch, WordNet and the HPCC Systems Platform
      • Highly integrated system environment, with capabilities from raw data processing to high-performance queries and data analysis using a common language
      • Optimized cluster approach that provides high performance at a much lower system cost than other system alternatives
      • Stable and reliable processing environment, proven in production applications for varied organizations over a 15-year period
      • Innovative data-centric programming language (ECL)
      • High level of fault resilience and capabilities
      • Suitable for a wide range of data-intensive applications
  23. Introduction and Motivation
  24. Results
      Example 1
      • Input natural language query: show all unmarried customers who are men
      • Enriched natural language query: show all single Gender 'male' customers
      • Output SQL query: SELECT * FROM t_cstmrs WHERE LOWER( MaritalStatus ) = 'single' AND LOWER( Gender ) = 'male'
      Example 2
      • Input natural language query: Names of customers who have graduated and from germany or france
      • Enriched natural language query: FullName Names of customers who have Education 'graduate degree' and from CountryRegion 'germany' or CountryRegion 'france'
      • Output SQL query: SELECT t_cstmrs.FullName FROM t_cstmrs INNER JOIN t_ggrphy ON t_ggrphy.GeographyKey = t_cstmrs.GeographyKey WHERE ( LOWER( t_ggrphy.CountryRegion ) = 'germany' OR LOWER( t_ggrphy.CountryRegion ) = 'france' ) AND ( LOWER( t_cstmrs.Education ) = 'graduate degree' )
  25. Results
      Example 1
      • Input natural language query: get the price of red or dark helmet
      • Enriched natural language query: get the price of Color 'red' or Color 'black' ProductSubCategoryName 'helmet'
      • Output SQL query: SELECT ListPrice , Color FROM t_prdsubcat INNER JOIN t_prds ON t_prdsubcat.ProductSubCategoryKey = t_prds.ProductSubCategoryKey WHERE LOWER( Color ) = 'red' OR LOWER( Color ) = 'black'
      Example 2
      • Input natural language query: how much does tire tube cost
      • Enriched natural language query: how much does ProductName 'road tire tube' cost
      • Output SQL query: SELECT ListPrice , ProductName FROM t_prds WHERE LOWER( ProductName ) = 'road tire tube'
      Example 3
      • Input natural language query: get the orders from new south wales australia
      • Enriched natural language query: get the orders from StateProvince 'new south wales' CountryRegion 'australia'
      • Output SQL query: SELECT t_saldtls.OrderQuantity, t_ggrphy.CountryRegion, t_cstmrs.FullName , t_ggrphy.StateProvince FROM t_ggrphy INNER JOIN t_cstmrs ON t_cstmrs.GeographyKey = t_ggrphy.GeographyKey INNER JOIN t_saldtls ON t_cstmrs.CustomerKey = t_saldtls.CustomerKey WHERE LOWER( t_cstmrs.StateProvince ) = 'new south wales' AND LOWER( t_ggrphy.CountryRegion ) = 'australia'
      Example 4
      • Input natural language query: show subtotal of orders for helmet
      • Enriched natural language query: show subtotal of orders for ProductSubCategoryName 'helmet'
      • Output SQL query: SELECT SUM( t_saldtls.SalesOrderint ) FROM t_prds INNER JOIN t_saldtls ON t_prds.ProductKey = t_saldtls.ProductKey WHERE LOWER( t_prds.ProductName ) = 'helmet'
  26. Results: Descriptive Values
      Example 1
      • Input natural language query: Select an item with mountain wheel for entry-level rider.
      • Output SQL query: SELECT * FROM t_prds WHERE t_prds.Description = 'Replacement mountain wheel for entry-level rider.'
      Example 2
      • Input natural language query: Name the items which have pioneering frame technology as the HQ steel frame.
      • Output SQL query: SELECT t_prds.ProductName FROM t_prds WHERE t_prds.Description = 'The same pioneering frame technology is used to give you the highest value as the HQ steel frame.'
  27. Conclusion
      • Partial and implied data values in natural language queries are identified by a trained hybrid ML model.
      • WordNet is also used as a safety net to understand implied data values where the vocabulary of the input relational database is not expressive enough.
      • Descriptive values are identified with the help of Elasticsearch.
      • The accuracy of the system is 91.7% on the IMDb database.
  28. Acknowledgements: Students of RVCE
      1. Shubham Phal
      2. Yatish H R
      3. Tanmay Hukkeri
      4. Akshar Prasad
      5. Sourabh S Badhya
      6. Yashwanth YS
      7. Shetty Rohan