SlideShare a Scribd company logo
1 of 12
Character(Byte)-
Level Language
Model
Sequence-to-Sequence Language
Model for Transfer Learning
Firas Obeid
Agenda
• Why Unstructured Data?
• Character2Vector using Byte Level Encoding
• Embeddings
• Tips for LSTM Layer
• Transfer Learning
• Show code implementation with the model
Why the hype on Unstructured Data?
• Natural language processing (NLP) has become mainstream through a focus
on creating value from unstructured data.
• The number of firms that only use unstructured data has shot up from 2% in
2018 to 17% in 2020, and only 3% of the firms surveyed report that they do
not use alternative data sources, down from 30% in 2018.
• Last week, Refinitivs’ 2020 survey shows 72% of firms’ models were
negatively impacted by COVID-19. Some 12% of firms declared their models
obsolete, and 15% are building new ones. The main problem was the lack of
agility to quickly adapt and include new data sets in models as
circumstances changed.
Benefits of Charac2vec
• Having the character embedding, every single word’s vector can be formed even if it is out-
of-vocabulary words (no Bag-of Words necessary). On the other hand, word embedding can
only handle those seen words.
• Good fits for misspelling words
• Handles infrequent words better than word2vec embedding as later one suffers from lack
of enough training opportunity for those rare words
• Reduces model complexity and improving the performance (in terms of speed)
• All this comes at a cost of training on larger sparse sequences thus longer time to train and
optimize!
Why not Byte Level
Even?
• When ASCII encoding is used, there is no difference
between reading characters or bytes. The ASCII-
way of encoding characters allows for 256
characters to be encoded and (surprise…) these
256 possible characters are stored as bytes. 256 for
8 bit slots.
• I will use only 127 of these possible character that
are common to the English language. (7 bit slots)
• 0-31, 127  Control Characters (Nonprintable)
• 32-126  Alphabets(Upper/Lower case), Numeric,
symbols and signs
Embeddings vs One-
hot Encoding
Binary mode returns an array denoting which tokens exist at least once
in the input, while int mode replaces each token by an integer, thus
preserving their order
Embeddings
Layer
• Gives relationship between characters. Based on how
characters accompany each other.
• Dense vector representation (n-Dimensional) of float point
values. Map(char/byte) to a dense vector.
• Embeddings are trainable weights/parameters by the model
equivalent to weights learned by dense layer.
• In our case each unique character/byte is represented with an
N-Dimensional vector of floating point values, where the
learned embedding forms a lookup table by "looking up" each
characters dense vector in the table to encode it.
• A simple integer encoding of our characters is not efficient for
the model to interpret since a linear classifier only learns the
weights for a single feature but not the relationship (probability
distribution) between each feature(characters) or there
encodings.
• A higher dimensional embedding can capture “fine-grained”
relationships between characters, but takes more data to
learn.(256-Dimensions our case)
Tips for LSTM Inputs
• The LSTM input layer must be 3D. [i.e batch_input_shape=(batch_size, n_timesteps, n_features)]
• The meaning of the 3 input dimensions are: samples, time steps, and features (sequences,
sequence_length, characters).
• The LSTM input layer is defined by the input_shape argument on the first hidden layer.
• The input_shape argument takes a tuple of two values that define the number of time steps and
features.
• The number of samples is assumed to be 1 or more. Specify to None for batch_input_shape otherwise or
None for the input_length.
• The reshape() function on NumPy arrays can be used to reshape your 1D or 2D data to be 3D.
• The reshape() function takes a tuple as an argument that defines the new shape
• The LSTM return the entire sequence of outputs for each sample (one vector per timestep per sample), if
you set return_sequences=True.
• Stateful RNN only makes sense if each input sequence in a batch starts exactly where the corresponding
sequence in the previous batch left off. Our RNN model is stateless since each sample is different from
the other and they dont form a text corpus but are separate headlines.
Semi-Supervised
(Transfer Learning)
• Previously, Word2Vec take two embedding layers
for each token to predict probability of a word
before and after. LSTM can handle that without
random sampling, just take the max logit or
probability of the outcome to predict next word
or character
• Unlabeled data can compensate for labeled
fewer data in asset pricing .
Tips for Choosing
Learning Rate

More Related Content

What's hot

Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)Universitat Politècnica de Catalunya
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language ModelsLeon Dohmen
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Training language models to follow instructions with human feedback.pdf
Training language models to follow instructions
with human feedback.pdfTraining language models to follow instructions
with human feedback.pdf
Training language models to follow instructions with human feedback.pdfPo-Chuan Chen
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroNumenta
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaAlexey Grigorev
 
Word_Embedding.pptx
Word_Embedding.pptxWord_Embedding.pptx
Word_Embedding.pptxNameetDaga1
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGISynaptonIncorporated
 
GPT : Generative Pre-Training Model
GPT : Generative Pre-Training ModelGPT : Generative Pre-Training Model
GPT : Generative Pre-Training ModelZimin Park
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Jeong-Gwan Lee
 
Using Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeUsing Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeGautier Marti
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measuresankit_ppt
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxKnoldus Inc.
 
Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...Rama Irsheidat
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language modelJiWenKim
 
GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersYoung Seok Kim
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNetDomyoung Lee
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
Comparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP ModelsComparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP Modelssaurav singla
 

What's hot (20)

Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
Neural Machine Translation (D3L4 Deep Learning for Speech and Language UPC 2017)
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Rnn and lstm
Rnn and lstmRnn and lstm
Rnn and lstm
 
Training language models to follow instructions with human feedback.pdf
Training language models to follow instructions
with human feedback.pdfTraining language models to follow instructions
with human feedback.pdf
Training language models to follow instructions with human feedback.pdf
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
Word_Embedding.pptx
Word_Embedding.pptxWord_Embedding.pptx
Word_Embedding.pptx
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGI
 
GPT : Generative Pre-Training Model
GPT : Generative Pre-Training ModelGPT : Generative Pre-Training Model
GPT : Generative Pre-Training Model
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)
 
Using Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeUsing Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of Code
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measures
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptx
 
Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask Learners
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNet
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
Comparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP ModelsComparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP Models
 

Similar to Language Model.pptx

Machine Learning - Dataset Preparation
Machine Learning - Dataset PreparationMachine Learning - Dataset Preparation
Machine Learning - Dataset PreparationAndrew Ferlitsch
 
constants, variables and datatypes in C
constants, variables and datatypes in Cconstants, variables and datatypes in C
constants, variables and datatypes in CSahithi Naraparaju
 
CONSTANTS, VARIABLES & DATATYPES IN C
CONSTANTS, VARIABLES & DATATYPES IN CCONSTANTS, VARIABLES & DATATYPES IN C
CONSTANTS, VARIABLES & DATATYPES IN CSahithi Naraparaju
 
Error Detection N Correction
Error Detection N CorrectionError Detection N Correction
Error Detection N CorrectionAnkan Adhikari
 
CS4443 - Modern Programming Language - I Lecture (2)
CS4443 - Modern Programming Language - I  Lecture (2)CS4443 - Modern Programming Language - I  Lecture (2)
CS4443 - Modern Programming Language - I Lecture (2)Dilawar Khan
 
Engineering CS 5th Sem Python Module -2.pptx
Engineering CS 5th Sem Python Module -2.pptxEngineering CS 5th Sem Python Module -2.pptx
Engineering CS 5th Sem Python Module -2.pptxhardii0991
 
UNIT-II VISUAL BASIC.NET | BCA
UNIT-II VISUAL BASIC.NET | BCAUNIT-II VISUAL BASIC.NET | BCA
UNIT-II VISUAL BASIC.NET | BCARaj vardhan
 
Numeric Data Types & Strings
Numeric Data Types & StringsNumeric Data Types & Strings
Numeric Data Types & StringsAbhinav Porwal
 
Constants Variables Datatypes by Mrs. Sowmya Jyothi
Constants Variables Datatypes by Mrs. Sowmya JyothiConstants Variables Datatypes by Mrs. Sowmya Jyothi
Constants Variables Datatypes by Mrs. Sowmya JyothiSowmyaJyothi3
 
FP 201 Unit 2 - Part 2
FP 201 Unit 2 - Part 2FP 201 Unit 2 - Part 2
FP 201 Unit 2 - Part 2rohassanie
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdfFEG
 
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...IJNSA Journal
 
Core C# Programming Constructs, Part 1
Core C# Programming Constructs, Part 1Core C# Programming Constructs, Part 1
Core C# Programming Constructs, Part 1Vahid Farahmandian
 
Data Type in C Programming
Data Type in C ProgrammingData Type in C Programming
Data Type in C ProgrammingQazi Shahzad Ali
 
Keywords, identifiers and data type of vb.net
Keywords, identifiers and data type of vb.netKeywords, identifiers and data type of vb.net
Keywords, identifiers and data type of vb.netJaya Kumari
 

Similar to Language Model.pptx (20)

Machine Learning - Dataset Preparation
Machine Learning - Dataset PreparationMachine Learning - Dataset Preparation
Machine Learning - Dataset Preparation
 
constants, variables and datatypes in C
constants, variables and datatypes in Cconstants, variables and datatypes in C
constants, variables and datatypes in C
 
CONSTANTS, VARIABLES & DATATYPES IN C
CONSTANTS, VARIABLES & DATATYPES IN CCONSTANTS, VARIABLES & DATATYPES IN C
CONSTANTS, VARIABLES & DATATYPES IN C
 
C6 agramakrishnan1
C6 agramakrishnan1C6 agramakrishnan1
C6 agramakrishnan1
 
Error Detection N Correction
Error Detection N CorrectionError Detection N Correction
Error Detection N Correction
 
CS4443 - Modern Programming Language - I Lecture (2)
CS4443 - Modern Programming Language - I  Lecture (2)CS4443 - Modern Programming Language - I  Lecture (2)
CS4443 - Modern Programming Language - I Lecture (2)
 
Engineering CS 5th Sem Python Module -2.pptx
Engineering CS 5th Sem Python Module -2.pptxEngineering CS 5th Sem Python Module -2.pptx
Engineering CS 5th Sem Python Module -2.pptx
 
UNIT-II VISUAL BASIC.NET | BCA
UNIT-II VISUAL BASIC.NET | BCAUNIT-II VISUAL BASIC.NET | BCA
UNIT-II VISUAL BASIC.NET | BCA
 
C#
C#C#
C#
 
Data types.pdf
Data types.pdfData types.pdf
Data types.pdf
 
Numeric Data Types & Strings
Numeric Data Types & StringsNumeric Data Types & Strings
Numeric Data Types & Strings
 
Constants Variables Datatypes by Mrs. Sowmya Jyothi
Constants Variables Datatypes by Mrs. Sowmya JyothiConstants Variables Datatypes by Mrs. Sowmya Jyothi
Constants Variables Datatypes by Mrs. Sowmya Jyothi
 
FP 201 Unit 2 - Part 2
FP 201 Unit 2 - Part 2FP 201 Unit 2 - Part 2
FP 201 Unit 2 - Part 2
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
 
VB.NET Datatypes.pptx
VB.NET Datatypes.pptxVB.NET Datatypes.pptx
VB.NET Datatypes.pptx
 
DDUV.pdf
DDUV.pdfDDUV.pdf
DDUV.pdf
 
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
 
Core C# Programming Constructs, Part 1
Core C# Programming Constructs, Part 1Core C# Programming Constructs, Part 1
Core C# Programming Constructs, Part 1
 
Data Type in C Programming
Data Type in C ProgrammingData Type in C Programming
Data Type in C Programming
 
Keywords, identifiers and data type of vb.net
Keywords, identifiers and data type of vb.netKeywords, identifiers and data type of vb.net
Keywords, identifiers and data type of vb.net
 

Recently uploaded

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...gajnagarg
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...amitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 

Recently uploaded (20)

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 

Language Model.pptx

  • 2. Agenda • Why Unstructured Data? • Character2Vector using Byte Level Encoding • Embeddings • Tips for LSTM Layer • Transfer Learning • Show code implementation with the model
  • 3. Why the hype on Unstructured Data? • Natural language processing (NLP) has become mainstream through a focus on creating value from unstructured data. • The number of firms that only use unstructured data has shot up from 2% in 2018 to 17% in 2020, and only 3% of the firms surveyed report that they do not use alternative data sources, down from 30% in 2018. • Last week, Refinitivs’ 2020 survey shows 72% of firms’ models were negatively impacted by COVID-19. Some 12% of firms declared their models obsolete, and 15% are building new ones. The main problem was the lack of agility to quickly adapt and include new data sets in models as circumstances changed.
  • 4. Benefits of Charac2vec • Having the character embedding, every single word’s vector can be formed even if it is out- of-vocabulary words (no Bag-of Words necessary). On the other hand, word embedding can only handle those seen words. • Good fits for misspelling words • Handles infrequent words better than word2vec embedding as later one suffers from lack of enough training opportunity for those rare words • Reduces model complexity and improving the performance (in terms of speed) • All this comes at a cost of training on larger sparse sequences thus longer time to train and optimize!
  • 5. Why not Byte Level Even? • When ASCII encoding is used, there is no difference between reading characters or bytes. The ASCII- way of encoding characters allows for 256 characters to be encoded and (surprise…) these 256 possible characters are stored as bytes. 256 for 8 bit slots. • I will use only 127 of these possible character that are common to the English language. (7 bit slots) • 0-31, 127  Control Characters (Nonprintable) • 32-126  Alphabets(Upper/Lower case), Numeric, symbols and signs
  • 6. Embeddings vs One- hot Encoding Binary mode returns an array denoting which tokens exist at least once in the input, while int mode replaces each token by an integer, thus preserving their order
  • 7. Embeddings Layer • Gives relationship between characters. Based on how characters accompany each other. • Dense vector representation (n-Dimensional) of float point values. Map(char/byte) to a dense vector. • Embeddings are trainable weights/parameters by the model equivalent to weights learned by dense layer. • In our case each unique character/byte is represented with an N-Dimensional vector of floating point values, where the learned embedding forms a lookup table by "looking up" each characters dense vector in the table to encode it. • A simple integer encoding of our characters is not efficient for the model to interpret since a linear classifier only learns the weights for a single feature but not the relationship (probability distribution) between each feature(characters) or there encodings. • A higher dimensional embedding can capture “fine-grained” relationships between characters, but takes more data to learn.(256-Dimensions our case)
  • 8. Tips for LSTM Inputs • The LSTM input layer must be 3D. [i.e batch_input_shape=(batch_size, n_timesteps, n_features)] • The meaning of the 3 input dimensions are: samples, time steps, and features (sequences, sequence_length, characters). • The LSTM input layer is defined by the input_shape argument on the first hidden layer. • The input_shape argument takes a tuple of two values that define the number of time steps and features. • The number of samples is assumed to be 1 or more. Specify to None for batch_input_shape otherwise or None for the input_length. • The reshape() function on NumPy arrays can be used to reshape your 1D or 2D data to be 3D. • The reshape() function takes a tuple as an argument that defines the new shape • The LSTM return the entire sequence of outputs for each sample (one vector per timestep per sample), if you set return_sequences=True. • Stateful RNN only makes sense if each input sequence in a batch starts exactly where the corresponding sequence in the previous batch left off. Our RNN model is stateless since each sample is different from the other and they dont form a text corpus but are separate headlines.
  • 9.
  • 10. Semi-Supervised (Transfer Learning) • Previously, Word2Vec take two embedding layers for each token to predict probability of a word before and after. LSTM can handle that without random sampling, just take the max logit or probability of the outcome to predict next word or character • Unlabeled data can compensate for labeled fewer data in asset pricing .
  • 11.