Over the last year, we have seen increasingly performant text Transformer models, such as GPT-3 from OpenAI, Turing from Microsoft, and T5 from Google. They are capable of transforming text in very creative and unexpected ways: generating a summary of an article, explaining complex concepts in simple language, or synthesizing realistic datasets for AI training. Unlike more traditional Machine Learning models, they do not require vast training datasets and can start from just a few examples.
In this talk, we will give a short overview of such models, share the first experimental results, and ask questions about the future of the content creation process. Are these models ready for prime time? What will happen to professional content creators? Will they be able to compete against such powerful models? Will we see GPT post-editing similar to MT post-editing? We will share some answers based on extensive experimentation and the first production projects that employ this new technology.
3. #GALACONNECTED
Machines in the Content Creation Chain
● Translation: one of the final steps of content creation; many capable models, extensive tool support for post-editing (PE) workflows, rapidly expanding adoption.
● OCR, ASR, TTS: a decent number of good models, but a relatively poor toolchain for PE workflows.
● Content authoring, quality improvement, and stylization: no good models until very recently, and little to no support for post-editing workflows (sentence completion and predictive input in some text editors).
5. #GALACONNECTED
Transformers
A new type of neural network
The Transformer is a type of neural network that emerged in 2017. It is based on attention mechanisms.
Transformers replaced the previously dominant recurrent neural networks (RNN/LSTM/GRU) in Machine Translation and many other NLP tasks.
Compared to recurrent neural networks, the Transformer has much higher computational efficiency and can effectively exploit modern parallel hardware (GPU/TPU). This allows training much larger models on huge amounts of data.
We won't dive into the inner workings of the Transformer; for that, read the excellent article by Jay Alammar: http://jalammar.github.io/illustrated-transformer/
7. #GALACONNECTED
Pre-Trained Transformers
Large models trained on huge amounts of text
Transformers can be pre-trained on huge amounts of text in an unsupervised/self-supervised manner. That means they don't require human-labeled data, just plain text.
Examples of pre-training tasks (see the sketch below):
● Language modeling (LM): predict the next word/token based on the history of previous words/tokens. Used by GPT-like models.
● Masked language modeling (MLM): some words/tokens in an input sequence are corrupted (replaced by random tokens or special tokens) and the goal is to recover these tokens using the remaining context. Used by BERT-like models.
● There are other tasks as well: Multilingual MLM (MMLM), Translation language modeling (TLM), Next sentence prediction (NSP), Cross-lingual contrast (XLCO), etc.
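To make the two main objectives concrete, here is a minimal sketch using the Hugging Face `transformers` library (an assumption; any similar toolkit works) that queries publicly available pre-trained models for both tasks:

```python
# A minimal sketch of the two pre-training objectives from the user side,
# assuming the Hugging Face `transformers` library is installed.
from transformers import pipeline

# Masked language modeling (BERT-like): recover a corrupted token from context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("Transformers can be [MASK] on huge amounts of text."):
    print(candidate["token_str"], candidate["score"])

# Language modeling (GPT-like): predict the next tokens given the history.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers can be pre-trained on", max_length=20)[0]["generated_text"])
```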
10. #GALACONNECTED
Pre-Trained Transformers
How to use?
Generally, there are three modes:
● Train from scratch: train a Transformer model for your own task on your own dataset. Very expensive.
● Fine-tune: fine-tune a pre-trained model for your own task with a much smaller dataset. Requires much less data and fewer computational resources, but still requires a significant amount of AI, software, and infrastructure expertise.
● Use as is: use a pre-trained model without any training (possibly through an API). The model outputs may be useful as-is; e.g., BERT embeddings can be used for similarity search (see the sketch after this list), or GPT-3 can be used to continue your texts (prompts).
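As an illustration of the "use as is" mode, a small sketch (assuming `torch` and `transformers` are installed) that turns BERT outputs into sentence embeddings for similarity search; mean pooling is one common choice, not the only one:

```python
# "Use as is" sketch: BERT embeddings for similarity search, no training.
# Assumes `torch` and `transformers` are installed.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool into one vector

a = embed("How do I cancel my subscription?")
b = embed("I want to unsubscribe from the service.")
print(torch.cosine_similarity(a, b, dim=0).item())  # higher = more similar
```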
11. #GALACONNECTED
Caveats
Transformers have a limited context size (the number of tokens they can consume as input). Typically, BERT-like models have a limit of 512 tokens; GPT-3 has a limit of 2048 tokens. A token is typically a subword unit, as the sketch below illustrates.
https://colab.research.google.com/github/huggingface/transformers/blob/master/notebooks/01-training-tokenizers.ipynb
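A quick way to see that tokens are subword units rather than words (a sketch assuming `transformers` is installed):

```python
# Tokens are subword units, so the context limit is not a word limit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Tokenization splits uncommon words into several subword pieces."
tokens = tokenizer.tokenize(text)
print(tokens)  # rare words show up as several '##'-prefixed pieces
print(len(text.split()), "words ->", len(tokens), "tokens")
```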
12. #GALACONNECTED
GPT-3
As you remember, GPT-like models perform the language modeling task: predict the next word.
The model predicts (produces) an answer given:
● a natural-language description of the task
● no/one/a few examples of the task
Hence, this is called zero-, one-, or few-shot in-context learning.
There is NO model training on the user's side at all.
13. #GALACONNECTED
GPT-3
The goal is to design a prompt (task description + examples) that solves your problem. In effect, you need to "convince" the model to produce the expected result; this is called "prompt engineering". A hypothetical sketch follows below.
The model still has a limited context of 2048 tokens, which caps the total size of the prompt plus the generated answer.
More on GPT-3: https://blog.inten.to/gpt-3-language-models-are-few-shot-learners-a13d1ae8b1f9
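For illustration, a hypothetical few-shot prompt sent through the `openai` Python client's completion endpoint (the engine name, sampling settings, and the prompt itself are illustrative assumptions, not the exact setup used for the demos below):

```python
# Hypothetical few-shot prompt for GPT-3 via the original completion-style
# `openai` client; engine name and parameters are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = """Paraphrase the sentence.

Input: Do I need to pick up my luggage during the stopover?
Output: Am I required to collect my bags at the stopover?

Input: What time does the flight land?
Output:"""

response = openai.Completion.create(
    engine="davinci",   # original GPT-3 base engine
    prompt=prompt,      # task description + examples, all inside the context window
    max_tokens=30,
    temperature=0.7,
    stop=["\n"],        # cut generation at the end of the answer line
)
print(response.choices[0].text.strip())
```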
14. #GALACONNECTED
BERT/RoBERTa/DeBERTa/etc.
These models produce high-quality word/token (and sometimes sentence) representations, which can be used for different purposes:
● Single-sentence classification
● Sentence-pair classification
● Similarity search
● Sequence tagging tasks
● ...
In some cases you can use the model as-is (e.g., for similarity search), but in many cases you need to fine-tune the model for your specific task (see the sketch below).
Typically, these models are not suited for content generation.
DeBERTa is being integrated into the next version of the Microsoft Turing natural language representation model (Turing NLRv4).
http://jalammar.github.io/illustrated-bert/
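A fine-tuning sketch for a single-sentence classification task, assuming the `transformers` and `datasets` libraries are installed; the dataset choice and hyperparameters are illustrative, not a recommendation:

```python
# Fine-tuning sketch: a RoBERTa encoder with a classification head.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

dataset = load_dataset("imdb")  # binary sentiment, used here only as an example
encoded = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
    tokenizer=tokenizer,  # enables dynamic padding of variable-length batches
)
trainer.train()
```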
16. #GALACONNECTED
T5/mT5
In T5, all NLP tasks are reframed into a unified text-to-text format where the input and output are always text strings (see the sketch below).
T5 can be fine-tuned for many NLP tasks, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis).
mT5 is a multilingual version of the T5 transformer.
https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html
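A sketch of the text-to-text interface, assuming `transformers` is installed; `t5-small` ships with several ready-made task prefixes:

```python
# Text-to-text sketch: one T5 model, different tasks selected by a text prefix.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

for prompt in [
    "translate English to German: The house is wonderful.",
    "summarize: Transformers replaced recurrent networks in translation "
    "because they exploit parallel hardware and scale to huge datasets.",
]:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```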
18. #GALACONNECTED
Practical Applications
● Finding catchy titles for long descriptions
● Explaining complex things to second-graders
● Writing descriptions based on keywords
● Script generation
● Paraphrasing
● Source quality improvement for MT and chatbots
● Image captions for accessibility
19. #GALACONNECTED
Basic Copywriting
Finding a title that fits the description
What other titles can fit this description?
GPT and other Text Transformers: Who's Leading Whom.. and to Where?
(The description: the full talk abstract, as quoted at the top of this document.)
25. #GALACONNECTED
Basic Copywriting
Finding a title that fits the description
No cherry-picking. No edits.
“Transformer Models in Context of Content Creation”
“The Content Creator’s New Nemesis: A Text Transformer in Your Pocket”
“Generative Text Transformers: The Future of Content Creation?”
“Text Transformers: Artificial Intelligence for Content Creation”
“What Will Happen to Content Creation? – Deep Learning Models Are Changing the Rules”
“Text Transformers for Content Creation: The Good, Bad, and Ugly”
27. #GALACONNECTED
Text simplification
What is this talk really about? Let’s explain it to a second grader!
Prompt: the full talk abstract, as quoted at the top of this document.
Model output: “Computers that can write stories like Mommy. And Daddy.”
29. #GALACONNECTED
Basic Copywriting
Write a description based on keywords
Keywords: Unisex, Cosplay, Masquerade, Cotton, Black, Cartoon, Funny, Teeth, Half, Face, Mouth, Mask
Generated description: “This is a half face mask that covers the mouth and the lower part of the face. It is made of cotton and is black in color. It has a cartoonish design of a mouth with teeth.”
30. #GALACONNECTED
Script writing
Let humans focus on diversity and subject matter
Vessel: Aircraft
Customer: John
Destination: New York
Bags: 2
Agent: Good afternoon! Where are you flying to today?
John: New York.
Agent: May I have your passport, please?
John: Here you go.
Agent: Are you checking any bags?
John: I have these two.
Agent: OK, please place your bags on the scale.
John: I have a stopover in Salt Lake City for 6 hours – do I need to pick up my luggage there?
Agent: No, it’ll go straight through to New York. Here are your boarding passes – your flight leaves from gate 19B and it’ll begin boarding at 6:30. Your seat number is 11B.
John: Thanks.
31. #GALACONNECTED
Script writing
Let humans focus on diversity and subject matter
Vessel: Aircraft
Customer: Ivan
Destination: Moscow
Bags: none
Agent: Good afternoon! Where are you flying to today?
Ivan: Moscow.
Agent: May I have your passport, please?
Ivan: Here you go.
Agent: Are you checking any bags?
Ivan: No, I just need a boarding pass.
Agent: OK, here are your boarding passes. Your flight leaves from gate 2A and it’ll begin boarding at 1:30. Your seat number is 21B.
Ivan: What time does the flight land?
Agent: It arrives in Moscow at 10:30 in the morning.
Ivan: Thanks.
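One plausible way to drive such generation is a one-shot prompt that shows the structured fields plus the first dialogue, then supplies new field values. The template below is a hypothetical layout, not the exact prompt behind these slides:

```python
# Hypothetical one-shot prompt layout for script generation; the template
# is an illustrative assumption, not the exact prompt used for these slides.
template = """Vessel: Aircraft
Customer: John
Destination: New York
Bags: 2
Script:
Agent: Good afternoon! Where are you flying to today?
John: New York.
...
John: Thanks.

Vessel: {vessel}
Customer: {customer}
Destination: {destination}
Bags: {bags}
Script:
"""

prompt = template.format(vessel="Aircraft", customer="Ivan",
                         destination="Moscow", bags="none")
# `prompt` would then go to a completion endpoint, as in the GPT-3 sketch
# above; the model continues with a new dialogue for Ivan.
print(prompt)
```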
32. #GALACONNECTED
Script writing
Let humans focus on diversity and subject matter
Vessel: Bus
Customer: Mary
Destination: Sacramento
Bags: 1
Agent: Good afternoon! Where are you traveling from today?
Mary: I am traveling from Sacramento.
Agent: May I have your ticket, please?
Mary: Here you go.
Agent: Thank you for choosing Greyhound. Your bus will arrive from the west at gate
2. You can pick up your luggage at the baggage claim.
Mary: Will my luggage be safe if I don’t pick it up at the baggage claim?
Agent: Yes, your luggage will be safe. You can pick it up at the baggage claim after
you arrive at your destination.
Mary: OK.
Agent: Have a great trip!
34. #GALACONNECTED
Paraphrasing
Getting more intent-specific utterances
Original: “I have a stopover in <location>Salt Lake City</location> for <duration>6 hours</duration> – do I need to pick up my luggage there?”
Generated paraphrases:
“I have a stopover in <location>Salt Lake City</location> for <duration>6 hours</duration> and I wonder if I need to pick up my luggage there?”
“Could you please tell me if I have to pick up my luggage in <location>Salt Lake City</location>? I have a stopover there for <duration>6 hours</duration>.”
“Please, clarify for me: am I required to pick up my luggage in <location>Salt Lake City</location> if I have a stopover there for <duration>6 hours</duration>?”
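A hypothetical prompt sketch for tag-preserving paraphrasing; the instruction wording is an assumption, but the idea is to keep the annotated entities intact so the model learns to copy them:

```python
# Hypothetical prompt for paraphrasing that preserves entity annotations;
# the instruction text and layout are illustrative assumptions.
prompt = """Paraphrase the utterance. Keep the <location> and <duration> tags
and their contents unchanged.

Utterance: I have a stopover in <location>Salt Lake City</location> for
<duration>6 hours</duration> - do I need to pick up my luggage there?
Paraphrase:"""
# Sending `prompt` to a completion endpoint (see the GPT-3 sketch above)
# and sampling several times yields variants like those listed above.
print(prompt)
```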
38. #GALACONNECTED
Source quality improvement
Help MT to understand your audience
Raw source: “IDK y ppl lik txt slng bc its rlly hrd 2 rd + cnfsing 2”
MT (EN > ES) of the raw source: “IDK y ppl como txt slng bc its rlly hrd 2 rd + cnfsing 2”
After cleanup: “I don't know why people like text-speak because it's really hard to read and confusing to comprehend.”
MT (EN > ES) of the cleaned source: “No sé por qué a la gente le gusta el lenguaje de texto porque es realmente difícil de leer y confuso de comprender.”
39. #GALACONNECTED
Source quality improvement
Helping chatbots understand your audience
“I wantto cansel my subscptn to your service nd i would of done it myself if could of figuredouthow to find the link that doesnt work when i click on it can u tell me how to get this done so I can just move on and not keep having it chargd to my friggin creditcard ok? do u get it? Huh?”
Do you think your chatbot is happy with this? Neither is your human support agent!
40. #GALACONNECTED
Source quality improvement
Helping chatbots understand your audience
“I would like to cancel my subscription to your service, and I would have done it myself if I could have figured out how to find the link that doesn't work when I click on it. Can you tell me how to get this done so I can just move on and not keep having it charged to my credit card”
Better, huh?
42. #GALACONNECTED
Accessibility
Generate image captions automatically
Generated caption: “Girl reading a book next to a fireplace on Christmas Eve.”
44. #GALACONNECTED
Some Remarks
There must be humans in the loop
● The generated output must be reviewed and edited by humans to make sure it does not contain harmful biases.
● Also, because it may end up published on the Internet and used for further training.
● Much like with MT, effective post-editing requires workforce re-training, workflows, and tools.
● Commercial/production applications of GPT-3 must be approved by OpenAI's Safety Board.
45. #GALACONNECTED
Conclusions
● Transformers are capable of authoring content at near-human quality (especially compared to cheap outsourcing).
● They can deviate from the task (much like humans).
● They may expose biases and lack of expertise (much like humans).
● “All things post-editing” is coming.
● Proper post-editing tools, workflows, and business models have yet to be developed.