The document proposes a knowledge-driven query expansion approach for question answering (QA)-based product attribute extraction. It trains QA models using attribute-value pairs from training data as knowledge, while mimicking imperfect knowledge at test time through techniques like knowledge dropout and token mixing. This helps induce better query representations, especially for rare and ambiguous attributes. Experiments on a cleaned product attribute dataset show the proposed approach with all techniques outperforms baseline methods in both macro and micro F1 scores.
1. Simple and Effective Knowledge-Driven Query Expansion
for QA-Based Product Attribute Extraction
Keiji Shinzato1, Naoki Yoshinaga2, Yandi Xia1, Wei-Te Chen1
1) Rakuten Institute of Technology, Rakuten Group, Inc.
2) Institute of Industrial Science, the University of Tokyo
ACL 2022 short paper
2. 1
About Me
• Keiji Shinzato
• Lead Scientist, Rakuten Institute of Technology Americas
• Career
• 2004 – 2006: Japan Advanced Institute of Science and Technology (JAIST), doctoral program (Torisawa Lab)
• 2006 – 2011: Kyoto University, Graduate School of Informatics, program-specific assistant professor / researcher (Kurohashi Lab)
• 2011 – 2018: Rakuten Group, Inc., Rakuten Institute of Technology
• 2018 – present: Rakuten USA, Rakuten Institute of Technology Americas
• Hobbies and interests
• Cooking
• Craft beer
3. 2
Crafted from sleek
spazzolato leather
(black). This is an
elegant carryall
that's perfect for
your essentials.
10"H x 13"W x 6"D.
Large Elegant Leather Bag - BLK
Goal: Organizing Enormous Products in E-commerce
• Business contribution
• Sophisticated product search and recommendation.
• Better understanding of customers on the marketplace.
Attribute → Value
Color → Black
Material → Leather
Height → 10 inch
Width → 13 inch
Depth → 6 inch
Attribute value extraction
The bag image is designed by pch.vector / Freepik
4. 3
From NER-Based to QA-Based Attribute Value Extraction
• The existing Named Entity Recognition (NER)-based approach to
attribute value extraction suffers from the data sparseness problem.
• The number of classes (attributes) in attribute value extraction can exceed one thousand.
• The Question Answering (QA)-based approach to attribute value extraction
alleviates the data sparseness problem [Xu+, 2019; Wang+, 2020].
QA-based approach
Adidas Running Shoes - 8.5 / White[SEP]Brand
Context Query
Adidas Running Shoes - 8.5 / White
Answer
5. 4
Adidas Running Shoes - 8.5 / White[SEP]Brand
Context Query
From NER-Based to QA-Based Attribute Value Extraction
BERT
QA model BERT-QA
[Wang+, 2020]
Adidas Running Shoes - 8.5 / White
Answer
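The context/query concatenation shown above can be sketched as a one-line helper. This is a minimal illustration with an assumed function name; the real BERT-QA model consumes the resulting string through BERT's tokenizer.

```python
# Sketch of the QA-based input format: product title as context,
# attribute as query, joined by the [SEP] token.
SEP = "[SEP]"

def build_qa_input(title: str, attribute: str) -> str:
    """Concatenate the product title (context) and attribute (query)."""
    return f"{title}{SEP}{attribute}"

print(build_qa_input("Adidas Running Shoes - 8.5 / White", "Brand"))
# → Adidas Running Shoes - 8.5 / White[SEP]Brand
```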
6. 5
Attribute Value Extraction is Still Difficult
[Figure: bar chart of the number of instances per attribute (log-scale y-axis) over the 2,162 attributes]
Number of Instances per Attribute on AliExpress Dataset
7. 6
Attribute Value Extraction is Still Difficult
Problems
• Rare attributes
• Fewer than 10 training instances for 85% of the attributes
Number of Instances per Attribute on AliExpress Dataset
8. 7
Attribute Value Extraction is Still Difficult
Problems
• Ambiguous attributes
• function 1, suitable, sort, etc.
Number of Instances per Attribute on AliExpress Dataset
9. 8
Attribute Value Extraction is Still Difficult
Number of Instances per Attribute on AliExpress Dataset
How can we obtain effective query representations
for rare and ambiguous attributes?
10. 9
Knowledge-Driven Query Expansion for QA-based AE (1/3)
Training data
CATL ... 100 Ah … battery
Knowledge
(Attribute-value pairs)
BERT-QA
Title[SEP]Attribute[SEP]Values
Context
Exploit attribute values in the training data as run-time
knowledge to induce better query representations
CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
Query
Zipp Battery 12V 14AH SLA…
Nominal capacity
Brand
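The expanded query format Title[SEP]Attribute[SEP]Values can be sketched as follows, with the value knowledge stored as a simple attribute → values mapping. The helper name is hypothetical; the deck only specifies the resulting string format.

```python
# Sketch of value-based query expansion: append the attribute's known
# values (collected from training data) to the context + query string.
SEP = "[SEP]"

def build_expanded_query(title: str, attribute: str, knowledge: dict) -> str:
    """Join title, attribute, and the attribute's known values with [SEP]."""
    values = knowledge.get(attribute, [])
    return SEP.join([title, attribute, *values])

knowledge = {"nominal capacity": ["14ah", "40ah"]}
title = "CATL 3.2V 100Ah battery LiFePo4 prismatic battery"
print(build_expanded_query(title, "nominal capacity", knowledge))
# → CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
```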
11. 10
Knowledge-Driven Query Expansion for QA-based AE (1/3)
The run-time knowledge is imperfect: the values stored for an attribute (e.g., 14ah and 40ah for nominal capacity) do not necessarily cover the value that actually appears in the context (100Ah).
12. 11
Knowledge-Driven Query Expansion for QA-based AE (2/3)
• Train knowledge-based QA models while mimicking the imperfection of
the knowledge at test time.
• Knowledge dropout: prevents the model from naively matching values in the query with those in the context.
• Knowledge token mixing: prevents the model from relying more on the values than on the attribute.
• We regard the availability of value knowledge as a domain, and perform multi-domain learning for the QA-based
model with and without our value-based query expansion.
Training data
CATL ... 100 Ah … battery
Knowledge
(Attribute-value pairs)
CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
BERT-QA
Title[SEP]Attribute[SEP]Values
Context Query
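The two techniques above can be sketched as training-time query construction. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the function name, the default probabilities, and the choice of the attribute name as the mixing replacement are all assumptions.

```python
import random

SEP = "[SEP]"

def build_training_query(attribute, values, drop_prob=0.3, mix_prob=0.1, rng=random):
    """Build a query while mimicking imperfect run-time knowledge.

    Knowledge dropout: with probability drop_prob, delete all values and
    mark the query [Unseen], so the model cannot rely on value matching.
    Knowledge token mixing: with probability mix_prob per value, replace
    the value with the attribute name, so the model does not lean on
    values more than on the attribute. (Probabilities are illustrative.)
    """
    if not values or rng.random() < drop_prob:
        return "[Unseen]" + attribute
    mixed = [attribute if rng.random() < mix_prob else v for v in values]
    return SEP.join(["[Seen]" + attribute] + mixed)

# Deterministic extremes for illustration:
print(build_training_query("nominal capacity", ["14ah", "40ah"], drop_prob=1.0))
# → [Unseen]nominal capacity
print(build_training_query("nominal capacity", ["14ah", "40ah"], drop_prob=0.0, mix_prob=0.0))
# → [Seen]nominal capacity[SEP]14ah[SEP]40ah
```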
14. 13
Knowledge-Driven Query Expansion for QA-based AE (2/3)
Knowledge dropout is applied to the values attached to the query:
CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
16. 15
Knowledge-Driven Query Expansion for QA-based AE (3/3)
A [Seen]/[Unseen] token in the query indicates whether value knowledge is attached to the attribute:
Title[SEP][Un/seen]Attribute[SEP]Values
CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah
17. 16
Knowledge-Driven Query Expansion for QA-based AE (3/3)
When knowledge dropout is applied, the values are deleted and the query is marked [Unseen]:
CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen]nominal capacity
18. 17
Experimental Settings
• Perform experiments on the cleaned AE-pub dataset.
• We construct the cleaned AE-pub dataset from the public AliExpress dataset [Xu+, 2019] by
removing 736 near-duplicate tuples.
• Each entry is a tuple of <product title, attribute, value>.
• Split the cleaned AE-pub dataset into train/dev/test sets in a 7:1:2 ratio.
                                    Train    Dev.    Test
# of tuples                        76,823  10,975  21,950
# of tuples with NULL              15,097   2,201   4,259
# of unique attribute-value pairs  11,819   2,680   4,431
# of unique attributes              1,801     635     872
# of unique values                  9,317   2,258   3,671
Statistics of the cleaned AE-pub dataset
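The 7:1:2 split of <title, attribute, value> tuples can be sketched as below. The helper name, the shuffle step, and the seed are assumptions; the deck only states the ratio.

```python
import random

def split_dataset(tuples, ratios=(7, 1, 2), seed=0):
    """Shuffle and split dataset tuples into train/dev/test by integer ratios."""
    rng = random.Random(seed)
    data = list(tuples)
    rng.shuffle(data)
    total = sum(ratios)
    n_train = len(data) * ratios[0] // total
    n_dev = len(data) * ratios[1] // total
    return (data[:n_train],
            data[n_train:n_train + n_dev],
            data[n_train + n_dev:])

train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
# → 70 10 20
```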
24. 23
Impact on Rare and Ambiguous Attributes
• Categorize the attributes to which query expansion was applied, according to the number of
training examples and the appropriateness of their attribute names.
• Use the [CLS] token embedding from BERT to measure appropriateness.
• Compute the cosine similarity between the attribute embedding and the averaged value embeddings.
• Regard an attribute name as ambiguous if the similarity is low.
• Divide the attributes into four groups by the median frequency and the similarity to values.
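The ambiguity measure above can be sketched with plain vectors. `ambiguity_score` is a hypothetical helper; in the actual setup the attribute and value embeddings come from BERT's [CLS] token, not the toy vectors shown here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def ambiguity_score(attr_emb, value_embs):
    """Similarity between an attribute embedding and the mean of its value
    embeddings; a low score suggests an ambiguous attribute name."""
    dim = len(attr_emb)
    mean = [sum(v[i] for v in value_embs) / len(value_embs) for i in range(dim)]
    return cosine(attr_emb, mean)

# Attribute embedding aligned with its values → high score (unambiguous).
print(ambiguity_score([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]]))
# Attribute embedding orthogonal to its values → low score (ambiguous).
print(ambiguity_score([1.0, 0.0], [[0.0, 1.0]]))
```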
25. 24
Impact on Rare and Ambiguous Attributes
Macro F1 gains of BERT-QA +vals +drop +mixing over the BERT-QA model
(columns: number of training examples, median 8):

Cosine similarity     [1, 8)          [8, ∞)          All
(median: 0.929)
[0.411, 0.929)        49.15 (+7.54)   57.89 (+6.15)   53.51 (+6.86)
[0.929, 1.0]          50.94 (+8.14)   71.04 (+3.02)   62.10 (+5.29)
All                   49.99 (+7.82)   64.84 (+4.50)   57.81 (+6.08)
26. 25
Impact on Rare and Ambiguous Attributes
Query expansion yields queries that are more informative than the ambiguous attribute names alone.
27. 26
Impact on Rare and Ambiguous Attributes
Query expansion is more effective for rare attributes than for frequent ones.
28. 27
Impact on Rare and Ambiguous Attributes
The model can devote more of its parameters to solving the task itself, since the knowledge
induced from the training data is supplied as run-time input.
29. 28
Example Outputs
Example 1
Context: aeronova bicycle carbon mtb handlebar mountain bikes flat handlebar mtb integrated handlebars with stem bike accessories
Query attribute: function 1
Values in knowledge: skiing goggles, carbon road bicycle handlebar, cycling glasses, bicycle mask, gas mask, …
Gold: carbon mtb handlebar
BERT-QA: bicycle carbon mtb handlebar (wrong)
BERT-QA w/ query expansion: carbon mtb handlebar (correct)

Example 2
Context: lfp 3.2v 100ah lifepo4 prismatic cell deep cycle diy lithium ion battery 72v 60v 48v 24v 100ah 200ah ev solar storage battery
Query attribute: nominal capacity
Values in knowledge: 14ah, 40ah, 17.4ah
Gold: 100ah
BERT-QA: 3.2v 100ah (wrong)
BERT-QA w/ query expansion: 100ah (correct)

Example 3
Context: camel outdoor softshell men's hiking jacket wind-proof thermal jacket for camping ski thick warm coats
Query attribute: suitable
Values in knowledge: men, camping, kids, saltwater/fresh water, women, 4-15y, mtb cycling shoes, …
Gold: men
BERT-QA: men (correct)
BERT-QA w/ query expansion: camping (wrong)
30. 29
Conclusions
• Knowledge-driven query expansion for QA-based product attribute
extraction.
• We construct the knowledge from the training data and use it to induce better query
representations.
• Two tricks to mimic the imperfection of the knowledge.
• Knowledge dropout and knowledge token mixing.
• Our query expansion is effective, especially for rare and ambiguous attributes.