Simple and Effective Knowledge-Driven Query Expansion
for QA-Based Product Attribute Extraction
Keiji Shinzato¹, Naoki Yoshinaga², Yandi Xia¹, Wei-Te Chen¹
¹ Rakuten Institute of Technology, Rakuten Group, Inc.
² Institute of Industrial Science, The University of Tokyo
ACL 2022 short paper
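Self-Introduction
• Keiji Shinzato (新里 圭司)
• Lead Scientist, Rakuten Institute of Technology Americas
• Career
  • 2004–2006: Doctoral program, Japan Advanced Institute of Science and Technology (Torisawa Lab)
  • 2006–2011: Program-Specific Assistant Professor / Researcher, Graduate School of Informatics, Kyoto University (Kurohashi Lab)
  • 2011–2018: Rakuten Institute of Technology, Rakuten Group, Inc.
  • 2018–present: Rakuten USA, Rakuten Institute of Technology Americas
• Hobbies and interests
  • Cooking
  • Craft beer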
Goal: Organizing the Enormous Number of Products in E-commerce
• Business contribution
  • More sophisticated product search and recommendation
  • Better understanding of customers on the marketplace
Attribute value extraction (example)
  Product title: Large Elegant Leather Bag - BLK
  Description: Crafted from sleek spazzolato leather (black). This is an elegant carryall that's perfect for your essentials. 10"H x 13"W x 6"D.
  Attribute   Value
  Color       Black
  Material    Leather
  Height      10 inch
  Width       13 inch
  Depth       6 inch
The bag image is designed by pch.vector / Freepik
From NER-Based to QA-Based Attribute Value Extraction
• Existing named entity recognition (NER)-based approaches to attribute value extraction suffer from data sparseness.
  • The number of classes (attributes) in attribute value extraction can exceed one thousand.
• Question answering (QA)-based approaches to attribute value extraction alleviate the data sparseness problem [Xu+, 2019; Wang+, 2020].
  • The attribute is given to the model as a query instead of being treated as an output class, so a single model is shared across all attributes.
QA-based approach
  Input (context[SEP]query): Adidas Running Shoes - 8.5 / White[SEP]Brand
  QA model: BERT-QA [Wang+, 2020]
  Answer: a span of the context, e.g., "Adidas" for the query "Brand"
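To make the QA formulation concrete, here is a minimal Python sketch that turns a <product title, attribute, value> tuple into a span-extraction example in the "context[SEP]query" layout above. The `QAExample` class and helper names are hypothetical, and locating the answer by its first case-insensitive occurrence in the title is an assumption, not necessarily how the cited work prepares its data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class QAExample:
    context: str                             # product title
    query: str                               # attribute name
    answer_span: Optional[Tuple[int, int]]   # [start, end) character offsets; None = NULL


def to_qa_example(title: str, attribute: str, value: Optional[str]) -> QAExample:
    """Convert a <title, attribute, value> tuple into a span-extraction QA example.

    Assumption: the answer is the first case-insensitive occurrence of the value
    in the title (lowercasing is assumed not to change string length, which holds
    for ASCII titles). A value that does not occur is treated as NULL.
    """
    if value is None:
        return QAExample(title, attribute, None)
    start = title.lower().find(value.lower())
    if start < 0:
        return QAExample(title, attribute, None)
    return QAExample(title, attribute, (start, start + len(value)))


def model_input(example: QAExample, sep: str = "[SEP]") -> str:
    """Concatenate context and query in the "context[SEP]query" layout."""
    return f"{example.context}{sep}{example.query}"


ex = to_qa_example("Adidas Running Shoes - 8.5 / White", "Brand", "Adidas")
print(model_input(ex))  # Adidas Running Shoes - 8.5 / White[SEP]Brand
print(ex.answer_span)   # (0, 6)
```

A real BERT-QA model would tokenize the two segments as a sentence pair and predict start/end token positions; the string concatenation here only mirrors the slide's notation.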
Attribute Value Extraction is Still Difficult
[Figure: Number of instances per attribute on the AliExpress dataset. x-axis: attributes (2,162); y-axis: number of instances (log scale, 1 to 100,000).]
Problems
• Rare attributes
  • 85% of attributes have fewer than 10 instances.
• Ambiguous attributes
  • e.g., function 1, suitable, sort
How can we obtain effective query representations for rare and ambiguous attributes?
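The "85% of attributes have fewer than 10 instances" statistic can be computed from <title, attribute, value> tuples along these lines. This is a sketch under the assumption that every tuple, including those whose value is NULL, counts as one instance of its attribute; the toy data is invented.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, str, Optional[str]]  # (product title, attribute, value); None = NULL


def instances_per_attribute(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Count tuples per attribute, most frequent first (the data behind the plot)."""
    return Counter(attr for _, attr, _ in rows).most_common()


def rare_attribute_ratio(rows: Iterable[Row], threshold: int = 10) -> float:
    """Fraction of attributes that have fewer than `threshold` tuples."""
    counts = instances_per_attribute(rows)
    if not counts:
        return 0.0
    rare = sum(1 for _, n in counts if n < threshold)
    return rare / len(counts)


toy = ([("title", "brand", "adidas")] * 12
       + [("title", "color", "white")] * 3
       + [("title", "function 1", None)])
print(instances_per_attribute(toy))  # [('brand', 12), ('color', 3), ('function 1', 1)]
print(rare_attribute_ratio(toy))     # 0.6666666666666666
```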
Knowledge-Driven Query Expansion for QA-based AE (1/3)
• Exploit attribute values in the training data as run-time knowledge to induce better query representations.
Overview
  Training data: product titles with annotated value spans (B = begin, E = end), e.g., "CATL ... 100 Ah … battery" and "Zipp Battery 12V 14AH SLA…"
  Knowledge: attribute-value pairs collected from the training data (e.g., for Nominal capacity and Brand)
  BERT-QA input: Title[SEP]Attribute[SEP]Values (the title is the context; the attribute and its values form the query)
  Example: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
• The knowledge is imperfect: values retrieved from the training data need not include the correct value at test time.
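The expansion step can be sketched as follows: collect the distinct values each attribute takes in the training data, then append them to the attribute to form the query. Which values are kept and in what order is an assumption here (first-seen order), and `max_values` is a hypothetical cap on query length; the training titles are invented toy data.

```python
from typing import Dict, Iterable, List, Optional, Tuple

SEP = "[SEP]"
Row = Tuple[str, str, Optional[str]]  # (product title, attribute, value); None = NULL


def build_knowledge(train: Iterable[Row]) -> Dict[str, List[str]]:
    """Map each attribute to its distinct non-NULL training values (first-seen order)."""
    knowledge: Dict[str, List[str]] = {}
    for _title, attribute, value in train:
        if value is None:
            continue
        values = knowledge.setdefault(attribute, [])
        if value not in values:
            values.append(value)
    return knowledge


def expand_query(title: str, attribute: str, knowledge: Dict[str, List[str]],
                 max_values: int = 10) -> str:
    """Build "Title[SEP]Attribute[SEP]Value1[SEP]Value2..." from the knowledge.

    max_values is a hypothetical cap to keep the input within BERT's length limit.
    Attributes without knowledge fall back to the plain "Title[SEP]Attribute" query.
    """
    values = knowledge.get(attribute, [])[:max_values]
    return SEP.join([title, attribute, *values])


train = [
    ("12v 14ah sla battery", "nominal capacity", "14ah"),   # toy titles
    ("3.2v 40ah lifepo4 cell", "nominal capacity", "40ah"),
    ("battery box", "brand", None),
]
kb = build_knowledge(train)
print(expand_query("CATL 3.2V 100Ah battery LiFePo4 prismatic battery", "nominal capacity", kb))
# CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
```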
Knowledge-Driven Query Expansion for QA-based AE (2/3)
• Train knowledge-based QA models while mimicking the imperfection of the knowledge at test time.
  • Knowledge dropout: prevents models from naively matching values in the query with those in the context.
  • Knowledge token mixing: prevents models from relying more on values than on attributes.
    • We regard the availability of value knowledge as a domain, and perform multi-domain learning for the QA-based model with and without our value-based query expansion.
Overview (knowledge dropout)
  Training data: CATL ... 100 Ah … battery (value span marked B…E)
  Knowledge: attribute-value pairs
  BERT-QA input: Title[SEP]Attribute[SEP]Values (context followed by the query)
  Example: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
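One plausible reading of knowledge dropout is sketched below: because the knowledge is built from the training data, a training example's own gold value is always among the retrieved values, so during training that value is removed from the query with probability `p`. Treating the gold value as the dropout unit, and the value of `p`, are assumptions; the paper may define the dropout differently.

```python
import random
from typing import List, Optional, Sequence

SEP = "[SEP]"


def knowledge_dropout(values: Sequence[str], gold: Optional[str], p: float,
                      rng: random.Random) -> List[str]:
    """With probability p, drop the training example's own gold value from the
    retrieved values, so the model cannot simply copy a value from the query.
    (Assumed interpretation of knowledge dropout.)
    """
    if gold is None or rng.random() >= p:
        return list(values)
    target = gold.lower()
    return [v for v in values if v.lower() != target]


def training_query(title: str, attribute: str, values: Sequence[str],
                   gold: Optional[str], p: float, rng: random.Random) -> str:
    """Build the expanded training query after applying knowledge dropout."""
    kept = knowledge_dropout(values, gold, p, rng)
    return SEP.join([title, attribute, *kept])


rng = random.Random(0)
title = "CATL 3.2V 100Ah battery LiFePo4 prismatic battery"
print(training_query(title, "nominal capacity", ["14ah", "40ah", "100ah"], "100Ah", p=1.0, rng=rng))
# CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
```

With `p=1.0` the gold value is always dropped, which reproduces the query on the slide; in practice `p` would be a tuned hyperparameter.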
Knowledge-Driven Query Expansion for QA-based AE (3/3)
Overview (knowledge token mixing)
  BERT-QA input: Title[SEP][Un/seen]Attribute[SEP]Values
  Query with values ([Seen]): CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah
  Query with values deleted ([Unseen]): CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen] nominal capacity
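Knowledge token mixing can be sketched as mixing two query types during training: with values, marked `[Seen]`, and with the values deleted, marked `[Unseen]`. This treats "with / without value knowledge" as two domains. At test time, attributes that have values in the training data are expanded, and all others fall back to the `[Unseen]` form. The mixing probability `p_without` and the helper names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

SEP = "[SEP]"
SEEN, UNSEEN = "[Seen]", "[Unseen]"


def format_query(title: str, attribute: str, values: Sequence[str]) -> str:
    """[Seen] + values when knowledge is available, otherwise [Unseen] alone."""
    if values:
        return SEP.join([title, f"{SEEN}{attribute}", *values])
    return f"{title}{SEP}{UNSEEN}{attribute}"


def mixed_training_query(title: str, attribute: str, values: Sequence[str],
                         p_without: float, rng: random.Random) -> str:
    """Training: with probability p_without, delete the values (the [Unseen] domain)."""
    if rng.random() < p_without:
        values = []
    return format_query(title, attribute, values)


def inference_query(title: str, attribute: str, knowledge: Dict[str, List[str]]) -> str:
    """Test time: expand only attributes that have values in the training data."""
    return format_query(title, attribute, knowledge.get(attribute, []))


t = "CATL 3.2V 100Ah battery LiFePo4 prismatic battery"
print(mixed_training_query(t, "nominal capacity", ["14ah", "40ah"], 0.0, random.Random(0)))
print(mixed_training_query(t, "nominal capacity", ["14ah", "40ah"], 1.0, random.Random(0)))
```

If the model is built with Hugging Face Transformers, `[Seen]` and `[Unseen]` would typically be registered with `tokenizer.add_special_tokens({"additional_special_tokens": ["[Seen]", "[Unseen]"]})` followed by `model.resize_token_embeddings(len(tokenizer))`.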
Experimental Settings
• We perform experiments on the cleaned AE-pub dataset.
  • We construct it from the public AliExpress dataset [Xu+, 2019] by removing 736 near-duplicate tuples.
  • Each entry is a tuple of <product title, attribute, value>.
• We split the cleaned AE-pub dataset into train/dev/test sets in a 7:1:2 ratio.
                                     Train     Dev.      Test
# of tuples                          76,823    10,975    21,950
# of tuples with NULL                15,097     2,201     4,259
# of unique attribute-value pairs    11,819     2,680     4,431
# of unique attributes                1,801       635       872
# of unique values                    9,317     2,258     3,671
Statistics of the cleaned AE-pub dataset
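A sketch of the 7:1:2 split and of the statistics in the table follows. A seeded random shuffle is an assumption (the slides do not say how the split was drawn), as are the counting conventions: NULL tuples are excluded from the attribute-value pair and value counts but included in the attribute count. Near-duplicate removal is not covered. The rows are invented toy data.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, str, Optional[str]]  # (product title, attribute, value); None = NULL


def split_7_1_2(rows: Sequence[Row], seed: int = 0) -> Tuple[List[Row], List[Row], List[Row]]:
    """Shuffle and split into train/dev/test with a 7:1:2 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 0.7)
    n_dev = round(len(shuffled) * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])


def statistics(rows: Sequence[Row]) -> Dict[str, int]:
    """Compute the rows of the statistics table for one split."""
    pairs = {(attr, value) for _, attr, value in rows if value is not None}
    return {
        "tuples": len(rows),
        "tuples_with_null": sum(1 for _, _, value in rows if value is None),
        "unique_attribute_value_pairs": len(pairs),
        "unique_attributes": len({attr for _, attr, _ in rows}),
        "unique_values": len({value for _, value in pairs}),
    }


rows = [(f"title {i}", "color" if i % 2 else "brand", None if i % 5 == 0 else f"v{i % 3}")
        for i in range(100)]
train, dev, test_rows = split_7_1_2(rows)
print(len(train), len(dev), len(test_rows))  # 70 10 20
print(statistics(rows))
```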
Baselines
• Dictionary matching
• SUOpenTag [Xu+, 2019]
• AVEQA [Wang+, 2020]
• BERT-QA [Wang+, 2020]
[Figure: overviews of SUOpenTag, AVEQA, and BERT-QA]
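The slides do not say how the dictionary-matching baseline works, so the following is only a hypothetical version: for the queried attribute, return the longest value seen in the training data that occurs in the title as a case-insensitive substring, or NULL if none does. Substring matching can also fire inside longer words, which a real baseline might guard against with token boundaries.

```python
from typing import Dict, Iterable, Optional


def dictionary_match(title: str, attribute: str,
                     knowledge: Dict[str, Iterable[str]]) -> Optional[str]:
    """Return the longest known value of `attribute` that occurs in the title
    (case-insensitive substring match), or None (NULL) if nothing matches."""
    lowered = title.lower()
    best: Optional[str] = None
    for value in knowledge.get(attribute, ()):
        if value and value.lower() in lowered and (best is None or len(value) > len(best)):
            best = value
    return best


knowledge = {"color": ["black", "blue"], "material": ["leather", "faux leather"]}
title = "Large Elegant Leather Bag - BLK"
print(dictionary_match(title, "material", knowledge))  # leather
print(dictionary_match(title, "color", knowledge))     # None: "BLK" is not in the dictionary
```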
Performance on Cleaned AE-pub Dataset
Model                          Macro P (%)    Macro R (%)    Macro F1       Micro P (%)    Micro R (%)    Micro F1
Dictionary                     33.20 (±0.00)  30.37 (±0.00)  31.72 (±0.00)  73.39 (±0.00)  73.77 (±0.00)  73.58 (±0.00)
SUOpenTag [Xu+, 2019]          30.92 (±1.44)  28.04 (±1.48)  29.41 (±1.44)  86.53 (±0.78)  79.11 (±0.35)  82.65 (±0.20)
AVEQA [Wang+, 2020]            41.93 (±1.05)  39.65 (±0.96)  40.76 (±0.98)  86.95 (±0.27)  81.99 (±0.13)  84.40 (±0.09)
BERT-QA [Wang+, 2020]          42.77 (±0.36)  40.85 (±0.22)  41.79 (±0.28)  87.14 (±0.54)  82.16 (±0.21)  84.58 (±0.24)
BERT-QA +vals                  39.48 (±0.37)  35.60 (±0.44)  37.44 (±0.38)  88.82 (±0.22)  81.77 (±0.14)  85.15 (±0.14)
BERT-QA +vals +drop            41.61 (±0.83)  38.22 (±0.80)  39.84 (±0.81)  88.46 (±0.26)  82.02 (±0.37)  85.12 (±0.14)
BERT-QA +vals +mixing          46.67 (±0.33)  43.32 (±0.50)  44.93 (±0.39)  88.30 (±0.69)  82.46 (±0.30)  85.28 (±0.26)
BERT-QA +vals +drop +mixing    47.74 (±0.54)  44.82 (±0.75)  46.23 (±0.64)  87.84 (±0.39)  82.61 (±0.07)  85.14 (±0.19)
• BERT-QA +vals +drop +mixing outperformed all baseline methods in both macro F1 (46.23 vs. 41.79 for BERT-QA) and micro F1 (85.14 vs. 84.58).
• BERT-QA +vals learns to find strings that are similar to the values retrieved from the training data; its macro F1 (37.44) falls below that of BERT-QA (41.79).
• Knowledge dropout and knowledge token mixing raise macro F1 from 37.44 (+vals) to 46.23 (+vals +drop +mixing), and every variant keeps micro F1 above that of BERT-QA (84.58).
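In every row of the table, the F1 columns appear to equal the harmonic mean of the P and R columns (e.g., 2 × 47.74 × 44.82 / (47.74 + 44.82) ≈ 46.23). This suggests macro F1 is computed from macro-averaged P and R rather than by averaging per-attribute F1. The sketch below follows that reading. The exact-match counting rules for NULL values are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Prediction = Tuple[str, Optional[str], Optional[str]]  # (attribute, gold, predicted); None = NULL


def f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def evaluate(preds: Iterable[Prediction]) -> Dict[str, Tuple[float, float, float]]:
    """Return {"macro": (P, R, F1), "micro": (P, R, F1)} using exact match.

    Assumed counting: a non-NULL prediction equal to the gold value is a TP;
    any other non-NULL prediction is an FP; a non-NULL gold value that is not
    predicted exactly is an FN. Correct NULL predictions count toward nothing.
    Every attribute seen in `preds` takes part in the macro average.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # attr -> [tp, fp, fn]
    for attr, gold, pred in preds:
        c = counts[attr]
        if pred is not None and pred == gold:
            c[0] += 1
            continue
        if pred is not None:
            c[1] += 1
        if gold is not None:
            c[2] += 1
    if not counts:
        return {"macro": (0.0, 0.0, 0.0), "micro": (0.0, 0.0, 0.0)}
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro_p, micro_r = precision_recall(tp, fp, fn)
    per_attr = [precision_recall(*c) for c in counts.values()]
    macro_p = sum(p for p, _ in per_attr) / len(per_attr)
    macro_r = sum(r for _, r in per_attr) / len(per_attr)
    return {"macro": (macro_p, macro_r, f1(macro_p, macro_r)),
            "micro": (micro_p, micro_r, f1(micro_p, micro_r))}


print(round(f1(47.74, 44.82), 2))  # 46.23, macro F1 of BERT-QA +vals +drop +mixing
```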
Impact on Rare and Ambiguous Attributes
• Categorize the attributes to which query expansion was applied according to the number of training examples and the appropriateness of the attribute names.
  • Measure appropriateness using the embeddings of the [CLS] token obtained from BERT.
    • Compute the cosine similarity between the attribute embedding and the averaged value embeddings.
    • Regard the attribute name as ambiguous if the similarity is low.
  • Divide the attributes into four groups according to the median frequency and the median similarity to values.
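The four-way grouping can be sketched as below, assuming the [CLS] embeddings have already been computed (with Hugging Face Transformers, for instance, from `outputs.last_hidden_state[:, 0]` of a BERT model). Ties at a median go to the upper group, matching the bins [1, 8) / [8, ∞) and [0.411, 0.929) / [0.929, 1.0] in the results table. The 2-D vectors and counts are invented toy data.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def group_attributes(attr_emb: Dict[str, Vector],
                     value_embs: Dict[str, Sequence[Vector]],
                     n_train: Dict[str, int]) -> Dict[str, Tuple[str, str]]:
    """Assign each attribute to one of four groups by median splits.

    Similarity: cosine(attribute [CLS] embedding, mean of its values' [CLS] embeddings).
    Attributes at or above a median go to the upper group.
    """
    sim = {a: cosine(attr_emb[a], mean_vector(value_embs[a])) for a in attr_emb}
    sim_median = median(sim.values())
    freq_median = median(n_train[a] for a in attr_emb)
    return {
        a: ("rare" if n_train[a] < freq_median else "frequent",
            "ambiguous" if sim[a] < sim_median else "clear")
        for a in attr_emb
    }


attr_emb = {"color": [1.0, 0.0], "brand": [1.0, 1.0], "suitable": [1.0, -1.0], "function 1": [0.0, 1.0]}
value_embs = {"color": [[1.0, 0.1], [1.0, -0.1]], "brand": [[1.0, 1.0]],
              "suitable": [[1.0, 0.0]], "function 1": [[1.0, 0.0]]}
n_train = {"color": 100, "brand": 5, "suitable": 50, "function 1": 3}
print(group_attributes(attr_emb, value_embs, n_train))
```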
Macro F1 Gains over BERT-QA Model
(Macro F1 of BERT-QA +vals +drop +mixing; gains over BERT-QA in parentheses)
                                     Number of training examples (median: 8)
Cosine similarity (median: 0.929)    [1, 8)          [8, ∞)          All
[0.411, 0.929)                       49.15 (+7.54)   57.89 (+6.15)   53.51 (+6.86)
[0.929, 1.0]                         50.94 (+8.14)   71.04 (+3.02)   62.10 (+5.29)
All                                  49.99 (+7.82)   64.84 (+4.50)   57.81 (+6.08)
• Query expansion can generate more informative queries than ambiguous attribute names alone (+6.86 for low-similarity attribute names vs. +5.29 for high-similarity ones).
• Query expansion is more effective for rare attributes than for frequent ones (+7.82 vs. +4.50).
• Because the knowledge induced from the training data is given as run-time input, the model may be able to devote more of its parameters to solving the task itself.
Example Outputs (✓ = correct, ✗ = incorrect)

Example 1
  Context: aeronova bicycle carbon mtb handlebar mountain bikes flat handlebar mtb integrated handlebars with stem bike accessories
  Query: function 1
  Attribute values: skiing goggles, carbon road bicycle handlebar, cycling glasses, bicycle mask, gas mask, …
  Gold: carbon mtb handlebar
  Prediction (BERT-QA): bicycle carbon mtb handlebar ✗
  Prediction (BERT-QA w/ query expansion): carbon mtb handlebar ✓

Example 2
  Context: lfp 3.2v 100ah lifepo4 prismatic cell deep cycle diy lithium ion battery 72v 60v 48v 24v 100ah 200ah ev solar storage battery
  Query: nominal capacity
  Attribute values: 14ah, 40ah, 17.4ah
  Gold: 100ah
  Prediction (BERT-QA): 3.2v 100ah ✗
  Prediction (BERT-QA w/ query expansion): 100ah ✓

Example 3
  Context: camel outdoor softshell men’s hiking jacket wind- proof thermal jacket for camping ski thick warm coats
  Query: suitable
  Attribute values: men, camping, kids, saltwater/fresh water, women, 4-15y, mtb cycling shoes, …
  Gold: men
  Prediction (BERT-QA): men ✓
  Prediction (BERT-QA w/ query expansion): camping ✗
Conclusions
• Knowledge-driven query expansion for QA-based product attribute extraction
  • We construct the knowledge from the training data and use it to induce better query representations.
• Two tricks to mimic the imperfection of the knowledge
  • Knowledge dropout and knowledge token mixing
• Our query expansion is effective, especially for rare and ambiguous attributes.
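Topics Not Covered in the Paper
• Gap between the evaluation experiments and real-world usage
  • Evaluation experiments (including prior work): the gold attribute is given
  • Real-world usage: the gold attribute is unknown
• Practicality of QA-based models
  • The model must be run multiple times, once per attribute
  • We need to know in advance which attributes to extract values for
    • On some e-commerce sites, the candidate attributes can be narrowed down by referring to master data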
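The practicality issue above can be illustrated by a loop that queries a QA-style extractor once per candidate attribute, with the candidates narrowed by category master data. Both the master-data structure (`category_attributes`) and the extractor are hypothetical stand-ins; a real deployment might batch the per-attribute queries into one forward pass, but the number of queries still grows with the number of candidate attributes.

```python
from typing import Callable, Dict, List, Optional

Extractor = Callable[[str, str], Optional[str]]  # (title, attribute) -> value or None (NULL)


def extract_all(title: str, category: str,
                category_attributes: Dict[str, List[str]],
                extractor: Extractor) -> Dict[str, str]:
    """Query a QA-style extractor once per candidate attribute of the category.

    `category_attributes` stands in for e-commerce master data that lists the
    attributes relevant to each product category (hypothetical structure).
    """
    results: Dict[str, str] = {}
    for attribute in category_attributes.get(category, []):
        value = extractor(title, attribute)  # one model run per attribute
        if value is not None:
            results[attribute] = value
    return results


calls: List[str] = []


def toy_extractor(title: str, attribute: str) -> Optional[str]:
    """Stand-in for a trained QA model: returns canned answers."""
    calls.append(attribute)
    return {"brand": "Adidas", "color": "White"}.get(attribute)


master = {"running shoes": ["brand", "color", "shoe size", "material"]}
print(extract_all("Adidas Running Shoes - 8.5 / White", "running shoes", master, toy_extractor))
# {'brand': 'Adidas', 'color': 'White'}
print(len(calls))  # 4
```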
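The Future of Attribute Value Extraction
• Attribute value extraction tends to be formulated as NER
• Problems with NER-based methods
  • Extracted values need to be normalized (D&G → Dolce & Gabbana)
  • When annotating attribute values, it is difficult to define what the correct answer is
  • Automatically generating training data from existing product data introduces incorrect annotations
    • Product title: ジャーナルスタンダード ジーンズ スタンダードフィット (Journal Standard jeans, standard fit)
    • Attributes: <Brand, ジャーナルスタンダード (Journal Standard)>, <Pants leg width, スタンダード (standard)>
  • For some attributes, new values rarely appear (e.g., color, country of manufacture)
• Approaches other than NER
  • Solving it as classification
    • Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
  • Solving it as generation
    • Ongoing research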
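One plausible reading of the annotation problem above is that naive string matching labels every occurrence of a known value, so the leg-width value スタンダード ("standard") also matches inside the brand name ジャーナルスタンダード. The sketch below shows that effect, together with a hypothetical alias table for the normalization problem (D&G → Dolce & Gabbana). Neither function is the authors' pipeline.

```python
import re
import unicodedata
from typing import Dict, List, Tuple


def naive_spans(title: str, value: str) -> List[Tuple[int, int]]:
    """Label every occurrence of `value` in `title`, as naive automatic
    annotation by exact string matching would (NFC-normalized first)."""
    title = unicodedata.normalize("NFC", title)
    value = unicodedata.normalize("NFC", value)
    return [(m.start(), m.end()) for m in re.finditer(re.escape(value), title)]


# Hypothetical alias table for value normalization.
ALIASES: Dict[str, str] = {"d&g": "Dolce & Gabbana"}


def normalize_value(value: str) -> str:
    return ALIASES.get(value.strip().lower(), value)


title = unicodedata.normalize("NFC", "ジャーナルスタンダード ジーンズ スタンダードフィット")
brand = unicodedata.normalize("NFC", "ジャーナルスタンダード")
spans = naive_spans(title, "スタンダード")           # pants leg-width value
brand_end = title.find(brand) + len(brand)
wrong = [s for s in spans if s[1] <= brand_end]     # matches inside the brand name
print(len(spans), len(wrong))   # 2 1
print(normalize_value("D&G"))   # Dolce & Gabbana
```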
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technologyRakuten Group, Inc.
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャーRakuten Group, Inc.
 
モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側Rakuten Group, Inc.
 
楽天のインフラ事情 2022
楽天のインフラ事情 2022楽天のインフラ事情 2022
楽天のインフラ事情 2022Rakuten Group, Inc.
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container ChallengesRakuten Group, Inc.
 
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Rakuten Group, Inc.
 
Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Rakuten Group, Inc.
 

Plus de Rakuten Group, Inc. (20)

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり
 
What Makes Software Green?
What Makes Software Green?What Makes Software Green?
What Makes Software Green?
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組み
 
The Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdfThe Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdf
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdf
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdf
 
How We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdfHow We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdf
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
OWASPTop10_Introduction
OWASPTop10_IntroductionOWASPTop10_Introduction
OWASPTop10_Introduction
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technology
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー
 
モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側
 
楽天のインフラ事情 2022
楽天のインフラ事情 2022楽天のインフラ事情 2022
楽天のインフラ事情 2022
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container Challenges
 
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
 
AR/SLAM and IoT
AR/SLAM and IoTAR/SLAM and IoT
AR/SLAM and IoT
 
Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2
 

Dernier

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Dernier (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction

  • 1. Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction
    Keiji Shinzato (1), Naoki Yoshinaga (2), Yandi Xia (1), Wei-Te Chen (1)
    (1) Rakuten Institute of Technology, Rakuten Group, Inc.
    (2) Institute of Industrial Science, the University of Tokyo
    ACL 2022 short paper
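  • 2. Self-Introduction
    • Keiji Shinzato
    • Lead Scientist, Rakuten Institute of Technology Americas
    • Career
      • 2004–2006: Doctoral program, Japan Advanced Institute of Science and Technology (Torisawa Lab)
      • 2006–2011: Program-specific assistant professor / researcher, Graduate School of Informatics, Kyoto University (Kurohashi Lab)
      • 2011–2018: Rakuten Institute of Technology, Rakuten Group, Inc.
      • 2018–present: Rakuten Institute of Technology Americas, Rakuten USA
    • Hobbies and interests
      • Cooking
      • Craft beer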
  • 3. Goal: Organizing an Enormous Number of Products in E-commerce
    • Example product: Large Elegant Leather Bag - BLK, described as: Crafted from sleek spazzolato leather (black). This is an elegant carryall that's perfect for your essentials. 10"H x 13"W x 6"D.
    • Attribute value extraction turns this text into attribute-value pairs:
        Attribute   Value
        Color       Black
        Material    Leather
        Height      10 inch
        Width       13 inch
        Depth       6 inch
    • Business contribution
      • Sophisticated product search and recommendation.
      • Better understanding of customers on the marketplace.
    • The bag image is designed by pch.vector / Freepik.
  • 4. From NER-Based to QA-Based Attribute Value Extraction
    • The existing named entity recognition (NER)-based approach to attribute value extraction suffers from data sparseness.
      • The number of classes (attributes) in attribute value extraction can exceed one thousand.
    • The question answering (QA)-based approach to attribute value extraction alleviates the data sparseness problem [Xu+, 2019; Wang+, 2020].
    • QA-based approach: the context (product title) and the query (attribute) are concatenated as "Adidas Running Shoes - 8.5 / White[SEP]Brand", and the answer is a span of the context ("Adidas").
  • 5. From NER-Based to QA-Based Attribute Value Extraction (continued)
    • The input "Adidas Running Shoes - 8.5 / White[SEP]Brand" is fed to a BERT QA model, BERT-QA [Wang+, 2020], which extracts the answer span from the context "Adidas Running Shoes - 8.5 / White".
  • 9. Attribute Value Extraction Is Still Difficult
    [Figure: Number of instances per attribute on the AliExpress dataset; y-axis: number of instances (log scale, 1 to 100,000); x-axis: attributes (2,162)]
    • Problems
      • Rare attributes: fewer than 10 instances for 85% of attributes.
      • Ambiguous attributes: e.g., "function 1", "suitable", "sort".
    • How can we obtain an effective query representation for rare and ambiguous attributes?
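  • 11. Knowledge-Driven Query Expansion for QA-based AE (1/3)
    • Exploit attribute values in the training data as run-time knowledge to induce a better query representation.
    • Knowledge (attribute-value pairs) is collected from the annotated training data (answer spans marked B...E), e.g., "CATL ... 100 Ah … battery" and "Zipp Battery 12V 14AH SLA…" (nominal capacity, brand).
    • BERT-QA input: Title[SEP]Attribute[SEP]Values, where the title is the context and the attribute plus its known values form the query, e.g., "CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah".
    • The knowledge is imperfect: in this example, the values in the query (14ah, 40ah) do not include the answer (100Ah).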
  • 14. Knowledge-Driven Query Expansion for QA-based AE (2/3)
    • Train knowledge-based QA models while mimicking the imperfection of the knowledge available at test time.
      • Knowledge dropout: prevents the model from naively matching values in the query with the same strings in the context.
      • Knowledge token mixing: prevents the model from relying more on values than on attributes.
    • We treat the availability of value knowledge as a domain and perform multi-domain learning of the QA-based model with and without our value-based query expansion.
    • Diagram (knowledge dropout): knowledge (attribute-value pairs) from the training data expands the query in the format Title[SEP]Attribute[SEP]Values, e.g., "CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah".
  • 17. Knowledge-Driven Query Expansion for QA-based AE (3/3)
    • Knowledge token mixing: a [Seen] or [Unseen] token precedes the attribute, giving the input format Title[SEP][Un/seen]Attribute[SEP]Values.
      • Values available: "CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah"
      • Values deleted: "CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen]nominal capacity"
  • 18. Experimental Settings
    • Perform experiments using the cleaned AE-pub dataset.
      • We construct the cleaned AE-pub dataset from the public AliExpress dataset [Xu+, 2019] by removing 736 near-duplicate tuples.
      • Each entry is a tuple of <product title, attribute, value>.
      • The cleaned AE-pub dataset is split into train/dev/test sets with a ratio of 7:1:2.
    Statistics of the cleaned AE-pub dataset:
                                            Train    Dev.    Test
      # of tuples                          76,823  10,975  21,950
      # of tuples with NULL                15,097   2,201   4,259
      # of unique attribute-value pairs    11,819   2,680   4,431
      # of unique attributes                1,801     635     872
      # of unique values                    9,317   2,258   3,671
  • 20. Baselines
    • Dictionary matching
    • SUOpenTag [Xu+, 2019]
    • AVEQA [Wang+, 2020]
    • BERT-QA [Wang+, 2020]
    [Figures: model architectures of SUOpenTag, AVEQA, and BERT-QA]
  • 21. Performance on Cleaned AE-pub Dataset
    Precision (P), recall (R), and F1 in %:
      Model                        Macro P     Macro R     Macro F1    Micro P     Micro R     Micro F1
      Dictionary                   33.20±0.00  30.37±0.00  31.72±0.00  73.39±0.00  73.77±0.00  73.58±0.00
      SUOpenTag [Xu+, 2019]        30.92±1.44  28.04±1.48  29.41±1.44  86.53±0.78  79.11±0.35  82.65±0.20
      AVEQA [Wang+, 2020]          41.93±1.05  39.65±0.96  40.76±0.98  86.95±0.27  81.99±0.13  84.40±0.09
      BERT-QA [Wang+, 2020]        42.77±0.36  40.85±0.22  41.79±0.28  87.14±0.54  82.16±0.21  84.58±0.24
      BERT-QA +vals                39.48±0.37  35.60±0.44  37.44±0.38  88.82±0.22  81.77±0.14  85.15±0.14
      BERT-QA +vals +drop          41.61±0.83  38.22±0.80  39.84±0.81  88.46±0.26  82.02±0.37  85.12±0.14
      BERT-QA +vals +mixing        46.67±0.33  43.32±0.50  44.93±0.39  88.30±0.69  82.46±0.30  85.28±0.26
      BERT-QA +vals +drop +mixing  47.74±0.54  44.82±0.75  46.23±0.64  87.84±0.39  82.61±0.07  85.14±0.19
    • BERT-QA +vals +drop +mixing outperformed the baseline methods in both macro F1 (46.23 vs. 41.79 for BERT-QA) and micro F1 (85.14 vs. 84.58).
  • 22. Performance on Cleaned AE-pub Dataset (continued)
    • BERT-QA +vals learns to find strings that are similar to those retrieved from the training data, which helps explain why its macro F1 (37.44) falls below that of BERT-QA (41.79).
  • 23. Performance on Cleaned AE-pub Dataset (continued)
    • Knowledge dropout and knowledge token mixing each raise macro F1 over BERT-QA +vals (39.84 and 44.93 vs. 37.44), and combining them gives the best macro F1 (46.23), while micro F1 stays at the level of BERT-QA +vals and above BERT-QA (85.14 vs. 84.58).
  • 25. Impact on Rare and Ambiguous Attributes
    • Categorize the attributes that received query expansion by the number of training examples and by the appropriateness of the attribute name.
      • Measure appropriateness with the [CLS] token embeddings obtained from BERT.
      • Compute the cosine similarity between the attribute embedding and the averaged value embeddings.
      • Regard the attribute name as ambiguous if the similarity is low.
    • Divide the attributes into four groups by the median frequency and the median similarity to values.
    Macro F1 of BERT-QA +vals +drop +mixing (gains over BERT-QA in parentheses); rows: cosine similarity (median: 0.929); columns: number of training examples (median: 8):
      Cosine similarity    [1, 8)          [8, ∞)          All
      [0.411, 0.929)       49.15 (+7.54)   57.89 (+6.15)   53.51 (+6.86)
      [0.929, 1.0]         50.94 (+8.14)   71.04 (+3.02)   62.10 (+5.29)
      All                  49.99 (+7.82)   64.84 (+4.50)   57.81 (+6.08)
  • 26. Impact on Rare and Ambiguous Attributes (continued)
    • Query expansion can produce a more informative query than an ambiguous attribute name alone: among frequent attributes, the gain is larger for ambiguous names (+6.15) than for appropriate ones (+3.02).
  • 27. 26 Impact on Rare and Ambiguous Attributes • Categorize attributes that took the query expansion according to the number of training examples and the appropriateness of the attribute names. • Exploit embeddings of the CLS token obtained from BERT to measure the appropriateness. • Compute the cosine similarity between the attribute embeddings and averaged value embeddings. • Regard the attribute name as ambiguous if the similarity is low. • Divide the attributes into four according to median frequency and similarity to values. Model Cosine similarity (med: 0.929) Number of training examples (median: 8) [1, 8) [8, ∞) All BERT-QA +vals +drop +mixing [0.411, 0.929) 49.15 (+7.54) 57.89 (+6.15) 53.51 (+6.86) [0.929, 1.0] 50.94 (+8.14) 71.04 (+3.02) 62.10 (+5.29) All 49.99 (+7.82) 64.84 (+4.50) 57.81 (+6.08) Macro F1 Gains over BERT-QA Model Query expansion is effective for rare attributes more than frequent attributes.
  • 28. 27 Impact on Rare and Ambiguous Attributes • Categorize attributes that took the query expansion according to the number of training examples and the appropriateness of the attribute names. • Exploit embeddings of the CLS token obtained from BERT to measure the appropriateness. • Compute the cosine similarity between the attribute embeddings and averaged value embeddings. • Regard the attribute name as ambiguous if the similarity is low. • Divide the attributes into four according to median frequency and similarity to values. Model Cosine similarity (med: 0.929) Number of training examples (median: 8) [1, 8) [8, ∞) All BERT-QA +vals +drop +mixing [0.411, 0.929) 49.15 (+7.54) 57.89 (+6.15) 53.51 (+6.86) [0.929, 1.0] 50.94 (+8.14) 71.04 (+3.02) 62.10 (+5.29) All 49.99 (+7.82) 64.84 (+4.50) 57.81 (+6.08) Macro F1 Gains over BERT-QA Model Model could use more parameters to solve the task itself by taking the internal knowledge induced from the training data as runtime input.
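The grouping described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the [CLS] embeddings of each attribute name and of its values have already been computed, so toy vectors stand in for real BERT outputs. All function names and example statistics are hypothetical. Following the table's half-open intervals, an attribute counts as rare if its example count is below the median, and as ambiguous if its similarity is below the median.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def name_value_similarity(attr_emb, value_embs):
    """Similarity between an attribute-name embedding and the mean of its value embeddings."""
    return cosine(attr_emb, mean_vector(value_embs))


def bucket_attributes(stats):
    """Split attributes into four groups at the median count and the median similarity.

    stats maps attribute -> (number of training examples, similarity).
    Counts below the median are "rare"; similarities below the median are
    "ambiguous", mirroring the intervals [1, 8) and [0.411, 0.929) on the slide.
    """
    med_count = median(count for count, _ in stats.values())
    med_sim = median(sim for _, sim in stats.values())
    groups = {}
    for attribute, (count, sim) in stats.items():
        key = ("rare" if count < med_count else "frequent",
               "ambiguous" if sim < med_sim else "clear")
        groups.setdefault(key, []).append(attribute)
    return groups


# Toy usage: the counts and similarities below are made up for illustration.
stats = {
    "color": (20, 0.98),
    "function 1": (2, 0.45),
    "material": (10, 0.90),
    "nominal capacity": (5, 0.95),
}
print(bucket_attributes(stats))
```

Splitting at the medians keeps the four groups roughly balanced in size. A fixed similarity threshold would instead depend on the scale of the embedding space.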
  • 29. 28 Example Outputs
    Query = attribute name + attribute values; ✓ = correct prediction, ✗ = incorrect prediction.
    • Context: aeronova bicycle carbon mtb handlebar mountain bikes flat handlebar mtb integrated handlebars with stem bike accessories
      Query: attribute "function 1"; values: skiing goggles, carbon road bicycle handlebar, cycling glasses, bicycle mask, gas mask, …
      Gold: carbon mtb handlebar
      BERT-QA: bicycle carbon mtb handlebar ✗
      BERT-QA w/ query expansion: carbon mtb handlebar ✓
    • Context: lfp 3.2v 100ah lifepo4 prismatic cell deep cycle diy lithium ion battery 72v 60v 48v 24v 100ah 200ah ev solar storage battery
      Query: attribute "nominal capacity"; values: 14ah, 40ah, 17.4ah
      Gold: 100ah
      BERT-QA: 3.2v 100ah ✗
      BERT-QA w/ query expansion: 100ah ✓
    • Context: camel outdoor softshell men’s hiking jacket wind-proof thermal jacket for camping ski thick warm coats
      Query: attribute "suitable"; values: men, camping, kids, saltwater/fresh water, women, 4-15y, mtb cycling shoes, …
      Gold: men
      BERT-QA: men ✓
      BERT-QA w/ query expansion: camping ✗
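The Query column (an attribute name plus known values) can be pictured with the sketch below. It rests on several assumptions. First, the knowledge is simply the list of distinct values observed for each attribute in the training data. Second, the expanded query is the attribute name followed by those values. The " | " separator, the value order, and all function names are our own choices for illustration, not the paper's format. The final input follows the context[SEP]query layout of BERT-QA inputs.

```python
def build_knowledge(training_triples):
    """Map each attribute to its distinct values seen in training, in first-seen order.

    training_triples: iterable of (context, attribute, value) tuples.
    """
    knowledge = {}
    for _context, attribute, value in training_triples:
        values = knowledge.setdefault(attribute, [])
        if value not in values:
            values.append(value)
    return knowledge


def expand_query(attribute, knowledge, max_values=None, sep=" | "):
    """Attribute name followed by its known values (hypothetical format)."""
    values = knowledge.get(attribute, [])
    if max_values is not None:
        values = values[:max_values]
    if not values:
        return attribute  # unseen attribute: fall back to the plain attribute query
    return attribute + sep + ", ".join(values)


def build_input(context, query):
    """Context and query joined BERT-QA style, as in 'title[SEP]query'."""
    return f"{context}[SEP]{query}"


# Toy training data; titles are made up for illustration.
triples = [
    ("40ah lithium battery pack", "nominal capacity", "40ah"),
    ("14ah lifepo4 cell", "nominal capacity", "14ah"),
    ("40ah ev battery", "nominal capacity", "40ah"),
]
knowledge = build_knowledge(triples)
query = expand_query("nominal capacity", knowledge)
print(build_input("lfp 3.2v 100ah lifepo4 prismatic cell", query))
```

The second example on the slide shows why this helps even when the gold value (100ah) is absent from the knowledge. The listed values signal that the answer should look like "<number>ah" rather than a voltage.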
  • 30. 29 Conclusions
    • Knowledge-driven query expansion for QA-based product attribute extraction.
    • We construct the knowledge from the training data and use it to induce better query representations.
    • Two tricks mimic the imperfection of the knowledge: knowledge dropout and knowledge token mixing.
    • Our query expansion is effective, especially for rare and ambiguous attributes.
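This slide names knowledge dropout but does not spell out its mechanics. The sketch below shows only one plausible reading, and that reading is our assumption rather than the paper's definition. During training, the gold value is removed from the attribute's known values with probability p. The model then also learns from queries whose knowledge lacks the answer, as happens at test time for unseen values. Knowledge token mixing is not sketched. The function name and the parameter p are hypothetical.

```python
import random


def knowledge_dropout(values, gold_value, p=0.5, rng=None):
    """Hypothetical knowledge dropout for a single training example.

    With probability p, drop the gold value from the attribute's known values,
    simulating knowledge that does not cover the answer. Always returns a new list.
    """
    rng = rng or random.Random()
    if gold_value in values and rng.random() < p:
        return [v for v in values if v != gold_value]
    return list(values)


# Toy usage with a seeded generator so runs are repeatable.
rng = random.Random(0)
known = ["14ah", "40ah", "100ah"]
for _ in range(3):
    print(knowledge_dropout(known, "100ah", p=0.5, rng=rng))
```

Passing an explicit random.Random instance keeps the augmentation reproducible across training runs without touching the global random state.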
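  • 31. 30 Topics Not Covered in the Paper
    • Gap between the evaluation setting and real-world use
      • Evaluation: the gold attribute is given, in prior work as well
      • Real-world use: the correct attributes are unknown
    • Practicality of QA-based models
      • The model has to be run multiple times, once per attribute
      • You need to know in advance which attributes you want to extract values for
      • On some e-commerce sites, the candidate attributes can be narrowed down by referring to master data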
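The cost described above can be sketched as a loop that calls the extractor once per candidate attribute, with the candidates taken from category master data. The master-data layout, the category names, the function names, and the stub model are all hypothetical stand-ins, not part of the paper or of any real system.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical master data: product category -> attributes defined for it.
MASTER_DATA: Dict[str, List[str]] = {
    "bags": ["brand", "color", "material"],
    "batteries": ["brand", "nominal capacity", "voltage"],
}


def extract_all(
    title: str,
    category: str,
    qa_model: Callable[[str, str], Optional[str]],
    master_data: Dict[str, List[str]] = MASTER_DATA,
) -> Dict[str, str]:
    """Query a QA-style extractor once per candidate attribute of the category.

    The number of model calls equals the number of candidate attributes, so
    narrowing the candidates with master data directly reduces inference cost.
    """
    results: Dict[str, str] = {}
    for attribute in master_data.get(category, []):
        answer = qa_model(title, attribute)  # one model run per attribute
        if answer:  # None or "" means no value was found
            results[attribute] = answer
    return results


def stub_qa(title: str, attribute: str) -> Optional[str]:
    """Stand-in for a trained model: returns canned answers for the demo title."""
    return {"color": "black", "material": "leather"}.get(attribute)


print(extract_all("large elegant leather bag - black", "bags", stub_qa))
```

Without master data, the loop would have to run over every attribute known to the system, which can exceed a thousand.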
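  • 32. 31 The Future of Attribute Value Extraction
    • Attribute value extraction tends to be framed as NER
    • Problems with NER-based methods
      • Extracted values need to be normalized (D&G → Dolce & Gabbana)
      • When annotating attribute values, it is hard to define what the correct answer is
      • Automatically generating training data from existing product data introduces incorrect annotations
        • Product title: ジャーナルスタンダード ジーンズ スタンダードフィット (JOURNAL STANDARD Jeans Standard Fit)
        • Attributes: <Brand, ジャーナルスタンダード (JOURNAL STANDARD)>, <Pant leg width, スタンダード (Standard)>
        • e.g., matching the leg-width value スタンダード also tags part of the brand name ジャーナルスタンダード
      • Some attributes rarely gain new values (e.g., color, country of origin)
    • Approaches other than NER
      • Solve it as classification
        • Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
      • Solve it as generation
        • Under research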
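A naive string-matching labeler shows why automatically generated annotations go wrong. Projecting known attribute values onto titles in this way is a common pattern, but this labeler and its function names are hypothetical. It is not necessarily how any specific dataset was built. The sketch uses an English rendering of the slide's title. Case-insensitive matching stands in for the exact substring match in the Japanese original, where スタンダード appears inside ジャーナルスタンダード.

```python
import re


def auto_label(title, attribute_values):
    """Naive distant supervision: tag every case-insensitive match of each known value.

    attribute_values: (attribute, value) pairs taken from existing product data.
    Returns sorted (start, end, attribute) character spans.
    """
    spans = []
    for attribute, value in attribute_values:
        for match in re.finditer(re.escape(value), title, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), attribute))
    return sorted(spans)


def conflicting_spans(spans):
    """Pairs of overlapping spans labeled with different attributes."""
    conflicts = []
    for i, (s1, e1, a1) in enumerate(spans):
        for s2, e2, a2 in spans[i + 1:]:
            if s2 < e1 and s1 < e2 and a1 != a2:
                conflicts.append(((s1, e1, a1), (s2, e2, a2)))
    return conflicts


title = "JOURNAL STANDARD Jeans Standard Fit"
pairs = [("brand", "JOURNAL STANDARD"), ("pant leg width", "Standard")]
spans = auto_label(title, pairs)
print(spans)
print(conflicting_spans(spans))
```

The span (8, 16) labels part of the brand as a leg width. Flagging overlapping spans with different attributes is one possible filter, not a complete fix.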