PhD defense - Exploiting distributional semantics for content-based and context-aware recommendation
1. Exploiting distributional semantics for Content-Based and Context-Aware Recommendation
PhD in Artificial Intelligence
Victor Codina
Advisor: Luigi Ceccaroni
Universitat Politècnica de Catalunya
June, 2014
6. Main families of recommendation models
• Collaborative Filtering (CF): uses ratings
• Content-Based (CB) Filtering: uses item metadata
• Context-aware Recommendation (CARS): uses context in addition to ratings
[Diagram: target user and target item → recommendation model → predicted rating]
LIMITATION: low accuracy in data-sparsity scenarios
7. Exploitation of explicit semantic relationships to mitigate the data-sparsity problem
Existing solution: use the knowledge contained in domain ontologies
• Semantically-Enhanced CB Filtering: attribute similarities from an item ontology (e.g., castle and monastery are both is-a Historic building)
• Semantically-Enhanced CARS: condition similarities from a context ontology (e.g., sunny and cloudy are both is-a Weather)
8. Limitations of domain ontologies
• Building and maintaining ontologies is expensive
• Ontologies are bounded by fixed representations; they may not suit the data
[Diagram: ontology (defined by a domain expert) ≠ rating data]
9. Key idea: exploit distributional semantics derived from rating data
Similarities automatically derived from the data itself. Advantages:
• Collecting rating data is cheaper than building ontologies
• Not bounded by a fixed knowledge representation
• Fine-grained semantic similarities can be identified
[Diagram: rating data → semantic similarities]
10. Research questions
Question 1: Is it possible to enhance content-based recommendation by exploiting the distributional semantics of item attributes?
Question 2: Is it possible to enhance contextual recommendation by exploiting the distributional semantics of contextual conditions?
13. Distributional hypothesis
The meaning of a concept is captured by its usage.
Distributional Hypothesis: “concepts that share similar usages share similar meaning”
In Linguistics, usages are regions of text:
• document
• paragraph
• sentence
15. Distributional similarity measure
Cosine similarity is the most popular measure: good accuracy in high-dimensional vector spaces.
Advantage: it can be used in combination with dimensionality reduction techniques (SVD).
[Figure: 2D vector space with the concepts Glass, Wine and Spoon]
18. Content-based Recommendation
IDEA: “show me more of the same I’ve liked”
[Diagram: target user’s ratings and item metadata → Profile Learner → user profile; user profile and target item profile → Profile Matching → predicted rating]
19. Traditional item-to-user profile matching
Lack of semantics exploitation: syntactically different attribute pairs are not considered.
Hypothesis: profile matching can be enhanced by exploiting similarities between attributes.

              a1   a2   a3   a4   a5
Item profile  0.2  1    0.5  0    1
User profile  0    0.7  0    1    0

score = 1 x 0.7
21. Distributional semantics of item attributes derived from rating data
Assumption: two attributes are similar if several users are interested in them similarly.

Attribute      User1  User2  User3  User4  User5  User6  User7
action          1     -0.7    0      0.9    0.1   -1      0
Bruce Willis    0.7   -0.8    0.5    0.8    0.4   -0.2    0
comedy         -0.5    0.7    0.2   -1      0.9    0.8    0.5

Example: the -1 is User6’s degree of interest in action movies (“-1” = strong dislike, “1” = strong like).
22. Evaluation using the MovieLens data set
Rating data set statistics before and after pruning:

                   Original   Pruned
Users                 2,113    2,113
Movies               10,197    1,646
Attributes                6        4
Attribute values     13,367    3,105
Ratings per user        404      235
Sparsity                96%      86%
23. Best-pairs vs. All-pairs
[Chart: rating prediction and ranking prediction results for the Best-pairs and All-pairs strategies; % = improvement with respect to the traditional CB profile matching (the higher, the better)]
24. Distributional vs. Ontology semantics
[Chart: ranking prediction results; % = improvement with respect to the traditional CB profile matching (the higher, the better)]
25. SCB vs. State of the art
[Chart: rating prediction and ranking prediction results for SCB (proposed method), SVD++ and BPR-MF; % = improvement with respect to the traditional CB profile matching]
29. Context-aware recommendation
Context as an additional dimension for rating estimation.
Three main context-aware recommender families: pre-filtering, post-filtering and contextual modeling.
[Diagram: in-context ratings, target user, target item and target context → prediction model → predicted rating]
30. Traditional contextual pre-filtering
Main limitation: its lack of flexibility. Only ratings acquired in exactly the same context are used.
Hypothesis: ratings filtering can be enhanced by exploiting semantic similarities between contexts.
[Diagram: in-context ratings and target context → Ratings filtering → local ratings → Prediction model → predicted rating]
31. Semantic contextual pre-filtering
Key idea: reuse ratings acquired in similar contexts.
[Diagram: in-context ratings, target context and semantic similarities (with a global threshold) → Ratings filtering → local ratings → Prediction model → predicted rating]
32. Distributional semantics of contextual conditions derived from rating data
Assumption: two contexts are similar if their composing conditions influence ratings similarly.

Condition  User1  User2  User3  User4  User5  User6  User7
            1     -0.7    0      0.9    0.1   -0.6    0
            0.7   -0.8    0.5    0.8    0.4   -0.2    0
           -0.5    0.7    0.2   -1      0.9    0.8    0.5
(condition names were shown as icons in the original slide)

Example: the influence of the family condition on User6’s ratings (“<0” = negative, “0” = neutral, “>0” = positive).
33. Evaluation data sets
Six in-context rating data sets on diverse domains (UMAP – June 2013, Rome, Italy):

Dataset   Ratings  Conditions  Context granularity
Music       4,013      26        1
Tourism     1,358      57        3
Adom        1,464      14        3
Comoda      2,296      49       12
Movie       2,190      29        2
Library      609K     149        4
34. Semantic vs. traditional pre-filtering
[Chart: results for semantic and traditional pre-filtering; % = MAE reduction with respect to a context-free MF model (the higher, the better)]
35. SPF vs. State of the art
[Chart: results for SPF (proposed method), UI-Splitting and CAMF; % = MAE reduction with respect to the context-free MF model (the higher, the better)]
37. Main contributions (II): Semantic Content-Based filtering (SCB)
• Method for computing the distributional semantics of item attributes
• Two strategies for exploiting the semantic similarities during profile matching
38. Main contributions (III): Semantic Content-Based filtering (SCB)
Better accuracy than state of the art in new-user scenarios.
[Chart: rating prediction and ranking prediction results for SCB (proposed method)]
39. Main contributions (IV): Semantic Contextual Pre-filtering (SPF)
• Method for computing the distributional semantics of contextual conditions
• Novel semantic pre-filtering method that reuses ratings in semantically similar contexts
40. Main contributions (V): Semantic Contextual Pre-filtering (SPF)
Better accuracy than state of the art.
[Chart: results for SPF]
41. Conclusions
Question 1? YES. It is possible to enhance content-based recommendation by exploiting the distributional semantics of item attributes.
Question 2? YES. It is possible to enhance context-aware recommendation by exploiting the distributional semantics of contextual conditions.
42. Publications related to the thesis
Conference papers:
• CCIA 2010: Codina, V. & Ceccaroni, L. Taking advantage of semantics…
• DCAI 2010: Codina, V. & Ceccaroni, L. A Recommendation System for the…
• CCIA 2011: Codina, V. & Ceccaroni, L. Extending Recommendation Systems with…
• CCIA 2012: Codina, V. & Ceccaroni, L. Semantically-Enhanced Recommenders
• CARR 2013: Codina et al. Semantically-enhanced pre-filtering for…
• UMAP 2013: Codina et al. Exploiting the Semantic Similarity of Contextual…
• RecSys 2013: Codina et al. Local Context Modeling with Semantic Pre-filtering
Journal paper:
• UMUAI (User Modeling and User-Adapted Interaction): Codina et al. Distributional Semantic Pre-filtering in Context-Aware Recommender Systems. 2012 Impact Factor: 1.600 (current status: accepted)
Editor's notes
Today I’m going to present the main contributions of my research in the field of the RSs
This work has been carried out at the UPC, with the support of the KEMLG research group, and has been supervised by Dr. Luigi Ceccaroni.
We are living in an era of information and choice overload, with access to an overwhelming number of alternatives for almost every type of product or service we are interested in.
Although having such a variety of options is usually seen as beneficial, it also has the negative effect of making the decision-making process harder, leading us to make poor decisions when we lack the necessary knowledge.
A natural way for solving this information overload problem is to rely on the recommendations of other people, and this simple observation was what motivated the development of RSs. Therefore, the goal of RSs is to help users to find the right items for them through recommendations adapted to their preferences.
Here you can see an example of personalized movie recommendations provided by the popular movie rental service Netflix
Nowadays, the success of many popular sites in a large variety of domains strongly depends on RSs. Amazon, eBay, Netflix, Spotify, Yahoo News and LinkedIn are some popular examples.
They use RSs to add value to their information services, improving the user experience and, as a consequence, their business.
Recommender systems are composed of three main components: the knowledge base, which stores information about the items to recommend and historical user data, i.e., previous user-item interactions that show what users liked or disliked in the past; the recommendation engine, where one or several recommendation models exploiting the knowledge base are used to make recommendations; and the user interface, which is responsible for presenting the recommendations in a proper way and for collecting new feedback about the recommended items.
My thesis has focused on improving the accuracy of existing recommendation models.
The recommendation task is commonly formulated as a rating prediction problem, that is, the problem of estimating how much a target user will like or dislike a certain candidate item.
Depending on the type of information exploited, recommendation models are commonly classified into three main families: CF approaches, which make predictions for a user based on the ratings of others and therefore only require rating data; CB approaches, whose predictions are based on the metadata of the items the target user rated in the past and of the candidate ones; and context-aware approaches, which, in addition to the ratings, also incorporate contextual information into their processes.
A common limitation shared by the three recommendation approaches is that they perform poorly (in terms of accuracy) in data-sparsity scenarios; although this is a well-known limitation in the research community, it is still an open and relevant issue.
A reason for the low accuracy of CB and CA approaches in data-sparsity scenarios is that their models lack semantic intelligence.
Therefore, several works have addressed this limitation by exploiting the explicit semantic relationships about item content and contextual information available in domain ontologies.
In CB approaches, these explicit similarities between item attributes are commonly used to infer new user interests. For example, a CB recommendation model exploiting this item ontology could infer that users who like castles are also interested in monasteries and vice versa, because these two concepts are hierarchically related.
In CA approaches, the hierarchical relationships between contextual conditions are commonly used to generalize the context when it is too fine-grained to make meaningful contextual recommendations.
However, using ontologies as a knowledge source has its limitations. On the one hand, the process of building and maintaining expressive ontologies is expensive. This limits their use in many domains, and the number of publicly available domain-specific ontologies is limited; most of them consist of general taxonomies that are limited in terms of expressiveness and richness.
In addition, another major limitation of ontologies is that they are predefined specifications of a domain based on the criteria of human experts. For this reason, the ontology may not fit the data actually used for making recommendations, and therefore exploiting this knowledge is not really useful for improving prediction accuracy.
In order to overcome these limitations of ontology-based semantics, in this thesis I have investigated the use of distributional semantics derived from rating data to improve recommendations.
Differently from similarities derived from ontologies, distributional semantic similarities are automatically derived from the data itself, and consequently this semantics source doesn't suffer from the previously mentioned limitations of ontologies.
On the one hand, user data is cheaper and easier to obtain than ontologies.
They are not bounded to static knowledge representations.
Finally, distributional semantic similarity measures can capture finer-grained similarities which might only be detected from the data.
My research then has focused on investigating how distributional semantics derived from rating data can be exploited in existing CB and CA recommendation models in order to improve their accuracy.
These are the two research questions of this thesis:
To answer these questions I have implemented and empirically evaluated two recommendation models: (1) a novel content-based approach enhanced with distributional semantics of item’s attributes, and (2) a novel context-aware approach enhanced by using the distributional semantics of contextual
This is the outline I will follow during the presentation.
First, before presenting each of the two proposed approaches, I'm going to introduce the concept of distributional semantics and its mathematical foundations, which come from Computational Linguistics.
In the second part, I’ll present the novel content-based approach enhanced with distributional semantics and its evaluation results,
State of the art will be presented in each of the sections
And in the last part, I’ll talk about the novel contextual pre-filtering approach and also its performance results
In Distributional Semantics, the meaning of a concept is captured by the usage or distributional properties of the concept, which are automatically derived from the corpus of data where the concept is used.
The fundamental idea behind this way of extracting semantic similarities between domain concepts is the so-called distributional hypothesis, which claims that concepts repeatedly co-occurring in the same context or usage tend to be related.
Distributional semantics have been mainly studied in Linguistics, where usages or contexts are defined by specific regions of text that can have different granularities: for instance, the whole document, a paragraph or a sentence.
A common method to measure distributional similarities between words consists of employing a vector-space representation of concept meaning, and then measuring similarity in terms of proximity in such a vector space.
This matrix shows an example of such a representation, where rows represent the semantic vectors of these words, and each of the elements (the columns) indicates whether the concept was used in the linguistic context (which in this example is assumed to be a text sentence). Commonly these values are calculated by means of a weighting scheme over the occurrence frequency of the concept in the specific region of text.
In this example, we can see that the concept wine has a better overlap with glass than with spoon because they co-occur more frequently.
Once the semantic vectors, or co-occurrence matrix, are computed, we are ready to calculate semantic similarities between words. To do so we need to employ a specific similarity measure.
In the literature there are several types of similarity measures that can be used for this purpose, such as set-theoretic measures and probabilistic measures. However, for computing similarities in the vector space, the cosine similarity is one of the most commonly used because of its proved reliability, especially when dealing with high-dimensional vector spaces.
Here I'm showing in a 2D space the main idea of the cosine similarity, which is calculated as the cosine of the angle between the vectors: the smaller the angle, the more similar the semantic vectors. Therefore, in this case, the cosine similarity between glass and wine is larger than the one between wine and spoon.
Additionally, it has the advantage that it can be used in combination with dimensionality reduction techniques like SVD. These techniques are useful when the dimensionality of the semantic vectors is too high and sparse, because they can produce a more compact and informative semantic representation. This usually improves the accuracy of the similarity assessments.
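The glass/wine/spoon idea can be sketched in a few lines; note that the term-context counts below are invented for illustration, not taken from any real corpus.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors; 0 if either is all-zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# Toy term-context matrix (rows: concepts, columns: regions of text);
# the counts are illustrative only.
vectors = {
    "glass": np.array([2.0, 1.0, 0.0, 3.0, 0.0]),
    "wine":  np.array([1.0, 2.0, 0.0, 2.0, 1.0]),
    "spoon": np.array([0.0, 0.0, 3.0, 0.0, 2.0]),
}

# Wine co-occurs with glass more often than with spoon, so its
# semantic vector is closer to glass's.
print(cosine(vectors["wine"], vectors["glass"]))  # ~0.85
print(cosine(vectors["wine"], vectors["spoon"]))  # ~0.18
```

For high-dimensional, sparse matrices one would first apply SVD (e.g., keep the top-k singular vectors) and compute the cosine in the reduced space.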
The main assumption of CB recommendation approaches is that users tend to like items with similar attributes to those he or she already liked in the past.
As illustrated in this graphic, CB models first build a model of user’s interests in the same attribute space as items, and then use this user profile to recommend new items whose attributes match the user’s interests.
In domains where explicit ratings are available, it is common to use linear CB models, in which user profiles are represented as weighted vectors, each value quantifying the degree of interest in a certain item attribute based on the ratings given to the items containing that attribute; predictions are then computed by directly comparing the user and item vector representations.
Commonly, the item-to-user profile matching is computed by means of the dot product or the cosine similarity, methods that rely only on the "syntactic" evidence of attribute relatedness. That is, syntactically different attributes do not contribute to the similarity value.
Present example (a 0 means that the attribute does not appear in the profile). For example, using the dot product to compute the matching score between this user profile and item profile, only the weights of attribute 2 would be aggregated.
Therefore, they lack semantic intelligence in this sense, which limits the accuracy of the prediction, especially if the user profiles are based on few ratings and there is consequently little knowledge about the user's interests.
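The traditional matching from the slide-19 example can be reproduced directly (profile values taken from the slide):

```python
import numpy as np

# Attribute weights a1..a5 from the slide-19 example.
item_profile = np.array([0.2, 1.0, 0.5, 0.0, 1.0])
user_profile = np.array([0.0, 0.7, 0.0, 1.0, 0.0])

# Traditional matching: a plain dot product, so only attributes present
# (non-zero) in BOTH profiles contribute -- here only a2 (1 x 0.7).
score = float(np.dot(item_profile, user_profile))
print(score)  # 0.7
```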
My hypothesis was that traditional profile matching could be enhanced by exploiting the distributional similarities of syntactically different item attributes in addition to the exact coincidences.
In particular, I proposed two profile matching strategies based on pairwise comparison that exploit the distributional semantic similarities between item attributes: a best-pairs and an all-pairs strategy.
The best-pairs strategy aggregates, in addition to the exact attribute matches, the best-matching attribute pairs, so each non-zero attribute in the item profile is compared with only one non-zero attribute in the user profile. In this example…
The all-pairs strategy, as its name indicates, aggregates all the possible attribute-pair combinations appearing in both profiles. So, in the same user and item profile comparison, the number of aggregated values is doubled.
In both strategies the aggregated attribute pairs are weighted according to their semantic similarity value, so that weaker similarities contribute less to the predicted score.
I experimented with these two strategies because my hypothesis was that they might perform differently depending on the recommendation task. In particular, the all-pairs strategy is supposed to perform better in ranking prediction, where what matters most is the order of the recommended items and not how similar the predicted and true ratings are. In contrast, given that the best-pairs strategy is more selective, it should be more adequate for rating prediction, where the exact predicted score is relevant.
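A minimal sketch of the two pairwise strategies, assuming a precomputed attribute-similarity matrix `sim`; the profiles reuse the slide-19 values, and the exact weighting in the thesis may differ:

```python
import numpy as np

def best_pairs_score(item, user, sim):
    """In addition to exact matches, each non-zero item attribute is paired
    with its single most similar non-zero user attribute."""
    score = 0.0
    for i in np.nonzero(item)[0]:
        pairs = [(sim[i, j], j) for j in np.nonzero(user)[0]]
        if pairs:
            s, j = max(pairs)
            score += s * item[i] * user[j]
    return score

def all_pairs_score(item, user, sim):
    """Every non-zero (item attribute, user attribute) combination
    contributes, weighted by its semantic similarity."""
    return sum(sim[i, j] * item[i] * user[j]
               for i in np.nonzero(item)[0]
               for j in np.nonzero(user)[0])

# With an identity similarity matrix (no cross-attribute similarity),
# both strategies reduce to the traditional dot product.
item = np.array([0.2, 1.0, 0.5, 0.0, 1.0])
user = np.array([0.0, 0.7, 0.0, 1.0, 0.0])
identity = np.eye(5)
print(all_pairs_score(item, user, identity))   # 0.7
print(best_pairs_score(item, user, identity))  # 0.7
```

With a non-trivial `sim`, the all-pairs score aggregates more (weaker) pairs than the best-pairs score, matching the intuition that it is less selective.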
So far I have explained the methods for exploiting semantic similarities during the item-to-user profile matching. Now I'm going to talk about how we calculated these similarities based on the distributional semantics of item attributes derived from rating data.
The main assumption of the proposed method for computing such distributional similarities is that two attributes are semantically related if several users are interested in them in a similar way.
Based on this assumption, to measure user-dependent distributional similarities, we first need to compute the user-dependent semantic vectors, where each element stores a user-interest weight. That is, the attributes' semantic vectors are built with respect to the attribute-based user profiles generated by the CB profile learner.
In this example I show the semantic vectors of three movie attributes with respect to the users of the system. If we analyze the number of co-occurrences between pairs of attributes, it is easy to observe for the <Bruce Willis, action> pair that several users tend to be interested in them similarly; in contrast, there is only one such case between Bruce Willis and comedy.
Finally, based on this semantic representation, we calculate the distributional similarity between two attributes by comparing their semantic vectors. We experimented with several measures but, as expected, the cosine similarity was the one performing better in general.
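Using the user-interest weights from slide 21, the user-based attribute similarity can be computed as the cosine between rows of the attribute-user matrix:

```python
import numpy as np

# User-interest weights from slide 21 (rows: attributes,
# columns: User1..User7; -1 = strong dislike, 1 = strong like).
attributes = {
    "action":       np.array([1.0, -0.7, 0.0, 0.9, 0.1, -1.0, 0.0]),
    "Bruce Willis": np.array([0.7, -0.8, 0.5, 0.8, 0.4, -0.2, 0.0]),
    "comedy":       np.array([-0.5, 0.7, 0.2, -1.0, 0.9, 0.8, 0.5]),
}

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# Users interested in action movies tend to be interested in Bruce Willis
# too, so those two attributes come out as semantically similar,
# while action and comedy do not.
print(cosine(attributes["action"], attributes["Bruce Willis"]) >
      cosine(attributes["action"], attributes["comedy"]))  # True
```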
For the evaluation of SCB we used an extension of the popular movie rating data set collected by the MovieLens recommender, which contains over 10 million ratings from 2K users on 10K movies. We used this data set because it included a large variety of movie attributes such as genres, directors, actors, countries of origin, filming locations and user tags; some of them extracted from IMDb.
In order to avoid introducing non-informative movie metadata into the CB models, which could degrade predictions, we discarded some of it, especially the least popular actors and user tags. We also removed all the movies with fewer than five tags, as well as the ratings associated with them.
Here I illustrate the % improvement achieved when using the proposed pairwise strategies exploiting the user-based distributional similarities compared to the traditional profile matching strategy.
Optional. The baseline and the enhanced CB approaches employed the same user profile learning method. In these experiments we employed a sophisticated user-profile learning method based on the rating average.
MAE and RMSE are well-known metrics for measuring how accurately the models predict unknown ratings, and Recall and NDCG are metrics that measure the accuracy of the models at making personalized rankings. "All" means that the results are averaged over all the users, and "New" over the set of new users. In our experiments we considered as new users the 10% of users with the lowest number of ratings.
We can see that the new-user scenario is where the two variants perform significantly differently and are particularly effective. On the one hand, the best-pairs strategy is better than the all-pairs one for rating prediction; in contrast, the all-pairs strategy clearly outperforms the best-pairs one in terms of ranking precision.
These results prove the hypothesis that the all-pairs strategy is more effective for ranking, given that for this task what matters most is the order of the items and not the closeness between the predicted and the true rating, and that the best-pairs strategy, being more selective, is better for rating prediction.
Here I compare the ranking accuracy of the all-pairs strategy when exploiting different sources of semantic similarities: the blue bars correspond to the user-based distributional semantics; the yellow bars to the distributional semantics derived from item-based co-occurrences (that is, in the item-based representation rating data is not considered, only the item metadata); and the red bars to similarities derived from an ontology. The ontology-based semantics were derived from the hierarchical relationships defined in the Amazon.com movie taxonomy.
As can be observed, using distributional semantics the overall accuracy is better than when using ontology-based semantics, with the user-based being slightly better than the item-based. In the new-user scenario the results are quite different: the item-based similarities are clearly the least effective, and the user-based and ontology-based ones have similar accuracy.
Considering the accuracy on both sets of users, the results validate the hypothesis that user-based semantics, derived from rating data, can be more effective at improving prediction accuracy than the other types.
In this other slide I show the improvement achieved by the proposed novel CB method (the orange bar) compared to two state-of-the-art CF approaches based on Matrix Factorization, a popular CF method.
The yellow bar corresponds to SVD++, an MF model which was part of the winning solution in the Netflix Prize and is therefore especially effective for rating prediction, and the red bars correspond to BPR-MF, another MF model which is designed for recommending rankings and is therefore not able to make rating predictions.
Possibly you have noted that the gain achieved for rating prediction is much smaller than the gain achieved for ranking. This is because in rating prediction the space for improvement is more limited, as was demonstrated during the Netflix challenge, where a $1M prize was offered for reducing the RMSE of their approach by 10% and 3 years of research were needed to achieve it.
If we look at the overall results (the "All" columns), we can see that the CF approaches are clearly better: SVD++ is the best model for rating prediction and BPR-MF for ranking.
However, for new users the new CB method outperforms the best CF approach for each recommendation task. Differences are especially significant in terms of MAE and NDCG. This proves that our method is effective for improving CB recommendation in general, and for improving on state-of-the-art CF methods in data-sparsity scenarios such as the new-user one.
We can see that based on all the users, the CF approaches are the most accurate
The main assumption of CARS is that items can be experienced differently by the users depending on the current contextual situation, and as a result, user evaluations or ratings can also be different.
A clear example where context matters is in the tourism domain, where the same recommendations to the same users can be considered as good or bad depending on the weather conditions.
For this reason, context-aware recommendation approaches incorporate contextual information into their processes. Typically, CARS extend existing CB and CF techniques with context-awareness and, depending on how they incorporate context into the recommendation process, three main families of context-aware approaches can be identified: pre-filtering, post-filtering and contextual modeling.
Pre-filtering approaches exploit contextual information to discard the user’s ratings that are not relevant in the context in which the user is asking for a recommendation. Then, a context-free CB or CF approach is used to make recommendations based on the subset of relevant ratings.
On the contrary, post-filtering approaches use contextual information once recommendations are made by a context-free model to adjust them. For instance, by applying some kind of rescoring.
Finally, contextual modeling approaches incorporate context into the recommendation model, representing user’s interests and other model parameters as a function of context.
Because context-aware approaches require a large number of ratings from users for items in several contexts, they are more affected by the data-sparsity problem than the context-free ones. Contextual pre-filtering is the approach that typically suffers most from this limitation, and for this reason my research has focused on this paradigm.
Traditional contextual pre-filtering is known as the reduction-based approach because, for each target contextual situation, it builds a strict local model where only the ratings acquired in exactly the same situation as the target one are used for recommending.
The main limitation of this approach is its lack of flexibility: it always uses the maximum level of contextualization, and it therefore fails when the target situations are too specific and not relevant, or when there are not enough ratings in that situation to generate a robust local prediction model.
With this example I'm showing how traditional contextual pre-filtering works. Each of these circles represents the set of training ratings tagged with three syntactically different situations, s1, s2 and s3. Assuming that the target context is s3, the method would discard all the ratings acquired in s1 and s2. Finally, it builds a local prediction model based on the selected ratings.
My hypothesis is that it is possible to overcome this lack of flexibility by exploiting the semantic similarities between contexts during the rating pre-filtering process.
To validate this hypothesis, we proposed a novel pre-filtering approach that, in addition to the ratings acquired in exactly the same context, also reuses ratings acquired in contexts semantically similar to the target one.
Following the same example, let's assume now that the system knows that the target context sunny is semantically related to when users travel in family, but not to when users are sad. In this case the semantic pre-filtering would also reuse the ratings acquired in the family context to build the local prediction model used to make predictions in the target context sunny.
Our approach employs a global similarity threshold to select those situations that are similar enough to be considered reusable: the larger the threshold, the sharper the contextualization, that is, the more similar the local models are to the strict models generated by the traditional approach.
So far I have assumed the existence of semantic similarities between contexts, and now I’m going to explain how we compute these similarities with respect to the rating data.
In particular our method computes distributional semantic similarities between contextual situations based on the assumption that two situations are similar if their composing conditions influence users’ ratings in a similar way.
For this reason, in this case the semantic vectors of contextual conditions contains estimates of its influence on the given ratings. We estimated this influence as the average deviation between the observed ratings when the condition holds, and a context-free rating estimated by using a baseline predictor. In this case, the average deviation be calculated either from the item perspective or the user perspective, and the depending on the rating data is more appropiate than the other.
Here I’m showing an example using the user-based perspective. In this case, the -1 indicates that the family condition negatively influences the ratings of user 6.
Once the semantic vectors of the conditions are computed, we calculate their similarities by comparing the vectors using the cosine similarity: the more similar the influence, the more similar the conditions. In this example, we can see that family and sunny are more similar than sunny and sad, given that there are more users for whom their influence is similar, whereas sunny and sad agree for only one user.
When a situation is defined by several conditions, we first compute the semantic vector of the situation by averaging the vectors of its composing conditions, and then we compute the cosine similarity.
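Both steps, averaging condition vectors into a situation vector and comparing vectors with the cosine, can be sketched as follows (illustrative values; the influence vectors map users to average rating deviations):

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse vectors (dicts: user -> influence).
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def situation_vector(condition_vectors):
    # Vector of a multi-condition situation: componentwise average
    # of the vectors of its composing conditions.
    users = {u for v in condition_vectors for u in v}
    n = len(condition_vectors)
    return {u: sum(v.get(u, 0.0) for v in condition_vectors) / n for u in users}

# Toy influence vectors: family and sunny influence users alike, sad does not.
sunny  = {1: 0.5, 2: -1.0, 3: 0.8}
family = {1: 0.4, 2: -0.9, 3: 0.7}
sad    = {1: -0.6, 2: 1.0, 3: -0.5}

sim_fs = cosine(sunny, family)  # close to 1: similar influence
sim_ss = cosine(sunny, sad)     # negative: opposite influence
sim_multi = cosine(situation_vector([sunny, family]), sad)
```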
For the evaluation we considered six data sets of contextually tagged ratings from diverse domains and with different characteristics. Here I’m showing some of them: conditions refers to the total number of conditions captured by the system, and context granularity is calculated as the average number of conditions per contextual situation; the larger this number, the more specific the contexts.
The Music data set contains ratings for music tracks collected by an in-car music recommender. The Tourism data set contains ratings for points of interest (POIs) in the region of South Tyrol. Adom, Comoda and Movie are all movie-rating data sets, and Library is about book ratings.
As you can see, Library is the biggest data set, with more than 600k ratings, and Comoda is the one with the most fine-grained contextual situations.
Here I’m showing the MAE reduction with respect to a context-free MF model when using the proposed semantic pre-filtering (the orange bars) and the traditional one (the yellow bars). The larger the percentage, the better the rating prediction accuracy.
As you can see, in all the data sets the semantic pre-filtering is clearly superior to the traditional one, proving the effectiveness of exploiting distributional semantic similarities between contextual situations during pre-filtering to improve accuracy.
The traditional pre-filtering is even worse than the context-free model in some data sets. This poor performance in Tourism, Music and Comoda is due to the lack of flexibility of this approach, which always builds a strict local model; in some cases the contexts are so specific that there is not enough training data to build robust local models, and therefore their accuracy is worse than that of the global context-free model.
Here I’m showing the results of the proposed semantic pre-filtering (the orange bars) compared to two state-of-the-art context-aware approaches.
The blue bars correspond to another pre-filtering approach which, unlike the reduction-based approach that builds a local model for each target context, modifies the original rating matrix by splitting the rating vectors associated with users and items into virtual vectors, based on the contextual condition that most influences the ratings, and then builds a global model on the new rating matrix.
The red bars correspond to CAMF, a contextual-modeling approach that extends the standard MF model with additional parameters modeling the influence of context with respect to the items or the users. In this case, the context is modeled as part of the MF model itself.
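A minimal sketch of a CAMF-style prediction rule, assuming the item-level variant with already-learned parameters (training and regularization omitted; all names here are hypothetical):

```python
def camf_predict(p_u, q_i, b_u, b_i, b_ic, conditions):
    # Standard MF dot product plus user and item biases, plus one
    # learned bias per (item, condition) pair that models how each
    # contextual condition shifts the ratings of this item.
    dot = sum(pu * qi for pu, qi in zip(p_u, q_i))
    return dot + b_u + b_i + sum(b_ic.get(c, 0.0) for c in conditions)

# Hypothetical learned parameters for one user-item pair
p_u, q_i = [0.2, -0.1], [0.5, 0.3]
score = camf_predict(p_u, q_i, b_u=0.1, b_i=3.5,
                     b_ic={"sunny": 0.4}, conditions={"sunny"})
# 0.07 + 0.1 + 3.5 + 0.4 = 4.07
```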
An advantage of pre-filtering approaches is that they can use any context-free recommendation technique to build the local models. However, to properly compare the performance of the pre-filtering approaches with CAMF, we used them in combination with the standard context-free MF: our method uses MF to build the local models, and the splitting approach uses MF to build its global model.
As can be observed, the three context-aware prediction models significantly outperform MF in all the data sets, confirming that contextual information is relevant for improving rating predictions.
On the other hand, the new method is the most effective at exploiting the context, since it outperforms the other approaches in all the data sets; the differences are especially large in the Tourism, Adom, Comoda and Movie data sets.
Building block: distributional semantics
Key idea: Content-based and context-aware recommendation can be enhanced by exploiting distributional semantics derived from rating data
User-based distributional semantics of attributes
Based on how users are interested in them
More effective than item and ontology-based
50% gain in ranking accuracy
7% gain in rating prediction
Based on how conditions influence the users’ ratings
Question 1: Is it possible to enhance CB recommendation by exploiting distributional semantic similarities between item attributes?
Semantic similarities between attributes are useful to enhance the profile matching
Question 2: Is it possible to enhance contextual recommendation by exploiting distributional semantic similarities between contextual conditions?
Many results reported in this thesis have already been presented at several international conferences, some of them of significant impact in the field of Recommender Systems, such as the UMAP and RecSys conferences.
Additionally, the main results of this thesis have also been published in a highly ranked journal in the field.