TEXT
MINING
Team 4
Syed Aqib Ali
Syeda Ramsha Habib Gilani
Lateefah Omoyosola Yusuf
Rochelle Star Velasquez
TABLE OF CONTENTS
1. What is Text Mining?
2. Introduction
3. Main Models Used
4. Key Contributions
5. Marketing and Non-marketing Applications
6. Limitations
7. Avenues for future research
8. Key Takeaways
WHAT IS TEXT MINING?
Text mining is the process of extracting high-quality,
meaningful information and patterns from text.
Text analysis involves information retrieval, analysis
to study word frequency distributions, pattern
recognition, information extraction, data mining
techniques including link and association analysis,
visualization, and predictive analytics.
INTRODUCTION
● A research study applying Text Mining and
Machine Learning tools.
● The authors find that loan applicants' choice
of words reveals insights into their intentions,
circumstances, and personality.
● This information is powerful in predicting
loan repayment, going beyond typical
financial and demographic factors.
Setting and Data
1. Potential borrowers on Prosper, a peer-to-peer lending platform, submit a loan request for a specific
amount with a specific maximum interest rate they are willing to pay.
2. Requested loan amounts range between $1,000 and
$25,000 in the data.
3. Prosper verifies all financial information, including the potential
borrower’s credit score.
Textual, Financial, and Demographic Variables
1. Textual variables:
a. The number of characters in the title and the text box.
b. The percentage of words with six or more letters.
c. SMOG: This measures writing quality by mapping it to the number of years of formal
education needed to easily understand the text on first reading.
d. Count of spelling mistakes.
e. Bigrams: Two-word combinations (these help to capture context and patterns).
2. Financial variables:
a. Loan amount, borrower’s credit grade, debt-to-income ratio.
3. Demographic variables:
a. Gender, age, location, race.
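The textual variables listed above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the syllable counter is a rough vowel-group heuristic, and the SMOG grade uses the standard published formula, which may differ in detail from what the paper computed.

```python
import math
import re

def count_syllables(word):
    """Rough syllable count: number of vowel groups (a heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_features(text):
    """Compute a few of the textual variables for one loan request (hypothetical helper)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(1, len(words))          # guard against empty input
    n_sentences = max(1, len(sentences))
    long_words = [w for w in words if len(w) >= 6]
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # Standard SMOG grade: polysyllable count normalised to 30 sentences.
    smog = 1.0430 * math.sqrt(polysyllables * 30 / n_sentences) + 3.1291
    lowered = [w.lower() for w in words]
    return {
        "n_chars": len(text),
        "pct_long_words": 100 * len(long_words) / n_words,
        "smog": smog,
        "bigrams": list(zip(lowered, lowered[1:])),  # adjacent two-word combinations
    }

features = text_features(
    "I need to borrow $2,000 to cover the cost of the repair. I pay all bills on time."
)
print(features["pct_long_words"], features["smog"], features["bigrams"][:3])
```

Counting spelling mistakes is left out here because it needs a dictionary such as the one PyEnchant provides.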
PROCESS OF
TEXT MINING
Process 01
The tm package in R was used to select
distinct words in each loan application.
Process 02
Porter’s stemming algorithm was used to collapse
variations of a word into one stem, e.g., “borrower,”
“borrowed,” “borrowing,” and “borrowers”
become “borrow” (3.5M words → 30,920 unique
words and 1,052 bigrams).
Process 03
The PyEnchant 1.6.6 package in Python was
used to count spelling mistakes in the
loan applications. Misspellings may serve
as a proxy for characteristics correlated
with lower income.
Process 04
The authors used term frequency-inverse
document frequency (tf-idf) to weigh how often
a word is used in a loan request against how
often it is used across all loan requests,
accounting for the length of the request.
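The tf-idf weighting in this process can be sketched in a few lines of Python. This assumes the common textbook variant (term count divided by request length, times log of total requests over requests containing the term); the paper's exact formula may differ, and the stemmed toy requests are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many requests each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency is normalised by request length; idf is log(N / df).
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical stemmed loan requests.
requests = [
    ["borrow", "roof", "repair"],
    ["borrow", "pay", "bill", "pay"],
    ["borrow", "promis", "pay"],
]
scores = tf_idf(requests)
print(scores[0])
```

A word such as "borrow" that appears in every request gets a weight of zero, while a word concentrated in a few requests is weighted up.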
MAIN MODELS USED
MODEL 1 - Predictive model
Aim:
To evaluate whether the text used by borrowers in their loan application predicts
their loan default.
Machine Learning Methods:
Ensemble stacking approach
1. Train each model on the calibration data (two logistic regressions and three
tree-based methods).
2. Build a weighting model to combine the models calibrated in the first step.
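The two-step stacking idea can be sketched with the standard library alone. This is a simplified illustration, not the study's setup: it substitutes two one-feature logistic regressions for the five base models, fits them by plain gradient descent, and uses made-up toy data; all names are hypothetical.

```python
import math

def predict(weights, row):
    """Logistic prediction; weights = [bias, w1, w2, ...]."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, epochs=2000, lr=0.1):
    """Plain batch gradient-descent logistic regression."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            error = predict(weights, row) - y
            grad[0] += error
            for j, x in enumerate(row):
                grad[j + 1] += error * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data: [debt_to_income, share_of_misspelled_words]; label 1 = default.
calibration_X = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.6, 0.3], [0.7, 0.2], [0.8, 0.4]]
calibration_y = [0, 0, 0, 1, 1, 1]
stacking_X = [[0.15, 0.05], [0.25, 0.0], [0.65, 0.25], [0.75, 0.35]]
stacking_y = [0, 0, 1, 1]

# Step 1: train each base model on the calibration data.
financial_model = fit_logistic([[r[0]] for r in calibration_X], calibration_y)
text_model = fit_logistic([[r[1]] for r in calibration_X], calibration_y)

def base_predictions(row):
    return [predict(financial_model, [row[0]]), predict(text_model, [row[1]])]

# Step 2: fit a weighting model on the base models' predictions for data they did not see.
meta_model = fit_logistic([base_predictions(r) for r in stacking_X], stacking_y)

def stacked_probability(row):
    return predict(meta_model, base_predictions(row))

print(stacked_probability([0.8, 0.4]), stacked_probability([0.1, 0.0]))
```

Fitting the weighting model on separate data keeps it from simply rewarding whichever base model overfits the calibration set most.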
Result
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of Marketing Research,
56(6), 960-980.
MODEL 2 - Words and writing styles of defaulted loan requests
Aim:
Learn which words, writing styles, and general ideas conveyed by the text are more
likely to be associated with defaulted loan requests.
Methods:
1) Machine learning tools
Naive Bayes
L1-regularized binary logistic model
2) Standard econometric tools
Logistic regression on the topics extracted from
a latent Dirichlet allocation (LDA) analysis
and on the sub-dictionaries of the Linguistic
Inquiry and Word Count (LIWC) dictionary.
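To make the naive Bayes step concrete, here is a minimal multinomial naive Bayes sketch with add-one smoothing. The training requests and labels are hypothetical toy data, not results from the paper, and the function names are illustrative.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """docs: token lists; labels: 0 = repaid, 1 = defaulted."""
    vocab = {t for d in docs for t in d}
    counts = {0: Counter(), 1: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in (0, 1)}
    log_like = {
        # Add-one smoothing so unseen class/word pairs do not get zero probability.
        c: {t: math.log((counts[c][t] + 1) / (sum(counts[c].values()) + len(vocab)))
            for t in vocab}
        for c in (0, 1)
    }
    return log_prior, log_like

def default_log_odds(model, doc):
    """Positive values mean the wording looks more like the defaulted class."""
    log_prior, log_like = model
    score = log_prior[1] - log_prior[0]
    for t in doc:
        if t in log_like[1]:  # ignore words never seen in training
            score += log_like[1][t] - log_like[0][t]
    return score

# Hypothetical stemmed requests with made-up outcomes.
docs = [
    ["promis", "god", "bless", "pay"],
    ["pleas", "help", "promis"],
    ["pay", "bill", "time"],
    ["roof", "repair", "pay", "bill"],
]
labels = [1, 1, 0, 0]
model = train_naive_bayes(docs, labels)
print(default_log_odds(model, ["promis", "help"]), default_log_odds(model, ["pay", "bill"]))
```

The per-word difference in log-likelihoods is what lets the analyst rank words by how strongly they point toward default or repayment.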
Result
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of
Marketing Research, 56(6), 960-980.
MODEL 3 - Potential Borrower’s Personality
Aim:
Further exploration of potential traits and states of borrowers.
Machine Learning Methods:
Applying the LIWC dictionaries.
Results:
Defaulting loan requests are written in a manner consistent
with the writing styles of extroverts and liars.
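A dictionary-based approach of this kind boils down to counting how many words of a text fall into each category. The sketch below uses tiny hypothetical word lists purely for illustration; the real LIWC dictionaries are proprietary and far larger, and these category names are assumptions.

```python
import re

# Hypothetical mini-dictionaries; NOT the actual LIWC categories or word lists.
CATEGORIES = {
    "social": {"family", "friend", "we", "our", "wife", "husband"},
    "religion": {"god", "bless", "pray"},
    "future": {"will", "promise", "shall"},
}

def category_shares(text):
    """Share of words in the text that belong to each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(1, len(words))  # guard against empty input
    return {
        name: sum(w in vocab for w in words) / total
        for name, vocab in CATEGORIES.items()
    }

shares = category_shares("God bless you, and I promise to pay you back.")
print(shares)
```

These shares can then be used as explanatory variables in a regression on default, which is how dictionary scores and outcomes are typically linked.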
KEY CONTRIBUTIONS
Analyzing applications
Borrower 1: “I am a hard working person, married for 25 years, and have
two wonderful boys. Please let me explain why I need help. I would use
the $2,000 loan to fix our roof. Thank you, God bless you, and I promise to
pay you back.”
Borrower 2: “While the past year in our new place has been more than
great, the roof is now leaking and I need to borrow $2,000 to cover the
cost of the repair. I pay all bills (e.g., car loans, cable, utilities) on time.”
Which borrower is more likely to default?
KEY CONTRIBUTIONS
Textual information
on the loan
significantly helps
predict loan default.
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of
Marketing Research, 56(6), 960-980.
KEY CONTRIBUTIONS
Words indicative of
loan repayment.
Source: Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of
Marketing Research, 56(6), 960-980.
KEY CONTRIBUTIONS
Defaulted loan requests mimic the
writing styles of extroverts and liars.
KEY CONTRIBUTIONS
Evidence that people with different
educational backgrounds and
economic situations use words
differently.
KEY CONTRIBUTIONS
Evidence that textual information can
supplement traditional measures and
replace some aspects of them.
KEY CONTRIBUTIONS
Help lenders avoid defaulting borrowers
and help borrowers better express
themselves when requesting a loan.
MARKETING AND
NON-MARKETING
APPLICATIONS
MARKETING APPLICATIONS
• Sentiment analysis
• Brand monitoring
• Customer feedback analysis
• Churn prediction
• Predictive analysis
• Market research
• Personalized marketing
• Social media analytics
NON-MARKETING APPLICATIONS
• Psychological profiling
• Fraud detection
• Credit risk assessment
• Customer service
LIMITATIONS
1. Text data may not be available for all loan
applications, as some borrowers may not
provide any text or may provide incomplete
or inaccurate information.
2. Text data may be subject to
interpretation and bias, as different lenders
may interpret the same text differently
based on their own biases and assumptions.
3. The use of text data to predict loan
default raises ethical and legal concerns.
FURTHER RESEARCH
● Extending the predictive ability of text
analysis to other behaviors and
industries.
● Extending the results to other types of
communication, e.g., phone calls
and online chats.
● Studying how word usage changes
over time.
FURTHER RESEARCH
● Exploring the role of emotions and
mental states in financial behaviors.
● Investigating the impact of different
writing styles on loan default.
● Applying the findings to other
loan types and platforms.
● Developing more accurate and
efficient text-mining and machine
learning tools for analyzing loan
applications.
KEY TAKEAWAYS
● Text mining and machine learning tools can be
employed to predict psychographics, including
the likelihood of future loan defaults.
KEY TAKEAWAYS
● The LIWC dictionaries associated with
extroversion and deception are significantly
correlated with default.
KEY TAKEAWAYS
● There may be variables that are affected by
both the observable text and unobservable
personality traits.
Thank you
for your
attention!
