Producing direct value for businesses via quantitative models.
New analytical tools such as Looker allow data analysts to speed up the dirty work around building data models—making it less painful to clean data, explore predictive factors, and evaluate results.
In this educational webinar from Data Science Central (DSC), Justin Palmer of LendingHome, a mortgage banking and marketing platform, joins Colin Zima, Chief Analytics Officer at Looker. Using a public-domain FAA dataset and the LendingHome platform as examples, they dig into the data modeling process and offer ideas for improvements.
- See more at: http://try.looker.com/resources/improving-data-modeling-workflow
2. LendingHome is the most advanced mortgage marketplace platform in the world
What Is LendingHome?
3. What is LendingHome?
Simple, efficient borrower experience
Investors are matched to safe, high-yield loans
World-class mortgage ops process driven by transparent analytics
Statistical models for credit, underwriting, pricing, sales, marketing
4. LendingHome: Mortgage Marketplace
• Fastest loan funded in 72 hours
• Borrowers prequalify in 3 minutes
• Investors are matched to safe, high-yield loans
• Line-item accounting for payout tracking
• Proceeds from loans wired in under a second
5. Looker is a data exploration solution that operates in the database to enable organizations to explore data in all its detail.
What Is Looker?
7. LendingHome: Operations
World-class operational process driven by transparent analytics
• Integrations with over 20 vendors
• Rigorous 96-item checklist
• Looker-driven workflow measures quality and timing of the entire process
• Highly complex reporting and dashboards are easy in Looker
10. The Challenge
Data scientists create value by creating actionable models
More time is spent preparing data and evaluating results than applying heavy data science techniques
How do we speed up analytical cycles?
12. Our Example
38 million flights between 2000 and 2005
Carrier information, departure and arrival location, manifest data, aircraft data
Modeling on-time rates
17. Pulling Data
Analytical tools let data scientists leverage in-database modeling to easily pull reshaped data:
- Time-zone functions
- Sub-select functions
- Cleaned/filtered data
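As a sketch of what "pulling reshaped data" can look like, the hypothetical snippet below uses an in-memory SQLite table standing in for the FAA flights data; the table, column names, and sample rows are invented for illustration. A sub-select handles the time-zone shift and filtering, and the outer query aggregates only the cleaned rows:

```python
import sqlite3

# Hypothetical in-memory table standing in for the FAA flights dataset.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flights (
        carrier TEXT, dep_time_utc TEXT, dep_delay_min INTEGER
    )
""")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("AA", "2003-06-01 14:05:00", 12),
     ("AA", "2003-06-01 23:55:00", None),   # missing delay -> filtered out
     ("UA", "2003-06-02 03:10:00", -4)],
)

# One query does the reshaping: the sub-select localizes timestamps and
# filters bad rows; the outer query aggregates the cleaned result.
rows = conn.execute("""
    SELECT carrier, COUNT(*) AS n, AVG(dep_delay_min) AS avg_delay
    FROM (
        SELECT carrier,
               datetime(dep_time_utc, '-5 hours') AS dep_time_local,  -- time-zone shift
               dep_delay_min
        FROM flights
        WHERE dep_delay_min IS NOT NULL                               -- cleaned/filtered
    )
    GROUP BY carrier
    ORDER BY carrier
""").fetchall()
print(rows)  # [('AA', 1, 12.0), ('UA', 1, -4.0)]
```

In a tool like Looker this reshaping is generated for you; the point is that the cleaning lives in the database, not in a modeling script.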
18. LendingHome: Quant Modeling
• Credit models learned over 25M loans, 4B payments, and macroeconomic factors
• Scoring transparent to borrowers and investors
• Finds and predicts borrower conversion from 180M RE transactions
• Feature extraction, data exploration, train/test splits, and model analysis in Looker
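One common in-database pattern for reproducible train/test splits (a general technique, not necessarily LendingHome's exact approach) is to hash a stable record id into buckets, so the split is identical on every run and in every tool. A minimal Python sketch with a hypothetical loan-id scheme:

```python
import zlib

def split(loan_id: str, test_frac: float = 0.2) -> str:
    """Deterministically assign a record to 'train' or 'test' by hashing
    its stable id into 100 buckets. Same id -> same bucket, always."""
    bucket = zlib.crc32(loan_id.encode()) % 100
    return "test" if bucket < test_frac * 100 else "train"

# Hypothetical loan ids; roughly 20% should land in the test set.
labels = [split(f"loan-{i}") for i in range(1000)]
print(labels[:3], labels.count("test"))
```

Because the assignment is a pure function of the id, the same split can be reproduced as a SQL expression in the database, which is what makes split-aware exploration in a BI tool possible.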
Blank = nothing to do
Query over those 96 checklist items, 20+ vendors
View into many standard 800+ page closing packages distilled to one dashboard
Final query: over 1,300 lines of SQL
~100 of the 1,300 lines
Tedious SQL CASE logic encapsulated…
Nested inside counts
Encapsulated subselects
-> Not doable by hand
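The shape being described here (a CASE bucketing expression nested inside conditional counts, wrapped in an encapsulated sub-select) can be sketched against a toy SQLite table; the table and bucket boundaries are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (dep_hour INTEGER, dep_delay_min INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?)",
                 [(0, 90), (0, 2), (9, 5), (9, 40), (9, 3)])

# The kind of SQL a modeling layer generates: a CASE expression in an
# encapsulated sub-select, with conditional COUNTs nested on top.
rows = conn.execute("""
    SELECT time_of_day,
           COUNT(*) AS flights,
           COUNT(CASE WHEN dep_delay_min > 15 THEN 1 END) AS late,
           ROUND(1.0 * COUNT(CASE WHEN dep_delay_min > 15 THEN 1 END)
                     / COUNT(*), 2) AS late_rate
    FROM (
        SELECT CASE
                   WHEN dep_hour BETWEEN 0 AND 5  THEN 'night'
                   WHEN dep_hour BETWEEN 6 AND 11 THEN 'morning'
                   ELSE 'afternoon/evening'
               END AS time_of_day,
               dep_delay_min
        FROM flights
    )
    GROUP BY time_of_day
    ORDER BY time_of_day
""").fetchall()
print(rows)  # [('morning', 3, 1, 0.33), ('night', 2, 1, 0.5)]
```

Multiply this by dozens of buckets and measures and you get the 1,300-line query that is tedious and error-prone to write by hand.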
Slicing up data to examine trends.
Midnight flights cannot really be this bad, and why are there so many?
The same cleaning applies to problems like dates, missing data, and grouping data – modeling in the database helps!
Correlated/co-dependent variable selection (day of week + time of day) can be especially difficult. Quick analytical exploration can reveal whether these variables need to be included.
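The two notes above can be illustrated together in plain Python (all records hypothetical): first drop malformed dates and missing outcomes, then cross-tabulate day of week against late-night departures; if the two factors always co-occur, they are confounded and one may be redundant in the model:

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw records: one malformed date, one missing delay.
raw = [
    ("2003-06-02 23:30", 95), ("2003-06-03 08:15", 4),
    ("not-a-date", 12),        ("2003-06-04 23:50", None),
]

clean = []
for ts, delay in raw:
    if delay is None:
        continue                       # drop missing outcomes
    try:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    except ValueError:
        continue                       # drop unparseable dates
    clean.append((dt.strftime("%a"), dt.hour, delay))

# Quick cross-tab of day-of-week x late-night departure: sparse or
# degenerate cells hint that the two variables are co-dependent.
xtab = Counter((dow, hour >= 22) for dow, hour, _ in clean)
print(sorted(xtab.items()))  # [(('Mon', True), 1), (('Tue', False), 1)]
```

In a database-backed tool the same check is a grouped query plus a pivot, which is why quick analytical exploration is cheaper than round-tripping through modeling software.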
FICO is mostly normal… but for default modeling, it is unclear what to use:
- proportion within bucket?
- mixture model?
- what about function of loss? Of principal? Are those correlated?
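Of the options listed, the first (proportion within bucket) is the easiest to sketch. The loan records below are invented for illustration; the pattern is just a grouped default rate over FICO buckets:

```python
from collections import defaultdict

# Hypothetical (fico, defaulted) pairs; the target is the default
# proportion within each FICO score bucket, not raw FICO itself.
loans = [(610, 1), (640, 1), (655, 0), (700, 0), (705, 1), (760, 0), (780, 0)]

buckets = defaultdict(lambda: [0, 0])        # bucket floor -> [defaults, total]
for fico, d in loans:
    lo = (fico // 50) * 50                   # 50-point buckets: 600-649, 650-699, ...
    buckets[lo][0] += d
    buckets[lo][1] += 1

rates = {f"{lo}-{lo + 49}": round(d / n, 2) for lo, (d, n) in sorted(buckets.items())}
print(rates)  # {'600-649': 1.0, '650-699': 0.0, '700-749': 0.5, '750-799': 0.0}
```

The other two options (a mixture model, or modeling loss as a function of principal) need a proper modeling pass; the bucket proportions are the quick analytical view you can get before committing to one.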
Easy to split by state, HPI within a time window, or loan type in Looker (no script, no query editing, just clicks)
Easy to overlay model scores and compare models vis-à-vis factors
Noisy features are easy to see. No way DTI is < 10% for roughly half of borrowers.
Drillable, iterative confusion matrix … in a few clicks
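Under the hood, a confusion matrix is just a grouped count over (actual, predicted) pairs, which is why a database tool can make it drillable. A minimal sketch with invented on-time labels:

```python
# Hypothetical actual vs. predicted on-time labels (1 = on time).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# Count each (actual, predicted) combination.
cm = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
for a, p in zip(actual, predicted):
    cm[(a, p)] += 1

tp, fn = cm[(1, 1)], cm[(1, 0)]
fp, tn = cm[(0, 1)], cm[(0, 0)]
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3
```

In an exploration tool, each cell stays linked to its underlying rows, so "drilling" into the false negatives is one click rather than a new query.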
Explore factors’ impact by adding variables (can’t show this) -> business/finance users can peek into the model this way
Filter on version -> one algorithm/dataset pair
Easily visualize default rate vs. return rate; stratify by loan size, pre-/post-crisis, etc.
We just got access to a database of plane types. Rather than porting everything into our modeling software, we can run analytics again to examine whether a field like plane size appears to introduce significant bias in our forecasts. Since we didn’t have this data at model time, quick analytical views are an effective way to evaluate whether there is pickup to be had from re-modeling.
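The bias check described above amounts to grouping model residuals by the newly available field. Everything below is hypothetical (invented residuals and a made-up plane-size field), but it shows the shape of the test: if mean residual differs sharply across groups, the new feature likely carries signal the model is missing:

```python
from statistics import mean

# Hypothetical rows: (plane_size, actual_delay - predicted_delay).
rows = [
    ("small", 8.0), ("small", 6.5), ("small", 7.0),
    ("large", -0.5), ("large", 1.0), ("large", 0.0),
]

# Group residuals by the new field.
by_size = {}
for size, resid in rows:
    by_size.setdefault(size, []).append(resid)

# Mean residual per group; a well-calibrated model should be near zero
# everywhere. A large gap suggests re-modeling with this feature.
bias = {size: round(mean(r), 2) for size, r in by_size.items()}
print(bias)  # {'small': 7.17, 'large': 0.17}
```

Here the model systematically under-predicts delay for small planes, exactly the kind of pickup a quick analytical view can surface before committing to a re-training cycle.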
In closing: use simple tools when you can; analytics can be simple.