Needle in the Haystack
Mining for Actionable Information in the Noisy Web
Source: Mashable
Input Processing Output
Text data
-Ambiguity in Text
-Time Consuming
-Dedicated Infrastructure
-Manual Labor
Input
-Storage
-Backup
-Management
Processing Output
-Return on Investment
Why Bother?
Early Access:
-Low exposed sources
-Faster than popular media
-Order of hours
Trading Strategies:
-General sentiment
-Trend following
...
One Big Problem
Big Data is getting Bigger
1 Accern Year =
-20+ million sources
-200+ TB
-200,000+ GB
-7+ billion articles
But Alpha-to-Noise Ratio is Decreasing
2014: Daily 4M analyzed, 14K delivered*
2016: Daily 6M analyzed, 17K delivered*
*approximately
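As a quick check of the 2014 and 2016 figures, the share of analyzed articles that were delivered can be computed directly. This is only a sketch: treating delivered/analyzed as a rough proxy for the alpha-to-noise ratio is an assumption, and the volumes are the approximate values quoted on the slide.

```python
# Approximate daily volumes quoted on the slide (articles per day).
volumes = {
    2014: {"analyzed": 4_000_000, "delivered": 14_000},
    2016: {"analyzed": 6_000_000, "delivered": 17_000},
}


def delivered_ratio(analyzed: int, delivered: int) -> float:
    """Fraction of analyzed articles that were delivered (assumed proxy)."""
    return delivered / analyzed


ratios = {year: delivered_ratio(**v) for year, v in volumes.items()}

for year, ratio in sorted(ratios.items()):
    print(f"{year}: {ratio:.3%} of analyzed articles delivered")
```

Under this proxy the delivered share falls from about 0.35% to about 0.28%, even though the absolute number of delivered articles grows.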
How to increase this ratio?
Increase alpha/relevant information
and/or
Decrease noise
3 high-level, practical points.
1. Reliable training data for every model in your system.
a. Requires a lot of manual labor and heuristics.
2. Sequencing noise filters in order of increasing computational complexity and cost.
a. Will reduce latency and infrastructure cost.
3. Relevancy module -- secret ingredient.
a. Define what is relevant to you. (e.g. M&A, Rumors, etc.)
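Point 2 can be sketched as a filter cascade: run the cheapest checks first so that expensive models only see articles that survived everything before them. This is a minimal illustration, not Accern's implementation. The `NoiseFilter` type, the cost numbers, the blacklist entry, the spam pattern, and the toy "classifier" are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Article = Dict[str, str]


@dataclass
class NoiseFilter:
    name: str
    cost: int                         # relative cost, hypothetical units
    keep: Callable[[Article], bool]   # True if the article survives


BLACKLIST = {"spam.example.com"}      # hypothetical blacklisted source
SPAM_PATTERN = re.compile(r"click here|buy now", re.IGNORECASE)


def not_blacklisted(article: Article) -> bool:
    return article["source"] not in BLACKLIST


def no_spam_pattern(article: Article) -> bool:
    return SPAM_PATTERN.search(article["text"]) is None


def classifier_ok(article: Article) -> bool:
    # Stand-in for an expensive model: keep text with varied vocabulary.
    words = article["text"].lower().split()
    return len(set(words)) / max(len(words), 1) > 0.5


FILTERS = [
    NoiseFilter("spam classifier", cost=100, keep=classifier_ok),
    NoiseFilter("blacklist", cost=1, keep=not_blacklisted),
    NoiseFilter("pattern matching", cost=10, keep=no_spam_pattern),
]


def run_cascade(articles: Iterable[Article],
                filters: List[NoiseFilter]) -> List[Article]:
    """Apply filters cheapest-first; stop at the first rejection."""
    ordered = sorted(filters, key=lambda f: f.cost)
    # all() short-circuits, so costlier filters are skipped after a rejection.
    return [a for a in articles if all(f.keep(a) for f in ordered)]
```

Because rejection stops the chain, the expensive filter never runs on articles that a cheap filter already discarded, which is where the latency and infrastructure savings come from.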
Accern Noise Cancellation Pipeline (simplified)
Input: Data from 20 million sources
-Bad Data: Blacklisted Sources
-Structured Spam: Pattern Matching
-Spam Classifiers: Ensemble Learning
-Language Rules: Semantic Analysis
-Financial Mapping: Taxonomies
-Relevance Scoring: Secret Ingredient
Groups: Analytics Pipeline, Spam Reduction, Relevancy
Output: Less than 3% noise
Performance comparison
Strategy: Long-only -- Buy stocks if long condition matches.
Backtest Companies: S&P 500
Benchmark: S&P 500 (SPY)
Metrics:
-Average Daily Sentiment: average of the calculated sentiment of related articles in the last 24 hours.
-Impact: probability (x100) that the article moves the stock price by 1% or more (up or down) by the end of the trading day. Uses historical information.
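The two metrics can be sketched as follows. The 24-hour window, the 1% threshold, and the x100 scaling come from the slide. Everything else is an assumption made for illustration: the function names, the input shapes, and in particular the idea of estimating impact as the share of comparable past articles followed by a move of 1% or more, since the slide only says the metric uses historical information.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional, Tuple


def average_daily_sentiment(scored: List[Tuple[datetime, float]],
                            now: datetime) -> Optional[float]:
    """Mean sentiment of articles published in the 24 hours up to `now`."""
    cutoff = now - timedelta(hours=24)
    recent = [score for ts, score in scored if cutoff < ts <= now]
    return mean(recent) if recent else None


def impact_score(past_moves: List[float], threshold: float = 0.01) -> float:
    """Hypothetical estimate: percentage of past end-of-day moves whose
    absolute size was at least `threshold` (0.01 means 1%)."""
    if not past_moves:
        return 0.0
    hits = sum(1 for move in past_moves if abs(move) >= threshold)
    return 100.0 * hits / len(past_moves)
```

A long-only rule could then compare both values against thresholds, but the slides do not state the long condition, so none is shown here.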
Performance comparison
Standard Model
-Sentiment: Stanford CoreNLP, Python NLTK
-Impact: CoreNLP Entity
-Training Data: Enron Spam Dataset
Standard+Accern Model
-Sentiment: Stanford CoreNLP, Python NLTK
-Impact: Accern Impact
-Training Data: Enron Spam Dataset
Accern Model
-Sentiment: Accern Sentiment
-Impact: Accern Impact
-Training Data: Historical data processed using Accern Noise Cancellation pipeline.
*Models can only trade on common entities for fair comparison.
**Each model can use the best configuration.
***Trading Period: July 1, 2013 - July 1, 2015
Standard Model
Uses standard spam filters for noise cancellation, a standard sentiment model, and impact calculated using standard entity extraction.
Standard+Accern Model
Uses standard spam filters for noise cancellation, a standard sentiment model, and the Accern impact model.
Accern Model
Uses the Accern Noise Cancellation pipeline for noise reduction, plus the Accern sentiment and Accern impact models.
One more strategy: Drift Following
● The number of longs/shorts determines the weights of securities.
● Use the 40-day average of article sentiment.
● Weekly holding period.
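A minimal sketch of how these three bullets might fit together. The 40-day window is from the slide. The rest is assumed for illustration: going long on a positive average and short on a negative one, and splitting weight equally within each side so that each weight depends on the number of longs or shorts. The slides do not give the actual weighting formula.

```python
from statistics import mean
from typing import Dict, List

WINDOW_DAYS = 40  # from the slide


def sentiment_average(daily: List[float], window: int = WINDOW_DAYS) -> float:
    """Average of the most recent `window` daily sentiment values."""
    return mean(daily[-window:])


def drift_weights(history: Dict[str, List[float]]) -> Dict[str, float]:
    """Map ticker -> weight from per-ticker daily sentiment (oldest first).

    Assumed rule: positive average means long, negative means short,
    with equal weight within each side (1/n_long or -1/n_short).
    """
    averages = {t: sentiment_average(s) for t, s in history.items() if s}
    longs = [t for t, avg in averages.items() if avg > 0]
    shorts = [t for t, avg in averages.items() if avg < 0]
    weights = {t: 1.0 / len(longs) for t in longs}
    weights.update({t: -1.0 / len(shorts) for t in shorts})
    return weights
```

With a weekly holding period, `drift_weights` would be called once per week and the resulting positions held until the next call.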
Alpha Stream Strategies
Transparency
app.accern.com
Thank You
www.accern.com
Anshul Vikram Pandey
anshul@accern.com






Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016
