The document discusses how building a general purpose search engine is an extremely expensive endeavor that would cost a minimum of $100 million. It breaks down the various costs that contribute to this total, including storage and crawling of web data, ensuring result relevance, performance requirements, and hiring necessary personnel over several years. Through examples of past search startups, the author argues that any entrepreneur claiming they can create a full search engine for under $100 million is underestimating the challenges and expenses involved.
Search Startups are Dead

Entrepreneurs tend to think that there's always a way to innovate out of a problem. In this case, however, I'm going to show you that there are systematic reasons why there cannot be a general purpose search engine that competes with Google and Bing.
I've worked for three search startups – SideStep, Kosmix, and Powerset – and I still don't have a Gulfstream. This is sort of an exercise in apologetics: it's really not my fault that I don't have mountains of cash from my stock options.
There are good reasons involving switching costs and marketing why a new search engine can't just pop up, but that's not what I'll focus on. It's all about the mighty greenback: building a search engine is a really expensive proposition.
It goes without saying that the numbers herein are not the opinion of my employer and are speculative, but they are informed by experience. I've made a lot of estimations in Excel to come up with these numbers, and I'm pretty confident that I'm in the right ballpark.
The equation has two major components: hardware and people. In the following slides, I'll explain the components going into hardware and people and, in the process, show you how complicated and expensive a search engine is to build.
Last year, Google estimated that the Web contains over 1T documents. That's really expensive to store.
It's not just the Web page you have to store. There are links, anchor text, and, since you're a smarty-pants startup, you'll probably be extracting all kinds of smart metadata from every page.
Keep in mind that the Web is constantly changing. New pages are being added, pages already crawled are changing, and making sure you have the latest copy of the Web on hand is really important.
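To get a feel for the scale involved, here is a back-of-envelope estimate of the raw storage a Web-scale crawl implies. Every figure besides the 1T document count is an assumption I've picked for illustration, not a measured number:

```python
# Back-of-envelope storage estimate for a Web-scale crawl.
# Only DOCS comes from the text (Google's ~1T estimate); the
# rest are illustrative assumptions.

DOCS = 1e12            # ~1 trillion documents
AVG_PAGE_KB = 50       # assumed average compressed page size
METADATA_FACTOR = 2.0  # assumed overhead for links, anchor text, metadata
REPLICATION = 3        # assumed copies for durability and serving

# decimal units: 1 TB = 1e9 KB
total_tb = DOCS * AVG_PAGE_KB * METADATA_FACTOR * REPLICATION / 1e9
print(f"{total_tb / 1e3:.0f} PB of storage")  # 300 PB
```

Even with these conservative guesses you land in the hundreds of petabytes, before you account for recrawling changed pages or keeping historical snapshots.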
At a bare minimum, you need results that are as relevant as Bing's or Google's. To do that, you'll need lots of servers to run relevance experiments, and lots of storage for huge amounts of clickstream data.
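A toy sketch of what those relevance experiments look like: clickstream logs feed an A/B comparison between two ranking variants, and you measure which one users click more. The log records here are fabricated for illustration:

```python
# Toy A/B relevance experiment over a fabricated clickstream:
# each record is (ranking variant shown, whether the user clicked).
from collections import defaultdict

clicklog = [
    ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", True), ("B", True), ("B", False),
]

shown = defaultdict(int)
clicked = defaultdict(int)
for variant, was_clicked in clicklog:
    shown[variant] += 1
    clicked[variant] += was_clicked

# click-through rate per variant
ctr = {v: clicked[v] / shown[v] for v in shown}
print(ctr)
```

At real scale the same computation runs over billions of log records per day, which is where the storage and server costs come from.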
I know there aren't any black hat SEO folks in this crowd, but there's a constant battle with site owners who don't have users' best interests at heart and are willing to game search results.
No search engine is complete without lots of ancillary data: weather, stock quotes, images, maps, Twitter, Facebook. Licensing the content or building the vertical is very expensive and you’re not a true replacement without it.
One of the most expensive components of a search engine is runtime. When you do a search in Bing, results come back from thousands, or possibly billions, of Web pages in less than a second. How does that happen? Lots, and lots, and lots of servers.
All search engines use some kind of divide-and-conquer algorithm that federates your search across thousands of machines. That means that for any query, there are thousands of machines involved. When you have millions of users, serving search results gets very expensive.
At Powerset, we estimated that our index was 10-20 times the size of a typical keyword index. The Johnson coefficient represents the tax on storage, relevance, and runtime that you'd pay at an innovative search engine.
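The 10-20x multiplier compounds directly into cost. Here's the arithmetic, with an assumed baseline index footprint and an assumed fully loaded dollar cost per TB-year (both hypothetical figures, not from the talk):

```python
# Effect of the 10-20x index-size multiplier on yearly hardware cost.
# baseline_index_tb and cost_per_tb_year are assumed figures.
baseline_index_tb = 50_000   # assumed keyword-index footprint, in TB
cost_per_tb_year = 100       # assumed $/TB-year, fully loaded

for multiplier in (1, 10, 20):
    cost = baseline_index_tb * multiplier * cost_per_tb_year
    print(f"{multiplier:>2}x index -> ${cost / 1e6:.0f}M/yr")
```

Under these assumptions, a 20x index turns a $5M/yr storage bill into $100M/yr, and the same multiplier hits relevance experiments and runtime serving too.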