The Technology Behind Google (구글을 지탱하는 기술)


sid choi


1. First Appearance of Google
2. Main Concepts
3. Search Engine Structure
- Component Roles
- Back-end Structure
- Index Structure
4. Total Structure

First Appearance of Google

• Why?
Get useful results

• Who?
Sergey Brin & Larry Page

Main Concepts

Scaling by adding hardware

Ranking Function
– PageRank
– Anchor Text
– Word

Search Engine Structure

[Figure: Internet, Search Engine]

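Search Server's Role

[Figure: Search Server, Index, Back-end]

• Manages communication

• Interprets each request and decides what needs to be processed

• Finds the required information in the index

• Edits the results and sends them to the user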
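
The search server's steps above can be sketched as a tiny request handler. This is only an illustration under stated assumptions: the in-memory tables `TOY_INDEX` and `TOY_TITLES`, their contents, and the function `handle_request` are hypothetical and do not come from the slides.

```python
# Toy in-memory index (hypothetical data): word -> set of docIDs.
TOY_INDEX = {
    "sejong": {1},
    "university": {1, 2},
    "google": {3},
}
# Toy document table (hypothetical data): docID -> title.
TOY_TITLES = {1: "Sejong University", 2: "University List", 3: "Google"}


def handle_request(raw_query: str) -> list[str]:
    """Interpret a request, look it up in the index, and format the results."""
    # 1. Interpret the request: normalize and split into words.
    words = raw_query.lower().split()
    if not words:
        return []
    # 2. Find the required information: documents containing every word.
    doc_sets = [TOY_INDEX.get(w, set()) for w in words]
    matches = set.intersection(*doc_sets)
    # 3. Edit the results for the user: one line per document, ordered by docID.
    return [f"{doc_id}: {TOY_TITLES[doc_id]}" for doc_id in sorted(matches)]
```

A real search server would also rank the matches; here they are simply ordered by docID.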


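Back-end's Role

• Crawling

  • The technique of collecting web pages
  • Takes a long time -> multiple crawlers are used
  • Collected pages are kept in the Repository

• Creating Index

  • Builds the index from the web pages stored in the Repository
  • Structure analysis, word processing, link processing, ranking, etc.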


Index's Role

• Safely stores the data it is given

• Finds the requested data

• Serves as the search engine's database

Back-end Structure

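Crawling

The technique of collecting web pages

Early Google had 24 million web pages registered

To sustain an average of 40 pages per second,
hundreds of downloads must be kept running at the same time

-> And now??

A Google search reported 3,070,000,000 results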
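
The slide's numbers can be checked with a little arithmetic. At 40 pages per second, 24 million pages take 600,000 seconds, or about 6.9 days. The second half of the sketch uses Little's law (concurrency = rate × time per item) to show why hundreds of simultaneous downloads are needed. The figure of 300 concurrent downloads is an assumption standing in for "hundreds"; the slides give no exact number.

```python
PAGES = 24_000_000   # pages registered by early Google (from the slide)
RATE = 40            # average pages fetched per second (from the slide)

crawl_seconds = PAGES / RATE
crawl_days = crawl_seconds / 86_400   # 86,400 seconds per day

# Assumption for illustration only: 300 simultaneous downloads
# stands in for the slide's "hundreds".
CONCURRENT = 300
# Little's law: concurrency = rate * average time per download.
avg_seconds_per_page = CONCURRENT / RATE

print(f"{crawl_seconds:.0f} s = {crawl_days:.1f} days for one full pass")
print(f"about {avg_seconds_per_page} s per download on average")
```

Under that assumption each download may take several seconds on average, which is why a single sequential crawler could not reach 40 pages per second.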

[Figure: URL server, crawlers, Internet, Repository]

The URL server directs all the crawlers

Each crawler downloads web pages from the Internet as instructed

Pages are stored temporarily in the Repository

Repository
• docID – unique numeric value
• url – URL
• text – compressed content
• etc. – date, page length…
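
The Repository record listed above could be modeled roughly as follows. This is a minimal sketch, not the actual storage format: the class and field names are hypothetical, `zlib` is just one possible compression choice, and assigning docIDs in arrival order is an assumption made for the example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class RepositoryRecord:
    doc_id: int        # unique numeric value
    url: str
    text: bytes        # compressed page content
    fetched_on: str    # "etc." field: date
    page_length: int   # "etc." field: uncompressed length in bytes


class Repository:
    """In-memory store that assigns docIDs in arrival order (an assumption)."""

    def __init__(self) -> None:
        self._records: list[RepositoryRecord] = []

    def store(self, url: str, html: str, fetched_on: str) -> int:
        """Compress a downloaded page, save it, and return its new docID."""
        raw = html.encode("utf-8")
        doc_id = len(self._records) + 1
        self._records.append(
            RepositoryRecord(doc_id, url, zlib.compress(raw), fetched_on, len(raw))
        )
        return doc_id

    def load(self, doc_id: int) -> str:
        """Return the original page text for a docID."""
        record = self._records[doc_id - 1]
        return zlib.decompress(record.text).decode("utf-8")
```

The indexing steps would later read pages back with `load` instead of fetching them from the Internet again.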

Address resolution takes a lot of time
-> a DNS cache is maintained internally

After a page is saved to the Repository,
the URL server assigns the next address
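
A DNS cache of the kind mentioned above can be sketched in a few lines. The class name `DnsCache` and its `lookups` counter are hypothetical; the default resolver is the standard library's `socket.gethostbyname`. The sketch ignores record expiry (TTL), which a real crawler would have to handle.

```python
import socket
from typing import Callable


class DnsCache:
    """Remembers host -> IP answers so each host is resolved only once."""

    def __init__(self, resolver: Callable[[str], str] = socket.gethostbyname) -> None:
        self._resolver = resolver
        self._cache: dict[str, str] = {}
        self.lookups = 0  # number of real (slow) resolutions performed

    def resolve(self, host: str) -> str:
        """Return the cached address, resolving only on the first request."""
        if host not in self._cache:
            self.lookups += 1
            self._cache[host] = self._resolver(host)
        return self._cache[host]
```

Passing the resolver as a parameter keeps the cache testable without network access; a crawler fetching many pages from the same host then pays the resolution cost only once.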

Creating Index

Analyzing Web Page structures

[Figure: example page with docID 1, url Sejong.ac.kr]
  <html> <head> <title>세종대학교</title> … <h1>학사정보</h1> … </body>
  Title: 세종대학교 (Sejong University)
  etc.: …

DocIndex
– Stores basic information about each web page
– Uses docID as the key
– Columns: docID, url, title, etc.

URLlist
– Uses url as the key
– Used to look up the docID
– Columns: url, docID
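
The two tables above are lookups in opposite directions, which can be sketched with two dictionaries. The function names are hypothetical, and the title given for Cyworld.com is made up for the example; only the Sejong.ac.kr row comes from the slide.

```python
from typing import Optional

doc_index: dict[int, dict[str, str]] = {}   # DocIndex: docID -> basic page info
url_list: dict[str, int] = {}               # URLlist: url -> docID


def register_page(doc_id: int, url: str, title: str) -> None:
    """Record a parsed page in both tables."""
    doc_index[doc_id] = {"url": url, "title": title}
    url_list[url] = doc_id


def doc_id_for(url: str) -> Optional[int]:
    """Look up a docID by URL; None if the page is unknown."""
    return url_list.get(url)


register_page(1, "Sejong.ac.kr", "세종대학교")
register_page(3, "Cyworld.com", "Cyworld")   # hypothetical title
```

With these tables, `doc_id_for("Sejong.ac.kr")` gives the docID, and `doc_index[1]` gives the stored page information back.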

Creating Index

Word Index

Lexicon
– word -> wordID

  word                          wordID
  세종 (Sejong)                 101
  대학교 (university)           102
  학사 (academic affairs)       201
  정보 (information)            202

Barrels
– docID wordID position size etc.

  docID   wordID#1   Position#1 Size#1 Etc.#1
                     Position#2 Size#2 Etc.#2
          wordID#2   Position#1 Size#1 Etc.#1
                     Position#2 Size#2 Etc.#2
          …

Inverted Index
– Uses wordID as the key
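
The relationship between the Lexicon, the Barrels, and the Inverted Index can be sketched as follows. This is a simplified illustration: wordIDs are assigned sequentially from 1 rather than using the slide's values (101, 102, …), only positions are stored (size and other fields are omitted), and the word list for docID 3 is hypothetical.

```python
from collections import defaultdict

lexicon: dict[str, int] = {}   # Lexicon: word -> wordID


def word_id(word: str) -> int:
    """Return the wordID for a word, assigning the next free ID if it is new."""
    if word not in lexicon:
        lexicon[word] = len(lexicon) + 1
    return lexicon[word]


def build_barrel(docs: dict[int, list[str]]) -> dict[int, dict[int, list[int]]]:
    """Forward index: docID -> wordID -> positions of that word in the document."""
    barrel: dict[int, dict[int, list[int]]] = {}
    for doc_id, words in docs.items():
        hits: dict[int, list[int]] = defaultdict(list)
        for position, word in enumerate(words):
            hits[word_id(word)].append(position)
        barrel[doc_id] = dict(hits)
    return barrel


def invert(barrel: dict[int, dict[int, list[int]]]) -> dict[int, dict[int, list[int]]]:
    """Inverted index: the same hits re-keyed as wordID -> docID -> positions."""
    inverted: dict[int, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, hits in barrel.items():
        for wid, positions in hits.items():
            inverted[wid][doc_id] = positions
    return dict(inverted)


# docID 1 uses the slide's example words; docID 3 is a made-up document.
docs = {1: ["세종", "대학교", "학사", "정보"], 3: ["학사", "정보", "정보"]}
barrel = build_barrel(docs)
inverted = invert(barrel)
```

The forward form answers "which words are in this document", while the inverted form answers "which documents contain this word", which is the question a search query asks.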


Creating Index

Link Index

[Figure: the page with docID 1 (url Sejong.ac.kr) links to the page with docID 3 (url Cyworld.com)]

URLlist
  Sejong.ac.kr   1
  Cyworld.com    3

Links
  1   3

Anchortext
- Information about the linked page
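
Building the Links table and collecting anchor text can be sketched with the standard library's `html.parser`. The `LinkExtractor` class and the sample page body are hypothetical; the URLlist values are the ones shown on the slide. Note that the anchor text is stored under the target page, since it describes the page being linked to.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text inside an open <a> tag counts as anchor text.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


url_list = {"Sejong.ac.kr": 1, "Cyworld.com": 3}   # url -> docID, as on the slide
# Hypothetical body of the page at Sejong.ac.kr:
page_html = '<p>See <a href="Cyworld.com">Cyworld mini homepage</a></p>'

extractor = LinkExtractor()
extractor.feed(page_html)

source_doc = url_list["Sejong.ac.kr"]
links: list[tuple[int, int]] = []        # Links: (from docID, to docID)
anchor_text: dict[int, list[str]] = {}   # target docID -> anchor texts
for href, text in extractor.links:
    target = url_list.get(href)
    if target is not None:
        links.append((source_doc, target))
        anchor_text.setdefault(target, []).append(text)
```

Here URLlist is what turns each link's URL into a docID, which is the purpose stated for it earlier in the deck.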


Creating Index

Ranking Index

PageRank - Link
Links between web pages are analyzed as a kind of vote
-> a document that receives more links = a better document
Anchortext
Word - Barrels
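
The voting idea can be illustrated with a small power-iteration version of PageRank. The slide's description is simplified: in PageRank a vote also carries more weight when the voting page itself ranks highly. The sketch below assumes a damping factor of 0.85 (a commonly used value), a fixed number of iterations, and a made-up three-page link graph; none of these come from the slides.

```python
def pagerank(
    links: dict[int, list[int]], damping: float = 0.85, iterations: int = 50
) -> dict[int, float]:
    """Power-iteration PageRank over a docID -> outgoing docIDs map."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: pages 1 and 2 both link to 3; page 3 links back to 1.
ranks = pagerank({1: [3], 2: [3], 3: [1]})
```

In this graph page 3 receives two links and ends up ranked highest, page 1 receives one link from the highly ranked page 3, and page 2 receives none and ranks lowest.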

Index Structure

DocIndex
– Stores basic information about each web page
– Uses docID as the key

Lexicon
– word -> wordID

Barrels
– Storage

Total Structure

[Figure: overall architecture]
User
Search Server
Index: DocIndex, Lexicon, Barrels, URLlist, Links
Back-end: URL server, crawlers, Repository; index creation steps: Structure, word, Link, Ranking
Internet

Thanks for your attention