The Technology That Supports Google (구글을 지탱하는 기술) – chapter1.ppt
1. First Appearance of Google
2. Main Concepts
3. Search Engine Structure
    - Each Component's Role
    - Back-end Structure
    - Index Structure
4. Total Structure
First Appearance of Google

• Why?
           To get useful results
• Who?
           Sergey Brin & Larry Page
Main Concepts

Expandable hardware

Ranking Function
         – PageRank
         – Anchor Text
         – Word
Search Engine Structure

[Diagram: the Search Engine connected to the Internet]
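Search Engine Structure: Search Server's Role

[Diagram: Search Server, Index, Back-end]

• Manages communication
• Interprets each request and decides what needs to be processed
• Finds the required information in the index
• Edits the results and sends them to the user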
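
The duties listed above can be sketched as one small request handler. This is only an illustration under assumptions: the in-memory `INDEX` dictionary, the function names, and the example.com URLs are hypothetical and do not come from the slides.

```python
# Hypothetical in-memory index: word -> list of (url, title) hits.
INDEX = {
    "google": [("http://example.com/a", "About Google")],
    "search": [("http://example.com/a", "About Google"),
               ("http://example.com/b", "Search basics")],
}

def parse_request(raw_query):
    """Interpret the request: lowercase it and split it into words."""
    return raw_query.lower().split()

def lookup(words):
    """Find the pages that contain every query word."""
    if not words:
        return []
    hit_sets = [set(INDEX.get(word, [])) for word in words]
    return sorted(set.intersection(*hit_sets))

def format_results(hits):
    """Edit the hits into lines that can be sent back to the user."""
    return [f"{title} <{url}>" for url, title in hits]

def handle_request(raw_query):
    """Parse the request, look it up in the index, and format the reply."""
    return format_results(lookup(parse_request(raw_query)))

print(handle_request("Google search"))
```

Keeping the three steps as separate functions mirrors the slide's split between interpreting a request, consulting the index, and editing the result.
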
Search Engine Structure: Back-end's Role

• Crawling

     • The technique of collecting web pages
     • Takes a long time -> multiple crawlers are used
     • Collected pages are kept in the Repository

• Creating Index

     • Builds the Index from the web pages stored in the Repository
     • Structure analysis, word processing, link processing, ranking, etc.
Search Engine Structure: Index's Role

• Stores the given data safely
• Finds the requested data
• Serves as the Search Engine's database
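Search Engine Structure: Back-end Structure

Crawling

The technique of collecting web pages

Early Google had 24 million web pages registered

Sustaining an average of 40 pages per second
requires keeping hundreds of downloads active at the same time

-> And now?

A Google search reports 3,070,000,000 results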
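
The figures on this slide can be checked with quick arithmetic. The sketch below takes the page count and the rate from the slide, but the 5-second average download time is purely an assumption made for illustration. It applies Little's law (items in flight = throughput × time per item).

```python
PAGES = 24_000_000        # early Google corpus size, from the slide
RATE = 40                 # average pages per second, from the slide
ASSUMED_LATENCY_S = 5.0   # assumption: seconds needed to download one page

crawl_seconds = PAGES / RATE
crawl_days = crawl_seconds / 86_400            # 86,400 seconds per day
concurrent_downloads = RATE * ASSUMED_LATENCY_S

print(f"Full crawl at {RATE} pages/s: {crawl_days:.1f} days")
print(f"Downloads in flight: {concurrent_downloads:.0f}")
```

With a slower assumed download time the number of simultaneous downloads grows proportionally, which is consistent with the slide's "hundreds".
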
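Search Engine Structure: Back-end Structure

Crawler

[Diagram: the URL server directs several crawlers, which download from the Internet and write to the Repository]

The URL server directs all of the crawlers

Each crawler downloads web pages as instructed

Pages are stored temporarily in the Repository

• docID – unique numeric value
• url  – URL
• text – compressed content
• etc. – date, page length…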
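
The crawler flow and the Repository record described above might look roughly like the following. This is a sketch under assumptions: the class and function names are hypothetical, `fake_download` stands in for a real HTTP fetch so the example runs offline, and `zlib` is chosen here only as one possible compression method (the slide says just "compressed").

```python
import zlib
from collections import deque
from dataclasses import dataclass

@dataclass
class RepositoryRecord:
    doc_id: int    # unique numeric value
    url: str
    text: bytes    # compressed page body
    length: int    # uncompressed page length (one of the "etc." fields)

class URLServer:
    """Hands out the URLs to crawl, one at a time."""
    def __init__(self, urls):
        self._queue = deque(urls)

    def next_url(self):
        return self._queue.popleft() if self._queue else None

def fake_download(url):
    """Stand-in for an HTTP fetch (assumption, keeps the sketch offline)."""
    return f"<html><title>{url}</title></html>"

def crawl(url_server, repository):
    """One crawler loop: ask for a URL, download the page, store it."""
    while (url := url_server.next_url()) is not None:
        body = fake_download(url).encode("utf-8")
        repository.append(RepositoryRecord(
            doc_id=len(repository) + 1,
            url=url,
            text=zlib.compress(body),
            length=len(body),
        ))

repository = []
crawl(URLServer(["Sejong.ac.kr", "Cyworld.com"]), repository)
print([(r.doc_id, r.url, r.length) for r in repository])
```

A real deployment would run many such loops in parallel against one URL server; a single loop is shown to keep the example short.
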
Search Engine Structure: Back-end Structure

Crawler

Address resolution takes a lot of time
-> a DNS cache is maintained internally

After a page is stored in the Repository,
the URL server assigns the next address
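
An internal DNS cache of the kind mentioned above can be as simple as a dictionary in front of the resolver. The sketch below is illustrative only: the `DNSCache` class and the fake resolver with its documentation-range addresses are assumptions, and entry expiry (TTL) is deliberately left out.

```python
import socket

class DNSCache:
    """Remembers host -> IP address so each host is resolved only once."""
    def __init__(self, resolver=socket.gethostbyname):
        self._resolver = resolver
        self._cache = {}
        self.lookups = 0   # number of real resolutions performed

    def resolve(self, host):
        if host not in self._cache:
            self.lookups += 1
            self._cache[host] = self._resolver(host)
        return self._cache[host]

def fake_resolver(host):
    """Assumed stand-in resolver so the example runs without a network."""
    return {"sejong.ac.kr": "192.0.2.1", "cyworld.com": "192.0.2.2"}[host]

cache = DNSCache(resolver=fake_resolver)
for host in ["sejong.ac.kr", "cyworld.com", "sejong.ac.kr"]:
    cache.resolve(host)
print(cache.lookups)
```

The third call is answered from the cache, so only two resolutions are performed for three requests.
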
Search Engine Structure: Back-end Structure

Creating Index: Analyzing Web Page Structures

Example page:
    <html>
    <head>
    <title>세종대학교</title>
    </head>
    <body>
    <h1>학사정보</h1>
    ….

Extracted record:
    docID   1
    url     Sejong.ac.kr
    title   세종대학교 (Sejong University)
    etc.    …

DocIndex (columns: docID, url, title, etc.)
– Stores the basic information of each web page
– Uses docID as the key

URLlist (columns: url, docID)
– Uses url as the key
– Used to look up the docID
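
The two tables above can be sketched as a pair of dictionaries filled in while parsing a page. This is a minimal illustration, assuming hypothetical names (`TitleParser`, `add_page`, `doc_index`, `url_list`); it uses Python's standard `html.parser` module and extracts only the title.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

doc_index = {}   # docID -> basic page information (DocIndex)
url_list = {}    # url -> docID (URLlist)

def add_page(doc_id, url, html):
    """Analyze one page and register it in both tables."""
    parser = TitleParser()
    parser.feed(html)
    doc_index[doc_id] = {"url": url, "title": parser.title}
    url_list[url] = doc_id

add_page(1, "Sejong.ac.kr",
         "<html><head><title>세종대학교</title></head>"
         "<body><h1>학사정보</h1></body></html>")
print(doc_index[url_list["Sejong.ac.kr"]])
```

Looking up by URL goes through `url_list` first and then `doc_index`, which is the reason the slide gives for keeping both tables.
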
Search Engine Structure: Back-end Structure

Creating Index: Word Index

Lexicon
 – Maps word -> wordID

    word                  wordID
    세종 (Sejong)          101
    대학교 (University)    102
    학사 (Academic)        201
    정보 (Information)     202

Barrels
 – docID, wordID, position, size, etc.

    docID    wordID#1   Position#1   Size#1    Etc.#1
                        Position#2   Size#2    Etc.#2
             wordID#2   Position#1   Size#1    Etc.#1
                        Position#2   Size#2    Etc.#2
             …

Inverted Index
 – Uses wordID as the key
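
A toy version of the Lexicon, the Barrels, and the Inverted Index is sketched below. Several things are assumptions: wordIDs are handed out sequentially (so they differ from the 101/102/201/202 values on the slide), words are split on whitespace, the second document is invented, and the size and etc. fields are omitted.

```python
from collections import defaultdict

lexicon = {}                         # word -> wordID
forward_barrel = defaultdict(list)   # docID -> [(wordID, position), ...]
inverted_index = defaultdict(list)   # wordID -> [(docID, position), ...]

def word_id(word):
    """Return the wordID for a word, assigning the next free ID if it is new."""
    if word not in lexicon:
        lexicon[word] = len(lexicon) + 1
    return lexicon[word]

def index_document(doc_id, text):
    """Record every word occurrence under its docID and under its wordID."""
    for position, word in enumerate(text.split()):
        wid = word_id(word)
        forward_barrel[doc_id].append((wid, position))
        inverted_index[wid].append((doc_id, position))

index_document(1, "세종 대학교 학사 정보")
index_document(3, "학사 일정")   # hypothetical second page

# Which documents contain 학사, and where?
print(inverted_index[lexicon["학사"]])
```

The forward table answers "which words are in this document", while the inverted table answers "which documents contain this word", which is the lookup a search needs.
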
Search Engine Structure: Back-end Structure

Creating Index: Link Index

Example: the page with docID 1 (url Sejong.ac.kr) links to the page with docID 3 (url Cyworld.com)

URLlist
    url             docID
    Sejong.ac.kr    1
    Cyworld.com     3

Links
    from    to
    1       3

Anchortext
- Information about the linked page
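
The link step above can be sketched by extracting `<a>` elements, translating URLs to docIDs through the URLlist, and keeping the anchor text with the target page. The names `LinkParser`, `index_links`, `links`, and `anchor_text` are hypothetical, and treating an `href` as a bare host name is a simplification made for this example.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

url_list = {"Sejong.ac.kr": 1, "Cyworld.com": 3}   # url -> docID, as on the slide
links = []         # (from docID, to docID) pairs
anchor_text = {}   # target docID -> anchor texts pointing at it

def index_links(from_url, html):
    """Record each link to a known URL, plus its anchor text."""
    parser = LinkParser()
    parser.feed(html)
    for href, text in parser.links:
        if href in url_list:
            to_id = url_list[href]
            links.append((url_list[from_url], to_id))
            anchor_text.setdefault(to_id, []).append(text)

index_links("Sejong.ac.kr", '<p>Visit <a href="Cyworld.com">Cyworld</a></p>')
print(links, anchor_text)
```

Note that the anchor text is filed under the target docID, not the source, because it describes the linked page.
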
Search Engine Structure: Back-end Structure

Creating Index: Ranking Index

PageRank (from Links)
                       Analyzes the links between web pages as a kind of vote
                       -> a document that receives more links = a better document
Anchortext
Word (from Barrels)
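
The "links as votes" idea can be illustrated with a small iterative PageRank computation. This is a sketch under assumptions, not the production algorithm: the damping factor of 0.85, the fixed 50 iterations, and the three-page link graph are all chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping a docID to the list of docIDs it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link carries an equal share of the page's vote.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its vote over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: pages 1 and 2 link to page 3, and page 3 links to page 1.
ranks = pagerank({1: [3], 2: [3], 3: [1]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page 3 receives the most votes and ranks first; page 1 comes second because its single vote comes from a highly ranked page, which goes slightly beyond simply counting links.
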
Search Engine Structure: Index Structure

DocIndex
– Stores the basic information of each web page
– Uses docID as the key

Lexicon
– Maps word -> wordID

Barrels
– Storage
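Total Structure

[Diagram: the User talks to the Search Server. The Search Server reads the Index (DocIndex, Lexicon, Barrels). The Back-end (URL server, crawlers, Repository, URLlist, Links) collects pages from the Internet and builds the Index through structure, word, link, and ranking processing.]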
Thanks for your attention