Google Code Search
Presenter: منا زویچی
Why Code Search?
Large amounts of source code are added online continually.
Documentation is generally not attached to the source code.
Code is unorganized and distributed among different sources.
Many versions of each software system exist and are needed for
similarity analysis.
Cont..
Enormous volume of source code (GitHub alone has
approximately 10 million projects).
Very large and complex.
Most query systems available online use either
keyword-based or meta-information-based search.
Not easy to search and analyze.
What did CodeSearch do for programmers?
The CodeSearch service was a unique tool as it indexed open source code in the wild.
CodeSearch was one of the most valuable tools in existence for all software developers,
specifically:
• When an API was poorly documented, you could find sample bits of code that used the API.
• When an API's error codes were poorly documented, you could find sample bits of code that
handled them.
• When an API was difficult to use (and the world is packed with those), you could find sample
bits of code that used it.
• When you quickly wanted to learn a language, you knew you could find quality code with
simple searches.
• When you wanted to find different solutions to everyday problems dealing with protocols,
new specifications, evolving standards and trends, you could turn to CodeSearch.
Cont..
• When you were faced with an obscure error message, an obscure token, an obscure
return value or other forms of poor coding, you would find sample bits of code that
solved this problem.
• When dealing with proprietary protocols or just poorly documented protocols, you
could find how they worked in minutes.
• When you were trying to debug yet another broken standard or yet another poorly
specified standard, you knew you could turn quickly to CodeSearch to find the
answers to your problems (memories of OAuth and IMAP flash in my head).
• When learning a new programming language or trying to improve your skills in a
new programming language, you could use CodeSearch to learn the idioms and the
best (and worst) practices.
• When building a new version of a library, whether in a new language, as a fluent
version, as an open source version, or as a more complete version, you would
just go to CodeSearch to find out how other people did things.
Google Code Search
Developer(s): Google
Initial release: October 5, 2006
Development status: Discontinued
Operating system: Any (web-based application)
Type: Code search engine
Website: http://www.google.com/codesearch (archived version from 2010)
Google Code Search
• Features included the ability to search using operators, namely lang:, package:,
license: and file:.
• The code available for searching was in various formats, including .tar.gz, .tar.bz2,
.tar, and .zip archives, and CVS, Subversion, Git and Mercurial repositories.
How Google Code Search Worked
Introduction
• Code Search was Google's first and only search engine to accept regular
expression queries, which was geekily great but a very small niche. When we
started Code Search, a Google search for “regular expression search engine”
turned up sites where you typed “phone number” and got back
“\(\d{3}\) \d{3}-\d{4}”.
• Google open sourced the regular expression engine I wrote for Code
Search, RE2, in March 2010. Code Search and RE2 have been a great vehicle
for educating people about how to do regular expression search safely.
Regular expression
In theoretical computer science, a regular expression is a sequence of characters
that defines a search pattern. Usually this pattern is then used by string searching
algorithms for "find" or "find and replace" operations on strings.
Basic concepts
A regular expression, often called a pattern, is an expression used to specify
a set of strings required for a particular purpose.
For example, the set containing the three strings "Handel", "Händel", and
"Haendel" can be specified by the pattern
H(ä|ae?)ndel ;
we say that this pattern matches each of the three strings.
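As a small illustration of the Handel example, the pattern can be tried with Python's standard `re` module. This is only a sketch: the extra name "Hendel" is an assumption added here to show a non-match, and `fullmatch` is used so that the whole string must match the pattern.

```python
import re

# Pattern from the slide: "a", "ä", or "ae" between "H" and "ndel".
pattern = re.compile(r"H(ä|ae?)ndel")

# "Hendel" is a hypothetical extra input, added to show a string that does not match.
names = ["Handel", "Händel", "Haendel", "Hendel"]

# fullmatch requires the entire string to match, not just a substring.
matched = [name for name in names if pattern.fullmatch(name)]
print(matched)
```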
Indexed Word Search
o The key data structure is called a posting list or inverted index, which lists, for every possible search
term, the documents that contain that term.
Consider these three very short documents:
1) Google Code Search
2) Google Code Project Hosting
3) Google Web Search
o The inverted index for these three documents looks like:
Code: {1, 2}
Google: {1, 2, 3}
Hosting: {2}
Project: {2}
Search: {1, 3}
Web: {3}
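A minimal sketch of how such an inverted index could be built for the three example documents. The function name `build_inverted_index` and the whitespace tokenization are assumptions made for illustration, not details of any particular search engine.

```python
from collections import defaultdict

# The three example documents from the slide, keyed by document number.
docs = {
    1: "Google Code Search",
    2: "Google Code Project Hosting",
    3: "Google Web Search",
}

def build_inverted_index(docs):
    """Map each word to the set of document numbers that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():  # assumption: words are separated by whitespace
            index[word].add(doc_id)
    return index

index = build_inverted_index(docs)

# An AND query is the intersection of the posting lists of its terms.
google_and_search = index["Google"] & index["Search"]
print(sorted(google_and_search))
```

Intersecting posting lists answers an AND query without reading any document body, which is the point of keeping the index.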
Cont..
o To support phrases, full-text search implementations usually record each occurrence of a word
in the posting list, along with its position:
Code: {(1, 2), (2, 2)}
Google: {(1, 1), (2, 1), (3, 1)}
Hosting: {(2, 4)}
Project: {(2, 3)}
Search: {(1, 3), (3, 3)}
Web: {(3, 2)}
• An alternate way to support phrases is to treat them as AND queries to identify a set of candidate
documents and then filter out non-matching documents after loading the document bodies from disk.
In practice, phrases built out of common words like “to be or not to be” make this approach
unattractive. Storing the position information in the index entries makes the index bigger but avoids
loading a document from disk unless it is guaranteed to be a match.
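The positional index on this slide can be sketched as follows. The names `build_positional_index` and `phrase_search` are hypothetical, and positions are counted in words starting from 1, as in the listing. A phrase matches when its words appear at consecutive positions in the same document.

```python
from collections import defaultdict

docs = {
    1: "Google Code Search",
    2: "Google Code Project Hosting",
    3: "Google Web Search",
}

def build_positional_index(docs):
    """Map each word to a list of (document, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.split(), start=1):
            index[word].append((doc_id, pos))
    return index

def phrase_search(index, phrase):
    """Return the documents in which the words of `phrase` occur consecutively."""
    words = phrase.split()
    if not words:
        return set()
    # Candidate start points: every occurrence of the first word.
    starts = set(index.get(words[0], []))
    for offset, word in enumerate(words[1:], start=1):
        postings = set(index.get(word, []))
        # Keep a start only if the next word sits exactly `offset` positions later.
        starts = {(d, p) for (d, p) in starts if (d, p + offset) in postings}
    return {d for d, _ in starts}

index = build_positional_index(docs)
print(phrase_search(index, "Google Code"))
print(phrase_search(index, "Google Search"))
```

The second query returns no documents even though both words occur in documents 1 and 3, because the words are never adjacent there.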
Indexed Regular Expression Search
• We can use an old information retrieval trick and build an index of n-grams, substrings of
length n.
o The document set:
(1) Google Code Search
(2) Google Code Project Hosting
(3) Google Web Search
o has this trigram index (_ stands for a space):
_Co: {1, 2}      Sea: {1, 3}      e_W: {3}         ogl: {1, 2, 3}
_Ho: {2}         Web: {3}         ear: {1, 3}      oje: {2}
_Pr: {2}         arc: {1, 3}      eb_: {3}         oog: {1, 2, 3}
_Se: {1, 3}      b_S: {3}         ect: {2}         ost: {2}
_We: {3}         ct_: {2}         gle: {1, 2, 3}   rch: {1, 3}
Cod: {1, 2}      de_: {1, 2}      ing: {2}         roj: {2}
Goo: {1, 2, 3}   e_C: {1, 2}      jec: {2}         sti: {2}
Hos: {2}         e_P: {2}         le_: {1, 2, 3}   t_H: {2}
Pro: {2}         e_S: {1}         ode: {1, 2}      tin: {2}
Cont..
o Given a regular expression such as /Google.*Search/, we can build a query of ANDs
and ORs that gives the trigrams that must be present in any text matching the regular
expression. In this case, the query is
Goo AND oog AND ogl AND gle AND Sea AND ear AND arc AND rch
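The following sketch builds the trigram index for the three example documents and evaluates the AND query above. The helper names `build_trigram_index` and `candidates` are assumptions for illustration. The index only narrows the search to candidate documents; the regular expression itself is still run on each candidate to confirm the match.

```python
import re
from collections import defaultdict

docs = {
    1: "Google Code Search",
    2: "Google Code Project Hosting",
    3: "Google Web Search",
}

def build_trigram_index(docs):
    """Map every 3-character substring to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for i in range(len(text) - 2):
            index[text[i:i + 3]].add(doc_id)
    return index

def candidates(index, trigrams, all_ids):
    """Evaluate an AND query: documents that contain every listed trigram."""
    result = set(all_ids)
    for t in trigrams:
        result &= index.get(t, set())
    return result

index = build_trigram_index(docs)

# Trigram query for /Google.*Search/, as given on the slide.
query = ["Goo", "oog", "ogl", "gle", "Sea", "ear", "arc", "rch"]
hits = candidates(index, query, docs)

# Confirm each candidate by running the actual regular expression on it.
matches = sorted(d for d in hits if re.search(r"Google.*Search", docs[d]))
print(matches)
```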
Cont..
o The rules follow from the meaning of the regular expressions:
‘’ (empty string)
emptyable(‘’) = true
exact(‘’) = {‘’}
prefix(‘’) = {‘’}
suffix(‘’) = {‘’}
match(‘’) = ANY (special query: match all documents)
c (single character)
emptyable(c) = false
exact(c) = {c}
prefix(c) = {c}
suffix(c) = {c}
match(c) = ANY
e? (zero or one)
emptyable(e?) = true
exact(e?) = exact(e) ∪ {‘’}
prefix(e?) = {‘’}
suffix(e?) = {‘’}
match(e?) = ANY
e* (zero or more)
emptyable(e*) = true
exact(e*) = unknown
prefix(e*) = {‘’}
suffix(e*) = {‘’}
match(e*) = ANY
Cont..
e+ (one or more)
emptyable(e+) = emptyable(e)
exact(e+) = unknown
prefix(e+) = prefix(e)
suffix(e+) = suffix(e)
match(e+) = match(e)
e1 | e2 (alternation)
emptyable(e1 | e2) = emptyable(e1) or emptyable(e2)
exact(e1 | e2) = exact(e1) ∪ exact(e2)
prefix(e1 | e2) = prefix(e1) ∪ prefix(e2)
suffix(e1 | e2) = suffix(e1) ∪ suffix(e2)
match(e1 | e2) = match(e1) OR match(e2)
e1 e2 (concatenation)
emptyable(e1e2) = emptyable(e1) and emptyable(e2)
exact(e1e2) = exact(e1) × exact(e2), if both are known
              or unknown, otherwise
prefix(e1e2) = exact(e1) × prefix(e2), if exact(e1) is known
               or prefix(e1) ∪ prefix(e2), if emptyable(e1)
               or prefix(e1), otherwise
suffix(e1e2) = suffix(e1) × exact(e2), if exact(e2) is known
               or suffix(e2) ∪ suffix(e1), if emptyable(e2)
               or suffix(e2), otherwise
match(e1e2) = match(e1) AND match(e2)
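A partial sketch of these rules in Python, covering only emptyable, exact and prefix (suffix mirrors prefix, and match is omitted). The AST classes are hypothetical and exist only for this illustration. Representing "unknown" as None, and treating exact(e?) and exact(e1 | e2) as unknown when an operand is unknown, are assumptions made here.

```python
from dataclasses import dataclass

# Hypothetical AST node types for a tiny regular expression language.
@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Quest:   # e?
    e: object

@dataclass(frozen=True)
class Star:    # e*
    e: object

@dataclass(frozen=True)
class Plus:    # e+
    e: object

@dataclass(frozen=True)
class Alt:     # e1 | e2
    e1: object
    e2: object

@dataclass(frozen=True)
class Concat:  # e1 e2
    e1: object
    e2: object

def emptyable(e):
    if isinstance(e, (Empty, Quest, Star)):
        return True
    if isinstance(e, Char):
        return False
    if isinstance(e, Plus):
        return emptyable(e.e)
    if isinstance(e, Alt):
        return emptyable(e.e1) or emptyable(e.e2)
    return emptyable(e.e1) and emptyable(e.e2)  # Concat

def exact(e):
    """Return the exact set of matching strings, or None when it is unknown."""
    if isinstance(e, Empty):
        return frozenset({""})
    if isinstance(e, Char):
        return frozenset({e.c})
    if isinstance(e, Quest):
        inner = exact(e.e)
        return None if inner is None else inner | {""}
    if isinstance(e, (Star, Plus)):
        return None
    a, b = exact(e.e1), exact(e.e2)
    if a is None or b is None:
        return None
    if isinstance(e, Alt):
        return a | b
    return frozenset(x + y for x in a for y in b)  # Concat: cross product

def prefix(e):
    if isinstance(e, (Empty, Quest, Star)):
        return frozenset({""})
    if isinstance(e, Char):
        return frozenset({e.c})
    if isinstance(e, Plus):
        return prefix(e.e)
    if isinstance(e, Alt):
        return prefix(e.e1) | prefix(e.e2)
    a = exact(e.e1)  # Concat
    if a is not None:
        return frozenset(x + y for x in a for y in prefix(e.e2))
    if emptyable(e.e1):
        return prefix(e.e1) | prefix(e.e2)
    return prefix(e.e1)

# ab?c  and  a+b  written as ASTs
r = Concat(Char("a"), Concat(Quest(Char("b")), Char("c")))
s = Concat(Plus(Char("a")), Char("b"))
print(sorted(exact(r)), exact(s), sorted(prefix(s)))
```

For ab?c the exact set is fully known, while for a+b the exact set is unknown and only the prefix information survives.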
Cont..
Single string
• trigrams(ab) = ANY
• trigrams(abc) = abc
• trigrams(abcd) = abc AND bcd
Set of strings
• trigrams({ab}) = trigrams(ab) = ANY
• trigrams({abcd}) = trigrams(abcd) = abc AND bcd
• trigrams({ab, abcd}) = trigrams(ab) OR trigrams(abcd) = ANY
At any time, set match(e) = match(e) AND trigrams(prefix(e)).
At any time, set match(e) = match(e) AND trigrams(suffix(e)).
At any time, set match(e) = match(e) AND trigrams(exact(e)).
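The trigrams function for a single string and for a set of strings might be sketched like this. The tuple-based query representation and the names `trigrams_of_string` and `trigrams_of_set` are assumptions chosen for illustration. A string shorter than three characters contributes no trigram, so its query is ANY, and an OR that includes ANY collapses to ANY.

```python
ANY = "ANY"  # special query that matches all documents

def trigrams_of_string(s):
    """AND of all trigrams of s, or ANY if s is shorter than three characters."""
    if len(s) < 3:
        return ANY
    return ("AND", tuple(s[i:i + 3] for i in range(len(s) - 2)))

def trigrams_of_set(strings):
    """OR of the per-string queries; ANY OR anything simplifies to ANY."""
    parts = [trigrams_of_string(s) for s in sorted(strings)]
    if ANY in parts:
        return ANY
    if len(parts) == 1:
        return parts[0]
    return ("OR", tuple(parts))

print(trigrams_of_string("abcd"))
print(trigrams_of_set({"ab", "abcd"}))
```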
Implementation
Cont..
Discontinuation
In October 2011, Google announced that Code Search was to be shut down along with the
Code Search API. The service remained online until March 2013, and it now returns a 404.
The Best Alternatives to Google Code for
Your Programming Projects
GitHub is the juggernaut in this arena, obviously, and the
web's most popular code repository.
Well known to nearly everyone who deals in the world of code, GitHub looks to help
developers build software through collaboration. As the “world’s largest open source
community,” GitHub allows users to share their projects “with the world, get feedback, and
contribute to millions of repositories.” What some developers may not know is that GitHub
also offers private repositories with upgraded plans.
GitHub
Key Features:
• Review changes, comment on lines of code, report issues, and plan with discussion tools
• Use organization accounts to communicate easily with teams
• Integration with several applications and tools
• Field-tested tools for any project, public or private
• Integrated issue tracking
• Use your go-to SVN tools to check out, branch, and commit to GitHub repositories
CodePlex
CodePlex is Microsoft’s free open source project hosting site. With CodePlex,
users can create, share, and collaborate on projects and download software.
Key Features:
Source code control
Project discussions
Wiki pages
Feature/issue tracking
Cost: FREE
Bitbucket
Bitbucket, from Atlassian, offers unlimited private code repositories for Git or
Mercurial. Offering lightweight code review, Bitbucket is one of the most
popular source code repository hosts out there.
Key Features:
Built with small teams in mind, so you can consolidate user management,
invite members, and share repositories
Review changes on a fork or branch easily with pull requests
In-line comments allow users to have discussions within the source code
Track every commit to an issue in JIRA
General information
Name | Manager | Established | Server side: all free software | Client side: all-free JS code | Developed and/or used CDE | Require free software on registration | Ad-free | Notes
Bitbucket | Atlassian | 2008 | No | No | Unknown | No | Yes | Denies service to Cuba, Iran, North Korea, Sudan, Syria
GitHub | GitHub, Inc | 2008-04 | No | No | Unknown | No | Yes | List of government takedown requests
CodePlex | Microsoft | 2006-05 | No | Unknown | Unknown | No | Yes | Project must be OSS licensed
Features
Feature | Bitbucket | GitHub | CodePlex
Code Review | Yes | Yes | No
Bug Tracking | Yes | Yes | Yes
Web Hosting | Yes | Yes | No
Wiki | Yes | Yes | Yes
Translation System | No | No | No
Shell server | No | No | No
Mailing List | No | No | Yes
Forum | No | No | Yes
Personal Branch | Yes | Yes | No
Private Branch | Yes | Yes | No
Announce | No | Yes | No
Build System | No | 3rd-party (e.g. Travis CI, Appveyor and others) | No
Team | Yes | Yes | No
Release Binaries | Yes | Yes | Yes
Self-hosting | Commercially (Stash) | Commercially (GitHub Enterprise) | No
Popularity
Name | Users | Projects | Alexa rank
Bitbucket | Unknown | Unknown | 834 as of 22 June 2016
CodePlex | Unknown | 107,712 | 2,689 as of 22 June 2016
GitHub | 15,000,000 | 38,000,000 | 53 as of 19 August 2016
Google Code | Unknown | 250,000+ | N/A (subdomain not tracked)
Available version control systems
Name CVS Git Mercurial SVN Bazaar TFS Arch Perforce Fossil
Bitbucket No Yes Yes No No No No No No
CodePlex No Yes Yes Yes No Yes No No No
GitHub No Yes No Partial No No No No No
References
[1] https://en.wikipedia.org/wiki/Google_Code_Search
[2] https://blog.profitbricks.com/top-source-code-repository-hosts/
[3] http://lifehacker.com/the-best-alternatives-to-google-code-for-your-programmi-1691688947
[4] http://tirania.org/blog/archive/2011/Nov-29.html
[5] https://en.wikipedia.org/wiki/Regular_expression
[6] https://swtch.com/~rsc/regexp/regexp4.html
[7] https://web.archive.org/web/20101112131244/http://www.google.com//codesearch
[8] http://swtch.com/~rsc/regexp/regexp1.html

Contenu connexe

Tendances

Search engines coh m
Search engines coh mSearch engines coh m
Search engines coh m
cpcmattc
 
Slides
SlidesSlides
Slides
butest
 
Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014
Fasihul Kabir
 
RFS Search Lang Spec
RFS Search Lang SpecRFS Search Lang Spec
RFS Search Lang Spec
Jing Kang
 

Tendances (17)

BD-ACA week5
BD-ACA week5BD-ACA week5
BD-ACA week5
 
Text analytics in Python and R with examples from Tobacco Control
Text analytics in Python and R with examples from Tobacco ControlText analytics in Python and R with examples from Tobacco Control
Text analytics in Python and R with examples from Tobacco Control
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search System
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Boolean operators
Boolean operatorsBoolean operators
Boolean operators
 
Python cheat-sheet
Python cheat-sheetPython cheat-sheet
Python cheat-sheet
 
Lemur Tutorial at SIGIR 2006
Lemur Tutorial at SIGIR 2006Lemur Tutorial at SIGIR 2006
Lemur Tutorial at SIGIR 2006
 
Search engines coh m
Search engines coh mSearch engines coh m
Search engines coh m
 
seo tutorial
seo tutorialseo tutorial
seo tutorial
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python
 
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Advanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesAdvanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation Pipelines
 
Slides
SlidesSlides
Slides
 
Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014
 
RFS Search Lang Spec
RFS Search Lang SpecRFS Search Lang Spec
RFS Search Lang Spec
 

Similaire à Google code search

SURE Research Report
SURE Research ReportSURE Research Report
SURE Research Report
Alex Sumner
 
Optimizing Application Architecture (.NET/Java topics)
Optimizing Application Architecture (.NET/Java topics)Optimizing Application Architecture (.NET/Java topics)
Optimizing Application Architecture (.NET/Java topics)
Ravi Okade
 
This project is the first projects you will be working on this quart.pdf
This project is the first projects you will be working on this quart.pdfThis project is the first projects you will be working on this quart.pdf
This project is the first projects you will be working on this quart.pdf
eyewaregallery
 

Similaire à Google code search (20)

Tools to Find Source Code on the Web
Tools to Find Source Code on the WebTools to Find Source Code on the Web
Tools to Find Source Code on the Web
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
 
Improving your team’s source code searching capabilities
Improving your team’s source code searching capabilitiesImproving your team’s source code searching capabilities
Improving your team’s source code searching capabilities
 
Improving your team's source code searching capabilities - Voxxed Thessalonik...
Improving your team's source code searching capabilities - Voxxed Thessalonik...Improving your team's source code searching capabilities - Voxxed Thessalonik...
Improving your team's source code searching capabilities - Voxxed Thessalonik...
 
SURE Research Report
SURE Research ReportSURE Research Report
SURE Research Report
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCBuild Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPC
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge Graph
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
.gradle 파일 정독해보기
.gradle 파일 정독해보기.gradle 파일 정독해보기
.gradle 파일 정독해보기
 
Google Searchology
Google SearchologyGoogle Searchology
Google Searchology
 
Machine Learning on Code - SF meetup
Machine Learning on Code - SF meetupMachine Learning on Code - SF meetup
Machine Learning on Code - SF meetup
 
Introduction to programming using c
Introduction to programming using cIntroduction to programming using c
Introduction to programming using c
 
Illustrated Code (ASE 2021)
Illustrated Code (ASE 2021)Illustrated Code (ASE 2021)
Illustrated Code (ASE 2021)
 
Optimizing Application Architecture (.NET/Java topics)
Optimizing Application Architecture (.NET/Java topics)Optimizing Application Architecture (.NET/Java topics)
Optimizing Application Architecture (.NET/Java topics)
 
Mufix Network Programming Lecture
Mufix Network Programming LectureMufix Network Programming Lecture
Mufix Network Programming Lecture
 
Python and MongoDB
Python and MongoDBPython and MongoDB
Python and MongoDB
 
This project is the first projects you will be working on this quart.pdf
This project is the first projects you will be working on this quart.pdfThis project is the first projects you will be working on this quart.pdf
This project is the first projects you will be working on this quart.pdf
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014
 

Dernier

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
 

Dernier (20)

How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 

Google code search

  • 2. Why code Search? Large amounts of source code is added consistently online. Documentation of the source code is generally not attached to it. Unorganized and distributed among different sources Many versions of software systems, needed for similarity analysis
  • 3. Cont.. Enormous source code (Github alone has approximately 10 million projects). Very large and complex. Most of the query systems available online; use either keyword or meta Information based search. Not easy to search and analyze.
  • 4. What did CodeSearch do for programmers? The CodeSearch service was a unique tool as it indexed open source code in the wild. Codesearch is one of the most valuable tools in existence for all software developers, specifically:  When an API is poorly documented, you could find sample bits of code that used the API.  When an API error codes was poorly documented, you could find sample bits of code that handled it.  When an API was difficult to use (and the world is packed with those), you could find sample bits of code that used it.  When you quickly wanted to learn a language, you knew you could find quality code with simple searches.  When you wanted to find different solutions to everyday problems dealing with protocols, new specifications, evolving standards and trends. You could turn to CodeSearch.
  • 5. Cont..  When you were faced with an obscure error message, an obscure token, an obscure return value or other forms of poor coding, you would find sample bits of code that solved this problem.  When dealing with proprietary protocols or just poorly documented protocols, you could find how they worked in minutes.  When you were trying to debug yet another broken standard or yet another poorly specified standard, you knew you could turn quickly to CodeSearch to find the answers to your problems (memories of OAuth and IMAP flash in my head).  When learning a new programming language or trying to improve your skills on a new programming language, you could use CodeSearch to learn the idioms and the best (and worst practices).  When building a new version of a library, either in a new language, making a fluent version, making an open source version, building a more complete version you would just go to Codesearch to find answers to how other people did things.
  • 6. Google Code Search Developer(s) Google Initial release October 5, 2006 Development status Discontinued Operating system Any (web-based application) Type Code search engine Website http://www.google.com/cod esearch(archived version from 2010) Google Code Search  Features included the ability to search using operators,namely lang:, package:, license: and file:.  The code available for searching was in various formats including tar.gz, .tar.bz2, .tar, and .zip , CVS , Subversion ,git and Mercurial repositories.
  • 7.
  • 8. How Google Code Search Worked Introduction  Code Search was Google's first and only search engine to accept regular expression queries, which was geekily great but a very small niche.When we started Code Search, a Google search for “regular expression search engine” turned up sites where you typed “phone number” and got back “(d{3}) d{3}-d{4}”.  Google open sourced the regular expression engine I wrote for Code Search, RE2, in March 2010. Code Search and RE2 have been a great vehicle for educating people about how to do regular expression search safely.
  • 9. Regular expression in theoretical computer science a sequence of characters that define a search pattern. Usually this pattern is then used by string searching algorithms for "find" or "find and replace" operations on strings. Basic concepts A regular expression, often called a pattern, is an expression used to specify a set of strings required for a particular purpose. For example, the set containing the three strings "Handel", "Händel", and "Haendel" can be specified by the pattern H(ä|ae?)ndel ; we say that this pattern matches each of the three strings.
  • 10. Indexed Word Search o The key data structure is called a posting list or inverted index, which lists, for every possible search term, the documents that contain that term. consider these three very short documents: 1) Google Code Search 2) Google Code Project Hosting 3) Google Web Search o The inverted index for these three documents looks like: Code: {1, 2} Google: {1, 2, 3} Hosting: {2} Project: {2} Search: {1, 3} Web: {3}
  • 11. Cont.. o To support phrases, full-text search implementations usually record each occurrence of a word in the posting list, along with its position:  An alternate way to support phrases is to treat them as AND queries to identify a set of candidate documents and then filter out non-matching documents after loading the document bodies from disk. In practice, phrases built out of common words like “to be or not to be” make this approach unattractive. Storing the position information in the index entries makes the index bigger but avoids loading a document from disk unless it is guaranteed to be a match. Code: {(1, 2), (2, 2)} Google: {(1, 1), (2, 1), (3, 1)} Hosting: {(2, 4)} Project: {(2, 3)} Search: {(1, 3), (3, 4)} Web: {(3, 2)}
  • 12. Indexed Regular Expression Search  we can use an old information retrieval trick and build an index of n-grams, substrings of length n o the document set: (1) Google Code Search (2) Google Code Project Hosting (3) Google Web Search o has this trigram index: _Co: {1, 2} Sea: {1, 3} e_W: {3} ogl: {1, 2, 3} _Ho: {2} Web: {3} ear: {1, 3} oje: {2} _Pr: {2} arc: {1, 3} eb_: {3} oog: {1, 2, 3} _Se: {1, 3} b_S: {3} ect: {2} ost: {2} _We: {3} ct_: {2} gle: {1, 2, 3} rch: {1, 3} Cod: {1, 2} de_: {1, 2} ing: {2} roj: {2} Goo: {1, 2, 3} e_C: {1, 2} jec: {2} sti: {2} Hos: {2} e_P: {2} le_: {1, 2, 3} t_H: {2} Pro: {2} e_S: {1} ode: {1, 1} tin: {2}
  • 13. Cont.. Trigram index _Co: {1, 2} Sea: {1, 3} e_W: {3} ogl: {1, 2, 3} _Ho: {2} Web: {3} ear: {1, 3} oje: {2} _Pr: {2} arc: {1, 3} eb_: {3} oog: {1, 2, 3} _Se: {1, 3} b_S: {3} ect: {2} ost: {2} _We: {3} ct_: {2} gle: {1, 2, 3} rch: {1, 3} Cod: {1, 2} de_: {1, 2} ing: {2} roj: {2} Goo: {1, 2, 3} e_C: {1, 2} jec: {2} sti: {2} Hos: {2} e_P: {2} le_: {1, 2, 3} t_H: {2} Pro: {2} e_S: {1} ode: {1, 1} tin: {2} oGiven a regular expression such as /Google.*Search/, we can build a query of ANDs and ORs that gives the trigrams that must be present in any text matching the regular expression. In this case, the query is Goo AND oog AND ogl AND gle AND Sea AND ear AND arc AND rch
  • 14. Cont.. o The rules follow from the meaning of the regular expressions: ‘’ (empty string) emptyable(‘’) = true exact(‘’) = {‘’} prefix(‘’) = {‘’} suffix(‘’) = {‘’} match(‘’) = ANY (special query: match all documents) c (single character) emptyable(c) = false exact(c) = {c} prefix(c) = {c} suffix(c) = {c} match(c) = ANY e? (zero or one) emptyable(e?) = true exact(e?) = exact(e) ∪ {‘’} prefix(e?) = {‘’} suffix(e?) = {‘’} match(e?) = ANY e* (zero or more) emptyable(e*) = true exact(e*) = unknown prefix(e*) = {‘’} suffix(e*) = {‘’} match(e*) = ANY
  • 15. Cont..
    e+ (one or more)
      emptyable(e+) = emptyable(e)
      exact(e+) = unknown
      prefix(e+) = prefix(e)
      suffix(e+) = suffix(e)
      match(e+) = match(e)
    e1 | e2 (alternation)
      emptyable(e1 | e2) = emptyable(e1) or emptyable(e2)
      exact(e1 | e2) = exact(e1) ∪ exact(e2)
      prefix(e1 | e2) = prefix(e1) ∪ prefix(e2)
      suffix(e1 | e2) = suffix(e1) ∪ suffix(e2)
      match(e1 | e2) = match(e1) OR match(e2)
    e1 e2 (concatenation)
      emptyable(e1e2) = emptyable(e1) and emptyable(e2)
      exact(e1e2) = exact(e1) × exact(e2), if both are known
                 or unknown, otherwise
      prefix(e1e2) = exact(e1) × prefix(e2), if exact(e1) is known
                 or prefix(e1) ∪ prefix(e2), if emptyable(e1)
                 or prefix(e1), otherwise
      suffix(e1e2) = suffix(e1) × exact(e2), if exact(e2) is known
                 or suffix(e2) ∪ suffix(e1), if emptyable(e2)
                 or suffix(e2), otherwise
      match(e1e2) = match(e1) AND match(e2)
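    The rules can be transcribed almost line by line into a recursive function. The Python sketch below is only an illustration under several assumptions: regular expressions are given as hand-built tuples (there is no parser), `None` stands for "unknown", a union involving an unknown exact set is treated as unknown, and the helpers `q_and` and `q_or` fold ANY away to keep queries short. All names (`Info`, `analyze`, `lit`, `cross`) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

ANY = "ANY"                  # special query: match all documents
EMPTY = frozenset({""})      # the set holding only the empty string


@dataclass
class Info:
    emptyable: bool
    exact: Optional[FrozenSet[str]]   # None stands for "unknown"
    prefix: FrozenSet[str]
    suffix: FrozenSet[str]
    match: object                     # ANY or a nested ("AND"/"OR", a, b) tuple


def cross(a, b):
    """Cross product of two string sets: every concatenation x + y."""
    return frozenset(x + y for x in a for y in b)


def q_and(a, b):
    # Assumption: ANY AND q is simplified to q.
    return b if a == ANY else a if b == ANY else ("AND", a, b)


def q_or(a, b):
    # Assumption: ANY OR q is simplified to ANY.
    return ANY if ANY in (a, b) else ("OR", a, b)


def analyze(node):
    """Compute emptyable, exact, prefix, suffix and match for a regex tuple."""
    kind = node[0]
    if kind == "empty":
        return Info(True, EMPTY, EMPTY, EMPTY, ANY)
    if kind == "char":
        s = frozenset({node[1]})
        return Info(False, s, s, s, ANY)
    if kind == "opt":                                   # e?
        e = analyze(node[1])
        exact = None if e.exact is None else e.exact | EMPTY
        return Info(True, exact, EMPTY, EMPTY, ANY)
    if kind == "star":                                  # e*
        return Info(True, None, EMPTY, EMPTY, ANY)
    if kind == "plus":                                  # e+
        e = analyze(node[1])
        return Info(e.emptyable, None, e.prefix, e.suffix, e.match)
    a, b = analyze(node[1]), analyze(node[2])
    if kind == "alt":                                   # e1 | e2
        exact = None if None in (a.exact, b.exact) else a.exact | b.exact
        return Info(a.emptyable or b.emptyable, exact,
                    a.prefix | b.prefix, a.suffix | b.suffix,
                    q_or(a.match, b.match))
    if kind == "cat":                                   # e1 e2
        both_known = a.exact is not None and b.exact is not None
        exact = cross(a.exact, b.exact) if both_known else None
        if a.exact is not None:
            prefix = cross(a.exact, b.prefix)
        elif a.emptyable:
            prefix = a.prefix | b.prefix
        else:
            prefix = a.prefix
        if b.exact is not None:
            suffix = cross(a.suffix, b.exact)
        elif b.emptyable:
            suffix = b.suffix | a.suffix
        else:
            suffix = b.suffix
        return Info(a.emptyable and b.emptyable, exact, prefix, suffix,
                    q_and(a.match, b.match))
    raise ValueError(f"unknown node kind: {kind}")


def lit(text):
    """Build the concatenation of single characters, e.g. lit("ab") for ab."""
    node = ("empty",)
    for c in text:
        node = ("cat", node, ("char", c))
    return node


# ab(c|d): the exact set is fully known.
r1 = analyze(("cat", lit("ab"), ("alt", ("char", "c"), ("char", "d"))))
# ab*c: the star makes the exact set unknown, but prefix and suffix survive.
r2 = analyze(("cat", ("char", "a"), ("cat", ("star", ("char", "b")), ("char", "c"))))
print(r1)
print(r2)
```

    On their own these rules leave every match query at ANY; trigrams only enter the query once the exact, prefix and suffix sets are folded into match in a separate step.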
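  • 16. Cont.. The trigrams function is ANY for a string shorter than three characters; otherwise it is the AND of all trigrams in the string. Applied to a set of strings, it is the OR of the function applied to each string.
    Single string
      • trigrams(ab) = ANY
      • trigrams(abc) = abc
      • trigrams(abcd) = abc AND bcd
    Set of strings
      • trigrams({ab}) = trigrams(ab) = ANY
      • trigrams({abcd}) = trigrams(abcd) = abc AND bcd
      • trigrams({ab, abcd}) = trigrams(ab) OR trigrams(abcd) = ANY
    At any time, set match(e) = match(e) AND trigrams(prefix(e)).
    At any time, set match(e) = match(e) AND trigrams(suffix(e)).
    At any time, set match(e) = match(e) AND trigrams(exact(e)).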
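    A small Python sketch of the trigrams function for a single string and for a set of strings. The names `trigrams_string` and `trigrams_set`, and the tuple encoding of AND/OR queries, are assumptions made for illustration only.

```python
ANY = "ANY"  # special query: match all documents


def trigrams_string(s):
    """ANY for strings shorter than 3 characters, else the AND of all trigrams."""
    if len(s) < 3:
        return ANY
    return ("AND", [s[i:i + 3] for i in range(len(s) - 2)])


def trigrams_set(strings):
    """OR of trigrams_string over every string in the set."""
    parts = [trigrams_string(s) for s in sorted(strings)]
    if not parts or ANY in parts:
        # ANY OR q is ANY; an empty set is treated as no constraint (an assumption).
        return ANY
    if len(parts) == 1:
        return parts[0]
    return ("OR", parts)


print(trigrams_string("ab"))
print(trigrams_string("abcd"))
print(trigrams_set({"ab", "abcd"}))
print(trigrams_set({"abcd", "wxyz"}))
```

    A single short string in a set collapses the whole OR to ANY, which is why short prefixes or suffixes contribute nothing to the query.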
  • 19. Discontinuation In October 2011, Google announced that Code Search was to be shut down along with the Code Search API. The service remained online until March 2013, and it now returns a 404.
  • 20. The Best Alternatives to Google Code for Your Programming Projects GitHub is the juggernaut in this arena, obviously, and the web's most popular code repository. Well known to nearly everyone who deals in the world of code, GitHub looks to help developers build software through collaboration. As the “world’s largest open source community,” GitHub allows users to share their projects “with the world, get feedback, and contribute to millions of repositories.” What some developers may not know is that GitHub also offers private repositories with upgraded plans.
  • 21. GitHub Key Features:
      • Review changes, comment on lines of code, report issues, and plan with discussion tools
      • Use organization accounts to communicate easily with teams
      • Integration with several applications and tools
      • Field-tested tools for any project, public or private
      • Integrated issue tracking
      • Use your go-to SVN tools to checkout, branch, and commit to GitHub repositories
  • 24. CodePlex CodePlex is Microsoft’s free open source project hosting site. With CodePlex, users can create, share, collaborate and download from the project to the software phase. Key Features: Source code control Project discussions Wiki pages Feature/issue tracking Cost: FREE
  • 26. BitBucket Bitbucket, from Atlassian, offers unlimited private code repositories for Git or Mercurial. Offering lightweight code review, Bitbucket is one of the most popular source code repository hosts out there. Key Features: Built with small teams in mind, so you can consolidate sure management, invite members, and share repositories Review changes on a fork or branch easily with pull requests In-line comments allow users to have discussions within the source code Track every commit to an issue in JIRA
  • 28. General information
      Name | Manager | Established | Server side: all free software | Client side: all-free JS code | Developed and/or used CDE | Require free software on registration | Ad-free | Notes
      Bitbucket | Atlassian | 2008 | No | No | Unknown | No | Yes | Denies service to Cuba, Iran, North Korea, Sudan, Syria
      GitHub | GitHub, Inc | 2008-04 | No | No | Unknown | No | Yes | List of government takedown requests
      CodePlex | Microsoft | 2006-05 | No | Unknown | Unknown | No | Yes | Project must be OSS licensed
  • 29. Features
      Name | Code Review | Bug Tracking | Web Hosting | Wiki | Translation System | Shell Server | Mailing List | Forum | Personal Branch | Private Branch | Announce | Build System | Team | Release Binaries | Self-hosting
      Bitbucket | Yes | Yes | Yes | Yes | No | No | No | No | Yes | Yes | No | No | Yes | Yes | Commercially (Stash)
      GitHub | Yes | Yes | Yes | Yes | No | No | No | No | Yes | Yes | Yes | 3rd-party (e.g. Travis CI, Appveyor and others) | Yes | Yes | Commercially (GitHub Enterprise)
      CodePlex | No | Yes | No | Yes | No | No | Yes | Yes | No | No | No | No | No | Yes | No
  • 30. Popularity
      Name | Users | Projects | Alexa rank
      Bitbucket | Unknown | Unknown | 834 as of 22 June 2016
      CodePlex | Unknown | 107,712 | 2,689 as of 22 June 2016
      GitHub | 15,000,000 | 38,000,000 | 53 as of 19 August 2016
      Google Code | Unknown | 250,000+ | N/A (subdomain not tracked)
  • 31. Available version control systems
      Name | CVS | Git | Mercurial | SVN | Bazaar | TFS | Arch | Perforce | Fossil
      Bitbucket | No | Yes | Yes | No | No | No | No | No | No
      CodePlex | No | Yes | Yes | Yes | No | Yes | No | No | No
      GitHub | No | Yes | No | Partial | No | No | No | No | No