Matt Ellis

@citizenmatt
How to parse a file (DDD North 2017)
DON’T
@citizenmatt
Why would we write a parser?
• Speed, efficiency
• Reduce dependencies
• Custom or simple formats
• Things that aren’t files - DSLs

Command line options, HTTP headers, stdout, natural language commands

E.g. YouTrack queries
• When we’re just as interested in the structure of a file as its contents
Matt Ellis
Developer advocate

JetBrains

@citizenmatt
PSI
Features
Project Model
Base Platform
JetBrains IDE
architecture (kinda)
@citizenmatt
Unity and ShaderLab
@citizenmatt
What are we trying to build?
@citizenmatt
How to parse a file for an IDE
@citizenmatt
Hand rolled parser
var c = ReadChar();
switch (c) {
  case 's':
    c = ReadChar();
    switch (c) {
      case 'h':
        // Parse rest of "Shader", then sub-elements, …
        // Create syntax tree node(s) …
        break;
      default:
        SyntaxError();
        break;
    }
    break;
  case 'p':
    // Parse rest of "Properties", then sub-elements, …
    // Create syntax tree node(s) …
    break;
}
@citizenmatt
Compiler pipeline
Front end: Lexical analysis → Syntactic analysis → Semantic analysis
Back end: Code optimisation → Code generation
@citizenmatt
IDE pipeline
Lexical analysis → Syntactic analysis → Semantic analysis
@citizenmatt
IDE pipeline
Lexer → Parser → Program structure
@citizenmatt
Lexers
@citizenmatt
What is a lexer (aka scanner)?
• Performs lexical analysis

Lexical - relating to the words or vocabulary of a language
• Converts a string into a stream of tokens

Identifier, comment, string literal, braces, parentheses, whitespace, etc.
• Tokens are lightweight - typically integer values

(ReSharper uses singleton object instances)
• Parser pattern matches over tokens

Integer or object reference comparisons
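
To make the idea concrete, here is a minimal sketch of a lexer in Python, assuming a handful of ShaderLab-like token types. The rule list and the `lex` function are illustrative assumptions, not ReSharper's implementation, and token types are plain strings here for readability rather than integers or singleton objects.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    offset: int
    type: str
    text: str


# Order matters: keywords are listed before the general identifier rule.
# The rule set is a small, hypothetical subset chosen for illustration.
TOKEN_RULES = [
    ("END_OF_LINE_COMMENT", r"//[^\r\n]*"),
    ("NEW_LINE", r"\r\n|\r|\n"),
    ("WHITESPACE", r"[ \t]+"),
    ("STRING_LITERAL", r'"[^"\r\n]*"'),
    ("SHADER_KEYWORD", r"Shader\b"),
    ("PROPERTIES_KEYWORD", r"Properties\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("BAD_CHARACTER", r"."),
]

# All rules are combined into one alternation, tried at every position.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def lex(text: str) -> Iterator[Token]:
    """Convert a string into a stream of (offset, type, text) tokens."""
    for m in MASTER.finditer(text):
        yield Token(m.start(), m.lastgroup, m.group())
```

Calling `list(lex(source))` gives every token with its start offset, including whitespace, new lines and comments.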
@citizenmatt
Lexer output
// Colored vertex lighting
Shader "MyShader"
{
  // a single color property
  Properties {
    _Color ("Main Color", Color) = (1, .5,.5,1)
  }
  // define one subshader
  SubShader
  {
    // a single pass in our subshader
    Pass
    {
      Material
      {
        Diffuse [_Color]
      }
      Lighting On
    }
  }
}
0000: END_OF_LINE_COMMENT '// Colored vertex lighting'
0026: NEW_LINE '\r\n'
0028: SHADER_KEYWORD 'Shader'
0034: WHITESPACE ' '
0035: STRING_LITERAL '"MyShader"'
0045: NEW_LINE '\r\n'
0047: LBRACE '{'
0048: NEW_LINE '\r\n'
0050: WHITESPACE '  '
0052: END_OF_LINE_COMMENT '// a single color property'
0078: NEW_LINE '\r\n'
0080: WHITESPACE '  '
0082: PROPERTIES_KEYWORD 'Properties'
0092: WHITESPACE ' '
0093: LBRACE '{'
0094: NEW_LINE '\r\n'
0096: WHITESPACE '    '
0100: IDENTIFIER '_Color'
0106: WHITESPACE ' '
0107: LPAREN '('
0108: STRING_LITERAL '"Main Color"'
0120: COMMA ','
0121: WHITESPACE ' '
0122: COLOR_KEYWORD 'Color'
0127: RPAREN ')'
0128: WHITESPACE ' '
0129: EQUALS '='
0130: WHITESPACE ' '
0131: LPAREN '('
…
@citizenmatt
Lexers are a solved problem
Use a lexer generator

lex (1975), flex, CsLex, FsLex, JFLex, etc.
@citizenmatt
Anatomy of a lexer input file
User code (e.g. using directives)
%%	
directives

set up namespaces, class names, interfaces

declare regex macros

declare states
%%	
rules and actions

<state> rule { action }
@citizenmatt
ShaderLab lexer
Demo
@citizenmatt
How does it work?
• Lexer generates source code
• Rules (regexes) converted into single Finite State Machine

All regexes combined, matched at same time
• Encoded in state transition tables
• Lookup based on state and input char
• Very fast
• Not very maintainable

Seriously
@citizenmatt
a(b|c)d*e+
Pete Jinks - http://www.cs.man.ac.uk/~pjj/cs211/ho/node6.html
@citizenmatt
Rule: a(b|c)d*e+

State | ‘a’  | ‘b’  | ‘c’  | ‘d’  | ‘e’  | other
0     | m(1) | E    | E    | E    | E    | E
1     | E    | m(2) | m(2) | E    | E    | E
2     | E    | E    | E    | m(2) | m(3) | E
3     | a    | a    | a    | a    | m(3) | a

m(x) - match, move to state x
a - accept
E - error
Pete Jinks - http://www.cs.man.ac.uk/~pjj/cs211/ho/node6.html
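
The table can be executed directly. The following is a small Python sketch of a table-driven matcher for the single rule a(b|c)d*e+, assuming the same states and actions as the table above. The names `TABLE`, `COLUMNS` and `match_prefix` are made up for this illustration; a generated lexer would encode the same idea in much larger integer arrays.

```python
ERROR, ACCEPT = "E", "a"

# Column index for each input character; anything else uses the OTHER column.
COLUMNS = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
OTHER = 5

# TABLE[state][column] is the next state (an int), ERROR, or ACCEPT.
TABLE = [
    [1, ERROR, ERROR, ERROR, ERROR, ERROR],       # state 0: expect 'a'
    [ERROR, 2, 2, ERROR, ERROR, ERROR],           # state 1: expect 'b' or 'c'
    [ERROR, ERROR, ERROR, 2, 3, ERROR],           # state 2: 'd'* then 'e'
    [ACCEPT, ACCEPT, ACCEPT, ACCEPT, 3, ACCEPT],  # state 3: more 'e', else accept
]


def match_prefix(text, start=0):
    """Return the end offset of the match beginning at `start`, or None."""
    state, pos = 0, start
    while True:
        # End of input is treated like any other non-matching character.
        ch = text[pos] if pos < len(text) else None
        action = TABLE[state][COLUMNS.get(ch, OTHER)]
        if action == ACCEPT:
            return pos
        if action == ERROR:
            return None
        state, pos = action, pos + 1
```

Each step is one lookup keyed on the current state and the input character, which is why this style is fast and also why the generated tables are hard to read.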
@citizenmatt
It gets better
Rules: a(b|c)d*e+ and [0-9]+

State | ‘a’  | ‘b’  | ‘c’  | ‘d’  | ‘e’  | [0-9] | other
0     | m(1) | E    | E    | E    | E    | m(4)  | E
1     | E    | m(2) | m(2) | E    | E    | E     | E
2     | E    | E    | E    | m(2) | m(3) | E     | E
3     | a    | a    | a    | a    | m(3) | a     | a
4     | a    | a    | a    | a    | a    | m(4)  | a
@citizenmatt
Parsing
@citizenmatt
What is a parser?
• Performs syntactic analysis

Verifies and matches syntax of a file
• Pattern matching on stream of tokens from lexer

Can look at token offsets and text, too
• Syntax is described by a grammar
• Grammar is represented as a recursive hierarchy of rules

Top level is the whole file, composing down to structures and tokens
@citizenmatt
Example grammar
shaderFile:
  SHADER_KEYWORD
  STRING_LITERAL
  LBRACE
  propertiesBlock?
  tagsBlock?
  …
  RBRACE
;
propertiesBlock:
  PROPERTIES_KEYWORD
  LBRACE
  property*
  RBRACE
;
tagsBlock:
  TAGS_KEYWORD
  LBRACE
  tag*
  RBRACE
;

Shader "MyShader"
{
  Properties { … }
  Tags { … }
  …
}
@citizenmatt
Parsing is NOT a solved problem
Well, it is, kinda. There are just lots of solutions
@citizenmatt
Types of parsers
• Top down/recursive descent

Match the root of the tree, recursively split up into child elements
• Bottom up/recursive ascent

Start with matching the leaves of the tree, combine into larger constructs as you go
@citizenmatt
Top down parser
parseShaderLabFile()
  parseShaderCommand()
    match(SHADER_KEYWORD)
    parseShaderValue()
      parseShaderValueName()
        match(STRING_LITERAL)
      match(LBRACE)
      if (tokenType == PROPERTIES_KEYWORD)
        parsePropertiesCommand()
      …
      match(RBRACE)
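
The call trace above maps naturally onto one method per grammar rule. Below is a minimal recursive descent sketch in Python for the example grammar, assuming the tokens have already had whitespace filtered out and are given as (type, text) pairs. The `Parser` class, `ParseError` and the tuple-based tree are illustrative assumptions; `property*` and `tag*` are left empty to keep it short.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)  # (token_type, text) pairs
        self.pos = 0

    @property
    def token_type(self):
        """Type of the current token, or "EOF" at the end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def match(self, expected):
        """Consume the current token if it has the expected type."""
        if self.token_type != expected:
            raise ParseError(f"expected {expected}, got {self.token_type}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return (expected, text)

    def parse_shader_file(self):
        # shaderFile: SHADER_KEYWORD STRING_LITERAL LBRACE
        #             propertiesBlock? tagsBlock? RBRACE
        children = [
            self.match("SHADER_KEYWORD"),
            self.match("STRING_LITERAL"),
            self.match("LBRACE"),
        ]
        if self.token_type == "PROPERTIES_KEYWORD":
            children.append(self.parse_block("propertiesBlock", "PROPERTIES_KEYWORD"))
        if self.token_type == "TAGS_KEYWORD":
            children.append(self.parse_block("tagsBlock", "TAGS_KEYWORD"))
        children.append(self.match("RBRACE"))
        return ("shaderFile", children)

    def parse_block(self, name, keyword):
        # Block contents are omitted in this sketch: KEYWORD LBRACE RBRACE
        return (name, [self.match(keyword), self.match("LBRACE"), self.match("RBRACE")])
```

The optional rules become simple `if` checks on the current token type, which is what makes this style easy to write and debug by hand.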
@citizenmatt
Bottom up parser
Shift/Reduce algorithm
Match token

Shift token onto stack (e.g. INTEGER, OP_PLUS, INTEGER)

Reduce larger construct (e.g. INTEGER + INTEGER becomes EXPRESSION)
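
As a rough illustration of shift/reduce, here is a toy Python sketch for the addition example. It assumes a greedy strategy (reduce whenever the top of the stack matches a rule) and two made-up rules; real generated parsers decide between shifting and reducing using precomputed tables and lookahead.

```python
# Each rule: (symbols expected on top of the stack, symbol they reduce to).
# These two rules are assumptions chosen to fit the INTEGER + INTEGER example.
RULES = [
    (("INTEGER", "OP_PLUS", "INTEGER"), "EXPRESSION"),
    (("EXPRESSION", "OP_PLUS", "INTEGER"), "EXPRESSION"),
]


def shift_reduce(tokens):
    """tokens: (symbol, text) pairs. Returns the final stack."""
    stack = []  # entries are (symbol, text) or (symbol, list of children)
    for token in tokens:
        stack.append(token)  # shift
        reduced = True
        while reduced:  # keep reducing while any rule matches
            reduced = False
            for pattern, result in RULES:
                n = len(pattern)
                if tuple(symbol for symbol, _ in stack[-n:]) == pattern:
                    children = stack[-n:]
                    del stack[-n:]
                    stack.append((result, children))  # reduce
                    reduced = True
                    break
    return stack
```

For the input `1 + 2 + 3` the stack ends with a single EXPRESSION whose first child is the EXPRESSION built from `1 + 2`, so the tree is assembled from the leaves upwards.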
@citizenmatt
Building a parser
• Hand rolled

Mechanical process to build. Easy to understand

Usually top down/recursive descent

Can use grammar to build syntax tree classes
• Parser generators

yacc/bison, ANTLR, etc.

Usually bottom up. Can be hard to debug - table driven
• ReSharper mostly uses top-down procedural parsers

Generated and hand rolled

Mainly historical. Easier to maintain, easier error recovery, etc.
@citizenmatt
Parser combinators
• Build a parser by combining other, simpler parsers
• Monads!

Think LINQ - similar idea, similar ease of use, similar cost
@citizenmatt
FParsec for F#
// pstring - parse a string
// pfloat - parse a float
// spaces1 - parse one or more whitespace chars

let pforward = (pstring "fd" <|> pstring "forward") >>. spaces1 >>. pfloat
               |>> fun n -> Forward(int n)
let pleft = (pstring "left" <|> pstring "lt") >>. spaces1 >>. pfloat
            |>> fun x -> Left(int -x)
let pright = (pstring "right" <|> pstring "rt") >>. spaces1 >>. pfloat
             |>> fun x -> Right(int x)
let pcommand = pforward <|> pleft <|> pright
Phil Trelford - http://trelford.com/blog/post/FParsec.aspx
@citizenmatt
Sprache for C#
Parser<string> identifier =
    from leading in Parse.WhiteSpace.Many()
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from trailing in Parse.WhiteSpace.Many()
    select new string(first.Concat(rest).ToArray());
var id = identifier.Parse(" abc123  ");
Assert.AreEqual("abc123", id);
@citizenmatt
Problem #1
Whitespace and comments
@citizenmatt
We’d expect this to work:
shaderBlock:

		SHADER_KEYWORD

		STRING_LITERAL

		LBRACE

		…

		RBRACE

;
Shader	"MyShader"

{

		…

}
@citizenmatt
But this is the actual input…
Shader\r\n
··········"MyShader"\r\n
·······\n
/* Cool shader! */\n
{···…········}\r\n
@citizenmatt
Which lexes as…
SHADER_KEYWORD

NEW_LINE

WHITESPACE

STRING_LITERAL

NEW_LINE

WHITESPACE

NEW_LINE

COMMENT

NEW_LINE

LBRACE

WHITESPACE

…

WHITESPACE

RBRACE
Shader\r\n
··········"MyShader"\r\n
·······\n
/* Cool shader! */\n
{···…········}\r\n
@citizenmatt
Which doesn’t match the grammar
shaderBlock:

		SHADER_KEYWORD

		STRING_LITERAL

		LBRACE

		…

		RBRACE

;
SHADER_KEYWORD

NEW_LINE

WHITESPACE

STRING_LITERAL

NEW_LINE

WHITESPACE

NEW_LINE

COMMENT

NEW_LINE

LBRACE

WHITESPACE

…

WHITESPACE

RBRACE
@citizenmatt
Filtering lexers
• Filter whitespace and comments from the stream of tokens

ReSharper’s tokens have IsFiltered property
• Decorator pattern

Wrap original lexer, swallow filtered tokens
Lexer → Filtering lexer → Parser → Program structure
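
A filtering lexer can be very small. The sketch below is a Python generator that wraps any token stream and swallows filtered tokens. The `Token` shape and the `FILTERED_TYPES` set are assumptions for illustration; ReSharper instead asks each token type for its IsFiltered property.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    offset: int
    type: str
    text: str


# Hypothetical set of token types the parser should never see.
FILTERED_TYPES = {"WHITESPACE", "NEW_LINE", "COMMENT", "END_OF_LINE_COMMENT"}


def filtering_lexer(tokens: Iterable[Token]) -> Iterator[Token]:
    """Decorate another token stream, swallowing whitespace and comments."""
    for token in tokens:
        if token.type not in FILTERED_TYPES:
            yield token
```

Because the wrapper only needs something iterable, decorators like this can be stacked on top of each other without the parser knowing.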
@citizenmatt
What are we building?
Is it safe to lose the whitespace?
@citizenmatt
IDE requirements, Part 1
• Code editor features

Syntax highlighting, code folding, etc.
• Syntax error highlighting
• Inspections
• Refactoring
• Formatting
• Etc.
@citizenmatt
IDE requirements, Part 1
• Need to work with the contents and structure of a file
• Contents give us semantic information
• Structure allows us to report inspections, refactor, etc.

Map the semantics back to the file
• Need to represent the structure of the file
• Syntax tree is obvious choice

Inspections walk the tree, refactorings rewrite the tree
@citizenmatt
Abstract Syntax Trees
[Diagram: abstract syntax trees for simple addition expressions]
@citizenmatt
Concrete Parse Trees
[Diagram: concrete parse tree for an addition expression, including whitespace (WS), new line (NL) and comment (// …) nodes]
@citizenmatt
Side problem #1
No guidance for designing parse trees!
@citizenmatt
Back to Filtering Lexers
• If we filter tokens out, we have to add them back again
• We need a Missing Tokens Inserter to add whitespace and comments back into parse tree
[Diagram: pipeline of Lexer, Filtering lexer, Parser and Missing tokens inserter, producing a concrete parse tree]
@citizenmatt
Missing Tokens Inserter
• Walk leaf elements of tree

Tokens
• Advance (cached) lexer for each leaf element
• Check current lexer token has same offset as leaf element
• If not, create leaf element and insert into tree
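
The steps above can be sketched in Python as follows. This assumes a deliberately simple tree shape (nested lists whose leaves are tokens) and a cached, unfiltered token list; `insert_missing_tokens` is a hypothetical name, and it also assumes every leaf in the tree appears in the cache.

```python
from typing import NamedTuple


class Token(NamedTuple):
    offset: int
    type: str
    text: str


def insert_missing_tokens(tree, cached_tokens):
    """Re-insert filtered tokens into `tree` (nested lists of Tokens), in place."""
    lexer = iter(cached_tokens)  # the cached, unfiltered token stream

    def walk(children):
        i = 0
        while i < len(children):
            child = children[i]
            if isinstance(child, list):
                walk(child)  # composite node: descend to its leaves
            else:
                current = next(lexer)  # advance the lexer for each leaf
                while current.offset != child.offset:
                    # The lexer is behind the leaf: this token was filtered
                    # out, so create a leaf for it and insert it here.
                    children.insert(i, current)
                    i += 1
                    current = next(lexer)
            i += 1

    walk(tree)
    tree.extend(lexer)  # trailing whitespace/comments after the last leaf
    return tree
```

After the pass, concatenating the text of all leaves reproduces the original file, which is what makes the tree safe to use for formatting and refactoring.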
@citizenmatt
Problem #2
What about significant whitespace?
@citizenmatt
How do we parse this?
There are no end of scope markers!

And we’ve filtered out the whitespace!
let ArraySample() =
  let numLetters = 26
  let results = Array.create numLetters 0
  let data = "The quick brown fox"
  for i = 0 to data.Length - 1 do
    let c = data.Chars(i)
    let c = Char.ToUpper(c)
    if c >= 'A' && c <= 'Z' then
      let i = Char.code c - Char.code 'A'
      results.[i] <- results.[i] + 1
  printf "done!\n"
@citizenmatt
Insert zero-width tokens
• Another lexer decorator
• Keeps track of whitespace before it’s filtered
• Inserts “invisible” tokens into token stream

indicating indent/outdent or block start/end

Possibly also token to indicate invalid indentation
• Token is zero-width. Doesn’t affect parse tree
• Parser can match these invisible tokens in grammar
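
One possible shape for such a decorator is sketched below in Python. It assumes the wrapped lexer emits NEW_LINE and WHITESPACE tokens, measures indentation as the width of leading whitespace, and uses made-up INDENT and DEDENT token names. It also filters the whitespace itself and does not report invalid indentation.

```python
from typing import NamedTuple


class Token(NamedTuple):
    offset: int
    type: str
    text: str


def indentation_lexer(tokens):
    """Filter whitespace, emitting zero-width INDENT/DEDENT tokens instead."""
    indents = [0]  # stack of enclosing indentation widths
    at_line_start, width, end = True, 0, 0
    for tok in tokens:
        end = tok.offset + len(tok.text)
        if tok.type == "NEW_LINE":
            at_line_start, width = True, 0
            continue
        if tok.type == "WHITESPACE":
            if at_line_start:
                width = len(tok.text)  # remember it before it is filtered
            continue
        if at_line_start:
            at_line_start = False
            if width > indents[-1]:
                indents.append(width)
                yield Token(tok.offset, "INDENT", "")  # zero-width token
            while width < indents[-1]:
                indents.pop()
                yield Token(tok.offset, "DEDENT", "")
        yield tok
    while len(indents) > 1:  # close any blocks still open at end of file
        indents.pop()
        yield Token(end, "DEDENT", "")
```

The grammar can then treat INDENT and DEDENT exactly like opening and closing braces, even though they occupy no text in the file.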
@citizenmatt
Lexer flexibility
It’s just nice to say
@citizenmatt
Altering tokens
• F# example: 2. and [2..0] ambiguous
• Original lexer matches 2. as FLOAT 

and 2.. as INT_DOT_DOT
• Another lexer decorator

Augment generated rules with custom code
• Decorator recognises INT_DOT_DOT 

Splits into two tokens for parser
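
A decorator that splits the combined token might look like the Python sketch below. The names INT and DOT_DOT for the two resulting tokens are assumptions; the slide only names INT_DOT_DOT.

```python
from typing import NamedTuple


class Token(NamedTuple):
    offset: int
    type: str
    text: str


def split_int_dot_dot(tokens):
    """Replace each INT_DOT_DOT token (e.g. '2..') with INT and DOT_DOT."""
    for tok in tokens:
        if tok.type == "INT_DOT_DOT":
            digits = tok.text[:-2]  # everything before the trailing '..'
            yield Token(tok.offset, "INT", digits)
            yield Token(tok.offset + len(digits), "DOT_DOT", "..")
        else:
            yield tok
```

The generated lexer stays simple, and the parser only ever sees the two tokens its grammar expects.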
@citizenmatt
When regexes aren’t enough
• ShaderLab nested comments
• Not possible to match with regex

Don’t even try
• Rule to match start of comment - /*

Finish lexing by hand, counting start and end comment chars

Ignore START_COMMENT and return different token - COMMENT
• It doesn’t have to be completely machine generated
/*	This	/*	is	*/	valid	*/
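
Finishing the comment by hand amounts to keeping a depth counter. A minimal Python sketch, assuming the generated rule has already matched the opening `/*` at offset `start` (the function name is hypothetical):

```python
def lex_nested_comment(text, start):
    """Return the end offset of the nested comment that begins at `start`.

    `text[start:]` must begin with '/*'. If the comment is never closed,
    the end of the text is returned.
    """
    assert text.startswith("/*", start)
    depth, pos = 1, start + 2
    while pos < len(text) and depth > 0:
        if text.startswith("/*", pos):
            depth, pos = depth + 1, pos + 2  # nested comment opens
        elif text.startswith("*/", pos):
            depth, pos = depth - 1, pos + 2  # one level closes
        else:
            pos += 1
    return pos
```

The caller would then return a single COMMENT token spanning `start` to the returned offset.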
@citizenmatt
Problem #3
Pre-processor tokens
@citizenmatt
Pre-processor tokens
• Pre-processor tokens can appear anywhere
• How do you add them to the grammar/parser?
• ShaderLab has CGPROGRAM and CGINCLUDE which are essentially pre-processor tokens
• (Also nested language - Cg)
@citizenmatt
Parsing pre-processor tokens
• Two pass parsing
• First pass parses pre-processor tokens
• Filtering lexer strips pre-processor tokens
• Parse normally
• Parsed pre-processor tree nodes inserted as missing tokens
[Diagram: pipeline combining Lexer (including pre-processor tokens), Pre-processor parser, Filtering lexers, Parser and Missing tokens inserter, producing the concrete parse tree]
@citizenmatt
Problem #4
IDEs impose constraints
@citizenmatt
IDE Requirements, Part 2
• Error highlighting

The code is broken every time you type
• Incremental lexing + parsing

Performance
• Version tolerance

E.g. multiple versions of C#
• Nested/composable languages
@citizenmatt
Problem #5
Error handling
@citizenmatt
Error handling
@citizenmatt
Error handling is more of an art than a science
@citizenmatt
What happens when there’s an error?
• The parser adds an error element into the tree
• Error element spans whatever has been parsed so far

Might just be unexpected token, or incorrect element construct
• Highlighting the error in the editor is trivial

Inspection simply looks for error element, adds highlight
@citizenmatt
How do we find an error?
• Error start is obvious

mismatched rule, unexpected token
• Where does the error stop?

Off by one token could affect rest of file
• IDE must try to recover

How?
@citizenmatt
Error recovery
• Panic mode

Eat tokens until the parser finds a “follows” token
• Token insertion/removal/substitution
• Error rules in grammar
@citizenmatt
Panic mode
Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color) = (1,1,1,1)
    _PropName SyntaxErrorPanicMode = (1,1,1,1)
    _Recovered("Real2", Color) = (1, 1, 1, 1)
  }
}

Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color) = (1,1,1,1)
    _PropName SyntaxError _AttemptedRecovery = (1,1,1,1)
    _Recovered("Real2", Color) = (1, 1, 1, 1)
  }
}
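
Panic mode can be sketched in a few lines of Python. This assumes a heavily simplified property rule (IDENTIFIER LPAREN STRING_LITERAL RPAREN) and a follow set of IDENTIFIER, RBRACE and EOF; both are assumptions for illustration, not the real ShaderLab grammar. The function works on token types only and returns (kind, start, end) ranges.

```python
# Simplified, hypothetical property rule and follow set.
PROPERTY = ["IDENTIFIER", "LPAREN", "STRING_LITERAL", "RPAREN"]
FOLLOW = {"IDENTIFIER", "RBRACE", "EOF"}


def parse_properties(types):
    """Parse a list of token types into property and error nodes."""
    nodes, pos = [], 0

    def peek():
        return types[pos] if pos < len(types) else "EOF"

    while peek() not in ("RBRACE", "EOF"):
        start, ok = pos, True
        for expected in PROPERTY:
            if peek() != expected:
                ok = False
                break
            pos += 1
        if ok:
            nodes.append(("property", start, pos))
            continue
        if pos == start:
            pos += 1  # always consume something, to guarantee progress
        while peek() not in FOLLOW:
            pos += 1  # panic: eat tokens until a follow token turns up
        nodes.append(("error", start, pos))  # error element spans what was eaten
    return nodes
```

With an input like the second example above, the first recovery attempt lands on an identifier that also fails to parse, so two error elements are produced before the parser is back in sync on the next property.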
@citizenmatt
Token insertion
• Expected RPAREN got EQUALS

Assume RPAREN missing (insert it), EQUALS matches, continue
Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color = (1,1,1,1)
  }
}
@citizenmatt
Token removal
• Token insertion fails

Inserting EQUALS doesn’t sync back up
• Expected EQUALS got extra RPAREN

Skip RPAREN (remove it), EQUALS matches, continue
Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color)) = (1,1,1,1)
  }
}
@citizenmatt
Error production rules
• Create a rule that anticipates an error
• E.g. consume any tokens that shouldn’t be there
emptyBlock:

		LBRACE

		errorElementWithoutRBrace*	
		RBRACE

	;
@citizenmatt
Problem #6
Incremental lexing and parsing
@citizenmatt
What’s the problem?
• Don’t parse entire file on every change
• Only reparse smallest subtree that encloses change

Block nodes (method bodies, classes, etc. Not if, for, etc.)
• Avoid re-lexing the entire file, too
@citizenmatt
Incremental lexing
• Requires a cache of the original token stream

Token type, offsets and state of lexer (int)
• Copy cached tokens up to change position
• Restart lexer at change position with known state from cache
• Lex until we can match tail of cached tokens
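
The steps above can be sketched in Python as follows. This is a simplified illustration that assumes a stateless, regex-based lexer (so no lexer state needs to be cached) with three made-up token types; `relex` and its parameters are hypothetical names. The edit is described as replacing `old_len` characters at `change_pos` with `new_len` characters.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    offset: int
    type: str
    text: str


# A deliberately tiny, stateless lexer used only for this sketch.
MASTER = re.compile(r"(?P<WORD>\w+)|(?P<WHITESPACE>\s+)|(?P<PUNCT>.)")


def lex(text, start=0):
    for m in MASTER.finditer(text, start):
        yield Token(m.start(), m.lastgroup, m.group())


def relex(cached, new_text, change_pos, old_len, new_len):
    """Return the tokens of new_text, reusing cached tokens where possible."""
    delta = new_len - old_len
    # 1. Copy cached tokens that end strictly before the change.
    head = [t for t in cached if t.offset + len(t.text) < change_pos]
    restart = head[-1].offset + len(head[-1].text) if head else 0
    # Cached tokens after the edit, keyed by their offset in the new text.
    tail_index = {
        t.offset + delta: i
        for i, t in enumerate(cached)
        if t.offset >= change_pos + old_len
    }
    result = list(head)
    # 2. Restart the lexer just before the change.
    for tok in lex(new_text, restart):
        i = tail_index.get(tok.offset)
        if i is not None:
            # 3. Back in sync with the cache: reuse the tail, shifted by delta.
            result.extend(t._replace(offset=t.offset + delta) for t in cached[i:])
            return result
        result.append(tok)
    return result
```

A lexer with states (for example inside a multi-line comment) would also need to compare the cached lexer state before declaring the streams back in sync.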
@citizenmatt
Incremental parsing
• Walk up syntax tree, find nearest element that can reparse and that encompasses change

E.g. method/class body
• Find start of block

E.g. opening LBRACE ‘{’
• Use updated cached lexer to find end of block

E.g. closing RBRACE ‘}’
• Parse block, add new element into tree

Uses custom entry point into parser
@citizenmatt
Problem #7
Composable languages
@citizenmatt
Three types
• Injected languages

E.g. self-contained islands in a string literal (regex)
• Inherited languages

E.g. TypeScript is a superset of JavaScript
• Nested languages

E.g. JavaScript/CSS nested inside HTML. Razor and C#
@citizenmatt
Injected languages
• Build a parse tree for the contents of another node

E.g. ShaderLab CG_PROGRAM, regular expressions, …
• Provides syntax highlighting, code completion, etc.
• Attaches a new parse tree to the node of another tree
• Changes to injected tree persisted to string and pushed as change to the owning tree
• Changes to owning tree cause full reparse of injected language
@citizenmatt
Inherited languages
• E.g. TypeScript is a superset of JavaScript
• TypeScriptParser derives from JavaScriptParser

Share a lexer
• Custom hand rolled parsers

Recursive descent
• Easier to inherit and override key methods

Gang of Four Template Method pattern
• Also XamlParser, MSBuildParser, WebConfigParser

Custom XML parsers
@citizenmatt
Nested languages
• E.g. .aspx, .cshtml - HTML superset, with C# “islands”
• ReSharper parses .aspx/.cshtml file

Builds parse tree for ASPX/Razor syntax
• HTML superset requires lexer superset
• HtmlCompoundLexer lexes “outer” language’s tokens

When it encounters HTML, it switches to the standard HTML lexer
• How to handle C# islands?
@citizenmatt
Secondary documents
• ASPX/Razor - C# islands
• Create secondary in-memory C# file

Mirrors what gets generated when .aspx file is compiled
• Maps C# islands in .aspx to in-memory C# file
• Inspections, code completion, etc. work through the mapping
@citizenmatt
How do you parse a file?
@citizenmatt
DON’T
@citizenmatt
Links
https://github.com/JetBrains/resharper-unity
Generating Fast, Error Recovering Parsers

http://www.dtic.mil/dtic/tr/fulltext/u2/a196581.pdf
Effective and Comfortable Error Recovery in Recursive Descent Parsers

http://www.cocolab.com/products/cocktail/doc.pdf/ell.pdf
The Definitive ANTLR 4 Reference - Terence Parr

Contenu connexe

Similaire à How to Parse a File (DDD North 2017)

Everything is composable
Everything is composableEverything is composable
Everything is composableVictor Igor
 
input output Organization
input output Organizationinput output Organization
input output OrganizationAcad
 
A Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring ProblemsA Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring ProblemsSandra Long
 
AiCore Brochure 27-Mar-2023-205529.pdf
AiCore Brochure 27-Mar-2023-205529.pdfAiCore Brochure 27-Mar-2023-205529.pdf
AiCore Brochure 27-Mar-2023-205529.pdfAjayRawat829497
 
Deductive verification of unmodified Linux kernel library functions
Deductive verification of unmodified Linux kernel library functionsDeductive verification of unmodified Linux kernel library functions
Deductive verification of unmodified Linux kernel library functionsDenis Efremov
 
CBSE XI COMPUTER SCIENCE
CBSE XI COMPUTER SCIENCECBSE XI COMPUTER SCIENCE
CBSE XI COMPUTER SCIENCEGautham Rajesh
 
C Programming Interview Questions
C Programming Interview QuestionsC Programming Interview Questions
C Programming Interview QuestionsGradeup
 
Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)
Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)
Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)Grace Yang
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone? DVClub
 
Gate Previous Years Papers
Gate Previous Years PapersGate Previous Years Papers
Gate Previous Years PapersRahul Jain
 
Embedded SW Interview Questions
Embedded SW Interview Questions Embedded SW Interview Questions
Embedded SW Interview Questions PiTechnologies
 
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdfLDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdfVedant Gavhane
 
Junaid program assignment
Junaid program assignmentJunaid program assignment
Junaid program assignmentJunaid Ahmed
 

Similaire à How to Parse a File (DDD North 2017) (20)

Everything is composable
Everything is composableEverything is composable
Everything is composable
 
input output Organization
input output Organizationinput output Organization
input output Organization
 
A Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring ProblemsA Signature Algorithm Based On Chaotic Maps And Factoring Problems
A Signature Algorithm Based On Chaotic Maps And Factoring Problems
 
AiCore Brochure 27-Mar-2023-205529.pdf
AiCore Brochure 27-Mar-2023-205529.pdfAiCore Brochure 27-Mar-2023-205529.pdf
AiCore Brochure 27-Mar-2023-205529.pdf
 
Deductive verification of unmodified Linux kernel library functions
Deductive verification of unmodified Linux kernel library functionsDeductive verification of unmodified Linux kernel library functions
Deductive verification of unmodified Linux kernel library functions
 
Automatic Selection of Predicates for Common Sense Knowledge Expression
Automatic Selection of Predicates for Common Sense Knowledge ExpressionAutomatic Selection of Predicates for Common Sense Knowledge Expression
Automatic Selection of Predicates for Common Sense Knowledge Expression
 
Software engineering
Software engineeringSoftware engineering
Software engineering
 
Placement paper
Placement paperPlacement paper
Placement paper
 
CBSE XI COMPUTER SCIENCE
CBSE XI COMPUTER SCIENCECBSE XI COMPUTER SCIENCE
CBSE XI COMPUTER SCIENCE
 
B010430814
B010430814B010430814
B010430814
 
C Programming Interview Questions
C Programming Interview QuestionsC Programming Interview Questions
C Programming Interview Questions
 
Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)
Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)
Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)
 
Cs gate-2011
Cs gate-2011Cs gate-2011
Cs gate-2011
 
Cs gate-2011
Cs gate-2011Cs gate-2011
Cs gate-2011
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone?
 
Gate Previous Years Papers
Gate Previous Years PapersGate Previous Years Papers
Gate Previous Years Papers
 
Embedded SW Interview Questions
Embedded SW Interview Questions Embedded SW Interview Questions
Embedded SW Interview Questions
 
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdfLDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
 
1
11
1
 
Junaid program assignment
Junaid program assignmentJunaid program assignment
Junaid program assignment
 

Plus de citizenmatt

Rider - Taking ReSharper out of Process
Rider - Taking ReSharper out of ProcessRider - Taking ReSharper out of Process
Rider - Taking ReSharper out of Processcitizenmatt
 
The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)
The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)
The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)citizenmatt
 
.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchester.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchestercitizenmatt
 
.NET Core Blimey! (Shropshire Devs Mar 2016)
.NET Core Blimey! (Shropshire Devs Mar 2016).NET Core Blimey! (Shropshire Devs Mar 2016)
.NET Core Blimey! (Shropshire Devs Mar 2016)citizenmatt
 
.NET Core Blimey! (dotnetsheff Jan 2016)
.NET Core Blimey! (dotnetsheff Jan 2016).NET Core Blimey! (dotnetsheff Jan 2016)
.NET Core Blimey! (dotnetsheff Jan 2016)citizenmatt
 
.net Core Blimey - Smart Devs UG
.net Core Blimey - Smart Devs UG.net Core Blimey - Smart Devs UG
.net Core Blimey - Smart Devs UGcitizenmatt
 
.Net Core Blimey! (16/07/2015)
.Net Core Blimey! (16/07/2015).Net Core Blimey! (16/07/2015)
.Net Core Blimey! (16/07/2015)citizenmatt
 
C# 6.0 - DotNetNotts
C# 6.0 - DotNetNottsC# 6.0 - DotNetNotts
C# 6.0 - DotNetNottscitizenmatt
 
What's New in ReSharper 9?
What's New in ReSharper 9?What's New in ReSharper 9?
What's New in ReSharper 9?citizenmatt
 

Plus de citizenmatt (9)

Rider - Taking ReSharper out of Process
Rider - Taking ReSharper out of ProcessRider - Taking ReSharper out of Process
Rider - Taking ReSharper out of Process
 
The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)
The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)
The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017)
 
.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchester.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchester
 
.NET Core Blimey! (Shropshire Devs Mar 2016)
.NET Core Blimey! (Shropshire Devs Mar 2016).NET Core Blimey! (Shropshire Devs Mar 2016)
.NET Core Blimey! (Shropshire Devs Mar 2016)
 
.NET Core Blimey! (dotnetsheff Jan 2016)
.NET Core Blimey! (dotnetsheff Jan 2016).NET Core Blimey! (dotnetsheff Jan 2016)
.NET Core Blimey! (dotnetsheff Jan 2016)
 
.net Core Blimey - Smart Devs UG
.net Core Blimey - Smart Devs UG.net Core Blimey - Smart Devs UG
.net Core Blimey - Smart Devs UG
 
.Net Core Blimey! (16/07/2015)
.Net Core Blimey! (16/07/2015).Net Core Blimey! (16/07/2015)
.Net Core Blimey! (16/07/2015)
 
C# 6.0 - DotNetNotts
C# 6.0 - DotNetNottsC# 6.0 - DotNetNotts
C# 6.0 - DotNetNotts
 
What's New in ReSharper 9?
What's New in ReSharper 9?What's New in ReSharper 9?
What's New in ReSharper 9?
 

Dernier

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
How to Parse a File (DDD North 2017)