Notation, Regular Expressions in Lexical Specification, Error Handling, Finite Automata State Graphs, Epsilon Moves, Deterministic and Non-Deterministic Automata, Table Implementation of a DFA
- The lexical analyzer reads the source program character by character to produce tokens, returning them to the parser one at a time, as requested.
- A token represents a set of strings defined by a pattern and has a type and attributes that uniquely identify a lexeme. Regular expressions are used to specify the patterns for tokens.
- A finite automaton can be used as a lexical analyzer to recognize tokens. Both non-deterministic finite automata (NFAs) and deterministic finite automata (DFAs) are used, with DFAs being more efficient to implement. Regular expressions for tokens are first converted to an NFA and then to a DFA.
1. COMPILER DESIGN
I am Archana R, Assistant Professor in the Department of Computer Science, SACWC. I am here because I love to give presentations.
IMPLEMENTATION OF LEXICAL ANALYZER
4. Notation
• For convenience, we use a variation (allowing user-defined abbreviations) in regular expression notation.
• Union: A + B, also written A | B
• Option: A + ε, also written A?
• Range: 'a'+'b'+…+'z', also written [a-z]
• Excluded range (complement of [a-z]): [^a-z]
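These shorthands map directly onto the syntax of common regex engines. A minimal illustration in Python's re module (the mapping shown is ours, not part of the slides):

```python
import re

# The slide's notational conveniences, written in Python's re syntax:
assert re.fullmatch(r"if|else", "else")      # union: 'if' + 'else' -> if|else
assert re.fullmatch(r"-?[0-9]+", "42")       # option: '-' + epsilon -> -?
assert re.fullmatch(r"[a-z]+", "token")      # range: 'a'+'b'+...+'z' -> [a-z]
assert re.fullmatch(r"[^a-z]+", "123+")      # excluded range -> [^a-z]
```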
5. Regular Expressions in Lexical Specification
• Last lecture: a specification for the predicate s ∈ L(R)
• But a yes/no answer is not enough!
• Instead: partition the input into tokens.
• We will adapt regular expressions to this goal.
6. Regular Expressions → Lexical Spec. (1)
1. Select a set of tokens
• Integer, Keyword, Identifier, OpenPar, ...
2. Write a regular expression (pattern) for the lexemes of each token
• Integer = digit+
• Keyword = 'if' + 'else' + …
• Identifier = letter (letter + digit)*
• OpenPar = '('
• …
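As an illustration, the token patterns above can be written in Python's re syntax (a sketch; the exact character classes are our assumptions):

```python
# The slide's example tokens as Python regex patterns:
# digit+ becomes [0-9]+, letter (letter + digit)* becomes [A-Za-z][A-Za-z0-9]*.
TOKEN_PATTERNS = [
    ("Keyword",    r"if|else"),               # 'if' + 'else' + ...
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),  # letter (letter + digit)*
    ("Integer",    r"[0-9]+"),                # digit+
    ("OpenPar",    r"\("),                    # '('
]
```

Keyword is listed before Identifier on purpose: rule order will matter when a lexeme matches both (slide 11).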
7. Regular Expressions → Lexical Spec. (2)
3. Construct R, matching all lexemes for all tokens
R = Keyword + Identifier + Integer + …
= R1 + R2 + R3 + …
Facts: If s ∈ L(R) then s is a lexeme
– Furthermore s ∈ L(Ri) for some "i"
– This "i" determines the token that is reported
8. Regular Expressions → Lexical Spec. (3)
4. Let the input be x1…xn
• (x1 ... xn are characters)
• For 1 ≤ i ≤ n, check
x1…xi ∈ L(R) ?
5. It must be that
x1…xi ∈ L(Rj) for some j
(if there is a choice, pick the smallest such j)
6. Remove x1…xi from the input and go to the previous step
9. How to Handle Spaces and Comments?
1. We could create a token Whitespace
Whitespace = (' ' + '\n' + '\t')+
– We could also add comments in there
– An input " \t\n 5555 " is transformed into Whitespace Integer Whitespace
2. Lexer skips spaces (preferred)
• Modify step 5 from before as follows:
It must be that xk ... xi ∈ L(Rj) for some j such that x1 ... xk-1 ∈ L(Whitespace)
• Parser is not bothered with spaces
10. Ambiguities (1)
• There are ambiguities in the algorithm
• How much input is used? What if
• x1…xi ∈ L(R) and also
• x1…xK ∈ L(R)
– Rule: pick the longest possible substring
– The "maximal munch"
11. Ambiguities (2)
• Which token is used? What if
• x1…xi ∈ L(Rj) and also
• x1…xi ∈ L(Rk)
– Rule: use the rule listed first (j if j < k)
• Example:
– R1 = Keyword and R2 = Identifier
– "if" matches both
– Treat "if" as a keyword, not an identifier
12. Error Handling
• What if no rule matches a prefix of the input?
• Problem: the lexer can't just get stuck …
• Solution:
– Write a rule matching all "bad" strings
– Put it last
• Lexer tools allow the writing of:
R = R1 + ... + Rn + Error
– Token Error matches if nothing else matches
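Putting slides 8 through 12 together, here is a minimal tokenizer sketch in Python: one prioritized rule list, maximal munch, whitespace skipping, and a last-resort Error rule. The names and patterns are illustrative assumptions, not the slides' own code:

```python
import re

# A prioritized rule list ending with an Error rule that matches any
# single "bad" character, i.e. R = Whitespace + ... + OpenPar + Error.
RULES = [(name, re.compile(p)) for name, p in [
    ("Whitespace", r"[ \t\n]+"),
    ("Keyword",    r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Integer",    r"[0-9]+"),
    ("OpenPar",    r"\("),
    ("Error",      r"."),
]]

def tokenize(text):
    pos = 0
    while pos < len(text):
        best = None
        for name, rx in RULES:
            m = rx.match(text, pos)
            # Maximal munch: longest match wins; on a tie, the rule
            # listed first wins (slides 10 and 11).
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        name, m = best                 # Error guarantees a match exists
        if name != "Whitespace":       # the lexer skips spaces (slide 9)
            yield name, m.group()
        pos = m.end()

print(list(tokenize("if x1 else 42 (")))
# [('Keyword', 'if'), ('Identifier', 'x1'), ('Keyword', 'else'),
#  ('Integer', '42'), ('OpenPar', '(')]
```

Note how "if" is reported as Keyword even though Identifier also matches it: the matches have equal length, so the first-listed rule wins.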
13. Regular Languages & Finite Automata
Basic formal language theory result:
Regular expressions and finite automata both define the class of regular languages.
Thus, we are going to use:
• Regular expressions for specification
• Finite automata for implementation (automatic generation of lexical analyzers)
14. Finite Automata
• A finite automaton is a recognizer for the strings of a regular language
A finite automaton consists of
– A finite input alphabet Σ
– A set of states S
– A start state n
– A set of accepting states F ⊆ S
– A set of transitions: state →input state
15. • Transition
s1 →a s2
• Is read:
In state s1, on input "a", go to state s2
• If end of input (or no transition possible)
– If in an accepting state ⇒ accept
– Otherwise ⇒ reject
16. Finite Automata State Graphs
• A state: a circle
• The start state: a circle marked with an incoming arrow
• An accepting state: a double circle
• A transition: an arrow between states, labeled with an input symbol such as "a"
17. A Simple Example
• A finite automaton that accepts only "1" (diagram: a single transition on 1 from the start state to an accepting state)
Another Simple Example
• A finite automaton accepting any number of 1's followed by a single 0
• Alphabet: {0,1} (diagram: a 1-loop on the start state, then a 0-transition to an accepting state)
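The second automaton is fully determined by its caption, so it can be encoded with the slide-14 components (alphabet, states, start state, accepting states, transitions) and the slide-15 execution rule. A sketch, with hypothetical state names since the diagram is not shown:

```python
# The automaton accepting any number of 1's followed by a single 0.
ALPHABET = {"0", "1"}
STATES = {"A", "B"}                 # hypothetical state names
START = "A"
ACCEPTING = {"B"}                   # F, a subset of STATES
TRANSITIONS = {("A", "1"): "A",     # loop on 1's
               ("A", "0"): "B"}     # a single 0 reaches the accepting state

def accepts(s):
    state = START
    for ch in s:
        if (state, ch) not in TRANSITIONS:   # no transition possible -> reject
            return False
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING               # accept iff we end in F

assert accepts("1110") and accepts("0")
assert not accepts("11") and not accepts("110011")
```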
18. And Another Example
• Alphabet {0,1}
• What language does this recognize? (three-state diagram with transitions on 0 and 1 not reproduced)
• Alphabet still {0, 1} (diagram not reproduced: a state with two different transitions on input 1)
• The operation of the automaton is not completely defined by the input
– On input "11" the automaton could be in either state
19. Epsilon Moves
• Another kind of transition: ε-moves
A →ε B
• Machine can move from state A to state B without reading input
20. Deterministic and Non-Deterministic Automata
• Deterministic Finite Automata (DFA)
– One transition per input per state
– No ε-moves
• Non-deterministic Finite Automata (NFA)
– Can have multiple transitions for one input in a given state
– Can have ε-moves
• Finite automata have finite memory
– Enough to only encode the current state
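The structural difference can be made concrete as data. A tiny illustrative sketch (not from the slides):

```python
# A DFA's transition function yields exactly one next state per
# (state, symbol); an NFA's yields a set of states, and may also
# contain epsilon-moves (keyed here by the empty string "").
dfa_delta = {("A", "1"): "B"}                  # one successor
nfa_delta = {("A", "1"): {"A", "B"},           # several successors
             ("A", ""):  {"C"}}                # an epsilon-move
```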
21. Execution of Finite Automata
• A DFA can take only one path through the state graph
– Completely determined by the input
• NFAs can choose
– Whether to make ε-moves
– Which of multiple transitions for a single input to take
22. Acceptance of NFAs
• An NFA can get into multiple states (diagram of an NFA over {0,1} not reproduced)
• Input: 1 0 1
• Rule: an NFA accepts an input if it can get into a final state
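This behaviour can be simulated by tracking the set of states the NFA could be in. Since the slide's diagram is not reproduced, the NFA below is a hypothetical one over {0,1}:

```python
# Hypothetical NFA: from "A" on 1 the machine may stay in "A" or move
# to "B"; "B" on 0 goes to "C"; "C" on 1 stays in "C". "C" accepts.
NFA = {("A", "1"): {"A", "B"},
       ("B", "0"): {"C"},
       ("C", "1"): {"C"}}
ACCEPTING = {"C"}

def nfa_states(s):
    current = {"A"}                   # start in the single start state
    for ch in s:
        # After each symbol, the NFA can be in a *set* of states.
        current = {t for q in current for t in NFA.get((q, ch), set())}
    return current

# Accept iff the reachable set contains a final state (slide 22's rule).
print(nfa_states("101"), bool(nfa_states("101") & ACCEPTING))  # {'C'} True
print(nfa_states("100"), bool(nfa_states("100") & ACCEPTING))  # set() False
```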
23. NFA vs. DFA (1)
• NFAs and DFAs recognize the same set of languages (regular languages)
• DFAs are easier to implement
– There are no choices to consider
24. NFA vs. DFA (2)
• For a given language the NFA can be simpler than the DFA (diagrams of an NFA and the equivalent DFA over {0,1} not reproduced)
• DFA can be exponentially larger than NFA
25. Regular Expressions to Finite Automata
• High-level sketch:
Lexical Specification → Regular expressions → NFA → DFA → Table-driven implementation of DFA
26. Regular Expressions to NFA (1)
• For each kind of regular expression, define an NFA
– Notation: NFA for regular expression M
• For ε: an NFA with a single ε-transition from its start state to its accepting state
• For input a: an NFA with a single a-transition from its start state to its accepting state (diagrams not reproduced)
29. The Trick of NFA to DFA
• Simulate the NFA
• Each state of the DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through ε-moves from the NFA start state
• Add a transition S →a S' to the DFA iff
– S' is the set of NFA states reachable from any state in S after seeing the input a
• considering ε-moves as well
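A minimal sketch of this subset construction, assuming a hypothetical NFA stored as a transition dictionary with "" marking ε-moves:

```python
# Hypothetical NFA with an epsilon-move A -> B.
NFA = {("A", ""):  {"B"},
       ("A", "0"): {"A"},
       ("B", "1"): {"C"}}

def eps_closure(states):
    """All NFA states reachable from `states` via epsilon-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for t in NFA.get((q, ""), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def dfa_transition(S, a):
    """The DFA transition S -a-> S' of slide 29: move on a, then close."""
    moved = {t for q in S for t in NFA.get((q, a), set())}
    return eps_closure(moved)

start = eps_closure({"A"})           # DFA start state: here {'A', 'B'}
print(start)
print(dfa_transition(start, "1"))    # here {'C'}
```

Each frozenset is one DFA state; repeating dfa_transition over every input symbol from every discovered subset enumerates the whole DFA.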
30. IMPLEMENTATION
• A DFA can be implemented by a 2D table T
– One dimension is "states"
– The other dimension is "input symbols"
– For every transition Si →a Sk define T[i,a] = k
• DFA "execution"
– If in state Si and input a, read T[i,a] = k and move to state Sk
– Very efficient
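A sketch of this table-driven execution for the earlier "any number of 1's followed by a single 0" automaton, using a dictionary in place of a 2D array (state names are assumptions):

```python
# Rows are states, columns are input symbols; None marks "no transition".
T = {"A": {"0": "B", "1": "A"},
     "B": {"0": None, "1": None}}
ACCEPTING = {"B"}

def run(s, start="A"):
    state = start
    for ch in s:
        state = T[state][ch]        # one table lookup per input character
        if state is None:
            return False
    return state in ACCEPTING

assert run("1110") and not run("11")
```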
32. Implementation (Cont.)
• NFA → DFA conversion is at the heart of tools such as lex, ML-Lex or flex.
• But DFAs can be huge.
• In practice, lex/ML-Lex/flex-like tools trade off speed for space in the choice of NFA and DFA representations.
33. DFA and Lexer
Two differences:
• DFAs recognize lexemes. A lexer must return a type of acceptance (token type) rather than simply an accept/reject indication.
• DFAs consume the complete string and accept or reject it. A lexer must find the end of the lexeme in the input stream, then find the next one, etc.
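A sketch of how a lexer can wrap a DFA to handle both differences: accepting states carry token types, and the lexer remembers the last accepting position so it can cut the lexeme there and restart for the next one. The two-token DFA below (integers and identifiers) is hypothetical:

```python
# Hypothetical DFA: integers ([0-9]+) and identifiers ([a-z]+ style),
# separated by spaces. Accepting states map to token types.
DELTA = {("start", "digit"): "int",   ("int", "digit"): "int",
         ("start", "letter"): "id",   ("id", "letter"): "id"}
TOKEN_TYPE = {"int": "Integer", "id": "Identifier"}

def kind(ch):
    return "digit" if ch.isdigit() else "letter" if ch.isalpha() else "other"

def lex(text):
    pos = 0
    while pos < len(text):
        if text[pos] == " ":                  # skip spaces
            pos += 1
            continue
        state, last_accept = "start", None
        for i in range(pos, len(text)):
            state = DELTA.get((state, kind(text[i])))
            if state is None:                 # DFA stuck: stop scanning
                break
            if state in TOKEN_TYPE:           # remember last accepting position
                last_accept = (i + 1, TOKEN_TYPE[state])
        end, token_type = last_accept         # assumes well-formed input
        yield token_type, text[pos:end]       # cut the lexeme ...
        pos = end                             # ... and restart after it

print(list(lex("abc 42")))  # [('Identifier', 'abc'), ('Integer', '42')]
```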