Static analysis and regular expressions
Author: Andrey Karpov

Date: 09.12.2010

I develop PVS-Studio, a static code analyzer for C/C++ software. After we implemented general analysis in PVS-Studio 4.00, we received many responses, both positive and negative. By the way, you are welcome to download the new version of PVS-Studio, in which we have fixed many errors and defects thanks to the users who reported them.

During the discussion of PVS-Studio 4.00, the question was raised again whether most checks could be implemented with regular expressions, and whether we are overcomplicating matters by insisting that a parse tree must be built and processed during analysis. This is not the first time the question has come up, so I decided to write an article explaining why it is a very bad idea to use regular expressions for C/C++ code analysis.

Those familiar with compiler theory understand that C++ can be parsed only with a formal grammar, not with regular expressions. But most programmers are not familiar with this theory, and they keep suggesting, over and over again, that we use regular expressions to search for errors in code.

Let me say right away that some issues can be found with regular expressions. There are even several static analyzers based on this principle. But their capabilities are very limited and mostly amount to messages like "The "strcpy" function is used; you'd better replace it with a safer one".
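For illustration, here is roughly what such a text-based check looks like: a minimal C++ sketch using std::regex, where FindStrcpyCalls is a hypothetical name rather than part of any real analyzer. It does find genuine calls, but since it sees only characters, it also reports the name inside comments and string literals.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical regex-based "checker": returns the 1-based numbers of lines
// that seem to call strcpy. It works on raw text only, so it cannot tell
// real calls from text inside comments or string literals.
std::vector<int> FindStrcpyCalls(const std::vector<std::string>& lines) {
    // \b keeps it from matching names such as my_strcpy; \s* allows "strcpy ("
    static const std::regex call(R"(\bstrcpy\s*\()");
    std::vector<int> hits;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (std::regex_search(lines[i], call)) {
            hits.push_back(static_cast<int>(i) + 1);
        }
    }
    return hits;
}
```

Given the lines strcpy(dst, src);, a comment mentioning strcpy(a, b), and a string literal containing strcpy(x, y), it reports all three; the \b anchor at least keeps it from firing on a name such as my_strcpy.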

After thinking about how best to show the community the weaknesses of the regular-expression approach, I decided to do something simple: take the first ten diagnostic messages of the general analysis implemented in PVS-Studio and use each of them to show what limitations the regular-expression method imposes.

Diagnosis 0

As soon as I started describing V501, I realized that no kind of analysis would give me sufficient information as long as #define macros remain unexpanded. An error may hide inside a macro, but it is still an error. Creating a preprocessed file is fairly simple, so let's assume we already have the i-files. Now we run into the first problem: we must determine which code fragments come from system headers and which come from user code. Analyzing system library functions would significantly slow down the analysis and produce many unnecessary diagnostic messages. So, if we use regular expressions, we must parse lines like these:

#line 27 "C:\Program Files (x86)\Microsoft Visual Studio 8\VC\atlmfc\include\afx.h"
#line 1008 ".\mytestfile.cpp"

and figure out which of them belong to our program and which belong to Visual Studio. And that's not the half of it: we must also track line numbers relative to these directives inside the i-files, because a diagnostic must report not the absolute line number in the preprocessed i-file but the line number in the original .c/.cpp file being analyzed.

So we have not even started, and we already face a whole lot of difficulties.

Diagnosis 1

V501. There are identical sub-expressions to the left and to the right of the 'foo' operator.

To avoid overloading the text, I suggest that readers follow the link to the description of this error and its examples. The point of this rule is to detect constructs like this:

if (X > 0 && X > 0)

At first glance, such constructs seem easy to find with a regular expression that looks for identical expressions on both sides of &&, ||, ==, and so on. For example, we search for the && operator: if the text inside the parentheses to its left and to its right looks identical, we certainly have an error. But this won't work, because one could write:

if (A == A && B)

The error is still there, but if we take everything between the parentheses, the text to the left of '==' (A) differs from the text to its right (A && B). This means we must introduce the notion of operator precedence. When examining '==', we must stop at lower-precedence operators such as '&&'; conversely, when examining '&&', we must take in the '==' operators and extend the operands up to the enclosing parentheses to catch this case:

if (A == 0 && A == 0)

Similarly, we must provide such logic for every combination of operators with different precedence. And by the way, you cannot fully rely on parentheses either, because you may encounter cases like these:

if ( '(' == A && '(' == B )

b = X > 0 && X > 0;

It is very difficult to cover all these variants with regular expressions. We would end up with too many expressions and too many exceptions, and the result would still be unreliable, since we could never be sure that all possible constructs had been taken into account.
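To make the problem concrete, here is a sketch of the naive approach: one regular expression with a backreference that looks for the same operand text on both sides of && inside a pair of parentheses. LooksLikeV501 and its pattern are illustrative, not taken from any real tool. It catches if (X > 0 && X > 0), but it misses A == A && B, any expression without parentheses, and even a change in spacing.

```cpp
#include <regex>
#include <string>

// Naive V501-style check: "(" + some text + "&&" + the same text + ")".
// The backreference \1 requires the right operand to repeat the left one
// character for character; the operand may not contain ( ) & or |.
bool LooksLikeV501(const std::string& code) {
    static const std::regex pattern(R"(\(\s*([^()&|]+?)\s*&&\s*\1\s*\))");
    return std::regex_search(code, pattern);
}
```

The three misses are all real errors or exact duplicates of the first one; patching each of them means another pattern, which is precisely the spiral of exceptions described above.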

Now compare all this with how elegantly I can find this error when I have a syntax tree. Having found an &&, ==, ||, or similar operator, I only need to compare the left and right branches of the tree with each other. I do it like this:

if (Equal(left, right))
{
    // Error!
}

That is all. You don't have to think about operator precedence, and you don't have to worry about running into a parenthesis inside text such as b = '(' == x && x == ')';. You simply compare the left and right branches of the tree.
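For contrast, a toy version of the tree-based check. The Node type, the Leaf and Bin helpers, Equal, and HasIdenticalOperands are illustrative names, not PVS-Studio internals, and the trees are built by hand here instead of by a parser. Once the code is a tree, precedence and parentheses are already resolved, so the check reduces to a recursive structural comparison.

```cpp
#include <memory>
#include <string>
#include <utility>

// Minimal expression tree: a leaf holds a name or literal, an inner node
// holds a binary operator and its two operands.
struct Node {
    std::string text;  // operator ("&&", "==", ">") or leaf token
    std::shared_ptr<Node> left, right;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr Leaf(const std::string& t) {
    return std::make_shared<Node>(Node{t, nullptr, nullptr});
}

NodePtr Bin(const std::string& op, NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{op, std::move(l), std::move(r)});
}

// Structural equality of two subtrees.
bool Equal(const NodePtr& a, const NodePtr& b) {
    if (!a || !b) return a == b;  // both null -> equal, one null -> not
    return a->text == b->text && Equal(a->left, b->left) && Equal(a->right, b->right);
}

// V501-style check applied to every binary node of the tree.
bool HasIdenticalOperands(const NodePtr& n) {
    if (!n || !n->left) return false;  // leaves have no operands
    // An illustrative subset of operators, not the real rule's list.
    static const char* ops[] = {"&&", "||", "==", "!=", "<", ">", "-"};
    for (const char* op : ops) {
        if (n->text == op && Equal(n->left, n->right)) return true;
    }
    return HasIdenticalOperands(n->left) || HasIdenticalOperands(n->right);
}
```

For example, A == A && B becomes Bin("&&", Bin("==", Leaf("A"), Leaf("A")), Leaf("B")), and the check finds the identical operands of the inner == without any special handling, while a character literal such as '(' is just another leaf.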

Diagnosis 2

V502. Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the 'foo' operator.

This rule looks for confusion about operator precedence (see the error description for details). We must detect code like this:

int a;
bool b;
int c = a + b ? 0 : 1;
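A runnable illustration of why this code is suspicious, with hypothetical function names and arbitrary values: + has higher precedence than ?:, so the expression is parsed as (a + b) ? 0 : 1, not as a + (b ? 0 : 1).

```cpp
// How the compiler reads it: the condition is the whole sum a + b.
int AsWritten(int a, bool b) {
    return a + b ? 0 : 1;  // parsed as (a + b) ? 0 : 1
}

// What the programmer most likely meant.
int AsIntended(int a, bool b) {
    return a + (b ? 0 : 1);
}
```

With a == 5 and b == false, the first function returns 0 and the second returns 6. The warning only makes sense because b is known to be bool, and that is exactly the information plain text does not carry.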

Let's set aside the question of operator precedence for now; regular expressions are too weak for that purpose anyway. What is worse, this rule and many others require knowing the VARIABLE'S TYPE.

You must determine the type of every variable. You must find your way through the maze of typedefs. You must look inside classes to understand what vector<int>::size_type is. You must take scopes into account, as well as directives such as using namespace std;. In C++0x, you must even deduce the type of the variable X from an expression like "auto X = 1 + 2;".
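Even the simplest of these tasks, seeing through chains of typedefs, needs a symbol table rather than a pattern. A minimal sketch, assuming the typedef declarations have already been collected into a map from alias to underlying type; ResolveTypedef is a hypothetical name, and scopes, namespaces, and templates are ignored entirely.

```cpp
#include <map>
#include <set>
#include <string>

// Follows a chain of typedefs (alias -> underlying type) until it reaches
// a name that is not an alias. The `seen` set guards against cycles in
// malformed input.
std::string ResolveTypedef(const std::map<std::string, std::string>& typedefs,
                           std::string name) {
    std::set<std::string> seen;
    auto it = typedefs.find(name);
    while (it != typedefs.end() && seen.insert(name).second) {
        name = it->second;
        it = typedefs.find(name);
    }
    return name;
}
```

Filling that map correctly is the hard part: it requires recognizing every typedef declaration in every scope, which is already the job of a parser.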

The question is: how can we do all that with regular expressions? The answer is: there is no way. Regular expressions are orthogonal to this task. You must either write a complex type-deduction mechanism, that is, create a syntactic code analyzer, or make do with regular expressions that know nothing about the types of variables and expressions.

The conclusion: if we use regular expressions to process a C/C++ program, we do not know the types of variables and expressions. Note this severe limitation.

Diagnosis 3

V503. This is a nonsensical comparison: pointer < 0.

This rule is very simple: comparing a pointer with zero using < or > is suspicious. For example:

CMeshBase *pMeshBase = getCutMesh(Idx);
if (pMeshBase < 0)
   return NULL;

Refer to the error description to learn how this code came about.

To implement this diagnosis, we only need to know the type of the pMeshBase variable. As explained above, that is impossible with regular expressions.

This diagnosis cannot be implemented relying on regular expressions.
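Once a front end has recorded the declared type of each variable, the check itself is small. A sketch, assuming a hypothetical map from variable names to type spellings in which a trailing '*' marks a pointer, and assuming the comparison has already been split into operands and operator; IsPointerVsZeroOrdering is an illustrative name, and typedefs, references, and scopes are ignored.

```cpp
#include <map>
#include <string>

// V503-style check on an already-parsed comparison "lhs op rhs".
// `types` maps variable names to their declared type spelling.
bool IsPointerVsZeroOrdering(const std::map<std::string, std::string>& types,
                             const std::string& lhs, const std::string& op,
                             const std::string& rhs) {
    if (op != "<" && op != ">") return false;
    if (rhs != "0") return false;  // mirrored form "0 > p" omitted for brevity
    auto it = types.find(lhs);
    // A trailing '*' in the type spelling is taken to mean "pointer".
    return it != types.end() && !it->second.empty() && it->second.back() == '*';
}
```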

Diagnosis 4

V504. It is highly probable that the semicolon ';' is missing after 'return' keyword.

void Foo();
void Foo2(int *ptr)
{
    if (ptr == NULL)
        return
    Foo();
    ...
}

We could well diagnose constructs of this kind with regular expressions, but we would get too many false alarms. We are interested only in cases where the function returns void. Well, we could try to find that out with regular expressions as well, but it is not at all clear where a function starts and ends. Try to invent a regular expression that finds the start of a function yourself. Trust me, you will enjoy the task, especially once you realize that one could write something like this:

int Foo()
{
    ...
    char c[] =
    "void MyFoo(int x) {"
    ;
    ...
}
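This is what a typical first attempt might look like: a sketch that treats any line matching "void Name(...) {" as the start of a void function (FindVoidFunctionStarts and its pattern are illustrative). Applied to the example above, it reports a function starting inside a string literal.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Naive attempt: a line that looks like "void Name(...) {" starts a void
// function. Returns the 1-based numbers of such lines.
std::vector<int> FindVoidFunctionStarts(const std::vector<std::string>& lines) {
    static const std::regex start(R"(\bvoid\s+\w+\s*\([^)]*\)\s*\{)");
    std::vector<int> result;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (std::regex_search(lines[i], start)) {
            result.push_back(static_cast<int>(i) + 1);
        }
    }
    return result;
}
```

It also misses Foo2 from the previous example, because there the opening brace is on the next line.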

With a complete syntax tree and all the information attached to it, everything becomes much simpler. You can find out the function's return type like this (the sample is taken directly from PVS-Studio):

SimpleType funcReturnType;
EFunctionReturnType fType;
if (!env->LookupFunctionReturnType(fType, funcReturnType))
    return;
if (funcReturnType != ST_VOID)
    return;

Diagnosis 5

V505. The 'alloca' function is used inside the loop. This can quickly overflow stack.

Yes, we could try to implement this rule with regular expressions. But I wouldn't try to determine where a loop starts and ends, because one can think up so many funny situations with curly braces in comments and strings:

{
    for (int i = 0; i < 10; i++)
    {
        //A cool comment. There you are { - try to solve it. :)
        char *x = "You must be careful here too {";
    }
    p = _alloca(10); // Are we inside the loop or not?
}
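The difference between counting braces in raw text and counting them in tokens can be shown directly. Both functions below are hypothetical sketches that return the brace depth at a given offset: NaiveDepth counts every '{' and '}', while LexerDepth skips // and /* */ comments as well as string and character literals (escape sequences are handled minimally, and raw string literals are not handled at all).

```cpp
#include <cstddef>
#include <string>

// Counts every brace in the raw text before position `pos`.
int NaiveDepth(const std::string& src, std::size_t pos) {
    int depth = 0;
    for (std::size_t i = 0; i < pos && i < src.size(); ++i) {
        if (src[i] == '{') ++depth;
        else if (src[i] == '}') --depth;
    }
    return depth;
}

// Same, but ignores braces inside comments and string/char literals.
int LexerDepth(const std::string& src, std::size_t pos) {
    int depth = 0;
    std::size_t i = 0;
    while (i < pos && i < src.size()) {
        if (src.compare(i, 2, "//") == 0) {            // line comment
            i = src.find('\n', i);
            if (i == std::string::npos) break;
        } else if (src.compare(i, 2, "/*") == 0) {     // block comment
            i = src.find("*/", i + 2);
            if (i == std::string::npos) break;
            i += 2;
        } else if (src[i] == '"' || src[i] == '\'') {  // string/char literal
            char quote = src[i++];
            while (i < src.size() && src[i] != quote) {
                i += (src[i] == '\\') ? 2 : 1;         // skip escaped char
            }
            ++i;                                       // closing quote
        } else {
            if (src[i] == '{') ++depth;
            else if (src[i] == '}') --depth;
            ++i;
        }
    }
    return depth;
}
```

For the fragment above, taking the offset of _alloca, NaiveDepth returns 3 while LexerDepth returns 1, meaning the call sits at the outer block level, outside the loop. Even the lexer-aware version cannot tell that the body of a loop written without braces, such as for (...) p = _alloca(10);, is inside the loop; that takes a parser.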

Diagnosis 6

V506. Pointer to local variable 'X' is stored outside the scope of this variable. Such a pointer will become
invalid.

To detect these errors, we must track the scopes of variables. We must also know their types.

This diagnosis cannot be implemented relying on regular expressions.
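To give an idea of the bookkeeping involved, here is a toy sketch that works on events a parser would produce (entering and leaving a block, declaring a variable, an assignment of the form ptr = &var) rather than on text. ScopeChecker and its methods are hypothetical. It flags the assignment when the variable whose address is taken was declared in a deeper scope than the pointer; shadowing and types are ignored, so it is far from a real implementation.

```cpp
#include <map>
#include <string>

// Toy scope tracker. Depth 0 is global scope; each block adds one level.
class ScopeChecker {
public:
    void Enter() { ++depth_; }

    void Leave() {
        // Forget every name declared in the scope being closed.
        for (auto it = declared_.begin(); it != declared_.end();) {
            if (it->second == depth_) it = declared_.erase(it);
            else ++it;
        }
        --depth_;
    }

    // Records a declaration; a redeclaration simply overwrites (no shadowing).
    void Declare(const std::string& name) { declared_[name] = depth_; }

    // Called for "ptr = &var;". Returns true if var dies before ptr does.
    bool AssignAddress(const std::string& ptr, const std::string& var) const {
        auto p = declared_.find(ptr);
        auto v = declared_.find(var);
        return p != declared_.end() && v != declared_.end() && v->second > p->second;
    }

private:
    int depth_ = 0;
    std::map<std::string, int> declared_;  // name -> declaration depth
};
```

For int *g; void Foo() { int x; g = &x; }, the events are Declare("g"), Enter(), Declare("x"), and then AssignAddress("g", "x") returns true.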

Diagnosis 7

V507. Pointer to local array 'X' is stored outside the scope of this array. Such a pointer will become
invalid.

This diagnosis cannot be implemented relying on regular expressions.

Diagnosis 8

V508. The use of 'new type(n)' pattern was detected. Probably meant: 'new type[n]'.

It is useful to detect misprints like this:

float *p = new float(10);

It all looks simple, and it might seem that regular expressions could handle this diagnosis, provided we knew the type of the object being created. No way: change the text a little, and regular expressions become useless:

typedef float MyReal;
...
MyReal *p = new MyReal(10);

This diagnosis cannot be implemented relying on regular expressions.
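A sketch of the regex version shows the failure directly. LooksLikeV508 and its pattern are illustrative and cover only a few built-in type names: the pattern matches new float(10), but not new MyReal(10), because nothing on that line says that MyReal is float.

```cpp
#include <regex>
#include <string>

// Naive V508 check: "new <built-in type>(" on a single line.
bool LooksLikeV508(const std::string& line) {
    static const std::regex pattern(
        R"(\bnew\s+(char|short|int|long|float|double)\s*\()");
    return std::regex_search(line, pattern);
}
```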

Diagnosis 9

V509. The 'throw' operator inside the destructor should be placed within the try..catch block. Raising exception inside the destructor is illegal.

Yes, we could try to implement this check with regular expressions. Destructors are usually small functions, so we are unlikely to run into trouble with curly braces there.

But you will have to sweat over regular expressions to find the destructor, its beginning and end, and to determine whether it contains a throw that is caught by a catch. Can you imagine the amount of work? Could you do it?

Well, I can. Here is how I did it, quite elegantly, in PVS-Studio (the rule is given in full):

void ApplyRuleG_509(VivaWalker &walker, Environment *env,
    const Ptree *srcPtree)
{
    SimpleType returnType;
    EFunctionReturnType fType;
    bool res = env->LookupFunctionReturnType(fType, returnType);
    if (res == false || returnType != ST_UNKNOWN)
      return;
    if (fType != DESTRUCTOR)
      return;

    ptrdiff_t tryLevel = OmpUtil::GetLevel_TRY(env);
    if (tryLevel != -1)
      return;
    string error = VivaErrors::V509();
    walker.AddError(error, srcPtree, 509, DATE_1_SEP_2010(), Level_1);
}

Diagnosis 10

V510. The 'Foo' function is not expected to receive class-type variable as 'N' actual argument.

This rule concerns passing objects of std::string and similar classes as arguments to printf-like functions. Again, we need types, so this diagnosis cannot be implemented with regular expressions either.

Summary

I hope I have made the relationship between regular expressions, syntax trees, and static code analysis clearer to you. Thank you for your attention. Once again, I invite you to download and try PVS-Studio. I would also welcome your questions, but I do not intend to get into debates about what regular expressions can and cannot give us; that is not interesting. They let us achieve a lot, but there is even more that they do not let us achieve. C++ can be parsed successfully only with the mathematical apparatus of formal grammars.

Contenu connexe

Dernier

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Dernier (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

En vedette

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 

En vedette (20)

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 

Static analysis and regular expressions

  • 1. Static analysis and regular expressions Author: Andrey Karpov Date: 09.12.2010 I develop the PVS-Studio static code analyzer intended for analyzing C/C++ software. After we implemented general analysis in PVS-Studio 4.00, we received a lot of responses, both positive and negative. By the way, you are welcome to download a new version of PVS-Studio where we have fixed a lot of errors and defects thanks to users who told us about them. While discussing PVS-Studio 4.00, the question was again raised if we could implement most checks using regular expressions and if we actually complicate the matter suggesting that we must necessarily build and handle a parse tree during analysis. This question arises not for the first time, so I decided to write an article to explain why it is a very bad idea to try to use regular expressions for C/C++ code analysis. Those familiar with the compilation theory certainly understand that the C++ language can be parsed only relying on grammatics and not regular expressions. But most programmers are not familiar with this theory and they continue to tell us about using regular expressions to search for errors in software code over and over again. Let me say right away that we can find some issues using regular expressions. There are even several static analyzers that use this principle. But their capabilities are very restricted and mostly come to messages like "There is the "strcpy" function being used, you'd better replace it with a safer one". Having thought it over how to tell the community about lameness of the regular expression method, I decided to do the following simple thing. I will take the first ten diagnostic messages of general analysis implemented in PVS-Studio and show by the example of each of them what restrictions the regular expression method involves. 
Diagnosis 0 Once I started describing V501, I recalled that none of the analysis types would provide me with sufficient information until #define's remain unexpanded. The error might hide inside the macro but it will remain an error all the same. It is rather simple to create a preprocessed file, so assume we already have i-files. Now we encounter the first trouble - we must determine which code fragments refer to system files and which refer to user code. If we analyze system library functions, it will significantly reduce the speed of analysis and cause a lot of unnecessary diagnostic messages. Thus, if we use regular expressions, we must parse the following lines: #line 27 "C:Program Files (x86)Microsoft Visual Studio 8VCatlmfcincludeafx.h" #line 1008 ".mytestfile.cpp" and understand which of them refer to our program and which refer to Visual Studio. But that's not the half of it: we must also implement relative reading of lines inside i-files since we must generate not the absolute number of the line with the error in the preprocessed i-file but the number of the line in our native c/cpp-file we are analyzing.
So we have not even started, and we already face a whole lot of difficulties.

Diagnosis 1

V501. There are identical sub-expressions to the left and to the right of the 'foo' operator.

In order not to overload the text, I suggest that readers follow the link and read the description of this error and its samples. The point of this rule is to detect constructs of this type:

if (X > 0 && X > 0)

At first sight, we could easily find such constructs with a regular expression that looks for identical expressions to the left and to the right of the operators &&, ||, ==, etc. For example, we search for the && operator: if something identical stands to the left and to the right of && within the parentheses, we certainly have an error. But it won't work, because one could write it this way:

if (A == A && B)

The error is still here, but if we take everything up to the parentheses, the expressions to the left and to the right of '==' are different ('A' versus 'A && B'). It means that we must introduce the notion of operator precedence. When we are looking at '==', the operands must end at lower-priority operators such as '&&'; conversely, when we are looking at '&&', the operands must take in the '==' operators and extend all the way to the enclosing parentheses, otherwise we miss this case:

if (A == 0 && A == 0)

In the same way, we must provide logic for every combination of operators with different priorities. And by the way, you cannot fully rely on parentheses either, because you may encounter cases like these:

if ( '(' == A && '(' == B )
b = X > 0 && X > 0;

It is very hard to cover all the possible cases with regular expressions. We would end up with far too many of them, full of exceptions. And it still would not be safe, because we could never be sure that all the possible constructs have been taken into account.

Now compare all this with how elegantly I can find this error when I have a syntax tree. Once I have found an operator &&, ==, ||, etc., I only have to compare the left and the right branches of the tree with each other.
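The difference is easy to reproduce. The following C++ sketch is an illustration only: the regular expression, the toy `Node` type and the helpers `Leaf`, `Bin` and `CountIdenticalOperands` are hypothetical and are not PVS-Studio code, and the operator list is an arbitrary choice. The naive pattern flags `X > 0 && X > 0` but misses `A == A && B`, while the tree walk catches both without any precedence or parenthesis rules.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Text-level attempt: an operand made of identifiers, digits, spaces and
// comparison characters, then &&, then the very same text again (\1).
bool NaiveSameOperandsAroundAnd(const std::string &code) {
    static const std::regex pattern(R"(\b(\w[\w\s<>=!]*?)\s*&&\s*\1\b)");
    return std::regex_search(code, pattern);
}

// Toy syntax tree: leaves hold a token, inner nodes hold a binary operator.
struct Node {
    std::string text;
    std::unique_ptr<Node> left, right;
};
using NodePtr = std::unique_ptr<Node>;

NodePtr Leaf(const std::string &token) {
    auto n = std::make_unique<Node>();
    n->text = token;
    return n;
}

NodePtr Bin(const std::string &op, NodePtr l, NodePtr r) {
    auto n = Leaf(op);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Structural comparison of two subtrees.
bool Equal(const Node *a, const Node *b) {
    if (a == nullptr || b == nullptr) return a == b;
    return a->text == b->text && Equal(a->left.get(), b->left.get()) &&
           Equal(a->right.get(), b->right.get());
}

// Counts binary operators whose two operands are identical subtrees.
int CountIdenticalOperands(const Node *n) {
    static const std::vector<std::string> ops = {"&&", "||", "==", "!=", "<", ">", "-", "/"};
    if (n == nullptr) return 0;
    int found = 0;
    if (n->left && n->right &&
        std::find(ops.begin(), ops.end(), n->text) != ops.end() &&
        Equal(n->left.get(), n->right.get())) {
        ++found;  // the V501-style situation: if (Equal(left, right))
    }
    return found + CountIdenticalOperands(n->left.get()) +
           CountIdenticalOperands(n->right.get());
}
```

The regular expression could be extended, but each extension is one more special case; the tree version needs no change to handle `A == A && B` or `'(' == A && '(' == B`, because parentheses and character literals have already been consumed by the parser.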
I do it in the following way:

if (Equal(left, right))
{
  // Error!
}

That is all. You don't have to think about operator priorities, and you don't have to fear that you will encounter a bracket in text like this: b = '(' == x && x == ')';. You simply compare the left and the right branches of the tree.

Diagnosis 2
V502. Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the 'foo' operator.

This rule looks for confusion about operator priorities (see the error description for details). We must detect text like this:

int a;
bool b;
int c = a + b ? 0 : 1;

Let's set the question of operator priorities aside for now: regular expressions are too weak for that purpose anyway. What is worse, this rule, like many others, requires knowing the VARIABLE'S TYPE. You must derive the type of each variable. You must force your way through the maze of typedefs. You must look into classes to understand what vector<int>::size_type is. You must take scopes into account, as well as using-directives such as using namespace std;. In C++0x, you must even derive the type of the X variable from an expression like "auto X = 1 + 2;".

The question is: how can we do all that with regular expressions? The answer is: there is no way. Regular expressions are orthogonal to this task. You must either write a complicated type-deduction mechanism, that is, create a syntactic code analyzer, or use regular expressions without knowing the types of variables and expressions.

The conclusion is: if we use regular expressions to handle a C/C++ application, we do not know the types of variables and expressions. Note how severe this limitation is.

Diagnosis 3

V503. This is a nonsensical comparison: pointer < 0.

This rule is very simple: comparing a pointer with zero using < or > looks suspicious. For example:

CMeshBase *pMeshBase = getCutMesh(Idx);
if (pMeshBase < 0)
  return NULL;

Refer to the error description to learn how this code came about. To implement this diagnosis, we only need to know the type of the pMeshBase variable, and it was explained above why regular expressions cannot give us that. This diagnosis cannot be implemented relying on regular expressions.

Diagnosis 4

V504. It is highly probable that the semicolon ';' is missing after 'return' keyword.

void Foo();
void Foo2(int *ptr)
{
  if (ptr == NULL)
    return
  Foo();
  ...
}

We could well diagnose constructs of this type with regular expressions, but we would get too many false alarms. We are interested only in those cases where the function returns void. Well, we could find that out with regular expressions too, but it will not be very clear where the function starts and ends. Try to invent a regular expression that finds the start of a function yourself. Trust me, you will enjoy this task, especially once you realize that one could write something like this:

int Foo()
{
  ...
  char c[] = "void MyFoo(int x) {";
  ...
}

If we have a complete syntax tree with diverse information, everything becomes much simpler. This is how you can find out the return type of the function (the sample is taken right out of PVS-Studio):

SimpleType funcReturnType;
EFunctionReturnType fType;
if (!env->LookupFunctionReturnType(fType, funcReturnType))
  return;
if (funcReturnType != ST_VOID)
  return;

Diagnosis 5

V505. The 'alloca' function is used inside the loop. This can quickly overflow stack.

Yes, we could try to implement this rule relying on regular expressions.
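A text-level check would essentially have to count braces to decide whether a call sits inside a loop body. The minimal C++ sketch below illustrates why that is fragile; `NaiveBraceDepth` and `LexerAwareBraceDepth` are hypothetical helpers written for this illustration, not part of PVS-Studio. The second one skips comments and string/character literals, which already makes it the beginning of a lexer rather than a regular expression.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive: count every '{' and '}' before `pos`, as a purely textual tool would.
int NaiveBraceDepth(const std::string &code, std::size_t pos) {
    int depth = 0;
    for (std::size_t i = 0; i < pos && i < code.size(); ++i) {
        if (code[i] == '{') ++depth;
        else if (code[i] == '}') --depth;
    }
    return depth;
}

// Less naive: skip // and /* */ comments and "..." / '...' literals
// (honouring backslash escapes) before counting braces.
int LexerAwareBraceDepth(const std::string &code, std::size_t pos) {
    int depth = 0;
    const std::size_t end = pos < code.size() ? pos : code.size();
    std::size_t i = 0;
    while (i < end) {
        const char c = code[i];
        if (c == '/' && i + 1 < code.size() && code[i + 1] == '/') {
            while (i < end && code[i] != '\n') ++i;  // line comment
        } else if (c == '/' && i + 1 < code.size() && code[i + 1] == '*') {
            i += 2;                                  // block comment
            while (i < end && !(code[i] == '*' && i + 1 < code.size() && code[i + 1] == '/')) ++i;
            i += 2;                                  // skip "*/"
        } else if (c == '"' || c == '\'') {
            ++i;                                     // string or char literal
            while (i < end && code[i] != c) {
                if (code[i] == '\\') ++i;            // skip the escaped character
                ++i;
            }
            ++i;                                     // closing quote
        } else {
            if (c == '{') ++depth;
            else if (c == '}') --depth;
            ++i;
        }
    }
    return depth;
}
```

On the snippet below, at the _alloca call the naive counter reports a depth of 3 (it counts the brace in the comment and the one in the string literal), whereas the lexer-aware version reports the correct depth of 1. Even so, it still cannot tell which brace belongs to the for statement, and a loop body written without braces has no braces to count at all.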
But I wouldn't try to find out where the loop starts and ends, because one can think up so many funny situations with curly brackets in comments and strings:

{
  for (int i = 0; i < 10; i++)
  {
    //A cool comment. There you are { - try to solve it. :)
    char *x = "You must be careful here too {";
  }
  p = _alloca(10); // Are we inside the loop or not?
}

Diagnosis 6

V506. Pointer to local variable 'X' is stored outside the scope of this variable. Such a pointer will become invalid.

To detect these errors, we must handle variable scopes. We must also know the types of variables. This diagnosis cannot be implemented relying on regular expressions.

Diagnosis 7

V507. Pointer to local array 'X' is stored outside the scope of this array. Such a pointer will become invalid.

This diagnosis cannot be implemented relying on regular expressions.

Diagnosis 8

V508. The use of 'new type(n)' pattern was detected. Probably meant: 'new type[n]'.

It is good to detect misprints of this kind:

float *p = new float(10);

Everything looks simple, and it seems we could implement this diagnosis with regular expressions, provided we knew the type of the object being created. No way: change the text a little, and regular expressions become useless:

typedef float MyReal;
...
MyReal *p = new MyReal(10);

This diagnosis cannot be implemented relying on regular expressions.

Diagnosis 9
V509. The 'throw' operator inside the destructor should be placed within the try..catch block. Raising exception inside the destructor is illegal.

Yes, we could try to make this check with regular expressions. Destructors are usually small functions, and we will hardly run into trouble with curly brackets there. But you will have to sweat over regular expressions to find the destructor, its beginning and end, and to find out whether it contains a throw that is caught by a catch. Can you imagine the whole amount of work? Can you do something like that?

Well, I can. This is how I did it, in a very smart way, in PVS-Studio (the rule is given in full):

void ApplyRuleG_509(VivaWalker &walker, Environment *env,
  const Ptree *srcPtree)
{
  SimpleType returnType;
  EFunctionReturnType fType;
  bool res = env->LookupFunctionReturnType(fType, returnType);
  if (res == false || returnType != ST_UNKNOWN)
    return;
  if (fType != DESTRUCTOR)
    return;
  ptrdiff_t tryLevel = OmpUtil::GetLevel_TRY(env);
  if (tryLevel != -1)
    return;
  string error = VivaErrors::V509();
  walker.AddError(error, srcPtree, 509, DATE_1_SEP_2010(), Level_1);
}

Diagnosis 10

V510. The 'Foo' function is not expected to receive class-type variable as 'N' actual argument.

This rule concerns passing objects of std::string and similar classes as arguments to printf-like functions. We need types. That is, this diagnosis cannot be implemented relying on regular expressions either.

Summary
I hope I have made the situation with regular expressions, syntax trees and static code analysis clearer to you. Thank you for your attention. Once again, I invite you to download and try PVS-Studio. I would also appreciate your questions, but I do not intend to get into debates about what regular expressions can and cannot give us. It is not interesting. They do allow us to get a lot, but they do not allow us to get even more. C++ can be successfully parsed only with the mathematical apparatus of formal grammars.