64-bit computers have been around for a long time already. Most applications have 64-bit versions that benefit from larger memory capacity and improved performance thanks to the architectural capabilities of 64-bit processors. Developing 64-bit applications in C/C++ demands close attention from the programmer: there are a number of reasons why 32-bit code may fail to work properly when recompiled for a 64-bit platform. Plenty of articles cover that subject, so we will focus on something else. Let's find out whether the new features introduced in C++11 have made life any better and easier for programmers of 64-bit software.
Note. The article was originally published in Software Developer's Journal (April 25, 2014) and is published here with the editors' permission.
C++11 and 64-bit Issues
Author: Andrey Karpov
Date: 29.04.2014
The world of 64-bit errors
There are quite a few traps a 64-bit C/C++ programmer can fall into. Many articles have been published on
this subject, so we will not dwell on it. If you are not familiar with the specific aspects of 64-bit software
development or want to refresh your knowledge of it, consider the following resources:
• A Collection of Examples of 64-bit Errors in Real Programs;
• Lessons on development of 64-bit C/C++ applications;
• All about 64-bit programming in one place.
Meanwhile, time has moved on and brought us an updated and improved version of the C++
language named C++11. Modern compilers now support most of the innovations described in the C++11
language standard. Let's find out if these innovations can help programmers avoid 64-bit errors.
The article is organized in the following way. I will give a brief description of a typical 64-bit issue and
offer ways to avoid it by means of the C++11 language. Note that C++11 is not always helpful, so only
careful programming will protect you against making errors. The new standard provides additional aid,
but it will never be able to solve all of your troubles.
Magic numbers
I mean numbers like 4, 32, 0x7FFFFFFF, 0xFFFFFFFF. Programmers should never assume that the
pointer size is always 4 bytes, as it may result in the following incorrect code:
int **array = (int **)malloc(n * 4);
The C++11 standard has nothing to offer to handle such an error. Magic numbers are evil and should be
avoided whenever possible to prevent any errors related to them.
Note. True, malloc() is not from C++, it is from the good old C. It would be better to use the new
operator or the std::vector container here. But we won't touch upon that since it has nothing to do with
our subject, magic numbers.
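As a minimal sketch of the alternatives mentioned above, the allocation can take the element size from sizeof instead of a literal, or leave the size computation to std::vector altogether. The helper names AllocPointerTable and MakePointerTable are hypothetical and serve only as an illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: the element size comes from sizeof(int*), so the
// request is typically n * 4 bytes on a 32-bit target and n * 8 bytes on
// a 64-bit one. The caller releases the memory with std::free().
int** AllocPointerTable(std::size_t n) {
    return static_cast<int**>(std::malloc(n * sizeof(int*)));
}

// Hypothetical helper: std::vector computes the byte size itself and
// initializes every slot to a null pointer.
std::vector<int*> MakePointerTable(std::size_t n) {
    return std::vector<int*>(n, nullptr);
}
```

Neither variant relies on C++11 to fix the magic number; the point is only that the pointer size is never written out by hand.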
However, C++11 can actually help you use fewer magic numbers in certain cases. Programmers
sometimes use magic numbers because they are afraid (usually without reason) that the compiler will
not optimize the code properly. In this case, one should use generalized constant expressions
(constexpr).
The constexpr mechanism allows expressions to be evaluated during compilation. You can declare
functions which are guaranteed to be evaluated at compile time when they are called with constant
arguments in a context that requires a constant expression. For example:
constexpr int Formula(int a) {
  return a * 2 + 55;
}
constexpr int n = Formula(1);
The call of the Formula(1) function will turn into the number 57. Note that in C++11 the body of a
constexpr function must consist of a single return statement. The explanation is too short of course, so I
recommend that you check out the references at the end of the article to learn more about "constexpr"
and other innovations of C++11.
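To illustrate how constexpr can replace a magic number, here is a small sketch. The helper KiB and the buffer g_buffer are hypothetical names introduced for this example only. An array bound and a static_assert condition both require constant expressions, so in these two places the compiler has no choice but to evaluate the call itself.

```cpp
#include <cstddef>

// Hypothetical helper: converts a size in kilobytes to bytes.
// The body is a single return statement, as C++11 requires.
constexpr std::size_t KiB(std::size_t n) {
    return n * 1024;
}

// An array bound must be a constant expression, so KiB(64) is
// necessarily computed during compilation.
unsigned char g_buffer[KiB(64)];

// static_assert requires a constant expression as well.
static_assert(sizeof(g_buffer) == 65536, "KiB(64) should be 65536 bytes");
```

The expression KiB(64) documents the intent better than the literal 65536 and costs nothing at run time.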
Variadic functions
Here I mean the issues that occur when the functions printf, scanf and the like are used incorrectly.
For example:
size_t value = ....;
printf("%u", value);
This code works properly in the 32-bit version of the program but may print incorrect values when
recompiled as a 64-bit version, because "%u" expects an unsigned int while size_t is 64 bits wide there.
Variadic functions are vestiges of the C language. Their disadvantage is the absence of control over the
types of actual arguments. The time has come to drop them completely in modern C++. After all, there
are a number of other string formatting methods. For example, you can replace printf with cout, and
sprintf with boost::format or std::stringstream.
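The two options can be sketched as follows. The function names FormatWithPrintf and FormatWithStream are hypothetical. The first variant keeps printf-style formatting but uses the "%zu" specifier, which matches size_t; it comes from the C99 library that C++11 refers to, and some older C runtime libraries may not recognize it. The second variant lets the stream choose the right overload from the argument type.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Variant 1: printf-style formatting with the length modifier for size_t.
std::string FormatWithPrintf(std::size_t value) {
    char buffer[32];  // enough for a 64-bit value (at most 20 digits)
    std::snprintf(buffer, sizeof(buffer), "%zu", value);
    return std::string(buffer);
}

// Variant 2: the stream operator is selected from the argument type,
// so there is no format specifier that can get out of sync.
std::string FormatWithStream(std::size_t value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

Both functions produce the same text for any size_t value, whatever the width of size_t on the target platform.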
Things improved even more when the C++11 language appeared. It brought us variadic templates which
allow one to implement a safe version of the printf function:
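#include <iostream>
#include <stdexcept>

// Called when no arguments are left: any remaining format specifier is an error.
void printf(const char* s)
{
  while (s && *s) {
    if (*s=='%' && *++s!='%')
      throw std::runtime_error("invalid format: missing arguments");
    std::cout << *s++;
  }
}
// Prints the text up to the next format specifier, outputs 'value' in its place
// and passes the rest of the format string and the remaining arguments on.
template<typename T, typename... Args>
void printf(const char* s, T value, Args... args)
{
  while (s && *s) {
    if (*s=='%' && *++s!='%') {
      std::cout << value;
      return printf(++s, args...);
    }
    std::cout << *s++;
  }
  // The format string has ended but there are still arguments left.
  throw std::runtime_error("extra arguments provided to printf");
}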
This code simply "pulls out" the first argument which is not a format string and then calls itself
recursively. When there are no such arguments left, the first (simpler) version of the printf() function will
be called.
Args... defines what is called a "parameter pack". That's basically a sequence of 'type/value' pairs
from which you can "peel off" arguments starting with the first. When printf() is called with one
argument, the first definition (printf(const char*)) is chosen. When printf() is called with two or more
arguments, the second definition (printf(const char*, T value, Args... args)) is chosen, with the first
argument as s, the second as value, and the rest (if any) bundled into the 'args' parameter pack for
subsequent use. In the call
printf(++s, args...);
the 'args' parameter pack is expanded so that the next argument can now be selected as value. This
carries on until args is empty (so that the first version of printf() is called).
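The following sketch shows the same recursion applied to the size_t case from the beginning of this section. It is a variation made for illustration, not the code above: the names SafePrint and FormatCount are hypothetical, the output goes to a std::ostream so that the result can be inspected, and a guard is added so that a format string ending in '%' is not read past its terminator.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Base case: no arguments left, so any remaining specifier is an error.
inline void SafePrint(std::ostream& out, const char* s) {
    while (s && *s) {
        if (*s == '%' && *++s != '%')
            throw std::runtime_error("invalid format: missing arguments");
        out << *s++;
    }
}

// Recursive case: output the text up to the next specifier, print 'value'
// according to its real type, then continue with the remaining arguments.
template <typename T, typename... Args>
void SafePrint(std::ostream& out, const char* s, T value, Args... args) {
    while (s && *s) {
        if (*s == '%' && *++s != '%') {
            out << value;
            if (*s) ++s;  // skip the specifier letter, but not the terminator
            return SafePrint(out, s, args...);
        }
        out << *s++;
    }
    throw std::runtime_error("extra arguments provided");
}

// Hypothetical usage: the "%u" specifier is only a placeholder here, so a
// size_t argument is printed in full on both 32-bit and 64-bit platforms.
std::string FormatCount(std::size_t count) {
    std::ostringstream out;
    SafePrint(out, "count = %u items", count);
    return out.str();
}
```

Because the value is written with operator<<, the letter after '%' no longer has to agree with the argument type, which is exactly what goes wrong with the C printf.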
Incorrect shift operations
The numeric literal 1 has the int type, which means that it can't be shifted by more than 31 bits.
Programmers often forget about this and write incorrect code:
ptrdiff_t mask = 1 << bitNum;
If the bitNum value equals 40, for example, the consequences are unpredictable: formally, this is
undefined behavior.
What does C++11 have to offer to solve this issue? Unfortunately, nothing.
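Although the language offers no new protection here, the usual fix is simple: widen the literal before shifting. The sketch below uses a hypothetical helper BitMask and the std::uint64_t type so that it behaves the same on any platform; in 64-bit code the same pattern with static_cast<ptrdiff_t>(1) applies to the line above.

```cpp
#include <cstdint>

// Hypothetical helper: the literal is converted to a 64-bit type BEFORE
// the shift, so any bitNum in the range [0, 63] is well defined.
std::uint64_t BitMask(unsigned bitNum) {
    return static_cast<std::uint64_t>(1) << bitNum;
}
```

The cast must be applied to the left operand; casting the result of 1 << bitNum would be too late, because the shift would already have been done in the int type.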
Disparity between virtual functions
Assume we have a virtual function declared in a base class:
int A(DWORD_PTR x);
And the following function in the descendant class:
int A(DWORD x);
In a 32-bit version, the types DWORD_PTR and DWORD coincide. But they turn into two different types
in a 64-bit version. As a result, the function in the descendant class no longer overrides the one in the
base class, and calling the A function through the base class will lead to different results in the 32-bit
and 64-bit programs.
To avoid such errors, we can use the new keywords introduced in C++11.
Now we have the keyword override, which allows programmers to state explicitly that a function is
meant to override another one. It is only correct to declare a function with the override keyword when
there is a function to be overridden.
With it, the code will fail to compile in the 64-bit mode and therefore the error will be prevented:
struct X
{
virtual int A(DWORD_PTR) { return 1; }
};
struct Y : public X
{
int A(DWORD x) override { return 2; }
};
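For comparison, here is a sketch of the corrected hierarchy, where the signatures match and override is accepted on every platform. DWORD_PTR and DWORD are Windows types, so the example substitutes std::uintptr_t as a portable stand-in; that substitution, the names Base and Derived, and the helper CallThroughBase are assumptions made for illustration.

```cpp
#include <cstdint>

struct Base {
    virtual ~Base() = default;
    virtual int A(std::uintptr_t) { return 1; }
};

struct Derived : Base {
    // Same parameter type as in Base, so this really is an override.
    // Changing the parameter to std::uint32_t would make the compiler
    // reject the declaration wherever the two types differ.
    int A(std::uintptr_t) override { return 2; }
};

// Hypothetical helper: calls A through a reference to the base class.
int CallThroughBase(Base& object) {
    return object.A(std::uintptr_t{0});
}
```

With matching signatures, CallThroughBase reaches Derived::A regardless of the pointer size.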
Mixed arithmetic
This topic is pretty large and important, so I suggest that you study the corresponding section of the
"64-bit Lessons": Mixed arithmetic.
Let me just cite a couple of theses here:
1. Programmers tend to forget that the resulting value of a multiplication or addition of two
variables of the 'int' type will be also 'int', which may cause an overflow, and it doesn't matter
how this result is used after that.
2. It is unsafe to mix 32-bit and 64-bit data types as the consequences may be unpleasant ones:
incorrect conditions, infinite loops, etc.
A few simple examples of an overflow
char *p = new char[1024*1024*1024*5];
The programmer is trying to allocate 5 GBytes of memory, but the program will actually allocate much
less because the "1024*1024*1024*5" expression is of the int type. The multiplication overflows
(formally, signed overflow is undefined behavior), and in practice the expression typically evaluates to
1073741824 (1 GByte). After that, this value will be extended to the size_t type when being passed to
the 'new' operator, but it just won't matter (it will be too late).
If you still haven't grasped the idea, here is another example:
unsigned a = 1024, b = 1024, c = 1024, d = 5;
size_t n = a * b * c * d;
The expression's result is written into a variable of the 'size_t' type. It can store values larger than
UINT_MAX. However, when multiplying 'unsigned' variables, an overflow will occur and the result will
be incorrect.
Why do we refer to all these issues as 64-bit ones? The point is that you can't allocate an array larger
than 2 GBytes in a 32-bit program. It means that you will simply never see any overflows there. But in
64-bit applications handling larger memory amounts, these errors will reveal themselves.
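The remedy for such overflows does not depend on C++11: one operand has to be widened before the multiplication takes place. The sketch below contrasts the two computations; the function names are hypothetical, and fixed-width types are used so that the result is the same on any platform.

```cpp
#include <cstdint>

// The first operand is converted to 64 bits, so every multiplication
// in the chain is carried out in 64 bits.
std::uint64_t TotalBytes(unsigned a, unsigned b, unsigned c, unsigned d) {
    return static_cast<std::uint64_t>(a) * b * c * d;
}

// For contrast: the result is limited to 32 bits and therefore wraps
// around modulo 2^32, just like the 'unsigned' product in the text.
std::uint32_t TotalBytesWrapped(std::uint32_t a, std::uint32_t b,
                                std::uint32_t c, std::uint32_t d) {
    return a * b * c * d;
}
```

For the values 1024, 1024, 1024 and 5, the first function yields 5368709120, while the second one yields 1073741824.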
Now a couple of examples on comparison
size_t Count = BigValue;
for (unsigned Index = 0; Index < Count; ++Index)
{ ... }
In this fragment, an infinite loop will occur if Count > UINT_MAX. Suppose this code used to iterate
fewer times than UINT_MAX in the 32-bit version. But the 64-bit version can handle more data and
therefore may need more iterations. Since the values of the Index variable lie inside the range
[0..UINT_MAX], the "Index < Count" condition is always true thus leading to an infinite loop.
One more example:
string str = .....;
unsigned n = str.find("ABC");
if (n != string::npos)
This code is incorrect. The find() function returns a value of the string::size_type type. It will work
correctly in the 32-bit version, but let's see what will happen in the 64-bit one.
In the 64-bit program, string::size_type and unsigned do not coincide anymore. If the substring cannot
be found, the find() function will return the value string::npos which equals 0xFFFFFFFFFFFFFFFFui64.
This value is truncated to 0xFFFFFFFFu and is written into a 32-bit variable. The 0xFFFFFFFFu !=
0xFFFFFFFFFFFFFFFFui64 expression is calculated, and it turns out that the (n != string::npos) condition
is always true!
Can C++11 help in any way here?
The answer is both yes and no.
In some cases, the new keyword auto may be of use. And in some other cases, it will only confuse the
programmer. So let's figure out when it can and cannot be used.
If you declare "auto a = .....", the type will be deduced automatically. It is very important that you don't
get confused and write incorrect code such as "auto n = 1024*1024*1024*5;".
Now, a few words about the auto keyword. Take a look at this example:
auto x = 7;
In this case, the 'x' variable will have the 'int' type as it is the same type as that of the variable initializer.
In the general case, we can write the following code:
auto x = expression;
The type of the 'x' variable will be the same as that of the value the expression evaluates to.
The 'auto' keyword is most useful to get the type of a variable from its initializer when you don't know
the exact type of the expression or it is too complex to write manually. Take a look at the following
example:
template<class T> void printall(const vector<T>& v)
{
for (auto p = v.begin(); p!=v.end(); ++p)
cout << *p << "\n";
}
In C++98, you would have to write much longer code:
template<class T> void printall(const vector<T>& v)
{
for (typename vector<T>::const_iterator p = v.begin();
p!=v.end(); ++p)
cout << *p << "\n";
}
So, that's a very useful innovation of the C++11 language.
Let's get back to our problem. The "1024*1024*1024*5" expression has the 'int' type. That's why the
'auto' keyword will be useless in this case.
Neither will it help to deal with a loop like this:
size_t Count = BigValue;
for (auto Index = 0; Index < Count; ++Index)
Did we make it any better? No, we didn't. The number 0 is 'int', which means that the Index variable will
now become 'int' instead of 'unsigned'. I'd say it has become even worse.
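One C++11 feature that does fit such a loop is decltype, which takes the index type from the expression the index is compared with. This is a suggestion added for illustration rather than something taken from the examples above; the function CountNonZero is hypothetical.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical helper: counts the non-zero items of a vector.
std::size_t CountNonZero(const std::vector<int>& items) {
    // The index type is deduced from items.size(), so it is always as
    // wide as the value it is compared with.
    using IndexType = decltype(items.size());
    static_assert(std::is_same<IndexType, std::vector<int>::size_type>::value,
                  "the index has the container's size_type");

    std::size_t result = 0;
    for (IndexType index = 0; index < items.size(); ++index) {
        if (items[index] != 0)
            ++result;
    }
    return result;
}
```

Unlike "auto Index = 0", the declaration does not depend on the type of the literal 0, so the index cannot end up narrower than the loop bound.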
So is 'auto' of any use at all? Yes, it is. For example, in the following code:
string str = .....;
auto n = str.find("ABC");
if (n != string::npos)
The 'n' variable will have the 'string::size_type' type, and everything will be alright now.
We have made use of the 'auto' keyword at last. But be careful - you should understand very well what
you are doing and why. Don't strive to defeat all the errors related to mixed arithmetic by using 'auto'
everywhere you can. It's just one of the means to make things a bit easier, not a cure-all.
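A complete version of the search example might look like the sketch below. The function name Contains is hypothetical; the point is only that 'n' receives exactly the type returned by find().

```cpp
#include <string>

// Hypothetical helper: reports whether 'text' contains 'pattern'.
bool Contains(const std::string& text, const std::string& pattern) {
    // 'n' has the std::string::size_type type, so the comparison with
    // npos is performed without any truncation on any platform.
    auto n = text.find(pattern);
    return n != std::string::npos;
}
```

The function returns false when the substring is missing in both 32-bit and 64-bit builds, which the variant with 'unsigned n' fails to do in the 64-bit one.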
By the way, there is one more method to prevent type truncation in the example above:
unsigned n = str.find("ABC");
You can use a new variable initialization format which prevents type narrowing. The issue is that C and
C++ languages tend to implicitly truncate certain types:
int x = 7.3; // Oops!
void f(int);
f(7.3); // Oops!
However, C++11's list-initialization (initialization with braces) doesn't allow type narrowing:
int x0 {7.3}; //compilation error
int x1 = {7.3}; //compilation error
double d = 7;
int x2{d}; //compilation error
But the following example is of more interest to us right now:
size_t A = 1;
unsigned X = A;
unsigned Y(A);
unsigned Q = { A }; //compilation error
unsigned W { A }; //compilation error
Imagine the code is written like this:
unsigned n = { str.find("ABC") };
or this
unsigned n{str.find("ABC")};
This code will compile correctly in the 32-bit mode but will fail in the 64-bit mode.
Again, it's not the cure-all; it's just another way to write safer programs.
Address arithmetic
It's pretty similar to what we discussed in the "Mixed arithmetic" section. The only difference is that
overflows occur when working with pointers.
For example:
float Region::GetCell(int x, int y, int z) const {
return array[x + y * Width + z * Width * Height];
}
This fragment is taken from a real-life program for mathematical simulation, for which the amount of
memory is a crucial resource. In order to save memory, such applications often use one-dimensional
arrays which are then handled as three-dimensional ones. There are special functions similar to GetCell
for the programmer to access the required elements. But the code fragment above will only correctly
handle those arrays which consist of fewer than INT_MAX items because 32-bit int types are used to
calculate the item indexes.
Can C++11 help us with this one? No.
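The fix itself is the classic one: perform the index computation in a pointer-sized type. Below is a sketch under the assumption that the index is computed by a free function; CellIndex and its parameters are hypothetical names, not part of the original program.

```cpp
#include <cstddef>

// Hypothetical free-function version of the index computation.
std::ptrdiff_t CellIndex(int x, int y, int z, int width, int height) {
    // Each product starts from a ptrdiff_t operand, so no intermediate
    // result is computed in the 32-bit int type.
    return x
         + static_cast<std::ptrdiff_t>(y) * width
         + static_cast<std::ptrdiff_t>(z) * width * height;
}
```

In a 64-bit build, CellIndex(0, 0, 1, 65536, 65536) yields 4294967296, a value that the original expression cannot represent.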
Changing an array type and pointer packing
It is sometimes necessary (or just convenient) to represent array items as items of a different type.
It may also be convenient to store pointers in integer variables.
You may face issues here when using incorrect explicit type conversions. The new C++11 standard
can't help with that - programmers have always used explicit type conversions at their own risk.
Handling data stored in unions should also be mentioned. Such data handling is low-level, and
its results also depend solely on the programmer's skills and knowledge.
Serialization and data exchange
Sometimes you may need to create a compatible data format in your project - that is, one data set must
be handled by both the 32-bit and 64-bit versions of the program. The issue is that sizes of some data
types may change.
The C++11 standard has made life a bit easier by offering types of a fixed size (declared in the <cstdint>
header). Before that, programmers had to declare such types manually or employ ones from the system
libraries.
Now we have the following types with a fixed size:
• int8_t
• int16_t
• int32_t
• int64_t
• uint8_t
• uint16_t
• uint32_t
• uint64_t
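As a sketch of how these types are used, here is a record header whose layout does not depend on the platform's int, long or size_t. The structure RecordHeader and its field names are hypothetical. Note that fixed-width types settle the sizes only; byte order still has to be agreed upon separately.

```cpp
#include <cstdint>

// Hypothetical header of a record in a data file shared by the 32-bit
// and 64-bit versions of a program.
struct RecordHeader {
    std::uint32_t magic;        // format signature
    std::uint32_t version;      // format version
    std::uint64_t payloadSize;  // 64 bits even in a 32-bit build
};

// The layout is checked at compile time: 4 + 4 + 8 bytes, no padding.
static_assert(sizeof(RecordHeader) == 16, "unexpected layout of RecordHeader");
```

Had payloadSize been declared as size_t, the structure would occupy 12 bytes in a typical 32-bit build and 16 bytes in a 64-bit one, and the two versions could not read each other's files.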
Besides the type sizes, the data alignment is also subject to change, which may cause some trouble as
well.
In connection with this, we should also mention the new keyword 'alignas' introduced in C++11. Now
you can write the following code:
// an array of characters aligned to store double types
alignas(double) unsigned char c[1024];
// alignment on the 16-byte boundary
alignas(16) char buffer[100];
There is also the 'alignof' operator, which returns the alignment of its argument (which must be a
type). For example:
constexpr int n = alignof(int);
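The two facilities can be combined as in the sketch below. The names Vec4, g_storage and IsAligned are hypothetical and serve only to show that the requested alignment can be verified both at compile time and at run time.

```cpp
#include <cstddef>
#include <cstdint>

// A structure that must start on a 16-byte boundary on every platform.
struct alignas(16) Vec4 {
    float v[4];
};
static_assert(alignof(Vec4) == 16, "Vec4 must be 16-byte aligned");

// Raw storage suitably aligned to hold a double.
alignas(double) unsigned char g_storage[sizeof(double)];

// Hypothetical helper: checks the address of an object at run time.
bool IsAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Stating the alignment explicitly keeps the layout the same in 32-bit and 64-bit builds instead of leaving it to the default rules of each platform.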
Overloaded functions
When porting a 32-bit program to the 64-bit platform, you may discover that its execution logic has
changed because of overloaded functions in your code. If a function is overloaded for 32-bit and 64-bit
values, a call with an argument of, say, the size_t type will resolve to different overloads on different
platforms.
I can't say for sure if any innovations of the C++11 language can help solve these issues.
Type size checks
There are cases when you need to check the sizes of data types. It may be necessary to make sure you
won't get a buggy program after recompiling the code for a new platform.
Programmers often do it incorrectly, for example:
assert(sizeof(unsigned) < sizeof(size_t));
assert(sizeof(short) == 2);
It's a bad idea to do it like that. First, the program will compile anyway. Second, these checks will only
make sense in the debug version.
Instead, it is better to terminate compilation if the necessary conditions prove false. There are a lot
of ways to do that. For instance, developers working in Visual Studio can use the _STATIC_ASSERT
macro. For example:
_STATIC_ASSERT(sizeof(int) == sizeof(long));
C++11 offers a standard mechanism to terminate compilation if things go wrong: static assertions.
Static assertions (compile-time-assertions) contain a constant expression and a string literal:
static_assert(expression, string);
The compiler calculates the expression and outputs a string as an error message if the calculation result
is false (i.e. the assertion is violated). For example:
static_assert(sizeof(size_t)>=8,
"64-bit code generation required for this library.");
struct S { X m1; Y m2; };
static_assert(sizeof(S)==sizeof(X)+sizeof(Y),
"unexpected padding in S");
Conclusion
Extensive use of the C++11 language's new constructs in your code doesn't guarantee that you will avoid
64-bit errors. However, the language does offer a number of useful features to help make your code
shorter and safer.
References
We didn't aim at familiarizing the readers with as many innovations of the C++11 language as possible in
this article. To get started with the new standard, please consider the following resources:
1. Bjarne Stroustrup. C++11 - the new ISO C++ standard.
2. Wikipedia. C++11.
3. Scott Meyers. An Effective C++11/14 Sampler.