
精通
英语
和
开源
,
擅长
开发
与
培训
,
胸怀四海
第一信赖
锐英源精品开源,禁止转载和任何形式的非法内容使用,违者必究
Buffer overflows, both on the stack and on the heap, are a major source of security vulnerabilities in C, Objective-C, and C++ code. This chapter discusses coding practices that will avoid buffer overflow and underflow problems, lists tools you can use to detect buffer overflows, and provides samples illustrating safe code.
Every time your program solicits input (whether from a user, from a file, over a network, or by some other means), there is a potential to receive inappropriate data. For example, the input data might be longer than what you have reserved room for in memory.
When the input data is longer than will fit in the reserved space, if you do not truncate it, that data will overwrite other data in memory. When this happens, it is called a buffer overflow. If the memory overwritten contained data essential to the operation of the program, this overflow causes a bug that, being intermittent, might be very hard to find. If the overwritten data includes the address of other code to be executed and the user has done this deliberately, the user can point to malicious code that your program will then execute.
栈和堆上的缓冲区溢出是C、Objective-C和C ++代码中安全漏洞的主要来源。本章将讨论避免缓冲区溢出和下溢问题的编码实践,列出可用于检测缓冲区溢出的工具,并提供说明安全代码的示例。
每次你的程序请求输入(无论是来自用户,来自文件,通过网络还是通过其他方式),都有可能接收到不适当的数据。例如,输入数据可能比您在内存中保留的空间长。
当输入数据长于适合保留空间的数据时,如果不截断,则该数据将覆盖内存中的其他数据。发生这种情况时,称为缓冲区溢出。如果覆盖的内存包含对程序操作至关重要的数据,则此溢出会导致一个间歇性的错误,可能很难找到。如果覆盖的数据包括要执行的其他代码的地址,并且用户故意这样做,则用户可以指向恶意代码,您的程序随后将执行这些代码。
Similarly, when the input data is or appears to be shorter than the reserved space (due to erroneous assumptions, incorrect length values, or copying raw data as a C string), this is called a buffer underflow. This can cause any number of problems from incorrect behavior to leaking data that is currently on the stack or heap.
Although most programming languages check input against storage to prevent buffer overflows and underflows, C, Objective-C, and C++ do not. Because many programs link to C libraries, vulnerabilities in standard libraries can cause vulnerabilities even in programs written in “safe” languages. For this reason, even if you are confident that your code is free of buffer overflow problems, you should limit exposure by running with the least privileges possible. See Elevating Privileges Safely for more information on this topic.
类似地,当输入数据显示或看起来比保留空间短时(由于错误的假设,不正确的长度值或将原始数据复制为C字符串),这称为缓冲区下溢。这可能导致从错误行为到当前堆栈或堆上的数据泄漏的任何问题。
虽然大多数编程语言检查输入是否存储以防止缓冲区溢出和下溢,但C,Objective-C和C ++却没有。由于许多程序链接到C库,因此即使在使用“安全”语言编写的程序中,标准库中的漏洞也可能导致漏洞。因此,即使您确信您的代码没有缓冲区溢出问题,您也应该通过尽可能少的权限运行来限制曝光。有关此主题的更多信息,请参阅
Keep in mind that obvious forms of input, such as strings entered through dialog boxes, are not the only potential source of malicious input. For example:
Buffer overflows in one operating system’s help system could be caused by maliciously prepared embedded images.
A commonly-used media player failed to validate a specific type of audio files, allowing an attacker to execute arbitrary code by causing a buffer overflow with a carefully crafted audio file.
请记住,明显的输入形式(例如通过对话框输入的字符串)不是恶意输入的唯一潜在来源。例如:
一个操作系统的帮助系统中的缓冲区溢出可能是由恶意准备的嵌入式映像引起的。
常用的媒体播放器无法验证特定类型的音频文件,允许攻击者通过精心制作的音频文件导致缓冲区溢出来执行任意代码。
[1CVE-2006-1591 2CVE-2006-1370]
There are two basic categories of overflow: stack overflows and heap overflows. These are described in more detail in the sections that follow.溢出有两种基本类别:堆栈溢出和堆溢出。这些将在后面的章节中详细介绍。
In most operating systems, each application has a stack (and multithreaded applications have one stack per thread). This stack contains storage for locally scoped data.
The stack is divided into units called stack frames. Each stack frame contains all data specific to a particular call to a particular function. This data typically includes the function’s parameters, the complete set of local variables within that function, and linkage information—that is, the address of the function call itself, where execution continues when the function returns). Depending on compiler flags, it may also contain the address of the top of the next stack frame. The exact content and order of data on the stack depends on the operating system and CPU architecture.
在大多数操作系统中,每个应用程序都有一个堆栈(多线程应用程序每个线程有一个堆栈)。此堆栈包含用于本地范围数据的存储。
堆栈分为称为堆栈帧的单元。每个堆栈帧包含特定于特定函数的特定调用的所有数据。此数据通常包括函数的参数,该函数中的完整局部变量集以及链接信息 - 即函数调用本身的地址,其中函数返回时继续执行)。根据编译器标志,它还可能包含下一个堆栈帧顶部的地址。堆栈上数据的确切内容和顺序取决于操作系统和CPU架构。
Each time a function is called, a new stack frame is added to the top of the stack. Each time a function returns, the top stack frame is removed. At any given point in execution, an application can only directly access the data in the topmost stack frame. (Pointers can get around this, but it is generally a bad idea to do so.) This design makes recursion possible because each nested call to a function gets its own copy of local variables and parameters.
每次调用函数时,都会在堆栈顶部添加一个新的堆栈帧。每次函数返回时,顶部堆栈帧都会被删除。在执行的任何给定点,应用程序只能直接访问最顶层堆栈帧中的数据。(指针可以解决这个问题,但这样做通常是一个坏主意。)这种设计使递归成为可能,因为对函数的每个嵌套调用都有自己的局部变量和参数副本。
Figure 2-1 illustrates the organization of the stack. Note that this figure is schematic only; the actual content and order of data put on the stack depends on the architecture of the CPU being used. See OS X ABI Function Call Guide for descriptions of the function-calling conventions used in all the architectures supported by macOS.图2-1说明了堆栈的组织。请注意,此图仅为示意图; 放在堆栈上的实际内容和数据顺序取决于所使用的CPU的体系结构。有关macOS支持的所有体系结构中使用的函数调用约定的说明
Figure 2-1 Schematic view of the stackIn general, an application should check all input data to make sure it is appropriate for the purpose intended (for example, making sure that a filename is of legal length and contains no illegal characters). Unfortunately, in many cases, programmers do not bother, assuming that the user will not do anything unreasonable.
This becomes a serious problem when the application stores that data into a fixed-size buffer. If the user is malicious (or opens a file that contains data created by someone who is malicious), he or she might provide data that is longer than the size of the buffer. Because the function reserves only a limited amount of space on the stack for this data, the data overwrites other data on the stack.
通常,应用程序应检查所有输入数据以确保它适用于预期目的(例如,确保文件名具有合法长度且不包含非法字符)。不幸的是,在许多情况下,程序员不会打扰,假设用户不会做任何不合理的事情。
当应用程序将该数据存储到固定大小的缓冲区时,这将成为一个严重的问题。如果用户是恶意的(或打开包含由恶意某人创建的数据的文件),他或她可能会提供长度超过缓冲区大小的数据。由于该函数仅在堆栈上为此数据保留有限的空间,因此数据会覆盖堆栈上的其他数据。
As shown in Figure 2-2, a clever attacker can use this technique to overwrite the return address used by the function, substituting the address of other code. Then, when function C completes execution, rather than returning to function B, it jumps to the attacker’s code.
Because the application executes the attacker’s code, the attacker’s code inherits the user’s permissions. If the user is logged on as an administrator, the attacker can take complete control of the computer, reading data from the disk, sending emails, and so forth. (Note that apps subject to sandboxing, which includes any app in iOS, as well as any for which you adopt App Sandbox in macOS, are unable to take complete control of the device, although an attacker can still access the app’s own data this way.)
如图2-2所示,聪明的攻击者可以使用此技术覆盖函数使用的返回地址,替换其他代码的地址。然后,当函数C完成执行,而不是返回到函数B时,它会跳转到攻击者的代码。
由于应用程序执行攻击者的代码,因此攻击者的代码会继承用户的权限。如果用户以管理员身份登录,则攻击者可以完全控制计算机,从磁盘读取数据,发送电子邮件等。(请注意,受沙盒影响的应用程序,包括iOS中的任何应用程序,以及您在macOS中采用App Sandbox的任何应用程序,都无法完全控制该设备,尽管攻击者仍然可以通过这种方式访问应用程序自己的数据。)
Figure 2-2 Stack after malicious buffer overflowIn addition to attacks on the linkage information, an attacker can also alter program operation by modifying local data and function parameters on the stack. For example, instead of connecting to the desired host, the attacker could modify a data structure so that your application connects to a different (malicious) host.除了对链接信息的攻击之外,攻击者还可以通过修改堆栈上的本地数据和函数参数来改变程序操作。例如,攻击者可以修改数据结构,而不是连接到所需的主机,以便您的应用程序连接到不同的(恶意)主机。
The heap is used for all dynamically allocated memory in your application. When you use malloc, the C++ new operator, or equivalent functions to allocate a block of memory or instantiate an object, the memory that backs those pointers is allocated on the heap.
Because the heap is used to store data but is not used to store the return address value of functions and methods, and because the data on the heap changes in a nonobvious way as a program runs, it is less obvious how an attacker can exploit a buffer overflow on the heap. To some extent, it is this nonobviousness that makes heap overflows an attractive target—programmers are less likely to worry about them and defend against them than they are for stack overflows.
堆用于应用程序中所有动态分配的内存。当您使用malloc C++ new运算符或等效函数来分配内存块或实例化对象时,支持这些指针的内存将在堆上分配。
因为堆用于存储数据但不用于存储函数和方法的返回地址值,并且由于堆上的数据在程序运行时以非显而易见的方式发生变化,因此攻击者如何利用堆上的缓冲区溢出,它不太明显。在某种程度上,正是这种非显而易见性使得堆溢出成为一个有吸引力的目标 - 程序员不太可能担心它们并且防御它们而不是堆栈溢出。
Figure 2-1 illustrates a heap overflow overwriting a pointer.
Figure 2-3 Heap overflowIn general, exploiting a buffer overflow on the heap is more challenging than exploiting an overflow on the stack. However, many successful exploits have involved heap overflows. There are two ways in which heap overflows are exploited: by modifying data and by modifying objects.
An attacker can exploit a buffer overflow on the heap by overwriting critical data, either to cause the program to crash or to change a value that can be exploited later (overwriting a stored user ID to gain additional access, for example). Modifying this data is known as a non-control-data attack. Much of the data on the heap is generated internally by the program rather than copied from user input; such data can be in relatively consistent locations in memory, depending on how and when the application allocates it.
通常,利用堆上的缓冲区溢出比利用堆栈上的溢出更具挑战性。但是,许多成功的漏洞都涉及堆溢出。有两种方法可以利用堆溢出:通过修改数据和修改对象。
攻击者可以通过覆盖关键数据来利用堆上的缓冲区溢出,导致程序崩溃或更改可以在以后利用的值(例如,覆盖存储的用户ID以获得额外的访问权限)。修改此数据称为非控制数据攻击。堆上的大部分数据都是由程序在内部生成的,而不是从用户输入中复制的; 这些数据可以在内存中相对一致的位置,具体取决于应用程序分配它的方式和时间。
An attacker can also exploit a buffer overflow on the heap by overwriting pointers. In many languages such as C++ and Objective-C, objects allocated on the heap contain tables of function and data pointers. By exploiting a buffer overflow to change such pointers, an attacker can potentially substitute different data or even replace the instance methods in a class object.攻击者还可以通过覆盖指针来利用堆上的缓冲区溢出。在诸如C ++和Objective-C的许多语言中,在堆上分配的对象包含函数和数据指针的表。通过利用缓冲区溢出来更改此类指针,攻击者可以替换不同的数据,甚至可以替换类对象中的实例方法。
Exploiting a buffer overflow on the heap might be a complex, arcane problem to solve, but some malicious hackers thrive on just such challenges. For example:
A heap overflow in code for decoding a bitmap image allowed remote attackers to execute arbitrary code.
A heap overflow vulnerability in a networking server allowed an attacker to execute arbitrary code by sending an HTTP POST request with a negative “Content-Length” header.
利用堆上的缓冲区溢出可能是一个复杂,棘手的问题需要解决,但是一些恶意黑客就是在这些挑战中茁壮成长。例如:
用于解码位图图像的代码中的堆溢出允许远程攻击者执行任意代码。
网络服务器中的堆溢出漏洞允许攻击者通过发送带有负“Content-Length”标头的HTTP POST请求来执行任意代码
[1CVE-2006-0006 2CVE-2005-3655]
Strings are a common form of input. Because many string-handling functions have no built-in checks for string length, strings are frequently the source of exploitable buffer overflows. Figure 2-4 illustrates the different ways three string copy functions handle the same over-length string.字符串是一种常见的输入形式。因为很多字符串处理函数没有内置的字符串长度检查,字符串通常是可利用的缓冲区溢出的来源。图2-4说明了三个字符串复制函数处理相同超长字符串的不同方式。
Figure 2-4 C string handling functions and buffer overflowsAs you can see, the strcpy function merely writes the entire string into memory, overwriting whatever came after it.
The strncpy function truncates the string to the correct length, but without the terminating null character. When this string is read, then, all of the bytes in memory following it, up to the next null character, might be read as part of the string. Although this function can be used safely, it is a frequent source of programmer mistakes, and thus is regarded as moderately unsafe. To safely use strncpy, you must either explicitly zero the last byte of the buffer after calling strncpy or pre-zero the buffer and then pass in a maximum length that is one byte smaller than the buffer size.
Only the strlcpy function is fully safe, truncating the string to one byte smaller than the buffer size and adding the terminating null character.
Table 2-1 summarizes the common C string-handling routines to avoid and which to use instead.
如您所见,该strcpy函数仅将整个字符串写入内存,覆盖后面的内容。
strncpy函数将字符串截断为正确的长度,但没有终止空字符。当读取该字符串时,跟随它的内存中的所有字节,直到下一个空字符,都可能被读作字符串的一部分。虽然这个功能可以安全使用,但它常常是程序员错误的根源,因此被认为是中度不安全的。要安全使用strncpy,必须在调用缓冲区之后显式清零缓冲区的最后一个字节,strncpy然后传入一个比缓冲区大小小一个字节的最大长度。
只有strlcpy函数是完全安全的,将字符串截断为小于缓冲区大小的一个字节并添加终止空字符。
表2-1总结了要避免和使用的常见C字符串处理例程。
Don’t use these functions |
Use these instead |
---|---|
fgets (see note) or use Core Foundation or Foundation APIs |
Security Note for snprintf and vsnprintf: The functions snprintf, vsnprintf, and variants are dangerous if used incorrectly. Although they do behave functionally like strlcat and similar in that they limit the bytes written to n-1, the length returned by these functions is the length that would have been printed if n were infinite.
For this reason, you must not use this return value to determine where to null-terminate the string or to determine how many bytes to copy from the string at a later time.snprintf和vsnprintf安全注意事项: 如果使用不当,snprintf、vsnprintf和变量是很危险的。虽然它们的功能strlcat和功能相似,但它们限制了写入的字节数为n-1,如果n是无限时,但这些函数返回的长度将是纸面长度。
因此,您不得使用此返回值来确定将字符串空终止的位置,或确定稍后从字符串中复制的字节数。
Security Note for fgets: Although the fgets function provides the ability to read a limited amount of data, you must be careful when using it. Like the other functions in the “safer” column, fgets always terminates the string. However, unlike the other functions in that column, it takes a maximum number of bytes to read, not a buffer size.
In practical terms, this means that you must always pass a size value that is one fewer than the size of the buffer to leave room for the null termination. If you do not, the fgets function will dutifully terminate the string past the end of your buffer, potentially overwriting whatever byte of data follows it.
fgets的安全说明: 虽然该fgets功能提供了读取有限数量数据的功能,但使用它时必须小心。与“更安全”列中的其他函数一样,fgets始终终止字符串。但是,与该列中的其他函数不同,它需要读取最大字节数,而不是缓冲区大小。
实际上,这意味着您必须始终传递一个小于缓冲区大小的大小值,以便为空终止留出空间。如果不这样做,该fgets函数将尽职地终止缓冲区末尾的字符串,可能会覆盖其后的任何数据字节。
You can also avoid string handling buffer overflows by using higher-level interfaces.
If you are using C++, the ANSI C++ string class avoids buffer overflows, though it doesn’t handle non-ASCII encodings (such as UTF-8).
If you are writing code in Objective-C, use the NSString class. Note that an NSString object has to be converted to a C string in order to be passed to a C routine, such as a POSIX function.
If you are writing code in C, you can use the Core Foundation representation of a string, referred to as a CFString, and the string-manipulation functions in the CFString API.
您还可以通过使用更高级别的接口来避免字符串处理缓冲区溢出。
如果您使用的是C ++,则ANSI C ++ string类可以避免缓冲区溢出,但它不处理非ASCII编码(例如UTF-8)。
如果要在Objective-C中编写代码,请使用NSString该类。请注意,必须将NSString对象转换为C字符串才能传递给C例程,例如POSIX函数。
如果您使用C编写代码,则可以使用字符串的Core Foundation表示(称为CFString)和CFString API中的字符串操作函数。
The Core Foundation CFString is “toll-free bridged” with its Cocoa Foundation counterpart, NSString. This means that the Core Foundation type is interchangeable in function or method calls with its equivalent Foundation object. Therefore, in a method where you see an NSString * parameter, you can pass in a value of type CFStringRef, and in a function where you see a CFStringRef parameter, you can pass in an NSString instance. This also applies to concrete subclasses of NSString.
See CFString Reference and Foundation Framework Reference for more details on using these representations of strings and on converting between CFString objects and NSString objects.
Core Foundation CFString是“免费桥接”,与其对应的Cocoa Foundation类是NSString。这意味着Core Foundation类型在函数或方法调用中可与其等效的Foundation对象互换。因此,在您看到NSString *参数的方法中,您可以传入type的值CFStringRef,并且在您看到CFStringRef参数的函数中,您可以传入NSString实例。这也适用于具体的子类NSString。
有关使用这些字符串表示以及在CFString对象和对象之间进行转换的更多详细信息,请参阅CFString参考和Foundation Framework参考NSString。
When working with fixed-length buffers, you should always use sizeof to calculate the size of a buffer, and then make sure you don’t put more data into the buffer than it can hold. Even if you originally assigned a static size to the buffer, either you or someone else maintaining your code in the future might change the buffer size but fail to change every case where the buffer is written to.
The first example, Table 2-2, shows two ways of allocating a character buffer 1024 bytes in length, checking the length of an input string, and copying it to the buffer.
使用固定长度的缓冲区时,应始终使用sizeof计算缓冲区的大小,然后确保不要将更多数据放入缓冲区而不是保存缓冲区。即使您最初为缓冲区分配了静态大小,您或将来维护代码的其他人也可能会更改缓冲区大小,但无法更改写入缓冲区的每种情况。
第一个示例(表2-2)显示了两种方法:分配长度为1024字节的字符缓冲区,检查输入字符串的长度,并将其复制到缓冲区。
Instead of this: |
Do this: |
---|---|
char buf[1024]; ... if (size <= 1023) { ... } |
#define BUF_SIZE 1024 ... char buf[BUF_SIZE]; ... if (size < BUF_SIZE) { ... } |
char buf[1024]; ... if (size < 1024) { ... } |
char buf[1024]; ... if (size < sizeof(buf)) { ... } |
The two snippets on the left side are safe as long as the original declaration of the buffer size is never changed. However, if the buffer size gets changed in a later version of the program without changing the test, then a buffer overflow will result.
The two snippets on the right side show safer versions of this code. In the first version, the buffer size is set using a constant that is set elsewhere, and the check uses the same constant. In the second version, the buffer is set to 1024 bytes, but the check calculates the actual size of the buffer. In either of these snippets, changing the original size of the buffer does not invalidate the check.
只要缓冲区大小的原始声明永远不会更改,左侧的两个片段就是安全的。但是,如果在不更改测试的情况下在更高版本的程序中更改缓冲区大小,则会导致缓冲区溢出。
右侧的两个片段显示了此代码的更安全版本。在第一个版本中,缓冲区大小使用在其他位置设置的常量设置,并且检查使用相同的常量。在第二个版本中,缓冲区设置为1024字节,但检查计算缓冲区的实际大小。在其中任何一个片段中,更改缓冲区的原始大小不会使检查无效。
Table 2-3, shows a function that adds an .ext suffix to a filename.表2-3显示了为文件名添加.ext后缀的函数。
Instead of this: |
Do this: |
---|---|
{ char file[MAX_PATH]; ... addsfx(file); ... } static *suffix = ".ext"; char *addsfx(char *buf) { return strcat(buf, suffix); } |
{ char file[MAX_PATH]; ... addsfx(file, sizeof(file)); ... } static *suffix = ".ext"; size_t addsfx(char *buf, uint size) { size_t ret = strlcat(buf, suffix, size); if (ret >= size) { fprintf(stderr, "Buffer too small....\n"); } return ret; } |
Both versions use the maximum path length for a file as the buffer size. The unsafe version in the left column assumes that the filename does not exceed this limit, and appends the suffix without checking the length of the string. The safer version in the right column uses the strlcat function, which truncates the string if it exceeds the size of the buffer.两个版本都使用文件的最大路径长度作为缓冲区大小。左列中的不安全版本假定文件名不超过此限制,并附加后缀而不检查字符串的长度。右列中更安全的版本使用该strlcat函数,如果超过缓冲区大小,则会截断该字符串
Important: You should always use an unsigned variable (such as size_t) when calculating sizes of buffers and of data going into buffers. Because negative numbers are stored as large positive numbers, if you use signed variables, an attacker might be able to cause a miscalculation in the size of the buffer or data by writing a large number to your program. See Avoiding Integer Overflows and Underflows for more information on potential problems with integer arithmetic.
重要提示:size_t在计算缓冲区大小和进入缓冲区的数据时, 应始终使用无符号变量(例如)。因为负数存储为大的正数,所以如果使用有符号变量,攻击者可能会通过向程序写入一个大数字来导致错误估计缓冲区或数据的大小。有关整数算术可能存在的问题的详细信息,请参阅避免整数溢出和下溢。
For a further discussion of this issue and a list of more functions that can cause problems, see Wheeler, Secure Programming HOWTO (http://www.dwheeler.com/secure-programs/).
If the size of a buffer is calculated using data supplied by the user, there is the potential for a malicious user to enter a number that is too large for the integer data type, which can cause program crashes and other problems.
In two’s-complement arithmetic (used for signed integer arithmetic by most modern CPUs), a negative number is represented by inverting all the bits of the binary number and adding 1. A 1 in the most-significant bit indicates a negative number. Thus, for 4-byte signed integers, 0x7fffffff = 2147483647, but 0x80000000 = -2147483648
Therefore,
如果使用用户提供的数据计算缓冲区的大小,则恶意用户可能输入对于整数数据类型而言太大的数字,这可能导致程序崩溃和其他问题。
在 二进制补码算法(用于大多数现代CPU的有符号整数运算),a 通过反转二进制数的所有位并加1来1表示负数。最高有效位中的A表示负数。因此,对于4字节有符号整数0x7fffffff = 2147483647,但是0x80000000 = -2147483648
因此,
int 2147483647 + 1 = - 2147483648
If a malicious user specifies a negative number where your program is expecting only unsigned numbers, your program might interpret it as a very large number. Depending on what that number is used for, your program might attempt to allocate a buffer of that size, causing the memory allocation to fail or causing a heap overflow if the allocation succeeds. In an early version of a popular web browser, for example, storing objects into a JavaScript array allocated with negative size could overwrite memory. [CVE-2004-0361]如果恶意用户指定了一个负数,而您的程序只需要无符号数,那么您的程序可能会将其解释为非常大的数字。根据该数字的用途,您的程序可能会尝试分配该大小的缓冲区,导致内存分配失败或在分配成功时导致堆溢出。例如,在流行的Web浏览器的早期版本中,将对象存储到以负大小分配的JavaScript数组中可能会覆盖内存。[CVE-2004-0361]
In other cases, if you use signed values to calculate buffer sizes and test to make sure the data is not too large for the buffer, a sufficiently large block of data will appear to have a negative size, and will therefore pass the size test while overflowing the buffer.在其他情况下,如果使用有符号值来计算缓冲区大小并进行测试以确保数据对于缓冲区而言不是太大,则足够大的数据块将显示为负大小,因此将通过大小测试溢出缓冲区。
Depending on how the buffer size is calculated, specifying a negative number could result in a buffer too small for its intended use. For example, if your program wants a minimum buffer size of 1024 bytes and adds to that a number specified by the user, an attacker might cause you to allocate a buffer smaller than the minimum size by specifying a large positive number, as follows:根据缓冲区大小的计算方式,指定负数可能会导致缓冲区太小而无法达到预期用途。例如,如果程序希望最小缓冲区大小为1024字节并将其添加到用户指定的数字,则攻击者可能会通过指定大的正数来分配小于最小大小的缓冲区,如下所示:
1024 + 4294966784 = 512
0x400 + 0xFFFFFE00 = 0x200
Also, any bits that overflow past the length of an integer variable (whether signed or unsigned) are dropped. For example, when stored in a 32-bit integer, 2**32 == 0. Because it is not illegal to have a buffer with a size of 0, and because malloc(0) returns a pointer to a small block, your code might run without errors if an attacker specifies a value that causes your buffer size calculation to be some multiple of 2**32. In other words, for any values of n and m where (n * m) mod 2**32 == 0, allocating a buffer of size n*m results in a valid pointer to a buffer of some very small (and architecture-dependent) size. In that case, a buffer overflow is assured.
To avoid such problems, when performing buffer math, you should always include range checks to make sure no integer overflow is about to occur.
A common mistake when performing these tests is to check the result of a potentially overflowing multiplication or other operation:
此外,任何溢出超过整数变量长度的位(无论是有符号还是无符号)都将被删除。例如,当存储在32位整数中时, 2**32 == 0。因为拥有大小为0的缓冲区并不是非法,并且因为malloc(0)返回指向小块的指针,所以如果攻击者指定的值导致缓冲区大小计算为某个倍数,则代码可能会无错误地运行2**32。换句话说,对于任何值n和mwhere (n * m) mod 2**32 == 0,分配大小的缓冲区会n*m导致指向某个非常小(和体系结构相关)大小的缓冲区的有效指针。在这种情况下,确保缓冲区溢出。
为避免此类问题,在执行缓冲区数学运算时,应始终包括范围检查以确保不会发生整数溢出。
执行这些测试时常见的错误是检查可能溢出的乘法或其他操作的结果:
size_t bytes = n * m;
if (bytes < n || bytes < m) { /* BAD BAD BAD */
... /* allocate “bytes” space */
}
Unfortunately, if m and n are signed, the C language specification allows the compiler to optimize out such tests [CWE-733, CERT VU#162289]. Even if they are unsigned, the test still fails in some cases. For example, on a 64-bit machine, if m and n are both declared size_t, and both set to 0x180000000, the result of multiplying them is 0x24000000000000000, but bytes will contain that result modulo 2**64, or 0x4000000000000000. This passes the test (the result is bigger than either input), despite the fact that overflow did occur.
Instead, the correct way to test for integer overflow during multiplication is to test before the multiplication. In particular, you divide the maximum allowable result by the multiplier and compare the result to the multiplicand or vice-versa. If the result is smaller than the multiplicand, the product of those two values would cause an integer overflow. Still, getting this right can be tricky. For example, choosing the wrong maximum allowable integer constant (e.g., SIZE_MAX or INT_MAX?) produces incorrect results.
Therefore the safest way to perform multiplication with unknown inputs is to use the clang checked arithmetic builtins. For example:
不幸的是,如果m和n已经签名,C语言规范允许编译器优化这些测试[CWE-733,CERT VU#162289]。即使它们未签名,在某些情况下测试仍然失败。例如,在64位机器上,如果m和n都被声明size_t,并且两者都设置为0x180000000,则将它们相乘的结果是0x24000000000000000,但bytes将包含该结果模数2**64,或0x4000000000000000。这通过了测试(结果大于任一输入),尽管确实发生了溢出。
相反,在乘法期间测试整数溢出的正确方法是在乘法之前进行测试。特别是,您将最大允许结果除以乘数,并将结果与被乘数进行比较,反之亦然。如果结果小于被乘数,则这两个值的乘积将导致整数溢出。尽管如此,做到这一点仍然很棘手。例如,选择错误的最大允许整数常量(例如,SIZE_MAX或INT_MAX?)会产生不正确的结果。
因此,使用未知输入执行乘法的最安全方法是使用clang检查算术内置函数。例如:
size_t bytes;
if (__builtin_umull_overflow(m, n, &bytes)) {
/* Overflow occured. Handle appropriately. */
} else {
/* Allocate "bytes" space. */
}
In this case, __builtin_umull_overflow performs an unsigned multiplication (casting m and n as needed), and stores the (potentially overflowed) result in bytes, but also returns a boolindicating if overflow occurred as a result of the operation.
When you build using the macOS 10.12 or iOS 10 SDK or later, you can enhance this code by using builtin wrappers, defined in the os/overflow.h header file:
在这种情况下,__builtin_umull_overflow执行无符号乘法(转换m并n根据需要),并存储(可能溢出)结果bytes,但也返回bool指示是否由于操作而发生溢出。
使用macOS 10.12或iOS 10 SDK或更高版本构建时,可以使用os/overflow.h头文件中定义的内置包装来增强此代码:
#include <os/overflow.h>
if (os_mul_overflow(m, n, &bytes)) {
/* Overflow occured. Handle appropriately. */
} else {
/* Allocate "bytes" space. */
}
The os_mul_overflow macro (like its siblings os_add_overflow and os_sub_overflow) wraps a new clang builtin that detects overflow correctly even for mixed-type integer arithmetic. This removes the need to convert the arguments and result to a common type, eliminating another source of overflow errors. The wrapper also generates a compile-time warning if you do not use the return value, helping to ensure that your code does in fact take some action based on a reported overflow condition.os_mul_overflow宏(如它的兄弟姐妹os_add_overflow和os_sub_overflow)包装了新clang内置,这些内置正确检测溢出即使对于混合类型整数运算。这消除了将参数和结果转换为公共类型的需要,从而消除了另一个溢出错误源。如果不使用返回值,则包装器还会生成编译时警告,从而有助于确保代码确实根据报告的溢出条件执行某些操作。
To test for buffer overflows, you should attempt to enter more data than is asked for wherever your program accepts input. Also, if your program accepts data in a standard format, such as graphics or audio data, you should attempt to pass it malformed data. This process is known as fuzzing.
If there are buffer overflows in your program, it will eventually crash. (Unfortunately, it might not crash until some time later, when it attempts to use the data that was overwritten.) The crash log might provide some clues that the cause of the crash was a buffer overflow. If, for example, you enter a string containing the uppercase letter “A” several times in a row, you might find a block of data in the crash log that repeats the number 41, the ASCII code for “A” (see Figure 2-2). If the program is trying to jump to a location that is actually an ASCII string, that’s a sure sign that a buffer overflow was responsible for the crash.
为了测试 缓冲区溢出,您应该尝试输入比程序接受的地方更多的数据 输入。此外,如果您的程序接受标准格式的数据,如图形或音频数据,您应该尝试传递格式错误的数据。这个过程称为模糊测试。
如果程序中存在缓冲区溢出,它最终会崩溃。(不幸的是,它可能会在一段时间后才会崩溃,当它尝试使用被覆盖的数据时。)崩溃日志可能会提供一些线索,说明崩溃的原因是缓冲区溢出。例如,如果连续多次输入包含大写字母“A”的字符串,则可能会在崩溃日志中找到重复数字41的数据块,即“A”的ASCII代码(参见图2) -2)。如果程序试图跳转到实际上是ASCII字符串的位置,那么这肯定表明缓冲区溢出是造成崩溃的原因。
Figure 2-5 Buffer overflow crash logIf there are any buffer overflows in your program, you should always assume that they are exploitable and fix them. It is much harder to prove that a buffer overflow is not exploitable than to just fix the bug. Also note that, although you can test for buffer overflows, you cannot test for the absence of buffer overflows; it is necessary, therefore, to carefully check every input and every buffer size calculation in your code.如果程序中存在任何缓冲区溢出,则应始终假设它们可被利用并修复它们。要证明缓冲区溢出不能被利用而不仅仅是修复bug要困难得多。另请注意,虽然您可以测试缓冲区溢出,但您无法测试是否存在缓冲区溢出; 因此,有必要仔细检查代码中的每个输入和每个缓冲区大小计算。
For more information on fuzzing, see Fuzzing in Validating Input and Interprocess Communication.
Fundamentally, buffer underflows occur when two parts of your code disagree about the size of a buffer or the data in that buffer. For example, a fixed-length C string variable might have room for 256 bytes, but might contain a string that is only 12 bytes long.
Buffer underflow conditions are not always dangerous; they become dangerous when correct operation depends upon both parts of your code treating the data in the same way. This often occurs when you read the buffer to copy it to another block of memory, to send it across a network connection, and so on.
There are two broad classes of buffer underflow vulnerabilities: short writes, and short reads.
从根本上说,当代码的两个部分不同意缓冲区的大小或缓冲区中的数据时,会发生缓冲区下溢。例如,固定长度的C字符串变量可能有256个字节的空间,但可能包含一个只有12个字节长的字符串。
缓冲区下溢条件并不总是危险的; 当正确的操作依赖于代码的两个部分以相同的方式处理数据时,它们会变得危险。当您读取缓冲区以将其复制到另一个内存块,通过网络连接发送它等时,通常会发生这种情况。
有两大类缓冲区下溢漏洞:短写和短读。
A short write vulnerability occurs when a short write to a buffer fails to fill the buffer completely. When this happens, some of the data that was previously in the buffer is still present after the write. If the application later performs an operation on the entire buffer (writing it to disk or sending it over the network, for example), that existing data comes along for the ride. The data could be random garbage data, but if the data happens to be interesting, you have an information leak.
Further, when such an underflow occurs, if the values in those locations affect program flow, the underflow can potentially cause incorrect behavior up to and including allowing you to skip past an authentication or authorization step by leaving the existing authorization data on the stack from a previous call by another user, application, or other entity.
当短写到缓冲器失败以完全填充缓冲器时发生短写入漏洞时。发生这种情况时,写入后,缓冲区中的某些数据仍然存在。如果应用程序稍后对整个缓冲区执行操作(例如,将其写入磁盘或通过网络发送),那么现有数据就会出现。数据可能是随机垃圾数据,但如果数据恰好有趣,则会出现信息泄漏。
此外,当发生这样的下溢时,如果这些位置中的值影响程序流,则下溢可能会导致不正确的行为,包括允许您跳过验证或授权步骤,方法是将现有授权数据保留在堆栈中其他用户,应用程序或其他实体的上一次调用。
Short write example (system call): For example, consider a UNIX system call that requires a command data structure, and includes an authorization token in that data structure. Assume that there are multiple versions of the data structure, with different lengths, so the system call takes both the structure and the length. Assume that the authorization token is fairly far down in the structure.
Suppose a malicious application passes in a command structure, and passes a size that encompasses the data up to, but not including, the authorization token. The kernel’s system call handler calls copyin, which copies a certain number of bytes from the application into the data structure in the kernel’s address space. If the kernel does not zero-fill that data structure, and if the kernel does not check to see if the size is valid, there is a narrow possibility that the stack might still contain the previous caller’s authorization token at the same address in kernel memory. Thus, the attacker is able to perform an operation that should have been disallowed.
短写示例(系统调用): 例如,考虑需要命令数据结构的UNIX系统调用,并在该数据结构中包含授权令牌。假设数据结构有多个版本,长度不同,因此系统调用同时采用结构和长度。假设授权令牌在结构中相当远。
假设恶意应用程序传入命令结构,并传递包含数据的大小,但不包括授权令牌。内核的系统调用处理程序调用copyin,它将应用程序中的一定数量的字节复制到内核地址空间中的数据结构中。如果内核没有对该数据结构进行零填充,并且内核没有检查大小是否有效,则堆栈可能仍然包含内核内存中相同地址的前一个调用者的授权令牌。 。因此,攻击者能够执行本应该被禁止的操作。
A short read vulnerability occurs when a read from a buffer fails to read the complete contents of a buffer. If the program then makes decisions based on that short read, any number of erroneous behaviors can result. This usually occurs when a C string function is used to read from a buffer that does not actually contain a valid C string.
A C string is defined as a string containing a series of bytes that ends with a null terminator. By definition, it cannot contain any null bytes prior to the end of the string. As a result, C-string-based functions, such as strlen, strlcpy, and strdup, copy a string until the first null terminator, and have no knowledge of the size of the original source buffer.
By contrast, strings in other formats (a CFStringRef object, a Pascal string, or a CFDataRef blob, for example) have an explicit length and can contain null bytes at arbitrary locations in the data. If you convert such a string into a C string and then evaluate that C string, you get incorrect behavior because the resulting C string effectively ends at the first null byte.
当从缓冲器读无法读取缓冲器的完整内容时发生短读漏洞。如果程序然后基于该短读取做出决定,则可导致任何数量的错误行为。当使用C字符串函数从实际上不包含有效C字符串的缓冲区中读取时,通常会发生这种情况。
C字符串定义为包含一系列以空终止符结尾的字节的字符串。根据定义,它不能包含字符串结尾之前的任何空字节。其结果是,基于C-串的功能,如strlen,strlcpy,和strdup,复制一个字符串,直到第一空终止,并且不知道原始源缓冲区的大小。
相比之下,其他格式的字符串(例如,CFStringRef对象,Pascal字符串或CFDataRefblob)具有显式长度,并且可以在数据中的任意位置包含空字节。如果将此类字符串转换为C字符串然后评估该C字符串,则会得到不正确的行为,因为生成的C字符串实际上在第一个空字节处结束。
Short read example (SSL verification): An example of a short read vulnerability occurred in many SSL stacks a few years ago. By applying for an SSL cert for a carefully crafted subdomain of a domain that you own, you could effectively create a certificate that was valid for arbitrary domains.
Consider a subdomain in the form targetdomain.tld[null_byte].yourdomain.tld.
Because the certificate signing request contains a Pascal string, assuming that the certificate authority interprets it correctly, the certificate authority would contact the owner of yourdomain.tld and would ask for permission to deliver the certificate. Because you own the domain, you would agree to it. You would then have a certificate that is valid for the rather odd-looking subdomain in question.
简短阅读示例(SSL验证): 几年前许多SSL堆栈中出现了一个短读取漏洞的示例。通过为您拥有的域的精心设计的子域申请SSL证书,您可以有效地创建对任意域有效的证书。
考虑targetdomain.tld[null_byte].yourdomain.tld形式的子域。
由于证书签名请求包含Pascal字符串,假设证书颁发机构正确解释它,证书颁发机构将联系所有者yourdomain.tld并要求获得颁发证书的权限。因为您拥有该域名,所以您同意这一点。然后,您将拥有一个对相关奇怪的子域有效的证书。
When checking the certificate for validity, however, many SSL stacks incorrectly converted that Pascal string into a C string without any validity checks. When this happened, the resulting C string contained only the targetdomain.tld portion. The SSL stack then compared that truncated version with the domain the user requested, and interpreted the certificate as being valid for the targeted domain.
In some cases, it was even possible to construct wildcard certificates that were valid for every possible domain in such browsers (*.com[null].yourdomain.tld would match every .comaddress, for example).
但是,在检查证书的有效性时,许多SSL堆栈错误地将Pascal字符串转换为C字符串而没有任何有效性检查。发生这种情况时,生成的C字符串只包含该targetdomain.tld部分。然后,SSL堆栈将截断的版本与用户请求的域进行比较,并将证书解释为对目标域有效。
在某些情况下,甚至可以构造对这些浏览器中的每个可能域都有效的通配符证书(例如,*.com[null].yourdomain.tld将匹配每个.com地址)。
If you obey the following rules, you should be able to avoid most underflow attacks:如果遵守以下规则,您应该能够避免大多数下溢攻击:
Zero-fill all buffers before use. A buffer that contains only zeros cannot contain stale sensitive information.
使用前将所有缓冲区归零。仅包含零的缓冲区不能包含过时敏感信息。
Always check return values and fail appropriately.
始终检查返回值并的某些条件下不执行功能。
If a call to an allocation or initialization function fails (AuthorizationCopyRights, for example), do not evaluate the resulting data, as it could be stale.
如果对分配或初始化函数的调用失败(例如AuthorizationCopyRights),则不要评估结果数据,因为它可能是陈旧的。
Use the value returned from read system calls and other similar calls to determine how much data was actually read. Then either:
Use that result to determine how much data is present instead of using a predefined constant or
fail if the function did not return the expected amount of data.
使用read系统调用和其他类似调用返回的值来确定实际读取的数据量。然后:
使用该结果来确定存在多少数据而不是使用预定义的常量或
如果函数未返回预期的数据量,则会失败。
Display an error and fail if a write call, printf call, or other output call returns without writing all of the data, particularly if you might later read that data back.
如果write调用,printf调用或其他输出调用返回而未写入所有数据,则显示错误并失败,特别是如果您稍后可能会回读该数据。
When working with data structures that contain length information, always verify that the data is the size you expected.
处理包含长度信息的数据结构时,请始终验证数据是否符合预期。
Avoid converting non-C strings (CFStringRef objects, NSString objects, CFDataRef objects, Pascal strings, and so on) into C strings if possible. Instead, work with the strings in their original format.
If this is not possible, always perform length checks on the resulting C string or check for null bytes in the source data.如果可能,避免将非C字符串(CFStringRef对象,NSString对象,CFDataRef对象,Pascal字符串等)转换为C字符串。相反,使用原始格式的字符串。
如果无法做到这一点,请始终对生成的C字符串执行长度检查,或检查源数据中的空字节。
Avoid mixing buffer operations and string operations. If this is not possible, always perform length checks on the resulting C string or check for null bytes in the source data.
Save files in a fashion that prevents malicious tampering or truncation. (See Race Conditions and Secure File Operations for more information.)
Avoid integer overflows and underflows. (See Calculating Buffer Sizes for details.)
避免混合缓冲区操作和字符串操作。如果无法做到这一点,请始终对生成的C字符串执行长度检查,或检查源数据中的空字节。
以防止恶意篡改或截断的方式保存文件。(有关详细信息,请参阅竞争条件和安全文件操作。)
避免整数溢出和下溢。
macOS and iOS provide two features that can make it harder to exploit stack and buffer overflows: address space layout randomization (ASLR) and a non-executable stack and heap. These features are briefly explained in the sections that follow.macOS和iOS提供了两个功能,可以使得更难以利用堆栈和缓冲区溢出:地址空间布局随机化(ASLR)和不可执行的堆栈和堆。以下各节简要介绍了这些功能
Recent versions of macOS and iOS, where possible, choose different locations for your stack, heap, libraries, frameworks, and executable code each time you run your software. This makes it much harder to successfully exploit buffer overflows because it is no longer possible to know where the buffer is in memory, nor is it possible to know where libraries and other code are located.
Address space layout randomization requires some help from the compiler—specifically, it requires position-independent code.
If you are compiling an executable that targets macOS 10.7 and later or iOS 4.3 and later, the necessary flags are enabled by default. You can disable this feature, if necessary, with the -no_pie flag, but for maximum security, you should not do so.
If you are compiling an executable that targets an earlier OS, you must explicitly enable position-independent executable support by adding the -pie flag.
最近版本的macOS和iOS,在可能的情况下,每次运行软件时都会为堆栈,堆,库,框架和可执行代码选择不同的位置。这使得成功利用缓冲区溢出变得更加困难,因为不再可能知道缓冲区在内存中的位置,也无法知道库和其他代码的位置。
地址空间布局随机化需要编译器的一些帮助 - 具体来说,它需要与位置无关的代码。
如果要编译面向macOS 10.7及更高版本或iOS 4.3及更高版本的可执行文件,则默认情况下会启用必要的标志。如有必要,您可以使用-no_pie标志禁用此功能,但为了最大限度地提高安全性,您不应该这样做。
如果要编译以早期操作系统为目标的可执行文件,则必须通过添加-pie标志显式启用与位置无关的可执行文件支持。
Recent processors support a feature called the NX bit that allows the operating system to mark certain parts of memory as non-executable. If the processor tries to execute code in any memory page marked as non-executable, the program in question crashes.
macOS and iOS take advantage of this feature by marking the stack and heap as non-executable. This makes buffer overflow attacks harder because any attack that places executable code on the stack or heap and then tries to run that code will fail.最近的处理器支持称为NX位的功能,允许操作系统将某些内存部分标记为不可执行。如果处理器尝试在标记为不可执行的任何内存页面中执行代码,则该程序崩溃。
macOS和iOS通过将堆栈和堆标记为不可执行来利用此功能。这使得缓冲区溢出攻击更加困难,因为任何将可执行代码放在堆栈或堆上然后尝试运行该代码的攻击都将失败。
Note: For 32-bit macOS apps, if your app allows execution on OS X prior to 10.7, only the stack is marked non-executable, not the heap.注意: 对于32位macOS应用程序,如果您的应用程序允许在10.7之前的OS X上执行,则只有堆栈被标记为不可执行,而不是堆。
Most of the time, this is the behavior that you want. However, in some rare situations (such as writing a just-in-time compiler), it may be necessary to modify that behavior.
There are two ways to make the stack and heap executable:
Pass the -allow_stack_execute flag to the compiler. This makes the stack (not the heap) executable.
Use the mprotect system call to mark specific memory pages as executable.
The details are beyond the scope of this document. For more information, see the manual page for mprotect.
大多数情况下,这是您想要的行为。但是,在某些罕见的情况下(例如编写即时编译器),可能需要修改该行为。
有两种方法可以使堆栈和堆可执行:
将-allow_stack_execute标志传递给编译器。这使得堆栈(而不是堆)可执行。
使用mprotect系统调用将特定内存页标记为可执行文件。
详细信息超出了本文档的范围
To help you debug heap corruption bugs, you can use the libgmalloc library. It provides additional detection of overflows through the use of guard pages and other techniques. To enable this library, type the following command in Terminal:
export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
Then run your software from Terminal (either by running the executable itself or using the open command). For more information, see the manual page for libgmalloc.
In addition to -pie and -allow_stack_execute, the following flags have an effect on security:
-fstack-protector or -fstack-protector-all—Enables stack canaries (special values that, if modified, mean that the adjacent string overflowed its bounds) and changes the order of items in the stack to minimize the risk of corruption. When your functions call other functions that potentially could overflow, the compiler then inserts additional code afterwards to verify that the canary values have not been modified.
The -fstack-protector flag enables stack canaries only for functions that contain buffers over 8 bytes (a string on the stack, for example), and is enabled by default when compiling for macOS 10.6 and later.
The -fstack-protector-all flag enables stack canaries for all functions.
-D_FORTIFY_SOURCE—Adds additional static and dynamic bounds checking to a number of functions that normally provide none (sprintf, vsprintf, snprintf, vsnprintf, memcpy,mempcpy, memmove, memset, strcpy, stpcpy, strncpy, strcat and strncat). If set to level 1 (-D_FORTIFY_SOURCE=1), only compile-time checking is performed. Level 1 is enabled by default when compiling for macOS 10.6 and later. At level 2, additional run-time checking is performed.
MallocCorruptionAbort—An environment variable that tells 32-bit applications to abort if a malloc call would fail because of a corrupted heap structure. Aborting on heap corruption is automatically enabled for 64-bit applications.
除了-pie和之外-allow_stack_execute,以下标志对安全性有影响:
-fstack-protector或-fstack-protector-all-Enables堆栈canaries(特殊值,如果已修改,则表示相邻字符串溢出其边界)并更改堆栈中项目的顺序以最大程度地降低损坏风险。当您的函数调用可能会溢出的其他函数时,编译器随后会插入其他代码以验证是否未修改canary值。
该-fstack-protector标志仅允许堆栈canaries用于包含超过8个字节的缓冲区的函数(例如,堆栈上的字符串),并且在编译macOS 10.6及更高版本时默认启用。
该-fstack-protector-all标志启用所有功能的堆栈canaries。
-D_FORTIFY_SOURCE-Adds附加静态和动态边界检查到许多功能,通常提供无(sprintf,vsprintf,snprintf,vsnprintf,memcpy,mempcpy,memmove,memset,strcpy,stpcpy,strncpy,strcat和strncat)。如果设置为级别1(-D_FORTIFY_SOURCE=1),则仅执行编译时检查。编译macOS 10.6及更高版本时,默认启用级别1。在级别2,执行额外的运行时检查。
MallocCorruptionAbort- 一个环境变量,它告诉32位应用程序在malloc由于堆结构损坏而导致调用失败时中止。64位应用程序会自动启用对堆损坏的中止。