CS 0019 Debugging Memory Allocator
Introduction
C programmers (that would be us) allocate and free memory explicitly. This means we can write
fast code for modern machines, because we have full control over memory. The bad news is that
it’s all too easy to write programs that crash due to memory problems. But wait: as systems
programmers, we can build tools to help us debug memory allocation problems. For instance, in
this coursework, you will transform a simple memory allocator (e.g., implementation of malloc
and friends) into a debugging memory allocator.
Tasks
1. Transform the malloc library we give you into a debugging malloc library that:
-
Tracks memory usage;
-
Catches common programming errors (e.g., use after free, double free);
-
Detects writing off the end of dynamically allocated memory (e.g., writing 65 bytes into a 64-byte piece of memory);
-
Catches less common, somewhat devious, programming errors, as described in the remainder of this handout.
2. Augment your debugging malloc library with heavy hitter reporting, which tells a program- mer where most of the dynamically allocated memory is allocated.
While the above tasks may at first sound imposing, they are achievable in not all that much code. The remainder of this handout provides guidance in how to achieve them (as do the tests we provide for your implementation). Read this handout in its entirety carefully before you begin!
-
Context
C memory allocation uses two basic functions, malloc and free. void *malloc(size t size)
Allocate size bytes of memory and return a pointer to it. This memory is not initialized (it can contain anything). Returns NULL if the allocation failed (because size was too big, or memory is exhausted, or for whatever other reason).
void *free(void *ptr)
Free a single block of memory previously allocated by malloc. The rules of malloc and free are simple: Allocate once, then free once.
-
Dynamically allocated memory remains active until explicitly freed with a call to free.
-
A successful call to malloc(sz) returns a pointer (ptr) to “new” dynamically allocated memory. This means that the sz bytes of data starting at address ptr are guaranteed not to overlap with the program’s code, its global variables, its stack variables, or with any other active dynamically allocated memory.
-
The pointer argument in free(ptr) must either equal NULL or be a pointer to active dynamically allocated memory. In particular:
-
– It is not OK to call free(ptr) if ptr points to the program’s code, or into its global variables, or into the stack.
-
– It is not OK to call free(ptr) unless ptr was returned by a previous call to malloc.
-
– It is not OK to call free(ptr) if ptr is currently inactive (i.e., free(ptr) was previously called with the same pointer argument, and the ptr memory block was not reused by another malloc()).
These errors are called invalid frees. The third error is also called a double free. Some notes on boundary cases:
-
-
malloc(0) may return either NULL or a non-NULL pointer. If ptr = malloc(0) is not NULL, then ptr does not overlap with any other allocation and can be passed to free().
-
free(NULL) is allowed. It does nothing.
-
malloc(sz) returns memory whose alignment works for any object. (We’ll discuss align- ment in class; for a preview, see CS:APP/3e §3.9.3.) On x86-64 machines, this means that the address value returned by malloc() must be evenly divisible by 16. You should do this, too.
Two secondary memory allocation functions are also commonly used in C: calloc and realloc. The calloc function allocates memory and “clears” it so that all bytes in the allo- cated region contain zeroes. The realloc function can allocate, free, or resize memory depend- ing on its arguments. These functions work like this:
void *calloc(size_t nmemb, size_t sz) { void *ptr = malloc(sz * nmemb); if (ptr != NULL)
memset(ptr, 0, sz * nmemb); // set memory contents to 0 return ptr;
}
Requirements
Your debugging allocator must support the following functionality. The code we hand out con- tains tests for all this functionality (though we may run further tests when grading). From easier to harder:
-
Overall statistics—how many allocations/frees, how many bytes have been allocated/freed, etc.
-
Secondary allocation functions (calloc and realloc) and integer overflow protection.
-
Invalid free detection.
-
Writing past the beginning/end of an allocation.
-
Reporting memory that has been allocated, but not freed.
-
Advanced reports and checking.
-
Heavy hitter reporting.
Further details on what you must implement for each of the above functionalities are provided below.Finally, your debugging allocator also must perform acceptably—i.e., it must not inordinately slow the execution of programs that use it. For this coursework, we define “acceptable” to mean that the tests we provide (which invoke your debugging malloc) must each run to completion within 5 seconds. These test programs themselves take just a fraction of a second to run on their own (not counting time spent in your malloc implementation).
All programming for this coursework must be done under Linux, with code compiled for an x86-64 CPU. Grades are determined using automated tests that the 0019 staff run on a grading server with an x86-64 CPU.1 Even small changes in the software environment (everything from OS version to compiler and library versions) and changes in CPU architecture (e.g., running on ARM vs. on x86-64) can cause the same source code to produce different results. As a consequence it is absolutely crucial that you do your work for the 0019 courseworks using the same exact software environment and CPU architecture that the grading server uses. Otherwise, there is a risk that when you test your code, you will see different behavior than the grading server does. We provide an “official,” supported development environment for the 0019 courseworks that you must use when building and running your 0019 CW code. This environment matches the one used on the 0019 grading server, and thus helps ensure that the behavior you see when you run tests yourself matches the behavior your code exhibits when run on the grading server.
We provide two main supported ways for you to work on the 0019 CW:
• by logging in remotely over ssh to 0019 Linux x86-64 machines the 0019 instructors provide that run the software for the supported development environment
• by running the supported development environment locally using Docker.
We explain both these methods below. You only need to use one! The Docker method is somewhat more effort to set up, but gives you a complete, shell-based Linux development environment for 0019 on your own machine, regardless of which OS (your “native OS”) you have installed on your machine to begin with. Our advice is that if you find you have difficulty getting Docker to work, rather than waste more valuable time on configuring your environment, you instead use the ssh remote development method, which requires less setup.
Some students’ personal machines have x86-64 CPUs, and some students’ personal machines have ARM CPUs (Apple M1, M2, or M3 CPUs). While the supported 0019 development envi- ronment produces code for x86-64 CPUs, both ways of developing (ssh and Docker) will work for you, even if your own machine has an ARM CPU. ssh simply has you develop on a server with an x86-64 CPU, whereas the Docker environment we provide accurately emulates an x86- 64 CPU if you have an ARM machine (as we explain in the information on Docker we provide below).
If while using the supported 0019 development environment (either via ssh or under the 0019 Docker setup), you consistently (i.e., for many runs) get different results than you do when your code is tested by the grading server (which we describe below), please contact the course staff via an Ed private message. We are happy to answer student questions about difficulties encountered when doing the coursework in the supported 0019 development environment, but we cannot support any other Linux installation.
-
-
-