i’m so full of ideas

printf() sucks

June 23, 2009 1:44 pm

I am a big fan of printf()-style debugging. It gives you an overview of a problem in a way that traditional debuggers are not so good at. So it’s a bit unexpected that I do not like printf() itself.

Why printf() sucks

printf() is optimized for a weird corner case that you almost never need. You’ll no doubt recognize this as standard usage:

  printf("Hello, World!\n");

Notice that newline character at the end. It’s a pain to type. As best as I can tell, it exists so you can do this:

  printf("Starting long-running operation ...");
  // long-running operation goes here
  printf(" finished.\n");

… which causes the output from both printf() calls to be printed on the same line. Swell. How many times have you needed to do that? I first started programming in C in the late eighties, and my lifetime total so far is zero. I typed thousands of unnecessary newline characters before I finally wised up and wrote a replacement.

Format strings are error-prone

The most likely problem you’ll have with printf() and functions like it is a mismatch between the format string and the variables presented to it. For example:

  printf("Two strings: %s %s\n", "text");

The format string calls for two strings to be printed, but you’ve only provided one. The call to printf() might work fine, crash, or print weird results, depending on what happens to be lying around on the stack. If your compiler is GCC — and it probably is, if you’re programming for any UNIX variant, including Mac OS X — there is a good workaround. Here’s my definition for echo(), my printf() replacement:

  void echo(const char* tfmt, ...) __attribute__((format(printf, 1, 2)));

That weird __attribute__ business is a GCC-ism that means “this function works like printf(), and here are the argument numbers of the format string and of the first variable argument, respectively.”

This feature doesn’t work unless the proper GCC warning is enabled. If you’re using makefiles or the command line, pass -Wformat to the compiler (or -Wall, which includes it). If you’re using Xcode, bring up the project information window. In the “Build” tab, there’s a section called “GCC 4.0 – Warnings.” The warning you want is labeled “Typecheck Calls to printf/scanf,” which should be enabled. Once that’s done, when you write code like this:

  echo("bad format: %s %s", "text");

the compiler will give you this warning:

  warning: too few arguments for format

… which saves you from the undefined behavior your program was about to be subjected to.

(C++ introduced cout, which is a printf() replacement. It handily works around the format string issue discussed here. I’ve always felt that cout introduces more problems than it solves, so I personally avoid it.)

Functions you can use

The sample code that follows includes three functions you might want to use in your own programs.

echo() — works exactly the same as printf(), except it doesn’t require a newline at the end of its format string.

sfmt() — works exactly the same as echo(), except it puts the formatted contents into a std::string object, rather than printing to stdout.

fmtArg() — useful if you want to build your own printf()-like function similar to echo() or sfmt(). Have a look at how echo() uses it, which should be enough for you to get started.
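
As a rough illustration of building your own printf()-like function, here is a minimal sketch. The names formatArgs() and tagged() are hypothetical and not part of the sample code; formatArgs() is only a stand-in for fmtArg() so the snippet compiles on its own. The point to notice is that the format string is the second argument of tagged(), so the attribute numbers become 2 and 3 instead of 1 and 2.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the article's fmtArg(): formats a va_list into
// a std::string. This is an assumption for illustration, not the article's
// implementation.
static std::string formatArgs(const char* tfmt, va_list args)
{
    assert(tfmt);

    // First pass on a copy: ask vsnprintf() how many chars are needed.
    va_list copy;
    va_copy(copy, args);
    const int size = vsnprintf(NULL, 0, tfmt, copy);
    va_end(copy);

    if (size < 0) return std::string();  // encoding error

    // Second pass: format into a buffer with room for the terminating NUL.
    std::vector<char> cbuf(static_cast<size_t>(size) + 1);
    vsnprintf(&cbuf[0], cbuf.size(), tfmt, args);

    return std::string(&cbuf[0], static_cast<size_t>(size));
}

// Hypothetical printf()-like function. The format string is argument 2 and
// the variable arguments start at argument 3, so the attribute says (2, 3).
std::string tagged(const char* tag, const char* tfmt, ...)
    __attribute__((format(printf, 2, 3)));

std::string tagged(const char* tag, const char* tfmt, ...)
{
    va_list args;

    va_start(args, tfmt);
    const std::string body = formatArgs(tfmt, args);
    va_end(args);

    return "[" + std::string(tag) + "] " + body;
}
```

With this in place, tagged("net", "retry %d of %d", 2, 5) returns the string "[net] retry 2 of 5", and a call with too few arguments for its format string should draw the same compiler warning that echo() does.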

Example code

First the header file, printf_sucks.h:

  // printf_sucks.h -- printf() alternative
  // by allen brunson  june 18 2009

  #ifndef PRINTF_SUCKS_H
  #define PRINTF_SUCKS_H

  #include <stdarg.h>  // va_list
  #include <stdint.h>  // int32_t
  #include <string>    // std::string

  // sfmt() and support functions

  std::string fmtArg(const char* tfmt, va_list args);
  std::string fmtArgLarge(int32_t byteCount, const char* tfmt, va_list args);
  std::string sfmt(const char* tfmt, ...) __attribute__((format(printf, 1, 2)));

  // echo(), a better printf()

  void echo(const char* tfmt, ...) __attribute__((format(printf, 1, 2)));

  #endif  // PRINTF_SUCKS_H

Now the source file, printf_sucks.cpp:

  // printf_sucks.cpp -- printf() alternative
  // by allen brunson  june 18 2009

  #include <assert.h>
  #include <stdarg.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string>
  #include "printf_sucks.h"

  void echo(const char* tfmt, ...)
  {
      va_list      args;
      std::string  text;

      va_start(args, tfmt);
      text = fmtArg(tfmt, args);
      va_end(args);

      // puts() appends the newline, so the format string doesn't need one
      puts(text.c_str());
  }

  std::string fmtArg(const char* tfmt, va_list args)
  {
      static const int32_t  kBufferSize = 2 * 1024;

      // the extra four bytes are to guard against buffer overruns
      char     cbuf[kBufferSize + 4];
      int32_t  size = 0;
      va_list  copy;

      assert(tfmt && tfmt[0]);

      // vsnprintf() consumes the va_list it is given, so format from a copy
      // and leave args untouched in case fmtArgLarge() needs it
      va_copy(copy, args);
      size = vsnprintf(cbuf, kBufferSize, tfmt, copy);
      va_end(copy);

      if (size < 0)
      {
          return std::string();  // vsnprintf() reported an error
      }
      else if (size < kBufferSize)
      {
          return std::string(cbuf);
      }
      else
      {
          return fmtArgLarge(size, tfmt, args);
      }
  }

  // called when fmtArg() didn't have a big enough buffer

  std::string fmtArgLarge(int32_t byteCount, const char* tfmt, va_list args)
  {
      char*        cbuf = NULL;
      int32_t      clen = byteCount + 10;
      int32_t      size = 0;
      std::string  text;

      cbuf = static_cast<char*>(malloc(clen + 4));
      if (!cbuf) return "";

      size = vsnprintf(cbuf, clen, tfmt, args);
      assert(size >= 0 && size < clen);

      text.assign(cbuf);

      free(cbuf);
      cbuf = NULL;

      return text;
  }

  // works like echo(), but puts the formatted contents into a std::string

  std::string sfmt(const char* tfmt, ...)
  {
      va_list      args;
      std::string  text;

      assert(tfmt && tfmt[0]);

      va_start(args, tfmt);
      text = fmtArg(tfmt, args);
      va_end(args);

      return text;
  }

  int main(int argc, const char** argv)
  {
      echo("hello from echo: %s %d", "text", 2);
      return 0;
  }

linux: app memory usage

May 10, 2009 5:39 pm

Once upon a time, I wrote a Linux app in C++ that used the database access libs from a well-known corporate database vendor. The app was very important to the employer I wrote it for. It had to run 24 hours a day, seven days a week. Unfortunately, the database access libs leaked like a screen door on a submarine. I did not have the source code for them and the vendor was not responsive to my pleas.

So I set up the app so it could restart itself every few hours, in order to clear its leaks. The app ran inside a script, and when it exited with a certain return value, the script would know to restart it immediately. This worked pretty well for a couple of months. Then I discovered that certain usage patterns would cause the app to completely exhaust its heap in as little as 30 minutes. It crashed, taking a lot of important data down with it.

My next try was a tad more sophisticated. I figured out how to determine how much memory the app was currently using, and it would restart itself when its heap size had grown to about 1.5GB. Now it didn’t matter how long it took for the program to exhaust its heap; it would always restart itself long before that happened.

I had planned to show you source code for getting a Linux app’s resident set size. Seems like a good complement to my earlier blog entry which had code to do the same thing for Apple’s platforms. But it requires a lot of code, due to boring details like file access, string manipulation, and so on. So I’ll just describe the algorithm, and let you implement it yourself, if you’re interested.

Getting a program’s resident set size on Linux involves querying the /proc pseudo-filesystem. First you must create a pretend “filename” that looks like this:

    /proc/(pid)/statm

where (pid) is the pid of your program, converted to a textual representation. You can get the pid with the POSIX function getpid(). Open the “file” with whatever file manipulation APIs you normally use. fopen() is a good choice. Read the contents of the pseudo-file with fread(), then close it with fclose(). You should end up with a line of text that looks like this:

    38546 48861 1677 104 0 36313 0

This is a list of various memory statistics about your program. The only one we’re interested in is the second number, 48861 in this example. This is your app’s resident set size, in pages. To get that value in bytes, multiply the number of pages by the memory page size, which you can retrieve by calling getpagesize(). That’s it: now you’ve got a more-or-less accurate count of the number of bytes your Linux program is currently using.
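
For the curious, the steps above can be sketched in a couple dozen lines. This is a minimal sketch, not the code from the original app: the function name residentSetBytes() is made up for illustration, it parses the line with sscanf(), and it assumes a Linux system where /proc is mounted.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>

// Hypothetical helper, not from the original app. Returns the calling
// process's resident set size in bytes, or -1 if /proc could not be read.
long long residentSetBytes()
{
    // Build the pseudo-filename /proc/(pid)/statm.
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/statm",
             static_cast<long>(getpid()));

    FILE* file = fopen(path, "r");
    if (!file) return -1;

    // Read the single line of statistics, leaving room for a terminator.
    char text[256];
    const size_t count = fread(text, 1, sizeof(text) - 1, file);
    fclose(file);

    assert(count < sizeof(text));
    text[count] = '\0';

    // The first field is the total program size; the second field is the
    // resident set size. Both are measured in pages.
    long long totalPages = 0;
    long long residentPages = 0;
    if (sscanf(text, "%lld %lld", &totalPages, &residentPages) != 2)
    {
        return -1;
    }

    // Convert pages to bytes.
    return residentPages * static_cast<long long>(getpagesize());
}
```

A program could call this periodically and compare the result against a limit, such as the 1.5GB mentioned earlier, to decide when to restart itself.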