Creating an Open-source Project in C With Autotools (Part 2 of 2)


In the previous post, I created a stub for a C program using Autotools. It didn’t have any functionality, so now I am going to implement it, exploring some of the aspects involved in the creation of a program like dfym. I will leave a few of them for future posts (such as profiling, testing and debugging), as they deserve proper attention on their own.

This post is not about this particular program, but rather about topics commonly encountered when creating a new program in C. Here, I’ll deal with program structure, argument processing, input, output, code formatting, documentation and packaging. These topics come up in many other programs, so I hope they will be useful beyond this one.

1. Code formatting

The first area I decided to tackle is code formatting. Although it may seem a superficial aspect to start with, and a way of procrastinating, I firmly believe in clean and well-formatted code from the very beginning of a project.

Taking a look at the GNU Coding Standards, I can find a section about source code formatting. The quickest way to code using GNU’s style guide is to simply process the files with the indent program. The flags -nbad -bap -nbc -bbo -bl -bli2 -bls -ncdb -nce -cp1 -cs -di2 -ndj -nfc1 -nfca -hnl -i2 -ip5 -lp -pcs -psl -nsc -nsob will set the precise options, according to the manual. However, I prefer to use the program Artistic Style simply because I find it easier to configure. With astyle, all I need is the option --style=gnu, although I also add some others (check the documentation for their meaning):

Calling astyle with options
astyle -s2 --style=gnu --pad-header --align-pointer=name --indent-col1-comments --pad-first-paren-out

As a Vim user, I make my life easier adding this autocommand to the .vimrc configuration file, so I can indent the code using the command gg=G:

Setting the program for the = command (indentation) in Vim
autocmd BufNewFile,BufRead *.{c,C,cpp,cxx,cc,h,hpp} set equalprg=astyle\ -s2\ --style=gnu

Adding more flags (like in the command-line example) requires escaping each space with a backslash (\ ).

This is an entirely optional step, of course, but having these wonderful tools to help me achieve nice formatting and style with zero effort is of great value. Uniformity in code formatting definitely pays off, independently of the details of each style and personal preferences. For this project, however, I choose to follow GNU’s rules. This is an example of the code style defined by those flags:

C code formatting example
if (!strcmp ("tag", argv[1]))
  {
    if (argc != 4)
      {
        fprintf (stderr, "Wrong number of arguments. Please refer to help using: \"dfym help\"\n");
        exit (EXIT_FAILURE);
      }
    else
      {
        exit (EXIT_SUCCESS);
      }
  }

2. Program structure

Now I get into the nitty-gritty of the program development. I don’t start by analyzing and designing the whole program, but rather follow a more incremental (and iterative) way: I roughly structure the program modules and subsystems and immediately jump into coding. I assume multiple redesigns and refactorings. In this case, the program is rather small (the resulting C code is around 900 LOC), so this turns out to be a reasonable approach.

The original idea was this:

Program structure

It seemed a good, simple structure. As this is a small program, it actually works well.

3. System integration

Another part to be taken care of from the beginning is how the program will integrate with the system. Besides user interfacing, I need to define if the program will accept any other kind of input and what the output will be. I decided that the program will only receive command line arguments as input, and its output will respect the following rules:

  • Standard output will be used for regular output.
Standard output
printf ("Thanks for using dfym\n");
  • Standard error will be used for errors.
Standard error
fprintf (stderr, "Needs a command argument. Please refer to help using: \"dfym help\"\n");
  • Return codes (EXIT_SUCCESS, EXIT_FAILURE) will be used for informing the system on program’s status.
exit with error code
exit (EXIT_FAILURE);
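
The three rules can be combined in one small helper. This is only a minimal sketch and not code taken from dfym: the function name report_result and its parameters are hypothetical, chosen so that the streams can be swapped when testing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of dfym): prints a result following the
   three output rules and returns the status to hand to exit ().  */
int
report_result (FILE *out, FILE *err, const char *result)
{
  assert (out != NULL && err != NULL);
  if (result == NULL)
    {
      /* Errors go to the error stream, never to regular output.  */
      fprintf (err, "No result. Please refer to help using: \"dfym help\"\n");
      return EXIT_FAILURE;
    }
  /* Regular output goes to the output stream.  */
  fprintf (out, "%s\n", result);
  return EXIT_SUCCESS;
}
```

A caller would typically finish with exit (report_result (stdout, stderr, text)); so that shell constructs such as && and || can react to the program’s status.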

4. Commands and arguments

Dfym is designed to interact with users through the command line, using a command/options approach. That means that I first need a command as the first argument, and then I can set flags and the command’s arguments. Dfym commands follow this structure:

Dfym command line interface
dfym [command] [options] [arguments...]

The first thing to do is to structure the main program to accommodate this requirement. I need to test whether a command argument is given and otherwise print an error through standard error. If the command argument is found, I compare the text of the first argument (the command) and check whether it is equal to “help” or any other command I plan to have.

Check if command is supplied
int main (int argc, char **argv)
{
  if (argc < 2)
    {
      fprintf (stderr, "Needs a command argument. Please refer to help using: \"dfym help\"\n");
      exit (EXIT_FAILURE);
    }
  /* help command */
  else if (!strcmp ("help", argv[1]))
    {
      printf ("HELP TEXT\n");
      exit (EXIT_SUCCESS);
    }

  /* Commands: */
  if (!strcmp ("tag", argv[1]))
    {
      printf ("TAG COMMAND\n");
    }
  else
    {
      fprintf (stderr, "Wrong command. Please try \"dfym help\"\n");
      exit (EXIT_FAILURE);
    }

  return EXIT_SUCCESS;
}

The help command is handled separately from the rest of the commands because the others need access to the database. All the code for opening and setting up the database will go between the two blocks, just before the /* Commands: */ comment (around line 14).

Options (flags) are processed inside each one of the commands, as different commands don’t necessarily offer the same flags. For this purpose I’ll use the canonical getopt.

The code I’ll show here defines 3 yes/no flags (r, f and d), and an option requiring an argument (n). That’s written as “rn:fd” in getopt’s way. The structure is based on a while loop for processing all arguments, but I use a little trick here: I move the argv pointer one position forward, and decrease the argument count (argc). This way, the command will act as the program name, and I can cleanly process the rest of arguments. After processing is done, I also increment the global optind, in order to take into account the previous offset introduced by me.

Argument processing
/* SEARCH command */
else if (!strcmp ("search", argv[1]))
  {
    int opt;
    unsigned char flags = 0;
    char *number_value_flag = NULL;
    /* Command flags */
    while ((opt = getopt (argc-1, argv+1, "rn:fd")) != -1)
      {
        switch (opt)
          {
          case 'r':
            flags |= OPT_RANDOM;
            break;
          case 'n':
            number_value_flag = optarg;
            break;
          case 'f':
            flags |= OPT_FILES;
            break;
          case 'd':
            flags |= OPT_DIRECTORIES;
            break;
          case '?':
            if (optopt == 'n')
              fprintf (stderr, "Option -n requires an argument.\n");
            else if (isprint (optopt))
              fprintf (stderr, "Unknown option `-%c'.\n", optopt);
            else
              fprintf (stderr,
                       "Unknown option character `\\x%x'.\n",
                       optopt);
            exit (EXIT_FAILURE);
            break;
          default:
            abort ();
          }
      }
    optind++; /* we are looking into the command, not the executable */
    if ((argc - optind) != 1)
      {
        fprintf (stderr, "Wrong number of arguments. Please refer to help using: \"dfym help\"\n");
        exit (EXIT_FAILURE);
      }
    else
      {
        unsigned long int number_flag = 0;
        if (number_value_flag)
          number_flag = strtoul (number_value_flag, NULL, 10);
        switch (dfym_search_with_tag (db, argv[optind], number_flag, flags))
          {
          case DFYM_OK:
            break;
          default:
            fprintf (stderr, "Database error\n");
            exit (EXIT_FAILURE);
          }
      }
  }
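
The offset trick is easier to follow in isolation. The following is a reduced sketch and not dfym’s actual code: struct search_args and parse_search are hypothetical names, and only the -r and -n options are handled. It shows how getopt is given argv + 1 and how optind is corrected afterwards.

```c
#define _POSIX_C_SOURCE 200809L /* make sure unistd.h declares getopt */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; dfym itself uses bit flags.  */
struct search_args
{
  int random;          /* -r was given */
  const char *number;  /* argument of -n, or NULL */
  const char *tag;     /* first operand after the options, or NULL */
};

/* Parses "prog search [-r] [-n N] tag".  Returns 0 on success, -1 on an
   unknown option or a missing option argument.  */
int
parse_search (int argc, char **argv, struct search_args *out)
{
  int opt;

  assert (argc >= 2 && argv != NULL && out != NULL);
  out->random = 0;
  out->number = NULL;
  out->tag = NULL;

  /* Skip argv[0]: getopt now treats the command name ("search") as if it
     were the program name and starts scanning at the word after it.  */
  while ((opt = getopt (argc - 1, argv + 1, "rn:")) != -1)
    {
      switch (opt)
        {
        case 'r':
          out->random = 1;
          break;
        case 'n':
          out->number = optarg;
          break;
        default:
          return -1;
        }
    }

  /* optind indexes the shifted array (argv + 1); add the offset back so
     that it indexes the original argv again.  */
  optind++;
  if (optind < argc)
    out->tag = argv[optind];
  return 0;
}
```

For the command line dfym search -r -n 5 music, the function would report random set, "5" as the number and "music" as the tag.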

The second aspect I’d like to talk about is the use of bit flags. Bit flags rely on bitwise operations (| and &) to define, combine and later check enabled/disabled options. The values 0x1, 0x2, 0x4… each define a byte with all its bits set to 0 except bit number 0, 1, 2… (counting from the right), respectively. This of course means that we can have at most 8 options per byte.

00000000 Meaning    Value   Hexadecimal
│││││││└ Option 1   2^0     0x01
││││││└─ Option 2   2^1     0x02
│││││└── Option 3   2^2     0x04
││││└─── Option 4   2^3     0x08
│││└──── Option 5   2^4     0x10
││└───── Option 6   2^5     0x20
│└────── Option 7   2^6     0x40
└─────── Option 8   2^7     0x80

The trick here is to use the bitwise shift operator to simplify the definition, while giving a name to each option. This is the definition currently used in dfym, found in the dfym_base.h header:

Defining bit flags
typedef enum
{
  OPT_FILES = 1 << 0,          /**< Select files */
  OPT_DIRECTORIES = 1 << 1,    /**< Select directories */
  OPT_RANDOM = 1 << 2          /**< Return results in random order */
} query_flag_t;

On the calling side, they are combined with OR, either setting them all at once…

Setting bit flags
OPT_FILES | OPT_DIRECTORIES

…or accumulating the options in a variable (unsigned char for a byte), as done in the above example:

Accumulating bit flags
unsigned char flags = 0;
flags |= OPT_FILES;
flags |= OPT_RANDOM;

Finally, I would check in the implementation if they are set. I can also combine them to check if two or more flags are set simultaneously, as in the second conditional in the following example:

Checking bit flags
if (flags & OPT_FILES)
  {
    printf ("OPT_FILES flag is set\n");
  }
if ((flags & (OPT_FILES | OPT_DIRECTORIES)) == (OPT_FILES | OPT_DIRECTORIES))
  {
    printf ("OPT_FILES and OPT_DIRECTORIES flags are both set\n");
  }

This is roughly all that is needed for setting flags both at the command line level and at the function implementation level. These flags can be passed themselves as unsigned chars (or wider types) to and from functions in the internal API. It is an easy and flexible method, and widely used in the C world.
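
One detail worth spelling out is the difference between testing for any of several flags and testing for all of them. Here is a minimal sketch: the enum repeats the values of query_flag_t so the code compiles on its own, and the helper names (flags_any, flags_all, flags_clear, flags_demo) are hypothetical, not part of dfym.

```c
#include <assert.h>

/* Same values as query_flag_t in dfym_base.h, repeated here so the sketch
   is self-contained.  */
typedef enum
{
  OPT_FILES = 1 << 0,
  OPT_DIRECTORIES = 1 << 1,
  OPT_RANDOM = 1 << 2
} query_flag_t;

/* Nonzero if at least one bit of mask is set in flags.  */
int
flags_any (unsigned char flags, unsigned char mask)
{
  return (flags & mask) != 0;
}

/* Nonzero only if every bit of mask is set in flags.  */
int
flags_all (unsigned char flags, unsigned char mask)
{
  return (flags & mask) == mask;
}

/* Returns flags with the bits of mask switched off.  */
unsigned char
flags_clear (unsigned char flags, unsigned char mask)
{
  return (unsigned char) (flags & ~mask);
}

/* Usage example; aborts if any expectation fails.  */
void
flags_demo (void)
{
  unsigned char flags = OPT_FILES | OPT_RANDOM;

  assert (flags_any (flags, OPT_FILES | OPT_DIRECTORIES));
  assert (!flags_all (flags, OPT_FILES | OPT_DIRECTORIES));
  flags |= OPT_DIRECTORIES;
  assert (flags_all (flags, OPT_FILES | OPT_DIRECTORIES));
}
```

A plain flags & mask is true as soon as one bit matches, so comparing the result against the mask is what checks that the flags are set simultaneously.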

5. Changes to configure.ac and Makefile.am

Although I won’t dive into SQLite3 interfacing, as that is a rather specific part of this particular program, I’ll explain a couple of changes that had to be made to the code as it was in part 1.

First, I added a line to check for the existence of the function strdup. This is easily done like this in configure.ac:

Checking existence of a function in ./configure via configure.ac
AC_CHECK_FUNCS([strdup])

This can be extended to many other functions, as necessary, to check if the system supports them. I’d generally use this in case I’m going to use functions found in many but not all similar systems (like GNU/Linux, BSD and POSIX-compliant).
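
When the check succeeds, Autoconf defines the preprocessor macro HAVE_STRDUP, which the C code can then test. The following is a minimal sketch of how that macro could be used. The wrapper name dfym_strdup is hypothetical, and the sketch assumes the project generates a config.h and defines HAVE_CONFIG_H, as is common with AC_CONFIG_HEADERS.

```c
#ifdef HAVE_CONFIG_H
# include <config.h>   /* generated by ./configure; may define HAVE_STRDUP */
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper (not from dfym): uses the system strdup when
   ./configure found it, and a portable replacement otherwise.  */
char *
dfym_strdup (const char *s)
{
  assert (s != NULL);
#ifdef HAVE_STRDUP
  return strdup (s);
#else
  size_t len = strlen (s) + 1;   /* include the terminating '\0' */
  char *copy = malloc (len);
  if (copy != NULL)
    memcpy (copy, s, len);
  return copy;
#endif
}
```

Either way the caller owns the returned string and releases it with free ().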

The next change I made was necessary to support GLib and SQLite3 in the library code, which is the part that actually interfaces with SQL. As the library needs to be built against SQLite3 and GLib, I need to add their compiler flags to the lib’s Makefile.am, just after the DEBUG conditional:

Adding the GLib and SQLite3 compiler flags to the library
if DEBUG
  AM_CFLAGS =-I$(top_srcdir)/src/lib -Wall -g -O3
else
  AM_CFLAGS =-I$(top_srcdir)/src/lib -Wall
endif

AM_CFLAGS += $(GLIB_CFLAGS)
AM_CFLAGS += $(SQLITE3_CFLAGS)

6. Documentation with Doxygen

Documenting a C project with Doxygen is also straightforward. The best source of documentation on how to create a Doxygen project is the Getting Started guide. Basically, I need to run this command in the root directory of the project:

Generating default Doxygen configuration file
doxygen -g

This will create a file called Doxyfile, which is a long text file with very well-commented configuration options for the project. I tend to read all of them, but I’ll leave one ready for use with C programs for download. Then, I just need to run the Doxygen program without arguments.

Generating documentation with Doxygen
doxygen

The only remaining part for generating documentation is, well, the most important: commenting the code with blocks that can be understood and automatically parsed by Doxygen. First of all, files that I want Doxygen to process need to have a comment like this one in the first lines:

Telling Doxygen to parse this file
/** \file
  * dfym: Library functions using a SQLite3 backend */

After I’ve informed Doxygen that what follows is a parsable source file, I can document a function with a comment like this:

Function documentation for Doxygen
/** Open the database if it exists, create it otherwise.
 *
 * The database will be placed in ~/.dfym.db by default. Currently, no means of
 * changing this default are provided.
 * \param db_path Path to the SQLite3 database file.
 * \return The SQLite3 database handle.
 */
sqlite3 *dfym_open_or_create_database (char *const db_path)
{
  return NULL;
}

Doxygen offers several ways of defining a comment as “doxygen documentation”, but I find starting a comment with double star the cleanest and most uniform.

As a nice way of navigating documentation, Doxygen offers the possibility of creating groups, which put a set of functions together in the documentation. Adding a set of functions to a group works like this: first I open the group, and then I close it, as if it were a pair of curly braces defining a block. The \addtogroup Doxygen command takes the name of the group and, optionally, a title for it:

Creating function groups
/**
 * \addtogroup database Database query functions
 */
/**@{*/

/* ...function definitions and documentation... */

/**@}*/
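
Putting the three pieces together (file comment, group and function comment) could look like the sketch below. It is an illustration only: the function dfym_default_db_path, the group name paths and the file name .dfym.db are assumptions made for the example, not code taken from dfym.

```c
/** \file
 * Sketch: default database location (hypothetical example, not dfym code) */

#include <assert.h>
#include <stdio.h>

/**
 * \addtogroup paths Path helper functions
 */
/**@{*/

/** Build the default database path inside a home directory.
 *
 * \param[in]  home  Home directory, for example the value of the HOME
 *                   environment variable.
 * \param[out] buf   Buffer that receives the path.
 * \param[in]  size  Size of \p buf in bytes.
 * \return 0 on success, -1 if \p buf is too small.
 */
int
dfym_default_db_path (const char *home, char *buf, size_t size)
{
  int n;

  assert (home != NULL && buf != NULL);
  /* snprintf returns the length the full string needs, which makes
     truncation easy to detect.  */
  n = snprintf (buf, size, "%s/.dfym.db", home);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/**@}*/
```

The [in] and [out] markers on \param are optional, but they make the direction of each parameter visible in the generated pages.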

Doxygen offers many more options and functionality, in the form of comments inside C code (and many other languages).

7. Packaging the project

Packaging the project with Autotools is as simple as this:

make dist

This will create a file called dfym-0.1.tar.gz that contains all the necessary files for dfym to be compiled and installed with the regular ./configure, make and make install commands.

8. Conclusion

Writing a program in C with Autotools is not as easy as using higher-level languages, which come with better batteries included and more modern development tools. However, once I get a grip on its different parts, I can enjoy development as in more fast-paced languages, while harnessing the ancient power of C.

