kaashif's blog

Programming, with some mathematics on the side

Introduction to C

2013-11-27

This tutorial is designed for those who have programmed before, perhaps in a higher level language like Python or Ruby. It's not too hard to understand for those who are completely inexperienced, but some knowledge of functions, data structures and pointers might help. Most of the low-level stuff will be new to high level programmers, however.

Setting up the compiler

On GNU/Linux, most BSDs and quite a few other operating systems, gcc, the C compiler of the GNU Compiler Collection, is included. If not, it'll be accessible through your package manager. For example, on FreeBSD, where clang is the default, you'd install gcc with a simple pkg install gcc. OpenBSD comes with gcc by default, and so do the vast majority of GNU/Linux distros. If not, you know what to do. The process of actually compiling source will be elaborated on later. For now, here is a command you can use to compile programs:

$ gcc -o hello hello.c

This compiles the source code in the text file hello.c into the executable hello. The use of libraries will be covered later, or maybe not at all. Look it up on Stack Overflow or something.

Hello, World!

The "Hello, world!" program is the de facto standard for introducing a programming language. Here is a basic implementation:

// This is a comment
#include <stdio.h>

int main ()
{
    printf("Hello, world!");
    return 0;
}

The #include line pulls in the standard header that declares the IO functions. In C, when you include a header, the C preprocessor just inserts the text contained in that header wherever you place the #include. There really isn't anything more to it than that, no complex module importing hoops to jump through, like in Python or Java.

The main function is self-explanatory, but requires a little exposition. Main functions in C always return an int. This is the return code of the program, which is used in scripts and in the shell to determine whether the program succeeded. Generally, zero means a success while anything else means a failure. You could also use some macros (basically constants) defined in stdlib.h to denote failure and success exit codes. If we did that, we could rewrite the source as follows:

/* You can also do comments like this */
#include <stdlib.h>
#include <stdio.h>

int main ()
{
    printf("Hello, world!");
    return EXIT_SUCCESS;
}

The advantage of using EXIT_SUCCESS and EXIT_FAILURE is portability: the value that signals failure is not the same everywhere, so 1 could mean failure on one platform while -1 means the same thing on another. These constants are still ints, so you don't have to change anything else to get the main function working. I'd advise you use these constants, since you'll have included stdlib.h most of the time anyway. I'll use them for the rest of this tutorial.
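To see both constants in use, here is a minimal sketch. The function name say_hello_checked is made up for illustration; the only fact it relies on is that printf returns a negative value when the write fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints the greeting and turns the result of
   printf into an exit code. printf returns a negative value on error. */
int say_hello_checked(void)
{
    if (printf("Hello, world!\n") < 0) {
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

A main function could then simply end with return say_hello_checked(); and the shell would see the right exit code either way.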

The printf function isn't very complicated. Its function is obvious, but it does have a few quirks you might need to know about.

Printing

Let's say you had an int and you wanted to print it out, along with some words. In Python this would be simple:

print("The value of my integer is %d" % my_int)

That syntax actually comes from C; printf works in a very similar way:

printf("The value of my integer is %d", my_int);

The format string and its arguments are all arguments to printf, and you can have as many of them as you like. For example:

printf("My car cost %d pounds and weighs %f kg", my_int, my_float);

Simple so far, right? You may already be familiar with the idea of output streams if you use the command line. Everything printf prints, by default, goes to standard output or stdout. This is the output that is piped to other programs and written to text files when you redirect the output using ">" or "|". Sometimes you want to output debug or error information which is supposed to only be read by a human. For this you use stderr, or the standard error output stream. Since printf only prints to stdout, we must use another function, fprintf. It is used as follows:

fprintf(stderr, "This is an error");

That is essentially all the printing you'll ever need to know.
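Putting the two streams side by side, a rough sketch might look like the following. The function names print_car and print_error are my own, not anything standard, and the %.1f specifier (one digit after the decimal point) is a small extension of the %f used above.

```c
#include <stdio.h>

/* Writes a normal report to stdout. Returns whatever printf returned:
   the number of characters written, or a negative value on error. */
int print_car(int price, double weight)
{
    return printf("My car cost %d pounds and weighs %.1f kg\n", price, weight);
}

/* Writes a diagnostic to stderr, so it stays out of redirected output. */
int print_error(const char *message)
{
    return fprintf(stderr, "error: %s\n", message);
}
```

If a program calling both of these is run with its output redirected to a file using ">", only the line from print_car ends up in the file; the error message still appears on the terminal.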

Functions

C is a statically typed language: the type of every variable, parameter and return value is fixed at compile time, and the compiler will not choose types for you. It does apply some implicit conversions between arithmetic types (an int passed where a double is expected is converted silently, for instance), but most other mismatches are rejected or at least warned about. This is relevant because function definitions start with a return type and a list of parameters, all specifying their type. If we look at the following function, we can see what I mean.

void say_hello(char* name)
{
    printf("Hello, %s!", name);
}

You should know what void means. If you tried to pass a float or a struct of some kind to this function, it would not compile. I assume returning other types doesn't require any explanation or patronising "exercises".
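For completeness, here is a short sketch of two functions that do return something. Both names are invented for this example, and the value used for pi is just an approximation typed in by hand.

```c
/* Takes an int and returns an int. */
int square(int n)
{
    return n * n;
}

/* Takes a double and returns a double. */
double circle_area(double radius)
{
    return 3.14159265358979 * radius * radius;
}
```

The return type on the left of the name is the type of the value handed back by return, so square(7) can be used anywhere an int is expected.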

Arrays, Strings and Pointers

You might have noticed that the syntax for passing a string to a function is a bit odd. Well, that's because all strings in C are really just arrays of characters, represented by the char data type. This will become clear in the following code snippet:

char hello[10] = "hello";

That compiles and behaves as expected. You can pass the variable hello to any function that expects a C-style string and it will work. But isn't hello an array, not a char*, whatever that means? It is, but in C an array used in most expressions is automatically converted to a pointer to its first element, so passing hello to a function really passes a char* pointing at the 'h'.
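One detail worth seeing in code: a C string ends with a terminating '\0' character, which is how a function that only receives a pointer knows where the text stops. The sketch below uses a made-up function, count_chars, which does roughly what the standard strlen does.

```c
#include <stddef.h>

/* Counts the characters before the terminating '\0'.
   Hypothetical stand-in for the standard strlen. */
size_t count_chars(const char *text)
{
    size_t n = 0;
    while (text[n] != '\0') {
        n++;
    }
    return n;
}
```

Calling count_chars(hello) with the 10-element array from above gives 5: the array has room for ten chars, but the string stored in it is only five characters long plus the terminator.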

The thing which confuses most new C programmers is the concept of pointers. It all becomes very simple if you just ditch the analogies and realise that a pointer is a variable which stores the address of the data being pointed to. So the pointer points to the data in RAM, but is not actually the data itself, merely an address. If you know that, the act of "dereferencing" a pointer is also very easy: it's just accessing the data being pointed to. Here is some code to explain:

int my_int = 4;
// Creates an integer with the value of 4

int* to_int = &my_int; 
// Creates a pointer to an integer, which is then set to the 
// address of the integer

printf("%d", *to_int);
// Dereferences the pointer and prints the value

There are two new operators in there. The & operator takes a variable of any type and returns the memory address at which the variable is stored. The * operator takes a pointer and returns the data stored at that address. It essentially reverses the & operation.
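A common reason to use both operators together is letting a function change variables that belong to its caller. Here is a small sketch; swap_ints is a name I've picked for the example.

```c
/* Swaps the two ints that a and b point to. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;   /* read the value a points to */
    *a = *b;        /* overwrite it with the value b points to */
    *b = tmp;
}
```

Given int x = 1 and int y = 2, calling swap_ints(&x, &y) leaves x as 2 and y as 1. Passing x and y themselves instead of their addresses would not work, since the function would only receive copies.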

Memory allocation

We have established that pointers hold the address of a block of memory, which usually has some data in it. Letting the compiler allocate the appropriate amount of memory is fine in some cases, but what if we need to allocate an amount of memory which we only know at runtime? An obvious use case is when copying large files - we cannot simply allocate 4 GB at compile time and hope the file fits, we must get the file size from the filesystem and allocate that much memory. This is done using dynamic memory management.

The two most important functions for this are malloc and free. malloc takes the number of bytes to allocate (as a size_t, an unsigned integer type) and returns a pointer to the allocated block of memory, or NULL if the allocation fails. There is a problem here - you do not always know the number of bytes taken up by a data type, since it can vary from platform to platform. You can find out using the sizeof operator, which takes a data type (or an expression) and yields a size_t telling you how many bytes that type takes up per instance. Let's say you wanted to allocate memory for an array of 24 ints. You'd combine sizeof and malloc to produce the following:

int* my_array = malloc(24*sizeof(int));

And now the pointer my_array points to an allocated block of memory which can fit 24 ints. But how do you access the ints? Well, the variable's name should give you a clue: you index it exactly as you would an array, so my_array[0] is the first int and my_array[23] is the last. This works because indexing in C is defined in terms of pointers: my_array[i] means the same as *(my_array + i).
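As a rough sketch of indexing into allocated memory, the function below allocates a block and fills it in with ordinary array syntax. make_squares is a hypothetical name, and the NULL check is there because malloc returns NULL when it cannot allocate.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for `count` ints and stores
   i * i in slot i. Returns NULL if malloc fails; otherwise the caller
   owns the block and is responsible for releasing it later. */
int *make_squares(size_t count)
{
    size_t i;
    int *squares = malloc(count * sizeof(int));

    if (squares == NULL) {
        return NULL;
    }
    for (i = 0; i < count; i++) {
        squares[i] = (int)(i * i);   /* same as *(squares + i) = ... */
    }
    return squares;
}
```

After int *s = make_squares(5), the expression s[4] evaluates to 16, provided the allocation succeeded.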

After you're done with the array of ints, you may not want the memory to stay allocated. If you allocate memory throughout your program but never deallocate it, this causes the memory usage of your program to grow and is known as a memory leak. These can be avoided by deallocating (or freeing) memory using the free function, which takes a pointer and frees the memory it points to.

free(my_array);

In your programs, every successful call to malloc should be matched by a call to free at some point, to eliminate memory leaks. This is very important when you're dealing with large files or databases, where a leak could waste gigabytes of memory, forcing the OS to swap pages to disk and slow down or even crash.
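Here is a minimal sketch of that pairing inside a single function, with a buffer whose size is only known at runtime. The name sum_first_n and the choice of -1 as an error value are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical example: sums 1..n using a temporary heap buffer.
   Returns -1 if the allocation fails. */
long sum_first_n(size_t n)
{
    size_t i;
    long total = 0;
    int *numbers;

    if (n == 0) {
        return 0;                /* nothing to allocate */
    }
    numbers = malloc(n * sizeof(int));
    if (numbers == NULL) {
        return -1;               /* allocation failed, nothing to free */
    }
    for (i = 0; i < n; i++) {
        numbers[i] = (int)(i + 1);
    }
    for (i = 0; i < n; i++) {
        total += numbers[i];
    }
    free(numbers);               /* matches the malloc above */
    return total;
}
```

Every path out of the function after a successful malloc goes through free, so calling it repeatedly does not grow the program's memory usage.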