Introduction to C
2013-11-27
This tutorial is designed for those who have programmed before, perhaps in a higher-level language like Python or Ruby. It's not too hard to understand for those who are completely inexperienced, but some knowledge of functions, data structures and pointers might help. Most of the low-level stuff will be new to high-level programmers, however.
Setting up the compiler
On GNU/Linux, most BSDs and quite a few other operating systems, gcc, the C
compiler of the GNU Compiler Collection, is included. If not, it'll be
accessible through your package manager. For example, on FreeBSD, where clang
is the default, you'd install gcc with a simple pkg install gcc. OpenBSD comes
with gcc by default, and so do the vast majority of GNU/Linux distros. If not,
you know what to do. The process of actually compiling source will be
elaborated on later. For now, here is a command you can use to compile programs:
$ gcc -o hello hello.c
This takes the source code in the text file hello.c and compiles it into the
executable hello. The use of libraries will be covered later, or maybe not at
all. Look it up on Stack Overflow or something.
Hello, World!
The "Hello, world!" program is the de facto standard for introducing a programming language. Here is a basic implementation:
// This is a comment
#include <stdio.h>
int main ()
{
    printf("Hello, world!\n");
    return 0;
}
The #include line pulls in the standard header containing IO functions. In C,
when you include a header, the C preprocessor just inserts the text contained in
that header wherever you place the #include. There really isn't anything more to
it than that: no complex module importing hoops to jump through, like in Python
or Java.
The main function is where the program starts running, and it requires a little exposition. Main functions in C always return an int. This is the return code of the program, which is used in scripts and in the shell to determine whether the program succeeded. Generally, zero means success while anything else means failure. You could also use some macros (basically constants) defined in stdlib.h to denote failure and success exit codes. If we did that, we could rewrite the source as follows:
/* You can also do comments like this */
#include <stdlib.h>
#include <stdio.h>
int main ()
{
    printf("Hello, world!\n");
    return EXIT_SUCCESS;
}
The advantage of using EXIT_SUCCESS and EXIT_FAILURE is portability. The C
standard only guarantees that 0 and EXIT_SUCCESS mean success; the value that
signals failure could be 1 on one platform and -1 on another. These constants
are still ints, so you don't have to change anything else to get the main
function working. I'd advise you use these constants, since you'll have included
stdlib.h most of the time anyway. I'll use them for the rest of this tutorial.
The printf function isn't very complicated. Its function is obvious, but it
does have a few quirks you might need to know about.
Printing
Let's say you had an int and you wanted to print it out, along with some words. In Python this would be simple:
print("The value of my integer is %d" % my_int)
That syntax actually comes from C, where printf works in much the same way:
printf("The value of my integer is %d", my_int);
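The format string and the values to substitute into it are all passed as
arguments to printf, and you can have as many values as you like, provided each
one matches its format specifier (%d for an int, %f for a floating point
number, %s for a string). For example: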
printf("My car cost %d pounds and weighs %f kg", my_int, my_float);
Simple so far, right? You may already be familiar with the idea of output
streams if you use the command line. Everything printf prints, by default, goes
to standard output or stdout. This is the output that is piped to other
programs and written to text files when you redirect the output using ">" or
"|". Sometimes you want to output debug or error information which is supposed
to only be read by a human. For this you use stderr, or the standard error
output stream. Since printf only prints to stdout, we must use another
function, fprintf. It is used as follows:
fprintf(stderr, "This is an error");
That is essentially all the printing you'll ever need to know.
Functions
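C is a statically typed language: every variable, parameter and return value has a type that is fixed at compile time, and the compiler will not choose types for you. It does convert between arithmetic types implicitly (an int passed where a double is expected is converted silently), but it will not, for example, turn a number into a string. This lets the compiler catch many mistakes before the program ever runs. This is relevant because function definitions start with a return type and a list of parameters, all specifying their type. If we look at the following function, we can see what I mean.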
void say_hello(char* name)
{
    printf("Hello, %s!", name);
}
You should know what void means: the function returns nothing. If you tried to
pass a float or a struct of some kind to this function, it would not compile. I
assume returning other types doesn't require any explanation or patronising
"exercises".
Arrays, Strings and Pointers
You might have noticed that the syntax for passing a string to a function is a
bit odd. Well, that's because all strings in C are really just arrays of
characters, represented by the char data type and terminated by a zero byte
('\0'). This will become clear in the following code snippet:
char hello[10] = "hello";
That compiles and behaves as expected. You can pass the variable hello to any
function that expects a C-style string and it will work. But isn't hello an
array, not a char*, whatever that means? It is an array, but in most
expressions, including function calls, C converts an array into a pointer to
its first element. A char* is exactly that: a pointer to a char.
The thing which confuses most new C programmers is the concept of pointers. It all becomes very simple if you just ditch the analogies and realise that a pointer is a variable which stores the address of the data being pointed to. So the pointer points to the data in memory, but is not actually the data itself, merely an address. If you know that, the act of "dereferencing" a pointer is also very easy: it's just accessing the data being pointed to. Here is some code to explain:
int my_int = 4;
// Creates an integer with the value of 4
int* to_int = &my_int;
// Creates a pointer to an integer, which is then set to the
// address of the integer
printf("%d", *to_int);
// Dereferences the pointer and prints the value
There are some new operators in there. The & operator takes a variable of any
type and returns the memory address at which the variable is stored. The *
operator takes a pointer and returns the data stored there. It essentially
reverses the & operation.
Memory allocation
We have established that pointers hold the address of a block of memory, which usually has some data in it. Letting the compiler allocate the appropriate amount of memory is fine in some cases, but what if we need to allocate an amount of memory which we only know at runtime? An obvious use case is when copying large files - we cannot simply allocate 4 GB at compile time and hope the file fits; we must get the file size from the filesystem and allocate that much memory. This is done using dynamic memory management.
The two most important functions for this are malloc and free. malloc takes the
number of bytes to allocate (as a size_t, an unsigned integer type) and returns
a pointer to the allocated block of memory, or NULL if the allocation fails.
There is a problem here - you do not always know the number of bytes taken up
by a data type, since it can vary from platform to platform. You can find out
using the sizeof operator, which takes a data type (or an expression) and
yields a size_t telling you how many bytes that type takes up per instance.
Let's say you wanted to allocate memory for an array of 24 ints. You'd combine
sizeof and malloc to produce the following:
int* my_array = malloc(24*sizeof(int));
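And now the pointer my_array points to an allocated block of memory which can
fit 24 ints. But how do you access the ints? Well, the variable's name should
give you a clue: you index it just like an array, from my_array[0] to
my_array[23]. Indexing is defined in terms of pointers, so my_array[i] means
the same as *(my_array + i). Unlike a real array, though, the compiler does not
know the block's size, so sizeof(my_array) gives the size of the pointer, not
of the 24 ints.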
After you're done with the array of ints, you may not want the memory to stay
allocated. If you allocate memory throughout your program but never deallocate
it, this causes the memory usage of your program to grow and is known as a
memory leak. These can be avoided by deallocating (or freeing) memory using the
free function, which takes a pointer and frees the memory it points to.
free(my_array);
In your programs, every call to malloc should be accompanied by a call to free
at some point, to eliminate memory leaks. This is very important when you're
dealing with large files or databases, where a leak can add up to gigabytes of
wasted memory, forcing the OS to swap pages to disk and causing the system to
slow down or the program to crash.