Kunle Adeniran
5 min read · Jan 18, 2018

What Happens When You Type gcc main.c?

There is a lot of waiting to be done these days: waiting for coffee, waiting at the DMV, or waiting for your code to compile. I had never given much thought to what compiling code means until I started learning to be a full-stack engineer at school. I love to tinker and to know how things work, and it’s an amazing adventure learning how the computer works.

I am currently learning C programming, and one of the key things I am learning is how C code is compiled. But before we go on, we need to ask some important questions about the topic.

The first question is: what is gcc, and why do we need it? main.c is a source file that contains code written in the C programming language, but it has to go through several steps before it becomes executable, because C is a compiled programming language.

A compiled language is a programming language whose implementations are typically compilers (translators that generate machine code from source code) rather than interpreters. The C language was developed by Dennis Ritchie between 1969 and 1973 at Bell Labs, where it was used to rewrite the Unix operating system, originally written in assembly language, in a more portable and compact form. Prior to C, most critical code, such as OS kernels and other system software, was written in assembly language, which is specific to the architecture it is designed to run on. In part, C was conceived as a response to the portability problems that arise when moving software written for one architecture to another. A key aspect of C’s success was the ability to write code at a higher level of abstraction and have a compiler translate it into assembly language for each target machine.

This is where gcc comes into play. The GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project that supports several programming languages. GCC is a key component of the GNU toolchain and the standard compiler on many Unix-like operating systems.

Now back to the topic at hand.

COMPILATION STEPS

The steps involved in compiling the source file main.c are listed below:

  • Preprocessing
  • Compiling
  • Assembly
  • Linking

PREPROCESSING

Here, the source code is passed to the C preprocessor, cpp. The preprocessor removes comments, expands macros, and interprets directives (lines that begin with #), preparing the code for the compiler in a variety of ways.

The header files named in #include directives (e.g. <stdio.h>, <stdlib.h>) are also copied into the file during preprocessing. The header file stdio.h, for example, declares standard I/O functions such as printf and scanf.

The preprocessed output can be viewed before it is compiled by running gcc -E main.c, which stops after preprocessing and prints the result to standard output.
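To make the preprocessing step a little more concrete, here is a minimal sketch of the kind of code the preprocessor rewrites. The names GREETING, SQUARE, squared_plus_one and greeting_length are made up for illustration and are not taken from any particular main.c. The comments describe what each line would look like after preprocessing; the comments themselves would be stripped out.

```c
#include <string.h>  /* replaced by the contents of the header file */

/* Object-like macro (hypothetical): each later GREETING becomes "hello". */
#define GREETING "hello"

/* Function-like macro (hypothetical): SQUARE(n) becomes ((n) * (n)). */
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the return line reads: return ((n) * (n)) + 1; */
int squared_plus_one(int n)
{
    return SQUARE(n) + 1;
}

/* After preprocessing, the call reads: strlen("hello") */
size_t greeting_length(void)
{
    return strlen(GREETING);
}
```

If these functions were placed in main.c, running gcc -E main.c should show the expanded lines, with the macro names and comments gone and the contents of string.h pasted in above them.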

COMPILING

Following preprocessing, the source code is passed to the compiler proper. cc1 is the program gcc uses to turn C source code into assembly language. An assembly (or assembler) language, often abbreviated asm, is a low-level programming language in which there is a very strong (but often not one-to-one) correspondence between the language and the architecture’s machine code instructions. Each assembly language is specific to a particular computer architecture.

The output of this stage can be seen by running gcc -S main.c

This writes a file with the same name as the source file but with the extension .s instead of .c (here, main.s).

ASSEMBLY

The next step converts the assembly code into machine code, the binary instructions that the processor executes. gcc hands this job to an assembler (the program named as). Numerical machine code (i.e., not assembly code) may be regarded as the lowest-level representation of a compiled or assembled computer program or as a primitive and hardware-dependent programming language. While it is possible to write programs directly in numerical machine code, it is tedious and error-prone to manage individual bits and calculate numerical addresses and constants manually.

For this reason, programs are almost never written directly in machine code in modern contexts. Three exceptions are: for low level debugging, program patching, and assembly language disassembly (which would be called for in the absence of assembly source code, when only the machine language object code is available).

The result of this stage can be produced by running gcc -c main.c, which stops after assembly and writes the object file main.o.

The output file, main.o, is illegible to humans when opened as text, but a machine language monitor gives a more readable view of machine code (see the example image at https://en.wikipedia.org/wiki/Machine_code).
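A machine language monitor typically shows the bytes of a program as hexadecimal numbers. As a rough illustration of that idea only, the sketch below formats raw bytes as two-digit hex values. The helper name format_hex is an assumption made for this example; real tools such as hexdump or objdump do considerably more.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * format_hex (hypothetical helper): writes each byte of buf into out as
 * two hex digits, separated by single spaces, e.g. "7f 45 4c 46".
 * out must have room for at least 3 * len bytes (and at least 1 byte).
 */
void format_hex(const unsigned char *buf, size_t len, char *out)
{
    size_t i;

    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* Two hex digits, then a space unless this is the last byte. */
        sprintf(out + 3 * i, (i + 1 < len) ? "%02x " : "%02x",
                (unsigned int)buf[i]);
    }
}
```

Passing in the four bytes 0x7f, 'E', 'L', 'F' (the marker that opens object files in the ELF format used on many Linux systems) gives the text 7f 45 4c 46.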

LINKING

The final step is linking the object files together into a single binary that contains, or refers to, all the instructions necessary to run the program. This is accomplished by ld, the linker. In addition to object files generated from source code, ld frequently pulls in precompiled code from libraries. For example, the printf function that was declared by stdio.h in the source file is connected to the rest of the executable in this step, either by copying its code in (static linking) or by recording a reference to a shared library that is resolved when the program starts (dynamic linking). Once all of these pieces have been linked together, the result is a valid executable that can run on the target architecture.
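The idea of a function being declared in one place and defined in another can be sketched in a single file. In the hypothetical example below, the prototype of add plays the role of a declaration from a header, and the definition at the bottom stands in for code that would normally live in a separate object file or library. In that multi-file case it is the linker that connects the call to the definition. The names add and sum_of_three are invented for illustration.

```c
/* Declaration only: tells the compiler that add exists, as a header would. */
int add(int a, int b);

/* Calls add knowing only the declaration above. */
int sum_of_three(int a, int b, int c)
{
    return add(add(a, b), c);
}

/*
 * Definition: in a real project this could sit in another .c file,
 * be compiled to its own object file, and be joined in at link time.
 */
int add(int a, int b)
{
    return a + b;
}
```

If the definition were missing from every object file and library given to the linker, compilation of this file would still succeed, and the error would only appear at the linking step as an undefined reference to add.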

While this article is lengthy, I am glad I don’t have to wait too long for the file to be compiled. The finished product is an output file called a.out.

a.out is also the name of a file format used in older versions of Unix-like operating systems for executables, object code, and, in later systems, shared libraries. The name is an abbreviated form of “assembler output”, the filename of the output of Ken Thompson’s PDP-7 assembler. gcc still uses a.out as the default name for its output, even on systems where the file itself is in a newer format. The file is written to the directory from which gcc was run.
