x86 / Debian GNU/Linux / gcc
1. C source
main.c
other.c
2. compile (see the object files' symbol tables)
Compiling translates C source code into assembly code and then into machine instructions, which are stored in an object file.
gcc -c main.c
gcc -c other.c
This generates main.o and other.o.
Look at the object files' symbol tables:
main.o
nm main.o
U in_other_file_func
00000000 T main
U other_var
U printf
U means undefined: the symbol is used here but defined somewhere else. T means the symbol is in the text (code) section.
other.o
nm other.o
00000000 T in_other_file_func
00000000 D other_var
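D means the symbol is in the initialized data section.
[1] In main.o, the symbols in_other_file_func, other_var and printf are used but not defined.
[2] in_other_file_func, other_var and main have no final address yet; nm shows only their offset within their section, which is 00000000 here.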
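To make the letters in the nm listing more concrete, here is a small sketch of how C definitions usually map to symbol types when compiled with gcc -c at the default optimization level. The file name and every name except other_var and in_other_file_func are hypothetical, and the body of in_other_file_func is assumed, because the original other.c is not shown. The letter for an uninitialized global (B or C) depends on the GCC version and flags.

```c
/* symbols_demo.c (hypothetical file): build with `gcc -c symbols_demo.c`
 * and inspect the result with `nm symbols_demo.o`. */

int other_var = 4;          /* initialized global     -> D (data section) */
int zeroed_var;             /* uninitialized global   -> B (bss) or C     */
static int private_var = 1; /* initialized file-local -> d (lowercase)    */
const int fixed_var = 7;    /* read-only global       -> R (rodata)       */

extern int elsewhere_var;   /* declared only; if it were used -> U        */

/* Function with external linkage -> T (text section). Body is assumed. */
double in_other_file_func(int x)
{
    return (double)(x + private_var);
}

/* File-local function -> t (lowercase: invisible to other object files). */
static int helper(void)
{
    return fixed_var + zeroed_var;
}

/* Calls helper() so that the compiler keeps it in the object file. */
int use_helper(void)
{
    return helper();
}
```

An uppercase letter marks a symbol that other object files can refer to; a lowercase letter marks one that is local to its own file.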
If you compile with "gcc -c main.c -Wall", there is a warning:
main.c: In function 'main':
main.c:5:2: warning: implicit declaration of function 'in_other_file_func' [-Wimplicit-function-declaration]
Do not worry about the undefined symbols: the linker will resolve them. The missing declaration itself, however, causes the wrong result seen in step 4.
3. link (see the executable file's symbol table)
Linking combines the contents of the object files into one executable file. The linker resolves symbols (matches each undefined symbol with its definition) and relocates them (assigns final addresses and patches the addresses used by instructions).
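The two linker jobs described above can be imitated with a toy model. This is only a sketch under simplifying assumptions: the struct layouts, the function toy_link, the base address and the sizes are all invented for illustration and do not reflect the real ELF format or the algorithm used by ld.

```c
#include <stddef.h>
#include <string.h>

/* One symbol-table entry of a toy object file. */
struct toy_symbol {
    const char *name;
    int defined;          /* 1 = defined here (T/D), 0 = undefined (U)   */
    unsigned long value;  /* offset inside the object, later an address  */
};

struct toy_object {
    struct toy_symbol *syms;
    size_t nsyms;
    unsigned long size;   /* bytes this object contributes to the output */
};

/* Step 1 (relocate): place the objects one after another starting at
 * `base` and turn each defined symbol's offset into an absolute address.
 * Step 2 (resolve): give each undefined symbol the address of a matching
 * definition. Returns the number of symbols that stay unresolved. */
size_t toy_link(struct toy_object *objs, size_t nobjs, unsigned long base)
{
    unsigned long next = base;
    size_t unresolved = 0;

    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++)
            if (objs[i].syms[j].defined)
                objs[i].syms[j].value += next;
        next += objs[i].size;
    }

    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++) {
            struct toy_symbol *u = &objs[i].syms[j];
            int found = 0;

            if (u->defined)
                continue;
            for (size_t k = 0; k < nobjs && !found; k++) {
                for (size_t m = 0; m < objs[k].nsyms; m++) {
                    struct toy_symbol *d = &objs[k].syms[m];
                    if (d->defined && strcmp(d->name, u->name) == 0) {
                        u->value = d->value;  /* copy the final address */
                        found = 1;
                        break;
                    }
                }
            }
            if (!found)
                unresolved++;
        }
    }
    return unresolved;
}
```

If the symbol tables of other.o and main.o are fed to this model, printf is the only symbol left unresolved, which mirrors the fact that it is supplied later by the dynamically linked C library.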
gcc other.o main.o -o main
This generates the executable file main.
Look at the executable file's symbol table:
nm main > main_exe.txt
vi main_exe.txt
[Screenshot: part of main_exe.txt]
Every symbol now has its own virtual address, and every symbol from main.o and other.o is defined except printf, a library function that is resolved by dynamic linking.
4. execute
main is now an executable file that can run on Linux:
./main
re's value is: -1076686976.000000
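This is not the result we want. In C, a function must be declared before it is used. After linking, main has the address of in_other_file_func, but the compiler never saw its type. In that situation the compiler assumes that in_other_file_func returns int and applies only the default argument promotions to its arguments. The caller therefore reads the return value as an int instead of in the type the function really returns, so the number printed with "%f" is meaningless. We should tell main() the type of in_other_file_func by declaring it in main.c: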
main.c
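The corrected listing is not reproduced here, so the following is only a sketch of what the fix might look like. The signature and body of in_other_file_func, the value of other_var and the helper format_result are assumptions made for illustration; only the symbol names and the format of the printed line come from this walkthrough.

```c
#include <stdio.h>

/* Declarations that main.c needs before it uses the symbols. In a real
 * project they would normally live in a header such as other.h
 * (hypothetical name). The signature is an assumption. */
double in_other_file_func(int x);
extern int other_var;

/* Stand-ins for the definitions in other.c, included here only so that
 * this sketch builds as a single file. */
int other_var = 2;
double in_other_file_func(int x)
{
    return x * 2.0;
}

/* What main() would do: call the function with its real type known, so
 * the double return value is read and formatted correctly.
 * Returns the number of characters written, as snprintf does. */
int format_result(char *buf, size_t size)
{
    double re = in_other_file_func(other_var);
    return snprintf(buf, size, "re's value is: %f\n", re);
}
```

In main.c itself only the two declarations are needed; with them in place the compiler no longer warns about an implicit declaration.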
Compile, link and execute main again:
gcc -c main.c
gcc other.o main.o -o main
./main
re's value is: 4.000000
Now we get the right answer.
5. call main()
Every C program must have a main() function, because _start in crt1.o ends up calling main() through __libc_start_main. We can see the disassembly of _start in the disassembly of main:
objdump -dS main > main_disasm.txt
vi main_disasm.txt
…...
08048330 <_start>:
8048330: 31 ed xor %ebp,%ebp
8048332: 5e pop %esi
8048333: 89 e1 mov %esp,%ecx
8048335: 83 e4 f0 and $0xfffffff0,%esp
8048338: 50 push %eax
8048339: 54 push %esp
804833a: 52 push %edx
804833b: 68 70 84 04 08 push $0x8048470
8048340: 68 80 84 04 08 push $0x8048480
8048345: 51 push %ecx
8048346: 56 push %esi
8048347: 68 28 84 04 08 push $0x8048428
804834c: e8 cf ff ff ff call 8048320 <__libc_start_main@plt>
8048351: f4 hlt
…...
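_start first pushes a series of arguments onto the stack and then calls the libc function __libc_start_main to do initialization work. The argument pushed last, push $0x8048428, is the address of main; __libc_start_main calls main after it has finished initializing.
Because main is called by the startup routine, returning from main returns into the startup routine, which receives main's return value. If the startup routine were written as equivalent C code (in practice it is usually written directly in assembly), its call to main would look like this: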
exit(main(argc, argv));
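In other words, as soon as the startup routine gets main's return value, it calls exit with that value as the argument. exit is also a libc function: it first does some cleanup and then invokes the _exit system call to terminate the process, so main's return value is eventually passed to _exit and becomes the process's exit status. We can also call exit directly inside main to terminate the process without returning to the startup routine, for example: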
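A minimal sketch of both paths, assuming a POSIX system. The names entry_fn, returns_seven, exits_with_three and exit_status_of are hypothetical; each entry function plays the role of main and is run in a child process so that its exit status can be observed.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*entry_fn)(int, char **);

/* Returns normally, like a main() that ends with `return 7;`. */
int returns_seven(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    return 7;
}

/* Terminates the process itself and never returns to its caller. */
int exits_with_three(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    exit(3);
}

/* Runs `entry` in a child process the way the startup routine runs
 * main(), i.e. exit(entry(argc, argv)), and reports the child's exit
 * status. Returns -1 if the child could not be created or did not
 * exit normally. */
int exit_status_of(entry_fn entry)
{
    char *argv[] = { "demo", NULL };
    pid_t pid = fork();
    int status = 0;

    if (pid < 0)
        return -1;
    if (pid == 0)
        exit(entry(1, argv));   /* child: mimic the startup routine */

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Whether the function returns a value or calls exit itself, the value ends up as the exit status of the process, which is also what the shell reports in $? after running a program.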