Daniel Bittman

Fighting With Your Compiler

I use gcc to compile SeaOS. While clang is a production-quality C compiler, gcc is more established and has more documentation on porting (I also use evil gcc-specific extensions to accomplish stack and x86 magic, something I'm slowly working on removing). In order to guarantee that the compiler doesn't try to do anything stupid like use system libraries, generate code for the wrong CPU type, etc., SeaOS builds itself a gcc cross-compiler. This helps to stabilize the build environment, making it easier to find bugs in the kernel.

When writing code, it is generally a good idea to assume that the compiler is always generating correct assembly code. It is way, way more likely that you've buggered up some code without realizing it, or that there is some edge case you haven't thought of, or that you forgot to declare a variable that gets modified in an interrupt handler as volatile, than that the compiler generated problematic assembly. In general, the compiler is more stable and has been better evaluated than what you wrote 20 minutes ago while watching Netflix. Even if you've spent hours saying "this should work!", it will almost always come back to a mistake you made, not the compiler.

Of course, that isn't always the case. Once you've spent hours debugging you start to doubt that rule, and you disassemble the binary and evaluate the mess that is gcc-generated x86-64 assembly code.

And sometimes, you find something.

A Bit of Background

Kernel development isn't the nice happy convenient place that C userland is - wait, did I just say that C userland was convenient? Yes. Compared to kernel-space, programming in C in userland is a breeze (no matter how many Python developers say otherwise). Firstly, in the kernel you get no libraries except for libgcc, and that's mostly routines for things like 64-bit division and floating point handling. You have to implement all the standard library functions that you need yourself. That includes strcmp, strcat, memcpy and memset, which is why SeaOS has a library/string directory that contains simple implementations of things like memset.

Compilers like to optimize code if you tell them to. Gcc 4.8 has an optimization, -ftree-loop-distribute-patterns, that recognizes certain loop patterns and turns them into library calls. From the documentation:

-ftree-loop-distribute-patterns:

Perform loop distribution of patterns that can be code generated with calls to a library. This flag is enabled by default at -O3. This pass distributes the initialization loops and generates a call to memset zero. For example, the loop

DO I = 1, N
  A(I) = 0
  B(I) = A(I) + I
ENDDO

is transformed to

DO I = 1, N
  A(I) = 0
ENDDO
DO I = 1, N
  B(I) = A(I) + I
ENDDO

and the initialization loop is transformed into a call to memset zero.
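
In C, the same pattern looks something like the sketch below. The function name clear_and_index and its parameters are made up for illustration; the point is only that the first statement in the loop is a plain zero-initialization, which is the shape the pass can split off and turn into a memset call when it is enabled.

```c
#include <stddef.h>

/* Hypothetical example: a fused loop in the same shape as the Fortran above.
 * With -ftree-loop-distribute-patterns enabled, gcc may split the loop in two
 * and emit a call to memset for the half that zeroes a[]. */
void clear_and_index(int *a, int *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		a[i] = 0;
		b[i] = a[i] + (int)i;
	}
}
```

Whether gcc actually performs the transformation here depends on the version, the optimization level, and what it can prove about a and b overlapping.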

Wow, great! It'll optimize initialization loops by calling memset in places where I've forgotten to just call memset instead of making a loop. There is just one problem...

Implementing memset

When gcc 4.8 came out, I updated my cross-compiler so that it was based off of gcc 4.8. I quickly re-built the kernel, and booted it up... only to have it immediately crash. It instantly triple faulted the CPU. It didn't even get a chance to print "hello" on the screen. I hung my head in sadness, and went looking for the bug. At this point, I assumed that there was some coding error that only showed up because of some new optimization or difference in gcc 4.8.

After searching for a long time, I traced the crash back to a call to memset. "Weird", I thought, "maybe I'm setting some memory to zero that I shouldn't be". But I wasn't. In fact, this was the first call to memset that the kernel ever makes when booting up, which I found suspicious. I decided to check out the memset code, to see if something was up.

So, what does an implementation of memset look like? Pretty simple really, all it does is set a bunch of bytes to something:

	/* Fill the first n bytes at m with the byte value c; returns m. */
	void *memset(void *m, int c, size_t n)
	{
		unsigned char *s = (unsigned char *) m;
		while (n--) {
			*s++ = (unsigned char) c;
		}
		return m;
	}

Nothing immediately stood out, so I decided to look at the disassembled binary for memset. I still had the old cross-compiler around, so I compiled both a non-working version and a working version. Here are the outputs:

Non-optimized, working memset

0012d9b8 <memset>:
   12d9b8: push %rbp
   12d9b9: mov %rsp,%rbp
   12d9bc: mov %rdi,%rax
   12d9bf: mov %rdi,%rcx
   12d9c2: jmp 12d9ce <memset+0x16>
   12d9c4: mov %sil,(%rcx)
   12d9c7: mov %r8,%rdx
   12d9ca: lea 0x1(%rcx),%rcx
   12d9ce: lea -0x1(%rdx),%r8
   12d9d2: test %rdx,%rdx
   12d9d5: jne 12d9c4 <memset+0xc>
   12d9d7: pop %rbp
   12d9d8: retq

Optimized, non-working memset

0013aaf0 <memset>:
   13aaf0: test %rdx,%rdx
   13aaf3: je 13ab10 <memset+0x20>
   13aaf5: push %rbp
   13aaf6: movzbl %sil,%esi
   13aafa: movabs $0x13aaf0,%rax
   13ab04: mov %rsp,%rbp
   13ab07: callq *%rax
   13ab09: pop %rbp
   13ab0a: retq
   13ab0b: nopl 0x0(%rax,%rax,1)
   13ab10: mov %rdi,%rax
   13ab13: retq

The working one is correct, obviously. But the non-working one has some weird stuff going on. It skips over the function if the size argument is zero, which makes sense. If it doesn't skip over it, it loads the value 0x13aaf0 into %rax and then later does a function call to that location. Note that 0x13aaf0 is the address of memset itself. The relevant instructions:

   13aafa: movabs $0x13aaf0,%rax
   13ab07: callq *%rax

Yup. Memset is calling itself. And not jumping around inside the function, loop-style, no. It's actually calling the beginning of the function. I'm pretty sure that I didn't ask gcc to change my memset into a recursive memset, so something fishy is going on.
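
Translated back into C, the optimized disassembly behaves roughly like the sketch below. This is my reading of the assembly, not compiler output, and the function is renamed to recursive_memset (a made-up name) so that it doesn't clash with a real memset. It never writes a single byte, and with any non-zero n it recurses until something breaks.

```c
#include <stddef.h>

/* Hypothetical C rendering of the optimized, non-working memset.
 * Only safe to call with n == 0; any other n never terminates normally. */
void *recursive_memset(void *m, int c, size_t n)
{
	if (n == 0)
		return m;	/* test %rdx,%rdx ; je ; mov %rdi,%rax ; retq */

	/* movzbl %sil,%esi ; movabs $<memset>,%rax ; callq *%rax */
	return recursive_memset(m, (unsigned char) c, n);
}
```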

The Fix

After comparing the optimizations done by gcc 4.8 and my old compiler, I found the name of the optimization. The easy fix is to disable that optimization and be on my merry way. But is there a better fix?

Short answer: no. When compiling programs, gcc emits calls to standard library functions. And that's okay, because memset is part of the standard library - it is going to be there. Except when you're implementing the standard library! Even the GNU C library had problems with that optimization, and had to disable it. In kernel space, there are NO libraries outside of libgcc! In order to compile SeaOS, the flag -nostdlib is specified, which tells gcc to not link any libraries. If I had renamed my memset implementation to fill_memory_with_a_value and called that instead, all my code would have worked, except that it wouldn't link because gcc would insist that memset was still present (since it would just optimize that function away to memset).
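
For reference, the pass can be turned off for a whole build with -fno-tree-loop-distribute-patterns, or for a single function with gcc's optimize attribute, which is the approach glibc takes. Below is a minimal sketch of the per-function form. The macro name NO_LOOP_TO_LIBCALL and the function name kmemset are made up for this example (in the kernel the function would simply be memset), and the attribute is gcc-specific, so the macro is left empty for other compilers.

```c
#include <stddef.h>

/* Assumption: gcc's optimize attribute is available. Other compilers get an
 * empty macro, which means no protection against the loop-to-memset rewrite. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
	__attribute__((__optimize__("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Hypothetical name; same body as the memset shown earlier. */
NO_LOOP_TO_LIBCALL
void *kmemset(void *m, int c, size_t n)
{
	unsigned char *s = (unsigned char *) m;
	while (n--) {
		*s++ = (unsigned char) c;
	}
	return m;
}
```

The command-line flag is the simpler choice when the whole project has no C library behind it; the attribute only protects the functions it is attached to.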

To be fair, gcc documentation does say that even with -nostdlib, gcc may still generate calls to memset and friends. Personally, I think that is idiotic behavior. The flag specifically means "Hey! There aren't any standard libraries! There may not be any standard functions! As much as you want to believe that memset is around, that isn't going to make it true!". The real fix would be for gcc to fix this broken behavior, or to at least add another flag called "-nostdlib-for-real-though-im-not-screwing-around".


Note: gcc has actually had this option for quite some time, but it never broke anything until gcc 4.8. This is because gcc 4.8 became much better at recognizing code that could be optimized to a call to memset.

posted 2014-11-05 by Daniel Bittman (send me an email or follow me on twitter!)