Introduction
First a warning: this is a difficult article that goes really deep inside the .Net machinery, so if you don’t get it the first time (or even the second or third time…) don’t worry and come back later. 🙂
For a training session I taught at the end of last year, I wanted to demonstrate some subtleties of multi-threading, and more specifically some memory visibility issues that should cause a program to hang.
So I developed a small sample that I expected to show the issue, but instead of hanging as expected the program completed!
After manipulating the program further I obtained the behavior I wanted: the program hung. But that still didn’t explain why my original version managed to complete.
<SPOILER>
I suspected some JITter optimizations, and indeed it was the case, but I needed more information to completely explain this strange behavior.
As often, the StackOverflow platform was of great help; if you’re curious you can have a look at the original SO thread.
</SPOILER>
In this article I’ll “build” and explain the issue step by step, trying to make it more understandable than the SO thread which is indeed quite dry.
A no-brainer…
Say you are a naive developer who loves simplicity.
You’re asked to synchronize two threads, so you ask yourself this question: what’s the simplest way of synchronizing two threads?
Easy peasy: a spin loop.
So 2 minutes later you’re done with a simple, but you think brilliant, implementation:
    using System;
    using System.Threading;

    namespace Tests
    {
        public class AwesomeSpin
        {
            bool ok = false;

            void Spin()
            {
                while (!ok) ;
            }

            void Run()
            {
                Thread thread = new Thread(Spin);
                thread.Start();

                Console.Write("Press enter to notify thread...");
                Console.ReadLine();
                ok = true;
                Console.WriteLine("Thread notified.");
            }

            static void Main()
            {
                new AwesomeSpin().Run();
            }
        }
    }
So the main thread starts another thread which should spin until we press enter to notify it that it has spun enough for today.
You compile your work:
> csc /optimize+ AwesomeSpin.cs
And you run it:
    > AwesomeSpin.exe
    Press enter to notify thread...<You press enter>
    Thread notified.
    >
The second > indicates that the program has terminated correctly and that you’re back at the shell, which is waiting for more commands to execute.
Perfect! It works just as expected.
You’re the boss!
…well, almost
You commit and push your code and as you’ve done a pretty good job you have the right to recover from this long and exhausting coding session with a well-deserved coffee.
But before you’ve finished your coffee you receive a message from the testing team:
Hello, your new component has been running for a few minutes now without any output and it seems stuck! The testing timeouts have been hit! Could you please check that all is OK? Regards
Like any developer in this situation your first thought is “WTF?”.
Then you decide to have a closer look at the situation and check how the testers have run your code, and you realize it has been compiled and run with many different combinations of target platforms and hosts.
The testing team has sent you a report like the following:
Platform | Host | Result |
---|---|---|
AnyCPU | x86 | OK |
AnyCPU | x64 | KO |
x86 | x86 | OK |
x86 | x64 | OK |
x64 | x86 | X |
x64 | x64 | KO |
Hum! Seems like there is an issue with 64-bit…
Note that by default CSC flags the resulting assembly as supporting platform “AnyCPU” meaning it will run in a 64-bit CLR if one is available on the host and in a 32-bit CLR otherwise.
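If you want to see which flavor of CLR an AnyCPU binary actually landed in, a minimal sketch like the following can help. It is only an illustration: the BitnessCheck class and its Describe method are names made up for this example, and it assumes .Net 4.0 or later, where Environment.Is64BitProcess is available.

```csharp
using System;

public static class BitnessCheck
{
    // Hypothetical helper: describes the process and OS bitness.
    public static string Describe()
    {
        string process = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        string os = Environment.Is64BitOperatingSystem ? "64-bit" : "32-bit";
        return process + " process on a " + os + " OS";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
        Console.WriteLine("Pointer size: " + IntPtr.Size + " bytes");
    }
}
```

Compiled with the default AnyCPU platform, the first line should differ between a 32-bit and a 64-bit host, while a /platform:x86 build should always report a 32-bit process.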
As you’re a conscientious employee and/or a curious geek you try to reproduce the issue yourself.
You set up a 64-bit machine and update your code.
First you force your .Net binary to be run only by the 32-bit CLR:
> csc /platform:x86 /optimize+ AwesomeSpin.cs
And you rerun it:
    > AwesomeSpin.exe
    Press enter to notify thread...
    Thread notified.
    >
So far so good.
Then you try in 64-bit mode:
> csc /platform:x64 /optimize+ AwesomeSpin.cs
And you rerun it again:
    > AwesomeSpin.exe
    Press enter to notify thread...
    Thread notified.
    ^C
Oops, indeed it’s stuck and you have to CTRL-C to stop the program. :/
But this is a good thing: a bug that you can reproduce can be considered as half fixed.
Note that you could have used CorFlags.exe too to set the assembly’s cor-flags to run it with different CLRs but recompiling best illustrates the way you do it with VS.
When things become crazy
The code is quite light and the only idea you have to confirm that the issue is in the Spin method is to use your best debugging wizardry… console output:
    void Spin()
    {
        Console.WriteLine("\nBefore spin loop.");
        while (!ok) ;
        Console.WriteLine("After spin loop.");
    }
And here we go again:
compile:
> csc /platform:x86 /optimize+ AwesomeSpinDebug.cs
and run:
    > AwesomeSpinDebug.exe
    Press enter to notify thread...
    Before spin loop.
    Thread notified.
    ^C
Ok it’s still stuck.
But … wait … I’m in x86 mode!
Just to check, you comment out the two Console.WriteLine lines:
    void Spin()
    {
        // Console.WriteLine("\nBefore spin loop.");
        while (!ok) ;
        // Console.WriteLine("After spin loop.");
    }
One more compilation:
> csc /platform:x86 /optimize+ AwesomeSpinDebug.cs
And one more run:
    > AwesomeSpinDebug.exe
    Press enter to notify thread...
    Thread notified.
    >
And it works again!
As developers we all know these moments, when you feel you’ve lost control over things and the machine does what it wants.
This time you really think software development is not a job for you and that it’ll drive you crazy, and you start asking Google whether there is an open position at the closest fast-food restaurant.
WTF?
But in a last fit of pride you decide to investigate more and you decompile your executables with your favorite IL disassembler.
(I often use ILSpy but for simple cases like this one ILDasm does the job too.)
With platform x86 without debugging output (the original version) you get:
    .method private hidebysig instance void Spin() cil managed
    {
      // Code size 9 (0x9)
      .maxstack 8
      IL_0000: ldarg.0
      IL_0001: ldfld bool Tests.MemoryVisibility::ok
      IL_0006: brfalse.s IL_0000
      IL_0008: ret
    } // end of method MemoryVisibility::Spin
With x64 platform, still without debugging output:
    .method private hidebysig instance void Spin() cil managed
    {
      // Code size 9 (0x9)
      .maxstack 8
      IL_0000: ldarg.0
      IL_0001: ldfld bool Tests.MemoryVisibility::ok
      IL_0006: brfalse.s IL_0000
      IL_0008: ret
    } // end of method MemoryVisibility::Spin
And finally with platform x86 with debugging output:
    .method private hidebysig instance void Spin() cil managed
    {
      // Code size 29 (0x1d)
      .maxstack 8
      IL_0000: ldstr "\nBefore spin loop."
      IL_0005: call void [mscorlib]System.Console::WriteLine(string)
      IL_000a: ldarg.0
      IL_000b: ldfld bool Tests.MemoryVisibility::ok
      IL_0010: brfalse.s IL_000a
      IL_0012: ldstr "After spin loop."
      IL_0017: call void [mscorlib]System.Console::WriteLine(string)
      IL_001c: ret
    } // end of method MemoryVisibility::Spin
As you’re probably not a CIL guru (pardon if you are) let me give you a little insight.
The important part, i.e. the spinning, is in these 3 lines of code:
    IL_0000: ldarg.0
    IL_0001: ldfld bool Tests.MemoryVisibility::ok
    IL_0006: brfalse.s IL_0000
It means:
- push the first method argument, i.e. the implicit “this” reference, at the top of the thread stack
- pop the object reference which is at the top of the stack and push the value of the object’s “ok” field
- check the boolean value at the top of the stack: if false go to the instruction 2 lines above, else continue to the next instruction
Conclusion: the spinning part is exactly the same (except of course the labels’ offsets) for the three programs.
And you suddenly remember that the platform (x86 or x64) only instructs the C# compiler to generate metadata that will determine which CLR will run the code, without impacting the way the C# compiler generates IL code.
And this is a good thing, only a native compiler should take care of the x86/x64 dichotomy issues.
So the issue is not at the IL level and you know what that means: you’ll have to go deeper, where no .Net developer should have to go (and where 99.42% of them will never go, and this is a good thing): in the native assembly Mordor!
But as a seasoned ex-C/C++ programmer you don’t fear it!
Inside the Mount Doom
In a last effort to preserve your mental sanity you run your programs again but this time you attach Visual Studio to check the resulting native assembly code of the Spin method.
With platform x86 and with output you get:
    00000000 push ebp
    00000001 mov ebp,esp
    00000003 push esi
    00000004 mov esi,ecx
    00000006 call 5BE97904
    0000000b mov ecx,eax
    0000000d mov edx,dword ptr ds:[03352178h]
    00000013 mov eax,dword ptr [ecx]
    00000015 mov eax,dword ptr [eax+3Ch]
    00000018 call dword ptr [eax+10h]
    0000001b movzx eax,byte ptr [esi+4]
    0000001f test eax,eax
    00000021 je 0000001F
    00000023 call 5BE97904
    00000028 mov ecx,eax
    0000002a mov edx,dword ptr ds:[0335217Ch]
    00000030 mov eax,dword ptr [ecx]
    00000032 mov eax,dword ptr [eax+3Ch]
    00000035 call dword ptr [eax+10h]
    00000038 pop esi
    00000039 pop ebp
    0000003a ret
The spinning part being:
    0000001f test eax,eax
    00000021 je 0000001F
With platform x86 but no output:
    00000000 push ebp
    00000001 mov ebp,esp
    00000003 cmp byte ptr [ecx+4],0
    00000007 je 00000003
    00000009 pop ebp
    0000000a ret
Spinning part:
    00000003 cmp byte ptr [ecx+4],0
    00000007 je 00000003
And with platform x64 without output:
    00000000 mov al,byte ptr [rcx+8]
    00000003 movzx ecx,al
    00000006 test ecx,ecx
    00000008 je 0000000000000006
    0000000a rep ret
Spinning part:
    00000006 test ecx,ecx
    00000008 je 0000000000000006
Again all this bunch of cryptic code deserves some explanations:
- the cmp instruction compares its two operands and sets some CPU flags depending on the result: the zero flag, a.k.a. ZF, is set (1) if they are equal, unset (0) otherwise
- the test instruction does a bitwise AND between its two operands and sets some flags depending on the result: the ZF flag is set (1) if the result is 0, unset (0) otherwise
- the je instruction jumps to the instruction at the given label if the ZF flag is set (1)
So the loops run while a zero (false) value is provided either as the first operand of cmp or as the first and second operands of test.
But the most important thing to notice is that:
- sometimes the .Net JITter directly compares the “ok” flag “from memory” in a place shared by the main thread and the spin thread (at address [ecx+4])
- sometimes it caches the value in a CPU register (eax or ecx) where it will be only accessible from the spin thread
In the latter case the spin thread can’t see the new flag value because it only looks in a register: this is the memory visibility issue I wanted to demonstrate at first.
So you get the answer to the initial question: the behavior varies depending on the way the different JITters (the one of the x86 CLR and the one of the x64 CLR) optimize the code when they compile the IL code into native binary code.
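To picture the register caching at the C# level, here is a rough single-threaded sketch. It is not what the JITter literally produces: the HoistingSketch class, its method names and the flipAt and limit parameters are invented for this illustration, the write that would normally come from the main thread is simulated inside the loop, and an iteration limit is added so that the "stuck" version returns instead of hanging.

```csharp
using System;

public sealed class HoistingSketch
{
    private bool ok;

    // Mirrors the source code: the field is re-read on every iteration.
    public int SpinOnField(int flipAt, int limit)
    {
        ok = false;
        int iterations = 0;
        while (!ok && iterations < limit)
        {
            // Stands in for the main thread setting the flag.
            if (++iterations == flipAt) ok = true;
        }
        return iterations;
    }

    // Mirrors the register-cached loop: the field is read once,
    // then only the private copy is tested.
    public int SpinOnCachedCopy(int flipAt, int limit)
    {
        ok = false;
        bool cached = ok; // plays the role of eax/ecx
        int iterations = 0;
        while (!cached && iterations < limit)
        {
            // The field changes, but the copy never does.
            if (++iterations == flipAt) ok = true;
        }
        return iterations;
    }

    public static void Main()
    {
        var sketch = new HoistingSketch();
        Console.WriteLine(sketch.SpinOnField(3, 10));      // prints 3
        Console.WriteLine(sketch.SpinOnCachedCopy(3, 10)); // prints 10
    }
}
```

The first loop stops as soon as the flag flips; the second one only stops because of the artificial limit, which is exactly what the stuck spin thread lacks.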
Solution
So now you’ve understood why your code was behaving “strangely” in some contexts.
Of course you can’t release such code into the wild, and you must fix it so that it behaves consistently whatever the CLR used to run it.
There is a well known solution for this “issue”: tagging the data you want to protect from any over-optimization with the “volatile” metadata.
The volatile concept exists in many languages: it tells compilers not to attempt overly clever optimizations that could completely mess up the program, as demonstrated above. Checking a copy of the value instead of the value itself is indeed not a good idea, but the compiler does not understand your code’s semantics.
With languages that are directly compiled to native code like C or C++ the volatile keyword is directly interpreted by their respective compilers when they generate the native library or executable.
But the C# compiler does far less work than a C/C++ compiler as most of the optimizations are deferred to the native compilation step done by the CLRs’ JITters.
And indeed the C# volatile modifier is simply forwarded, through some assembly metadata (System.Runtime.CompilerServices.IsVolatile), to the JITters, informing them that they should be cautious and ensure that:
- every read of the variable returns the latest value
- every write updates the variable with the latest value
This means that the JITters can’t do some optimizations anymore, like caching the value in a register for faster access, which is of course a bad idea in our case.
So let’s try with this fix:
    volatile bool ok = false;

    void Spin()
    {
        while (!ok) ;
    }
If you now have a look at the IL generated by the C# compiler you see:
    .method private hidebysig instance void Spin() cil managed
    {
      // Code size 11 (0xb)
      .maxstack 8
      IL_0000: ldarg.0
      IL_0001: volatile.
      IL_0003: ldfld bool modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) Tests.AwesomeSpinFixed::ok
      IL_0008: brfalse.s IL_0000
      IL_000a: ret
    } // end of method AwesomeSpinFixed::Spin
So this is the same code protected with some “volatile” metadata.
And now it works like a charm whatever the code you put before and after the spin loop, whatever the platform you set at compile time and whatever the CLR you use at runtime.
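As a side note, the same effect can be obtained without the keyword, by making each access explicitly volatile through the System.Threading.Volatile class. This is an alternative to the fix above, not what the article’s code uses, and it assumes .Net 4.5 or later. The SpinWithVolatileApi class, the 50 ms delay (which replaces the "press enter" step) and the timeout are choices made for this sketch only.

```csharp
using System;
using System.Threading;

public class SpinWithVolatileApi
{
    // Plain field: the volatile semantics come from the accessors below.
    private bool ok;

    private void Spin()
    {
        while (!Volatile.Read(ref ok))
        {
            // Busy-wait; every iteration performs a fresh volatile read.
        }
    }

    // Returns true if the spin thread saw the flag and exited in time.
    public bool Run(int timeoutMs)
    {
        // Background thread so a hang cannot keep the process alive.
        Thread thread = new Thread(Spin) { IsBackground = true };
        thread.Start();

        Thread.Sleep(50); // stands in for waiting on the user
        Volatile.Write(ref ok, true);

        return thread.Join(timeoutMs);
    }

    public static void Main()
    {
        bool exited = new SpinWithVolatileApi().Run(5000);
        Console.WriteLine(exited ? "Spin thread exited." : "Spin thread is still spinning.");
    }
}
```

The keyword is less error-prone when every access must be volatile, while the explicit calls make the special accesses visible at each call site.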
This time you can gaze at your brilliant work with pride for good.
Conclusion
This was a tricky one, probably the trickiest thing I’ve done with .Net, but it’s a really interesting one for at least three reasons:
- it demonstrates the memory visibility issue
- it shows that multi-threaded code can be quite subtle, particularly when optimizations come into play, so that great care should be taken when writing it
- .Net does its best to encapsulate the underlying platform in a consistent manner, but like many abstractions it is “leaky”: the programmer sometimes has to be aware of underlying details that the higher-level abstraction does not perfectly hide.
The latter point is not an issue in itself, and there are more significant leaky abstractions, like floating-point numbers (but that’s for another article).
Of course you should never synchronize your threads with such a basic construct, and if after a thorough profiling of your code you determine that you really need spinning then you can use the .Net framework SpinLock; but be aware that it’s a value type so be very cautious when using it too.
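To make the value-type caveat concrete, here is a minimal sketch of a counter protected by a SpinLock. The SpinLockCounter class and its CountWithThreads helper are hypothetical names for this example. Note also that SpinLock is a mutual-exclusion lock, so this shows how to use it safely rather than a drop-in replacement for the "notify a thread" scenario of this article.

```csharp
using System;
using System.Threading;

public class SpinLockCounter
{
    // Deliberately NOT readonly: SpinLock is a mutable struct, and calls
    // made through a readonly field would operate on a copy of the lock.
    private SpinLock spinLock = new SpinLock(false);
    private int count;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            spinLock.Enter(ref lockTaken);
            count++;
        }
        finally
        {
            // Only release the lock if it was actually acquired.
            if (lockTaken) spinLock.Exit();
        }
    }

    // Runs several threads that all increment the same counter.
    public static int CountWithThreads(int threadCount, int incrementsPerThread)
    {
        var counter = new SpinLockCounter();
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++) counter.Increment();
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return counter.count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithThreads(4, 100000)); // prints 400000
    }
}
```

For the same reason, a SpinLock should be passed around by reference if it is passed at all: copying the struct silently gives each copy its own independent lock state.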
Kudos to Hans Passant for confirming the issue and to MagnatLU for providing the debug wisdom necessary to extract the native assembly code and make the issue “clearly” appear.
If you catch any typo or mistake, have additional questions or want to share this kind of crazy experience, feel free to leave a comment.
Note to any future employer: this is not a real-life story, I promise that, except for demonstration purposes, I’ve never tried to spin a thread this way! 😉
Thanks for a well thought-out article that provided genuinely interesting reading. This is an issue I have come across before and it was nice to understand exactly what was going on under the covers rather than just accepting that marking the variable as volatile will fix it.
Glad you liked it Paul!
This issue has bitten more than one developer, including me when I was not expecting it. 🙂
Good Article