My 32-bit headache is now a 64-bit migraine?!? (or 64-bit .NET CLR Runtime issues)


What unusual, unexpected consequences have occurred in terms of performance, memory, etc when switching from running your .NET applications under the 64 bit JIT vs. the 32 bit JIT? I'm interested in the good, but more interested in the surprisingly bad issues people have run into.

I am in the process of writing a new .NET application which will be deployed in both 32-bit and 64-bit. There have been many questions relating to the issues with porting the application - I am unconcerned with the "gotchas" from a programming/porting standpoint (i.e., handling native/COM interop correctly, reference types embedded in structs changing the size of the struct, etc.).

However, this question and its answer got me thinking - What other issues am I overlooking?

There have been many questions and blog posts that skirt around this issue, or hit one aspect of it, but I haven't seen anything that's compiled a decent list of problems.

In particular - My application is very CPU bound and has huge memory usage patterns (hence the need for 64-bit in the first place), as well as being graphical in nature. I'm concerned with what other hidden issues may exist in the CLR or JIT running on 64-bit Windows (using .NET 3.5 SP1).

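Here are a few issues I'm currently aware of:

- Properties are not inlined on x64.
- The application's memory profile changes.
- Startup times can be longer.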

I'd like to know what other, specific, issues people have discovered in the JIT on 64bit Windows, and also if there are any workarounds for performance.

Thank you all!


Just to clarify -

I am aware that trying to optimize early is often bad. I am aware that second-guessing the system is often bad. I also know that portability to 64-bit has its own issues - we run and test on 64-bit systems daily to help with this.

My application, however, is not your typical business application. It's a scientific software application. We have many processes that sit using 100% CPU on all of the cores (it's highly threaded) for hours at a time.

I spend a LOT of time profiling the application, and that makes a huge difference. However, most profilers disable many features of the JIT, so the small details in things like memory allocation, inlining in the JIT, etc, can be very difficult to pin down when you're running under a profiler. Hence my need for the question.


I remember hearing about an issue on an IRC channel I frequent. The 64-bit JIT reportedly optimises away the temporary copy in this instance:

    EventHandler temp = SomeEvent;
    if (temp != null)
    {
        temp(this, EventArgs.Empty);
    }

That puts the race condition back in and can cause a NullReferenceException.
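If you want to rule this out regardless of what a particular JIT does, one option is to make the read of the event field explicit, so it cannot be folded into a second read. The following is only a minimal sketch: the class name Publisher and the method RaiseSomeEvent are made up for illustration, and it assumes the event is a field-like event raised from inside its declaring class.

```csharp
using System;
using System.Threading;

public class Publisher
{
    public event EventHandler SomeEvent;

    // Hypothetical helper: raises SomeEvent from a snapshot taken in one explicit read.
    public void RaiseSomeEvent()
    {
        // CompareExchange(ref field, null, null) never changes the field's value:
        // if the field is null it "replaces" null with null, otherwise it does nothing.
        // Either way it returns the current value, with a full memory barrier.
        EventHandler temp = Interlocked.CompareExchange(ref SomeEvent, null, null);
        if (temp != null)
        {
            temp(this, EventArgs.Empty);
        }
    }
}
```

Subscribers attach with += as usual, and the raising code calls RaiseSomeEvent() instead of repeating the copy-and-check pattern at every call site.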


A particularly troublesome performance problem in .NET relates to the poor JIT:


Basically, inlining and structs don't work well together on x64 (although that page suggests inlining now works but subsequent redundant copies aren't eliminated, that sounds suspect given the tiny performance difference).

In any case, after wrestling with .NET long enough for this, my solution is to use C++ for anything numerically intensive. Even in "good" cases for .NET, where you're not dealing with structs and using arrays where the bounds-checking is optimized out, C++ beats .NET hands down.

If you're doing anything more complicated than dot products, the picture gets worse very quickly; the .NET code is both longer and less readable (because you need to manually inline stuff and/or can't use generics), and much slower.
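To make "manually inline" concrete, here is a small sketch of the two styles side by side. Vec2 and VecSum are hypothetical names invented for this illustration. Both methods do the same arithmetic in the same order; whether the second one is actually faster on your JIT is something to measure, not assume.

```csharp
using System;

// Hypothetical 2D vector struct, for illustration only.
public struct Vec2
{
    public double X;
    public double Y;

    public Vec2(double x, double y) { X = x; Y = y; }

    public static Vec2 operator +(Vec2 a, Vec2 b)
    {
        return new Vec2(a.X + b.X, a.Y + b.Y);
    }
}

public static class VecSum
{
    // Readable version: relies on the JIT to inline operator+ and drop the struct copies.
    public static Vec2 SumWithOperator(Vec2[] items)
    {
        Vec2 total = new Vec2(0.0, 0.0);
        for (int i = 0; i < items.Length; i++)
        {
            total = total + items[i];
        }
        return total;
    }

    // Manually inlined version: same additions, but no struct-valued calls inside the loop.
    public static Vec2 SumManuallyInlined(Vec2[] items)
    {
        double x = 0.0;
        double y = 0.0;
        for (int i = 0; i < items.Length; i++)
        {
            x += items[i].X;
            y += items[i].Y;
        }
        return new Vec2(x, y);
    }
}
```

With two components the manual version is still tolerable; with matrices or longer expressions it is easy to see how the code gets longer and harder to read.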

I've switched to using Eigen in C++: it's absolutely great, resulting in readable code and high performance; a thin C++/CLI wrapper then provides the glue between the compute engine and the .NET world.

Eigen works by template meta-programming; it compiles vector expressions into SSE intrinsic instructions and does a lot of the nastiest cache-related loop unrolling and rearranging for you; and though focused on linear algebra, it'll work with integers and non-matrix array expressions too.

So, for instance, if P is a matrix, this kind of stuff Just Works:

1.0 /  (P.transpose() * P).diagonal().sum();  

...which doesn't allocate a temporarily transposed variant of P, and doesn't compute the whole matrix product but only the fields it needs.

So, if you can run in Full Trust - just use C++ via C++/CLI, it works much much better.


Most of the time Visual Studio and the compiler do a pretty good job of hiding the issues from you. However, I am aware of one major problem that can arise if you set your app to auto-detect the platform (the "Any CPU" target, rather than x86 or x64) and also have any dependencies on 32-bit third-party DLLs. In this case, on 64-bit platforms the app runs as a 64-bit process, which cannot load 32-bit DLLs, and it just won't work (the load typically fails with a BadImageFormatException).
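One way to surface this early is to check the bitness of the running process at startup. This is a minimal sketch, not a complete solution: PlatformCheck and its methods are hypothetical names, and the DLL name passed in is whatever 32-bit-only library you depend on. It uses IntPtr.Size, which is available on .NET 3.5.

```csharp
using System;

public static class PlatformCheck
{
    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static bool Is64BitProcess()
    {
        return IntPtr.Size == 8;
    }

    // Hypothetical guard: call once at startup, before anything touches the 32-bit DLL.
    public static void EnsureCanLoad32BitNativeDll(string dllName)
    {
        if (Is64BitProcess())
        {
            throw new PlatformNotSupportedException(
                dllName + " is 32-bit only, but this process is 64-bit. " +
                "Build the executable for x86 or obtain a 64-bit build of the DLL.");
        }
    }
}
```

The simpler fix, when you cannot get a 64-bit build of the dependency, is to set the platform target of the executable to x86 so the whole process runs as 32-bit.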


You mentioned the porting issues; those are the ones to be concerned with. I (obviously) don't know your application, but trying to second-guess the JIT is often a complete waste of time. The people who write the JIT have an intimate understanding of the x86/x64 chip architecture, and in all likelihood know better than probably anyone else on the planet what performs well and what performs badly.

Yes, it's possible that you have a corner case that is different and unique, but if you're "in the process of writing a new application" then I wouldn't worry about the JIT compiler. There's likely a silly loop that can be avoided somewhere that will buy you 100x the performance improvement you'll get from trying to second-guess the JIT. It reminds me of issues we ran into writing our ORM: we'd look at code and think we could tweak a couple of machine instructions out of it... of course, the code then went off and connected to a database server over a network, so we were trimming microseconds off a process that was bounded by milliseconds somewhere else.

Universal rule of performance tweaking... If you haven't measured your performance you don't know where your bottlenecks are, you just think you know... and you're likely wrong.


About Quibblesome's answer:

I ran the following code on my Windows 7 x64 machine, in Release mode without a debugger attached, and a NullReferenceException was never thrown.

    using System;
    using System.Threading;

    namespace EventsMultithreadingTest
    {
        public class Program
        {
            private static Action<object> _delegate = new Action<object>(Program_Event);
            public static event Action<object> Event;

            public static void Main(string[] args)
            {
                Thread thread = new Thread(delegate()
                    {
                        while (true)
                        {
                            Action<object> ev = Event;

                            if (ev != null)
                            {
                                ev.Invoke(null);
                            }
                        }
                    });
                thread.Start();

                while (true)
                {
                    Event += _delegate;
                    Event -= _delegate;
                }
            }

            static void Program_Event(object obj)
            {
                object.Equals(null, null);
            }
        }
    }


I believe the 64-bit JIT is not fully developed/ported to take advantage of 64-bit CPUs, so it has issues; you may be getting 'emulated' behavior in your assemblies, which may cause problems and unexpected behavior. I would look into cases where this can be avoided, and/or see if there is a good, fast 64-bit C++ compiler for writing time-critical computations and algorithms. But even if you have difficulty finding information, or have no time to read through disassembled code, I'm quite sure that moving heavy computation out of managed code would reduce any issues you may have and boost performance [somewhat sure you are already doing this, but just to mention it :)]


A profiler shouldn't significantly influence your timing results. If the profiler overheads really are "significant" then you probably can't squeeze much more speed out of your code, and should be thinking about looking at your hardware bottlenecks (disk, RAM, or CPU?) and upgrading. (Sounds like you are CPU bound, so that's where to start)

In general, .NET and the JIT free you from most of the porting problems of 64-bit. As you know, there are effects relating to the register size (memory usage changes, marshalling to native code, needing all parts of the program to be native 64-bit builds) and some performance differences (larger memory map, more registers, wider buses, etc.), so I can't tell you anything more than you already know on that front. The other issues I've seen are OS rather than C# ones - there are now different registry hives for 64-bit and WOW64 applications, for example, so some registry accesses have to be written carefully.

It's generally a bad idea to worry about what the JIT will do with your code and try to adjust it to work better, because the JIT is likely to change with .NET 4 or 5 or 6 and your "optimisations" may turn into inefficiencies, or worse, bugs. Also bear in mind that the JIT compiles the code specifically for the CPU it is running on, so potentially an improvement on your development PC may not be an improvement on a different PC. What you get away with using today's JIT on today's CPU might bite you in a year's time when you upgrade something.

Specifically, you cite "properties are not inlined on x64". By the time you have run through your entire codebase turning all your properties into fields, there may well be a new JIT for 64 bit that does inline properties. Indeed, it may well perform better than your "workaround" code. Let Microsoft optimise that for you.

You rightly point out that your memory profile can change. So you might need more RAM, faster disks for virtual memory, and bigger CPU caches. All hardware issues. You may be able to reduce the effect by using (e.g.) 32-bit types such as Int32 rather than 64-bit ones where you have the choice (note that C#'s int is always Int32, on both platforms; it is references and IntPtr that double in size), but that may not make much difference and could potentially harm performance (as your CPU may handle native 64-bit values more efficiently than half-size 32-bit values).

You say "startup times can be longer", but that seems rather irrelevant in an application that you say runs for hours at 100% CPU.

So what are you really worried about? Maybe time your code on a 32-bit PC and then time it doing the same task on a 64-bit PC. Is there half an hour of difference over a 4 hour run? Or is the difference only 3 seconds? Or is the 64 bit PC actually quicker? Maybe you're looking for solutions to problems that don't exist.
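A minimal sketch of such a comparison, assuming you can build the same workload as both an x86 and an x64 executable and run each one. BitnessTimer and TimeIt are hypothetical names; Stopwatch and IntPtr.Size are standard .NET.

```csharp
using System;
using System.Diagnostics;

public static class BitnessTimer
{
    // Runs the workload once and returns a one-line report tagged with the process bitness.
    public static string TimeIt(string label, Action workload)
    {
        Stopwatch sw = Stopwatch.StartNew();
        workload();
        sw.Stop();

        int bits = IntPtr.Size * 8; // 32 or 64, depending on how the process is running
        return string.Format("{0} [{1}-bit process]: {2} ms", label, bits, sw.ElapsedMilliseconds);
    }
}
```

Calling the workload once before timing it keeps JIT compilation out of the measurement, and running outside the debugger in a Release build keeps the JIT's optimisations switched on.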

So back to the usual, more generic, advice. Profile and time to identify bottlenecks. Look at the algorithms and mathematical processes you are applying, and try to improve/replace them with more efficient ones. Check that your multithreading approach is helping rather than harming your performance (i.e. that waits and locks are avoided). Try to reduce memory allocation/deallocation - e.g. re-use objects rather than replacing them with new ones. Try to reduce the use of frequent function calls and virtual functions. Switch to C++ and get rid of the inherent overheads of garbage collection, bounds checking, etc. that .NET imposes. Hmmm. None of that has anything to do with 64-bit, does it?
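As one concrete instance of the "re-use objects" advice, here is a small sketch contrasting a loop that allocates a scratch array on every pass with one that allocates it once. BufferReuse and its methods are hypothetical names, and the workload (squaring and summing) is a stand-in for real computation.

```csharp
using System;

public static class BufferReuse
{
    // Allocates a fresh scratch array on every pass, leaving more garbage for the GC.
    public static double RunAllocating(double[] input, int passes)
    {
        double total = 0.0;
        for (int p = 0; p < passes; p++)
        {
            double[] scratch = new double[input.Length];
            total += SquareAndSum(input, scratch);
        }
        return total;
    }

    // Allocates the scratch array once and reuses it on every pass.
    public static double RunReusing(double[] input, int passes)
    {
        double[] scratch = new double[input.Length];
        double total = 0.0;
        for (int p = 0; p < passes; p++)
        {
            total += SquareAndSum(input, scratch);
        }
        return total;
    }

    // Stand-in workload: writes the squares into scratch and returns their sum.
    private static double SquareAndSum(double[] input, double[] scratch)
    {
        double sum = 0.0;
        for (int i = 0; i < input.Length; i++)
        {
            scratch[i] = input[i] * input[i];
            sum += scratch[i];
        }
        return sum;
    }
}
```

Both methods return the same result; the difference is only in how much work the garbage collector is given, which matters most when the buffers are large or the loop runs for hours.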


I'm not that familiar with 64-bit issues, but I do have one comment:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. -- Donald Knuth
