Stack Overflow: 32 vs 64 bits... what's the big deal?
[+51] [19] stalepretzel
[2008-09-25 12:14:00]
[ osx 64bit 32bit ]
[ http://stackoverflow.com/questions/132930]

Why is it considered to be such a big deal to have a 64-bit computer? Why does it "change everything?" Why do applications need to be designed differently between 32- and 64-bit platforms?

And, on OS X, how do you find which one you have!?

[+63] [2008-09-25 12:19:02] Brian R. Bondy [ACCEPTED]

Basically you can do everything on a bigger scale:

  1. RAM per OS: RAM limit of 4GB on x86 for the OS (most of the time)
  2. RAM per process: RAM limit of 4GB on x86 for processes (always). If you think this is not important, try running a huge MSSQL database intensive application. It will use > 4GB itself if you have it available and run much better.
  3. Addresses: Addresses are 64bits instead of 32bits allowing you to have "bigger" programs that use more memory.
  4. Handles available to programs: You can create more file handles, processes, and so on. For example, on Windows x64 you can create more than 2000 threads per process, but on x86 closer to a few hundred.
  5. Wider range of programs available: On x64 you can run both x86 and x64 programs. (Example on Windows: WOW64, the Windows 32-bit on Windows 64-bit emulation layer.)
  6. Emulation options: From an x64 you can run both x86 and x64 VMs.
  7. Faster: Some calculations are faster on a 64-bit CPU
  8. Dividing up system resources: Plenty of RAM is very important when you want to run at least one VM, since each VM takes a share of your system resources.
  9. Exclusive programs available: Several new programs only support x64. Example: Exchange 2007.
  10. Future obsolete x86?: Over time 64-bit will be used more and more and x86 less and less, so vendors will increasingly support only 64-bit.

The 2 big types of 64-bit architectures are x64 and IA64 architectures. But x64 is the most popular by far.

x64 can run x86 instructions as well as x64 instructions. IA64 runs x86 instructions as well, but it doesn't do SSE extensions. There is hardware dedicated on Itanium for running x86 instructions; it's an emulator, but in hardware.

As @Phil mentioned, you can get a deeper look at how it works here [1].

[1] http://arstechnica.com/articles/paedia/cpu/x86-64.ars?bub

(1) Um. IA64 runs x86 commands. It doesn't do SSE extensions, though. There is hardware dedicated on Itanium for running x86 instructions; it's an emulator, but in hardware. - ΤΖΩΤΖΙΟΥ
(2) A few years ago, Raymond Chen posted about the 2000 thread "limit", and it's more or less an urban legend: blogs.msdn.com/oldnewthing/archive/2005/07/29/444912.aspx - bk1e
Upvote for Arstechnica for their explanation. - Avihu Turzion
1
[+28] [2008-09-25 12:16:25] Phil Wright

The biggest impact that people will notice at the moment is that a 32bit PC can only address a maximum of 4GB of memory. When you take off memory allocated for other uses by the operating system your PC will probably only show around 3.25GB of usable memory. Move over to 64bit and this limit disappears.

If you're doing serious development then this could be very important. Try running several virtual machines and you soon run out of memory. Servers are more likely to need the extra memory, so you will find that 64bit usage is far greater on servers than on desktops. Moore's law ensures that we will have ever more memory on machines, so at some point desktops will also switch over to 64bit as the standard.

For a much more detailed description of the processor differences check out this excellent article from ArsTechnica [1].

[1] http://arstechnica.com/articles/paedia/cpu/x86-64.ars?bub

(4) The 32-bit platform and the 4GB limitation is somewhat of a misnomer and is (was) mainly an operating system architectural choice/design limit. Really, the 4GB from 32 bits is only a limit on a process's VA space. The physical address supports 36 bits on Intel 32-bit CPUs - Tall Jeff
(1) You make a good point which is certainly true. But the impact in the real world of PC users is that their machine is not going to use the full 4GB that they paid for. My Dad had this issue and is still confused that the 4GB he paid for cannot be fully used. - Phil Wright
(1) Appreciate your point, but just trying to drive the notion that the fix is not in the processor or going to 64-bits, it is just a matter of a slightly improved OS design. This is addressed, for example, on the enterprise versions of Windows even back in 32-bit versions. It allows for 64GB of RAM. - Tall Jeff
Technically, the limit doesn't disappear. It moves further out to where it is impractical/impossible to install that much RAM on a machine any time in the next decade or so. - Jason Z
2
[+9] [2008-09-25 12:29:56] Greg Whitfield

Not sure I can answer all your questions without writing a whole essay (there's always Google...), but you don't need to design your apps differently for 64bit. I guess what is being referred to is that you have to be mindful of things like pointers no longer being the same size as ints. You also have a whole load of potential problems with built-in assumptions that certain types of data are four bytes long, which may no longer be true.

This is likely to trip up all kinds of things in your application - everything from saving/loading from file, iterating through data, data alignment, all the way to bitwise operations on data. If you have an existing codebase you are trying to port, or work on both, it is likely you will have a lot of little niggles to work through.

I think this is an implementation issue, rather than a design one. I.e. I think the "design" of say, a photo editing package will be the same whatever the wordsize. We write code that compiles to both 32bit and 64bit versions, and the design certainly does not differ between the two - it's the same codebase.

The fundamental "big deal" on 64bit is that you gain access to a much larger memory address space than 32bit. This means that you can really chuck more than 4GB of memory into your computer and actually have it make a difference.

I'm sure other answers will go into the details and benefits more than I.

To detect the difference programmatically, you just check the size of a pointer (e.g. sizeof (void*)). An answer of 4 means it's 32 bits, and 8 means you are running in a 64bit environment.
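The same pointer-size check can be sketched in Python, where the struct format code "P" stands for a native void*. This is only an illustration: the function name pointer_bits is made up here, and the check reports the pointer width of the running interpreter build, not the capabilities of the CPU underneath it.

```python
import struct
import sys


def pointer_bits() -> int:
    """Return the pointer width, in bits, of the running interpreter."""
    # "P" is the struct format code for a native pointer (void*),
    # so calcsize("P") plays the role of sizeof(void*) in C.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"This process uses {pointer_bits()}-bit pointers")
    # Cross-check: sys.maxsize only exceeds 2**32 on a 64-bit build.
    print("64-bit build" if sys.maxsize > 2**32 else "32-bit build")
```

A 32-bit interpreter running on a 64-bit OS would still report 32 here, which is the same caveat that applies to the sizeof check in C.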


(3) If you write programs that casually assume that certain pointer types are the same size as certain integral types, ur doin it rong. This has been true for a long time. - David Thornley
@David: You're absolutely right. Unfortunately, there's a ton of code out there that does exactly that. - j_random_hacker
3
[+9] [2008-10-13 10:45:24] James

Nothing is free: although 64-bit applications can access more memory than 32-bit applications, the downside is that they need more memory. All those pointers that used to need 4 bytes, now they need 8. For example, the default requirement in Emacs is 60% more memory when it's built for a 64-bit architecture. This extra footprint hurts performance at every level of the memory hierarchy: bigger executables take longer to load from disk, bigger working sets cause more paging and bigger objects mean fewer fit in the processor caches. If you think about a CPU with a 16K L1 cache, a 32-bit application can work with 4096 pointers before it misses and goes to the L2 cache but a 64-bit application has to reach for the L2 cache after just 2048 pointers.
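The cache arithmetic above is easy to reproduce. A rough sketch, assuming the cache holds nothing but pointers (real working sets mix pointers with other data, so treat the numbers as an upper bound); pointers_per_cache is just an illustrative helper name:

```python
def pointers_per_cache(cache_bytes: int, pointer_bytes: int) -> int:
    """How many pointers fit in a cache of the given size."""
    return cache_bytes // pointer_bytes


L1_BYTES = 16 * 1024  # the 16K L1 cache from the example above

print(pointers_per_cache(L1_BYTES, 4))  # 32-bit pointers: 4096
print(pointers_per_cache(L1_BYTES, 8))  # 64-bit pointers: 2048
```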

On x64 this is mitigated by the other architectural improvements like more registers, but on PowerPC if your application can't use >4G it's likely to run faster on "ppc" than "ppc64". Even on Intel there are workloads that run faster on x86, and few run more than 5% faster on x64 than x86.


4
[+5] [2008-09-25 12:20:04] Tom

In addition to the fact that 64-bit machines can easily address more memory (it isn't true to say that 32-bit machines can only access 4GB as PAE can be used in many cases to use more) the 64-bit processors also often have additional hardware registers and other hardware optimizations. These additional features can often significantly increase the performance of apps compiled for 64-bit processors, even if they don't use a lot of memory.


5
[+4] [2008-09-25 12:34:13] Mecki

A 32 Bit process has a virtual address space of 4 GB; this might be too little for some apps. A 64 Bit app has a virtually unlimited address space (of course it is limited, but you will most likely not hit this limit).

On OSX there are other advantages. See the following article [1] for why having the kernel run in 64 Bit address space (regardless of whether your app runs 64 or 32) or having your app run in 64 Bit address space (while the kernel is still 32 Bit) leads to much better performance. To summarize: If either one is 64 Bit (kernel or app, or both of course), the TLB ("translation lookaside buffer") doesn't have to be flushed whenever you switch from kernel to user space and back (which will speed up RAM access).

Also you have performance gains when working with "long long int" variables (64 Bit variables like uint64_t). A 32 Bit CPU can add/divide/subtract/multiply two 64 Bit values, but not in a single hardware operation. Instead it needs to split this operation into two (or more) 32 Bit operations. So an app that works a lot with 64 Bit numbers will have a speed gain of being able to do 64 Bit math directly in hardware.
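To make the "two (or more) 32 Bit operations" point concrete, here is a small sketch of how a 64 Bit addition decomposes into two 32 Bit additions plus a carry. Python integers are arbitrary precision, so the masking below only simulates 32 Bit registers; the function name is made up for the example, and a real compiler would emit something like an add followed by an add-with-carry instead.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_ops(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit-wide pieces."""
    # Split each operand into a low and a high 32-bit half.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # First operation: add the low halves and capture the carry out.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Second operation: add the high halves plus the carry,
    # wrapping around at 64 bits like hardware would.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo
```

On a 64 Bit CPU the whole thing is a single add instruction, which is where the speed gain for 64 Bit-heavy math comes from.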

Last but not least the x86-64 architecture offers more registers than the classic x86 architectures. Working with registers is much faster than working with RAM and the more registers the CPU has, the less often it needs to swap register values to RAM and back to registers.

To find out if your CPU can run in 64 Bit mode, you can look at various sysctl variables. E.g. open a terminal and type

sysctl machdep.cpu.extfeatures

If it lists EM64T, your CPU supports 64 Bit address space according to x86-64 standard. You can also look for

sysctl hw.optional.x86_64

If it says 1 (true/enabled), your CPU supports the x86-64 Bit mode; if it says 0 (false/disabled), it does not. If the setting is not found at all, treat it as false.

Note: You can also fetch sysctl variables from within a native C app, no need to use the command line tool. See

man 3 sysctl
[1] http://www.appleinsider.com/articles/08/09/04/road_to_snow_leopard_twice_the_ram_half_the_price_64_bits.html

error: "machdep.cpu.extfeatures" is an unknown key - alamar
I guess it isn't called EM64T, also, if you aren't unfortunate enough to have intel. - alamar
6
[+4] [2008-09-25 12:36:07] Michael Dorfman

Besides the obvious memory space issues that most people are mentioning here, I think it is worth looking at the notion of "broadword computing" that Knuth (among others) has been speaking about lately. There are a lot of efficiencies to be gained through bit manipulation, and bitwise operations on a 64-bit word go a lot further than on a 32-bit word. In short, you can do more operations in registers without having to hit memory, and from a performance perspective, that's a pretty huge win.

Take a look at Volume 4, pre-Fascicle 1A for some examples of the cool tricks I am talking about.
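As a flavour of what such tricks look like, here is the well-known parallel bit count ("sideways addition") on a 64-bit word. This is a generic textbook sketch, not an excerpt from the fascicle; the name popcount64 is illustrative, and the explicit masking is only needed because Python integers are not fixed-width.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount64(x: int) -> int:
    """Count the set bits of a 64-bit word with a handful of word-wide ops."""
    x &= MASK64
    # Each 2-bit field now holds the number of set bits it contained.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # The multiply adds all eight bytes into the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56
```

All 64 bits are processed at once in a few register operations; on a 32-bit word the same steps would cover only half as many bits per pass.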


7
[+3] [2009-05-01 21:19:47] Marco van de Voort

Note that address space can be used for more than (real) memory. One can also memory map large files, which can improve performance for more unusual access patterns because the more powerful and efficient block-level VM caching kicks in.
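A minimal sketch of memory mapping a file, using Python's standard mmap module. The helper names and the small temporary file are made up for the demonstration; with a multi-gigabyte file the mapping itself is what eats address space, which is why a 32-bit process runs out quickly.

```python
import mmap
import os
import tempfile


def read_byte_via_mmap(path: str, offset: int) -> int:
    """Map a whole file read-only and fetch one byte by random access."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; pages are loaded lazily by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset]


def demo() -> int:
    """Write a small test file, read one byte through a mapping, clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 16)  # 4096 bytes: 0..255 repeated
        path = tmp.name
    try:
        return read_byte_via_mmap(path, 300)
    finally:
        os.remove(path)
```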

Some of the things said in this thread (like the doubling of the number of registers) only apply to x86 -> x86_64, not to 64-bit in general. The same goes for the fact that under x86_64 one is guaranteed to have SSE2, 686 opcodes and a cheap way to do PIC. Strictly speaking, these features are not about 64-bit.

Moreover quite often people point to doubling of registers as the cause of the speedup, while it is more likely the default SSE2 use that does the trick (accelerating memcpy and similar functions). If you enable the same set for x86 the difference is way smaller. (*) (**)

Also keep in mind that there is often an initial penalty involved because the average data structure will grow simply because the size of a pointer is larger. This also has cache effects, but is more significantly noticeable in the fact that the average memcpy() (or whatever the equivalent for memory copy is in your language) will take longer. This is only on the order of a few percent btw, but the speedups named above are also of that magnitude.

Usually alignment overhead is also bigger on 64-bit architectures, blowing up structures even more.

Overall, my simple tests indicate they will roughly cancel each other out, if drivers and runtime libraries have fully adapted, giving no significant speed difference for the average app. However some apps can suddenly get faster (e.g. when depending on AES) or slower (a crucial data structure is constantly moved around/scanned/walked and contains a lot of pointers).

Note that most JIT-VM languages (Java, .NET) use significantly more pointers on average (internally) than e.g. C++. Probably their memory use increases more than for the average program, but I don't dare to equate that directly to slowing effects (since these are really complex and funky beasts and often hard to predict without measuring).

(*) a little known fact is that the number of SSE registers also doubles in 64-bit mode

(**) Dr Dobbs had a nice article about it a few years ago.


8
[+2] [2009-05-03 15:05:47] knweiss

Apart from the already mentioned advantages here are some more regarding security:

  • x86_64 cpus have the no-execute bit in their page tables, which can prevent security exploits caused by buffer overruns. 32-bit x86 cpus only support this feature in PAE mode.
  • Bigger address space allows for better address space layout randomization (ASLR) which makes exploitation of buffer overruns harder.
  • x86_64 cpus support position-independent code cheaply, i.e. data access relative to the instruction pointer register (RIP).

Another advantage that comes to mind is that the amount of virtual contiguous memory allocated with vmalloc() in the Linux kernel can be larger in 64 bit mode.


9
[+2] [2008-11-25 20:35:52] Die in Sente

This thread is too long already, but ...

Most of the replies focus on the fact that you have a larger, 64-bit address space, so you can address more memory. For about 99% of all applications, this is totally irrelevant. Large whoop.

The real reason 64-bit is good is not that the registers are bigger, but there are twice as many of them! That means that the compiler can keep more of your values in register instead of spilling them to memory and loading them back in a few instructions later. If and when an optimizing compiler is unrolling your loops for you, it can unroll them roughly twice as much, which can really help performance.

Also, the subroutine caller/callee conventions for 64-bit have been defined to keep most of the passed parameters in registers instead of the caller pushing them onto the stack and the callee popping them off.

So a "typical" C/C++ application will get about a 10% or 15% performance improvement just by recompiling for 64-bit. (Assuming some portion of the app was compute bound. Of course, this is not guaranteed; all computers wait at the same speed. Your Mileage May Vary.)


While the instruction set is better for x64 than x86, that will typically be unimportant. 64-bit code can be slower than 32-bit, also, because the instructions can get bigger, so fewer of them fit in the cache. (Unrolling loops, BTW, is a very questionable technique nowadays, since it will increase the number of cache misses.) Where I work, we need 64 bits for increased memory addressing. - David Thornley
David, the x64 and x86 instruction sets are almost identical, except for the operand size and some register prefixes. With IA64, aka Itanium aka Itanic, 64-bit code would typically be 3x the size of the x86 code, and stress the instruction cache exactly as you say. That was a big factor in why that architecture failed miserably. But with x64 aka AMD64 aka EM64T, that code growth is typically only 10-20%. - Die in Sente
Although x64 makes more registers addressable, I'm not sure by how much it actually increases the number of physical registers available -- all recent x86 processors have many (> 100) "shadow" registers, and use "register renaming" + speculative execution to allow independent code paths to execute in parallel to a degree. In effect, if n independent code paths are executing, n times as many registers are available (until all shadow registers run out). - j_random_hacker
@j_random_hacker. You are absolutely right that those tricks are going on underneath the architecture. But no matter how many shadow registers are available, if the program needs to work with more than 8 data items and only 8 registers are exposed in the instruction set, the compiler must generate the store/reload instructions. So yes, X64 really does make twice as many registers "available" - Die in Sente
My experience is that this is way less, and is offset by the fact that the avg memblock to be moved is larger. - Marco van de Voort
10
[+2] [2008-09-25 12:31:34] plinth

Think about image processing for a moment. If you look at medical imaging, you're routinely dealing with moderately high resolution images that are 32 bits per channel, so if they're color, that's 96 bits per pixel. A typical image may take up 200M or more when uncompressed. Processing that into a target buffer will require another 200M, so in one operation you would be using up 1/5 of your entire address space on a 32 bit processor. Without a great deal of care, heap fragmentation makes that operation impossible. Virtual memory doesn't help because the address space itself isn't there. 64 bits is much more breathing room.


11
[+1] [2008-09-25 12:21:12] stu

1) Speed. If an atomic 1-cycle operation can move 64 bits instead of just 32, a lot of operations go faster.
2) I haven't kept up on memory management schemes so maybe it doesn't work this way anymore, but this also means you can directly address more memory.
3) With great power comes great responsibility. Okay, that's not as applicable here.
4) Marketing. Intel and AMD can put a big number 64 on their box instead of just 32. Everybody knows bigger numbers are better.


12
[+1] [2008-09-25 12:24:16] mmaibaum

On OS X, you have a 64bit CPU if you have a G5 or almost any of the Intel machines (the very first Yonah based machines were 32bit, everything with a Core 2 is 64bit).

As far as the OS is concerned, Leopard is the first version of the OS to support 'GUI' 64bit programs.


No, Linux was 64-bit long before that. Mac OS was released in Oct 2007. Mandriva has released 64-bit versions since 2004. And it was possible to compile Linux yourself for 64 bits even before that. - rds
13
[+1] [2008-09-25 12:17:17] Mark Cidade

With a 32-bit machine you only have 4,294,967,296 bytes (2^32) of memory to address. With a 64-bit machine you have 1.84467441 × 10^19 bytes (2^64) of memory.
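For anyone who wants to check these figures, they are plain powers of two. A quick sketch (address_space_bytes is just an illustrative helper; the 48-bit line reflects the point made in the comments that real CPUs implement fewer than 64 address bits):

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses expressible in the given width."""
    return 2 ** bits


GIB = 2 ** 30  # bytes per GiB
TIB = 2 ** 40  # bytes per TiB

print(address_space_bytes(32))         # 4294967296
print(address_space_bytes(32) // GIB)  # 4 (GiB)
print(address_space_bytes(64))         # 18446744073709551616
print(address_space_bytes(64) // GIB)  # 17179869184 (GiB, roughly 17.2 billion)
print(address_space_bytes(48) // TIB)  # 256 (TiB)
```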

Wikipedia says this [1]

64-bit processors calculate particular tasks (such as factorials of large figures) twice as fast as working in 32-bit environments (given example is derived from comparison between 32-bit and 64-bit Windows Calculator; noticeable for factorial of say 100 000). This gives a general feeling of theoretical possibilities of 64-bit optimized applications.

While 64-bit architectures indisputably make working with large data sets in applications such as digital video, scientific computing, and large databases easier, there has been considerable debate as to whether they or their 32-bit compatibility modes will be faster than comparably-priced 32-bit systems for other tasks. In x86-64 architecture (AMD64), the majority of the 32-bit operating systems and applications are able to run smoothly on the 64-bit hardware.

Sun's 64-bit Java virtual machines are slower to start up than their 32-bit virtual machines because Sun has only implemented the "server" JIT compiler (C2) for 64-bit platforms.[9] The "client" JIT compiler (C1), which produces less efficient code but compiles much faster, is unavailable on 64-bit platforms.

It should be noted that speed is not the only factor to consider in a comparison of 32-bit and 64-bit processors. Applications such as multi-tasking, stress testing, and clustering (for high-performance computing), HPC, may be more suited to a 64-bit architecture given the correct deployment. 64-bit clusters have been widely deployed in large organizations such as IBM, HP and Microsoft, for this reason.

[1] http://en.wikipedia.org/wiki/64-bit#32_vs_64_bit

(2) Physical address bus length is independent of whether it's a 32 or 64-bit processor. Some 32-bit processors have address buses larger than 32 bits, and no 64-bit processor has a 64-bit address bus. - Nick Johnson
Agreed. In theory, the address space is 2^64. In practice, CPU manufacturers are using smaller values...like 2^40 or 2^48. - Stu Thompson
14
[0] [2008-09-25 12:18:26] Chii

64-bit architecture refers to the way memory is addressed. In a 64-bit machine, the CPU can address up to 2^64 bytes, which is significantly larger than what a 32-bit machine can address (which is 2^32 bytes, or about 3.2gigs). As new applications and servers need more and more RAM, 64-bit will become the norm.


(2) Actually, current 64-bit machines can't address that much physical ram - they have shorter address buses than that. The virtual address space for each process is 2^64 bytes, though. - Nick Johnson
(1) In theory, the address space is 2^64. In practice, CPU manufacturers are using smaller values...like 2^40 or 2^48. - Stu Thompson
(2) 2^32 is not about 3.2 GB. It is exactly 4GB (given 1KB == 1024 bytes). The reason why Windows boxes only use 3.2 GB or so is that Windows does not take full advantage of PAE, combined with things like video cards with huge amounts of RAM. Server editions of Windows on PAE-enabled CPUs can use 64 GB of RAM. - Evan Teran
15
[0] [2008-09-25 12:20:00] Ben Hoffstein

Here is the obligatory Wikipedia [1] article.

It's a big deal because in a 32-bit system, programs can only utilize 2^32 addresses (4 GB), whereas in a 64-bit system they can utilize 2^64 addresses (17.2 billion GB).

Intel-powered Macs are all 64-bit as far as I know.

[1] http://en.wikipedia.org/wiki/64-bit

16
[0] [2008-09-25 13:25:57] Martín Marconcini

To answer the second part of your question, OS X Leopard is designed to run on 32 and 64 bit machines. When you run it on a 64 bit processor, Leopard will use the 64-bit libraries.

See the Leopard website [1]

[1] http://www.apple.com/macosx/technology/64bit.html

17
[0] [2008-09-25 18:07:55] JB King

Another point, regarding Microsoft Windows: for many years there has been the Win32 API, which is intended for 32-bit operating systems and isn't optimized for 64 bit compiling. When I write DLLs for my applications, I generally compile for Win32, which isn't the 64 bit version of things. Prior to Vista, I don't believe there were many successful 64 bit versions of Windows. Where I work, my new machine has 4 GB of RAM but I'm still using 32-bit Windows XP Pro, as it is a known stable O/S relative to XP64 or Vista.

I think you may want to also look back on when there was the shift from 16-bit to 32-bit for some more details on why the shift may be a big deal for some folks. The mission-critical applications that a company may run on a desktop, e.g. small accounting packages, may not run on a 64-bit operating system and thus there is the need to keep a legacy machine around, virtual or real.

Changing the size of an address can have some big ramifications and repercussions.


18
[0] [2008-09-26 07:22:11] Hugh Allen

Some game-playing programs use a bit-board [1] representation. Chess, checkers and othello, for example, have an 8x8 board, i.e. 64 squares, so having at least 64 bits in a machine word significantly helps performance.

I remember reading about a chess program whose 64-bit build was almost twice as fast as the 32-bit version.
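To give an idea of why the word size matters here, this is a small bit-board sketch: one 64-bit integer holds the whole board (bit index = rank * 8 + file, with a1 as bit 0, which is just one common convention), and the squares attacked by a king fall out of a few shifts and masks. The function and constant names are illustrative, not taken from any particular chess program.

```python
FULL_BOARD = 0xFFFFFFFFFFFFFFFF
NOT_A_FILE = 0xFEFEFEFEFEFEFEFE  # every square except the a-file
NOT_H_FILE = 0x7F7F7F7F7F7F7F7F  # every square except the h-file


def square(file: int, rank: int) -> int:
    """Bit-board with a single square set (file and rank are 0-based)."""
    return 1 << (rank * 8 + file)


def king_attacks(kings: int) -> int:
    """All squares attacked by the king(s) in the given bit-board."""
    # Step sideways; the file masks stop moves wrapping around the edge.
    east = (kings << 1) & NOT_A_FILE
    west = (kings >> 1) & NOT_H_FILE
    row = kings | east | west
    # Move the whole three-square row up and down one rank.
    north = (row << 8) & FULL_BOARD
    south = row >> 8
    return east | west | north | south


# A king in the corner (a1) attacks b1, a2 and b2.
print(bin(king_attacks(square(0, 0))).count("1"))  # 3
```

On a 64-bit CPU each of those shifts and ANDs is one instruction on one register; a 32-bit build has to handle every board as two halves and shuffle bits between them.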

[1] http://en.wikipedia.org/wiki/Bitboard

19