In an ironic twist, Intel has launched its new and most expensive processor at just about the same time as stores start their post-Christmas sales. We have taken a closer look at the beast Intel has named Pentium Extreme Edition 955.

The transition to a 90nm manufacturing process marked the start of a long downturn for Intel, and the cause was heat. For several years it struggled to refine its manufacturing process in order to raise the frequencies of the Prescott core. The goal was to release a processor at 4GHz, but this turned out to be too much. For the last year Intel has been converting its factories to 65nm technology, and according to information we’ve received it has already reached a productivity equal to that of the refined 90nm process.

Intel is launching its 900 series to replace the 800 series processors. These are again dual core parts, which with the help of the 65nm manufacturing process reach higher frequencies. With the flagship, Intel has removed all limitations: it has raised the processor bus to 1066FSB, left the multipliers unlocked and, last but not least, activated HyperThreading on both cores. The name of this beast is of course Intel Pentium Extreme Edition 955, or simply 955XE.

Let’s take a closer look.


There are two main ways to make a dual core processor. One is to design a circuit that contains two calculation units; the other is to simply take two processors and put them in one package. Of course it’s not really that simple. The 900 series processors are designed using the latter method: two Cedarmill cores are placed in a package that Intel calls Presler. The biggest change since the 800 series is, as we mentioned in the introduction, the transition to a 65nm manufacturing process, which greatly increases the number of circuits you get out of one so-called wafer in the factories. You usually get less heat dissipation when you change to a finer manufacturing process, but that wasn’t the case with the Prescott core, which had noticeably higher power leakage than Northwood. Let’s hope Intel has learned its lesson and can manage to keep it under control. Let’s list some of the main features of the 900 series processors compared to earlier series.


Processor comparison
Feature | 600 series | 800 series | 900 series | 955XE
Core | Prescott | Smithfield | Presler | Presler
Manufacturing process | 90nm | 90nm | 65nm | 65nm
Cores | 1 | 2 | 2 | 2
Frequencies | 3.0 – 3.8GHz | 2.8 – 3.2GHz | 2.8 – 3.4GHz | 3.46GHz
Front Side Bus | 200MHz/800FSB | 200MHz/800FSB | 200MHz/800FSB | 266MHz/1066FSB
Level 2 cache | 2MB | 2MB (2x1MB) | 4MB (2x2MB) | 4MB (2x2MB)
HyperThreading (HT) | Yes | No | No | Yes
EIST | Yes | Yes | Yes | Yes
EM64T | Yes | Yes | Yes | Yes
Execute Disable Bit | Yes | Yes | Yes | Yes
Vanderpool | No | No | Yes | Yes
LaGrande | No | No | Yes | Yes
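
As a rough illustration of why the move from 90nm to 65nm yields more circuits per wafer, the arithmetic can be sketched as below. It assumes ideal scaling, where every dimension of the design shrinks linearly with the process node; real dies rarely scale this cleanly, and the function name is our own.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Relative die area after a shrink, assuming ideal linear scaling.

    This is an idealisation: real designs do not shrink uniformly.
    """
    return (new_nm / old_nm) ** 2


scale = ideal_area_scale(90, 65)
print(f"Relative die area at 65nm: {scale:.2f}")
print(f"Relative number of dies per wafer: {1 / scale:.2f}x")
```

Under that assumption the same design takes up roughly half the area, so close to twice as many fit on a wafer.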

Let’s look closer at how dual core works and how it looks with HT.


Above, you can see a picture of two Cedarmill cores, which together make a Presler. The neighboring cores communicate with each other at the same speed as the processor bus. With the protective heat plate on top, there’s no visible difference between the Presler and any other Pentium 4. Using two separate processor cores like this is called dual chip, while a CPU that has been designed with two calculation units in the same core is called dual core. In everyday speech, however, both are called dual core. The two large, dark areas in the picture above are the L2 cache, which accounts for a substantial part of the 376 million transistors the CPU is made of. Cedarmill has a 2MB L2 cache, which means Presler, with its 4MB L2 cache (2x2MB), sets a new record for desktop CPUs.

Single core + HT
Hyper-Threading (HT) is an interesting feature Intel has been using since its Northwood core P4s. When activated, Windows thinks there are two cores available to perform calculations. Thanks to this simulated logical core, the CPU can take on work from a second thread, so that its capabilities are used more fully. We can see a graphical example of this here.

As we can see, the HT enabled CPU is, in some cases, better at executing calculations than the same CPU without HT enabled, even though there’s only one physical calculation unit. Then what happens when we’ve got two physical cores? Read on.


Dual core
A dual core processor has two completely separate physical calculation units. This means that the processor can calculate two things at the same time, which in theory doubles the capacity. To use the extra calculation power you need multiple threads that can be executed at the same time, either by running multiple programs or by using a multithreaded program. Here we can see a schematic picture of the difference between a single core HT processor and a dual core processor.

It is worth mentioning that HT only increases the efficiency of the processor; the theoretical maximum calculation power is still the same. HT works because a processor normally has a large number of cycles where it does nothing, for example while it is waiting for data from the cache or RAM. With the help of HT, the processor gets the opportunity to execute other instructions while it waits for the data the first instruction needs. The obvious follow-up question is of course: can you use HT with a dual core processor?

Dual core + HT
Sure you can, and that is just what Intel has done with its Extreme Edition processor. Windows now recognizes four cores, of which two are physical and two are logical, and then it looks like this.

The equivalent schematic picture for how threads are executed with both dual core and HT looks as follows.
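
To make the argument about idle cycles concrete, here is a toy model of our own. It is a sketch under heavy assumptions, not how the hardware actually schedules work: each thread is a string where `X` is a cycle that needs the execution unit and `.` is a cycle spent waiting for data, and the function names are hypothetical.

```python
def cycles_without_ht(threads):
    """One thread at a time on one core: waiting cycles are simply wasted."""
    return sum(len(t) for t in threads)


def cycles_with_ht(threads):
    """One core shared by all threads: a waiting thread leaves the unit free."""
    pos = [0] * len(threads)
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        cycles += 1
        unit_busy = False
        for i, t in enumerate(threads):
            if pos[i] >= len(t):
                continue  # this thread has finished
            if t[pos[i]] == ".":
                pos[i] += 1  # waiting on data proceeds in the background
            elif not unit_busy:
                pos[i] += 1  # this thread gets the execution unit this cycle
                unit_busy = True
    return cycles


stalling = ["X..X..X", "X..X..X"]   # lots of waiting for cache or RAM
busy = ["XXXX", "XXXX"]             # no waiting at all

print(cycles_without_ht(stalling), cycles_with_ht(stalling))
print(cycles_without_ht(busy), cycles_with_ht(busy))
```

In this model the two stalling threads finish much sooner when they share the core, while the two fully busy threads gain nothing, which matches the point that HT raises efficiency without raising the theoretical maximum.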

Let’s move on and take a look at the other features.


Enhanced Intel SpeedStep Technology (EIST)
EIST comes from the mobile market, where the processor can downclock itself when its full power isn’t needed. EIST was introduced to the desktop market with the 600 series, where the processor can change its multiplier from the default (15x – 19x) down to 14x. This happens at very low processor load, and as soon as the load increases the multiplier moves back to its default setting.
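
The effect of the multiplier change on clock frequency is simple arithmetic, sketched below for the 600 series. The 200MHz bus and the 14x to 19x multipliers come from the text and the comparison table; the function name is our own.

```python
FSB_MHZ = 200              # 600 series bus clock, per the comparison table
EIST_IDLE_MULTIPLIER = 14  # the multiplier EIST drops to at very low load


def core_clock_mhz(multiplier, fsb_mhz=FSB_MHZ):
    """Core clock is the bus clock times the multiplier."""
    return multiplier * fsb_mhz


idle = core_clock_mhz(EIST_IDLE_MULTIPLIER)
for default_multiplier in range(15, 20):
    full = core_clock_mhz(default_multiplier)
    print(f"{default_multiplier}x: {full} MHz under load, {idle} MHz when idle")
```

The result matches the 3.0 – 3.8GHz range listed for the 600 series, with every model dropping to 2.8GHz at idle.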

Extended Memory 64 Technology (EM64T)
Intel’s expansion of the 32bit system to 64bit. Some of the advantages of 64bit technology are that a larger amount of memory can be addressed (the maximum for a 32bit system is 4GB) and that you can get better precision in calculations. To use these functions you need, just as with AMD’s Athlon64, to be running a 64bit operating system.
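
The 4GB limit follows directly from the number of address bits, as this short sketch shows. It only computes the theoretical address space; actual processors and operating systems typically expose fewer than 64 address bits, so the 64bit figure is an upper bound.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses with the given address width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB, "GB with 32bit addresses")
print(addressable_bytes(64) // GIB, "GB with full 64bit addresses (theoretical)")
```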

Execute Disable Bit (XD)
A function that limits the areas where code can be executed. Many of today’s viruses exploit weaknesses in other programs to execute their own code. With XD the processor can detect this and, with the help of the operating system, terminate the execution of the virus.

Vanderpool Technology (VT)
A technology for creating virtual systems to run different operating systems or to run critical tasks separated from other parts of the system. Programs with increased security or privacy demands can be run in an isolated and secure environment.

LaGrande Technology (LT)
LaGrande is a collection of security features which Intel is planning to introduce at the platform level. This is not just a feature of the processor but also something that will be integrated into future motherboard chipsets.

As you can see, Intel likes adding technologies of all sorts. Many are relatively fresh, and further support for them will arrive with future motherboard chipsets. Many of these new technologies are aimed at what Intel calls ”Trusted Computing”, where it conveniently tries to implement protection against anything from viruses to piracy directly in the hardware.

Next up is our test platform.





Test system
Hardware | Intel | AMD
Motherboard | Intel D975XBX (i975X), ASUS P5WD2 (i955X) | Abit Fatal1ty AN8 SLI
Processor | Intel 955XE, 3.46GHz; Intel P4 660, 3.6GHz | AMD Athlon64 FX-57
Memory | Corsair XMS 5400UL (2x512MB) | Corsair XMS 3200 (2x512MB)
Graphics card | nVidia GeForce 7800GTX512
Power supply | OCZ PowerStream 520W

Software
Operating system | Windows XP (SP2)
Drivers | Intel Chipset Driver 7.2.2.1006, nVidia Forceware 81.95, nVidia nForce 6.67
Monitoring program | ASUS AI Booster
Benchmarking programs | SiSoft Sandra 2005 SR3, SuperPi 1.4, 3DMark2003 3.6.0, 3DMark2005 1.2.0, AquaMark 3, VirtualDub 1.6.10 with XviD 1.0.3, WinRAR 3.42

Explanations
Idle | One hour in Windows without load
Load | One hour of four instances of Prime95 running
Stable | No errors reported by Prime during load
Multitasking | Tests done with one instance of SuperPi running a 32M calculation in the background
Processor temperature | The temperature in the processor reported by AIBooster

During our tests we will simulate a multitasking environment by running a 32M calculation of SuperPi in the background. That way we can easily show the effect of a processor with several cores, since the major portion of today’s games are single-threaded.

Power consumption is a really hot topic right now, so we will start by investigating how well Intel has tamed the power consumption with the new manufacturing process.


Power consumption is a high priority at both Intel and AMD, and both are constantly working on new methods for reducing the consumption of their circuits. Moving to a finer manufacturing process usually means less power consumed, and we will now investigate whether Intel has succeeded. At our disposal we have an ammeter connected ahead of the power supply. The result therefore shows the sum of all the components (not including the monitor, for obvious reasons) and not just the processor. Idle is the consumption without load in Windows, while Load is with four instances of Prime running. This is to really make sure that all four logical processors are working.







There is no doubt about it: Presler gets hot, just like Prescott. However, Presler has twice as much of everything. A fairer comparison would perhaps have been with Presler’s predecessor, the 800 series, but we didn’t have one at our disposal.

The first of the performance tests are SiSoft Sandra and WinRAR.


Sisoft Sandra is a program suite with applications for analysis, diagnostics and last but not least a series of performance measuring benchmarks for computer systems. We will focus on the last of these here.











 


As you can see, SiSoft Sandra can use both of the processor’s cores, which means that the 955XE simply outruns the single core processors.

We will now move on to file and video compression.



WinRAR Benchmark simply measures how good the processor is at packing files, but this test also depends a lot on the overall memory performance of the system.






File packing involves a lot of irregular calculations, so even before the test we suspected that Intel’s processors would run into problems here because of their long pipeline. The 955XE managed to defeat the single core P4 despite its much lower frequency, but neither of them can reach the FX-57, which benefits from its shorter pipeline and integrated memory controller.








Intel has performed well in video compression tests before, and this time is no exception. VirtualDub seems to be multithreaded, but judging from the Task Manager only one core is used to the max while the other rests at 33%. This of course gives dual cores a slight advantage, which the results also show. The 955XE takes home the title and leaves the 660 and the FX-57 battling for a very even second place.


Next up are some 3D tests.



3DMark is the first thing that comes to mind when you’re thinking about benchmarking computers and no test is complete without one of these.




















3DMark2001 has always been a trump card for AMD’s Athlon64 processors, and the FX-57 is no exception. We were more surprised that the 955XE performed so much better than the 660 despite its lower frequency. A big part of this probably has to do with the fact that the 955XE has a 266MHz FSB while the P4 only has a 200MHz FSB. 3DMark03 is the version where Intel’s and AMD’s processors have been evenly matched, and with all the processors within a 300 point radius of 20000 it’s evenly matched to say the least. The 955XE comes out in front, closely followed by the FX-57, with the 660 not too far behind.


Let’s move on and look at how they behave in some games.



The game tests we chose are three FPS games of different character. UT2004 is an older game with lower system requirements, Far Cry is a newer and more advanced game, and Doom3 is similar to the latter but uses OpenGL.
















With Unreal Tournament 2004 we see similarities with 3DMark2001, where AMD comes out strong, followed by the 955XE and the P4 660. With Far Cry things get a bit more even, and we see that the FSB helps the 955XE keep up with the FX processor’s gaming performance. With Doom3 the 955XE beats the other two processors by quite a margin, and since games are mainly single-threaded we suspect that the FSB has a big effect on the result here, which we will investigate during our overclocking session.



Before we move on to the overclocking though, we will test how the processors behave during multitasking.



Dual CPU systems have been running as servers for a long time, but it is only now that regular desktop users can start to take advantage of multiple units, and then mainly through dual core. The most obvious advantage, as we mentioned earlier, is that you can do more things at the same time. Doing a single thing won’t go any faster, as we have seen in the tests so far, but what happens if we run some tests together with some other load on the system? We chose to run some of our regular benchmarking programs again, but this time together with SuperPi doing a 32M calculation in the background.




















Here we see that Intel’s 955XE processor might come in handy, as the performance loss only varies between 6% and, with AquaMark, 27%. When we compare the single core processors we clearly see how HT comes into play. The P4 660 drops about 25-30% overall, while the FX-57 doesn’t like the situation at all and drops between 30% and 50% in these tests.
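
For reference, the performance loss figures are the relative drop between a run on its own and the same run with SuperPi in the background. A minimal sketch of that calculation follows; the scores in the example are made up for illustration and are not our measured results.

```python
def performance_loss_percent(score_alone, score_multitasking):
    """Relative drop in score when a background load is added, in percent."""
    return (score_alone - score_multitasking) / score_alone * 100


# Hypothetical scores, chosen only to illustrate the formula.
loss = performance_loss_percent(10000, 7300)
print(f"Performance loss: {loss:.0f}%")
```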

We check how multitasking affects real games next.















The scenario repeats itself and the 955XE once again takes home the title without any problems whatsoever.


Time for some overclocking.



When processor manufacturers move to a finer manufacturing process, overclockers usually wake up with a warm curiosity. The reason is that these processors are often more overclockable than their older siblings. The 955XE is released with a clock frequency only 266MHz higher than that of its predecessor, the 840XE, so we chose to increase the clock frequencies to show what it can really do.








Because of a lack of time for this review, we have chosen to save our extreme overclocking for a later article, and have therefore focused only on overclocking with the original HSF (heatsink/fan). The heatsink shipped with the 955XE has seen a real increase in mass, with a much thicker copper base together with thinner and more tightly spaced aluminium fins (to the right in the middle picture). We chose not to touch the processor voltage, which is 1.3V by default, and thanks to the unlocked multipliers we changed only these, to minimize the difference in FSB.






Increasing the multiplier from 13x to 14x, 15x and 16x went well at first, but the 16x multiplier and the resulting frequency of 4256MHz weren’t stable enough to handle our rigorous stability test consisting of four instances of Prime95. The temperature went well over 70°C and the processor started to throttle to keep the temperature down. The 15x multiplier and a 266MHz FSB land us at 4GHz, where the processor meets all of our demands for stability.
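
The frequencies for each step follow from the 266MHz bus clock and the chosen multiplier. The sketch below tabulates them; the stability notes in the comments are taken from the text above, and the function name is our own.

```python
def overclock_table(fsb_mhz, multipliers):
    """Map each multiplier to the resulting core clock in MHz."""
    return {m: m * fsb_mhz for m in multipliers}


table = overclock_table(266, range(13, 17))
for multiplier, mhz in table.items():
    print(f"{multiplier}x 266MHz = {mhz} MHz")

# 13x is the default (3.46GHz), 15x was the highest stable setting in our
# test, and 16x failed our Prime95 stability test.
```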



Then what about the performance when we run the processor overclocked?



For several reasons we chose not to include any overclocked results for the FX-57. One of these reasons is that we did not have an original heatsink for the CPU, which would likely make it a better overclocker compared to the P4 processors. Instead we have chosen to concentrate on what the higher processor bus can do for the performance of the 955XE by running the P4 in the exact same configuration, and of course on the higher clock frequency for both processors. The overclocked results are from both processors running at 267×15, which gives a clock frequency of 4006MHz.




















It turned out that performance depends not only on the processor bus but also on the fact that the processor has dual cores. This might sound strange, since we said earlier that these tests are only single-threaded. What’s worth noting is that even if the test program itself is single-threaded, it is being run in a multithreaded environment, i.e. Windows. Vital background processes are offloaded on processors with more cores, and today’s graphics card drivers are also optimized for dual cores.



We summarize what we have found out in the conclusion.



Intel continues to develop its P4/Netburst architecture, and even if on paper it looks like nothing more than two 2MB Prescott cores welded together, the performance numbers show that Intel has made some fundamental changes. To sum up our experiences with this processor, there are some points we’d like to mention.




  • A small increase in clock frequency, optimizations and a larger L2 cache have together made the 955XE noticeably better than its equivalent in the 800 series.

  • With the help of HT, the multitasking capacity and the processor’s efficiency have increased.

  • The transition to 65nm manufacturing process has lowered the heat dissipation and increased the overclocking possibilities.

  • We have already seen results from around the net indicating that this processor clocks very well with good cooling, which is something we would like to investigate further in the not too distant future.

Not everything is golden about this processor though; it still has some Prescott traits.




  • The power consumption is still something that’s not completely under control among Intel’s desktop processors. Simply put, the processor gets very hot, and even more so in a closed case.

  • During the Northwood series the P4/Netburst architecture suffered a remarkable reduction in performance per clock cycle, which is something Cedarmill and Presler have inherited.

  • It is widely known that Intel already has a new architecture with a lot of these things fixed, which will probably be launched in the middle of 2006. We look forward to what Intel has to offer with its new focus on performance per watt.













    Intel Pentium Extreme Edition 955

    Pros:
    + Very good multitasking performance

    + More performance per clock cycle compared to the 800 series

    + Overclocking potential


    Cons:
    – Power consumption
    – Most likely very expensive


    We would like to thank Intel, who sent the processor and mainboard for evaluation, and Corsair for the memory for the test system.
