PS3 Hardware: Explained
This guide was written by EbonySeraphim.
This post has been cross-referenced in more places than I can keep track of anymore. Since it is reaching a widespread audience, I will try to remove just about all of my “final” judgments and leave things at a point where you know the differences, and whether you think they’re better or worse is your own decision. In the discussion on this forum, I’ll be less reserved about expressing my views freely. If you found this article through Google or a link from another site or forum, you’re welcome to join the forums and engage in the discussion if you have questions.
A few things should be cleared up before you read or reply:
-I’m not trying to make anyone happy. Sony, Microsoft, and Nintendo fans, or on the off chance you’re a PC gaming fan - I don’t care about you. PC and XBox360 comparisons were thrown in because PS3 is most commonly compared to those platforms. I’m trying to differentiate how the machines operate and what hardware components accomplish what tasks. You should get whatever console best matches your price, functional needs, and games. Differences are just that – differences.
-Yes, I am biased, and it doesn’t take a rocket scientist to figure out where the bias is. Get over it. Everyone has bias and it exists even in printed material. I am not a journalist. You’re just going to have to deal with the fact that this “publication” (if you could call it that) is flawed, and sort through whatever I couldn’t filter out already. Take what I say with a grain of salt.
-If you’re going to correct me, pointing out where my bias lies is of no help. Helping would be pointing out the facts or analysis that are wrong or unreasonable. If the facts/research are so horribly wrong that you don’t want to explain it to me, point me in the right direction for where I can find the information, what I should be looking for there, and what it relates to. I’ll be happy to make changes or amend them in a later revision.
-I can accept criticism of both kinds. I’m not 100% correct. No analysis can be 100% correct. I’m pretty clear about indicating where things are fuzzy for me or where I don’t completely trust my analysis. You should be far more cautious in accepting my speculation where it is noted, as it holds very weak ground over anyone else’s. You’re welcome to share your disapproval of anything I say, but invalidating knowledge cannot be done by labeling bias.
With all that crap said, welcome to revision 1. The first section is mostly the same with some grammatical or structural corrections, but the later sections have been extensively redone in certain areas where the most negative feedback has been received. I’m still not trying to make anyone happy, but it needed serious review.
The Playstation 3 is a gaming console (or computer system) that utilizes a Cell processor with 7 operational SPEs and access to 256MB of XDR RAM, an RSX graphics chip with 256MB of GDDR3 RAM and access to the Cell’s main memory, a blu-ray drive for gaming and movie playback, and a 2.5” hard disk drive. Other components of the system are Bluetooth support used for wireless motion-sensing controllers, 4 USB ports, and a gigabit Ethernet port. The more expensive version of the Playstation 3 also has a Sony Memory Stick reader, Compact Flash reader, SD card reader, WiFi (basically an extra network interface which is wireless), and an HDMI audio/video output port.
The Playstation 3 is capable of outputting 1080p signals through all of its outputs, though with blu-ray movie playback an Image Constraint Token (ICT) might be present which forces a 1080p signal down to 540p if the signal is going through a connection that is not HDCP encrypted and certified. Right now, aside from an actual physical HDMI port, only DVI can support HDCP encryption as far as I know. This DRM scheme only applies to movies; games will not enforce such a restriction. If you only care about the possibility of playing 1080p games, then HDMI is not necessary unless your TV will not accept a 1080p signal through a non-HDMI port.
The Playstation 3’s audio processing will be handled by the Cell processor. There are many officially supported audio formats, representing the highest quality formats for digital entertainment media today. Since audio is done on the Cell processor, game developers are free to output any format they wish. Theoretically this means 6.1, 7.1, 8.1, or beyond is possible, unless a later change actually does restrict what can be output. However, this audio goes through the RSX in some form, and the hardware there has been labeled to only support 7.1 channel output. Because of this, audio can be said to be limited to 7.1, but it is likely that games will not exceed 5.1 audio.
The Cell Processor:
The Cell inside the Playstation 3 is an 8 core asymmetrical CPU. It consists of one Power Processing Element (PPE) and 7 Synergistic Processing Elements (SPEs). Each of these elements is clocked at 3.2GHz, and they are connected on a 4 ring Element Interconnect Bus (EIB) capable of a peak performance of ~204.8GB/s. Every processing element on the bus has its own memory flow controller and direct memory access (DMA) controller. Other elements on the bus are the memory controller to the 256MB XDR RAM, and two FlexIO controllers.
I recommend you skip down to the second post/section and look at the Cell and Playstation 3 hardware diagrams before you continue reading so you get a better idea of how things come together on the system as I explain it in more detail.
The FlexIO bus is capable of ~60GB/s bandwidth total. A massive chunk of this bandwidth is allocated to communicating with the RSX graphics chip, and the remaining bandwidth is where the south bridge components are connected, such as optical media (blu-ray/dvd/cd), Ethernet, hard drive, USB, memory card readers, Bluetooth devices (controllers), and WiFi. The FlexIO bus actually has two controllers to divide up this bandwidth. The Cell’s design accommodates multiple CPU configurations, and to support this, one of the I/O controllers is meant to be extremely fast for CPU to CPU communication. Since the PS3 has a single CPU, the other fast component it talks to is the RSX. The dedicated controller and bus to the RSX supports a total of 35GB/s bandwidth and can be considered parallel to a PC architecture north bridge. The other FlexIO controller is more akin to a south bridge and handles the other devices I listed.
I realize that if you actually added up the north bridge and south bridge bandwidth needs, it doesn’t come to 60GB/s. I do not know where this bandwidth dropped off, but it could be due to realistic speed limitations being factored in already.
Power Processing Element:
The PPE is based on IBM’s POWER architecture. It is a general purpose RISC (reduced instruction set computer) core clocked at 3.2GHz, with a 32KB L1 instruction cache, 32KB L1 data cache, and a 512KB L2 cache. It is a 64-bit processor with the ability to fetch four instructions and issue two in a single clock cycle. It is able to handle two hardware threads simultaneously. It comes with a VMX unit with 32 registers. The PPE is an in-order processor with delayed execution and limited out-of-order support for load instructions.
PPE Design Goals:
The PPE is designed to handle the general purpose workload for the Cell processor. While the SPEs (explained later) are capable of executing general purpose code, they are not the best suited to do so heavily. Compared to Intel/AMD chips, the PPE isn’t as fast for general purpose computing, considering its in-order architecture and comparably less complex and robust branch prediction hardware. The in-order design with only limited out-of-order execution makes it more difficult for developers to get the same speed they may see on an Intel/AMD processor. That said, if extra work is put in, the difference in execution speed can be made considerably smaller and theoretically go away. This difficulty will prevent the Cell (or any PowerPC architecture) from replacing or competing with Intel/AMD chips on desktops, but in the console and multimedia world, the PPE is more than capable of keeping up with the general purpose code used in games and household devices. The Playstation 3 will not be running MS Word.
The PPE is also simplified to save space and improve power efficiency with less heat dissipation. This also allows the processor to be clocked at higher rates efficiently. To compensate for some of the hardware shortcomings of the PPE, IBM put effort into improving compiler generated code to better utilize instruction level parallelism. This helps reduce the penalties of in-order execution.
The VMX unit on the PPE is actually a SIMD unit. This gives the PPE some vector processing ability, but as you’ll read in the next section, the SPEs are far better equipped for vector processing tasks since they are designed for it. The vector unit on the PPE is probably there due to inheritance from the existing POWER architecture. It still serves a purpose if code is mostly general purpose heavy but has significant gains from having a SIMD processing unit close by. Sending every SIMD task to an SPE may not be worth the complications and penalties.
Synergistic Processing Element and the SIMD paradigm:
The SPEs are the computing cores of the Cell processor. They are independent vector processors running at 3.2GHz each. A vector processor is also known as a single instruction multiple data (SIMD) processor. This means a single instruction (let’s say addition) can be performed in one cycle on more than one operand. This can effectively add pairs, triples, or quadruples of numbers in one cycle instead of taking 4 cycles in sequence. Here is an example of what this means in a sample problem. Consider the 4 sums of the numbers 1 and 2, 3 and 4, 5 and 6, and 7 and 8.
On a traditional desktop CPU (scalar), the instructions are handled sequentially.
1. Do 1 + 2 -> Store result somewhere
2. Do 3 + 4 -> Store result somewhere
3. Do 5 + 6 -> Store result somewhere
4. Do 7 + 8 -> Store result somewhere
On a vector/SIMD CPU, the instruction is issued once and executed simultaneously for all operands.
1. Do [1, 3, 5, 7] + [2, 4, 6, 8] -> Store result vector [3, 7, 11, 15] somewhere
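To make the contrast concrete, here is a sketch in plain Python – ordinary lists stand in for 4-wide vector registers, so this only models the operation count, not real SIMD hardware:

```python
# Illustrative only: Python lists stand in for 4-wide vector registers.

def scalar_add(pairs):
    # Scalar CPU: one add instruction per pair, issued in sequence.
    results = []
    for a, b in pairs:
        results.append(a + b)
    return results

def simd_add(va, vb):
    # SIMD CPU: conceptually one instruction covering all four lanes.
    return [a + b for a, b in zip(va, vb)]

print(scalar_add([(1, 2), (3, 4), (5, 6), (7, 8)]))  # [3, 7, 11, 15]
print(simd_add([1, 3, 5, 7], [2, 4, 6, 8]))          # [3, 7, 11, 15]
```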
You can see how SIMD processors can outdo scalar processors by an order of magnitude when computations are parallel. The situation does change when the task or instructions aren’t parallel. Consider adding a chain of numbers like 1 + 2 + 3. A processor has to get the result of 1 + 2 before adding 3 to it, and nothing can avoid the fact that this operation takes 2 instructions that cannot occur simultaneously. Just to get your mind a bit deeper into this paradigm, consider 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8. Using the previous logic, you might think 7 instructions are necessary to accomplish this problem, assuming each of the sums has to be calculated before moving forward. However, if you try to SIMD-ize it, you would realize that this problem can be reduced to 3 operations. Allow me to walk you through it:
1. Do [1, 3, 5, 7] + [2, 4, 6, 8] -> Store result in vector [SUM1, SUM2, SUM3, SUM4]
2. Do [SUM1, SUM3, 0, 0] + [SUM2, SUM4, 0, 0] -> Store result in vector [SUM5, SUM6, 0, 0] (where SUM5 = SUM1 + SUM2 and SUM6 = SUM3 + SUM4).
3. Do [SUM5, 0, 0, 0] + [SUM6, 0, 0, 0] -> Store result in vector [TOTAL, 0, 0, 0].
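The three steps above can be sketched the same way (again plain Python standing in for vector registers; the shuffles between steps are implied):

```python
# Illustrative only: summing 1..8 in log2(8) = 3 vector adds instead of 7 scalar adds.

def vadd(va, vb):
    return [a + b for a, b in zip(va, vb)]

# Step 1: four partial sums at once.
s = vadd([1, 3, 5, 7], [2, 4, 6, 8])              # [3, 7, 11, 15]
# Step 2: shuffle the partial sums into the low lanes and add pairwise.
s = vadd([s[0], s[2], 0, 0], [s[1], s[3], 0, 0])  # [10, 26, 0, 0]
# Step 3: one last add collapses the two remaining lanes.
s = vadd([s[0], 0, 0, 0], [s[1], 0, 0, 0])        # [36, 0, 0, 0]
print(s[0])  # 36
```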
Careful inspection of the previous solution shows two possible flaws or optimization issues. One is that parts of the intermediate sum vectors were not being used in the operations. Those unused lanes could have been used to perform useful operations that would have to be completed later anyway. It would be a huge investment of time if developers tried to optimize this by hand, and in most cases the return may not be big enough to worry about. That type of optimization is something IBM should be looking to analyze and implement in compilers. The buzzword for this is parallelism – specifically instruction level parallelism (ILP) with a bit of SIMD mixed in.
The other visible problem (which I know is there but know less about) is that vector processors naturally store results in a single result vector and generally operate across two vectors with operands aligned in the same vector positions. More advanced SIMD processors are capable of performing operations across operands in the same vector. This would be something like adding the 1st and 3rd elements of the same vector together and putting the result in the 1st element of the result vector, while simultaneously adding the 2nd and 4th elements and putting the result in the 2nd element of the result vector. Another advancement of SIMD is performing misaligned operations that add the 1st element in one vector to the 3rd in another vector, and store the result in a chosen position of a result vector. In such operations, a control vector needs to be defined to specify how the operation occurs and where it stores the result.
The SPE inside the Playstation 3 sports a 128x128-bit register file (128 registers, at 128 bits each), which is a lot of room to also unroll loops to avoid branching. At 128 bits per register, an SPE is able to perform binary operations on 4 operands of 32 bits each. Single precision floating point numbers are 32 bits, which explains why the Playstation 3 sports such high single precision floating point performance. Integers are also typically 32 bits, but are much less of a bottleneck in a computing system, so integer performance is typically not examined too closely. Double precision floating point numbers are 64 bits long and slow SIMD processing down by an order of magnitude: only 2 operands fit inside a vector, and on the SPE no execution unit can work on 2 double precision floating point numbers at the same time, which breaks the SIMD processing ability. This means an SPE has to perform double precision instructions in a scalar fashion rather than a vector fashion.
– Cell microprocessor wiki.
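The lane arithmetic behind that single/double precision difference is simple enough to sketch (the numbers follow the description above; the scalar fallback cost for doubles is the assumption being illustrated):

```python
# Back-of-the-envelope lane counts for a 128-bit SIMD register.
REGISTER_BITS = 128
lanes_f32 = REGISTER_BITS // 32   # 4 single-precision operands per vector op
lanes_f64 = REGISTER_BITS // 64   # only 2 double-precision operands fit

n = 1024                          # operand pairs to add
ops_f32 = n // lanes_f32          # 256 vector adds for single precision
# If double precision falls back to scalar execution, as described above,
# every pair costs its own instruction:
ops_f64_scalar = n                # 1024 adds
print(ops_f32, ops_f64_scalar)    # 256 1024
```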
The important thing to note is that vector processing and vector processors are synonymous with SIMD architectures. Vectorized code is best run on a SIMD architecture, and general purpose CPUs will perform much worse on these types of tasks. Related topics are stream processing, array processing, and DSP. Those paradigms or hardware architectures are typically close to SIMD.
Digital signal processing (DSP) is one of the areas where vector processors are used. I only bring that up because *you know who* would like to claim that it is the only practical application for SIMD architectures.
3D graphics are also a huge application for SIMD processing. A vertex/vector (terms used interchangeably in 3D graphics) is a 3D position, usually stored with 4 elements: X, Y, Z, and W. I won’t explain the W because I don’t even remember exactly how it’s used, but it is there in 3D graphics. Processing many vertices would be relatively slow on a traditional CPU, which would have to individually process each element of the vector instead of processing the whole thing simultaneously. This is why MMX, SSE, and 3DNow! were created for x86 architectures to improve 3D acceleration. Needless to say, GPUs most definitely have many SIMD units (possibly even MIMD), which is why they vastly outperform CPUs in this respect. Operations done on the individual components of a 3D vector are independent, which makes the SIMD paradigm an optimal way to operate on them.
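A toy example of why XYZW vertices map so well onto 4-wide SIMD – each component is independent, so one conceptual vector add moves a whole vertex (plain Python, illustrative only):

```python
# Illustrative only: one conceptual vector add per vertex.

def translate(vertex, offset):
    # All four components (X, Y, Z, W) updated "at once."
    return [v + o for v, o in zip(vertex, offset)]

vertices = [[0.0, 0.0, 0.0, 1.0], [1.0, 2.0, 3.0, 1.0]]
offset = [10.0, 0.0, -5.0, 0.0]
moved = [translate(v, offset) for v in vertices]
print(moved)  # [[10.0, 0.0, -5.0, 1.0], [11.0, 2.0, -2.0, 1.0]]
```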
To put this in context, 3D computer gaming on low end and high end computers between 1995 and 2000 might bring up memories of an issue that existed around then. Although graphics accelerators were out in that time period, some of them didn’t have “Hardware T&L” (transform and lighting). If you recall games that had the option to turn this on or off (assuming you had it in hardware), you would see a huge speed difference between doing it in hardware versus in software. The software version generally also looked worse, because developers tried to hide the speed difference by using less accurate algorithms/models. If a scaled down equivalent of the Cell existed back then, the software path would not have been as slow compared to a GPU.
It is worthwhile to note that “hardware” in the case of 3D graphics generally refers to things done on the GPU, and “software” just means it is running on the CPU. In the end, some piece of hardware does accomplish the task. Software emulation just indicates that a programmer has written software to complete a task that might be done with fewer instructions (or a single one) on a particular piece of hardware that is unavailable.
There are image filter algorithms in applications like Adobe Photoshop which are better executed by vector processors. Many simulations that run on supercomputers are also better suited to run on SPEs, including cloth simulation, terrain generation, physics, and particle effects. For the purposes of gaming, the accuracy of these types of simulations would be toned down to match the computing resources available on the PS3’s Cell.
SPE Design Goals – no cache, such small memory, branch prediction?
The SPEs don’t have a traditional cache under hardware control. Each SPE uses 256KB of on-chip, software controlled SRAM. It reeks of the acronym “RAM,” but it offers latency similar to that of a cache, and in fact some caches are implemented using the exact same hardware – for all practical performance purposes, this memory is a software-controlled cache. But since it is not a hardware cache, a programmer has to exercise control over it.
Having this memory under software control places the work on the compiler tools or programmers to control the flow of memory in and out of the local store. For games programming, this may actually be the better approach if performance is a high priority and needs to be hand tweaked. Traditional caches have the downside of non-deterministic access times. If a program tries to access memory that is found in cache (a cache hit), the latency is only around 5-20 cycles and not much time is lost. Such cases are why caches exist and why they aim to predict what memory is needed ahead of time. If the memory is not found in cache (a cache miss), the latency is in the hundreds of cycles as the processor waits for the data to arrive. A traditional cache therefore varies in performance, which can be very undesirable in games. Programmers can combat this by using specific techniques that allow the cache to better predict what they need, but sometimes, due to the complex nature of the code, it might be easier and better if a programmer handled it explicitly.
It is possible for compilers or middleware libraries to implement algorithms that automatically manage the local store. IBM is placing importance on compiler technology, so it may be possible that the local store is managed transparently unless the application wishes to take explicit control of this memory itself (which higher end games will probably end up doing). If it is accomplished by compilers or libraries, then to a programmer that local storage is effectively a cache either way, since they don’t have to do anything to manage it.
The local storage is the location for both code and data for an SPE. This makes the size seem extremely limited but rest assured that code size is generally small, especially with SIMD architectures where the data size is going to be much larger. Additionally, the SPEs are all connected to other elements at extremely high speeds through the EIB, so the idea is that even though the memory is small, data will be updated very quickly and flow in and out of them. To better handle that, the SPE is also able to dual-issue instructions to an execution pipe, and to a load/store/permute pipe. Basically, this means the SPE can simultaneously perform computations on data while loading new data and moving out processed data.
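That compute-while-transferring pattern is classic double buffering. Here is a minimal sketch in plain Python, where fetch_chunk stands in for an asynchronous DMA into the local store and process for the SPE’s computation (hypothetical names, not a real SPE API):

```python
# Illustrative double-buffering sketch; fetch_chunk/process are stand-ins.

def fetch_chunk(data, i, size):
    # Stands in for a DMA transfer from main memory into local store.
    return data[i:i + size]

def process(buf):
    # Stands in for SIMD work done on the chunk already in local store.
    return [x * 2 for x in buf]

def double_buffered(data, size):
    out = []
    nxt = fetch_chunk(data, 0, size)                # prime the first buffer
    i = size
    while nxt:
        cur, nxt = nxt, fetch_chunk(data, i, size)  # kick off the next "DMA"
        out += process(cur)                         # compute overlaps the fetch
        i += size
    return out

print(double_buffered(list(range(8)), 4))  # [0, 2, 4, 6, 8, 10, 12, 14]
```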
The SPEs have no branch prediction except for a branch-target buffer(hardware), coupled with numerous branch hint instructions to avoid the penalties of branching through software controlled mechanisms. Just to be clear right here – this information comes from the Cell BE Programming Handbook itself and thus overrides the numerous sources that generally have said “SPEs have no branch prediction hardware.” It’s there, but very limited and is controlled by software and not hardware, similar to how the local storage is controlled by software and is thus not called a “cache” in the traditional sense. In many cases, if a software branch is predicted correctly, the branch penalty can be completely avoided.
How the Cell “Works”:
This could get very detailed if I really wanted to explain every little thing about the inner workings of the Cell. In the interest of time, I will only mention some of the key aspects so you may get a better understanding of what is and isn’t possible on the Cell.
There are 11 major elements connected to the EIB in the Cell: 1 PPE, 8 SPEs, 1 FlexIO controller (two buses), and 1 XDR memory controller (again split into two buses). In the setup for the Playstation 3, one SPE is disabled, so there are only 10 operational elements on the EIB. When any of these elements needs to send data or commands to another element on the bus, it sends a request to an arbiter that manages the EIB. The arbiter decides what ring to put the data on, what direction it moves in, and when to perform the operation. This allows the arbiter to efficiently distribute resources and avoid contention on the EIB. With the exception of the memory controller (connected to XDR RAM), any of the elements on the EIB can initiate requests to read or write data from other elements on the EIB. IBM has filed quite a number of patents on how the EIB alone works to make the most efficient use of its bandwidth. The bandwidth allocation scheme is detailed, but in general, I/O related requests are handled with the highest priority since they are the slowest.
Each processing element on the Cell has its own memory controller. For the PPE, this controller is transparent since it is the general purpose processor. A load/store instruction executed on the PPE will go through the L2 cache and ultimately make changes to main system memory without further intervention. Under the hood though, the PPE’s memory controller sets up a DMA request through the EIB arbiter to send its data to the XDR memory controller, which makes the change to system memory. The SPEs operate differently. To an SPE, a load/store instruction works on its local store only. The SPE has its own memory controller to access system RAM just like the PPE, but it is under software control. This means that programs written for the SPE have to set up requests on their own to read or write the system memory that the PPE primarily uses. The DMA requests put on the EIB are also used to send data or commands to or from any other element on the EIB and aren’t strictly limited to communicating with system memory.
This is important to remember because it means that all of the elements on the EIB have equal access to any of the hardware connected to the Cell on the Playstation 3. Rendering commands could originate from the PPE or an SPE, seeing as they both ultimately send commands and/or data to the FlexIO controller, which is where the RSX is connected. By the same idea, if any I/O device connected through FlexIO needs to read or write system memory, it can send messages directly to the XDR memory controller, or signal the PPE to initiate a transfer instead.
The communication system between elements on the Cell processor is highly advanced and carefully planned out. It probably constitutes a huge portion of the research budget for the Cell processor, and it allows for extreme performance and flexibility for whoever develops any kind of software for the Cell. After all, as execution gets faster and faster, the general problem is keeping memory flow up to speed with the demands of the execution hardware.
Note: The section is extremely scaled down and simplified. If you are wondering how something would or should be accomplished on the Cell, you’d have to dive deeper into the problem to figure out which method is the best to use. The messaging system between elements on the EIB is extremely complex and detailed in nature and just couldn’t be explained in a compact form. I recommend reading the Cell BE Handbook or checking out the Power.org reference.
Threads, synchronization, and the Cell:
A thread is simply a sequence of execution. It could be a separate program, or one of two simultaneous series of executions that happen in parallel. Technically, a single core CPU can handle any number of threads; the issue is that performance drops depending on what the individual tasks are doing and how many threads there are. The PPE is able to support two threads on the same processor without the need to context switch. Communication between two threads on the same core is easier since they use the exact same memory resources and possibly even share the same program scope. Sharing data between these threads is only a matter of using the same variables in code and keeping memory access synchronized.
On the other hand, the SPEs are more isolated execution cores whose primary memory is their local store. Code and data running on an SPE can be thought of as an entirely separate application. Sharing data between SPEs and the PPE means putting a request on the EIB, where a receiving thread has already asked for certain data, is expecting it, or knows what to do when it arrives. There are various methods for doing this depending on what needs to be transferred and how both ends are using the data. Needless to say, synchronization between code running on SPEs and the PPE is a harder problem than traditional multithreading, considering the hardware involved. It is better to think of the code running on separate SPEs as separate programs rather than threads, so the synchronization and communication issues are treated appropriately.
That being said, it isn’t a problem that hasn’t been seen before, as it is pretty much the same as inter-process communication between programs running on an operating system today. Each application individually thinks it has exclusive access to the hardware. If it becomes aware of other programs running, it has to consider how to send and receive data from the other applications too. The only added considerations on the Cell are the hardware implementation details of the various transfer mechanisms, to maximize performance even if more than one method works.
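The message-passing flavor of that can be modeled with a thread and a queue standing in for an SPE program and the EIB transfers (purely illustrative; the real mechanisms are DMA and mailbox hardware):

```python
# Illustrative only: a queue models explicit messages to an isolated worker.
import queue
import threading

mailbox = queue.Queue()
results = []

def spe_worker():
    # The "SPE" only sees data that is explicitly sent to it.
    while True:
        msg = mailbox.get()
        if msg is None:            # sentinel: no more work
            break
        results.append(msg ** 2)   # local computation on received data

t = threading.Thread(target=spe_worker)
t.start()
for value in [1, 2, 3]:
    mailbox.put(value)             # "send" data across the bus
mailbox.put(None)
t.join()
print(results)  # [1, 4, 9]
```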
Programming Paradigms/Approaches for the Cell:
Honestly, the most important thing to mention here is that the Cell is not bound to any paradigm. Any developer should assess what the Cell hardware offers and pick an approach that either executes fastest or sacrifices speed for ease of development. That being said, here are some common paradigms that I have seen in various sources.
PPE task management, SPEs task handling:
This seems to be the most logical approach to many sources, due to the SPEs being the computational powerhouse inside the Cell while the PPE is the general purpose core. The keyword is computational, which should indicate that the SPEs are good for computing tasks, but not all tasks. Tasks general purpose in nature perform better on the PPE since it has a cache and branch prediction hardware – making coding for it much easier without having to deal with those issues heavily. Limiting the PPE to dictating tasks is stupid if the entire task is general purpose in nature. If the PPE can handle it alone, it should do so and not spend time handing off tasks to other elements. However, if the PPE is overloaded with general purpose tasks, or needs certain computations which the SPEs are better suited for, it should hand those off to an SPE, as the gain in doing so is worthwhile compared to being bogged down running multiple jobs that could be divided up more efficiently.
Having the PPE fill a task manager role may also mean that all SPEs report or send their data back to the PPE. This has a negative impact on achievable bandwidth, as the EIB doesn’t perform as well when massive amounts of data are all going to a single destination element inside the Cell. This might not happen if the tasks the elements are running talk to other elements, including external hardware devices, main memory, or other SPEs.
This solution can also be thought of as the safe solution, as programmers already know how to program for one general purpose CPU. Their minds only look at how to utilize extra hardware as needed, instead of assuming it is there and molding the solution to require it. The PPE serves as a secure base if any of the SPEs sink – or become too difficult to utilize. Any multiplatform game will probably inevitably take an approach like this. Off the bat, such developers are only seeing one PowerPC-like processor, plus some optimizations the SPEs can provide.
Chaining SPEs in sequence:
This solution basically uses the SPEs in sequence to accomplish steps of a task such as decoding audio/video. An SPE sucks in data continuously, processes it continuously, and spits it out to the next SPE continuously. The chain can utilize any number of SPEs available and necessary to complete the task. This setup is considered largely because the EIB on the Cell can support massive bandwidth, and because the SPEs can be classified as an array of processors.
This setup doesn’t make sense for everything, as dependencies may require that data revisit certain stages more than once and not simply pass through once and be done. Sometimes, due to dependencies, a certain amount of data has to be received before processing can actually be completed. Lastly, various elements may not produce output that a strict “next” element needs. Some of it may be needed by one element, and more by another. For digital signal processing or encoding/decoding tasks this software approach may be ideal, but I don’t see it being used very often in games.
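Where the chained setup does fit, it looks like a stream pipeline. A sketch with Python generators, each stage standing in for one SPE continuously consuming the previous stage’s output (illustrative only):

```python
# Illustrative only: generator stages model SPEs chained in sequence.

def stage_decode(source):
    for item in source:
        yield item + 1        # stand-in for one SPE's processing step

def stage_transform(source):
    for item in source:
        yield item * 10       # stand-in for the next SPE in the chain

pipeline = stage_transform(stage_decode(range(4)))
print(list(pipeline))  # [10, 20, 30, 40]
```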
CPU cooking up a storm before throwing it over the wall:
This honestly was an approach I thought about independently early in my research on the details of the Cell processor. It’s not really a paradigm, but rather an approach/thought process. The reason I bring it up is because the Warhawk Lead Designer vaguely mentioned an approach like this in the preview video. The Cell is a really powerful chip and can do a lot of computational work extremely fast as long as the work stays inside the processor. The problem is that bandwidth to components outside of the chip (RSX, XDR) brings in communication overhead and bottlenecks. It is a less optimal use of computing resources for the PPE to write output to memory and have the SPEs pick up work from there when the PPE can send data directly to the SPEs, avoiding the bottleneck of them all sharing the 25.6GB/s bandwidth to system memory. It makes more sense to let the Cell load and process game objects as much as possible to completion before handing them off to the RSX or writing back to system memory where more work may occur later. Tossing data back and forth between the Cell and memory or the RSX is a slow operation that loses efficiency.
This approach does make sense, but it is by no means a restriction if a game has serious uses for a tight relationship between the Cell and the RSX or other off-chip components throughout the game loop.
Where does the operating system go?
Sony’s official operating system for the Playstation 3 has been reported to permanently reserve an SPE. The purpose is probably to minimize the impact on the rest of the system’s performance, as programmers would be unhappy if performance on the PPE were affected by something outside of their control. Using an SPE also allows some hardware security features to be enabled to help Sony with DRM protection.
Any other system that uses the Cell processor would probably seek to use the PPE as the center of operating system functionality.
The RSX Graphics Chip:
The RSX specs are largely undefined and unknown at this point, and I will refrain from analyzing the clearly unknown aspects too deeply. The only official information available has been around since E3 2005 and is likely to have changed since then. Various statements made after that point compare the RSX to other graphics chips nVidia has made. Some press sources have used these statements to analyze the RSX as if they actually knew what it was, or in a speculative manner, but readers should not forget that they simply do not know for sure. I have read a lot of those sources; I am throwing out the specific execution speed numbers and focusing on the aspects of the RSX specs most likely to be final.
The only thing that can be said with a pretty high degree of certainty is that the RSX will have 256MB of GDDR3 video memory, access to the Cell’s 256MB XDR memory, and a fixed function shader pipeline – meaning dedicated vertex shader pipelines and dedicated pixel shader pipelines, as opposed to a unified shader architecture that the Xenos on the Xbox360 has. The RSX will also be connected to the Cell(and XDR memory) through the FlexIO interface.
Due to the nature of the SPEs on the Cell, there is quite an overlap in function with the vertex processors on the RSX. It is up to the programmer to decide where to accomplish those tasks, depending on the flexibility they need and the resources they have available. The Cell could also handle some post-processing (pixel/frame buffer) effects if it is better suited to run them than the RSX. This will probably not happen, though, because the frame buffer is not available until late in the rendering pipeline, and it would have to be taken out of the pipeline and put back in again. The Cell would have to significantly outperform the RSX for something like that to be beneficial.
What can the PS3 do for gaming? What can’t it do?
I challenge you to answer that question mostly by yourself, but here is my view on it:
To me, it seems as if the Cell is a simulation monster. “Supercomputer on a chip” isn’t entirely far from the truth. If developers adopt a computational mindset for accomplishing tasks on the Playstation 3, the Cell’s advantages in SIMD and parallelism will be utilized, and it could bring some truly impressive physics, graphics, and sound to the industry. These things will not be done to the level of accuracy of supercomputers, which are usually fully dedicated to one such task at a time, but the accuracy can be reduced to a level that is still realistic enough for game play visuals or mechanics. Basic/static AI routines are probably better done on general purpose processors, but I can see certain routines being developed with a computational approach in mind. I wouldn’t expect any “oh sh*z” moments from Playstation 3 game AI anytime soon though, unless major advancements are made in the field entirely.
Most fun game play elements aren’t technically deep and are a breeze to run on processors today. Consider that fun games with many varying elements of game play have been available since the 16-bit era or earlier, and have only expanded due to features related to external devices like networking and controllers. Don’t expect games to necessarily be “more fun” in the game play aspect just because the hardware is more powerful.
Powerful is also by no means a linear term for CPUs or processing in general; there are different dimensions of power. For general purpose code, Intel and AMD processors are still considerably more powerful than anything else out there, including the Cell. Comparisons proposing that the Cell may outperform those processors are generally considering areas where the Cell picks up slack where a general purpose processor would lag, or they are considering finely tuned code to run a general purpose task on the Cell. General purpose processing is somewhat of an all-inclusive axis covering every type of processing that can be done, and anything the Cell does better technically raises it on the general axis too. Unless tools make general purpose programming for the SPEs acceptable, don’t expect the Cell to really step in and take some kind of lead in general purpose computing. But there is room for the PS3 to run considerable general purpose applications efficiently if they are coded properly.
What can and can’t be upgraded?
Honestly, many people do not realize how close the lower price point comes to the more expensive version of the Playstation 3. In short, HDMI is the only real functionality you’d completely miss if you didn’t get the $600 version. There is reasonable assurance that for the next 4-5 years the ICT won’t be turned on, which means 1080p signals will still pass through component video during Blu-Ray movie playback, which the $500 version supports. As for the other features the $500 version lacks:
USB Compact Flash/SD/Memory Stick Pro Duo readers are sold in computer stores and online shops like newegg.com. They cost anywhere from 10-50 bucks depending on how many formats you want to be able to read. Will the Playstation 3 work with them? There’s a very high chance the PS3 implements standard USB protocols that will allow USB hubs/devices to be connected transparently. The difference is that a memory card connected through the USB interface wouldn’t be distinguishable by type from the viewpoint of the PS3, unlike the pre-installed readers – i.e. it wouldn’t know if it was an SD card, Memory Stick Pro Duo, or Compact Flash drive. It would just see “USB Storage Device.”
WiFi can be made up for easily by buying a wireless router with Ethernet ports on it. Simply connect the PS3 to the Ethernet, and it can talk to any other devices on the router’s wireless network. This would not be transparent to the Playstation 3, because the more expensive version has two separate network interfaces instead of one. If a feature only looks for wireless devices through the wireless interface built into the $600 Playstation 3, the $500 model would never find them, because it would never check whether the same devices exist on the network its Ethernet card is connected to. If, however, a feature attempts to communicate with devices through any available network interface, it would find the Ethernet NIC on the $500 PS3 and search for devices on that network – wireless or not – through that interface. If a wireless device is on the network, it will find it and talk to it just the same. It’s up in the air whether Sony’s developers will realize this and implement such features in a safe and universal manner. Sony has also said that purchasing a wireless network adapter would allow the $500 PS3 to perform wireless communication too. Doing this would require more work on Sony’s end, as they would have to implement drivers for USB wireless network cards, and I don’t know how that would affect a game detecting whether one exists.
Are developers in for a nightmare?
I would LOL if someone seriously asked me this, but it is a reasonable question that I’m sure people have on their minds.
Some developers would piss their pants upon looking at the Cell and realizing what they have to do to get it to accomplish the things it is good at. The amount of mathematical, scientific, and computer science talent and knowledge needed to tackle the whole setup of the Playstation 3 is astounding. While there are many things the Cell naturally excels at, some of these problem sets aren’t as obvious, and it takes a deeper understanding of the base problem area – be it sound, physics, graphics, or AI – just to understand the many possible ways of solving the problem. Then, in addition to understanding the problem better, the developer must figure out the most efficient way to implement it on the Playstation 3 and have the skills to actually write it in code. This is a very high bar for a games programmer.
Other developers wouldn’t piss their pants, but would be confused about what SIMD actually means for them. They might be too stuck in their old ways to see how SIMD processors can drastically increase game performance, and would only consider the general purpose abilities of the SPEs, scoffing at them for not having a cache. If that is the type of computing power they want, they would think the PS3 is a piece of crap to program for and would measure the Xbox360 as superior, or closely matched, with its 3 cores and much easier to use development tools.
Undoubtedly, there are developers who don’t already have the knowledge to implement efficient SIMD solutions to games processing problems. Thankfully, the Playstation 2’s Emotion Engine already exposed developers to SIMD processing: its VU0 and VU1 units were vector processors that developers had to make good use of to push the system to its limits – and they did. Unlike the days of the Playstation 2, they now have an extreme amount of SIMD processing power coming out of the SPEs, so there is far more developers can do on the CPU. They could actually render 3D worlds entirely in real time on the Cell alone if they wanted to ignore the RSX. That said, they wouldn’t do this, because they couldn’t show much else for the game, and it would waste an entire component in the PS3.
Look for the development studios that pushed the PS2 to its limits to do the same with the Playstation 3. Multiplatform titles are not going to do much justice to the Playstation 3’s hardware: the transition between SIMD and non-SIMD processing presents a performance gap, and developers don’t want to alienate platforms that don’t have large amounts of that processing ability.
The important thing with technology advancements is that certain steps are taken at the right time. Ten years ago, a processor like the Cell would have fallen flat on its face due to its complexity, and the games industry could not have supported games with costs as high as they are now. But it isn’t ten years ago, and game budgets are high. Some budgets still aren’t enough to support the Playstation 3; others are. As time goes on and the knowledge becomes more widespread, developing for the Playstation 3 will get cheaper, as more people will have experience working with it. Part of Sony’s effort in combating high development costs is making information on the Cell widely available. You don’t need a developer’s kit to understand in detail what is in the Cell processor and how to program for it.
A large part of cutting Playstation 3 development costs should be hiring a good lead designer or architect who understands the Playstation 3 extremely well, to make sure developers aren’t trying to solve a problem in a difficult or inefficient manner. Doing so incurs a huge penalty, as it takes time that isn’t going where it should. Right now, developers aren’t used to driving on the type of road Sony has just paved. They keep driving cars meant for the old type of road because they know how to use those cars, but they get much worse mileage on the new road. If developers want to use the Playstation 3 effectively, they need to invest in a new car and learn how to drive it to get further on Sony’s road and ahead of the competition.
The Future for Playstation 3:
Playstation 3 is a console built with the future in mind. Playstation 1 had a very long life, and Playstation 2 is still going strong. Considering the length of time people will continue to play these consoles, it is important that they are not outdone by future advancements. The best a console can do is look to the horizon to see what’s coming, and that is what Sony is doing with the Playstation 3.
Blu-ray may not win as the next generation movie format. If it does, all the more reason to buy one. If not, the vast space is still there for games, should something come along that motivates an increase in the size of games. The single most obvious use for the extra space is high resolution video. Other uses may be less obvious at this point, but could arise in the future. It is important to remember that even if Blu-ray doesn’t win a movie format war, it still offers its capacity to Playstation 3 games.
HDMI is future proof in the sense that even though the image constraint token (ICT) may not be used until 2010, if it ever comes into play the $600 Playstation 3 will still support it. The fact that it supports the newest HDMI 1.3 spec, which HDTVs don’t even support yet, also shows that once these things become mainstream, the Playstation 3 will be right there to utilize them.
Gigabit Ethernet may not be commonplace today, but in 2-5 years I’m positive that gigabit Ethernet home routers will be down to the price of today’s 100Mbps routers. Although the internet will likely not be moving that fast because of ISP limitations, at least your internal home network will be, and local devices could take advantage of this bandwidth for something like HD video streaming between devices. Sony has already looked in this direction and has provided full IPv6 support (IP protocol version 6 – version 4 is in widespread use today) to allow full use and integration of DLNA.