Speed (MHz) != Performance

dwchang
Sad Boy on Site
Joined: Mon Mar 04, 2002 12:22 am
Location: Madison, WI

Post by dwchang » Thu Nov 13, 2003 4:49 pm

dj-ohki wrote:then the over 1 meg 'effective' cache listing on their site is pretty much a load of crap. or a really stretched spin.
My guess is that it's a typo or someone being lazy (probably copied and pasted the FX chart). The Athlon 64 has 256k of L2 and 128k (64/64) of L1. I know that. Oh and the FX *does* have over 1 MB of effective cache, since it has 1 MB of L2 and 128k of L1 (1024 + 128 = 1152 KB). That's the reason for my suspicion of a typo, since the FX numbers add up correctly.
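
For what it's worth, the arithmetic is easy to check. This is only a minimal sketch: it assumes the L1 and L2 are exclusive (nothing in L1 is duplicated in L2), so the sizes simply add, and the function name is made up for illustration.

```python
def effective_cache_kb(l2_kb: int, l1_instr_kb: int = 64, l1_data_kb: int = 64) -> int:
    """Total unique cache, assuming exclusive L1/L2 so the sizes simply add."""
    return l2_kb + l1_instr_kb + l1_data_kb

# FX as described in the post: 1 MB (1024 KB) of L2 plus 64/64 KB of L1
print(effective_cache_kb(1024))  # 1152
# Athlon 64 figure as stated in the post: 256 KB of L2 plus 64/64 KB of L1
print(effective_cache_kb(256))   # 384
```

Since 1 MB is 1024 KB, the FX total comes to 1152 KB, which is "over 1 meg" either way.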
dj-ohki wrote:true, but since ath64 is NUMA and not SMP, you wouldnt have to worry about cache coherency and all that rot. *shrug* oh well, there's always operterion.
What would cache coherency have to do with a single CPU? The Athlon 64 isn't a multi-processor die, period (whether SMP or whatever). There wouldn't be any cache coherency problems since it is all internal.

Also, why would the HT link have anything to do with cache coherency if it's not on an MP processor? Effectively, for an Athlon 64, it's just a really fast bus to the northbridge. I imagine that's why there is only one HT link...since it's not necessary to have more. But don't quote me on that...I didn't design the thing..just making logical guesses.
dj-ohki wrote:i thought we were having 2 distinct conversations, one about ath64/ath64fx (consumer/prosumer) and one about the opterion, which is amd's foray into the high end server market. 64 bit NUMA architecture with 8 way glueless MP, thats pretty high end.

and no, consumer level programs have no need for 8 meg l2s. it would be nice to be able to fit an entire filter chain + video frame in cache, but again, not needed.

anyway, im hoping to get a hold of a dualie sledgehammer in the next few months, once the price goes down. either that, or a dualie P4EE (quad dispatch engines is pretty sweet, and before you jump on that, i know it only helps poorly written code) all depends on how things sit in the future.
No, I thought we were only talking about Desktop :P. If you talk about servers well...I imagine an 8-way Opteron can maul a Power4 or anything with an 8 MB cache. 8-way system vs. 8 MB L2...that's an easy choice in terms of efficiency and performance ne?

As you have already stated though, an 8 MB cache is ridiculous for consumers, and as for high-end servers...I have already presented an alternative in the 2, 4 and 8-way systems. That obviously would be more cost efficient than something that is very difficult to fabricate and in the end has less processing power. I imagine fabricating 8 chips with a smaller cache is *a lot* easier than fabricating one with 8 MB cache. Especially when our previous chips have had nothing above 512k, the jump to 8 MB would be ridiculous both on our end (for product verification/fabrication) and for motherboard manufacturers and so on. It probably would require an entire revamping of the infrastructure....which is ultimately stupid.

The P4EE can go dual? I didn't know that. I mean, I know all it is is a Xeon (which Intel denies, ha), so I guess it could since the Xeon can go dual. At the same time, benchmarks clearly show the Opteron mauling the P4EE...oh and the P4EE doesn't have 64-bit capability. You have to go to *laugh* Itanium for that...haha.
-Daniel
Newest Video: Through the Years and Far Away aka Sad Girl in Space

dj-ohki
Joined: Tue Apr 17, 2001 12:49 pm

Post by dj-ohki » Fri Nov 14, 2003 3:36 pm

dwchang wrote:
dj-ohki wrote:then the over 1 meg 'effective' cache listing on their site is pretty much a load of crap. or a really stretched spin.
My guess is that it's a typo or someone being lazy (probably copied and pasted the FX chart). The Athlon 64 has 256k of L2 and 128k (64/64) of L1. I know that. Oh and the FX *does* have over 1 MB of effective cache, since it has 1 MB of L2 and 128k of L1 (1024 + 128 = 1152 KB). That's the reason for my suspicion of a typo, since the FX numbers add up correctly.
the ath64 pic was up before the FX was announced. so dunno. we'll end this part cause it is most likely a typo.
dj-ohki wrote:true, but since ath64 is NUMA and not SMP, you wouldnt have to worry about cache coherency and all that rot. *shrug* oh well, there's always operterion.
What would cache coherency have to do with a single CPU? The Athlon 64 isn't a multi-processor die, period (whether SMP or whatever). There wouldn't be any cache coherency problems since it is all internal.
im talking about NUMA in a machine. and by the design of the whole ath64 arch, being a numa system, it doesnt have to deal with cache coherency period when in a MP environment.
Also, why would the HT link have anything to do with cache coherency if it's not on an MP processor? Effectively, for an Athlon 64, it's just a really fast bus to the northbridge. I imagine that's why there is only one HT link...since it's not necessary to have more. But don't quote me on that...I didn't design the thing..just making logical guesses.
^. see above. when the opterion is in a MP environment, the HT link is used for interprocessor communication. since there are 3 HT links, it supports up to 8 way glueless (no support needed from the host chipset). a ath64 with 2 HT links would be able to go dualie with no added support needed from the chipset. looks like the ath64 is gonna be a strictly uniprocessor solution unless something changes. this is surprising to me for the FX, cause of its prosumer nature. a lot of prosumers prefer MP systems, cause of all the added benefits of MP, which is not possible with the ath64 at this time, but is possible with the P4EE.
dj-ohki wrote:i thought we were having 2 distinct conversations, one about ath64/ath64fx (consumer/prosumer) and one about the opterion, which is amd's foray into the high end server market. 64 bit NUMA architecture with 8 way glueless MP, thats pretty high end.

and no, consumer level programs have no need for 8 meg l2s. it would be nice to be able to fit an entire filter chain + video frame in cache, but again, not needed.

anyway, im hoping to get a hold of a dualie sledgehammer in the next few months, once the price goes down. either that, or a dualie P4EE (quad dispatch engines is pretty sweet, and before you jump on that, i know it only helps poorly written code) all depends on how things sit in the future.
No, I thought we were only talking about Desktop :P. If you talk about servers well...I imagine an 8-way Opteron can maul a Power4 or anything with an 8 MB cache. 8-way system vs. 8 MB L2...that's an easy choice in terms of efficiency and performance ne?
but can it maul a 8 way power4. or a 4 way power4? dont know what the price/performance on that chip is ATM.
As you have already stated though, an 8 MB cache is ridiculous for consumers, and as for high-end servers...I have already presented an alternative in the 2, 4 and 8-way systems. That obviously would be more cost efficient than something that is very difficult to fabricate and in the end has less processing power. I imagine fabricating 8 chips with a smaller cache is *a lot* easier than fabricating one with 8 MB cache. Especially when our previous chips have had nothing above 512k, the jump to 8 MB would be ridiculous both on our end (for product verification/fabrication) and for motherboard manufacturers and so on. It probably would require an entire revamping of the infrastructure....which is ultimately stupid.
define high end. for most servers, yea, 1meg is perfect. some could use 2 meg, but thats picking nits. there are some specialized cases where 8 meg would be VERY useful, but they are far between.


but yea, the infrastructure change required to implement that would be staggering, and thus pointless.
The P4EE can go dual? I didn't know that. I mean, I know all it is is a Xeon (which Intel denies, ha), so I guess it could since the Xeon can go dual. At the same time, benchmarks clearly show the Opteron mauling the P4EE...oh and the P4EE doesn't have 64-bit capability. You have to go to *laugh* Itanium for that...haha.
true, whats the price points on both though?

and this whole 'OMG 64 bits! it makes everything better' attitude i keep seeing all over the internet irks me. since the ath64 is nice in that 32 bit code executes at the same speed as 64 bit code, the whole 64 bit part is pointless at this stage in time. how many people do you know work with apps that use over 4 gig of memory?
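
The 4 gig figure comes straight from pointer width: a flat 32-bit address can name 2^32 bytes. A quick sketch of that arithmetic, ignoring how much of the address space the OS keeps for itself (the function name is just for illustration):

```python
GIB = 2 ** 30  # bytes in one GiB

def addressable_gib(pointer_bits: int) -> float:
    """Bytes reachable with a flat pointer of the given width, expressed in GiB."""
    return 2 ** pointer_bits / GIB

print(addressable_gib(32))  # 4.0
print(addressable_gib(64))  # 17179869184.0
```

The 64-bit number is the theoretical ceiling of a 64-bit pointer, not what any particular chip actually wires up.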

Savia
Chocolate teapot
Joined: Wed Apr 02, 2003 3:40 pm
Location: Reading, UK

Post by Savia » Fri Nov 14, 2003 3:37 pm

:shock:

That's the last time I wander into Video Hardware threads.
"A creator needs only one enthusiast to justify him." - Man Ray
"Restrictions breed creativity." - Mark Rosewater

A Freudian slip is where you say one thing, but mean your mother.

dwchang
Sad Boy on Site
Joined: Mon Mar 04, 2002 12:22 am
Location: Madison, WI

Post by dwchang » Fri Nov 14, 2003 6:46 pm

dj-ohki wrote:^. see above. when the opterion is in a MP environment, the HT link is used for interprocessor communication. since there are 3 HT links, it supports up to 8 way glueless (no support needed from the host chipset). a ath64 with 2 HT links would be able to go dualie with no added support needed from the chipset. looks like the ath64 is gonna be a strictly uniprocessor solution unless something changes. this is surprising to me for the FX, cause of its prosumer nature. a lot of prosumers prefer MP systems, cause of all the added benefits of MP, which is not possible with the ath64 at this time, but is possible with the P4EE.
No, I understand the 3 HT links thing (2^3 = 8) and all that, but again, the FX and 64 have 1 HT link, which would still provide an MP solution (2 processor...2^1 = 2). At the same time, currently they are single processor as you have concluded. And yes, I do agree that MP has a lot of benefits for prosumers...I should know since I run a dual Athlon at home :).

I might be mistaken on the HT part (since I'm not *that* familiar with it), but even still....I imagine that a Motherboard/chipset could handle the MP part...although slower. It's quite possible chipsets wanted it that way so they could make $$$ *shrug*.
dj-ohki wrote:but can it maul a 8 way power4. or a 4 way power4? dont know what the price/performance on that chip is ATM.
From what I've heard, I'd say yes. The Power4 is pretty outdated (well not *that* badly)...that's the reason IBM has Power5. And like you said...price. Opterons (btw you keep saying Opterions...there's no i in it) are much cheaper. I hear you can get an 8-way for pretty darn cheap. Hell I think Phade is looking into a two/four-way for the .org. He messaged me about it :)
dj-ohki wrote:define high end. for most servers, yea, 1meg is perfect. some could use 2 meg, but thats picking nits. there are some specialized cases where 8 meg would be VERY useful, but they are far between.
but yea, the infrastructure change required to implement that would be staggering, and thus pointless.
I'd say high-end means major businesses. They have major loads to support. If it's any vote of confidence, some major banks and businesses have already bought 4-way or higher Opteron servers.

And yeah the infrastructure change alone would make it stupid...and again..yields (which are VERY important in the PC industry).
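
To put rough numbers on the yield point, here is a toy sketch using the simple Poisson yield model, Y = exp(-D * A), where D is defect density and A is die area. The defect density and both die areas below are made-up illustrative values, not figures for any real process or chip.

```python
import math

def poisson_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Fraction of dies expected to have zero defects (simple Poisson model)."""
    return math.exp(-defects_per_cm2 * die_area_cm2)

D = 0.5          # hypothetical defect density, defects per cm^2
small_die = 1.0  # hypothetical area of a die with a small cache, cm^2
big_die = 4.0    # hypothetical area of a die with a very large cache, cm^2

print(f"small die yield: {poisson_yield(D, small_die):.2f}")  # 0.61
print(f"big die yield:   {poisson_yield(D, big_die):.2f}")    # 0.14
```

In this model yield falls off exponentially with area, which is the gist of why one huge-cache die is a harder sell than several smaller ones. Real fabs use more refined models, so treat this as directional only.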

dj-ohki wrote:true, whats the price points on both though?
Well considering the P4EE isn't even out yet (even though it was paper launched)...it's a moot point. HOWEVER, I hear it's 990 or so. Again, it's just a P4 Xeon, so a bit overpriced for something that they've had for years.
dj-ohki wrote:and this whole 'OMG 64 bits! it makes everything better' attitude i keep seeing all over the internet irks me. since the ath64 is nice in that 32 bit code executes at the same speed as 64 bit code, the whole 64 bit part is pointless at this stage in time. how many people do you know work with apps that use over 4 gig of memory?
I agree to a degree. Sure 64-bit hasn't caught on yet (gotta recompile things, but it is starting..like I said in another thread...Windows XP 64 is already in beta and longhorn after that)...HOWEVER the big selling point is the future. Businesses can buy this system and have great 32-bit capability now and when *they* feel like it, they can make the migration at their own pace. The major selling point is customer centric...that they can choose to migrate when they feel like it.

It's quite an awesome deal...get two generations of processors and change when you feel like it. And it's also comparable to the prices of *just* 32-bit processors from Intel. Seems like a good deal to me :).

As for over 4 gigs of memory..I know plenty of people even on this board who would want over 4 gigs of memory. Scary ain't it? :)
-Daniel

dj-ohki
Joined: Tue Apr 17, 2001 12:49 pm

Post by dj-ohki » Sat Nov 15, 2003 3:46 pm

dwchang wrote:No, I understand the 3 HT links thing (2^3 = 8) and all that, but again, the FX and 64 have 1 HT link, which would still provide an MP solution (2 processor...2^1 = 2). At the same time, currently they are single processor as you have concluded. And yes, I do agree that MP has a lot of benefits for prosumers...I should know since I run a dual Athlon at home :).

I might be mistaken on the HT part (since I'm not *that* familiar with it), but even still....I imagine that a Motherboard/chipset could handle the MP part...although slower. It's quite possible chipsets wanted it that way so they could make $$$ *shrug*.
this is the only part of the post im gonna reply to, cause i agree with the rest of it.

a single HT cannot support a glueless MP solution. the single HT link is from core to chipset. you require 2 HT links to do 2 way MP. C1 <-> HT <-> C2 <-> chipset. with 3, you can do up to 8 way with a hop count of 3.

and after half an hour digging, cant find the diagram. basically you've got a grid, 2 x 4. col 2, row 3 chip has a HT that goes to the chipset, and the rest are connected to their row neighbor and column neighbor via a HT link, except the one connected to the chipset, which is missing a link to its row neighbor. or something like that. there's a diagram floating on the internet somewhere..

sure you can do MP with a single HT link, you're just gonna need a MP chipset, which none of the chipset makers are producing, nor have plans that i know to do so.

dwchang
Sad Boy on Site
Joined: Mon Mar 04, 2002 12:22 am
Location: Madison, WI

Post by dwchang » Sun Nov 16, 2003 7:18 pm

dj-ohki wrote:
dwchang wrote:No, I understand the 3 HT links thing (2^3 = 8) and all that, but again, the FX and 64 have 1 HT link, which would still provide an MP solution (2 processor...2^1 = 2). At the same time, currently they are single processor as you have concluded. And yes, I do agree that MP has a lot of benefits for prosumers...I should know since I run a dual Athlon at home :).

I might be mistaken on the HT part (since I'm not *that* familiar with it), but even still....I imagine that a Motherboard/chipset could handle the MP part...although slower. It's quite possible chipsets wanted it that way so they could make $$$ *shrug*.
this is the only part of the post im gonna reply to, cause i agree with the rest of it.

a single HT cannot support a glueless MP solution. the single HT link is from core to chipset. you require 2 HT links to do 2 way MP. C1 <-> HT <-> C2 <-> chipset. with 3, you can do up to 8 way with a hop count of 3.

and after half an hour digging, cant find the diagram. basically you've got a grid, 2 x 4. col 2, row 3 chip has a HT that goes to the chipset, and the rest are connected to their row neighbor and column neighbor via a HT link, except the one connected to the chipset, which is missing a link to its row neighbor. or something like that. there's a diagram floating on the internet somewhere..

sure you can do MP with a single HT link, you're just gonna need a MP chipset, which none of the chipset makers are producing, nor have plans that i know to do so.
Yeah, but with 3 HT links, how could you achieve 8-way? I imagine you need at least 3 links since 2^3 = 8. Again, I'm not *that* familiar with HT, but if you're right, you're right.

In any case, I still stand by my statement that I imagine it will be on the chipset/motherboard end. I mean I'm sure they would *love* to be able to do that and charge more. It's also easier on our end *shrug*. And even though people haven't said anything doesn't mean they're not doing it. I imagine Tyan is doing something (they seem to do well with the MP line). OR you could just go buy an opteron and then you'll be fine (although a lot more expensive).
-Daniel

Quu
Joined: Tue Dec 26, 2000 1:20 pm
Location: Atlanta, GA

Post by Quu » Mon Nov 17, 2003 10:50 am

i know this one ^_^

with three HT links you are forgetting the memory interface

with three HT links you can do basically infinite glueless SMP... imagine a ladder

two base Opteron CPUs are at the bottom... the southmost HT link on one goes to a chipset... and the southmost on the other also goes to a chipset (i.e. one is a direct link to a HT aware SCSI controller ^_^)

then they connect to each other... and above them... like a ladder...

its something we investigate here at work for scalability
<table>
<tr><td></td><td></td><td>PCI-X</td><td></td><td>PCI-Exprs</td><td></td><td></td></tr>
<tr><td></td><td></td><td align="center">|</td><td></td><td align="center">|</td><td></td><td></td></tr>
<tr><td>Mem</td><td>-</td><td align="center">CPU</td><td align="center">----</td><td align="center">CPU</td><td>-</td><td>Mem</td></tr>
<tr><td></td><td></td><td align="center">|</td><td></td><td align="center">|</td><td></td><td></td></tr>
<tr><td>Mem</td><td>-</td><td align="center">CPU</td><td align="center">----</td><td align="center">CPU</td><td>-</td><td>Mem</td></tr>
<tr><td></td><td></td><td align="center">|</td><td></td><td align="center">|</td><td></td><td></td></tr>
<tr><td>Mem</td><td>-</td><td align="center">CPU</td><td align="center">----</td><td align="center">CPU</td><td>-</td><td>Mem</td></tr>
<tr><td></td><td></td><td align="center">|</td><td></td><td align="center">|</td><td></td><td></td></tr>
<tr><td>Mem</td><td>-</td><td align="center">CPU</td><td align="center">----</td><td align="center">CPU</td><td>-</td><td>Mem</td></tr>
<tr><td></td><td></td><td align="center">|</td><td></td><td align="center">|</td><td></td><td></td></tr>
<tr><td>Mem</td><td>-</td><td align="center">CPU</td><td align="center">----</td><td align="center">CPU</td><td>-</td><td>Mem</td></tr>
<tr><td></td><td></td><td align="center">|</td><td></td><td align="center">|</td><td></td><td></td></tr>
<tr><td></td><td></td><td align="center">SCSI</td><td></td><td>Chipset</td><td>-</td><td>Video</td></tr>
<tr><td></td><td></td><td align="center">|</td><td></td><td align="center">|</td><td></td><td></td></tr>
<tr><td></td><td></td><td>Hard Drive</td><td></td><td>South Brdg</td><td></td><td></td></tr>
</table>
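
The ladder in the diagram can be sketched as a graph to count links and hops. This is only an illustration under stated assumptions: a plain ladder where each CPU links to the one across its rung and to the ones directly above and below, with memory and I/O links left out. The function names are made up.

```python
from collections import deque

def build_ladder(rungs: int) -> dict:
    """CPUs sit at (row, side); each links across its rung and along its rail."""
    graph = {}
    for row in range(rungs):
        for side in (0, 1):
            neighbors = [(row, 1 - side)]           # across the rung
            if row > 0:
                neighbors.append((row - 1, side))   # down the rail
            if row < rungs - 1:
                neighbors.append((row + 1, side))   # up the rail
            graph[(row, side)] = neighbors
    return graph

def max_hops(graph: dict) -> int:
    """Worst-case CPU-to-CPU hop count, via breadth-first search from every CPU."""
    worst = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst

for rungs in (1, 2, 4, 5):
    g = build_ladder(rungs)
    links = max(len(n) for n in g.values())
    print(f"{2 * rungs} CPUs: up to {links} CPU links per chip, worst case {max_hops(g)} hops")
```

No chip ever needs more than 3 CPU-facing links however tall the ladder gets, and the chips at the ends have a spare one for I/O. The catch is that worst-case hops grow with height: a straight 8-CPU ladder comes out at 4 hops, so the 3-hop figure mentioned earlier in the thread would need some wiring other than a plain ladder.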
Lead me not to temptation, for I have deadlines

dwchang
Sad Boy on Site
Joined: Mon Mar 04, 2002 12:22 am
Location: Madison, WI

Post by dwchang » Mon Nov 17, 2003 2:50 pm

Whoa Quu! That drawing makes a lot of sense. OK, I figured *all* the processors were linked directly to each other instead of indirectly. I guess that would be *a lot* more difficult, but it'd kick ass ;).

Either way, I still imagine Tyan or some other company will come up with an MP solution for non-Opterons. I mean the market is obviously there. And let's not forget that MP did take awhile to catch on with Athlons and even PIII's.

Oh and we just announced a deal with Sun. w00t!
-Daniel

Quu
Joined: Tue Dec 26, 2000 1:20 pm
Location: Atlanta, GA

Post by Quu » Mon Nov 17, 2003 4:01 pm

well... with the hyper transport controllers being interconnected on the die... when a HT packet is bound for a CPU further along in the chain it simply gets passed on, without the local CPU needing to interfere... same with memory requests from a foreign cpu... it comes across the HT bus, and is processed by the memory controller... without the local cpu on that bus having to handle it... really makes it scary expandable....
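
A toy sketch of that pass-through idea, assuming a simple chain of nodes numbered 0..n-1 where the on-die HT logic only compares a packet's destination with its own id. The class and field names are invented for illustration and say nothing about how the real hardware is organized.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    forwarded: int = 0                            # packets passed through by the HT logic
    consumed: list = field(default_factory=list)  # packets addressed to this node

def send(chain: list, src: int, dest: int, payload: str) -> int:
    """Move a packet hop by hop along the chain; return the number of hops taken."""
    hops = 0
    here = src
    while here != dest:
        if here != src:
            chain[here].forwarded += 1            # pass-through, no work for the local core
        here += 1 if dest > here else -1
        hops += 1
    chain[dest].consumed.append(payload)          # handled at the destination node only
    return hops

chain = [Node(i) for i in range(4)]
print(send(chain, 0, 3, "memory read"))           # 3
print([n.forwarded for n in chain])               # [0, 1, 1, 0]
```

The two nodes in the middle only bump a forward counter; nothing about the packet lands on them, which is the point being made above.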

if you wanted to make an athlon64 or FX multi processor then you could make a northbridge with two HT links... and it becomes the "middle point" for the two CPUs... doing pass through like normal...

the problem is why would they.... the Opteron 2XX series is specced and priced for the dual CPU workstation market.... its the 8XX thats for quads and higher...

I believe that the 2XX series only has two HT links enabled on the chip.... and the 1XX series opteron has only one enabled...

you can't put a 1XX opteron chip in a multi processor motherboard... at least i don't think...

i think
