EVE Information Portal
New Dev Blog: TQ Level Up
 
This thread is older than 90 days and has been locked due to inactivity.


 
Pages: first : previous : 1 2 3 [4] 5 6 7 8 9 ... : last (11)

Author Topic

CCP Yokai

Posted - 2010.06.16 18:39:00 - [91]
 

Tobin Shalim - I know that there have been rankings in the past comparing Eve's cluster to other supercomputers/systems in the world. How does the new hardware compare/improve the listing?

Answer - Not even close to the top 500... the lowest one on the list this year was 5136 cores... while cores alone do not quantify performance... 280 vs. 5136 is still pretty far away from that group.


Andrea Griffin - I thought the TQ cluster had many more machines than this!

Answer - It has changed over the years and at one point had a lot more servers. But with multi-core, the quantity of servers needed actually goes down even though player numbers and PCU go up. Love progress.


Qoi - Can i visit you and fondle those blades? Just once.

Answer - No, but how about some video and pics? It's on the way.


Dakisha - How come, given the seriousness of lag in 0.0 these days, you've not upgraded to modern CPUs?

Answer - The list today is well... from today. We are looking for good reasons to make changes, but they have to make significant impact. Since peak capacity on a node is so important for fights, 3.33GHz even on an older generation is still very high end. Give me some 10GHz CPU's and I'd be all over it.


Lord XSiV - fire your architect/systems integrator and spend the money on a qualified one.

Answer - I'm the new guy... give me a few days ;) We do have Brocade, and for Cisco it is not as simple as better or worse... finding the right solution... WS6748-ge-tx with DFC3's makes a significant impact on side to side switching. In any case... as mentioned, only so much can be done on the internal network to help. Clock speed is still the big issue today.


G'Kar5
Gallente
Intaki Research and Manufacturing
Distorted Percepts
Posted - 2010.06.16 18:41:00 - [92]
 

Originally by: Lord XSiV
Originally by: CCP Yokai
Edited by: CCP Yokai on 16/06/2010 14:26:51
"Did you guys consider the Cisco UCS for the blade servers?"

We have a great relationship with Cisco. They have some very cool toys and we try not to keep our eyes open for anything that makes TQ better. That being said, the IBM blades have been so good and the IBM team are working hand in hand with our team to make TQ better. We never stop looking for the best.




Well this partially explains the reason for the lag. Cisco is low to mid tier equipment at best - if you want to push some serious traffic then you need to move up to some real gear from Juniper or even Brocade. Heck you could even use a Fortinet solution and get superior performance more cost effectively and add in security....



What are you smoking? Cisco is some of the best networking gear out there. Granted, Juniper is in the same class, but nothing else is even close. You could also maybe use Brocade (Foundry) or Extreme for low-end switching. I take it you work for Fortinet?

While the 7600 is not the greatest platform out there (being replaced by the Juniper MX and Cisco ASR9K), it's still one of the best-performing platforms for the price. It is WAY overkill for what CCP needs. Most of the lag is either propagation delay (good luck going faster than 2/3 the speed of light) or server lag anyway.

RatKnight1
Gallente
Mahdi Followers
Posted - 2010.06.16 18:47:00 - [93]
 

Didn't think I recognized that name, lol.

I am a networking major, and I will say that my one issue with Cisco is some of their outdated interfaces. I had to learn how to use the Cisco console... not too much fun, gimme DOS any day, lol.

It is nice to see that they have someone who knows what they are talking about handling this.

Now, I am setting gallente industrial to train to 5, which should, unless you guys drop the servers while you are moving them, be able to handle any amount of downtime, lol.

Good Luck!

Musical Fist
Gallente
NAP Coalition
Posted - 2010.06.16 18:49:00 - [94]
 

Will this affect the lag issue at all?

CrazzyElk
Big Shadows
Initiative Mercenaries
Posted - 2010.06.16 18:50:00 - [95]
 

Is there an approximate guesstimate on the ETA of the next blog? As in during this summer, or more during the fall, or even longer?

Spurty
Caldari
V0LTA
VOLTA Corp
Posted - 2010.06.16 18:55:00 - [96]
 

Something funny about a character named "Fallout" talking about a hardware upgrade.

I suspect I'll be at about day 15 into one of too many 45day skills I'm working on during 2010.

Hawk TT
Caldari
Bulgarian Experienced Crackers
Posted - 2010.06.16 19:46:00 - [97]
 

Edited by: Hawk TT on 16/06/2010 19:51:53
I am very happy about CCP's plans to unveil more info about the EVE Cluster ;-) Thanks CCP Fallout & CCP Yokai

I have several questions:

1. SOL Nodes
Currently all SOL nodes use IBM HS21 blades with 2 x Intel Xeon 5260 3.3GHz Wolfdale CPUs (45nm, Dual-Core part w/ the same microarchitecture as Xeon 5400 series).
The HS21 blades are limited to 4 memory sockets (8 with HS21XM), so 32GB of DDR2 RAM is the maximum with 8GB DIMMs and it is EXPENSIVE (4x8GB).
You've mentioned in this Dev Blog that you are going to test Nehalem (a.k.a. Xeon 5500 series)

Ok, have you tested HS22 w/ Intel Xeon 5667 3.06/3.43GHz? It's a 4 core Westmere 32nm CPU (one generation ahead of Nehalem)?
The fastest CPU bin IBM would fit in a blade (95W TDP limit) is Xeon X5667 @ 3.06GHz, but it has turbo-boost, so one of the cores could go up to 3.43GHz.
If you combine this with the QPI benefits and other microarchitecture enhancements, you would have much better performance.
Let's see Intel X5600 vs X5200/X5400 series & HS22 vs HS21 blades:
- 200-300% memory bandwidth improvement (QPI vs. FSB)
- 40% memory latency reduction (QPI vs. FSB)
- 200% cache memory size w/ completely different cache hierarchy and much lower latency
- 50% more RAM capacity w/ the fastest cheap 4GB DDR3 1333MHz modules (12 x 4GB vs 4 x 8GB)

Of course it depends whether SOL nodes are memory-performance-sensitive and also whether some 50% extra memory capacity would benefit their scalability, i.e. Jita & fleet battles.
Apart from the QPI benefits, the IPC gains per core of Westmere vs. Wolfdale are not so significant, though there are some specific cases with unaligned cache access etc. Anyway, in most real-world benchmarks the 5500/5600 series show 40-50% single-threaded performance increase @ same clock speed!

Correct me if I am wrong, but SOL nodes should benefit from Intel QPI, larger & faster caches, more memory, more memory channels & bandwidth, Intel Turbo-Boost feature for single-threaded apps?

2. Database Nodes
Nehalem / Westmere could speed up your SQL servers by 200-300%...No brainer here ;-)

3. Mechanical HDDs in Blades?
This is strange...Why not "Boot from SAN"? No, I am not talking about FC HBAs, iSCSI could do it for you without expensive HBAs for each blade...
Replacing faulty HDDs is a hassle...

4. Blade I/O modules
What I/O modules do you use in the IBM BladeCenters: Pass-through modules or Blade Switches?
Pass-through modules = more cables & clutter, but less latency and better features

Last, but not least, the consolidation of your racks in (what seems to be) a dedicated "Cold Aisle Containment" means you could shorten the cable lengths...
Any news on the Infiniband HPC stuff? We all know that the bottleneck is in the single-OS-threaded server code, but still... ;-)

Best regards,
Hawk

P.S. Keep up the amazing work!

Vahz Rex
Posted - 2010.06.16 19:50:00 - [98]
 

Yokai, you briefly mention management tools in the dev blog, any chance you will cover this more in future blogs?

Curious if you use tools from Microsoft such as OpsMgr, or tools from IBM/HP/Cisco.

Great post btw!, as an IT admin it was appreciated very much. =)

Jason Edwards
Internet Tough Guy
Spreadsheets Online
Posted - 2010.06.16 19:53:00 - [99]
 

Quote:
Step three: Pics or it didn't happen

We are going to continue the information sharing about the infrastructure that makes EVE work on the next installment. Although not everyone gets excited about cabinets and a datacenter, there are a few that do.

I so do. In fact I work with those very same IBM racks at work :)

Quote:
Servers
64 x IBM HS21
2x Dual Core 3.33GHz CPU's
32GB of RAM Each
1x72GB HDD Each

Don't the HS21s usually come with RHEL? Odd choice.
Quote:

Cores
- 280 total Cores
- ~1 THz

Flops?
Linpack gogogogo

Have you guys actually deployed Microsoft's HPC?
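For the FLOPS question above, a back-of-envelope peak figure can be derived from the specs quoted in the blog (280 cores at 3.33 GHz). The 4 FLOPs/core/cycle figure is an assumption typical of that Xeon generation (2-wide SSE add + 2-wide SSE mul); an actual Linpack run would of course measure considerably less than theoretical peak:

```python
# Rough theoretical peak for the quoted TQ specs.
# Assumption: 4 double-precision FLOPs per core per cycle,
# typical for Core-microarchitecture Xeons of this era.
cores = 280
clock_hz = 3.33e9
flops_per_cycle = 4
peak_flops = cores * clock_hz * flops_per_cycle
print(f"theoretical peak: {peak_flops / 1e12:.2f} TFLOPS")  # theoretical peak: 3.73 TFLOPS
```

Which would put TQ in the single-digit TFLOPS range — consistent with Yokai's point that it is nowhere near the Top500.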

Avenger1
Posted - 2010.06.16 19:58:00 - [100]
 

Nice one CCP, Cool can I have an invite to watch it happen? :D

Hustomte
The Knights Templar
R.A.G.E
Posted - 2010.06.16 20:07:00 - [101]
 

Is it possible to get this very important announcement added to the Eve-Gate calendar?
I know people who don't regularly check the homepage and having this up there may help...

Steve Celeste
Overdogs
Posted - 2010.06.16 20:08:00 - [102]
 

CeeCeePee I DEMAND next level fleet fights!!
Whyyyyy am I even paying you for current level fleet fights *shakes fist



Wink
Keep up the good work guys.

Barakkus
Posted - 2010.06.16 20:10:00 - [103]
 

Edited by: Barakkus on 16/06/2010 20:11:34
Originally by: CCP Yokai
"why not have a huge VM array?"

We get this question a lot and the answer is pretty simple. Think of a server, even a very big one, as a loaf of bread. Each time you make a slice you leave some crumbs behind (the overhead of VMs); no matter how small or efficient the slicing, the fact is you don't get the peak capacity you could if it were dedicated to the one service.

In Eve we already virtualize, so to speak, by distributing solar systems onto servers based on usage data. But we don't need the overhead of many of those popular virtualization software providers when we do need to dedicate a node to Jita, fleet fights, etc. So, in some ways... Eve is very virtualized and very good at it.



This is what I've been beating into my boss' head for the last 2 years. He found out about virtualization and he wanted to virtualize EVERYTHING in the office...I fought tooth and nail to keep some things on dedicated servers. Had to do the same thing when he found out about iSCSI...he figured a heavily loaded Exchange server and PostgreSQL server would be great on iSCSI Rolling Eyes

Kasturi Levolor
Posted - 2010.06.16 20:16:00 - [104]
 

Edited by: Kasturi Levolor on 16/06/2010 20:18:08
Originally by: CCP Yokai
Edited by: CCP Yokai on 16/06/2010 14:26:51
"Did you guys consider the Cisco UCS for the blade servers?"

We have a great relationship with Cisco. They have some very cool toys and we try not to keep our eyes open for anything that makes TQ better. That being said, the IBM blades have been so good and the IBM team are working hand in hand with our team to make TQ better. We never stop looking for the best.

The UCS solution is very good for virtualized environments and for solutions where many if not most of the servers need lots of connection types (Fibre Channel, Gig-E, InfiniBand, etc.). The EVE code is quite amazing in that we can get 60,000-plus players on 64 servers with only Gig-E connectivity. Do some research and see how many servers someone like a game about secondary life needs to operate at that level.

Not to keep pimping future blogs... but the next one is all about how we map EVE's 7929 solar systems (w/wormholes) onto those 50 or so nodes that handle solar systems and make sure the one you are playing in has the correct load. That's where we'll make the most noticeable impact on performance in the short term.






I could be wrong, but wouldn't it be better if you kept your eyes open for things that would make TQ better?


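The mapping Yokai describes in the quoted answer — thousands of solar systems packed onto ~50 SOL nodes by usage data — is essentially a load-balancing / bin-packing problem. A minimal sketch of one common heuristic (heaviest systems first, always onto the least-loaded node); every name and load figure here is invented for illustration, this is not CCP's actual allocator:

```python
import heapq

def map_systems(systems, node_count):
    """Greedy LPT-style load balancing: assign each solar system
    (heaviest first) to the currently least-loaded node."""
    # min-heap of (total_load, node_id)
    nodes = [(0.0, n) for n in range(node_count)]
    heapq.heapify(nodes)
    assignment = {}
    for name, load in sorted(systems.items(), key=lambda kv: -kv[1]):
        total, node = heapq.heappop(nodes)
        assignment[name] = node
        heapq.heappush(nodes, (total + load, node))
    return assignment

# Hypothetical load scores; "reinforcing" Jita then just means
# pinning it to a node of its own.
systems = {"Jita": 1000.0, "Amarr": 400.0, "Rens": 300.0, "M-OEE8": 250.0}
print(map_systems(systems, 2))  # {'Jita': 0, 'Amarr': 1, 'Rens': 1, 'M-OEE8': 1}
```

Note how the heaviest system ends up alone on its node — the same effect as dedicating a node to Jita or a fleet fight.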

Dacil Arandur
Habitual Euthanasia
Pandemic Legion
Posted - 2010.06.16 20:17:00 - [105]
 

Several of the hosting providers my company works with have done server moves in the past, and most of them had webcams or some sort of streaming video of the operation.

What are the chances something like that could happen for this big move?
Might not have the same audience as the alliance tournament, but I'm sure some of us would love to watch. It seems there are lots of us professional IT folks here. I find the best part of my IT job is that I can play a little bit of EVE at work... lol

Miriallia Clyne
Gallente
Silver Snake Enterprise
Against ALL Authorities
Posted - 2010.06.16 20:19:00 - [106]
 

Awesome upgrades are always welcome! Very Happy

What about the InfiniBand interconnects that were mentioned a while back to increase the compute power of the whole cluster?

Also, I see you have APC in there, but why not go for APC InRow cooling for high density?

Reborn Master
Posted - 2010.06.16 20:20:00 - [107]
 


HenkieBoy
Sebiestor Tribe
Posted - 2010.06.16 20:27:00 - [108]
 

"the lag doesn't seem as bad as long as you're keeping us in the loop as to how you're trying to fix it" <- this Very Happy

What I do wonder: when a player jumps from one system to another, does the player 'object' get moved to the server which hosts that system? And if so, does that mean that if communication between servers gets better, we get less lag and fewer 'traffic controls' when large fleets jump into other systems?

Krimishkev
GONE RETARD BACK LATER
Posted - 2010.06.16 20:36:00 - [109]
 

What you say!!

Next level fleet fights?


WHATEVER DOES THIS MEANS???

PLEASE TELL. THX!

Clansworth
Good Rock Materials
Posted - 2010.06.16 20:43:00 - [110]
 

Edited by: Clansworth on 16/06/2010 20:44:14
Originally by: HenkieBoy
What I do wonder: When a player jumps from one system to another the player 'object' gets moved to another server which hosts that system? And if so, does it mean if communications between servers gets better we get less lag and 'trafic controls' when large fleets jump into other systems?
I doubt there is an 'object' representing your character that moves from here to there. Instead, I'm near certain all that happens is that the location data in the SQL for your ship is changed to reflect your new location, and the client reconnects to the new 'session' (hence the in-game terminology), which may or may not be processed by a different node in the cluster. In the end, I don't think there are many transactions at all from SOL to SOL. Instead, everything is SOL to/from the database servers, with the SOLs simply managing how that database info changes and serving requests from the client.
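Clansworth's description — a jump as a database location update plus a client session reattach, rather than an object migration — could be sketched like this (purely illustrative; every class and name here is invented, not CCP's actual architecture):

```python
# Toy model: a "jump" is (1) a central DB write and (2) telling the
# client which SOL node now serves its session. No object migrates.

class Database:
    def __init__(self):
        self.ship_location = {}

    def set_location(self, ship_id, system):
        self.ship_location[ship_id] = system

class Cluster:
    def __init__(self, db, sol_map):
        self.db = db
        self.sol_map = sol_map  # system name -> SOL node id

    def jump(self, ship_id, dest_system):
        # 1. persist the new location centrally
        self.db.set_location(ship_id, dest_system)
        # 2. return the node the client should reattach its session to
        return self.sol_map[dest_system]

db = Database()
cluster = Cluster(db, {"Jita": "sol-07", "Perimeter": "sol-12"})
node = cluster.jump(ship_id=42, dest_system="Jita")
print(node)  # sol-07
```

Under this model, faster inter-server links mostly help the SOL-to-database path, which fits Clansworth's point about where the traffic actually flows.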

Lee Dalton
Sniggerdly
Pandemic Legion
Posted - 2010.06.16 20:47:00 - [111]
 

What are the RAM SAN and SSD SAN dedicated to?

(I'm guessing the RAMSAN holds the main DB? ... )

Liorah
Posted - 2010.06.16 20:48:00 - [112]
 

Originally by: Forum Alterson
It's annoying how hard Virtualization has been pushed, that people think it's a solution everyone needs. VMs are great if you want to buy large hardware, to run multiple server instances. If the EVE Server Application moves nodes between servers to load balance, what could they possibly gain by adding a Hypervisor overhead on-top of that?

Virtualization is great for some things. In my experience with it, and based on the descriptions of what TQ had to accomplish as far as reinforcing nodes and whatnot, with the problems they're having now at actually doing this, VMware's VMotion would seem to be an appropriate fit. Sure, there is undoubtedly room for improvement in VMware's code; no software is ever perfect.

However, as I noted, this was written not understanding exactly what they're doing now, which a future devblog promised to discuss. If they're already using a custom solution, then you're right, they wouldn't need an extra layer of complexity to do what they're already doing. Although, if they -are- using a custom solution, depending on the specifics, it may not be the most efficient, and something else tried and tested by millions of other customers may be better. By definition a custom solution doesn't have widespread testing or adoption. It all depends on what they're doing now and whether these upcoming changes help solve the problem, or only merely buy them some extra time.

We're not the ones who can make that decision anyway; all we can do is make suggestions based on our own experiences and knowledge, looking at the particular problem set presented by this game. If their network team feels their current methods are superior, they're the ones who should know the most about it to make the best decisions. However, if after the move things don't improve, revisiting vetoed suggestions might not hurt.

If the blade servers don't turn out to be powerful enough, they may even need to move away from them and make each hardware cluster node something more powerful. Anything could be possible, and we the player-base wouldn't really know the right solution.


CCP Yokai, I look forward to reading your next devblog. Thanks!

Grez
Neo Spartans
Laconian Syndicate
Posted - 2010.06.16 20:57:00 - [113]
 

It's a shame Intel still tout the FSB when HT is far, far faster. One hopes that they'll change it one day, and CCP will take-up the servers!

Much <3 on this dev blog, PLEASE do more like it VERY soon!

Doctor Ungabungas
Caldari
GoonWaffe
Goonswarm Federation
Posted - 2010.06.16 21:40:00 - [114]
 

Originally by: CCP Yokai
"why not have a huge VM array?"
We get this question a lot and the answer is pretty simple. Think of a server, even a very big one, as a loaf of bread. Each time you make a slice you leave some crumbs behind (the overhead of VMs); no matter how small or efficient the slicing, the fact is you don't get the peak capacity you could if it were dedicated to the one service.



You don't consider the ability to seamlessly transfer nodes to new hardware and dynamically reallocate resources to struggling nodes worthwhile?

Rather than buying all that IBM hardware you should have sat down with an IBM consulting team and figured out how you can improve your product.

falcon216
Posted - 2010.06.16 21:45:00 - [115]
 

Well it is good to see that the servers are getting the attention they deserve, especially the cooling part. Now, I have read through the thread, but I could have missed the questions I am about to ask:

1st) Will large scale fleet battles be less prone to lag?
2nd) With the upgrades that were mentioned will loading between systems increase?
3rd) Will these upgrades to TQ also be a part of DUST 514, or will that itself be a stand alone server?

-falc Ω



Hawk TT
Caldari
Bulgarian Experienced Crackers
Posted - 2010.06.16 21:50:00 - [116]
 

Edited by: Hawk TT on 16/06/2010 21:55:04
Originally by: Grez
It's a shame Intel still tout the FSB when HT is far, far faster. One hopes that they'll change it one day, and CCP will take-up the servers!

Much <3 on this dev blog, PLEASE do more like it VERY soon!


You could read my post above...

Intel introduced QPI (QuickPath Interconnect, their version of HT) with Nehalem (Core i7 / Xeon 55xx).
Now we are talking about Westmere-EP a.k.a. Xeon 56xx series CPUs with even better I/O, Memory & Cache performance...

CCP/EVE need fewer CPU cores, but cores with the highest frequency and IPC (Instructions per Cycle) performance, because of the single-threaded (at OS level!!!) nature of their server code.
Without going into details, Stackless Python has great benefits, but also some serious drawbacks... Google it ;-)
Most of the server-side logic is written in Stackless Python and runs on thousands of micro-threads in parallel, but one SOL grid, with all its micro-threads, runs on one OS thread, on a single CPU core... And it's not that CCP was stupid - 8-10 years ago, when EVE was still in development, Intel's "mantra" was "MHz is KING", so they were promising 10GHz / 20GHz CPUs... The old days of the "NetBurst architecture"... CCP needed a really multi-threaded language in a "Single-CPU-Core World", so they picked Stackless Python... 10 years later, "MHz is not King", but "Multi-Core / Many-Core" is king ;-). It's too late to re-write millions of lines of code...
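The "thousands of micro-threads on one OS thread" model Hawk describes can be illustrated with plain Python generators — a toy cooperative scheduler in the spirit of Stackless tasklets (this is a sketch of the concept, not CCP's or Stackless's actual code):

```python
from collections import deque

def scheduler(tasklets):
    """Round-robin cooperative scheduler: many 'micro-threads'
    sharing a single OS thread, each running until it yields."""
    ready = deque(tasklets)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the tasklet until its next yield
            ready.append(task)  # cooperative: re-queue it for another turn
        except StopIteration:
            pass                # tasklet finished; drop it

log = []

def tasklet(name, steps):
    for i in range(steps):
        log.append((name, i))
        yield                   # hand control back to the scheduler

scheduler([tasklet("a", 2), tasklet("b", 2)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

The scheduling cost per switch is tiny, which is the upside — the downside, exactly as Hawk says, is that everything above still executes on one core, so per-core clock speed is the hard ceiling.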


Anyway - about AMD HT vs. Intel FSB vs. Intel QPI:
Read this page carefully: 12C Magny-Cours vs. 6C Xeon 56xx - memory bandwidth, cache & memory latency. The Xeon is more efficient, but the latest Opteron has 4 memory channels per socket vs. Intel's 3.
The Wolfdales used by CCP for the SOL Blades still use the old FSB @

Doctor Ungabungas
Caldari
GoonWaffe
Goonswarm Federation
Posted - 2010.06.16 21:53:00 - [117]
 

Originally by: Forum Alterson
It's annoying how hard Virtualization has been pushed, that people think it's a solution everyone needs. VMs are great if you want to buy large hardware, to run multiple server instances. If the EVE Server Application moves nodes between servers to load balance, what could they possibly gain by adding a Hypervisor overhead on-top of that?


It's pretty clear that the Eve Server Application doesn't do anything under load except fail. (The NC are being Y-2'd even as we speak.)

Lord XSiV
Amarr
Kodar Innovations
Posted - 2010.06.16 22:05:00 - [118]
 

Originally by: CCP Yokai


Lord XSiV - fire your architect/systems integrator and spend the money on a qualified one.

Answer - I'm the new guy... give me a few days ;) We do have Brocade, and for Cisco it is not as simple as better or worse... finding the right solution... WS6748-ge-tx with DFC3's makes a significant impact on side to side switching. In anycase... as mentioned, only so much can be done on the internal network to help. Clock speed is still the big issue today.




Ok, I will cut you some slack then. Plus the mess you got left with isn't a fun thing to deal with (been there, done that, in fact way too often) and imho you should be able to flog the ignorant who created the mess. But you should take the opportunity to toss around some cliche terms - "forklift" and "nuking" come to mind. They always get management on edge, which is where they should be.

But really, you do have an extremely inefficient design - looks to be a typical 'IBM' solution where they get you to buy way more underpowered junk than you need, when you would be better served with a simpler solution. Realistically, for the horsepower stated you could easily cut the rack space down to 3-4 racks using some of the more modern solutions out there. Factor in the cost savings from electrical, HVAC and DC space and you are looking at some serious cost savings right away.

And yeah, this isn't even close to being considered a 'super computer', in fact it wouldn't even be considered a 'mini' these days.


Lord XSiV
Amarr
Kodar Innovations
Posted - 2010.06.16 22:16:00 - [119]
 

Originally by: G'Kar5
Originally by: Lord XSiV
Originally by: CCP Yokai
Edited by: CCP Yokai on 16/06/2010 14:26:51
"Did you guys consider the Cisco UCS for the blade servers?"

We have a great relationship with Cisco. They have some very cool toys and we try not to keep our eyes open for anything that makes TQ better. That being said, the IBM blades have been so good and the IBM team are working hand in hand with our team to make TQ better. We never stop looking for the best.




Well this partially explains the reason for the lag. Cisco is low to mid tier equipment at best - if you want to push some serious traffic then you need to move up to some real gear from Juniper or even Brocade. Heck you could even use a Fortinet solution and get superior performance more cost effectively and add in security....



What are you smoking? Cisco is some of the best networking gear out there. Granted, Juniper is in the same class, but nothing else is even close. You could also maybe use Brocade (Foundry) or Extreme for low-end switching. I take it you work for Fortinet?

While the 7600 is not the greatest platform out there (being replaced by the Juniper MX and Cisco ASR9K), it's still one of the best-performing platforms for the price. It is WAY overkill for what CCP needs. Most of the lag is either propagation delay (good luck going faster than 2/3 the speed of light) or server lag anyway.


Your ignorance is obvious.

Cisco isn't used for anything important, just low end enterprise class. People think it is a good choice mainly because they don't know any better. When you get into moving massive amounts of traffic across large areas (or need very low latency, high bandwidth locally) then you go with some real hardware from Juniper.

I can let you in on a little secret in the tele/datacom space - Cisco isn't a hardware manufacturer; they create software. They make their money off IOS support contracts. The hardware is only a fraction (less than 5%) of the purchase price, leaving 95% for the services end. Essentially they outsource all of their production to the lowest-cost bidders and use 'JIT' production/distribution. This is why Cisco has one of the highest rates of port failure on their switching equipment; it is easier (and cheaper) for them to replace the gear under service contract, as it ensures the client continues to pay for that contract.

Real companies that are part of critical infrastructure can't risk the downtime and poor performance offered by the Cisco line, which is why you only see Juniper, Alcatel, Ericsson and even some old Nortel stuff used for major backbones by any of the 3 IECs. The smaller guys compromise and go with the Cisco junk, and more often than not have a higher rate of outages, which at the end of the day costs them high-end customers.


joe hamil
Posted - 2010.06.16 22:22:00 - [120]
 

This is one of the things that CCP does really well. I have played many MMOs before, and given a lag issue or something server-side, it is all very "hush hush", like it never existed, as if the players somehow would never comprehend the complex workings of their servers or system.

but here the devs and engineers of our game, and the people that play all seem to be on the same level, honest and willing to help

in a way this is what i think makes eve online brilliant

and i am glad to have a small part in it.

ty for the insight into your end CCP, i will keep on turning up to mass testing as often as i can




 



These forums are archived and read-only