EVE Information Portal
New Dev Blog: TQ Level Up
 
This thread is older than 90 days and has been locked due to inactivity.


 
Page 3 of 11


Serpents smile
Posted - 2010.06.16 17:00:00 - [61]
 

Oooh, finally pictures. Didn't take us that long to get them, did it?
Looks a bit scary though.

Anyway, I'll shut up so the more knowledgeable peeps can ask the difficult questions.

Any chance the InfiniBand thread can get some dev love?

Bloodcrow
Posted - 2010.06.16 17:02:00 - [62]
 

Surprised you're still running TQ on HDDs, when solid state drives are the way forward and soooo much faster!

ihcn
Posted - 2010.06.16 17:03:00 - [63]
 

Originally by: Bloodcrow
Surprised you're still running TQ on HDDs, when solid state drives are the way forward and soooo much faster!


I think you missed the post where it said that the app servers barely touch the hard drives.

Marlinea
Minmatar
United Front Alliance
Posted - 2010.06.16 17:08:00 - [64]
 

I've never encountered lag yet, but it seems this will be great for large fleet battles :D Can't wait to see the result :D

Luke S
Zeta Corp.
Posted - 2010.06.16 17:09:00 - [65]
 

Thanks for keeping us in the loop. When I used to play WoW, the servers were BAD! And I never heard the devs from Blizzard say they were upgrading their servers; instead I heard they were adding more servers. CCP kicks any other company in the pants when it comes to maintaining their servers for an MMO.

Derelicht
Posted - 2010.06.16 17:17:00 - [66]
 

Originally by: Silverlinings
Ehm...

When you place the hamsters back, PLEASE do not forget to feed them or TQ will def stay offline.

Great news to see "our EVE" growing again...


The hamster jokes never get old.


Amarok Tonrar
5ER3NITY INC
Posted - 2010.06.16 17:17:00 - [67]
 

Originally by: CCP Yokai
As I mentioned, upcoming blog posts will talk about what we are doing for:

-Remapping EVE



Originally by: CCP Yokai
Not to keep pimping future blogs... but the next one is all about how we map EVE's 7929 solar systems (w/ wormholes) onto those 50 or so nodes that handle solar systems, and make sure the one you are playing in has the correct load.


Aww... and for the briefest of moments, I thought you were meaning our in-game maps. :) Would have been devilishly amusing.

"Hey! Where'd Jita go?!?!"

Yeah... what do I know about server infrastructure...

Commander Azrael
Red Federation
Posted - 2010.06.16 17:20:00 - [68]
 

Edited by: Commander Azrael on 17/06/2010 10:22:47
Moar datacentre pics!!!

mechtech
SRS Industries
SRS.
Posted - 2010.06.16 17:23:00 - [69]
 

Originally by: Luke S
Thanks for keeping us in the loop. When I "USE" to play wow. the servers were BAD! and I never herd the devs from Blizzard say that they were upgrading their servers. instead I heard they were adding more servers. CCP kicks any other company in the pants when it comes to maintaining their servers for an MMO.


Yep, when I was a new player, dev blogs like these quickly showed me that CCP was a higher-quality dev team that really cares about its game.

Also, I think you'd be surprised by how many people actually love the techy bits in a dev blog.

Gornax Garrul
Minmatar
Posted - 2010.06.16 17:24:00 - [70]
 

I'm surprised you haven't upgraded to 10Gb Ethernet or InfiniBand connections on your internal network. Or have you found that 1Gb Ethernet is not a bottleneck when server population spikes?
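For rough scale, here is a back-of-envelope check (assumed figures, not CCP's measurements) of what the 60,000-player peak Yokai mentions downthread implies per player on even a single Gig-E link:

# Back-of-envelope only: assumed figures, not CCP's measurements.
players = 60_000
gig_e_bps = 1_000_000_000  # one Gig-E link, ignoring framing overhead

per_player_bps = gig_e_bps / players
print(f"{per_player_bps / 1000:.1f} kb/s per player")  # ~16.7 kb/s

EVE-style state updates are small and infrequent compared to a shooter, and the cluster spreads traffic across 64 blades rather than one link, so Gig-E plausibly isn't the first bottleneck.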

Guillame Herschel
Gallente
NME1
Posted - 2010.06.16 17:27:00 - [71]
 

Edited by: Guillame Herschel on 16/06/2010 17:27:58
Thanks for the specifics about what IT assets are needed to run EVE. I was always curious about what level of scale your IT operation is at.

Quote:

Servers
64 x IBM HS21
2x dual-core 3.33GHz CPUs
32GB of RAM each
1x 72GB HDD each

2 x IBM x3850 M2s
2x six-core 2.66GHz CPUs
128GB of RAM
4x 146GB HDDs



Since I work there, I can offer a comparison: Ticketmaster.com uses about 10 times as many comparable servers, spread over 5 datacenters on two continents, to sell tickets every weekend. Our Beijing operation, which ticketed the 2008 Olympics, ran on a cluster about the same size as this one for TQ.


Terrax Norik
Posted - 2010.06.16 17:28:00 - [72]
 

Originally by: Gornax Garrul
I'm surprised you haven't upgraded to 10Gb Ethernet or InfiniBand connections on your internal network. Or have you found that 1Gb Ethernet is not a bottleneck when server population spikes?



Originally by: CCP Yokai
Edited by: CCP Yokai on 16/06/2010 14:26:51
"Did you guys consider the Cisco UCS for the blade servers?"

We have a great relationship with Cisco. They have some very cool toys and we try not to keep our eyes open for anything that makes TQ better. That being said, the IBM blades have been so good and the IBM team are working hand in hand with our team to make TQ better. We never stop looking for the best.

The UCS solution is very good for virtualized and for solutions where many if not most of the servers need lots of connection types (fiberchannel, Gig-E, Infiniband, Etc...) The EVE code is quite amazing that we can get 60,000 plus players on 64 servers with only Gig-E connectivity. Do some research and see how many servers someone like a game about secondary life needs to operate at that level.

Not to keep pimping future blogs... but, the next one is all About how we map EVE's 7929 solar systems (w/wormholes) onto those 50 or so nodes that handle solar systems and make sure the one you are playing in has the correct load. That's where we'll make the most noticable impact on performance in the short term.
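The remapping blog isn't out yet, so the following is only a guess at the shape of the problem: assigning 7929 systems to ~50 nodes by predicted load is classic bin packing, and "heaviest system onto the currently least-loaded node" (greedy LPT scheduling) is the textbook first cut. All names and load figures here are hypothetical, not CCP's actual tooling.

import heapq

def map_systems_to_nodes(system_loads, node_count=50):
    """Greedy LPT sketch: place the heaviest system on the currently
    least-loaded node. system_loads maps system name -> predicted load."""
    nodes = [(0.0, node_id, []) for node_id in range(node_count)]
    heapq.heapify(nodes)  # min-heap keyed on total node load
    for name, load in sorted(system_loads.items(), key=lambda kv: -kv[1]):
        total, node_id, systems = heapq.heappop(nodes)  # least-loaded node
        systems.append(name)
        heapq.heappush(nodes, (total + load, node_id, systems))
    return sorted(nodes)

# Hypothetical loads: one busy hub, two midsize systems, many quiet wormholes
loads = {"Jita": 9.0, "Amarr": 4.5, "HED-GP": 3.0}
loads.update({"J1%05d" % i: 0.01 for i in range(200)})
for total, node_id, systems in map_systems_to_nodes(loads, node_count=4):
    print("node %d: load %.2f, %d systems" % (node_id, total, len(systems)))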




Bagehi
Association of Commonwealth Enterprises
Posted - 2010.06.16 17:33:00 - [73]
 


Forum Alterson
Posted - 2010.06.16 17:34:00 - [74]
 

Originally by: CCP Yokai
Liorah,

VMware and similar solutions are pretty damn cool for that kind of thing. Again, not that it's a bad idea, but there are complexities to moving sessions around on the virtual nodes, even in seconds.

Right now we have some very dedicated guys who make sure the systems get reallocated, and part of the tooling I'm talking about in "Predicting Hot Spots" is all about knowing where and when to put nodes into dedicated status, and making it completely automated.

The ease-of-use trade-off with virtualization is just not as high on our list as getting fights bigger... we are at or near the limits of Moore's Law and I don't expect to see CPUs get 3x faster anytime soon... so every % we can protect, we do.


It's annoying how hard virtualization has been pushed; people now think it's a solution everyone needs. VMs are great if you want to buy large hardware to run multiple server instances. If the EVE server application already moves nodes between servers to load balance, what could they possibly gain by adding hypervisor overhead on top of that?

Guillame Herschel
Gallente
NME1
Posted - 2010.06.16 17:39:00 - [75]
 

Originally by: Trick Novalight
Very interesting. Now that makes me wonder how the serverside app and netcode are set up. Why not have a huge VM array?


Speaking as someone in an IT department that is migrating to exactly this sort of "cloud computing" approach: the tech for efficient VMs is relatively new compared to the planning (not to mention purchasing) needed to execute the migration. Remember, they have to keep the existing discrete server farm running while bringing up the "cloud", and cut over a piece at a time, all while keeping the TQ ball in the air.

tl;dr: it'll take some time.


Tobin Shalim
Eclipse Industrials
Quantum Forge
Posted - 2010.06.16 17:42:00 - [76]
 

Originally by: Chribba
Edited by: Chribba on 16/06/2010 12:38:35
ugh

I came.




This. There are days when I love being a tech geek. This is one of them.


Question for CCP: I know there have been rankings in the past comparing EVE's cluster to other supercomputers/systems in the world. How does the new hardware improve its place on the list?

Andrea Griffin
Posted - 2010.06.16 17:52:00 - [77]
 

Geek Girl chiming in.

I love the more technical blogs that you guys put together. It's always fun to get a behind-the-scenes look at things. I'm really impressed that you're able to do so much with so little hardware. I thought the TQ cluster had many more machines than this!

Dacil Arandur
Habitual Euthanasia
Pandemic Legion
Posted - 2010.06.16 17:52:00 - [78]
 

My company uses VMware in three of our datacenters for high-availability e-commerce work. It's mostly running on x3650 M2s, some x3850s, and we have several full blade chassis in our Ontario, Canada DC. There are definite advantages: portability, ease of new server deployment, snapshots, and backups are fantastic, but there is still a performance hit from that extra layer. (And the vSphere client crashes a lot, which is super annoying.)

Also, I'm sure the TQ cluster is much more efficient at moving solar system instances around than VMware would be. I find a 2K3 server can take quite a bit longer than "a few minutes" to migrate to a different host under very high load.

Anyway, I LOVE reading these tech blogs and am very much looking forward to the next one!

Qoi
Exert Force
Posted - 2010.06.16 17:52:00 - [79]
 

Best dev blog in a long time; thank you very much for sharing.

(Can I visit you and fondle those blades? Just once.)

Dan O'Connor
Cerberus Network
Dignitas.
Posted - 2010.06.16 17:55:00 - [80]
 

Greetings Professor Falken.
Do you want to play a game?

Dakisha
The NomNomNom
Posted - 2010.06.16 17:55:00 - [81]
 

I glanced over the thread but didn't spot anyone asking this before, so...

I work with virtualisation myself (hosting), and I think I can see the basics of how you're splitting nodes across CPUs/cores, etc.

So I have to ask: given the seriousness of lag in 0.0 these days, how come you've not upgraded to modern CPUs? We've got quad- and six-core CPUs on more efficient platforms these days running at the same clock speed but doing so much more.

As for lag prediction, in part: someone drops an SBU, you reinforce the system. If my understanding is correct (and it could be completely wrong), then even without a full hardware upgrade we're only talking about dropping a few grand on a few nice machines for further reinforced systems.

Surely cores/CPU power is a fairly significant part of the issue?
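Dakisha's SBU example implies something like event-driven reinforcement. A hypothetical sketch, with invented event names and a made-up reserve-node allocator (CCP's actual remap tooling isn't public):

from itertools import count

flagged = set()
spare_nodes = count(100)  # pretend node ids from 100 up are held in reserve

def on_game_event(event, system):
    # An anchored SBU means a fleet fight is likely in that system soon
    if event == "SBU_ANCHORED":
        flagged.add(system)

def plan_remap(node_map):
    """At the next remap, give each flagged system its own dedicated node."""
    for system in sorted(flagged):
        node_map[system] = next(spare_nodes)
    flagged.clear()
    return node_map

on_game_event("SBU_ANCHORED", "H-W9TY")
print(plan_remap({"H-W9TY": 17, "Jita": 3}))  # H-W9TY moves to node 100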

Lord XSiV
Amarr
Kodar Innovations
Posted - 2010.06.16 17:56:00 - [82]
 

Originally by: CCP Yokai
Edited by: CCP Yokai on 16/06/2010 14:26:51
"Did you guys consider the Cisco UCS for the blade servers?"

We have a great relationship with Cisco. They have some very cool toys and we try not to keep our eyes open for anything that makes TQ better. That being said, the IBM blades have been so good and the IBM team are working hand in hand with our team to make TQ better. We never stop looking for the best.




Well, this partially explains the lag. Cisco is low- to mid-tier equipment at best; if you want to push serious traffic you need to move up to real gear from Juniper or even Brocade. Heck, you could even use a Fortinet solution, get superior performance more cost-effectively, and add in security...

I guess the other reason could be the junk gear you got from IBM. Those posting here who think that setup is impressive may want to reconsider: sure, it would be impressive for your basement, but for a commercial operation that kind of setup is very inefficient (12 racks for only 280 cores?) and would be way overpriced coming from IBM. Not to mention the extra money that needs to be spent on electrical and added HVAC requirements. A better solution would be either HP or a bespoke build (in fact, there is a great company in London specializing in supercomputing) for better performance and efficiency at a much lower cost. You go to IBM for mainframes, not commodity hardware.

The only really good thing is the storage from TMS; then again, there isn't much choice of vendors in that department.

So CCP is probably paying 2-3 times as much for this setup as they should; fire your architect/systems integrator and spend the money on a qualified one. Doing so would pay for itself rather quickly and provide a much better infrastructure. Just ask Sony how it went with The Matrix Online and IBM's "On Demand" solutions; more like IBM screwing them "On Demand" :)



Aisley Tyrion
Aseveljet
Posted - 2010.06.16 17:58:00 - [83]
 

All I can say is...

Prepare for unforeseen consequences.

Taxesarebad
Posted - 2010.06.16 18:01:00 - [84]
 

Originally by: Camios
Originally by: CCP Yokai
"Just try to keep use abit more in the loop would be great."

That is the plan. There are lots of exciting things brewing in Virtual World Operations right now...

As I mentioned, upcoming blog posts will talk about what we are doing for:

-Remapping EVE
-Next Level Fleet Fights
-Hot Spot Prediction




-Remapping EVE: what's this? Hope it's not bad news.

-Next Level Fleet Fights: is the current level working? If not, why are we already at the next level?

Hmmm, btw, it sounds interesting. Hurry up with those dev blogs!


-Remapping EVE -- what's this? I hope Amarr conquers Minmatar.
-Next Level Fleet Fights -- this must be when all cap ship pilots are at skill level 5 instead of 4.
-Hot Spot Prediction -- Florida, Texas, Iceland's volcano, Africa... it stays hot in the same places all the time...

XenosisReaper
Interwebs Cooter Explosion
Important Internet Spaceship League
Posted - 2010.06.16 18:02:00 - [85]
 

Originally by: Aisley Tyrion
All I can say is...

Prepare for unforeseen consequences.


That just made me remember how ****ing mad I am at Valve for ****ing not announcing HL Ep3.

You meany.

Molly Cutter
Posted - 2010.06.16 18:04:00 - [86]
 

Originally by: glas mir
Originally by: Molly Cutter


...64 blades is too round a number (IT-round) to be ignored. Perhaps the cluster is already at an infrastructure limit? And the next step is what, 128?


Clusters tend to grow by powers of 2 because of communication efficiency, a major bottleneck in multiprocessor programming. For instance, you can reduce a value from all nodes down to a single one in log-base-2 time, so reducing 65 nodes takes the same time as reducing 128.

/agreed
My point was exactly that; thanks for clearing it up. It's just that if doubling the cluster size would fix everything, that would be the cheapest solution of them all. That, however, may not be the case.
By the look of it, the general direction, with a cluster communication infrastructure upgrade as the first step, may show us the light.

In real life, some decisions made years ago, in that era, can come back later and bite us in the a**. It's happened to me a few times, and I can see it here. But it's easy for me to cripple my system for a week and then deploy a new one. For CCP that may be pretty darn close to mission impossible (on several levels). But I am happy to see it moving somewhere.

But yeah, it really does look a little undersized, doesn't it? :D
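glas mir's log-base-2 point is easy to check with a toy pairwise reduction. This illustrates only the communication pattern, not TQ's actual code: each round halves the active node set, so 65 nodes need ceil(log2(65)) = 7 rounds, exactly as many as 128.

import math

def reduce_rounds(values):
    """Pairwise tree reduction: each round, node i absorbs the value of
    node i + half, halving the active set until one value remains."""
    rounds = 0
    while len(values) > 1:
        half = (len(values) + 1) // 2
        values = [v + (values[i + half] if i + half < len(values) else 0)
                  for i, v in enumerate(values[:half])]
        rounds += 1
    return rounds

for n in (64, 65, 128):
    print(n, "nodes:", reduce_rounds([1] * n), "rounds,",
          "ceil(log2(n)) =", math.ceil(math.log2(n)))
# 64 nodes: 6 rounds; 65 and 128 nodes: 7 rounds each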

Vaerah Vahrokha
Minmatar
Vahrokh Consulting
Posted - 2010.06.16 18:07:00 - [87]
 

I don't know why, but I get this feeling of us playing with our new little Lego toy (aka PI) while, at the same time, CCP is about to play with another Lego toy of its own, complete with managing something like powergrid, CPU and so on.


Quote:

So CCP is probably paying 2-3 times as much for this setup as they should; fire your architect/systems integrator and spend the money on a qualified one.



I think if CCP were building the architecture today, they'd make different choices.
But no, they have legacy to care about and some strict service availability commitments to honor. I don't see them re-engineering the whole thing, with three weeks of downtime for us to enjoy.

A3A3EJ1b
Posted - 2010.06.16 18:12:00 - [88]
 

I propose placing the EVE server in space... in geostationary orbit... :)
Pluses:
1. Cosmic cold.
2. Solar energy.
3. No earthly disasters.

Swidgen
Posted - 2010.06.16 18:18:00 - [89]
 

I put the over-under on the TQ restart time at no sooner than 20:00 on 23 June.

The fact that you can't leave the forums up while taking downtime on the game servers has always bothered me.

Eleventh Bride
Posted - 2010.06.16 18:27:00 - [90]
 

Originally by: Lord XSiV
Originally by: CCP Yokai
Edited by: CCP Yokai on 16/06/2010 14:26:51
"Did you guys consider the Cisco UCS for the blade servers?"

We have a great relationship with Cisco. They have some very cool toys and we try not to keep our eyes open for anything that makes TQ better. That being said, the IBM blades have been so good and the IBM team are working hand in hand with our team to make TQ better. We never stop looking for the best.




Well, this partially explains the lag. Cisco is low- to mid-tier equipment at best; if you want to push serious traffic you need to move up to real gear from Juniper or even Brocade. Heck, you could even use a Fortinet solution, get superior performance more cost-effectively, and add in security...

I guess the other reason could be the junk gear you got from IBM. Those posting here who think that setup is impressive may want to reconsider: sure, it would be impressive for your basement, but for a commercial operation that kind of setup is very inefficient (12 racks for only 280 cores?) and would be way overpriced coming from IBM. Not to mention the extra money that needs to be spent on electrical and added HVAC requirements. A better solution would be either HP or a bespoke build (in fact, there is a great company in London specializing in supercomputing) for better performance and efficiency at a much lower cost. You go to IBM for mainframes, not commodity hardware.

The only really good thing is the storage from TMS; then again, there isn't much choice of vendors in that department.

So CCP is probably paying 2-3 times as much for this setup as they should; fire your architect/systems integrator and spend the money on a qualified one. Doing so would pay for itself rather quickly and provide a much better infrastructure. Just ask Sony how it went with The Matrix Online and IBM's "On Demand" solutions; more like IBM screwing them "On Demand" :)





The thing is, you're thinking only about cores. The IBM BladeCenter setup they have looks to me to be built to maximize bus path width per core. With four cores per blade and 8GB of memory per core, I'm more concerned about memory management than anything else; imagine what happens to a fleet fight when the node starts to swap. I wouldn't go to anyone but IBM for blades, especially not if I want them delivered and maintained in Europe.

Personally, I don't like local disk on blades at all. A blade is a FRU; boot it off the SAN and all you have to do is update your ACLs when one dies. But that's just my philosophy.

Ciscos can easily handle 66 nodes, which seems to fall squarely into mid-tier to me.
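The swap worry above is the kind of thing a trivial watchdog catches early. A minimal sketch (thresholds invented; whatever monitoring CCP actually runs isn't public) using the third-party psutil library:

import psutil  # third-party: pip install psutil

def memory_headroom_ok(min_free_fraction=0.15):
    """Return False if the host is low on RAM or has touched swap; for a
    latency-sensitive node, any swapping at all is already trouble."""
    vm = psutil.virtual_memory()
    swap = psutil.swap_memory()
    return vm.available / vm.total >= min_free_fraction and swap.used == 0

if not memory_headroom_ok():
    print("low memory headroom: shed solar systems or page the ops team")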

