New Dev Blog: Fostering meaningful human interaction, through testing
 


 

CCP Oveur

Posted - 2010.08.17 14:34:00 - [91]
 

Originally by: Sered Woollahra
Quote:
Highsec residents, while they could easily participate, may be hesitant to join large fleets, even on the test server, without having some experience in fleets.


To be honest, some of us are using the mass tests to *get* some fleet experience :D


That's a great point, hadn't thought about that angle :D

CCP Oveur

Posted - 2010.08.17 14:38:00 - [92]
 

Originally by: Ben Derindar
Good blog, and good follow-up presence in this thread from CCP. One question though:

Originally by: CCP Tanis
The "herding mentality" and how that affects EVE is something I've put quite a bit of thought into over the years, though I cannot say with any authority what the game designers are currently thinking on that front, as I'm in QA. I can say that it will never be solved by a single thing; it's simply too ingrained a part of how humans work in large numbers.

Does game design not agree that the ease with which players can currently form up from anywhere across the entire galaxy at a day's notice through a combination of travel mechanics, is a major catalyst for this?

/Ben


The flipside is that if they can't easily form up, they simply won't, which prevents one of the fundamental experiences of EVE, the massively multiplayer part, from working at all ;)

But sure, there are areas that can be tuned to make it less easy; the clone jumping restrictions are in there for a game design reason, not a technical one.

So yeah, we can certainly change game design to affect herding but you know, we want to remain true to multiplayer and just fix the damn thing.

CCP Oveur

Posted - 2010.08.17 14:42:00 - [93]
 

Originally by: Janitor I
In summary .. more blah blah blah and no solution .. ugh


Dear alt,

You would have known we don't have a solution if you had read the previous blog. This blog series is about what we're doing and how we're doing it. Thank you for stating the obvious.

Best regards,

Nathan

Frug
Omega Wing
Snatch Victory
Posted - 2010.08.17 14:50:00 - [94]
 

Edited by: Frug on 17/08/2010 14:50:32
Originally by: CCP Oveur

The flipside is that if they can't easily form up, they simply won't, which prevents one of the fundamental experiences of EVE, the massively multiplayer part, from working at all ;)

But sure, there are areas that can be tuned to make it less easy; the clone jumping restrictions are in there for a game design reason, not a technical one.

So yeah, we can certainly change game design to affect herding but you know, we want to remain true to multiplayer and just fix the damn thing.


There are gonna be a lot of people who read this and rage, because to them the tendency to form massive fleets is too prevalent and too easy. And they would probably say that "of course people will still form big fleets" and adding to the logistics of getting node breaking fleets will just dampen that slightly. I don't know that area very well but I tend to agree with that point of view. After you guys introduced warp to 0 and those fancy 0.0 bridge things, players have had discussions about the travel time restrictions until there was nothing left to say. I don't think anyone believes it would prevent big fights from happening if some of those travel tools were dampened a bit.

CCP Oveur

Posted - 2010.08.17 14:51:00 - [95]
 

Originally by: wr3cks
Good stuff.

I was at the last test, though, and it seemed very well attended. I think there were at least 350 to 500 in local, no? Isn't that within your stated goal? Are you trying to get it up to 1k+?

Also, as I suggested in the feedback thread, please post info about the test (the moveme channel, that the market in syndicate is seeded with 100 isk stuff, etc) in the Help default channel's MOTD. I didn't know about any of that stuff and was autopiloting 38 jumps to jita to go shopping until I caught an alliancemate, but there were 20ish odd guys who hopped into the Help channel looking for the same info.


I can tell you that for us to successfully fix 1k+ battles, we need 1k+ battles to happen in a controlled environment. This is because when a battle scales up, more systems reach their breaking points. So a 600-person battle might work well and everything is fine, but at 800 people some other system becomes the bottleneck, or, with more than 1000, the interaction between 3 systems becomes the problem.

You see, "fixing lag" isn't one fix for some bug that we spend 6 months trying to find with a team of the greatest minds in massively scalable computing.

It's more on the order of 100 various fixes.

Many of which will have adverse effects on other systems and will be reverted and changed, then retried.

Many of which have to be deployed selectively to Tranquility and simply monitored, at the risk of crashing the cluster, because testing them on Sisi is impossible.
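
A toy illustration of the point about bottlenecks shifting as a battle scales up. This is only a sketch: the subsystem names, cost curves and capacities below are invented for the example and say nothing about how the EVE servers actually behave.

```python
# Toy model: each subsystem has its own cost curve and its own capacity.
# Every name and number here is hypothetical, chosen only to show the effect.
SUBSYSTEMS = {
    "session_changes": (lambda n: 1.0 * n, 900.0),       # cost grows linearly
    "grid_updates": (lambda n: 0.002 * n * n, 1000.0),   # cost grows quadratically
    "module_cycles": (lambda n: 0.8 * n, 1200.0),        # cost grows linearly
}

def saturated(players):
    """Return the subsystems whose cost exceeds capacity at this player count."""
    return sorted(
        name
        for name, (cost, capacity) in SUBSYSTEMS.items()
        if cost(players) > capacity
    )

for players in (600, 800, 1000):
    print(players, saturated(players))
```

In this made-up model nothing is over capacity at 600 players, one subsystem gives out at 800, and a second one joins it at 1000, which is why a test at 600 tells you little about what breaks at 1000.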

Bomberlocks
Minmatar
CTRL-Q
Posted - 2010.08.17 14:53:00 - [96]
 

Originally by: CCP Warlock

We don't give all systems a separate physical server because they don't need one. Were the player base to expand to the kind of numbers where they did, the existing architecture could be scaled to that number with some modifications to internal routing. For a lot of players simultaneously jumping into a single system, the problem is that each player essentially generates a set of requests, so the total number of messages that the server has to deal with is a multiple of the number of players. So induced computational load (the amount of processing on the server that has to be done in response to a message's arrival) and the associated queuing are among the issues there.


I came across a thread on another forum about how to work around the additional resource overhead of many people requesting the same session change at once. One of the CSM members, Vuk Lau, I think it was, stated that fleets will eventually always grow to the maximum number of supported players, and that CCP will probably only solve the lag in a realistic fashion through game design mechanics in the end, i.e. a hard limit on how many people can participate in any given system.

From this I have two questions:
1. Would it not be easier now, even as a temporary workaround, to limit unreinforced systems to a certain proven-in-practice number of players? This would be in force until more breakthroughs had been achieved in combating lag.
2. One idea that occurred to me was a fleet jump button with information pooling. Since almost all of the big fights are fleet fights, wouldn't it be possible to implement a fleet jump button that the FC could use to jump the whole fleet at once? The grouped fleet info would no longer have to be treated as x number of requests but as one "fleet group" request. I haven't explored the idea further, but I believe I've personally come across this particular problem in another context before, and significant resource savings were possible there (the server can skip the individual jump requests and go straight on to the process of loading the grid for the fleet members). Of course, I have no idea if this is even remotely possible or true in Eve. What would you think of such an idea?

Darth Vapour
Posted - 2010.08.17 15:02:00 - [97]
 

Can I suggest running one mass jump test with a fleet set up to provide no gang bonuses at all? Modifying hundreds of ship attributes every time a jump happens is one of the things I've always suspected of having bad consequences.

CCP Oveur

Posted - 2010.08.17 15:02:00 - [98]
 

Originally by: Frug
Edited by: Frug on 17/08/2010 14:50:32
Originally by: CCP Oveur

The flipside is that if they can't easily form up, they simply won't, which prevents one of the fundamental experiences of EVE, the massively multiplayer part, from working at all ;)

But sure, there are areas that can be tuned to make it less easy; the clone jumping restrictions are in there for a game design reason, not a technical one.

So yeah, we can certainly change game design to affect herding but you know, we want to remain true to multiplayer and just fix the damn thing.


There are gonna be a lot of people who read this and rage, because to them the tendency to form massive fleets is too prevalent and too easy. And they would probably say that "of course people will still form big fleets" and adding to the logistics of getting node breaking fleets will just dampen that slightly. I don't know that area very well but I tend to agree with that point of view. After you guys introduced warp to 0 and those fancy 0.0 bridge things, players have had discussions about the travel time restrictions until there was nothing left to say. I don't think anyone believes it would prevent big fights from happening if some of those travel tools were dampened a bit.

I don't think I explained that well enough for you. Let me try again.

There are many travel mechanics.

They all have many variables that can be tuned to affect travel, like time, cost and range.

We can change all of them, and they are not a binary choice of "fast" or "slow"; they have wide ranges to fine-tune.

If logistics are at a point where it is too easy to form a fleet, and that affects the game from a game design perspective, we can fix that with game design.

That does not at all change the fact that a large fleet engagement should be able to happen.

Because changing the logistics and not fixing the ability to have a large fleet engagement only means that you'll get even more frustrated when it doesn't work, after having spent triple the time getting there.

So no, I don't think that people will emorage because we didn't fix people being able to be in the same place through game design, like making travel time longer. I think they will be quite happy when we get our **** together and fix the real problem ;)

CCP Oveur

Posted - 2010.08.17 15:10:00 - [99]
 

Originally by: Bomberlocks
Originally by: CCP Warlock

We don't give all systems a separate physical server because they don't need one. Were the player base to expand to the kind of numbers where they did, the existing architecture could be scaled to that number with some modifications to internal routing. For a lot of players simultaneously jumping into a single system, the problem is that each player essentially generates a set of requests, so the total number of messages that the server has to deal with is a multiple of the number of players. So induced computational load (the amount of processing on the server that has to be done in response to a message's arrival) and the associated queuing are among the issues there.


I came across a thread on another forum about how to work around the additional resource overhead of many people requesting the same session change at once. One of the CSM members, Vuk Lau, I think it was, stated that fleets will eventually always grow to the maximum number of supported players, and that CCP will probably only solve the lag in a realistic fashion through game design mechanics in the end, i.e. a hard limit on how many people can participate in any given system.

From this I have two questions:
1. Would it not be easier now, even as a temporary workaround, to limit unreinforced systems to a certain proven-in-practice number of players? This would be in force until more breakthroughs had been achieved in combating lag.
2. One idea that occurred to me was a fleet jump button with information pooling. Since almost all of the big fights are fleet fights, wouldn't it be possible to implement a fleet jump button that the FC could use to jump the whole fleet at once? The grouped fleet info would no longer have to be treated as x number of requests but as one "fleet group" request. I haven't explored the idea further, but I believe I've personally come across this particular problem in another context before, and significant resource savings were possible there (the server can skip the individual jump requests and go straight on to the process of loading the grid for the fleet members). Of course, I have no idea if this is even remotely possible or true in Eve. What would you think of such an idea?


Artificial hard limits certainly have their purpose, but how would you feel at the other end of the gate, unable to jump into the system because it has 300 people in it, while 3 guys are shooting at you?

I actually think I know how you would feel, because we have traffic control. And this happens even without the artificial hard limit.

"If there is room it will get filled up". That's totally right. That's why we'd rather continuously make more room. So to answer your questions specifically:

1. We are doing that already in various forms. It doesn't help and is messy.
2. Fleet jump could help with pooling some aspects. It's not like a 100 ship fleet would have the same load as 1 ship, but it's totally worth investigating to see if there are any benefits.
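
A back-of-the-envelope sketch of answer 2. It assumes, purely for illustration, that a jump splits into work that could be shared across a fleet and work that must still be done per ship. The function name and the cost units are hypothetical, not measurements from the server.

```python
def jump_cost(ships, shared_cost, per_ship_cost, pooled):
    """Work units for `ships` jumping together.

    shared_cost   -- hypothetical work that is identical for every fleet member
    per_ship_cost -- hypothetical work that is unique to each ship
    pooled        -- if True, the shared work is done once for the whole fleet
    """
    if pooled:
        return shared_cost + ships * per_ship_cost
    return ships * (shared_cost + per_ship_cost)

individual = jump_cost(100, shared_cost=5, per_ship_cost=3, pooled=False)
fleet = jump_cost(100, shared_cost=5, per_ship_cost=3, pooled=True)
single = jump_cost(1, shared_cost=5, per_ship_cost=3, pooled=True)
print(individual, fleet, single)
```

With these made-up numbers the pooled fleet jump costs well under half of 100 individual jumps, yet still far more than a single ship, because the per-ship part does not go away. How big the shared part really is would be the thing to investigate.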

Vincent Gaines
Macabre Votum
Morsus Mihi
Posted - 2010.08.17 15:11:00 - [100]
 

Edited by: Vincent Gaines on 17/08/2010 15:12:19
Originally by: Bomberlocks

2. One idea that occurred to me was a fleet jump button with information pooling. Since almost all of the big fights are fleet fights, wouldn't it be possible to implement a fleet jump button that the FC could use to jump the whole fleet at once? The grouped fleet info would no longer have to be treated as x number of requests but as one "fleet group" request. I haven't explored the idea further, but I believe I've personally come across this particular problem in another context before, and significant resource savings were possible there (the server can skip the individual jump requests and go straight on to the process of loading the grid for the fleet members). Of course, I have no idea if this is even remotely possible or true in Eve. What would you think of such an idea?


I'm going to throw in a "me too", as I've been thinking about this same thing: pooling fleet data on jump. Instead of the numerous processes per client in a fleet that the server side checks separately, if one check is made on the same grid/system in the fleet, and from there all known identical data is handled as one process each (I'm not a programmer, so I'm not sure what word to use), then wouldn't that drastically reduce the number of calls and the load on the node?

Indeterminacy
THORN Syndicate
BricK sQuAD.
Posted - 2010.08.17 15:15:00 - [101]
 

Edited by: Indeterminacy on 17/08/2010 15:15:47
I see that CCP has been unable to maintain a test server which reflects their production system thereby enabling timely and effective problem solving.

I am not surprised. 'Test' servers are often given low priority. They are also difficult to maintain in a parallel computing environment.

You (CCP) have an advantage over many HPC shops. You have a single production environment (I hope). You have 1 compute node you have to mimic, one network, etc.

Hopefully someone will be charged with maintaining an upgraded test infrastructure (both the hardware and human aspects).

edit:spelling

CCP Oveur

Posted - 2010.08.17 15:29:00 - [102]
 

Edited by: CCP Oveur on 17/08/2010 15:29:50
Originally by: Indeterminacy
Edited by: Indeterminacy on 17/08/2010 15:15:47
I see that CCP has been unable to maintain a test server which reflects their production system thereby enabling timely and effective problem solving.

I am not surprised. 'Test' servers are often given low priority. They are also difficult to maintain in a parallel computing environment.

You (CCP) have an advantage over many HPC shops. You have a single production environment (I hope). You have 1 compute node you have to mimic, one network, etc.

Hopefully someone will be charged with maintaining an upgraded test infrastructure (both the hardware and human aspects).

edit:spelling


To be clear, the hardware gap we're addressing now isn't our main problem with reflecting our production environment. It's the 50,000 people playing at the same time, of whom 1000 might be trying to shoot each other in the face in the same solar system. So compared to that crucial part of replicating what happens to TQ in a controlled test environment, the hardware maintenance is but one of very many factors.

Vincent Gaines
Macabre Votum
Morsus Mihi
Posted - 2010.08.17 15:34:00 - [103]
 

Why does there still need to be a notification system to place a system on a dedicated node? When SBUs become anchored, maybe on the back end an email is shot to someone at CCP?

CCP Warlock

Posted - 2010.08.17 15:37:00 - [104]
 

Originally by: Indeterminacy
Edited by: Indeterminacy on 17/08/2010 15:15:47
I see that CCP has been unable to maintain a test server which reflects their production system thereby enabling timely and effective problem solving.

I am not surprised. 'Test' servers are often given low priority. They are also difficult to maintain in a parallel computing environment.

You (CCP) have an advantage over many HPC shops. You have a single production environment (I hope). You have 1 compute node you have to mimic, one network, etc.

Hopefully someone will be charged with maintaining an upgraded test infrastructure (both the hardware and human aspects).

edit:spelling


Realistic testing of large scale, real time, distributed systems has been a perennial problem for decades. The reality has always been that the live system is larger, and more complex than any realistic test system could be. Putting an individual server under load is doable, but testing a network of servers, and putting it under full capacity is much, much harder. To give the truly extreme example, where is the test system for the entire Internet?

Things have started to get a little better this decade with the hardware price drops and the data centers. The thin clients (which you'll be hearing about later this week) are a major step forward for us, and we're all really excited about the results we're going to get out of them. Even so, we have work in progress to improve our ability to set up and handle tests with large numbers of clients, in and of itself a non-trivial problem. For example, what sort of ship fitting should we set up for any given test? Then we need to cycle through a bunch of them with the same test and compare the results across a pretty large set of collected data on a large variety of system parameters. In an ideal world, all of this should also be done completely automatically.

Testing these kinds of systems, especially in terms of scaling and load limits is a set of problems in and of itself.
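
To make the "cycle through a bunch of fittings with the same test and compare" idea concrete, here is a minimal sketch of such a sweep. Everything in it is an assumption for illustration: `sweep`, `fake_run`, the fitting names and the numbers are hypothetical stand-ins, and a real harness would drive actual clients and collect far more parameters.

```python
from statistics import mean

def sweep(run_test, fittings, client_counts, repeats=3):
    """Run the same test for every (fitting, client count) pair and average it."""
    results = {}
    for fitting in fittings:
        for clients in client_counts:
            samples = [run_test(fitting, clients, r) for r in range(repeats)]
            results[(fitting, clients)] = mean(samples)
    return results

def fake_run(fitting, clients, repeat):
    """Placeholder measurement; a real run would return an observed metric."""
    weight = {"short_range": 1.0, "long_range": 1.5}[fitting]  # invented weights
    return weight * clients + repeat

results = sweep(fake_run, ["short_range", "long_range"], [100, 200])
worst = max(results, key=results.get)  # configuration with the highest average
print(worst, results[worst])
```

The harness itself is the easy part. Deciding which fittings and parameters are worth sweeping, and making the measurements comparable between runs, is where the non-trivial work sits.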

Hawk TT
Caldari
Bulgarian Experienced Crackers
Posted - 2010.08.17 15:39:00 - [105]
 

Originally by: CCP Oveur
Edited by: CCP Oveur on 17/08/2010 15:29:50
Originally by: Indeterminacy
Edited by: Indeterminacy on 17/08/2010 15:15:47
I see that CCP has been unable to maintain a test server which reflects their production system thereby enabling timely and effective problem solving.

I am not surprised. 'Test' servers are often given low priority. They are also difficult to maintain in a parallel computing environment.

You (CCP) have an advantage over many HPC shops. You have a single production environment (I hope). You have 1 compute node you have to mimic, one network, etc.

Hopefully someone will be charged with maintaining an upgraded test infrastructure (both the hardware and human aspects).

edit:spelling


To be clear, the hardware gap we're addressing now isn't our main problem with reflecting our production environment. It's the 50,000 people playing at the same time, of whom 1000 might be trying to shoot each other in the face in the same solar system. So compared to that crucial part of replicating what happens to TQ in a controlled test environment, the hardware maintenance is but one of very many factors.


Any recent progress with the "Thin clients" for automated stress testing? Even if they are not feature-rich or very "intelligent", at least you could combine them with the 300-500 real players during mass testing... Or you could just hire some China-bot-addicts to create some load until you are 100% ready for automated testing... :lol:

Daedalus II
Helios Research
Posted - 2010.08.17 15:45:00 - [106]
 

Originally by: CCP Oveur

I can tell you that for us to successfully fix 1k+ battles, we need 1k+ battles to happen in a controlled environment. This is because when a battle scales up, more systems reach their breaking points. So a 600-person battle might work well and everything is fine, but at 800 people some other system becomes the bottleneck, or, with more than 1000, the interaction between 3 systems becomes the problem.

You see, "fixing lag" isn't one fix for some bug that we spend 6 months trying to find with a team of the greatest minds in massively scalable computing.

It's more on the order of 100 various fixes.


I totally get this, but what I wonder is: if large fleets actually worked under Apocrypha, but didn't work in the next expansion, what went wrong? I guess one could expect a certain degradation due to more players on the server overall, but not a jump from 1000 people being able to fight in the same system to max 200 people being able to fight in the same system. If that happens from one day to the next, something else has happened, something else is wrong. You just can't say that's due to 100 little things. Why did all those 100 little things go wrong at the same time?

Well, not having played in large fleets either during Apocrypha or later, I can't say if all this is true, but it's what I've heard. Maybe it was never possible to have a good 1000 man fight? But surely something must have changed for the worse or people wouldn't be so upset? You don't get upset unless you have tasted something and then lost it again, because otherwise you wouldn't have known what you could have had.

You make this all sound like these are problems that have always been in EVE, that are inherent to large server based games like this, and that when they are fixed the lag will be more or less gone. That's all fair and possible, but then why did Apocrypha work so well when the following expansion didn't? That proves that the game CAN run smoothly with large fights, but SOMETHING changed that, and that's what I think people are upset about.

CCP Oveur

Posted - 2010.08.17 15:47:00 - [107]
 

Originally by: Hawk TT
Originally by: CCP Oveur
Edited by: CCP Oveur on 17/08/2010 15:29:50
Originally by: Indeterminacy
Edited by: Indeterminacy on 17/08/2010 15:15:47
I see that CCP has been unable to maintain a test server which reflects their production system thereby enabling timely and effective problem solving.

I am not surprised. 'Test' servers are often given low priority. They are also difficult to maintain in a parallel computing environment.

You (CCP) have an advantage over many HPC shops. You have a single production environment (I hope). You have 1 compute node you have to mimic, one network, etc.

Hopefully someone will be charged with maintaining an upgraded test infrastructure (both the hardware and human aspects).

edit:spelling


To be clear, the hardware gap we're addressing now isn't our main problem with reflecting our production environment. It's the 50,000 people playing at the same time, of whom 1000 might be trying to shoot each other in the face in the same solar system. So compared to that crucial part of replicating what happens to TQ in a controlled test environment, the hardware maintenance is but one of very many factors.


Any recent progress with the "Thin clients" for automated stress testing? Even if they are not feature-rich or very "intelligent", at least you could combine them with the 300-500 real players during mass testing... Or you could just hire some China-bot-addicts to create some load until you are 100% ready for automated testing... :lol:

The thin client has its own blog coming in the next few days, more on that there.

CCP Oveur

Posted - 2010.08.17 15:56:00 - [108]
 

Originally by: Daedalus II
Originally by: CCP Oveur

I can tell you that for us to successfully fix 1k+ battles, we need 1k+ battles to happen in a controlled environment. This is because when a battle scales up, more systems reach their breaking points. So a 600-person battle might work well and everything is fine, but at 800 people some other system becomes the bottleneck, or, with more than 1000, the interaction between 3 systems becomes the problem.

You see, "fixing lag" isn't one fix for some bug that we spend 6 months trying to find with a team of the greatest minds in massively scalable computing.

It's more on the order of 100 various fixes.


I totally get this, but what I wonder is: if large fleets actually worked under Apocrypha, but didn't work in the next expansion, what went wrong? I guess one could expect a certain degradation due to more players on the server overall, but not a jump from 1000 people being able to fight in the same system to max 200 people being able to fight in the same system. If that happens from one day to the next, something else has happened, something else is wrong. You just can't say that's due to 100 little things. Why did all those 100 little things go wrong at the same time?

Well, not having played in large fleets either during Apocrypha or later, I can't say if all this is true, but it's what I've heard. Maybe it was never possible to have a good 1000 man fight? But surely something must have changed for the worse or people wouldn't be so upset? You don't get upset unless you have tasted something and then lost it again, because otherwise you wouldn't have known what you could have had.

You make this all sound like these are problems that have always been in EVE, that are inherent to large server based games like this, and that when they are fixed the lag will be more or less gone. That's all fair and possible, but then why did Apocrypha work so well when the following expansion didn't? That proves that the game CAN run smoothly with large fights, but SOMETHING changed that, and that's what I think people are upset about.


So here's the thing. 1000 man fights didn't always work. They often worked, but they were differently laggy. So much affects load that there wasn't a single moment in time where suddenly 1000 people stopped working; it was a constant slow degradation intermixed with major performance degradation. When 50,000 people are online playing EVE, more than 40,000 people are just fine. Most of the CPUs on the Tranquility cluster are far from reaching 100% CPU.

We do have many specific situations which are laggy, some a combined effect of many things happening that get you to that laggy point. We are now focusing on the fleet battles, a very specific situation which has gotten very bad. The fixes deployed to address that specific situation will of course benefit other laggy situations, but the fact remains: EVE never had "no" lag. Apocrypha had many situations that were laggy, including fleet battles if they happened to start in a solar system that was on a loaded node.

Genya Arikaido
Posted - 2010.08.17 15:59:00 - [109]
 

Edited by: Genya Arikaido on 17/08/2010 16:02:57
Welcome back, Admir..I mean, Captain Oveur, to your first, best, destiny. The CCP forum master. You've been sorely missed. :)

Warlock: I suppose my earlier question was too...umm..detailed. Someone later asked the same question in far simpler terms and your response was that it was "possible and you were looking at ways to do it", to paraphrase a bit. :)

I CEO a corpful of noobs (2009-2010 pilots for the most part) in 0.0, and while they get quite a bit of small fleet action and solo PvP, we don't do larger fleets, to ensure we don't lag and die. I'd LOVE to send them to Singularity for mass testing, but I can't condone it as an option until the patching system mentioned in this blog is released. Setting up the client for Sisi is just too much to ask of new players, especially those who don't understand where programs are installed and that copying/moving files they didn't create won't thermonuke their PC. Really glad to see a solution coming for that problem.

Daedalus II
Helios Research
Posted - 2010.08.17 16:08:00 - [110]
 

Originally by: CCP Oveur

So here's the thing. 1000 man fights didn't always work. They often worked, but they were differently laggy. So much affects load that there wasn't a single moment in time where suddenly 1000 people stopped working; it was a constant slow degradation intermixed with major performance degradation. When 50,000 people are online playing EVE, more than 40,000 people are just fine. Most of the CPUs on the Tranquility cluster are far from reaching 100% CPU.

We do have many specific situations which are laggy, some a combined effect of many things happening that get you to that laggy point. We are now focusing on the fleet battles, a very specific situation which has gotten very bad. The fixes deployed to address that specific situation will of course benefit other laggy situations, but the fact remains: EVE never had "no" lag. Apocrypha had many situations that were laggy, including fleet battles if they happened to start in a solar system that was on a loaded node.

Ok I see, thanks, that explains a lot. So that should also mean that it should be possible even today to have large fleet fights if you happen to get all the conditions "right"?

How come then, btw, that there is a general performance degradation in EVE? Is it due to adding more "stuff"? Or just that there are more players? New corification code actually being less efficient? Fixes of old bugs that by necessity mean making it tougher for the server?
In all honesty, I don't think there are THAT many more players generally online today than during Apocrypha? And you have been adding new hardware, so if anything it should be working better, or at least as well, now than before? If all new changes and additions were perfect, that is ;)

Dr Lebroi
Posted - 2010.08.17 16:39:00 - [111]
 

Great blog, liking all this stuff a lot.

I've fancied getting involved in testing but, from a layman's perspective, it all sounds a bit complicated to get the extra client ready etc. Can't the SiSi bit just download when you download normal Eve and stick a 'test server' shortcut icon on your desktop?

If you want real MASS testing it needs to be simple.

A tutorial of how it all works would be good, or a YouTube vid like the scanning one.

As to getting more people involved after you have simplified the process, can't you spam local with messages?

Lots of people get bored in Eve, if you can message local with 'Mass testing starts in 15 minutes, PLEX prizes for participants, click for details', 'Mass testing starts in 10 minutes, PLEX prizes for participants, click for details', 'Mass testing starts in 5 minutes, PLEX prizes for participants, click for details'

I've seen you do it with the 'Traffic Advisory' notices, so the technology is there. If you are worried that'll kill immersion, you could RP it up a bit and say it's Concord recruitment for a deep space anomaly test or something.

I'm sure that would get people in, offer a PLEX lottery, 1 PLEX for every 50 participants which is distributed on a lottery basis to those remaining at the end of the test.
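
The lottery rule suggested above (1 PLEX per 50 participants, drawn among those still present at the end) could be sketched like this. The function name, the pilot names and the `seed` parameter are all made up for the example; it is only meant to show the arithmetic of the rule.

```python
import random

def draw_plex_winners(participants, remaining, per_prize=50, seed=None):
    """One prize per `per_prize` participants, drawn without repeats
    from the pilots who were still there when the test ended."""
    prizes = len(participants) // per_prize
    pool = sorted(set(remaining) & set(participants))
    rng = random.Random(seed)
    return rng.sample(pool, min(prizes, len(pool)))

participants = [f"pilot_{i}" for i in range(120)]  # 120 pilots showed up
remaining = participants[:80]                      # 80 stayed to the end
winners = draw_plex_winners(participants, remaining, seed=1)
print(winners)
```

With 120 participants this hands out 2 prizes, and anyone who left early is not in the draw, which is the incentive to stay for the whole test.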

Taudia
Gallente
Sane Industries Inc.
Initiative Mercenaries
Posted - 2010.08.17 16:40:00 - [112]
 

Originally by: Daedalus II

How come then, btw, that there is a general performance degradation in EVE?


Not to be rude, but this question has been answered by CCP plenty of times already, and for a programmer the answer is fairly obvious: finding out what is causing the lag is the main problem in fixing the lag. The actual altering of the code or the hardware, or whatever else it could be that is the cause, is in most cases a much more trivial ordeal than determining what is actually amiss in the grand scheme of the code.

Originally by: Daedalus II

Ok I see, thanks, that explains a lot. So that should also mean that it should be possible even today to have large fleet fights if you happen to get all the conditions "right"?



This is a much better question. I haven't heard of a large scale fleet fight that didn't have monstrous lag in months, so the problem (if my impression is correct) probably occurs in every fleet fight, which is a clue as to where the problem (in part, at least) may lie. No doubt CCP are already well aware of this correlation though.

Luke S
Zeta Corp.
Posted - 2010.08.17 16:51:00 - [113]
 

Originally by: CCP Oveur
Originally by: Janitor I
In summary .. more blah blah blah and no solution .. ugh


Dear alt,

You would have known we don't have a solution if you had read the previous blog. This blog series is about what and how we're doing it. Thank you for stating the obvious.

Best regards,

Nathan


troll killer XD

Indeterminacy
THORN Syndicate
BricK sQuAD.
Posted - 2010.08.17 17:26:00 - [114]
 

Originally by: CCP Warlock
Originally by: Indeterminacy
Edited by: Indeterminacy on 17/08/2010 15:15:47
I see that CCP has been unable to maintain a test server which reflects their production system thereby enabling timely and effective problem solving.

I am not surprised. 'Test' servers are often given low priority. They are also difficult to maintain in a parallel computing environment.

You (CCP) have an advantage over many HPC shops. You have a single production environment (I hope). You have 1 compute node you have to mimic, one network, etc.

Hopefully someone will be charged with maintaining an upgraded test infrastructure (both the hardware and human aspects).

edit:spelling


Realistic testing of large scale, real time, distributed systems has been a perennial problem for decades. The reality has always been that the live system is larger, and more complex than any realistic test system could be. Putting an individual server under load is doable, but testing a network of servers, and putting it under full capacity is much, much harder. To give the truly extreme example, where is the test system for the entire Internet?

Things have started to get a little better this decade with the hardware price drops and the data centers. The thin clients (which you'll be hearing about later this week) are a major step forward for us, and we're all really excited about the results we're going to get out of them. Even so, we have work in progress to improve our ability to handle and set up tests with large numbers of clients, in and of itself a non-trivial problem. For example, what sort of ship fitting should we set up for any given test? Then, cycle through a bunch of them, with the same test, and compare the results across a pretty large set of collected data on a large variety of system parameters. Which in an ideal world should also be done completely automatically.

Testing these kinds of systems, especially in terms of scaling and load limits is a set of problems in and of itself.
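The automated sweep described above (same test, many ship fittings, results compared across runs) could be sketched roughly as follows. This is only an illustration: `run_test`, the fitting names and the client counts are hypothetical placeholders, not CCP's actual test tooling.

```python
from itertools import product
from statistics import mean


def run_sweep(run_test, fittings, client_counts, repeats=3):
    """Run the same test for every (fitting, client count) combination.

    run_test -- hypothetical callable(fitting, clients) returning one
                measurement, e.g. server response time in milliseconds
    Returns a dict mapping (fitting, clients) to the mean measurement.
    """
    results = {}
    for fitting, clients in product(fittings, client_counts):
        # Repeat each configuration so a single noisy run does not dominate.
        samples = [run_test(fitting, clients) for _ in range(repeats)]
        results[(fitting, clients)] = mean(samples)
    return results


if __name__ == "__main__":
    # Stand-in measurement: a made-up load figure, not real data.
    def fake_test(fitting, clients):
        return len(fitting) * clients

    for key, value in run_sweep(fake_test, ["drones", "missiles"], [100, 500]).items():
        print(key, value)
```

Keeping the test itself fixed and varying one parameter set at a time is what makes the collected results comparable.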


Agreed. Are you able to virtualize multiple instances of the thin client on a piece of hardware? Or run multiple instances of the process on a single node?

Originally by: CCP Oveur

To be clear, the hardware gap we're addressing now isn't our main problem with reflecting our production environment. It's the 50,000 people playing at the same time, of whom 1000 might be trying to shoot each other in the face in the same solar system. So compared to that crucial part of replicating what happens to TQ in a controlled test environment, the hardware maintenance is but one of very many factors.



Also, agreed.

You're now devoting time to an infrastructure which won't get you more customers next week. You can't write a sexah blog about it (geeks and nerds excepted) and it creates more work for those involved for that non-instant gratification.

The complicating factors you've described are felt everywhere this is done. What parameters do I give my simulation? What file system [node] do I use for I/O? And so on.

From what I've seen, however, many HPC outfits also run multiple variations (hardware and software) of compute, I/O, and database nodes. Hopefully you do not have this specific problem, which would only further complicate an already tough problem. I was kinda fishing for an answer to that I guess Wink



But in the long run once in place, with some maintenance, it's a huge benefit.

I'm not surprised about any of this as I've been on both ends of it. That is, the win and fail of test systems in a parallel / HPC environment.

Tres Farmer
Gallente Federation Intelligence Service
Posted - 2010.08.17 17:46:00 - [115]
 

@Tanis - thanks for the details, very interesting.. this stuff should have been posted months ago - but better late than never. Keep it up!

@Warlock - I like your style. Looking really forward to your blog(s).

@Oveur - keep them trolls in place, good job Wink

Particular Question(s):

Would anyone of you be able to give some details as to what kind of 'processes' are run per core at the moment (i.e. what runs on a single core/node actually in a solar system?) and what do you think/wish/envision will be running on a core in 1-5-10 years time?

What is your take/thought on 'grids' as smallest processes? Do you think that around 1000 players per 'grid' would be some target or better not? Any game mechanic/technical thoughts about this area?

What happens to the process action? Would it be thinkable to slow down the 'ticks' of the world and slowly fade from a real-time-fleet-fight to a somewhat-turn-based-fleet-fight or a slow-motion-fleet-fight (no input missed, but input restricted/slowed down for everyone)?
Or any game mechanic to mitigate those communications between all the fleet-members by reducing the 'resolution' of the sci-fi-simulation in heavy-load-cases (going from floating point to integer, reducing influence of fleet-boosters.. stuff like that)?

Anyways.. you don't have an easy job.

PS:
@Oveur I hope you're writing and thinking about some response to the questions/concerns we brought up in those 3-4 60+ page threads about CCP's view on quality vs quantity. Lag is just one of the many facets us players have irks with (check those threads for details).

Frug
Omega Wing
Snatch Victory
Posted - 2010.08.17 18:03:00 - [116]
 

Originally by: CCP Oveur

Because changing the logistics and not fixing the ability to have a large fleet engagement only means that you'll get even more frustrated when it doesn't work, after having spent triple the time getting there.


That's a good point. I don't think people are emoraging about being able to move around, only that it sounded like you were dismissing tweaking logistical difficulty.

I'll leave it to the CSM to debate the details. Thank you for your posts.

Bomberlocks
Minmatar
CTRL-Q
Posted - 2010.08.17 18:11:00 - [117]
 

Originally by: CCP Oveur
Originally by: Bomberlocks
Originally by: CCP Warlock

We don't give all systems a separate physical server because they don't need one. Were the player base to expand to the kind of numbers where they did, then the existing architecture could be scaled to that number with some modifications to internal routing. For a lot of players simultaneously jumping into a single system, the problem is that that is essentially a set of requests for each player, so the total number of messages that the server has to deal with is a multiple of the number of players. So induced computational load (the amount of processing on the server that has to be done in response to a message's arrival), and the associated queuing is one of the issues there.


I came across a thread on another forum about how to work around the additional resource overhead of many people requesting the same session change at once. One of the CSM members, Vuk Lau, I think it was, stated that fleets will eventually always grow to the maximum number of supported players, and that CCP will probably only solve the lag in a realistic fashion through game design mechanics in the end, i.e. a hard limit on how many people can participate in any given system.

From this I have two questions:
1. Would it not be easier now, even as a temporary workaround, to limit unreinforced systems to a certain proven-in-practice number of players? This would be in force until more breakthroughs had been achieved in combating lag.
2. One idea that occurred to me was a fleet jump button with information pooling. Since almost all of the big fights are fleet fights, wouldn't it be possible to implement a fleet jump button that the fleet FC could use to jump the whole fleet at once? The grouped fleet info would no longer have to be treated as x number of requests but as one "fleet group" request. I haven't explored the idea further, but I believe I've personally come across this particular problem in another context before, and significant resource savings were possible there (the server can skip the individual jump requests and go straight on to the process of loading the grid for the fleet members). Of course, I have no idea if this is even remotely possible or true in Eve. What would you think of such an idea?


Artificial hard limits have their purpose, certainly, but how would you feel at the other end of the gate, unable to jump into the system because it has 300 people in it, while you have 3 guys shooting at you?

I actually think I know how you would feel because we have traffic control. And this happens even without the artificial hard limit.

"If there is room it will get filled up". That's totally right. That's why we'll rather continuously make more room. So to answer your questions specifically:

1. We are doing that already in various forms. It doesn't help and is messy.
2. Fleet jump could help with pooling some aspects but it's not like a 100 ship fleet would have the same load as 1 ship but it's totally worth investigating to see if there are any benefits.

Thanks for the answer, Nathan. I appreciate it. I do realise that a fleet pooling mechanism wouldn't be a linear load reduction, but I presume any reduction in load would be a Good Thing(TM), no?
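To illustrate why pooling would be a partial rather than a linear reduction, here is a toy cost model. All names and numbers are made up for the example; they assume a fixed per-message handling cost plus a per-ship cost for the actual jump work, which is a simplification and not a description of EVE's server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    per_message: float = 1.0  # hypothetical fixed cost of handling one request
    per_ship: float = 4.0     # hypothetical cost of moving one ship and loading its grid


def individual_jumps(ships, model=CostModel()):
    """Every pilot sends a separate jump request."""
    return ships * (model.per_message + model.per_ship)


def pooled_fleet_jump(ships, model=CostModel()):
    """One 'fleet group' request, but each ship still has to be processed."""
    return model.per_message + ships * model.per_ship


if __name__ == "__main__":
    n = 100
    print("individual:", individual_jumps(n))   # 500.0 with the made-up costs
    print("pooled:    ", pooled_fleet_jump(n))  # 401.0 with the made-up costs
    print("one ship:  ", individual_jumps(1))   # 5.0 with the made-up costs
```

Under these assumptions pooling removes the per-message overhead for 99 of the 100 pilots, yet the pooled fleet still costs far more than a single ship, which matches both points made above.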

Sarina Berghil
Minmatar
New Zion Judge Advocate
Yulai Federation
Posted - 2010.08.17 19:13:00 - [118]
 

Originally by: CCP Oveur

The flipside is that if they can't easily form up, they simply won't, preventing one of the fundamental experiences of EVE to not work at all, the massively multiplayer part Wink

But sure, there are areas that can be tuned to make it less easy, the clone jumping restrictions are in there for a game design reason, not a technical one.

So yeah, we can certainly change game design to affect herding but you know, we want to remain true to multiplayer and just fix the damn thing.


Some coins have more than two sides ;)

Remaining true to multiplayer design doesn't have to mean stacking 1000 players in the same spot. Funneling players into several more manageable stacks will still retain the multiplayer aspect, and can even end up being more fun.
Travel restrictions are just one approach. There are many other ways to do it, like spreading out objectives, making large blobs impractical, or simply capping the number of players in a given area. Multipronged approaches tend to yield the best effect.


Game design reasons and technical reasons can blend to provide benefits in both areas.
I'm sure some people here remember old cardboard strategy games, players moving little cardboard pieces around on tactical maps, rolling dice for the outcome of battles and other events.
In these games handling big stacks of pieces is impractical. A technical way to fix that issue can be to make an arbitrary rule that only 10 pieces may be in the same stack. A game design perspective to solve the same problem would make it an advantage to only stack a few at a time.
The second approach is harder to design, but feels better. But even if the second approach can't be designed in a satisfactory way, the arbitrary technical design often works better than a game that is impractical to play.

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.17 19:51:00 - [119]
 

I've been told that I've been a bit over grumpy in my reply. My only excuse is that I've been quite busy debugging my own distributed monstrosity and it has been quite frustrating lately. I'll try to be a bit less grumpy.
Originally by: CCP Warlock
Originally by: Bartholomeus Crane
Grumpy stuff
Oh absolutely. In the long run we have to be able to scale down (running multi-core single SOL systems), as well as up - running more and more machines in the cluster. Especially when we look ahead at the larger (in terms of processor density) multi-core systems that will be coming out over the next few years.
Originally by: Bartholomeus Crane
More grumpy stuff
I'm not sure quite what you mean by the "wild waters of distributed computing proper"? Eve runs as a large set of inter-communicating processes across over 200 nodes, and ~50,000 clients. That's real distributed computing, and some pretty wild waters even by today's standards.

If you have particular models or papers in mind for your view of what a proper distributed architecture would be, it might be easier to discuss what you mean there.

Well, that depends on what you call distributed computing. Is a mail-server distributed computing? A central database interacting with many clients? A single web-server being polled by many web-browsers?

Personally I don't think so if any of these applications runs on its own, and communication is merely a two-way interaction from the client to the server.

To me distributed computing requires partitioning a single task into several sub-tasks, each being 'solved' on single nodes/cores. I have no desire to discuss definitions here, as there are many for distributed computing (and its various flavours), but mere communication between processes, or mere number of clients, is not enough to qualify. The communication requirements between processes are what count, as is the requirement to partition the problem space. Cut the cake as it were ...

Combining the various functionalities in one outlet (the client), as they (and other functionalities) are for the EVE client, doesn't make it a distributed program. Not when the processes are independent of each other except in outlet, as I believe they are in the architecture used in EVE (but correct me if I'm wrong). Simply put, the fact that you farm out the market to the central shared state memory (the database) while you farm out 'spaceships in space' to a SOL node somewhere else doesn't make running EVE a distributed problem, as you can easily splice off the event-trace for the one from the other, and they essentially never need to talk to each other again. And this is why, in short, I think that at the moment the EVE server is not a proper distributed program. It is just a rather large collection of functionalities that are handled at different nodes in the cluster.

Now, there is nothing inherently wrong with that, unless you look at the inherent in-scalability of handling the different functionalities on one node, and on one node alone. At one point, the computational requirements for that node will surpass what is available. For some functionalities, if implemented correctly, this point will probably not be reached in the foreseeable future. For others, ready solutions will be available (distributed databases, distributed mail-services, etc.). For the 'internet spaceship in space' function this is not the case. SOL-systems, as you call them, are specific to EVE, and are already, and have been for a while now, a bottleneck. So, yes, CCP will have to look at distributed computing proper (see above) and implement some multi-core architecture for that, as you acknowledge.

But this leads to two questions:
1. What took you so long to (publicly) acknowledge this? (as the inevitability of this should have been obvious years ago); and
2. Has work been started on this architecture, and if not, why not? (have you really been this preoccupied with fixing the old system?)

Trebor Whettam
Posted - 2010.08.17 20:07:00 - [120]
 

Thank you, this was an enjoyable read. I hope this series of Dev Blogs (and discussion) doesn't divert too much of your time, but at the same time being forced to explain issues might help to clarify them and/or see them another way. Good luck, and I'm rooting for you.

I'm curious how much overhead the fleet system introduces. I assume this is one of the first places you dug into, given that it was so thoroughly overhauled with Dominion. Also, Fleet Finder is the one game feature that is always laggy for me. Have previous mass tests investigated what happens when there are no fleets and everyone shoots randomly? Obviously not everything is possible without fleets, but I'm curious what kind of overhead we get from watchlists, fleet bonuses, and other things that require cross-referencing.

I'm also curious what would happen if the server 'tick' was scaled to the load, or if that's even possible. It seems that intentionally sacrificing the responsiveness you hope to provide might prevent the inevitable total failure.
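As a rough sketch of what "scaling the tick to the load" could mean: stretch the tick interval whenever the previous tick's work overran its budget, up to some cap. The function, its parameters and the example values are assumptions for illustration only and say nothing about how EVE's server is actually built.

```python
def next_tick_interval(base_interval, work_time, max_dilation=10.0):
    """Return the length of the next tick, in seconds.

    base_interval -- normal tick length under light load (placeholder value)
    work_time     -- how long the server needed to process the last tick
    max_dilation  -- cap on the slowdown so the game never freezes entirely
    """
    # Under light load the tick keeps its normal length.
    dilation = max(1.0, work_time / base_interval)
    # Under heavy load everything slows down together instead of dropping input.
    dilation = min(dilation, max_dilation)
    return base_interval * dilation


if __name__ == "__main__":
    for work in (0.4, 2.5, 50.0):
        print(work, "->", next_tick_interval(1.0, work))
```

The trade-off is the one raised above: responsiveness degrades for everyone evenly, in exchange for every command still being processed.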

