EVE Information Portal
New Dev Blog: The Long Lag
 

Conan Piter
Ways and Means
Posted - 2010.08.22 16:10:00 - [151]
 

This is probably not the ideal place to put this idea forward, but I've been thinking about it while reading this thread, so here goes.

It seems to me that one of the most important issues with lag is jumping into a system and being vulnerable while everything loads. I know the devs talked about jumping to a second transient system but that may not solve the problem if the final jump is still subject to significant delays.

I think the answer may be a combination of accepting that there is a problem, working around it, and changing players' expectations. (1) Currently one expects that a gate jump is fairly instant, as opposed to a warp, which is slow. (2) The server assumes the client is vulnerable and that any information given to it is available to the player.

If the gate jump became a warp (transition effects and all) while the system is loaded this would change expectations a little. Here's an example of what might happen:

  1. Player initiates jump and the message is sent to the server. This is a high-priority message that results in player invulnerability on the server, possibly while other clients can still target the player.

  2. The server signals to clients to remove the player from the grid.

  3. The player client goes to the warp transition effect.

  4. The server starts the process of moving the client to the new system and grid, identifying the player to other clients as "in-transition" but still sending the arrival messages so that other clients can load necessary resources and place the ships.

  5. In-transition ships are loaded into the grid but are invisible to clients until the server sends an "out-of-transition" message. This is the area that can be subject to hacking, as theoretically the client can see these invisible players. The reason I think this is important is that I suspect the problem is mostly congestion, and getting all this data to the clients before it is usable is fairly important.

  6. Clients with players "in-transition" must complete all loading of system and grid data and be ready to go before sending a message to the server indicating they are "ready".

  7. The server waits until it has received a "ready" message from every client with an "in-transition" player in the grid and the server itself has completed processing for that grid. When this has completed, in theory the jump-in lag has now passed.

  8. At this stage, the server sends a message to every client in the grid indicating that the other players in that grid that were "in-transition" are now "out-of-transition" and can appear in local.

  9. Players receiving the "out-of-transition" message for themselves now end the warp transition effect, display the grid (which has already loaded) and begin the 30 second cloak timer.

I'm sure much of the above is impractical, but it is an example of what might be a sensible approach.
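In rough Python-flavoured pseudocode, the server-side bookkeeping for steps 1-9 might look something like the toy sketch below. Every name in it is made up for illustration; I obviously have no idea what CCP's actual code looks like.

    # Toy sketch of the "in-transition" bookkeeping described above.
    # All names are illustrative; nothing here is actual EVE code.
    class JumpCoordinator:
        def __init__(self):
            # client_id -> ready flag, for every client with an
            # in-transition player on this grid
            self.pending = {}

        def begin_jump(self, client_id):
            # Step 1: high-priority jump message received; the player
            # is flagged invulnerable and "in-transition"
            self.pending[client_id] = False

        def mark_ready(self, client_id):
            # Step 6: this client has finished loading system/grid data
            self.pending[client_id] = True
            self._maybe_release()

        def _maybe_release(self):
            # Step 7: hold until *every* in-transition client is ready
            if self.pending and all(self.pending.values()):
                for client_id in self.pending:
                    self.send_out_of_transition(client_id)  # step 8
                self.pending.clear()

        def send_out_of_transition(self, client_id):
            # Step 9 then happens client-side: end the warp effect,
            # show the already-loaded grid, start the cloak timer
            print("out-of-transition:", client_id)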

/continued...

Conan Piter
Ways and Means
Posted - 2010.08.22 16:12:00 - [152]
 

...continued/

Here is why I think it would be useful:

  1. Fleet jumps are synchronised so that all players effectively uncloak at about the same time, because players arriving in a grid in groups have overlapping transitions. Because the server waits for a gap between transitions, this is effectively a semaphore. It is hard to make this go on forever by jumping in slowly, because the server will be able to work faster and may more often reach a point where everyone is ready.

  2. Players are never "on the other side" and vulnerable, because until the client is completely ready they are safe. The downside is that the client is aware of its surroundings without presenting them to the player, which goes against the MMO principle of "don't trust the client". I think it is necessary, at least until the problem of lag is history.

  3. Players don't feel the system is broken because instead of an unusable interface they are watching the warp effect, knowing the rest of their fleet is in the same position.

  4. The server can prioritise module activation messages for players already loaded in the grid because in-transition players are invulnerable and are therefore a lower priority.

I doubt this example covers all of the problem of jump-in lag, but it may be useful when thinking about a solution.
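Point 1 is essentially a reusable barrier. Python's standard library has exactly this behaviour in threading.Barrier; this little demo (mine, purely illustrative) shows the effect I mean:

    # Fleet-jump synchronisation as a barrier: nobody uncloaks until
    # every overlapping jumper has finished loading the grid.
    import threading

    def jump(barrier, pilot):
        print(pilot, "grid loaded, holding in transition")
        barrier.wait()  # blocks until the whole fleet is ready
        print(pilot, "out-of-transition, uncloaking")

    fleet = ["pilot-%d" % i for i in range(3)]
    barrier = threading.Barrier(len(fleet))
    threads = [threading.Thread(target=jump, args=(barrier, p))
               for p in fleet]
    for t in threads:
        t.start()
    for t in threads:
        t.join()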

Rn Bonnet
Sniggerdly
Pandemic Legion
Posted - 2010.08.23 00:48:00 - [153]
 

Edited by: Rn Bonnet on 23/08/2010 00:51:01
Edited by: Rn Bonnet on 23/08/2010 00:50:07
Quote:
From time to time we also discuss scaling issues with game design, since that is the only place where some of these distributed scaling problems can be solved. The longer term view on fleet fight performance lag is that whilst we can and will maximize performance within any single server's area of space, we are going to have to continue to work on game design to somehow limit the number of people that can be simultaneously in that space. Fully granted, given the vast physical immensity that is actual Space, it is a little hard to make a game case that there isn't enough room for a piddling few thousand spaceships.


Quote:
No, sorry I can't. I don't like to make commitments of any kind until we are absolutely sure we can nail it to the wall, and there are a lot of uncertainties in this one - as well as the need, as earlier posters pointed out, of very carefully and thoroughly testing it out.

There is a lot of motivation to do this though, the very least of which is there are some very cool server toys, I mean advanced high performance multi-core systems it will let us play with.


These two statements seem at odds to me. A multi-core architecture would be excessively scalable. Using an agent-based architecture (e.g. something like Hoare's CSP) you could achieve scalability which is unmatched. This also allows you to sidestep Python's Global Interpreter Lock, as PyCSP's pycsp.processes does.
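To make the flavour concrete, here is a toy agent example using only the standard library - each agent is a separate OS process, so the GIL never serialises them. PyCSP's real API looks different; this only shows the shape of the idea:

    # CSP-flavoured toy: isolated agents that share no state and
    # communicate only over channels (here, multiprocessing queues).
    from multiprocessing import Process, Queue

    def ship_agent(ship_id, inbox, outbox):
        msg = inbox.get()                     # blocking channel read
        outbox.put((ship_id, "ack " + msg))   # channel write

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        agents = [Process(target=ship_agent, args=(i, inbox, outbox))
                  for i in range(4)]
        for a in agents:
            a.start()
        for i in range(4):
            inbox.put("tick %d" % i)
        for _ in range(4):
            print(outbox.get())
        for a in agents:
            a.join()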


There is no denying that this is a long-term solution, which would require a major refactoring of existing code, but it IS a solution. And one that can work. (Plus, imagine being able to advertise having had a 5k-person battle with no lag.)


Alternatively you could just rewrite the entire thing in Go. :p

CCP Warlock

Posted - 2010.08.23 13:31:00 - [154]
 

Originally by: Rn Bonnet

These two statements seem at odds to me. A multi-core architecture would be excessively scalable. Using an agent-based architecture (e.g. something like Hoare's CSP) you could achieve scalability which is unmatched. This also allows you to sidestep Python's Global Interpreter Lock, as PyCSP's pycsp.processes does.


One of the nasty little surprises that will be creeping up on many in the industry fairly soon - now that Moore's law is about to stop rescuing us every 18 months - is that multi-core architectures are no more "excessively scalable" than multi-server architectures. On everything except the embarrassingly parallel class of tasks, communication constraints eventually kick in. They do it a little later on the multi-cores, partly because of the shared memory, and the lower communication latencies, but they're still there. The list of distributed computing projects that have found this out the hard way has a long and very distinguished history, going back to the mainframe days. Hoare's algebra is a useful mathematical description (for those of us who like math at least), but it can't change the basic constraints from the problem itself, and it tends to hide some of the real time implications of latency, which can be both an important constraint and sometimes an opportunity in these systems.
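A toy cost model shows the shape of the problem (the constants below are invented; only the curve matters). Per-core compute shrinks linearly, but pairwise coordination grows quadratically, so past some core count adding cores makes a frame slower, not faster:

    # Toy model: frame time = compute/cores + pairwise communication.
    WORK = 1000000   # total per-tick simulation work, arbitrary units
    COMM = 0.5       # coordination cost per pair of cores per tick

    def frame_time(cores):
        return WORK / cores + COMM * cores * (cores - 1)

    for n in (1, 2, 8, 32, 128, 512, 2048):
        print("%5d cores -> frame time %12.0f" % (n, frame_time(n)))
    # Improves up to roughly 100 cores with these constants,
    # then the n^2 communication term wins.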


Originally by: Rn Bonnet

There is no denying that this is a long-term solution, which would require a major refactoring of existing code, but it IS a solution. And one that can work. (Plus, imagine being able to advertise having had a 5k-person battle with no lag.)


But we do want and need to be able to run on multi-core machines, especially looking at the servers coming down the pipe. They won't be a panacea, but they will let us move the bar up considerably. I think it's fair to say that at the moment we are examining a number of ideas on how to do this, and what the fallout implications will be for the rest of the code - and I'm not saying any more than that.
Originally by: Rn Bonnet

Alternatively you could just rewrite the entire thing in Go. :p


Don't tempt me - although personally, I'd vote for Erlang :)

CCP Warlock

Posted - 2010.08.23 13:37:00 - [155]
 

Originally by: Conan Piter
...continued/

Here is why I think it would be useful:

  1. Fleet jumps are synchronised so that all players effectively uncloak at about the same time, because players arriving in a grid in groups have overlapping transitions. Because the server waits for a gap between transitions, this is effectively a semaphore. It is hard to make this go on forever by jumping in slowly, because the server will be able to work faster and may more often reach a point where everyone is ready.

  2. Players are never "on the other side" and vulnerable, because until the client is completely ready they are safe. The downside is that the client is aware of its surroundings without presenting them to the player, which goes against the MMO principle of "don't trust the client". I think it is necessary, at least until the problem of lag is history.

  3. Players don't feel the system is broken because instead of an unusable interface they are watching the warp effect, knowing the rest of their fleet is in the same position.

  4. The server can prioritise module activation messages for players already loaded in the grid because in-transition players are invulnerable and are therefore a lower priority.

I doubt this example covers all of the problem of jump-in lag, but it may be useful when thinking about a solution.


I'm very uneasy about coding around the problem, since I think it would just come back and bite us elsewhere. But it's fair to say that working on it is also making us aware of some infelicities in the general design of how jumps are handled.

We really can't trust the client though. If we did, we'd risk turning the problem into one where defenders are unable to protect themselves against invulnerable attackers - and I beg to suggest, that might be the only thing that could possibly be worse than the current situation.

Conan Piter
Ways and Means
Posted - 2010.08.23 14:14:00 - [156]
 

Thanks for answering.
Originally by: CCP Warlock
I'm very uneasy about coding around the problem, since I think it would just come back and bite us elsewhere. But it's fair to say that working on it is also making us aware of some infelicities in the general design of how jumps are handled.

I would be uneasy about coding around the problem too. But if you solved this problem tomorrow and the playerbase doubled the next day, would it still be solved? I think we have to accept that some sort of lag can always occur, and ensure that when it does, the system safeguards the players from the worst effects of it.

Originally by: CCP Warlock
We really can't trust the client though. If we did, we'd risk turning the problem into one where defenders are unable to protect themselves against invulnerable attackers - and I beg to suggest, that might be the only thing that could possibly be worse than the current situation.

I agree too, but from my vantage point (which is mostly guessing) the problem is partly congestion in getting information to the client at the right time. This results in exactly what you are (rightly) concerned about: players already in system are invulnerable to those jumping in, because the new arrivals can't do anything.

That said, I'm only proposing a one-way violation. You give information to the client before it is needed and have a wait state until all clients can synchronise before proceeding. The server trusts the client not to display this information, but the client has no power to do anything that breaks the rules. More importantly, the gap between sending the grid and being able to see the grid is only wide when there is lag. In all other cases, it is fairly instant. One could even eliminate the "ready" message and treat the client as ready once the necessary data has been sent, so that the client can't artificially hold up the others.

I'll whittle it down to the core of what I'm getting at: if you make jump-in players invulnerable (and disabled) until all concurrent jump-ins on that same grid are ready, then in situations where there is lag at least you won't have fleets dying before they can load the grid.

I'm not going to try and convince you my suggestion is right or that you must implement it -- it may not even be feasible. I just want to make sure it is clear, and that in the pursuit of a perfect solution you don't ignore imperfect improvements.

Haral Heisto
Posted - 2010.08.23 16:28:00 - [157]
 

Originally by: Conan Piter
That said, I'm only proposing a one-way violation. You give information to the client before it is needed and have a wait state until all clients can synchronise before proceeding. The server trusts the client not to display this information, but the client has no power to do anything that breaks the rules.


Even this trivial violation of the "don't trust the client" paradigm is easily exploited. Part of the skill involved in FCing these big fleets is effective target calling during the opening seconds of a fight. Your suggestion would give the fleet that jumps in (the aggressor fleet) information about the defensive fleet before the grid loads. Without giving too much away, it would be trivial to pull this information during the syncing time, allowing the FC to provide an initial target list before the fight begins.

Conan Piter
Ways and Means
Posted - 2010.08.23 16:47:00 - [158]
 

Originally by: Haral Heisto
Even this trivial violation of the "don't trust the client" paradigm is easily exploited. Part of the skill involved in FCing these big fleets is effective target calling during the opening seconds of a fight. Your suggestion would give the fleet that jumps in (the aggressor fleet) information about the defensive fleet before the grid loads. Without giving too much away, it would be trivial to pull this information during the syncing time, allowing the FC to provide an initial target list before the fight begins.

You are absolutely right. However, calling targets is usually done during the 30-second cloak time, right? That would be exactly the same, except when there is lag this could potentially be five or so minutes instead. The alternative is to spend this time vulnerable on the grid. Remember that if there is no lag, this transition period is instant. Fleets jumping in will not know whether they will be subject to lag, so one cannot readily plan ahead to take advantage of this loophole.

Ideally the lag would be eliminated and this loophole would never be exposed. CCP seem pretty confident they'll eliminate lag, making this nothing but a fall-back measure. If there is a little lag, the transition periods will be fast and the semaphore will wave several times during a fleet jump, negating this window.

From the perspective of the aggressor fleet, surely it is safer to send a scout in rather than risk being banned for an exploit?

Vaerah Vahrokha
Minmatar
Vahrokh Consulting
Posted - 2010.08.23 20:06:00 - [159]
 

Quote:

One of the nasty little surprises that will be creeping up on many in the industry fairly soon - now that Moore's law is about to stop rescuing us every 18 months - is that multi-core architectures are no more "excessively scalable" than multi-server architectures. On everything except the embarrassingly parallel class of tasks, communication constraints eventually kick in. They do it a little later on the multi-cores, partly because of the shared memory, and the lower communication latencies, but they're still there. The list of distributed computing projects that have found this out the hard way has a long and very distinguished history, going back to the mainframe days. Hoare's algebra is a useful mathematical



So, since you at CCP are at the forefront of technology, what about checking out quantum computers? Premature, but with lots of promise... One of their properties is to easily parallelize highly expensive computations, including some nasties like those "O(y^x)" algorithms.

Mashie Saldana
Minmatar
Veto Corp
Posted - 2010.08.23 20:26:00 - [160]
 

Originally by: CCP Warlock
One of the nasty little surprises that will be creeping up on many in the industry fairly soon - now that Moore's law is about to stop rescuing us every 18 months - is that multi-core architectures are no more "excessively scalable" than multi-server architectures. On everything except the embarrassingly parallel class of tasks, communication constraints eventually kick in. They do it a little later on the multi-cores, partly because of the shared memory, and the lower communication latencies, but they're still there.

So does this mean the servers you are aiming for are like the AMD ones with 4 x 12-core CPUs, to reduce latency as much as possible once you go multi-core?

PirceHat
Posted - 2010.08.23 20:50:00 - [161]
 

Edited by: PirceHat on 23/08/2010 20:50:21

Rn Bonnet
Sniggerdly
Pandemic Legion
Posted - 2010.08.23 20:53:00 - [162]
 

Quote:
On everything except the embarrassingly parallel class of tasks, communication constraints eventually kick in.

I must respectfully disagree: many of the major advances in hardware over the next few years (for which we already have prototypes) work specifically around the traditional von Neumann bottleneck. In particular, consider Intel's Silicon Photonics technology, which has the potential to remove traditional issues with on-board locality and distance. A forward-looking observer will note that previous failures all resulted from the aforementioned bottleneck. However, critical technologies like Silicon Photonics could easily bypass this problem and result in truly massively scalable parallel processing, especially as memory densities increase and access times continue to fall.

To be more specific though, even disregarding technology currently on the horizon, stuff like the 48-core AMD boxes available today could easily give a 24x speedup for Eve if the core was switched to an agent-based simulation, even with the overhead incurred from communication bottlenecks.

To top it off, while Eve is not embarrassingly parallel, it can be fairly close, especially if the right splitting of information and object state is made.
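For what it's worth, the 24x figure lines up with Amdahl's law if you assume roughly 98% of each simulation tick can be made concurrent - my assumption, purely illustrative:

    # Amdahl's law: speedup on n cores when fraction p is parallel.
    def amdahl(p, cores):
        return 1.0 / ((1.0 - p) + p / cores)

    for p in (0.90, 0.95, 0.98):
        print("p=%.2f: 48 cores -> %5.1fx" % (p, amdahl(p, 48)))
    # p=0.98 gives ~25x on 48 cores; p=0.90 only ~8x, which is why
    # the right splitting of object state matters so much.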

Liang Nuren
Posted - 2010.08.23 21:04:00 - [163]
 

Originally by: CCP Warlock

One of the nasty little surprises that will be creeping up on many in the industry fairly soon - now that Moore's law is about to stop rescuing us every 18 months - is that multi-core architectures are no more "excessively scalable" than multi-server architectures. On everything except the embarrassingly parallel class of tasks, communication constraints eventually kick in. They do it a little later on the multi-cores, partly because of the shared memory, and the lower communication latencies, but they're still there. The list of distributed computing projects that have found this out the hard way has a long and very distinguished history, going back to the mainframe days. Hoare's algebra is a useful mathematical description (for those of us who like math at least), but it can't change the basic constraints from the problem itself, and it tends to hide some of the real time implications of latency, which can be both an important constraint and sometimes an opportunity in these systems.



Very very well said. :)

-Liang

Whatever Dood
Posted - 2010.08.23 21:30:00 - [164]
 

Originally by: Liang Nuren
Originally by: CCP Warlock

One of the nasty little surprises that will be creeping up on many in the industry fairly soon - now that Moore's law is about to stop rescuing us every 18 months - is that multi-core architectures are no more "excessively scalable" than multi-server architectures. On everything except the embarrassingly parallel class of tasks, communication constraints eventually kick in. They do it a little later on the multi-cores, partly because of the shared memory, and the lower communication latencies, but they're still there. The list of distributed computing projects that have found this out the hard way has a long and very distinguished history, going back to the mainframe days. Hoare's algebra is a useful mathematical description (for those of us who like math at least), but it can't change the basic constraints from the problem itself, and it tends to hide some of the real time implications of latency, which can be both an important constraint and sometimes an opportunity in these systems.



Very very well said. :)

-Liang


First post, bear with me if I screw up here.

This post by Warlock is a red herring. (Sorry.) It's irrelevant to our problem space. CCP is currently using dual-core servers, so they're limited in the amount of multi-core processing they can exploit, but that's just what they've got now - the typical server architecture nowadays is quad-core or more. If their "location" LBU (software) architecture took full advantage of multi-core hardware we'd potentially see a near-linear performance improvement, ie, running on 4 cores == nearly 4x faster. (We wouldn't, but it wouldn't be due to "communication constraints", it'd be because no real-world solution takes "full advantage" of concurrent hardware - decomposing a linear processing block into concurrently schedulable blocks is problematic. Plus a lot of esoteric hardware-dependent issues. 3x is realistic.)

Basically the statement that "they [communication constraints] kick in a little later" is true only if you define "a little later" as "orders of magnitude later". Which I'd submit is disingenuous.

I look forward to your posts, Jackie, and yours as well, Liang, but in this case it sounds like you're discounting the advantages of multicore architecture a little too emphatically.

This isn't a main point I'd like to discuss, just an observation on the first post Eve-O forums let me reply to - this character is new, specifically for forum discussion.

Trebor Daehdoow
Gallente
Sane Industries Inc.
Posted - 2010.08.23 21:44:00 - [165]
 

Originally by: Haral Heisto
Even this trivial violation of the "don't trust the client" paradigm is easily exploited. Part of the skill involved in FCing these big fleets is effective target calling during the opening seconds of a fight. Your suggestion would give the fleet that jumps in (the aggressor fleet) information about the defensive fleet before the grid loads. Without giving too much away, it would be trivial to pull this information during the syncing time, allowing the FC to provide an initial target list before the fight begins.

Although there may be practical reasons why this would be difficult, you can avoid such situations by sending the data in encrypted form, then sending a packet with the decryption key that unlocks the data at the right time. While you may be able to glean some information from traffic analysis, it probably wouldn't be worth the effort.

Rn Bonnet
Sniggerdly
Pandemic Legion
Posted - 2010.08.24 11:00:00 - [166]
 

Doesn't work like that; somewhere along the line the client will have to decrypt it to make use of it, at which point the key, method, and actions are vulnerable to exploit.

zz01shagsme
Posted - 2010.08.24 11:22:00 - [167]
 

I just wanted to sign this post and thank the devs. I have been more than vocal in my support, and it's finally nice to see something.

I will say that I still enjoy EVE, and although my last big engagement was actually my BIGGEST and LONGEST engagement ever and the lag was bearable, I will never forget the enjoyment of it. The fight occurred in EWOK when TCF had respect amongst some allies and local stood at 1000(ish). The fight lasted almost 5 hours from what I remember, but it truly was awesome.

/me raises a glass and salutes good times hopefully returning soon to big fights, and the devs for finally communicating. It's good to pop out from behind your screen every so often!

cyllan anassan
Amarr
DEUS EX 1
Posted - 2010.08.24 16:56:00 - [168]
 

Imagine this was WoW and there were 1-2 million players playing at the same time; the lag issue will become worse and worse as the game grows in size. Maybe, just maybe, the solution would be to split the servers somehow, keeping the chat channels and separating the galaxy into geographical zones... maybe, I don't know.
It is a bit like single-core versus multi-core processors; this would require a total rewrite of parts of EVE and a lot more info going back and forth from server to server, but it would improve lag, and it could be scaled by adding more servers.

Tres Farmer
Gallente Federation Intelligence Service
Posted - 2010.08.24 17:25:00 - [169]
 

Originally by: cyllan anassan
Imagine this was WoW and there were 1-2 million players playing at the same time; the lag issue will become worse and worse as the game grows in size. Maybe, just maybe, the solution would be to split the servers somehow, keeping the chat channels and separating the galaxy into geographical zones... maybe, I don't know.
It is a bit like single-core versus multi-core processors; this would require a total rewrite of parts of EVE and a lot more info going back and forth from server to server, but it would improve lag, and it could be scaled by adding more servers.

If this were a reading comprehension test and we got grades, yours would be an F (minus).
Please read some blogs about EVE's server structure (they're all in the archive).
EVE hasn't got the kind of problem you think it has.

Kelban Kevar
Gallente
Evocations of Shadow
Eternal Evocations
Posted - 2010.08.24 21:38:00 - [170]
 

The problem is ya didn't know wtf you were doing when ya first started... and you just kept screwing up and messing with things. 'Cause there are now a few games out that play off one server cluster in real time, much like EVE does, and guess what: they have more people online than EVE even thinks about having, and they have no lag.

Liang Nuren
Posted - 2010.08.24 22:18:00 - [171]
 

Edited by: Liang Nuren on 24/08/2010 22:19:12
Originally by: Whatever Dood

This post by Warlock is a red herring. (Sorry.) It's irrelevant to our problem space. CCP is currently using dual-core servers, so they're limited in the amount of multi-core processing they can exploit, but that's just what they've got now - the typical server architecture nowadays is quad-core or more. If their "location" LBU (software) architecture took full advantage of multi-core hardware we'd potentially see a near-linear performance improvement, ie, running on 4 cores == nearly 4x faster. (We wouldn't, but it wouldn't be due to "communication constraints", it'd be because no real-world solution takes "full advantage" of concurrent hardware - decomposing a linear processing block into concurrently schedulable blocks is problematic. Plus a lot of esoteric hardware-dependent issues. 3x is realistic.)



I think that you aren't really contradicting anything that Warlock said. Warlock was making a statement about parallel processing as a whole, not something specific to the Eve cluster. As someone who deals with parallel processing terabytes of data across clouds of multi-core machines, I have to say that there are communication bottlenecks in both situations. A lot of the time, these bottlenecks are "solved" by virtue of relying on Moore's Law, even though there are frequently things that can be done either algorithmically or via optimization to lower the amount of required communication.

It's funny, but I've found that scaling over cores and scaling over servers both have advantages. Scaling over cores is great when you have a common block of data that needs to be examined by multiple processes/threads. Scaling over servers is important when you want to maintain separate disk/memory resources (especially cache). Also, scaling over servers is frequently much less expensive than scaling over cores. Of course, you also have to consider the cost of getting your data into position in the first place.

Quote:
This isn't a main point I'd like to discuss, just an observation on the first post Eve-O forums let me reply to - this character is new, specifically for forum discussion.


Smart man - http://liang.evepress.com/2010/07/failure-1.html

-Liang

Whatever Dood
Posted - 2010.08.24 23:49:00 - [172]
 

Originally by: Liang Nuren
I think that you aren't really contradicting anything that Warlock said.

heh. You snipped it, Liang, the disagreement was in my next sentence, ie,
Originally by: Whatever Dood
Basically the statement that "they [communication constraints] kick in a little later" is true only if you define "a little later" as "orders of magnitude later". Which I'd submit is disingenuous.

If we change the wording to "much, much later" I'm cool with it. The implication otherwise is that communication constraints apply somewhat equally to tightly and loosely coupled systems. While in a very abstract, theoretical sense that's true, it's misleading. The time differences are so enormous that it results in a difference in kind, not just degree, of the types of problems we can handle in the two environments. I think we're agreeing on that.

Man this forum window is hard to type in.

Liang Nuren
Posted - 2010.08.25 07:13:00 - [173]
 

Originally by: Whatever Dood

If we change the wording to "much, much later" I'm cool with it. The implication otherwise is that communication constraints apply somewhat equally to tightly and loosely coupled systems. While in a very abstract, theoretical sense that's true, it's misleading. The time differences are so enormous that it results in a difference in kind, not just degree, of the types of problems we can handle in the two environments. I think we're agreeing on that.

Man this forum window is hard to type in.


While I see where you're coming from, I think you're missing out on scaling a bit. The situation that CCP is looking at can be described as an N-body problem, which essentially means that the amount of communication between the nodes grows as N^2. For large values of N, even the very impressive linear differences you're talking about will quickly disappear. You simply must focus on better algorithms and requirements instead of hoping that Moore's Law will come to the rescue again. That's what I think Warlock was getting at.
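Back-of-envelope, with numbers I've picked purely to show the scaling: if each of N ships has to be told about every other ship, update volume per tick grows as N(N-1), so a 4x faster server is cancelled out by a merely 2x bigger fight:

    # N-body message growth: pairwise updates per simulation tick.
    def messages(n_ships):
        return n_ships * (n_ships - 1)

    for n in (250, 500, 1000, 2000):
        print("%5d ships -> %8d pairwise updates/tick" % (n, messages(n)))
    # 250 ships -> 62250; 2000 ships -> 3998000.
    # 8x the ships, roughly 64x the traffic.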

Also, you're lucky - I deleted several 3000-character walls of text about parallel processing. I really enjoy enormous data sets, and I'm prone to ramble at length about them given an opportunity. :D

-Liang

Trebor Daehdoow
Gallente
Sane Industries Inc.
Posted - 2010.08.25 11:32:00 - [174]
 

Originally by: Rn Bonnet
Doesn't work like that, somewhere along the line the client will have to decrypt it to make use of it, at which point the key, method, and actions are vulnerable to exploit.

I am sorry, but you are misunderstanding the method.

Let us assume that there is some information I have now that you need next Tuesday. The way things are currently done in EVE, I would just email you the information on Tuesday.

The alternate method is to encrypt the information today using a single-use random key (which will never be re-used) and send it to you now. You have it, but it is useless to you until I send you the key, which I do on Tuesday.

I am in effect using you as a secure cache for the information. You know that I have sent you something, and may be able to guess some things about it (that's the traffic analysis), but until you get the key, you can't use or exploit it.

The tricky part is figuring out when such methods are worth the extra effort. But if I happen to know that the tubes of the Internet will be particularly clogged on Tuesdays, then it can be a win.
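In Python, with the third-party cryptography package (pip install cryptography), the whole scheme is only a few lines. This sketch is mine and purely illustrative - it is not a claim about how EVE actually ships grid data:

    # Encrypt now, release the key later (the "secure cache" idea).
    from cryptography.fernet import Fernet

    # Today: generate a single-use key and send the ciphertext ahead.
    key = Fernet.generate_key()
    blob = Fernet(key).encrypt(b"grid state: ships, positions, ...")
    # The recipient now holds 'blob' but can learn nothing useful
    # from it beyond its size (the traffic-analysis caveat).

    # Tuesday (or: once every client reports ready): send the tiny key.
    print(Fernet(key).decrypt(blob))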

Sian Tiger
Posted - 2010.08.25 11:35:00 - [175]
 

Just wanted to let CCP know a few things.
1. A short while ago we were in a 0.0 system that had a POS being attacked. With only a few sub-capital ships flying around the POS everything was fine. As soon as 2-3 enemy capital ships cynoed in, the game became very laggy. Only 2-3 capitals!
Now this happened every time we defended the POS. A few caps jump in and the game goes laggy. This was NOT some giant fleet jumping in via a gate. Now there are only three ways I know of to get into a system: jump in via a gate, cyno in, or be there when you log in. I am therefore suggesting that the problem lies with the capital ships ONLY. I do not believe the issue has anything to do with other ships.
2. Large fleet fights between sub-caps rarely produce laggy systems. Bring in the caps and the lagfest begins.

Now you might say that lag can be divided into certain categories (gate lag, fleet lag, etc.) and maybe it can, I don't know. I think a lot of the lag issues are due partly to the same cause.

This whole topic has been great for the thinking person, and while the devs are looking at the problem from the INside, delving deep into the technical aspects, us pilots can see it from the OUTside, from a high-level perspective.

The answer lies with the capital ships.

Might be very nai(eve), but I will put 1 ISK on this :)

Whatever Dood
Posted - 2010.08.25 15:50:00 - [176]
 

Originally by: Liang Nuren

While I see where you're coming from, I think you're missing out on scaling a bit. The situation that CCP is looking at can be described as an N-body problem, which essentially means that the amount of communication between the nodes grows as N^2. For large values of N, even the very impressive linear differences you're talking about will quickly disappear. You simply must focus on better algorithms and requirements instead of hoping that Moore's Law will come to the rescue again. That's what I think Warlock was getting at.

Also, you're lucky - I deleted several 3000-character walls of text about parallel processing. I really enjoy enormous data sets, and I'm prone to ramble at length about them given an opportunity. :D

-Liang


But, Liang, I got all that the first time. Ie, when I said "it's irrelevant to our problem space" then went on to throw out example values for # of cores, this is what I was referring to. When you're talking scaling, what you're concerned with is the number of cores - which we've got a real-life limit on. So "N" doesn't get very large.** And it has to get VERY large or those impressive linear differences stay impressive. That's where we are today. Which is what I said in my original post. As long as we're talking about core counts of two, or eight, or even 32, pff. O(n) is irrelevant, secondary effects dominate. (I could throw out a wall of text on this too.)

I was going to say something about Moore's Law, to the effect that it died a long time ago. It's not sick, it's pining for the fjords, it's an ex-Law. The same heat issues are also constraining local core count nowadays.

I don't think anyone cares if we babble on. As long as we keep it in one place, neh, we're just chatting. I've a different background, I do pretty much this, ie, I turn serial game architectures concurrent, with an earlier background in realtime operating system design.


** - there's a chance you or Jackie are concerned with number of logical communicating entities, ie, actors/objects/whatever, but if you are, you have to limit them to the max currently active/executing - which of course, is the same as # of cores. Or, if you're talking about execution granularity, that's a separate issue with a specific solution.

Liang Nuren
Posted - 2010.08.25 16:38:00 - [177]
 

Edited by: Liang Nuren on 25/08/2010 16:40:58
Originally by: Whatever Dood

But, Liang, I got all that the first time. Ie, when I said "it's irrelevant to our problem space" then went on to throw out example values for # of cores, this is what I was referring to. When you're talking scaling, what you're concerned with is the number of cores - which we've got a real-life limit on. So "N" doesn't get very large.** And it has to get VERY large or those impressive linear differences stay impressive. That's where we are today. Which is what I said in my original post. As long as we're talking about core counts of two, or eight, or even 32, pff. O(n) is irrelevant, secondary effects dominate. (I could throw out a wall of text on this too.)

I was going to say something about Moore's Law, to the effect that it died a long time ago. It's not sick, it's pining for the fjords, it's an ex-Law. The same heat issues are also constraining local core count nowadays.

I don't think anyone cares if we babble on. As long as we keep it in one place, neh, we're just chatting. I've a different background, I do pretty much this, ie, I turn serial game architectures concurrent, with an earlier background in realtime operating system design.


** - there's a chance you or Jackie are concerned with number of logical communicating entities, ie, actors/objects/whatever, but if you are, you have to limit them to the max currently active/executing - which of course, is the same as # of cores. Or, if you're talking about execution granularity, that's a separate issue with a specific solution.


N refers to the number of actors in the system, and the number of cores available only limits the amount of concurrent execution instead of the amount of required communication. As N gets larger, linear differences vanish.

-Liang

Ed: My particular variant of parallel processing includes processing and creating "enormous" (but not Google enormous!) datasets. I've found that spindle count and disk cache are at least as important as the number of cores in your cloud.

CCP Warlock

Posted - 2010.08.25 17:04:00 - [178]
 

I think this is one of those areas where everybody is correct, in the constraint space they're talking about. Liang's quite correct about the combinatorial problem, and Whatever Dood is correct that below a certain number of cores they don't apply, because it's not actually possible for that number of cores to overload its communication bandwidth. Which was one of the points I was trying to get over in the presentation. (I should also apologise for some sloppy wording, since I tend to use communication in the computational, information-theoretic (Shannon) sense.)

We are not in any way ignoring the multi-core possibilities btw., so please don't read that into anything here. I will confess to being a little tired of people telling me they're a total solution to all scaling problems though. They get you more CPU, and then, outside some of the simple mathematical models, a lot of work designing the application to take the best advantage of it. Unfortunately, a thread is not some imaginary friend that will solve all performance problems - when you really need Mr. Thread he's just never there for you.

As Liang says, O(N(N-1)) can get out of control very quickly, and the multi-core hardware folks know this, regardless of what they end up telling their sales engineers. The Gupta/Scaglione result (Gupta presented it originally, but Scaglione's proof is much more elegant) is very important in what it tells us about scaling limitations in these systems. What we've lived through in the last 10 years is massive CPU improvements, and incredible increases in network bandwidth and speed. These have pushed the bar up considerably on the size of these systems, but the constraints are still quite fundamentally the same as they were 30 years ago, certainly with larger numbers, but we also have much larger applications now too.


Whatever Dood
Posted - 2010.08.25 17:35:00 - [179]
 

Edited by: Whatever Dood on 25/08/2010 17:35:21
Edit: darn it, always re-check thread for new posts before hitting "send".

Originally by: Liang Nuren
N refers to the number of actors in the system, and the number of cores available only limits the amount of concurrent execution instead of the amount of required communication. As N gets larger, linear differences vanish.


Wait. That's a totally different "communication".

Okay, we've got a misunderstanding. We're switching back and forth between two different types of "communication", probably because the original question she was answering isn't that clear. (Certainly not to me, anyway.)

First of all, there's cross-core or cross-node communication overhead. We're not talking messages here. We're talking overhead added due to critical sections, atomic operations, cache flushing for shared memory coherence, that sort of thing. This overhead goes up per operation as a function of core count, roughly on an n-squared basis. (Little bit of wiggle room on "n-squared" there, but okay as a first approximation.)

Then, there's agent-agent messaging. This is the other n-squared communication load. Again, you can niggle the n-squared thing, but that's beside the point. The point is, these two "communication" loads aren't the same thing. For a given program, the agent messaging load is constant: assuming we're converting a serial process into a concurrent process, the agent messaging load is a constant in O-space, ie, the growth function is still N-squared and "N" doesn't change between the serial and concurrent implementations.

I think we took the phrase - "...communication constraints eventually kick in. They do it a little later on the multi-cores,..." to mean different things. And now I'm not completely sure what the intent was. I assumed "communication constraints" meant concurrency overhead, ala cross core/node communication. If it means overall system messaging load, and "eventually" means "as # of agents goes up", well of course it'll kick in - but it'll do so regardless of whether you're running concurrently or not. Hmm

Whatever Dood
Posted - 2010.08.25 18:09:00 - [180]
 

Originally by: CCP Warlock
I will confess to being a little tired of people telling me they're a total solution to all scaling problems though. [...] a lot of work designing the application to take the best advantage of it.


Don't worry about that, again, we're just chatting. I don't actually believe we have the right to give input to the development process as outsiders. I don't get that sense of entitlement from Liang either.

Originally by: CCP Warlock
The Gupta/Scaglione result (Gupta presented it originally, but Scaglione's proof is much more elegant) is very important in what it tells us about scaling limitations in these systems.


I'm sorry, but you can't talk like this if you're trying to communicate. (I get this same thing from Bartholomeus, btw.) I don't know what "the Gupta/Scaglione" refers to, and a quick google doesn't clear it up for me. And yet, I suspect it's a principle that's familiar to me. The last time Barto did this, it took four pages of posts to clear up that his obscure reference was to a principle I not only already knew but had extensive experience working with/around. caveat: I don't actually feel you have an obligation to divulge as much info as you guys are doing. But still, if we're chatting.

Originally by: CCP Warlock
but we also have much larger applications now too.


Of course that works in the opposite direction generally, ie, lower statistical chance of contention, lower percentage concurrency overhead. (Just boatloads more work to decompose.)

