EVE Information Portal
New Dev Blog: Fixing Lag: Character Nodes
 
This thread is older than 90 days and has been locked due to inactivity.


 
Pages: 1 2 3 [4] 5

Author Topic

Jim Luc
Caldari
Rule of Five
Vera Cruz Alliance
Posted - 2010.08.23 07:14:00 - [91]
 

Originally by: SFX Bladerunner
Originally by: Akita T
So, when can we expect to see problematic "location nodes" getting moved to a different CPU core in real-time (with a brief interruption in service before that happens, but without a server reboot) ?!?


Do I really have to say it?...

It starts with an S, and ends with a ™


Shortly™? :P

Franga
NQX Innovations
Posted - 2010.08.23 09:35:00 - [92]
 

Good read. I appreciate the effort to explain it for those of us that are 'technically challenged'.

Vaerah Vahrokha
Minmatar
Vahrokh Consulting
Posted - 2010.08.23 10:49:00 - [93]
 

Edited by: Vaerah Vahrokha on 23/08/2010 10:54:18
Quote:

Our "Fixing Lag" series continues with CCP Atlas' blog on character nodes, which you can read here.



I read the blog, envisioned the whole Python and C++ architecture for some seconds, and it caused me to have a techgasm.


Quote:

Market hubs like Jita have the potential for load balancing stations separately of the solar system and other stations. That is something we are currently investigating as a possible 'end-all' fix to Jita. There is a fair bit of game design involved and I'm not making any promises however. :-)

Interesting tidbit about Guido-and-the-GIL. I need to google it.



I am not sure this is what you need. I am quite sure the 80/20 rule also applies to Jita and the other hubs: you could indeed partition by station, but it would be of little use, because one station is really the single centre of activity.

What about going all out and, quasi-fractally (since you already did this at the macro level), implementing a layered, sub-station level of partitioning?

- Every NNN players, a new core is allocated.

- The outer (and inner) calls to such a core are re-routed through a dispatcher that makes both sides "think" they are dealing with one core, as now (read: legacy code is somewhat preserved), while in reality the virtual sub-core is being managed by a freshly allocated resource elsewhere.

This would bring massively insane scalability.

The scheduler would also build a moving average of the recent virtual core splits, which could be reported to you, so you know how to tune its granularity and can plan hardware purchases in advance.


Edit: of course, such a virtual core partition manager would keep statistics on the lists of users associated with each core, and would periodically coalesce emptying lists (due to players warping off or logging off) into other lists, basically performing its own flavour of garbage collection.
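A minimal sketch of how such a dispatcher could work, assuming a made-up per-core threshold and invented names throughout (this is speculation about the design described above, not CCP's code):

```python
from collections import defaultdict

class SubNodeDispatcher:
    """Hypothetical sketch of the sub-station partitioning idea: every
    NNN players a fresh 'virtual core' is allocated, calls are routed
    through the dispatcher, and emptying player lists are periodically
    coalesced (the 'garbage collection' flavour mentioned in the edit)."""

    def __init__(self, players_per_core=3):
        self.players_per_core = players_per_core
        self.cores = defaultdict(set)  # virtual core id -> player ids

    def assign(self, player_id):
        """Route a player to the first core with spare capacity,
        allocating a new virtual core when all existing ones are full."""
        for core_id, players in self.cores.items():
            if len(players) < self.players_per_core:
                players.add(player_id)
                return core_id
        core_id = len(self.cores)
        self.cores[core_id].add(player_id)
        return core_id

    def remove(self, player_id):
        """Player warped off or logged off."""
        for players in self.cores.values():
            players.discard(player_id)

    def coalesce(self):
        """Merge near-empty player lists so mostly-idle cores are freed."""
        survivors = defaultdict(set)
        for players in self.cores.values():
            if not players:
                continue
            for group in survivors.values():
                if len(group) + len(players) <= self.players_per_core:
                    group.update(players)
                    break
            else:
                survivors[len(survivors)].update(players)
        self.cores = survivors
```

With a threshold of 2, for example, five players spread over three virtual cores; after three of them leave, `coalesce()` packs the survivors back onto a single core.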

corebloodbrothers
Posted - 2010.08.23 10:54:00 - [94]
 

Edited by: corebloodbrothers on 23/08/2010 10:54:46
Awesome reading. Even if it isn't the holy grail of lag fixes, it shows us players, CCP's customers, that CCP does care and is putting the effort in. That alone gives confidence for the future.

I am curious what more is to come. As I am often in laggy fleet fights, I wouldn't mind missing things during those events. Meaning: let players close, or above a threshold enforce the closing of, certain player-driven services.

Maybe there are tasks in that solar system that are still on the same node and not vital to the fleet fight:
* like checking your wallet
* or updating market orders.
* Or close and remove NPC stuff and wormholes if that helps.
* Trim tasks down to the actions used for the fleet fight,
* hell, close chat windows and local chat if it helps.
* turn ships into squares if it helps
* turn brackets off by default above a certain threshold
* reset graphics to low if it matters (probably affects the player's computer only, though)
* turn off messages about damage and misses by default when the node is full
* kill relogging (debatable)
* or move them next door (very debatable ;))
* no reforming fleets (it is used as a lag tactic these days)

The point is, people in a fleet fight occupying a (generally) 0.0 system will accept all sorts of trade-offs, and things they will miss, if it helps clear out the lag for the task they are there for.

You can put it under a magical no-lag button, or (my favourite) enforce it, so people don't use lag as a weapon of war by making it heavier (examples are people reforming fleets in a lagging system, or spamming local).

The list is endless. What do you think, CCP?

Yeay Fritg
Caldari
Confrerie de Kaedri
Cluster Of Rebirth
Posted - 2010.08.23 11:36:00 - [95]
 

Hello,

All sounds logical, except... do you remember ever having accessed your wallet in a fleet battle... while jumping by titan bridge to Jita?

Or maybe it was when the FC told every capital ship to jump and to check their market orders?

I have never tested this way of battling in EVE; I was just using the weapons fitted to my ships.

Still not convinced...

Yeay

FAIL Communicator
Posted - 2010.08.23 12:49:00 - [96]
 

Edited by: FAIL Communicator on 23/08/2010 12:53:31
So you have never seen local spammed during a 500-man fleet fight by one side to crash the node, so that the TCU or blockade unit is either saved or its timers expire? Well, I have.

Remaking fleets, or putting in new fleet commanders and moving people around, is also used, the same as spamming one person with convo invites, especially if he's primary. Why do you think all those premade pictures made of symbols are posted in local?

You should consider the worst of each player's behaviour if you battle lag, especially when you consider that EVE is about that possible dark side.

But the points mentioned are just examples of the suggestion to not only move non-fleet-fight actions to other nodes, but to strip out as much non-essential functionality as possible to reduce the lag caused by massive fleets. The point I want to stress is that players would gladly accept that as a trade-off. Looking at node load in a wider view would create other reductions that keep lag low.


BIZZAROSTORMY
Posted - 2010.08.23 12:56:00 - [97]
 

So... the proxy server assumes we're all logging in from iMacs? :o

Syekuda
Hell's Revenge
Posted - 2010.08.23 13:55:00 - [98]
 

To CCP: to do my part, please take Faction Warfare into consideration when you want people to report fleet fights to you via your form. Currently, it seems like you (or someone at CCP) don't care at all about fleet fights in Faction Warfare.

In case people, or even CCP, don't know (I wouldn't be surprised at all): the fights happen in low-sec. You've seen the data in the last economic newsletter. Less than 15% of the people in EVE Online live in low-sec, so I can assume that lots of systems share one node... or not a lot. So you can assume yourself that when a 200-man fleet (both parties combined) jumps into a system, it can get laggy as hell.

Again, please ACCEPT Faction Warfare fleet fight requests. The last two attempts were denied, and we suffered for it.

Keiko Kobayashi
Amarr
Celestial Janissaries
Curatores Veritatis Alliance
Posted - 2010.08.23 16:59:00 - [99]
 

Re: the fleet fight notification form, I always figured that the Dominion changes allowed you to better predict large-scale fleet fights, because you can take the data on deployed SBUs and their timers as a reasonably accurate indicator of when and where large-scale battles will occur.

Isn't that the case, and doesn't that make filling out fleet fight notification forms largely unnecessary?

HeliosGal
Caldari
Posted - 2010.08.24 11:59:00 - [100]
 

So, to push CCP further: have unintended fleet fights across multiple quiet nodes and encourage a spread of the player base. Got it. Now, CCP, for the big one: if someone is AFK in a station for an hour, let's log them out. That will help reduce lag.

Steini OFSI
Gallente
The Collective
Against ALL Authorities
Posted - 2010.08.24 13:18:00 - [101]
 

Can't you split up gunfire, movement and ship health in a similar architecture?


CPU1, movement: handled by one CPU; also does collisions and hands out distance, transversal and sig radius
CPU2, shooting: calculates guns, ammo, skills, damage modifiers, tracking

CPU3, a combiner that takes only the most necessary numbers and calculates damage

I'm just curious.
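As a toy illustration of the three-stage split being suggested above (every name, number, and the hit-chance rule here are invented placeholders, not EVE's actual mechanics):

```python
def movement_stage(attacker, target):
    """'CPU1': positions/collisions; hands out distance, transversal, sig radius.
    The geometry values here are hard-coded stand-ins for a real physics step."""
    return {"distance": 10.0, "transversal": 0.2, "sig_radius": target["sig"]}

def shooting_stage(attacker):
    """'CPU2': guns, ammo, skills, damage modifiers, tracking."""
    return {"base_damage": attacker["dps"], "tracking": 0.05}

def combiner_stage(geometry, gunnery):
    """'CPU3': takes only the necessary numbers and calculates damage.
    The hit-chance formula is a made-up placeholder, not EVE's real one."""
    hit = gunnery["tracking"] * geometry["sig_radius"] / max(geometry["transversal"], 1e-9)
    return gunnery["base_damage"] * min(hit, 1.0)

attacker = {"dps": 100.0}
target = {"sig": 40.0}
damage = combiner_stage(movement_stage(attacker, target), shooting_stage(attacker))
```

The catch, as the reply further down explains, is that these three stages would have to exchange their intermediate values every simulation tick, which is exactly the kind of fine-grain communication that is cheap across cores but expensive across nodes.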

CyberGh0st
Minmatar
Blue Republic
RvB - BLUE Republic
Posted - 2010.08.24 14:21:00 - [102]
 

Edited by: CyberGh0st on 24/08/2010 14:41:58

Originally by: Caladain Barton
Edited by: Caladain Barton on 21/08/2010 18:58:55
CCP, we were having 1000-man battles on un-reinforced nodes without them crashing. You could squeeze 2k into a reinforced node back before Dominion.

Just so you know how high the bar *was*. Are you planning on restoring the game to that level, or to 500-man battles on regular nodes and 1000 on the special nodes?


Perhaps you missed it, but CCP Atlas said this change offloads work from the location node and thus will improve its performance. This will help jump-ins and other lag problems, but it is not the bugfix that fixes the Dominion bug.

However, they think they are close to actually fixing what went wrong.

So here is some speculation:

With these changes, and once they finally fix the Dominion bug, a reinforced node should be able to handle 2000 + 20% (a possible increase in Jita population), i.e. 2400-player battles :p

Whatever Dood
Posted - 2010.08.24 18:49:00 - [103]
 

Originally by: Steini OFSI
Can't you split up gunfire, movement and ship health in a similar architecture?


CPU1, movement: handled by one CPU; also does collisions and hands out distance, transversal and sig radius
CPU2, shooting: calculates guns, ammo, skills, damage modifiers, tracking

CPU3, a combiner that takes only the most necessary numbers and calculates damage

I'm just curious.


I'm going to answer this one, and add some questions of my own.

First of all, what Atlas is talking about here is moving coarse-grain calculations, like fleet bookkeeping, out of the "location" (or fleet-fight) process and into their own process, or load-balancing unit (LBU). These LBUs can be scheduled on separate nodes.

The fleet bookkeeping LBU doesn't have to communicate with the location/fleet-fight LBU very often, because fleet composition doesn't change on a tick-by-tick basis. Therefore, the overhead of talking across nodes isn't prohibitive. (Actually, this is also why performing fleet bookkeeping actions during a battle is a good lag "cheat": it incurs more load than would otherwise be the case, due to the cross-node communication.)

Splitting off processing at the granularity you're talking about is fine-grain, i.e., it'll involve communication at the per-tick level. Therefore, it's not appropriate to do so "in a similar architecture", as you say, i.e., by splitting out into separate LBUs. <shrug> It probably would be appropriate at the core level, however (or something like it; the actual bits we break out depend on how we wrote the original code).

In other words -

Each server is a "node" with two cores.
Talking across cores is very cheap.
Talking across nodes is expensive.
The LBU architecture is for cross-node deployment.

What you're suggesting is more appropriate for cross-core deployment, and involves recoding the location/fleet-fight process into a multi-core architecture. (But they said that's on the table too.) hth

My question relates to "tick by tick basis" - what's the "tick" or "frame" time for the location/fleet-fight process? It can't be one second, can it?
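The coarse-grain versus fine-grain distinction in the post above can be put into a back-of-envelope cost model. All the latencies and event rates below are illustrative guesses for the sketch, not CCP's measurements:

```python
# Illustrative latency assumptions (invented for the sketch, not measured):
CROSS_CORE_US = 1     # microseconds to pass a message between cores in one box
CROSS_NODE_US = 500   # microseconds to pass a message between nodes over the LAN

def overhead_us_per_sec(events_per_sec, link_cost_us):
    """Total communication overhead incurred per simulated second."""
    return events_per_sec * link_cost_us

# Coarse-grain: fleet composition changes only a few times per second,
# so shipping it across nodes costs almost nothing.
coarse = overhead_us_per_sec(5, CROSS_NODE_US)               # 2,500 us/s

# Fine-grain: per-tick physics for 1,000 pilots would pay the link
# price on every event, once per tick.
fine_cross_node = overhead_us_per_sec(1000, CROSS_NODE_US)   # 500,000 us/s
fine_cross_core = overhead_us_per_sec(1000, CROSS_CORE_US)   # 1,000 us/s
```

On these made-up numbers, the fine-grain split is harmless across cores but eats half of every simulated second if it has to cross the LAN, which is exactly the argument the post makes for keeping per-tick work inside one box.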

LTcyberT1000
Caldari
Free Space Tech
Goonswarm Federation
Posted - 2010.08.24 18:54:00 - [104]
 

Edited by: LTcyberT1000 on 24/08/2010 19:09:10
Edited by: LTcyberT1000 on 24/08/2010 18:54:33
Well, it seems that, the way the cluster works now, load is going to be distributed in the same way as kernel-level calls are distributed over one big cluster machine. That would result in 1000+ CPU cores of SMP, plus memory distributed over the LAN...

The way I see it, it would probably be best to take the same approach as the Linux Single System Image project (http://openssi.org/cgi-bin/view?page=openssi.html). As a result, CCP would end up with one supercomputer sharing low-level distributed CPU/task load over the entire cluster, instead of the traditional single point of failure of a load-distribution node with a limited number of nodes to share. :)


P.S. I know there is a one-in-a-million chance of this idea reaching the devs' ears, but still... :D

Whatever Dood
Posted - 2010.08.24 19:09:00 - [105]
 

Originally by: LTcyberT1000
Edited by: LTcyberT1000 on 24/08/2010 18:54:33
Well, it seems that, the way the cluster works now, load is going to be distributed in the same way as kernel-level calls are distributed over one big cluster machine. That would result in 1000+ CPU cores of SMP, plus memory distributed over the LAN...

The way I see it, it would probably be best to take the same approach as the Linux Single System Image project (http://openssi.org/cgi-bin/view?page=openssi.html). As a result, CCP would end up with one supercomputer sharing low-level distributed CPU/task load over the entire cluster, instead of the traditional single point of failure of a load-distribution node with a limited number of nodes to share. :)



I'm confused. You mention SMP, symmetric multiprocessing, architecture, which usually refers to multiple CPUs connected to the same local memory (i.e., via a hardware bus, in the same box) and running the same OS image. But then you go on to mention distributed memory over a LAN. That doesn't fit.

I think the key multiprocessing problem to solve for EVE is making the "location" or fleet-fight LBU concurrent. That implies cross-thread communication at a fine-grain level. It also implies a limit to the total amount of processing we'd have to split up, i.e., just the processing load of one "location" LBU.

We wouldn't want our communication there to go through a LAN; that's orders of magnitude slower than cross-core communication. I think the appropriate hardware architecture for EVE is probably just what they have now, except using servers with more (perhaps 4x) cores.

Of course the real work is converting the code base to take advantage of concurrent resources.

LTcyberT1000
Caldari
Free Space Tech
Goonswarm Federation
Posted - 2010.08.24 19:13:00 - [106]
 

Edited by: LTcyberT1000 on 24/08/2010 19:14:56
Originally by: Whatever Dood


I'm confused. You mention SMP, symmetric multiprocessing, architecture, which usually refers to multiple CPUs connected to the same local memory (i.e., via a hardware bus, in the same box) and running the same OS image. But then you go on to mention distributed memory over a LAN. That doesn't fit.


The way Single System Image works, it really runs as one virtual multi-CPU computer. If you want to get an idea of how it works, see http://setiathome.ssl.berkeley.edu/ and other mathematical computation projects that have already been running for years... :)

Whatever Dood
Posted - 2010.08.24 19:29:00 - [107]
 

Originally by: LTcyberT1000
Edited by: LTcyberT1000 on 24/08/2010 19:14:56
Originally by: Whatever Dood


I'm confused. You mention SMP, symmetric multiprocessing, architecture, which usually refers to multiple CPUs connected to the same local memory (i.e., via a hardware bus, in the same box) and running the same OS image. But then you go on to mention distributed memory over a LAN. That doesn't fit.


The way Single System Image works, it really runs as one virtual multi-CPU computer. If you want to get an idea of how it works, see http://setiathome.ssl.berkeley.edu/ and other mathematical computation projects that have already been running for years... :)



I wasn't questioning whether it works. I was questioning the use of the "SMP" acronym for loosely-coupled (i.e., over-LAN) systems.

Regardless, the problem remains the same: it's much, much more expensive to communicate across a LAN. Enough so that single-box SMP architectures, like what they're using now, are more appropriate for problems like making the fleet-fight LBUs concurrent.

Tripflare
Posted - 2010.08.24 22:26:00 - [108]
 

Originally by: Alain Kinsella
That sounds a lot like VMWare VMotion


I've often wondered if running nodes as virtual machines would be worth mentioning for load balancing: if nobody is using the node, then the VM requests very few resources from the underlying host, just enough to tick over.

Meanwhile, where lots of CPU/RAM is demanded on a node, that VM ramps up and uses its full allocation on the host; 8 virtual CPUs and 255 GB of RAM can be allocated to a VM running on VMware vSphere (each vCPU is mapped to a physical core).

You can have up to 32 physical hosts per vSphere cluster, and each host can have up to 1 TB of RAM.

If lots of nodes (VMs) happen to be getting hit hard on one host, the Distributed Resource Scheduler (DRS) will automatically live-migrate (vMotion) the nodes that aren't doing much to other hosts in the cluster; this is a very fast process and is fully automated.

If a physical host suffers a hardware failure, the nodes (VMs) that were running on it are automatically restarted on other hosts in the cluster (VMware High Availability).

There's more but I think I'll stop there for now... :)

Trip

Taudia
Gallente
Sane Industries Inc.
Initiative Mercenaries
Posted - 2010.08.28 12:44:00 - [109]
 

Wait, why does it say this blog was posted today? Fallout introduced the comment thread for it more than a week ago...?


Nareg Maxence
Gallente
Posted - 2010.08.28 14:01:00 - [110]
 

The Stackless Python upgrade blog didn't receive as enthusiastic a response as this one did, so they decided to bump it.

Genya Arikaido
Posted - 2010.08.28 16:10:00 - [111]
 

Dev Blog Necromancy? :o

jwingenderowns
Posted - 2010.08.28 16:51:00 - [112]
 

You have not seen this dev-blog before. You will appreciate it and enjoy it.
</jedi-mind-trick>
Cool

Lan Tragger
Posted - 2010.08.29 11:38:00 - [113]
 

Edited by: Lan Tragger on 29/08/2010 12:02:16
I understand the GIL makes things annoyingly difficult in a single process. The easy solution is of course multiple processes. That still leaves you with synchronization between processes, but let's be honest: if EVE does not learn to use the hardware more fully, you will always have scalability problems.

The ability to transfer the location node on the fly should easily be doable. Of course, that means running a split node during the transfer, but that shouldn't be too large a problem (it's actually a huge undertaking).

What about kicking queue processors that can act independently (they don't need to block us, but currently do because of the GIL) into separate processes and feeding the queue to them? On a multi-core system, this means you can use shared memory or similar to transfer the data; in a multi-machine scenario, pick your transmission medium. Of course, the gains must exceed the latency of the transfer, though I suspect some evil coding could make it happen. Sending micro code snippets to cluster nodes and dealing with the latency of the transfer isn't a new problem.

Even if you can't transfer the entire location yet, at least consider that effects/module queue processing (re the other blog entry) might be isolated enough to separate out from the main server thread and process independently on a separate core, which would fix the "play nice with the GIL" problem in that scenario. Depending on how crucial the ordering of queue processing is, you could possibly even utilize multiple queue processors on demand in case the current processing gets too far behind (there are any number of ways to balance data across them, even if you just played the simple odd/even game).

Shutting up now. Visit Google and look at some of their clustering technologies; it's awesome. Most of it doesn't apply in this scenario, but I can think of a few tidbits that would.
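For what it's worth, the "independent queue processors in separate processes" idea above can be sketched with Python's standard multiprocessing module. Here `process_effect` and its doubling rule are invented placeholders, not EVE's actual effects code:

```python
from multiprocessing import Pool

def process_effect(event):
    """Stand-in for one independent effects/module-queue task.
    The real work would be whatever the queue processor actually does;
    doubling the damage value is just a placeholder calculation."""
    ship_id, damage = event
    return ship_id, damage * 2

def drain_queue(events, workers=4):
    """Feed queued events to separate worker processes, sidestepping the
    GIL. Pool.map preserves input order, so ordering-sensitive queues
    still come back in sequence. Only worth it when the per-event work
    exceeds the cost of shipping the event to another process."""
    with Pool(workers) as pool:
        return pool.map(process_effect, events)

if __name__ == "__main__":
    print(drain_queue([(1, 10), (2, 20), (3, 30)]))
    # -> [(1, 20), (2, 40), (3, 60)]
```

As the post says, the trade-off is transfer latency: each event pays a serialization and IPC cost on its way to the worker, so fine-grained events need to be batched for this to pay off.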

Mr LaForge
Posted - 2010.08.29 16:15:00 - [114]
 

When I read the first few words of the blog, I thought this:

"Cobra sucks, GI Joe is better"

ELECTR0FREAK
Eye of God
Posted - 2010.08.29 18:36:00 - [115]
 

This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.

Hey, I'm all for you all having a good time when working, but I think it's time to put down the bottle guys and gals. Laughing

Syekuda
Hell's Revenge
Posted - 2010.08.30 00:13:00 - [116]
 

Edited by: Syekuda on 30/08/2010 00:27:09
Uhh, what he ^^ said. Guys, stop drinking. Vacation is over. :D

Uhhh, I may be the first to say it publicly, but CCP got pwned by themselves, lmao.

PS: is this blog lagging, or losing sync?

Terminal Entry
New Fnord Industries
Posted - 2010.08.30 03:28:00 - [117]
 

Originally by: Syekuda

ps: is this blog lagging or losing sync ?


LMAO!


Trebor Daehdoow
Gallente
Sane Industries Inc.
Posted - 2010.08.30 15:01:00 - [118]
 

Originally by: ELECTR0FREAK
This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.


Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.

For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".

The exact blood-alcohol levels needed for optimum dev productivity are a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.

Syekuda
Hell's Revenge
Posted - 2010.08.30 15:59:00 - [119]
 

Originally by: Trebor Daehdoow
Originally by: ELECTR0FREAK
This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.


Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.

For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".

The exact blood-alcohol levels needed for optimum dev productivity is a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.


A six-pack of Red Bull should do the trick.

CCP Masterplan


C C P Alliance
Posted - 2010.08.30 18:01:00 - [120]
 

Originally by: Trebor Daehdoow
Originally by: ELECTR0FREAK
This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.


Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.

For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".

The exact blood-alcohol levels needed for optimum dev productivity is a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.

I think this is what you're looking for...





 


The new forums are live

Please adjust your bookmarks to https://forums.eveonline.com

These forums are archived and read-only