EVE Information Portal
New Dev Blog: Fixing Lag: Character Nodes
 
This thread is older than 90 days and has been locked due to inactivity.


 


Jinquoi
JSR1 AND GOLDEN GUARDIAN PRODUCTIONS
Test Alliance Please Ignore
Posted - 2010.08.20 20:21:00 - [31]
 

Wow! Congrats! For a techless geek with only one amoebic brain cell, you have just made a very technical topic understandable. Thanks! Rolling Eyes

Mynas Atoch
Eternity INC.
Goonswarm Federation
Posted - 2010.08.20 20:23:00 - [32]
 

This is probably the best devblog of the bunch released this week. It's focussed, informative and positive; it describes the historical design, the improvement, and the success of the implementation. It even has evidence of before and after. I'll take one of these instead of four hurf-and-blurf devblogs any day of the week.

You are doing what your peers are not: giving us a peek behind the curtain, not of vague promises and waffle, nor cool gee-whiz features, nor even monologues wrapped up in postdoctoral jargon, but of something we can understand and appreciate.

The rest of you please take note.

thanks

Zendoren
Aktaeon Industries
The Black Armada
Posted - 2010.08.20 20:25:00 - [33]
 

Edited by: Zendoren on 20/08/2010 20:48:33
Best blog thus far!

However, I would have liked a further explanation of how the server topology will change for TQ with the addition of these nodes. From what I remember, the original setup was Proxy Server -> Load balance server -> sol Server. How will this change with the addition of these nodes?

Also, I would have liked to see a little glimpse of CCP Soundwave's expectations for the potential performance increase from adding multi-processor support to the server-side code, coupled with these node changes.

Edit: People have been throwing around the term multi-thread instead of multi-processor. I would like to point out that threads are a way of processing tasks that take advantage of prioritization on a single processor or VM. Multi-processor is the ability to process tasks on multiple processors on a system that has more than one processor. Big difference, at least in some programming languages.

CCP Atlas

Posted - 2010.08.20 20:33:00 - [34]
 

Originally by: Ford Chicago
CCP Explorer, did you attempt to normalize the comparison against differences by day of the week in order to accurately quantify the benefit of this change or are you just showing us raw data?

This is raw data, but the 4 runs were quite similar in terms of population and usage profile.

Originally by: Ford Chicago
I also found it interesting that "up to 80% of the calls were routed elsewhere" (other than the Location node) but the cpu utilization of the Location node only dropped 5-15% points. This means that 20% of the calls are responsible for the majority of cpu usage.

Yes, that is a very good observation. This isn't giving us an 80% gain in terms of CPU since the calls that were routed elsewhere are much lighter than the ones that need to remain.
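Ford Chicago's observation can be checked with a little arithmetic. The sketch below assumes just two classes of call: a light one (the rerouted calls, cost 1 each) and a heavy one that stays on the location node. The 20x cost ratio used in the example is an illustrative assumption, not a figure from the blog.

```python
def moved_cpu_share(moved_call_fraction: float, heavy_to_light_cost: float) -> float:
    """Fraction of a node's CPU work that leaves with the rerouted calls.

    Assumes two call classes: rerouted 'light' calls costing 1 unit each,
    and remaining 'heavy' calls costing `heavy_to_light_cost` units each.
    """
    light_work = moved_call_fraction * 1.0
    heavy_work = (1.0 - moved_call_fraction) * heavy_to_light_cost
    return light_work / (light_work + heavy_work)


# 80% of calls rerouted; heavy calls assumed (hypothetically) 20x as expensive.
share = moved_cpu_share(0.8, 20.0)
print(f"{share:.1%} of the location node's CPU work moved away")
```

Under these assumptions only about a sixth of the CPU work moves even though four out of five calls do; on a node running at, say, 90% CPU, that would be a drop of roughly 15 points.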

Originally by: Ford Chicago
CCP Explorer, can you go into more detail about which types of calls generate the most cpu utilization? Which types of calls have been handed to the Character nodes besides mail? What are the 5-6 calls made on a jump event that *don't* need to be handled by the location node?


Some examples of calls that now get routed to the character nodes are lookups of characters, corps and alliances (something that happens all the time when you see someone in your overview for example), certain show-info operations, sov info, some station info, etc, etc. It's all over the place, which is the reason it hasn't been structured properly up until now. Programmers have typically thought "I have this teeny tiny call, I'll just stick it on the location node".
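To make the idea concrete, here is a minimal routing sketch. The call names, node names and the rule of picking a character node by ID modulo the pool size are all hypothetical; the post only says that light lookups like these now go to character nodes, while anything touching the simulated solar system stays on the location node.

```python
# Hypothetical names for the kinds of light calls listed above.
CHARACTER_NODE_CALLS = frozenset({
    "LookupCharacter", "LookupCorporation", "LookupAlliance",
    "GetShowInfo", "GetSovereigntyInfo", "GetStationInfo",
})


def route_call(call_name: str, subject_id: int, location_node: str,
               character_nodes: list[str]) -> str:
    """Return the name of the node that should service a call.

    subject_id is the ID being looked up (character, corp, alliance, ...).
    """
    if call_name in CHARACTER_NODE_CALLS:
        # A stable mapping: the same subject always lands on the same node,
        # which also keeps that node's cache warm for it.
        return character_nodes[subject_id % len(character_nodes)]
    # Anything that touches the simulated solar system stays put.
    return location_node


pool = [f"char-{n}" for n in range(1, 7)]  # six hypothetical character nodes
print(route_call("LookupCorporation", 98000001, "sol-17", pool))  # char-4
print(route_call("ActivateModule", 98000001, "sol-17", pool))     # sol-17
```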
Originally by: Ford Chicago
I found this to be one of the more interesting of the recent dev blogs, but even so, all it really says is that some things that used to be handled by the Location node are now handled elsewhere. As a programmer I suspect my interest is on the more technical side than the average player, but I'm frustrated with the recent "dev blogs" that seem more like marketing material.


Thanks. :) Like I mentioned, it's just a bunch of little things, most of them very light calls but they add up to a big bunch of traffic.

Zendoren
Aktaeon Industries
The Black Armada
Posted - 2010.08.20 20:33:00 - [35]
 

Originally by: Alain Kinsella
Good read, was kinda what I expected when the last patch notes came out (we implement something similar at work).

Only question I have on this post: Is the EveMail system going to be placed on its own node set? I'm always surprised that my character (who gets maybe 3-4 evemails a week) takes nearly a minute to load the screen at startup.

[And on this note, can you also explain why it's so much quicker to access EveMails through EveGate, in an OOG browser like Firefox? My assumption here is that the character node for Mail is already implemented, but only being called directly by the EveGate/Web side, not by the Client.]




Try clearing your mail cache in the Esc options menu (on the last tab).

CCP Atlas

Posted - 2010.08.20 20:41:00 - [36]
 

Originally by: James Bryant
Hey guys,

Fantastic dev blog. Certainly answers a whole slew of architectural questions that have been lingering in my head for some time.

My question is in regards to location node load balancing. The very fact that fleet fight requests are necessary seems to indicate a lack (or possibly not enough) automated load balancing of location nodes.

No doubt this is not a new idea to you all, so I am wondering what the difficulties are in implementing the capability to offload light traffic location nodes to underutilized CPUs when heavy traffic nodes start to throttle the CPU. Is there not a way to manage the connection while the handoff is being made? Or is it more an issue of detection and implementing the proper hysteresis in the algorithm (so that nodes don't start swapping around CPUs needlessly)?



Indeed. What is on our roadmap is in fact non-destructive live remapping of solar systems (i.e. remapping without kicking everyone out). I don't know when this will be a reality, but it's definitely something that we are very interested in.

With such a system in place, when a solar system you're in gets too loaded to play nicely with the other solar systems, the load balancer would kick in automatically, and you would just pause for a bit and then continue on a spiffy new node as if nothing had happened.
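James Bryant's hysteresis point can be illustrated with a small trigger that decides when a solar system should be remapped. This is a generic high/low-water-mark sketch with assumed thresholds, not CCP's load balancer, and the actual migration (the "pause for a bit" part) is not modelled.

```python
class RemapTrigger:
    """Decide when an overloaded solar system should move to another node.

    Uses two thresholds (hysteresis) so a node hovering around one value
    does not make systems bounce between CPUs. All numbers are assumptions.
    """

    def __init__(self, high_water: float = 0.90, low_water: float = 0.70,
                 sustain_samples: int = 3):
        self.high_water = high_water          # CPU fraction treated as overloaded
        self.low_water = low_water            # load must fall below this to reset
        self.sustain_samples = sustain_samples
        self._overloaded = 0                  # overloaded samples since last reset
        self._armed = True                    # at most one remap per overload episode

    def observe(self, cpu: float) -> bool:
        """Feed one CPU sample; return True if a remap should start now."""
        if cpu >= self.high_water:
            self._overloaded += 1
        elif cpu < self.low_water:
            # Load has clearly subsided: start counting afresh.
            self._overloaded = 0
            self._armed = True
        # Samples between the two thresholds change nothing (the hysteresis band).
        if self._armed and self._overloaded >= self.sustain_samples:
            self._armed = False
            return True
        return False


trigger = RemapTrigger()
samples = [0.95, 0.85, 0.92, 0.97, 0.99, 0.60, 0.95]
print([trigger.observe(s) for s in samples])
# [False, False, False, True, False, False, False]
```

A node flapping between 0.91 and 0.69 never triggers a remap here, because every dip below the low-water mark resets the count before it reaches `sustain_samples`.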

CCP Atlas

Posted - 2010.08.20 20:50:00 - [37]
 

Originally by: Zendoren
Edited by: Zendoren on 20/08/2010 20:26:16
Best blog thus far!

However, I would have liked a further explanation of how the server topology will change for TQ with the addition of these nodes. From what I remember, the original setup was Proxy Server -> Load balance server -> sol Server. How will this change with the addition of these nodes?

Also, I would have liked to see a little glimpse of CCP Soundwave's expectations for the potential performance increase from adding multi-processor support to the server-side code, coupled with these node changes.


This does not change the topology of the cluster at all, and is a perfect fit for its existing layout. From the client's point of view the network is:

Client -> Proxy -> Sol -> SQL Server

The 'Sol' tier can be any node in the cluster, while the rest of the layers are exactly 1 for each client. For the sol nodes, as you saw in Figure 1 in the blog, you maintain a virtual connection to several sols at a time depending on the request context. It's all transparent to the application logic, and pretty nifty and easy to work with. We do need to place certain restrictions on game design in order to maintain this schema, but it's the architecture that Eve was founded upon.

(I'm not mentioning above that there is a hardware load balancer in front of the proxy tier which picks a proxy for you when you connect since that will just confuse the layout)
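The "virtual connection" idea can be sketched as a per-client record on the proxy that resolves each request's context to the sol node that owns it. The context kinds, IDs and the ownership table below are hypothetical; the post does not describe how the real cluster stores this mapping.

```python
from dataclasses import dataclass, field


@dataclass
class ProxySession:
    """One client connection as seen by its (single) proxy. Hypothetical model."""
    character_id: int
    solar_system_id: int
    # Sol nodes this client currently has a virtual connection to.
    open_nodes: set[str] = field(default_factory=set)

    def resolve(self, context: str, ownership: dict[tuple[str, int], str]) -> str:
        """Pick the sol node that owns the request's context."""
        if context == "solarsystem":
            key = ("solarsystem", self.solar_system_id)
        elif context == "character":
            key = ("character", self.character_id)
        else:
            raise ValueError(f"unknown request context {context!r}")
        node = ownership[key]
        # Remember the virtual connection; the client itself only talks to the proxy.
        self.open_nodes.add(node)
        return node


# Made-up IDs and node names.
ownership = {("solarsystem", 30000001): "sol-12", ("character", 90000001): "char-3"}
session = ProxySession(character_id=90000001, solar_system_id=30000001)
session.resolve("solarsystem", ownership)  # -> "sol-12"
session.resolve("character", ownership)    # -> "char-3"
print(sorted(session.open_nodes))          # ['char-3', 'sol-12']
```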

Herschel Yamamoto
Agent-Orange
Nabaal Syndicate
Posted - 2010.08.20 21:05:00 - [38]
 

Those are some very impressive graphs you've got there. A few questions. What impact will this have on Jita - what will the new pop cap be? How does this seem to be affecting the jump-in lag that has plagued fleet fighting in recent months? And how will this affect lag in contexts other than people jumping into systems - does it speed things up for people who are in system doing things, or just on system load?

And thanks for a great week of dev blogs, all involved. I even understood like 2/3 of it.

CCP Atlas

Posted - 2010.08.20 21:16:00 - [39]
 

Originally by: Liang Nuren
Edited by: Liang Nuren on 20/08/2010 18:36:26
Awesome dev blog - this should really help a lot. It sounds like you guys are really doing a fantastic job, and I think you're all awesome.

For my own curiosity though:
- Is the bottleneck in the database (finding/updating rows) or in the processing of individual requests (like loading/manipulating objects)? It seems like if it's the second, then this is really an awesome way to handle it.
- If it's the second, is there a single character database or did you distribute characters onto different databases? If you distributed them, is it difficult to move characters between databases for load balancing purposes?
- If you distributed it, is there an archival character database for offline/inactive characters, and perhaps a series of smaller character node databases for logged in characters which replicate to the master db?

Well, I could talk shop all day, and I probably shouldn't. But I do have a more serious question - it seems to me that the "Jita Inventory System" shouldn't be required to dump someone's stuff in a station. It seems like the interactions that can be had by docked people are limited to trade windows and local chat - neither of which I can imagine being handled by the location node. It seems like it's a perfect place to further distribute. Is this an improvement you guys are planning on making or are there things I don't know about?

I got money on the second, personally.

Also: sorry for the armchair development. A very well written blog that tangentially touches on my area of expertise.

-Liang

Ed: Also, I thought I saw an email on python-dev a couple months back where Guido accepted someone's method of getting rid of the GIL.


We only have a single database. It's easier to scale that up than the sol nodes, and we're already ahead of the curve in terms of what the DB can deliver. We do cache very aggressively on the server, though, and consolidating these character node calls onto half a dozen nodes, rather than servicing them throughout the cluster, does remove a bit of the DB load since we get more cache hits. But like I said, the DB is not a big issue in this regard today. What this particular change mostly saves us is having to process relatively light and simple calls on a given node.
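The cache-hit effect described here can be reproduced with a toy simulation: the same lookups either land on whichever of many nodes the requester happens to be on, or are routed by ID to a handful of dedicated nodes, each node keeping its own cache. The node counts, cache size and request pattern are invented for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """A per-node least-recently-used cache that counts misses (i.e. DB reads)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, key: int) -> str:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        self.misses += 1
        value = f"row {key}"  # stands in for a database query
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


def db_reads(requests, num_nodes, pick_node, capacity=100):
    """Total cache misses when request i for `key` is served by pick_node(i, key)."""
    caches = [LRUCache(capacity) for _ in range(num_nodes)]
    for i, key in enumerate(requests):
        caches[pick_node(i, key)].get(key)
    return sum(cache.misses for cache in caches)


# 20 characters, each looked up 10 times.
requests = [key for _ in range(10) for key in range(20)]
spread = db_reads(requests, 50, lambda i, key: i % 50)        # whichever node the caller is on
consolidated = db_reads(requests, 6, lambda i, key: key % 6)  # routed by ID to 6 nodes
print(spread, consolidated)  # 100 20
```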

The inventory system is what lies at the heart of Jita's CPU cycles, and it's really just a glorified DB cache. Moving items about and interacting with them causes a cascade of all sorts of events that must be handled by the game systems on that node. Therefore it's not really feasible to offload parts of those operations elsewhere.
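Why such a cascade ties inventory work to one node can be seen in a tiny in-process event bus: one item move fans out into several follow-up events, each of which is just a local function call here but would become a network round trip if its handler lived on another node. The event names and handlers are hypothetical, not EVE's actual inventory events.

```python
from collections import defaultdict


class LocalEventBus:
    """Synchronous, in-process publish/subscribe (hypothetical sketch)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.dispatched = []  # every event, in the order it was handled

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, **payload):
        self.dispatched.append(event)
        for handler in self.handlers[event]:
            handler(self, **payload)  # a handler may publish follow-up events


bus = LocalEventBus()
bus.subscribe("ItemMoved", lambda b, item, dest: b.publish("CapacityChanged", container=dest))
bus.subscribe("ItemMoved", lambda b, item, dest: b.publish("ContentsChanged", container=dest))
bus.subscribe("CapacityChanged", lambda b, container: b.publish("NotifyClient", container=container))
bus.subscribe("ContentsChanged", lambda b, container: b.publish("NotifyClient", container=container))

bus.publish("ItemMoved", item=1, dest=2)
print(bus.dispatched)
# ['ItemMoved', 'CapacityChanged', 'NotifyClient', 'ContentsChanged', 'NotifyClient']
```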

Market hubs like Jita have the potential for load balancing stations separately from the solar system and other stations. That is something we are currently investigating as a possible 'end-all' fix for Jita. There is a fair bit of game design involved, however, and I'm not making any promises. :-)

Interesting tidbit about Guido-and-the-GIL. I need to google it.

Zendoren
Aktaeon Industries
The Black Armada
Posted - 2010.08.20 21:23:00 - [40]
 

Edited by: Zendoren on 20/08/2010 21:45:33
Edited by: Zendoren on 20/08/2010 21:23:59
Originally by: CCP Atlas
Originally by: Zendoren
Edited by: Zendoren on 20/08/2010 20:26:16
Best blog thus far!

However, I would have liked a further explanation of how the server topology will change for TQ with the addition of these nodes. From what I remember, the original setup was Proxy Server -> Load balance server -> sol Server. How will this change with the addition of these nodes?

Also, I would have liked to see a little glimpse of CCP Soundwave's expectations for the potential performance increase from adding multi-processor support to the server-side code, coupled with these node changes.


This does not change the topology of the cluster at all, and is a perfect fit for its existing layout. From the client's point of view the network is:

Client -> Proxy -> Sol -> SQL Server

The 'Sol' tier can be any node in the cluster, while the rest of the layers are exactly 1 for each client. For the sol nodes, as you saw in Figure 1 in the blog, you maintain a virtual connection to several sols at a time depending on the request context. It's all transparent to the application logic, and pretty nifty and easy to work with. We do need to place certain restrictions on game design in order to maintain this schema, but it's the architecture that Eve was founded upon.

(I'm not mentioning above that there is a hardware load balancer in front of the proxy tier which picks a proxy for you when you connect since that will just confuse the layout)


Sorry, I confused you with Soundwave, Atlas.

Edit: GIL

CCP Atlas

Posted - 2010.08.20 21:24:00 - [41]
 

Originally by: Herschel Yamamoto
Those are some very impressive graphs you've got there. A few questions. What impact will this have on Jita - what will the new pop cap be? How does this seem to be affecting the jump-in lag that has plagued fleet fighting in recent months? And how will this affect lag in contexts other than people jumping into systems - does it speed things up for people who are in system doing things, or just on system load?

And thanks for a great week of dev blogs, all involved. I even understood like 2/3 of it.

This change isn't going to multiply the number of people we can cram into Jita, but I'm hopeful that it will give us a 10-20% gain in population. We are taking it slow in Jita and have the population cap set at 1500 for now; we will increase it once we see Jita handling that well. We would rather see a lag-free Jita at 1500 than a laggy one at 1800.

This will have a positive impact on jump-in lag for fleets, since many of the calls that slow down jumping are now serviced immediately elsewhere, leaving the location node free to do the important bits. This isn't a fix for jump-in lag, though. We have some hopeful actual fixes (serious mitigation, anyway) in the pipeline for the immediate future. More blogs on that soon.

This sort of change will speed up utility functions that don't impact your solar system directly. Your client should seem a bit 'spiffier' when talking to the server. You won't see an fps increase but you don't have to wait as long for things like loading up the map, right clicking on other players and things of that nature. There is also a bit less for the location node to do so it has more buffer for the pew-pew.

Manfred Rickenbocker
Pan Galactic Gargle Blasters
Important Internet Spaceship League
Posted - 2010.08.20 21:38:00 - [42]
 

Edited by: Manfred Rickenbocker on 20/08/2010 21:40:38
I notice y'all don't have "Station" nodes. Is there a reason you can't break station traffic (such as docked pilots list, fittings, inventory, industry, etc.) onto a separate node from the people pew-pewing outside in their very important internet spaceships? I figure if you did that, you might be able to split your traffic between those who are station spinning and those zooming around in space.

Sered Woollahra
No Fixed Abode
LEGIO ASTARTES ARCANUM
Posted - 2010.08.20 22:08:00 - [43]
 

You know, the information contained in this series of blogs should be combined & edited into a comprehensive case study on MMO infrastructure scaling and performance troubleshooting. It would make a terrific read for anyone interested in high performance/availability environments. It may be better to wait for some concrete results though :-)

Blue Harrier
Gallente
Posted - 2010.08.20 22:17:00 - [44]
 

Originally by: CCP Atlas

Snip -

This sort of change will speed up utility functions that don't impact your solar system directly. Your client should seem a bit 'spiffier' when talking to the server. You won't see an fps increase but you don't have to wait as long for things like loading up the map, right clicking on other players and things of that nature. There is also a bit less for the location node to do so it has more buffer for the pew-pew.

Funny you should say that, but after the last patch I was talking to my son (he was in 0.0 and I was in Essence in Empire), and I asked him whether he'd noticed that the client seemed ‘snappier’ to use. He replied that he'd been thinking the same and was about to ask me.

So it looks like you're on the right track. Keep up the good work, and thanks for some great blogs that even I can understand (well, some of it Wink).

Cailais
Amarr
Nasty Pope Holding Corp
Talocan United
Posted - 2010.08.20 22:28:00 - [45]
 

I hope you guys fix EVE soon. 90% of my buddies no longer log in, and one's quit completely (I did get his stuff though ;) )

C.


TornSoul
BIG
Gentlemen's Agreement
Posted - 2010.08.20 22:37:00 - [46]
 

Quote:

We have multiple market regions living on a single node and currently four nodes servicing all the market regions. If the load on the market increases we can just increase the number of nodes dedicated to that task and decrease the number of markets on a given node.


When exactly did this happen???

I recall from far back (years) that that was one of the holy grails you were working on.
It was my impression (not announced? or me not catching it?) that this hadn't been achieved yet.

Reading the blog, it comes off as if this has been in place for some time (that's how I read it, anyhow).
Is this correct - or is it in fact part of the described change(s), i.e. a recent thing?

/confused...


---

Oh and - Awesome, awesome (series of) blog(s)

---

And seeing Oveur active on the forums again, and even torfi, really does wonders for the "karma bank account".
Please keep it up guys.


Camios
Minmatar
Sebiestor Tribe
Posted - 2010.08.20 22:39:00 - [47]
 

Edited by: Camios on 20/08/2010 22:52:22
Excellent. What are the next services you are going to "delocalize"? I read in a DevBlog some time ago that typing in local could reduce performance in fleet fights. Does it mean that chats run on the location nodes?

Xianthar
Vanishing Point.
The Initiative.
Posted - 2010.08.20 22:52:00 - [48]
 

Originally by: Liang Nuren

Ed: Also, I thought I saw an email on python-dev a couple months back where Guido accepted someone's method of getting rid of the GIL.



It would be nice if it were looked at again. I know there was a patch back around 1.5-ish that removed the GIL and implemented fine-grained locking, which lost support because performance was much worse on single-core systems and didn't start to shine until 3+ cores. But that was ~10 years ago, before quad cores were the norm and 6-8 core CPUs + hyperthreading virtual cores became the performance segment. Perhaps the trade-off makes much more sense now.

Then again, with the recent move from Python 2.5 to 2.7, CCP picked up the multiprocessing package that was added in 2.6; maybe they plan to use it for spreading out node workloads.
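For readers wondering what the GIL means in practice, here is a small sketch using `concurrent.futures`, a Python 3.2+ wrapper around `threading` and `multiprocessing` (so newer than the 2.x versions discussed here). The same CPU-bound function can be handed to a thread pool or a process pool; the prime-counting workload is only a stand-in.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(executor_cls: type[Executor], limits: list[int], workers: int = 4) -> list[int]:
    """Run count_primes over `limits` with whichever executor class is given."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


print(run_all(ThreadPoolExecutor, [10, 100, 1000]))  # [4, 25, 168]
```

With `ThreadPoolExecutor`, CPython lets only one thread execute Python bytecode at a time, so CPU-bound work like this does not get faster with more threads. Passing `ProcessPoolExecutor` (from the same module) instead, called from inside an `if __name__ == "__main__":` guard, runs each task in a separate interpreter process with its own GIL, which is what lets it use several cores.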




CCP Explorer

Posted - 2010.08.20 22:53:00 - [49]
 

Originally by: TornSoul
Quote:
We have multiple market regions living on a single node and currently four nodes servicing all the market regions. If the load on the market increases we can just increase the number of nodes dedicated to that task and decrease the number of markets on a given node.
When exactly did this happen???

I recall from far back (years) that that was one of the holy grails you were working on. It was my impression (not announced? or me not catching it?) that this hadn't been achieved yet.

Reading the blog, it comes off as if this has been in place for some time (that's how I read it, anyhow). Is this correct - or is it in fact part of the described change(s), i.e. a recent thing?
The market has been run on its own set of nodes for years.
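The scheme quoted from the blog (several market regions per node, with more nodes and fewer regions per node as load grows) can be sketched as a simple round-robin assignment. The region IDs, node names and the round-robin rule are illustrative assumptions, not how the market nodes are actually configured.

```python
def assign_regions(region_ids: list[int], nodes: list[str]) -> dict[str, list[int]]:
    """Spread market regions over market nodes; per-node region counts differ by at most one."""
    assignment: dict[str, list[int]] = {node: [] for node in nodes}
    # Sorting first makes the assignment deterministic.
    for i, region in enumerate(sorted(region_ids)):
        assignment[nodes[i % len(nodes)]].append(region)
    return assignment


regions = list(range(1, 65))  # 64 made-up region IDs
four = assign_regions(regions, [f"market-{n}" for n in range(4)])
eight = assign_regions(regions, [f"market-{n}" for n in range(8)])
print({node: len(r) for node, r in four.items()})   # 16 regions per node
print({node: len(r) for node, r in eight.items()})  # 8 regions per node
```

Plain round-robin reshuffles most regions when a node is added; a real deployment would probably want to move as few regions as possible, which this sketch does not attempt.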

TornSoul
BIG
Gentlemen's Agreement
Posted - 2010.08.20 23:05:00 - [50]
 

Originally by: CCP Explorer
The market has been run on its own set of nodes for years.

Thanks for a quick! answer.

---

But dang-nabbit..
Now I have to go find that post I made a couple of weeks ago where I made a smart remark about this not having happened yet.

Do send me an EVE mail next time Razz


CCP Atlas

Posted - 2010.08.20 23:09:00 - [51]
 

Originally by: Camios
Edited by: Camios on 20/08/2010 22:52:22
Excellent. What are the next services you are going to "delocalize"? I read in a DevBlog some time ago that typing in local could reduce performance in fleet fights. Does it mean that chats run on the location nodes?

Local chat channels run off your location node, and in the current chat architecture that's where they need to be, since the location node is the only node that knows which people are in the solar system.

There is not a massive amount of work done in the chat channel though. Typing in local doesn't impact the server much but it does play a role in whether your client recovers or not when your session is hurting.

Jim Luc
Caldari
Rule of Five
Vera Cruz Alliance
Posted - 2010.08.20 23:15:00 - [52]
 

Originally by: CCP Atlas
Originally by: Camios
Edited by: Camios on 20/08/2010 22:52:22
Excellent. What are the next services you are going to "delocalize"? I read in a DevBlog some time ago that typing in local could reduce performance in fleet fights. Does it mean that chats run on the location nodes?

Local chat channels run off your location node, and in the current chat architecture that's where they need to be, since the location node is the only node that knows which people are in the solar system.

There is not a massive amount of work done in the chat channel though. Typing in local doesn't impact the server much but it does play a role in whether your client recovers or not when your session is hurting.


What is the possibility of using a proxy node for these types of things? For instance, the location node is needed, yes, but could the location node dispatch events for when someone enters and leaves the system, while the chat and client run on a completely separate node, listening for any change in activity? It seems to me that moving chat off the location node would drastically reduce load.
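Jim Luc's suggestion can be sketched like this: the location node stays the authority on who is in a system and publishes enter/leave events, and a separate chat service builds its own membership view from those events. Everything here is hypothetical. In a real cluster the events would cross the network, so the chat node's view could briefly lag behind the location node's, which this in-process sketch does not show.

```python
from collections import defaultdict


class LocalChatNode:
    """Hypothetical stand-alone local-chat service fed by membership events."""

    def __init__(self):
        self.members = defaultdict(set)  # solar system ID -> character IDs present

    def on_membership(self, kind: str, system_id: int, character_id: int) -> None:
        if kind == "enter":
            self.members[system_id].add(character_id)
        elif kind == "leave":
            self.members[system_id].discard(character_id)
        else:
            raise ValueError(f"unknown membership event {kind!r}")

    def recipients(self, system_id: int, sender_id: int) -> set[int]:
        """Characters who should see a line typed into this system's local channel."""
        if sender_id not in self.members[system_id]:
            raise PermissionError("sender is not in this solar system")
        return set(self.members[system_id])


class LocationNode:
    """Still the source of truth for who is where; it only publishes changes."""

    def __init__(self, listeners):
        self.listeners = listeners

    def log_in(self, character_id: int, system_id: int) -> None:
        for notify in self.listeners:
            notify("enter", system_id, character_id)

    def jump(self, character_id: int, from_system: int, to_system: int) -> None:
        for notify in self.listeners:
            notify("leave", from_system, character_id)
            notify("enter", to_system, character_id)


chat = LocalChatNode()
location = LocationNode([chat.on_membership])
location.log_in(1, 100)    # character 1 starts in system 100
location.log_in(2, 200)    # character 2 starts in system 200
location.jump(1, 100, 200)
print(chat.recipients(200, sender_id=2))  # {1, 2}
```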

Mashie Saldana
Minmatar
Veto Corp
Posted - 2010.08.20 23:27:00 - [53]
 

With these new nodes, would it be possible to have dedicated AI nodes to bring Sleeper AI to all NPCs in EVE?

Alain Kinsella
Minmatar
Posted - 2010.08.20 23:49:00 - [54]
 

Originally by: CCP Atlas

Indeed. What is on our roadmap is in fact non-destructive live remapping of solar systems (i.e. remapping without kicking everyone out). I don't know when this will be a reality, but it's definitely something that we are very interested in.

With such a system in place, when a solar system you're in gets too loaded to play nicely with the other solar systems, the load balancer would kick in automatically, and you would just pause for a bit and then continue on a spiffy new node as if nothing had happened.


*ears perk up*

That sounds a lot like VMWare VMotion, or the similar 'live migrate' feature in Solaris LDOM. [Not sure if Hyper-V has something similar.]

I keep getting this vision of the Sol node as a Stackless interpreter set up to run as a mini-VM. Twisted Evil

As for Mail, cache clearing may have fixed it. Embarrassed I'm still checking through it, though (my other guy is on one of the bulk lists, which should be a reasonable test). Wink I'll file a bug report if I can reproduce it consistently (I was able to before; we'll see now).

The Paperwork
Posted - 2010.08.21 00:13:00 - [55]
 

Quote:
Local chat channels run off your location node, and in the current chat architecture that's where they need to be, since the location node is the only node that knows which people are in the solar system.


So... and I'm just spitballin' here... what about letting "local" die in a lag fire, and just having big fleet fight grid nodes?

Ephemeral Waves
Silver Snake Enterprise
Posted - 2010.08.21 00:53:00 - [56]
 

Quote:
...support fleet fights of a scale that far exceeds anything you've seen before...


We'd be happy if it would support fleet fights of a scale that we've ALREADY seen before rather than the current screwed up situation.

Frug
Omega Wing
Snatch Victory
Posted - 2010.08.21 00:59:00 - [57]
 

Dude. A bunch of beige towers leading into an original imac?

No wonder there's lag issues.


Jim Luc
Caldari
Rule of Five
Vera Cruz Alliance
Posted - 2010.08.21 01:06:00 - [58]
 

Originally by: Frug
Dude. A bunch of beige towers leading into an original imac?

No wonder there's lag issues.




I LOL'd Laughing

CCP Atlas

Posted - 2010.08.21 01:11:00 - [59]
 

Originally by: Mashie Saldana
With these new nodes, would it be possible to have dedicated AI nodes to bring Sleeper AI to all NPCs in EVE?

Actually, the NPC AI is a perfect example of a system that needs to live on the location node. However, there aren't any outstanding scalability issues with that.

There are no technical reasons why Sleeper AI or something akin to it isn't on more or all NPCs; it's a game mechanics / balancing issue, which is outside my expertise... ugh

CCP Atlas

Posted - 2010.08.21 01:12:00 - [60]
 

Originally by: Jim Luc
Originally by: Frug
Dude. A bunch of beige towers leading into an original imac?

No wonder there's lag issues.




I LOL'd Laughing


I was wondering when someone would comment on that Very Happy





These forums are archived and read-only