New Dev Blog: Fixing Lag: Character Nodes
 
This thread is older than 90 days and has been locked due to inactivity.


 
Pages: [1] 2 3 4 5

Author Topic

CCP Fallout

Posted - 2010.08.20 18:04:00 - [1]
 

Our "Fixing Lag" series continues with CCP Atlas' blog on character nodes, which you can read here.

Orephia
Posted - 2010.08.20 18:08:00 - [2]
 

Thanks!

& first??

Qoi
Exert Force
Posted - 2010.08.20 18:23:00 - [3]
 

Great read, thanks a lot.

Looks really reasonable. I'll look forward to fleet fight numbers once the dreaded black screen issue has been resolved :)

Hieronomus
Posted - 2010.08.20 18:23:00 - [4]
 

Fourth!!!

hi

Dacil Arandur
Habitual Euthanasia
Pandemic Legion
Posted - 2010.08.20 18:25:00 - [5]
 

Thanks for the blog!
Seems like a very smart way to take some pressure off the solar system "location nodes."
I also really like the idea of other services not directly related to the location being free of lag even if the location itself is overloaded.

Thanks again for keeping us informed!

Meissa Anunthiel
Redshift Industrial
Rooks and Kings
Posted - 2010.08.20 18:28:00 - [6]
 

Better legends for the graphs would be appreciated, I have absolutely no clue what I'm looking at. Care to say what each colored line is?

Thanks a lot for the devblog however (and the character nodes).

Alain Kinsella
Minmatar
Posted - 2010.08.20 18:29:00 - [7]
 

Good read, was kinda what I expected when the last patch notes came out (we implement something similar at work).

Only question I have on this post: Is the EveMail system going to be placed on its own node set? I'm always surprised that my character (who gets maybe 3-4 evemails a week) takes nearly a minute to load the screen at startup.

[And on this note, can you also explain why it's so much quicker to access EveMails through EveGate, in an OOG browser like Firefox? My assumption here is that the character node for Mail is already implemented, but only being called directly by the EveGate/Web side, not by the Client.]


Master Akira
Shiva
Morsus Mihi
Posted - 2010.08.20 18:29:00 - [8]
 

Originally by: CCP Atlas
Our hope is that very soon our beloved Tranquility will be able to support fleet fights of a scale that far exceeds anything you've seen before, hopefully going beyond the roof of roughly one thousand on a dedicated node.


Now that's a bold statement.

This was a very interesting blog, with interesting solutions to the given problem. My question then would be:

Are you guys already working on moving the load of a single solar system to multiple cores if needed? Are you guys already biting the bullet of going multithreaded? Because it seems you will HAVE to do it at some point whether you like it or not, and Oveur already stated that it was a "first step" to do...

Liang Nuren
Posted - 2010.08.20 18:36:00 - [9]
 

Edited by: Liang Nuren on 20/08/2010 18:36:26
Awesome dev blog - this should really help a lot. It sounds like you guys are really doing a fantastic job, and I think you're all awesome.

For my own curiosity though:
- Is the bottleneck in the database (finding/updating rows) or in the processing of individual requests (like loading/manipulating objects)? If it's the second, then this seems like a really awesome way to handle it.
- If it's the second, is there a single character database or did you distribute characters onto different databases? If you distributed them, is it difficult to move characters between databases for load balancing purposes?
- If you distributed it, is there an archival character database for offline/inactive characters, and perhaps a series of smaller character node databases for logged in characters which replicate to the master db?

Well, I could talk shop all day, and I probably shouldn't. But I do have a more serious question - it seems to me that the "Jita Inventory System" shouldn't be required to dump someone's stuff in a station. It seems like the interactions that can be had by docked people are limited to trade windows and local chat - neither of which I can imagine being handled by the location node. It seems like it's a perfect place to further distribute. Is this an improvement you guys are planning on making or are there things I don't know about?

I got money on the second, personally.

Also: sorry for the armchair development. A very well written blog that tangentially touches on my area of expertise.

-Liang

Ed: Also, I thought I saw an email on python-dev a couple months back where Guido accepted someone's method of getting rid of the GIL.

Aranial
Gallente
Empyrean Warriors
The Obsidian Front
Posted - 2010.08.20 18:42:00 - [10]
 

Wahey! More mental nom nom :D

CCP Explorer

Posted - 2010.08.20 18:50:00 - [11]
 

Please note that the network traffic and CPU usage graphs were inadvertently swapped and didn't match their captions. I've fixed that now.

Daedalus II
Helios Research
Posted - 2010.08.20 18:50:00 - [12]
 

How much memory does a typical system use? How long would it take to copy all that to another node? Would it be possible for example to measure if one system spikes, then temporarily pause all other systems on that node for a few ticks and copy them to a node with more resources left? If we're talking one or a few seconds I'm sure people would accept the game lagging a few seconds if it means they don't have to be on the same node as a 1000 man fleet fight.

Nye Jaran
Posted - 2010.08.20 18:52:00 - [13]
 

Originally by: CCP Explorer
Please note that the network traffic and CPU usage graphs were inadvertently swapped and didn't match their captions. I've fixed that now.


The same graph (network traffic) shows twice now.

Master Akira
Shiva
Morsus Mihi
Posted - 2010.08.20 18:52:00 - [14]
 

Originally by: CCP Explorer
Please note that the network traffic and CPU usage graphs were inadvertently swapped and didn't match their captions. I've fixed that now.


You duplicated the images.

CCP Explorer

Posted - 2010.08.20 18:54:00 - [15]
 

Truly fixed now.

Liang Nuren
Posted - 2010.08.20 18:55:00 - [16]
 

Originally by: CCP Explorer
Truly fixed now.


It's why you're a director instead of a dev. >:) (Thanks for the fix)

-Liang

CCP Explorer

Posted - 2010.08.20 18:58:00 - [17]
 

Originally by: Meissa Anunthiel
Better legends for the graphs would be appreciated, I have absolutely no clue what I'm looking at. Care to say what each colored line is?

There are larger versions of the images available by clicking them.

Figure #2 is the number of net read calls made in a given time period. Up to 80% of the calls were routed away from the Jita location node to the character nodes.

Figure #3 is the CPU usage on the Jita location node before and after.

Lower lines are "after" and lower is better.
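
As a rough illustration of the routing described above, the sketch below sends character-bound calls (such as mail) to a character node chosen by character ID and leaves everything else on the location node. Every name here (`CHARACTER_SERVICES`, `route_call`, the node labels, the modulo scheme) is hypothetical and not taken from CCP's code, and apart from mail the listed services are assumed examples. It only shows the idea of taking calls off the location node.

```python
# Hypothetical sketch: which node handles a call. Not CCP's actual code.
CHARACTER_SERVICES = {"mail", "skills", "contacts"}   # only "mail" is from the thread
NUM_CHARACTER_NODES = 4                                # assumed cluster size


def route_call(service, character_id, solar_system):
    """Return a label for the node that should handle this call."""
    if service in CHARACTER_SERVICES:
        # Character-bound work does not depend on where the pilot is.
        return f"character-node-{character_id % NUM_CHARACTER_NODES}"
    # Everything else (e.g. ship movement) stays with the solar system.
    return f"location-node-{solar_system}"


calls = [
    ("mail", 1001, "Jita"),
    ("skills", 1002, "Jita"),
    ("contacts", 1003, "Jita"),
    ("mail", 1004, "Jita"),
    ("movement", 1001, "Jita"),
]
routed = [route_call(*call) for call in calls]
off_location = sum(1 for node in routed if node.startswith("character-node"))
print(f"{off_location} of {len(calls)} calls left the Jita location node")
```

In this made-up sample, four of the five calls never touch the Jita location node, which is the same kind of effect Figure #2 reports.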

CCP Explorer

Posted - 2010.08.20 19:01:00 - [18]
 

Originally by: Alain Kinsella
Good read, was kinda what I expected when the last patch notes came out (we implement something similar at work).

Only question I have on this post: Is the EveMail system going to be placed on its own node set? I'm always surprised that my character (who gets maybe 3-4 evemails a week) takes nearly a minute to load the screen at startup.

[And on this note, can you also explain why it's so much quicker to access EveMails through EveGate, in an OOG browser like Firefox? My assumption here is that the character node for Mail is already implemented, but only being called directly by the EveGate/Web side, not by the Client.]

EVE Mail is on the Character Nodes. In the first iteration we implemented Mail Nodes for EVE Mail but they then became Character Nodes in the second iteration and started hosting other services. I'll mention your concern to the devs.

Korerin Mayul
Amarr
Posted - 2010.08.20 19:10:00 - [19]
 

Lovely work!
It must have been soul-destroying re-routing all those calls, but the scalability gains make it the kind of work that our children's children will thank you for!

Every time you do stuff like this, EVE gets a little bit smarter. That iteration is one of the reasons I'm still playing.
Keep up the good work (after a few good beers perhaps).

Kaliba Mort
Minmatar
Dark-Rising
Executive Outcomes
Posted - 2010.08.20 19:11:00 - [20]
 

Is it at all possible in the near to mid-term future to make the Location node (e.g. interactions of ships on the same grid) a multi-threaded node? Or at least make it multi-process, sharing data via shared memory?


Hawk TT
Caldari
Bulgarian Experienced Crackers
Posted - 2010.08.20 19:17:00 - [21]
 

Great work! Keep going!

Could you (by any chance) share with us which services are still to be migrated from the Location nodes to the Character nodes? Or will this be posted in an upcoming blog?

Cheers!

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.20 19:18:00 - [22]
 

This is a good blog. I liked reading it. I want to know more about the underlying distribution, what goes where, the load differences, communication overhead, etc., but I doubt I'll ever get it. Never mind, this method (splitting off functional entities from other functional entities) is a method that works, up to a point. Beyond that, you'll have nothing further to split away and will have to go look at multi-core partitioning, but for now it will help. Keep going (if you still can) ...

Malcanis
Caldari
Vanishing Point.
The Initiative.
Posted - 2010.08.20 19:27:00 - [23]
 

I am very much appreciating this new, communicative CCP. Obviously we're not going to see DevBlogs and DevPosts sustained at quite this rate, but I hope the CCP staff are going to carry on this way.

Actual Information beats the hell out of speculation

And I also think there has been a great improvement in the mood of the playerbase. We're still very much waiting on real results, but a lot of us are feeling a lot more positive and optimistic that we'll get them.

CCP Explorer

Posted - 2010.08.20 19:32:00 - [24]
 

Originally by: Malcanis
I am very much appreciating this new, communicative CCP. Obviously we're not going to see DevBlogs and DevPosts sustained at quite this rate, but I hope the CCP staff are going to carry on this way.

Actual Information beats the hell out of speculation

And I also think there has been a great improvement in the mood of the playerbase. We're still very much waiting on real results, but a lot of us are feeling a lot more positive and optimistic that we'll get them.

I do want to submit that this blog contains real live-on-TQ results (phase #2 of these changes was deployed to TQ last week on 12 August, phase #3 was a part of Tyrannis 1.0.4 this week on 18 August).

In addition there are dev blogs in the pipelines from other devs with other such results.

Callipygian Provocateur
Posted - 2010.08.20 19:36:00 - [25]
 

Edited by: Callipygian Provocateur on 20/08/2010 19:38:10
I'm honestly a bit surprised to hear that the location node code is, at the very least, governed by Python. I would have expected more of the server side code to be lower level. However, since you mentioned running into the GIL, and because the servers are running Windows, I'm curious if there has been any exploration of IronPython.

From my understanding, at least for some applications, it's faster than CPython on Windows. It also isn't hindered by the pesky GIL and allows easy access to native Windows threads. And there was an effort underway to slip some JIT compilation under the surface. Perhaps some interest from a 'large' customer might even reinvigorate Microsoft's interest in the project.

*Edit*
Also, thanks for yet another awesome blog post. I <720 (that's <3!! [yes, I'm a math nerd]) this kind of info.

ShadowMaster
Gallente
Posted - 2010.08.20 19:52:00 - [26]
 

Thank you again for yet another amazing dev blog. Looking forward to the next one today.

Ford Chicago
Pandemic Legion
Posted - 2010.08.20 19:53:00 - [27]
 

Meissa Anunthiel, the dev blog states that each line is "the number of calls made onto the Jita [location] node during a 24 hour period". It is a bit confusing because the legend does not list the Node IDs sequentially. This would have been obvious if the legend had simply had dates instead of an internal node ID.

I think that the lines in Figure 2 cover four sequential days; note that the difference in the numbers is approximately 200 which roughly corresponds with the number of nodes in the cluster. If so, it makes comparisons a bit difficult as there are known load differences on different days of the week. CCP Explorer, did you attempt to normalize the comparison against differences by day of the week in order to accurately quantify the benefit of this change or are you just showing us raw data?


I also found it interesting that "up to 80% of the calls were routed elsewhere" (other than the Location node) but the CPU utilization of the Location node only dropped 5-15 percentage points. This suggests that the remaining 20% of the calls are responsible for the majority of CPU usage.
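
That inference can be made concrete with a small calculation. The baseline CPU figure below is an assumption chosen for illustration (the thread gives only the 80% call reduction and the 5-15 point drop), and the sketch also assumes that all CPU use on the node comes from handling calls, so the resulting numbers are an example, not a measurement.

```python
# Illustrative numbers only: cpu_before is assumed, not taken from the blog.
calls_removed_fraction = 0.80   # "up to 80% of the calls" (from the thread)
cpu_before = 50.0               # assumed CPU usage before, in percent
cpu_drop_points = 10.0          # middle of the quoted 5-15 point drop

cpu_after = cpu_before - cpu_drop_points
# Share of the original CPU cost attributable to the calls that remain.
remaining_cpu_share = cpu_after / cpu_before
remaining_calls_fraction = 1.0 - calls_removed_fraction

# Average cost of a remaining call relative to a removed call.
cost_ratio = (remaining_cpu_share / remaining_calls_fraction) / (
    (1.0 - remaining_cpu_share) / calls_removed_fraction
)

print(f"remaining calls: {remaining_calls_fraction:.0%} of calls, "
      f"{remaining_cpu_share:.0%} of CPU")
print(f"a remaining call costs about {cost_ratio:.0f}x a removed call")
```

Under these assumed numbers the calls that stayed on the Location node are far more expensive on average than the ones that moved, which is why removing most of the calls removes only a modest slice of the CPU load.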

CCP Explorer, can you go into more detail about which types of calls generate the most CPU utilization? Which types of calls have been handed to the Character nodes besides mail? What are the 5-6 calls made on a jump event that *don't* need to be handled by the location node?

I found this to be one of the more interesting of the recent dev blogs, but even so, all it really says is that some things that used to be handled by the Location node are now handled elsewhere. As a programmer I suspect my interest is on the more technical side than the average player, but I'm frustrated with the recent "dev blogs" that seem more like marketing material.

Agrilad
Posted - 2010.08.20 20:07:00 - [28]
 

A thought just occurred to me that I am sure has occurred to y'all.

Why, if there are 4 calls that always have to be made on every jump, don't you combine them into 1 call, so that the 4 separate round trips over the net don't have to occur?

What are those 4 calls?


Was distracted but was given a second to think.
I may have answered my own question. Perhaps those graphs and data aren't the calls over the internet, but the calls inside your proxies and load balancers. So perhaps the call to make a jump is a single call from the client, but takes 4 separate calls to 4 different nodes to complete.
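
That second reading can be sketched as follows: the client makes one request, and the server side fans it out into several internal calls. The four node names are invented for the example (the thread does not say what the calls are), and `internal_call` is only a stand-in for whatever node-to-node mechanism the cluster uses.

```python
# Hypothetical sketch of one client call fanning out to several internal calls.
from collections import Counter

internal_calls = Counter()   # counts calls received per (made-up) node


def internal_call(node, request):
    """Stand-in for a node-to-node call inside the cluster."""
    internal_calls[node] += 1
    return f"{node} handled {request}"


def handle_jump(character_id, destination):
    """One request from the client; four calls behind the proxy."""
    request = f"jump of {character_id} to {destination}"
    nodes = ["location-old", "location-new", "character", "chat"]  # assumed
    return [internal_call(node, request) for node in nodes]


client_requests = 1
results = handle_jump(1001, "Jita")
total_internal = sum(internal_calls.values())
print(f"{client_requests} client request -> {total_internal} internal calls")
```

If the graphs count the internal calls, then merging them on the client side would not change anything; the saving comes from deciding which node has to receive each one.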

James Bryant
Posted - 2010.08.20 20:17:00 - [29]
 

Hey guys,

Fantastic dev blog. Certainly answers a whole slew of architectural questions that have been lingering in my head for some time.

My question is in regards to location node load balancing. The very fact that fleet fight requests are necessary seems to indicate a lack (or possibly not enough) automated load balancing of location nodes.

No doubt this is not a new idea to you all, so I am wondering what the difficulties are in implementing the capability to offload light traffic location nodes to underutilized CPUs when heavy traffic nodes start to throttle the CPU. Is there not a way to manage the connection while the handoff is being made? Or is it more an issue of detection and implementing the proper hysteresis in the algorithm (so that nodes don't start swapping around CPUs needlessly)?

-JB
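
The hysteresis idea raised above can be sketched with two thresholds instead of one: offloading starts only above a high-water mark and stops only below a lower one, so a load hovering around a single cutoff does not cause constant swapping. The threshold values and the function name are assumptions made for this example, not anything CCP has described.

```python
# Hypothetical hysteresis rule for deciding when to move a solar system.
HIGH_WATER = 0.85   # assumed: start offloading above this CPU load
LOW_WATER = 0.60    # assumed: stop offloading only once below this


def update_offloading(cpu_load, offloading):
    """Return the new offloading state given the current load and state."""
    if not offloading and cpu_load > HIGH_WATER:
        return True
    if offloading and cpu_load < LOW_WATER:
        return False
    return offloading   # inside the band: keep doing what we were doing


loads = [0.50, 0.90, 0.80, 0.70, 0.86, 0.55, 0.70]
state = False
history = []
for load in loads:
    state = update_offloading(load, state)
    history.append(state)
print(history)
```

With these sample loads the state flips only twice, even though the load crosses the middle of the band several times.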

Ix Forres
Caldari
Righteous Chaps
Posted - 2010.08.20 20:20:00 - [30]
 

Originally by: Callipygian Provocateur
Edited by: Callipygian Provocateur on 20/08/2010 19:38:10
I'm honestly a bit surprised to hear that the location node code is, at the very least, governed by Python. I would have expected more of the server side code to be lower level. However, since you mentioned running into the GIL, and because the servers are running Windows, I'm curious if there has been any exploration of IronPython.

From my understanding, at least for some applications, it's faster than CPython on Windows. It also isn't hindered by the pesky GIL and allows easy access to native Windows threads. And there was an effort underway to slip some JIT compilation under the surface. Perhaps some interest from a 'large' customer might even reinvigorate Microsoft's interest in the project.

*Edit*
Also, thanks for yet another awesome blog post. I <720 (that's <3!! [yes, I'm a math nerd]) this kind of info.


Nearly all of EVE is written in Python, specifically a variant called Stackless Python. IronPython has a number of significant drawbacks to go with the small positives, and last time I heard the project was being slowly abandoned by MS along with IronRuby. Stackless, however, is being developed heavily by people inside CCP; this is a big enough deal that they write their own version of Python for all intents and purposes. I don't think that any benefits would outweigh the huge work required to port away from it, not to mention losing things like StacklessIO, which have been major CCP projects in the past to deliver significant IO improvements. This is one area where CCP really is on top of the game.

There's also the fact that quite a lot of the stuff on the server _can't_ be done in a thread-friendly way, and anything that could be threaded would still be limited to running on one core, since that's how the LBUs work (as I understand it). So there's no advantage to threading over their existing methodology, which is to use Stackless tasklets (which are like threads, without the overhead). An equivalent in Ruby would be fibers; I'm sure there are equivalents in other languages.
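
For readers unfamiliar with tasklets, the cooperative style can be approximated in standard Python with generators. This is not Stackless Python's API (no `stackless` module is used here); it is a minimal round-robin scheduler, written as an analogy, in which each task runs until it voluntarily yields, all on a single OS thread.

```python
# Cooperative multitasking with plain generators (an analogy, not Stackless).
from collections import deque

log = []


def worker(name, steps):
    """A task that does a little work, then yields control to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield   # voluntary switch point; the scheduler never preempts


def run(tasks):
    """Round-robin scheduler: resume each task in turn until all finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)   # not finished, put it back in line
        except StopIteration:
            pass                 # task is done, drop it


run([worker("a", 2), worker("b", 3)])
print(log)
```

The two tasks interleave their steps without any OS threads being created, which is the property the post attributes to tasklets. It also shows the limitation mentioned above: everything still runs on one core.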

Back to the blog post: there's a lot of great info, and the separation of more code from the Location node to other nodes for the purposes of load balancing is an interesting approach to take, and seems to be delivering. While this will of course help, what further steps are being undertaken to improve performance and to either decrease load on the Location nodes, or split up the processing tasks on the node under load? What about transparent node movements and other ideas that've been thrown around in the past; has anything come of these, or are other methods being focused on before attacking those potentially more time-consuming issues?

Either way, great informative blog from Atlas, a nice read.




 

