New Dev Blog by Explorer - StacklessIO and lag reduction
 
This thread is older than 90 days and has been locked due to inactivity.


 
Pages: 1 2 [3] 4 5 6

Author Topic

Jacob Goods
Goods Industries
Posted - 2008.09.27 05:19:00 - [61]
 

One of the more interesting articles I've read in a very long time. Thanks!

Beltantis Torrence
Wolfsbrigade
ShadowWolves.net
Posted - 2008.09.27 06:15:00 - [62]
 

Impressive stuff guys, well done.

Suitonia
Gallente
Genos Occidere
HYDRA RELOADED
Posted - 2008.09.27 06:29:00 - [63]
 

Nice work.

Franga
NQX Innovations
Posted - 2008.09.27 07:21:00 - [64]
 

Originally by: DigitalCommunist
This blog was good, more like this!


He speaks truth. This is the blog we were looking for!

GOOD DEV BLOG! Great improvements by the looks of things. I wait to hear what it's like in the bigger fleet fights of FW and 0.0 alliances.

/me touches CCP.

Disteeler
Perkone
Posted - 2008.09.27 07:22:00 - [65]
 

Wow, those blogs make a difference! Nice technology. Couple that with InfiniBand and you have a winning model for an MMO server farm.

Malcanis
Caldari
Vanishing Point.
The Initiative.
Posted - 2008.09.27 07:29:00 - [66]
 

Originally by: porkbelly
Originally by: Bartholomeus Crane
The results certainly look promising, but I would like to know what StacklessIO actually does...

Ah, yes.
As the dev primarily responsible I should probably write a technical blog about it. Meanwhile I'll offer that StacklessIO is a framework that allows us to make things such as asynchronous IO and work that is spawned off to worker threads appear as regular, blocking operations for tasklets in Stackless Python. We then use this to perform asynchronous Winsock operations using IO completion ports. The semantics are not new, but the scheduling framework and the lightweight winsock layer we use are.


Also +1 for having a great nick

Ancy Denaries
Posted - 2008.09.27 07:40:00 - [67]
 

Most impressed both by the improvements and the devblog. I approve of this service :D Very very nice to hear that this benefits the whole cluster, and not just a few nodes. Very nice work and congratulations on pulling it off!

Chienka
Di-Tron Heavy Industries
Atlas Alliance
Posted - 2008.09.27 08:45:00 - [68]
 

Any more accounts on how much this is affecting the lag in blob warfare?

Brock Nelson
Posted - 2008.09.27 09:16:00 - [69]
 

Edited by: Brock Nelson on 27/09/2008 09:19:41
I'm looking at the Average Cluster Ping Time chart and have one question about it.

The spike using old technologies between 20:24 and 21:36, would you say that is due to the number of players during that time frame?

If that's the case, how is it possible that StacklessIO maintains the average ping time regardless of the number of players in Jita? Edit: Would the average ping not correlate to the number of players in Jita?

Also, CCP Explorer mentioned that some of the nodes are already running 64 bit; how does the performance of those nodes compare against the 32 bit nodes? I know that performance data may be biased (i.e. more load in one node vs another).

I was wondering what exactly is causing performance issues in any node or, in this case, Jita. I know it's due to 1400-something people, but does the demand on the node break down into something more specific? Such as people putting more people on block b/c of scams? People accessing the Jita market? Players just sitting in station? Chatroom? etc.

Don't forget to copyright "StacklessIO" YARRRR!!

iudex
Posted - 2008.09.27 09:17:00 - [70]
 

I don't know if this has something to do with the new technology, but I noticed a new problem which started a few days or weeks ago.

Although there is no lag in my mission hub (except when reprocessing big amounts of modules, which now can take up to 5 minutes), the auto-repeat modules sometimes (actually quite often these last days) have a strange ******ation: the launcher's rof interval can be up to twice as long, so it sometimes takes up to 15 seconds between missile launches, where the theoretical rof (the one shown on launcher info) is 8.3 sec.
Also the shield booster takes more seconds to activate. When there is no such effect, my perma-boosting shield tank remains cap-stable at 40%; with this new strange lag effect it's sometimes up to 70-80% (which can only mean that the shield booster's cycle time is much longer than usual). Apart from that I don't see any lag; although there are over 100 people in that mission hub (Irjunen), the fps stays around 40-60.

Maybe this has nothing to do with StacklessIO, but this phenomenon is rather new. It's hard to detect for mission runners who don't use permatanks and FoF launchers (if you reactivate them a lot you might not notice the actual rof increase), so maybe there is a connection between that new phenomenon and the new technology.

Brock Nelson
Posted - 2008.09.27 09:22:00 - [71]
 

iudex, I think the StacklessIO patch was applied only to nodes with heavy demands such as Jita. I understand that mission hubs have large demand but nothing compares to Jita

Vim
Spiritus Draconis
Posted - 2008.09.27 09:56:00 - [72]
 

I approve of what you're doing with my sparkling purple me!

CCP Explorer

Posted - 2008.09.27 10:31:00 - [73]
 

Originally by: Brock Nelson
iudex, I think the StacklessIO patch was applied only to nodes with heavy demands such as Jita. I understand that mission hubs have large demand but nothing compares to Jita
StacklessIO was applied everywhere on the cluster. It is also coming to a client near you next Tuesday when EVE Online: Empyrean Age 1.1.1 will be released.

CCP Lingorm


C C P
Posted - 2008.09.27 10:31:00 - [74]
 

Originally by: Brock Nelson
iudex, I think the StacklessIO patch was applied only to nodes with heavy demands such as Jita. I understand that mission hubs have large demand but nothing compares to Jita


StacklessIO was applied to all nodes in the Tranquility Cluster, not just to high load nodes.

The 64bit compile of EVE has been deployed to some high load systems and we are actively monitoring the performance of these systems.

Onyx Asablot
The Vicious Circle
Posted - 2008.09.27 10:50:00 - [75]
 

Great job, thank you very much CCP.

Radeberger
Caldari
I Care...... Seriously i do
Posted - 2008.09.27 11:03:00 - [76]
 

After reading the dev blog it came to me that it seems you now have a way to measure lag.

If that is the case, will that mean that you will use this on a more permanent basis? Will that also mean that you can begin to reimburse people who lose their ships etc. in laggy fleet fights?

Other than that sure looks impressive, very nice work.

CCP Explorer

Posted - 2008.09.27 11:03:00 - [77]
 

Edited by: CCP Explorer on 27/09/2008 17:48:41
Originally by: Brock Nelson
The spike using old technologies between 20:24 and 21:36, would you say that is due to the number of players during that time frame? If that's the case, how is it possible that StacklessIO maintains the average ping time regardless of the number of players in Jita? Edit: Would the average ping not correlate to the number of players in Jita?
The old technology exhibited very particular behaviour when the number of pilots crossed a certain threshold. StacklessIO provides much better performance that is less correlated to player numbers.
Quote:
Also, CCP Explorer mentioned that some of the nodes are already running 64 bit; how does the performance of those nodes compare against the 32 bit nodes? I know that performance data may be biased (i.e. more load in one node vs another).
It's too early to tell what the performance gain will be; we have had native 64 bit code running for only a few days now on selected nodes in the cluster. The biggest advantage is scalability, the ability to use more memory.

The normal setup in the cluster is that a blade has two 64 bit processors, 4 GB of memory and runs Windows Server 2003 x64 (we are planning an upgrade to Windows HPC Server 2008). Each blade runs two nodes and each node then hosts a number of solar systems. There are also dedicated nodes for the market (each market blade runs three market nodes), dedicated nodes for corporation services, a dedicated head node for the cluster, etc.

Finally there is a pool of dedicated dual-CPU machines that only run a single node per machine. Jita and four other solar systems are assigned to that pool. That pool is now running all native 64 bit code and the blades have been upgraded to 16 GB of memory.
Quote:
I was wondering what exactly is causing performance issues in any node or, in this case, Jita. I know it's due to 1400-something people, but does the demand on the node break down into something more specific? Such as people putting more people on block b/c of scams? People accessing the Jita market? Players just sitting in station? Chatroom? etc.
... all of the above? The Inventory System is high on the list in Jita as players move a large number of items between hangar and cargo. The Aggression Manager and Damage Tracking and Calculation System are low on the list in Jita.
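
For readers who want the hardware layout from this post in one place, here is a rough sketch in Python. It only restates the figures given above (blades with 4 GB running two nodes, market blades running three nodes, dedicated machines with 16 GB running one node). The class, the constant names and the helper function are made up for illustration, the market blade's memory is an assumption (the post does not state it), and the per-node figure is a naive division that ignores the operating system's own memory use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """One physical machine in the cluster (hypothetical model, not CCP's)."""
    role: str
    memory_gb: int
    nodes: int


# Figures taken from the post above.
STANDARD_BLADE = Machine(role="solar systems", memory_gb=4, nodes=2)
DEDICATED_MACHINE = Machine(role="Jita and other high-load systems", memory_gb=16, nodes=1)

# ASSUMPTION: the post gives the node count for market blades but not their
# memory; 4 GB is borrowed from the "normal setup" purely for illustration.
MARKET_BLADE = Machine(role="market", memory_gb=4, nodes=3)


def memory_per_node_gb(machine: Machine) -> float:
    """Naive upper bound: total memory split evenly across the nodes."""
    return machine.memory_gb / machine.nodes


if __name__ == "__main__":
    for m in (STANDARD_BLADE, MARKET_BLADE, DEDICATED_MACHINE):
        print(f"{m.role}: about {memory_per_node_gb(m):.1f} GB per node")
```

On these numbers a node on a standard blade has at most about 2 GB to itself, while a node in the dedicated pool has the whole 16 GB, which is the scalability point made above.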

Mors Magne
Astral Adventure
Posted - 2008.09.27 11:06:00 - [78]
 

This is all very exciting!


Whiskey Girl
Posted - 2008.09.27 11:12:00 - [79]
 

Nice. Now for fleet battles; that should be an improved area.

ArchenTheGreat
Caldari
Pulsar Nebulah
Army of Lovers.
Posted - 2008.09.27 11:22:00 - [80]
 

Originally by: Taedrin

"tasklets"
"Winsock operations"
"IO completion ports"



Tasklets - as far as I remember they are lightweight threads, something Stackless Python is good at
Winsock - http://en.wikipedia.org/wiki/Winsock
IO completion ports - http://technet.microsoft.com/en-us/sysinternals/bb963891.aspx
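
To make the idea from the quoted dev reply a bit more concrete (asynchronous work handed to worker threads that looks like an ordinary blocking call to a lightweight task), here is a minimal sketch. It is NOT StacklessIO and it does not use Stackless Python, Winsock or IO completion ports: it uses the standard-library asyncio module (Python 3.9+) as a stand-in, with coroutines playing the role of tasklets. The names slow_io, tasklet and run_demo are invented for this example.

```python
import asyncio
import time


def slow_io(name: str, delay: float) -> str:
    """Stand-in for a blocking operation; it runs on a worker thread."""
    time.sleep(delay)
    return f"{name} done"


async def tasklet(name: str, delay: float, log: list) -> str:
    """A lightweight task. The await reads like a normal blocking call,
    but while it waits the scheduler is free to run other tasks."""
    log.append(f"{name} start")
    result = await asyncio.to_thread(slow_io, name, delay)
    log.append(result)
    return result


async def main():
    log = []
    # Both tasks are in flight at once even though each one is written
    # as straight-line code.
    results = await asyncio.gather(
        tasklet("a", 0.2, log),
        tasklet("b", 0.1, log),
    )
    return results, log


def run_demo():
    return asyncio.run(main())


if __name__ == "__main__":
    results, log = run_demo()
    print(results)
    print(log)
```

The point of the sketch is only the programming model: each task is written as if it blocks, and the scheduling layer underneath hides the fact that the work completes asynchronously.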

Dudley Beekle
Posted - 2008.09.27 11:32:00 - [81]
 

Nice work CCP, good to see there's life left in the cluster yet

Lucas Avignon
Avignon Associates Inc.
Posted - 2008.09.27 11:43:00 - [82]
 

Best dev blog of all time \o/

Wow, this is amazing. I was going to ask whether the memory problem had to do with 32bit code, but it seems you are already deploying 64bit code server and client wide.

I had become impatient with CCP about lack of info regarding your tackling of the lag problem, of course now there are blogs coming left right and center and actual solutions being implemented to reduce lag and improve the server performance.

There have been a lot of dev posts on this issue lately which is great and I look forward to hearing more, especially with regards to client/server code changes that can allow multi threading and allow Solar systems to run dynamically on multiple cores.

As well the code changes that will allow the implementation of infiniband and what is planned for the server hardware upgrade.

For me this is the most interesting and important project in Eve, far overshadowing anything ambulation will bring.


DaDutchDude
Agony Unleashed
Agony Empire
Posted - 2008.09.27 12:00:00 - [83]
 

Holy [censored] Batman! Dev Goodness Galore!

1) Very nice blog and nice dev responses with lots of new info! Kudos to CCP!

2) Very nice improvements! It looks like some important problems for not only Jita but also fleet battles have been squashed! Double kudos CCP, esp. to the developers in the basework doing the magic server voodoo stuff nobody is usually aware of.

3) Isn't the name StacklessIO a bit ... uhm ... confusing?
As far as I understand it (or guesstimate), you picked the name since it fits in nicely with the features Stackless Python offers, although it's actually more about non-blocking IO by adding more support for asynchronous and parallel communications. Am I right?

4) Memory usage is the new bottleneck? Halp!
a) I guess for systems like Jita and the other one node systems, it is easy to predict the memory load will be high, so with 64-bit code and 16GB, this seems well under control.
But how about massive 0.0 battles? Now that the network IO is no longer the bottleneck, this could become a problem with the unpredictable nature of massive 0.0 fleets. From what I understand, they could still cause out of memory problems in systems that have less hardware available.
I know the memory usage has been lowered by your hard work over the past week, but are you taking additional steps to prevent this? Are more systems being updated with more memory and 64bit code, or do you have another trick up your sleeve? And can you perhaps make failures more graceful than just a node crashing?

b) I'm noticing that the memory usage on the nodes on the last graph only goes up and never back down until downtime, despite the fact that downtime is not the busiest time of the day. It makes me wonder about memory leaks and if this is the reason you require a daily server reboot. Could you tell us more about that?

5) Removing old bottlenecks --> find new ones!
I know from personal experience that removing one bottleneck can lead to the next system failing. We've already seen the memory issue, but are there any other things you found out about your systems / software / infrastructure by speeding up IO?

6) Windows Server 2008 HPC: ORLY?
I read somewhere about 2008 HPC and it looks like it would fit well with CCP, but can you tell us more about what you are specifically hoping to gain from this upgrade?

Looking forward to your responses!

Deva Blackfire
Viziam
Posted - 2008.09.27 12:01:00 - [84]
 

Originally by: Karbowiak
So, this must be why i hardly felt any lag in JU- tonight with ~300 in system doing battle..

epic.. :D

FLEET BATTLES HERE I COME \o/


Pretty much the same question - was JU- covered by this "magic stuff" yesterday? We had 350 in local and lag was minimal (at one point it spiked for me and blocked EVERYTHING for like 10 seconds, but after this the highest lag I had was 3-5 seconds). Also we were using drones around - launching drones mid fight took me 15-20 seconds tops.

Blazde
4S Corporation
Morsus Mihi
Posted - 2008.09.27 12:42:00 - [85]
 

Edited by: Blazde on 27/09/2008 12:54:03
I'll hold off full enthusiasm until the long term effect on fleet battles is clearer but it's most definitely fantastic to see core technology being overhauled like this, and cool devblogs to boot

Couple of observations so far:

Of the two big battles I've experienced since the change (WH-JCA on 16th 22:00-01:00 local peaked at 440 and M-OEE8 21st 21:00-22:00 local around 250) both had very different lag conditions to what I'd usually expect. My ship was generally more responsive and client seemed to stay updated more often. No staring at an empty grid for 5 minutes before loading, no 5 minute activation lag.

However there was plenty of new weirdness, in the 1st:
- Several people jumping in to system and ending up inside the station, ship intact
- Several people blackscreening for an hour during login/undock and their ship only appearing ingame when they logged off
- One report of a ship remaining in space for 45 minutes (after which GMs intervened) after log-off in a nearby quiet system after leaving the battle. This bug isn't totally new but I've never seen it so extreme as this.
- A few reports of 'ghost' ships.
A good example:
Quote:
the systems around there are still quite messed up. We've been probing and destroying ships that logged off hours ago. When they blow up, a wreck appears, but the ship remains there and becomes unlockable. There's a bob hurricane that has been sitting on a gate in WH for several hours now, and a few jumps down towards TVN, a phantom tri gang on a gate. You jump in and there's about 5 or 6 ships sitting around the gate, all unlockable, and none of them are in local. If you bump them, they slowboat back to where they were.


In the 2nd battle and a very similar one about an hour before there was:
- Huge numbers of unlockable 'ghost' ships, mostly where ships had died.
- Two occurrences of what felt like an almost-but-not-quite node death, requiring everyone to relog during which many dead (and damaged) ships returned to full health, in many cases after insurance had already been paid for the lost ships.
- Around 50-75% of killmails never being generated.

I hate to say it but it was categorically worse than before StacklessIO was introduced. I'm assuming it's minor teething problems and will be (or has been) quickly diagnosed and fixed, I'm just saying don't be too fast to relax and sign it off as a job well done

More generally I'm a little concerned about your attention to Jita. Lag conditions in Jita are presumably pretty repeatable, or predictable rather, so it serves as a good baseline metric to measure changes against. But I really wish you'd analyse fleet battles (lowsec FW and 0.0) more closely than it appears you do. The weird unexpected bugs and lag conditions are more likely to crop up in fleet warfare because players tend to create less commonplace, 'extreme' node stress conditions (as a simple example it will never happen that hundreds of ships simultaneously activate mwd in close proximity in Jita).

There are still big issues with excess lag created by carriers. There are very common desyncs that occur when capital sized ships collide that put people's supercaps in danger on a daily basis. That ships often remain in space after log off longer than they should in lag is also very serious. We as players can't submit simple repro steps and get these things fixed. It requires careful logging and analysis of real fleet situations server side.

I strongly suspect that more attention to these situations would uncover performance bugs that could improve all of EVE. Even if they don't, fleet battles are simply a more critical aspect of EVE than the performance of Jita. At the end of the day I can schedule my trade activities for off-peak times and a little lag in Jita is no big deal compared to its effect on territorial warfare.

But as I said thumbs up, good to see substantial changes even if it's a bit hairy for a while.

Deva Blackfire
Viziam
Posted - 2008.09.27 13:27:00 - [86]
 

Quote:
- Huge numbers of unlockable 'ghost' ships, mostly where ships had died.


O yea that is the "new" problem. Yesterday's melee in JU- generated a LOT of ghost ships. We had around 30-40 ghost ships on the grid at some point.

Felysta Sandorn
Celestial Apocalypse
Posted - 2008.09.27 13:31:00 - [87]
 

Originally by: porkbelly
Originally by: Bartholomeus Crane
The results certainly look promising, but I would like to know what StacklessIO actually does...

Ah, yes.
As the dev primarily responsible I should probably write a technical blog about it. Meanwhile I'll offer that StacklessIO is a framework that allows us to make things such as asynchronous IO and work that is spawned off to worker threads appear as regular, blocking operations for tasklets in Stackless Python. We then use this to perform asynchronous Winsock operations using IO completion ports. The semantics are not new, but the scheduling framework and the lightweight winsock layer we use are.


Yeah, I did a similar thing in VB last Tuesday when I was bored at work...

Robacz
Essence Enterprises
Posted - 2008.09.27 14:06:00 - [88]
 

Very nice blog, lots of interesting info! As a Jita resident, I am very happy to hear about this new technology you developed, so please let me give you a big fat

GOOD JOB


Mallikanth
Posted - 2008.09.27 14:21:00 - [89]
 

I love techie blogs because of these few points...

1) The 'technical' players who question CCP's revelations about their (CCP's) code and/or servers. They make me laugh.

2) I have not heard of any other company, let alone an MMO one, that gives as much techie information to the player base as CCP do. Much Kudos to CCP

3) These Techie blogs are always superb. I especially like this one.

Nice one
/end fan boy post

Disteeler
Perkone
Posted - 2008.09.27 14:44:00 - [90]
 

Edited by: Disteeler on 27/09/2008 14:46:17
Originally by: CCP Explorer
The normal setup in the cluster is that a blade has two 64 bit processors, 4 GB of memory and runs Windows Server 2003 x64 (we are planning an upgrade to Windows HPC Server 2008). Each blade runs two nodes and each node then hosts a number of solar systems. There are also dedicated nodes for the market (each market blade runs three market nodes), dedicated nodes for corporation services, a dedicated head node for the cluster, etc. Finally there is a pool of dedicated dual-CPU machines that only run a single node per machine. Jita and three other solar systems are assigned to that pool. That pool is now running all native 64 bit code and the blades have been upgraded to 16 GB of memory.


Knowledge makes customers less whiny, I hope. Anyway, thanks for the info, I love to know tech specs \o/

BTW, 16GB in Jita means... more people to Jita, BBQ!!

