New Dev Blog: ing Lag: Module Lag - Why Not All Bugfixes Are A Good
 
This thread is older than 90 days and has been locked due to inactivity.


 


CCP Veritas

Posted - 2010.08.23 16:23:00 - [31]
 

Originally by: Ix Forres
Great facial expression, good dev blog. How many major fixes have come about since the introduction of the thin client platform, and when was this made internally available? What were your methods of debugging EVE before it was available?


Thanks~

The thin clients matured enough for use roughly a month ago, and while it took a fair bit of time to get them to do useful things, much has come from them already. It's hard to say how many major fixes have come from them so far, since that depends on your definition of major. Certainly most of what I talk about in this blog was only possible in the timeframe it happened because of the thin clients, and there are some nice optimizations coming down the pipe as well that originated in controlled thin client setups.

Before the thin clients we did load testing by getting as many warm bodies as possible to log in to test servers, both internally and through the public mass tests. This type of testing is fantastic for finding things you didn't think to test, but not very useful for the kind of specific drill-down testing I've been talking about above. Removing the need to use mass tests for that purpose has been very liberating, both because we can do more of it and because we can focus on higher-level information during mass tests.
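To make the contrast with mass tests concrete, here is a minimal sketch of scripted, repeatable load testing in principle: many headless "clients" each run a fixed script against a server, so one variable (here, the number of clients) can be changed between otherwise identical runs. Everything in it is hypothetical: `fake_server`, `thin_client` and `run_scenario` are invented names, the "server" is an in-process asyncio queue with a fixed per-request cost, and none of it reflects CCP's actual thin client code.

```python
import asyncio
import statistics
import time
from typing import List


async def fake_server(queue: asyncio.Queue, cost_s: float) -> None:
    """Hypothetical single-threaded server: handles one request at a time."""
    while True:
        reply = await queue.get()
        await asyncio.sleep(cost_s)  # stand-in for the CPU cost of one request
        reply.set_result(None)


async def thin_client(queue: asyncio.Queue, actions: int, latencies: List[float]) -> None:
    """Hypothetical headless client running a fixed script of `actions` requests."""
    loop = asyncio.get_running_loop()
    for _ in range(actions):
        reply = loop.create_future()
        sent = time.perf_counter()
        await queue.put(reply)
        await reply  # wait until the server has processed this request
        latencies.append(time.perf_counter() - sent)


async def run_scenario(n_clients: int, actions_per_client: int = 5,
                       cost_s: float = 0.001) -> List[float]:
    """Run one controlled scenario and return every request's latency in seconds."""
    queue: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(fake_server(queue, cost_s))
    latencies: List[float] = []
    await asyncio.gather(*(thin_client(queue, actions_per_client, latencies)
                           for _ in range(n_clients)))
    server.cancel()
    try:
        await server
    except asyncio.CancelledError:
        pass
    return latencies


if __name__ == "__main__":
    for n in (1, 10, 50):
        lat = asyncio.run(run_scenario(n))
        print(f"{n:>3} clients: mean latency {statistics.mean(lat) * 1000:.1f} ms")
```

Because the script is fixed, a change in mean latency between runs can be attributed to the one parameter that changed, which is exactly what a crowd of human testers cannot guarantee.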

Peter Tjordenskiold
The Executives
Executive Outcomes
Posted - 2010.08.23 16:25:00 - [32]
 

Wouldn't it be easy to slow down all cyclic times in a heavy load situation, like

real_cyclic_time = cyclic_time * load_factor ?

The paradigm of real time is great, but a cycle time based on load could help a little bit.
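Read literally, the suggestion is a single multiplier applied to every cycle timer on an overloaded node. A minimal sketch, assuming `load_factor` is the node's measured CPU utilisation divided by a target utilisation and clamped so cycles never run faster than nominal (both are assumptions; the post does not say how the factor would be measured):

```python
def load_factor(cpu_utilisation: float, target: float = 1.0) -> float:
    """Hypothetical slowdown factor: stays at 1.0 until utilisation exceeds `target`."""
    if target <= 0:
        raise ValueError("target must be positive")
    return max(1.0, cpu_utilisation / target)


def real_cyclic_time(cyclic_time: float, cpu_utilisation: float, target: float = 1.0) -> float:
    """Stretch a nominal cycle time by the node-wide load factor."""
    return cyclic_time * load_factor(cpu_utilisation, target)


# A 10 s cycle on a node at 60% and at 125% of its target utilisation.
print(real_cyclic_time(10.0, 0.60))  # 10.0
print(real_cyclic_time(10.0, 1.25))  # 12.5
```

Because every timer on the node is scaled by the same factor, relative timing between modules is preserved; how to measure load and how quickly the factor should react are left open by the suggestion.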


CCP Veritas

Posted - 2010.08.23 16:25:00 - [33]
 

Originally by: Dierdra Vaal
Quote:
Essentially what we have to do is re-introduce yielding in the effect processing queue, only under our control instead of at the whim of some defect.


Does this mean the module lag will get better, or will it stay the same?


I'm not really sure. At the very least it should be more consistent. I did play around a bit with how much yielding is done during the last mass test, and I think I found a good value where there is less module lag while other systems still stay reasonably responsive.
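The idea of yielding "under our control" can be illustrated with a toy cooperative scheduler. In this sketch, Python generators stand in for Stackless tasklets, and `effect_processor`, `other_system`, `run`, `simulate` and the `batch` parameter are hypothetical names; `batch` plays the role of the "how much yielding" knob being tuned. It is an illustration of the technique, not CCP's code.

```python
from collections import deque
from typing import Callable, Deque, Generator, List

Task = Generator[None, None, None]


def effect_processor(queue: Deque[Callable[[], None]], batch: int) -> Task:
    """Drain the effect queue, yielding to the scheduler after every `batch` effects (batch >= 1)."""
    processed = 0
    while queue:
        effect = queue.popleft()
        effect()
        processed += 1
        if processed % batch == 0:
            yield  # controlled yield point: other systems get a turn here


def other_system(log: List[int], clock: List[int], turns: int) -> Task:
    """Stand-in for any other system; records how many effects had run when it got a turn."""
    for _ in range(turns):
        log.append(clock[0])
        yield


def run(tasks: List[Task]) -> None:
    """Minimal round-robin cooperative scheduler."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)


def simulate(n_effects: int, batch: int) -> List[int]:
    clock = [0]  # number of effects processed so far

    def effect() -> None:
        clock[0] += 1

    queue: Deque[Callable[[], None]] = deque(effect for _ in range(n_effects))
    log: List[int] = []
    run([effect_processor(queue, batch), other_system(log, clock, turns=3)])
    return log


print(simulate(10, batch=4))    # [4, 8, 10]
print(simulate(10, batch=100))  # [10, 10, 10]
```

With `batch=4` the other system gets a turn after every four effects; with a batch larger than the queue it only runs once the whole queue has drained. A smaller batch buys responsiveness elsewhere at the price of more scheduling overhead per effect, which is the trade-off being tuned.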

Ander
Gallente
Sniggerdly
Pandemic Legion
Posted - 2010.08.23 16:34:00 - [34]
 

The problems you describe in the blog are basic multitasking / multithread issues covered in even the basic university operating system courses.

Stuff like resource starvation is an interesting but very common problem if your devs don't know exactly what they are doing.

Liang Nuren
Posted - 2010.08.23 16:35:00 - [35]
 

Edited by: Liang Nuren on 23/08/2010 16:40:03
Originally by: CCP Veritas

I've been busy profiling it under some basic scenarios and there's some low-hanging fruit to be plucked in there. There are significant algorithmic optimizations to be made as well, and those follow you around regardless of what language you're in. Language/platform shift isn't on the table until the algorithms are sound.



Smart man. Out of curiosity, are there ways to offload some of Dogma's functionality to another process, or perhaps run multiple Dogmas on different cores? Perhaps decompose its effects into things that can be parallelized? Well, good luck. I'm sure all of this has already occurred to you. :)

Quote:
In other words, I believe Dogma is doing stupid things, and I intend to beat the stupid out of it before considering giving it rocket boots.


You'd think that rockets would be someone else's job anyway. At least, I hope so! Marvelous job on the dev blog and sleuthing!

-Liang

Quote:
The problems you describe in the blog are basic multitasking / multithread issues covered in even the basic university operating system courses.


This is the most ridiculous thing I've read all day. Simply put: the differences in scale and complexity between what's covered in university OS courses and what happens in the real world are staggering.

Obsidian Hawk
RONA Corporation
RONA Directorate
Posted - 2010.08.23 16:35:00 - [36]
 

CCP Veritas - Great read, great blog, epic picture lol. If I could offer a suggestion for future tests using the thin client: add one smartbomb to all the ships and have them going off at the same time. While the tests are impressive, the damage is all one-sided; you need to receive as much as you give.

Well, anyway, it's just a thought. With all the damage being dealt to a single target that isn't shooting back, the load seems unbalanced, if that makes any sense.

Genya Arikaido
Posted - 2010.08.23 16:38:00 - [37]
 

Veritas, that face and the context in which it was made had my sides bursting for 15 minutes.

Know that it will soon become the new "face" (pardon the pun) of future forum picture posts in the context of captions ranging from "OMG!" to "FACEMELT!" and even the dreaded "WHAT HAS BEEN SEEN CANNOT BE UNSEEN!".



Axemaster
Posted - 2010.08.23 16:47:00 - [38]
 

Edited by: Axemaster on 23/08/2010 16:51:45
Some questions:

You say the system was actually getting overloaded. So where was the bottleneck happening, in the CPU?

If so, isn't this exactly the sort of problem that would benefit most from parallel computing? You are basically running a bunch of identical timing processes. Why not split them into groups and assign each group to a separate cpu core? It seems straightforward enough. It would even allow you to separate them from the location servers, speeding them up.

Actually, why not use a gpu to process this? They seem tailor-made for these kinds of massive parallel computing problems. All you'd have to do is get a specialized driver (not so hard since they are already used for this by many scientists). The cpu could send the processes over to the gpu, which would run the timers for the list of active modules. The cpu would keep track of the positions and velocities of the ships since it's a location node, and I assume that info is all stored in the RAM. The GPU, having access to the same resources, could also run the hit/miss/damage calculations.

This is hardly a spiderweb issue - guns cycling and firing only need to know the relative speed and distance of the target. It seems very easy to split this up.

Am I in the ballpark here?

Edit: The fact that you describe the processes lining up in a queue makes me feel even more that parallel computing is needed here.

Charles37
Posted - 2010.08.23 16:49:00 - [39]
 

Great blog (fantastic picture!); it's very nice to be kept in the loop on what's happening with some of these issues, and I sincerely hope that the dev team continues to do so even after this 'lag blog marathon' has ended.

Mynxee
Veto.
Veto Corp
Posted - 2010.08.23 16:50:00 - [40]
 

Edited by: Mynxee on 23/08/2010 16:51:28
Very interesting read from a process point of view (since I lack the tech savvy to comment on the technical stuff). This is quite possibly my favorite of the Dev Blogapalooza offerings so far--I really enjoyed the "how we figured stuff out" detective novel approach. And that first picture was great!

Thanks to CCP Veritas and all the other dev bloggers for acknowledging the CSM's contributions. ♥


Taudia
Gallente
Sane Industries Inc.
Initiative Mercenaries
Posted - 2010.08.23 16:53:00 - [41]
 

Originally by: Liang Nuren
Simply put: the differences in scale and complexity between what's covered in university OS courses and what happens in the real world are staggering.


Confirming this. As a student currently trying to get into the gaming industry, I find the difference between what is used in the industry and what is considered academically worthwhile staggering. The thing is, strange as it sounds, academic theory doesn't actually have to work in practice for academics to be happy with it. I am pretty sure that the Sleeper AI, while great and very complex relative to the ordinary AI and even to what is implemented in most games, is pretty trivial from an academic viewpoint. In short, when things have to work, you do something you know you can make work in reasonable time and with reasonable performance, which means that cutting-edge theory is a bad choice.

Gnulpie
Minmatar
Miner Tech
Posted - 2010.08.23 16:54:00 - [42]
 

Great blog.

Thanks for all that really nice and detailed information. This is exactly what was missing over the last month or two. Excellent to see that you are back on the right track with communication. Even though those blogs are certainly a strain on the devs (who likes writing all that stuff?), they are good and helpful for everyone.

Thanks!

Hawk TT
Caldari
Bulgarian Experienced Crackers
Posted - 2010.08.23 16:54:00 - [43]
 

Edited by: Hawk TT on 23/08/2010 16:55:18
Great blog! Nice screenshots.

For those interested in the drawbacks of the Python GIL (Global Interpreter Lock) and in Python cooperative multitasking:
David Beazley - Understanding Python GIL - PyCon 2010

This is David Beazley's presentation at PyCon 2010... who the f**k is David Beazley?!?! OK, Google him.
The presentation is fun and informative, and it shows how Python behaves on multiple CPU cores (BAAAAD!), etc.
The presentation also covers the issues when threads release control prematurely.
And there is some hope: a new GIL is being developed, unfortunately for Python 3.2+, but it could still be ported to Stackless 2.7... More or less, it seems that the GIL will survive for the time being and will not be "killed" any time soon...

Hopefully CCP is working on something of their own... GIL in Python = No Benefits from Multicore Processing
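The practical consequence of the GIL for CPU-bound code can be seen with a few lines of standard-library Python. This is a minimal sketch with made-up workload sizes (`busy`, `sequential`, `threaded` and `timed` are illustrative names): on a standard CPython build with the GIL (Stackless, as a CPython fork, has one too), the four-thread run is typically no faster than the sequential one and can be slower. The experimental free-threaded builds of Python 3.13 and later behave differently. Timings depend on machine and interpreter, so none are shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def busy(n: int) -> int:
    """Pure-Python CPU-bound loop; the running thread holds the GIL throughout."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sequential(jobs: List[int]) -> List[int]:
    return [busy(n) for n in jobs]


def threaded(jobs: List[int], workers: int = 4) -> List[int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(busy, jobs))


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Return fn()'s result together with its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2_000_000] * 4  # illustrative workload size
    seq, t_seq = timed(lambda: sequential(jobs))
    thr, t_thr = timed(lambda: threaded(jobs))
    assert seq == thr  # same answers either way; only the timing differs
    print(f"sequential: {t_seq:.2f} s, 4 threads: {t_thr:.2f} s")
```

The usual workaround in CPython is to spread CPU-bound work across processes (for example with `concurrent.futures.ProcessPoolExecutor`), at the cost of serializing data between them.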

Cedori
Shiva
Morsus Mihi
Posted - 2010.08.23 17:00:00 - [44]
 

Edited by: Cedori on 23/08/2010 17:00:55
Since you noted that drones seemed to push the server over the edge, wouldn't it be a possible optimization step to "group" drones as guns are grouped?

I know that raises some complex issues with drones taking damage etc., but with most ships fielding 5 drones, it seems that doing this could cut down on the number of calculations Dogma needs to perform by a large margin. Instead of 6 calculations per character involved in a fleet fight, you'd have 2, a decrease of ~67% (less real effect, of course, but still).
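Under the post's simplified model (one calculation for a ship's grouped guns plus one per drone, five drones per ship), the arithmetic works out as below. `calc_count` is a hypothetical helper, and the model counts calculations without weighing how expensive each one is.

```python
def calc_count(ships: int, drones_per_ship: int, grouped_drones: bool) -> int:
    """Calculations per cycle: one for the ship's grouped guns, plus its drones."""
    drone_calcs = 1 if grouped_drones else drones_per_ship
    return ships * (1 + drone_calcs)


before = calc_count(ships=1, drones_per_ship=5, grouped_drones=False)
after = calc_count(ships=1, drones_per_ship=5, grouped_drones=True)
reduction = 1 - after / before
print(before, after, f"{reduction:.0%}")  # 6 2 67%
```

Going from 6 to 2 is a two-thirds reduction (three times fewer calculations), and it only pays off if drone calculations are a large share of the load in the first place.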

P'eon
Posted - 2010.08.23 17:02:00 - [45]
 

Originally by: devblog
And then there was lag! The server I was running against was positively unhappy. But still, modules were reasonably fine - the error was not to be seen. (Remember what I said above about there being many manifestations of lag? Well, this was a different sort, but not what we needed to reproduce the 'stuck module' symptom.) Some fantastic sleuthing by my teammate CCP Masterplan led us to some simple steps that could be done to induce this error. Module delays quickly followed.


Out of curiosity, what did you have to do to induce module lag?

Darth Vapour
Posted - 2010.08.23 17:05:00 - [46]
 

Edited by: Darth Vapour on 23/08/2010 17:06:38
June 25th, 2010:
Quote:
EVE is now, from a technical standpoint, in a better state than it has ever been.


And now this:

Quote:
The "rare" error happened 1.5 million times in the month of June, 2010 on TQ.


Does this mean the error occurred even more before the server came to be in the best state it ever was or is the statement in the minutes the nonsense everyone thinks it is ?

Korerin Mayul
Amarr
Posted - 2010.08.23 17:08:00 - [47]
 

Originally by: Ix Forres
embarrassingly parallel tasks


hahahahhaaa! That's my new favorite buzzword ever.

CCP Veritas

Posted - 2010.08.23 17:08:00 - [48]
 

Originally by: Axemaster
You say the system was actually getting overloaded. So where was the bottleneck happening, in the CPU?


Yup.

Originally by: Axemaster
If so, isn't this exactly the sort of problem that would benefit most from parallel computing?


This goes back to something I said earlier in regards to language/platform shift. Going wide on a poor algorithm won't gain you as much as switching to a good algorithm, so that's where we're going to attack first. It's possible that some day making Dogma parallel will make sense as the primary approach, but today is not that day.
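A generic illustration of "a better algorithm beats going wide", using cycle timers as the example. This is not Dogma's code (the thread does not describe Dogma's data structures); `scan_due`, `heap_due` and `simulate` are hypothetical, the scenario (1,000 modules, a 10-tick cycle, staggered starts, one tick per second) is assumed, and "work" is counted as modules inspected.

```python
import heapq
from typing import Dict, List, Tuple


def scan_due(next_fire: Dict[int, float], now: float) -> Tuple[List[int], int]:
    """Naive tick: inspect every module. Returns (due module ids, modules inspected)."""
    due = [m for m, t in next_fire.items() if t <= now]
    return due, len(next_fire)


def heap_due(heap: List[Tuple[float, int]], now: float, cycle: float) -> Tuple[List[int], int]:
    """Heap tick: pop only expired timers and reschedule them one cycle later.

    Counts pops as inspections and ignores the O(log N) cost of each heap operation.
    Assumes cycle > 0 and that no timer is a full cycle overdue.
    """
    due: List[int] = []
    while heap and heap[0][0] <= now:
        t, m = heapq.heappop(heap)
        due.append(m)
        heapq.heappush(heap, (t + cycle, m))
    return due, len(due)


def simulate(n_modules: int, cycle: int, ticks: int) -> Tuple[int, int]:
    """Run both approaches side by side; return total inspections (scan, heap)."""
    # Hypothetical scenario: first activations staggered evenly across one cycle.
    next_fire = {m: float(m % cycle) for m in range(n_modules)}
    heap = [(t, m) for m, t in next_fire.items()]
    heapq.heapify(heap)
    scan_work = heap_work = 0
    for now in range(ticks):  # one tick per second, an assumed tick rate
        due_scan, w_scan = scan_due(next_fire, now)
        for m in due_scan:
            next_fire[m] += cycle
        due_heap, w_heap = heap_due(heap, now, cycle)
        assert sorted(due_scan) == sorted(due_heap)  # both find the same modules
        scan_work += w_scan
        heap_work += w_heap
    return scan_work, heap_work


print(simulate(n_modules=1000, cycle=10, ticks=100))  # (100000, 10000)
```

In this scenario, splitting the scan across four cores would still perform all 100,000 inspections (a quarter per core), whereas the better data structure removes 90% of the work on a single core.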

Originally by: Axemaster
Actually, why not use a gpu to process this?


Well, for starters, our servers haven't got GPUs :) Past that, GPU processing only pays off at the extreme end of execution-to-memory ratios: it takes a lot of time to shovel information into the GPU and get it back out relative to the speed of execution, so a problem needs heavy execution per unit of memory before it makes sense to move it to the GPU. I believe it's unlikely that this particular system fits that profile.
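The execution-to-memory argument can be written as a back-of-the-envelope cost model: offloading pays only if transfer time plus GPU compute time beats CPU compute time. The throughput constants below are illustrative placeholders, not measurements of any real server or GPU, and the model ignores overlap between transfers and compute.

```python
CPU_OPS_PER_S = 50e9      # placeholder CPU throughput
GPU_OPS_PER_S = 1000e9    # placeholder GPU throughput
LINK_BYTES_PER_S = 8e9    # placeholder host <-> GPU bandwidth


def cpu_time(ops: float) -> float:
    return ops / CPU_OPS_PER_S


def gpu_time(ops: float, bytes_moved: float) -> float:
    # bytes_moved counts data sent to the GPU plus results copied back.
    return bytes_moved / LINK_BYTES_PER_S + ops / GPU_OPS_PER_S


def offload_worthwhile(ops: float, bytes_moved: float) -> bool:
    return gpu_time(ops, bytes_moved) < cpu_time(ops)


def break_even_intensity() -> float:
    """Operations per byte moved above which the GPU wins in this model."""
    return (1 / LINK_BYTES_PER_S) / (1 / CPU_OPS_PER_S - 1 / GPU_OPS_PER_S)


print(f"{break_even_intensity():.1f} ops/byte")      # 6.6 ops/byte
print(offload_worthwhile(ops=1e6, bytes_moved=1e6))  # False (1 op per byte)
print(offload_worthwhile(ops=1e8, bytes_moved=1e6))  # True (100 ops per byte)
```

With these placeholder numbers, a workload doing about one operation per byte moved runs several times slower on the GPU once transfers are counted, however fast the GPU itself is.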

CCP Veritas

Posted - 2010.08.23 17:12:00 - [49]
 

Originally by: Cedori
Since you noted that drones seemed to push the server over the edge, wouldn't it be a possible optimization step to "group" drones as guns are grouped?


Just because they pushed load over the edge doesn't mean they're a lot of load themselves. Think of it this way, if drones add 10% CPU and player modules add 95% CPU, having folks start all modules and then launch drones would make it feel like drones are a lot of load, since having them out pushed the server over 100% CPU. However, any time spent addressing their 10% would be considerably better spent addressing the 95% elsewhere.
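The hypothetical numbers above make the point with a couple of lines of arithmetic. `load_after` is an illustrative helper; the 10% and 95% figures are the made-up example from the reply, expressed as percentages of one node's CPU.

```python
def load_after(drones: float, modules: float,
               drone_speedup: float = 1.0, module_speedup: float = 1.0) -> float:
    """Total CPU (%) after making each component `speedup` times cheaper."""
    return drones / drone_speedup + modules / module_speedup


DRONES, MODULES = 10.0, 95.0  # hypothetical shares of one node's CPU
print(load_after(DRONES, MODULES))                    # 105.0 (overloaded)
print(load_after(DRONES, MODULES, drone_speedup=2))   # 100.0 (still saturated)
print(load_after(DRONES, MODULES, module_speedup=2))  # 57.5 (comfortable headroom)
```

Even removing drone load entirely would leave the node at 95%, while halving the cost of modules drops it below 60%.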

Marlenus
Ironfleet Towing And Salvage
Posted - 2010.08.23 17:13:00 - [50]
 

Originally by: CCP Veritas
I believe Dogma is doing stupid things, and I intend to beat the stupid out of it before considering giving it rocket boots.


I am not a programmer, but this made me LOL. A vivid metaphor I can understand.

Vuk Lau
4S Corporation
Morsus Mihi
Posted - 2010.08.23 17:26:00 - [51]
 

Originally by: Darth Vapour
Edited by: Darth Vapour on 23/08/2010 17:06:38
June 25th, 2010:
Quote:
EVE is now, from a technical standpoint, in a better state than it has ever been.


And now this:

Quote:
The "rare" error happened 1.5 million times in the month of June, 2010 on TQ.


Does this mean the error occurred even more before the server came to be in the best state it ever was or is the statement in the minutes the nonsense everyone thinks it is ?



I will take the liberty of saying that I almost pooped all over Socratesz when I heard the famous "EVE is now, from a technical standpoint, in a better state than it has ever been.", so my bet is that it was a pure example of nonsense.

Nareg Maxence
Gallente
Posted - 2010.08.23 17:29:00 - [52]
 

I enjoyed this dev blog. It looks like the thin client is going to be a cornerstone in sorting out load-related issues for you.

Keiko Kobayashi
Amarr
Celestial Janissaries
Curatores Veritatis Alliance
Posted - 2010.08.23 17:33:00 - [53]
 

Edited by: Keiko Kobayashi on 23/08/2010 17:33:53
So this 'fix', will it make it so that I don't have to manually unstick and fire guns anymore? I'm fine with a bit of module lag (that is equally shared by everyone) as long as I don't have to turn off auto-repeat, ungroup guns and click target icons in order to do damage.

Especially for fleets less experienced with large-scale battles, this gun bug currently puts them at a serious disadvantage, basically reducing their damage output to near 0. Currently, if you don't recognise in time that the gun bug is in effect, you may already have lost, say, all of your logistics.

darius mclever
Posted - 2010.08.23 17:49:00 - [54]
 

Nice dev blog, and keep up the good work.

One note though... I don't share your observation from the last mass test. As soon as drones were allowed, the lag became horrible. Before them it was quite playable.

Elsa Nietzsche
Posted - 2010.08.23 17:51:00 - [55]
 

Could you explain what 'processing effects' are?

Axemaster
Posted - 2010.08.23 17:54:00 - [56]
 

Edited by: Axemaster on 23/08/2010 18:00:34
Ok, since you were nice enough to answer my previous question, I'll ask a new one:

How are position/velocity/acceleration vectors handled and updated by the location server? Are they simply updated 100 times per second? Or does the server use some sort of predictive calculus to calculate where the ship should be when something queries it (i.e. it doesn't have a position until it's asked, like in quantum mechanics)?

If it isn't using predictive calculus, surely it would benefit MASSIVELY from switching to it? In fact, I can think of exactly how you would do it, it wouldn't even be hard to figure out. You could essentially remove all nonessential updating from the server, and probably cut cpu usage by 75% or more.

Also, I don't want to sound like a broken record, but all those vector sets cry out for parallel computing...

Edit: Upon thinking about it further, I'm quite confident that given the relatively simple physics in Eve, you should be able to find analytic solutions in most cases. Possibly even in all cases, though with orbiting it gets trickier.

At the very least, analytic solutions should be possible for a very significant portion of the motion calculations in Eve. They wouldn't have to be updated, only calculated a single time when the server actually needs them. Very efficient.
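The "calculate only when needed" idea can be sketched for the simplest case: store a ship's state when it changes course, then evaluate the closed-form position p(t) = p0 + v0·dt + a·dt²/2 only on demand. This is a hypothetical model (`Motion`, `position_at`, `velocity_at` and `change_course` are invented names) that assumes constant acceleration between course changes; EVE's real movement (ships approach top speed gradually, plus orbits and collisions) is more complex, and the thread does not say how the location server actually tracks motion.

```python
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float, float]


@dataclass
class Motion:
    """Hypothetical kinematic record, written only when a ship changes course."""
    t0: float                 # time the record was written
    p0: Vec                   # position at t0
    v0: Vec                   # velocity at t0
    a: Vec = (0.0, 0.0, 0.0)  # constant acceleration since t0 (assumption)

    def position_at(self, t: float) -> Vec:
        """Closed-form p(t) = p0 + v0*dt + a*dt^2/2, evaluated only when queried."""
        dt = t - self.t0
        x, y, z = (p + v * dt + 0.5 * acc * dt * dt
                   for p, v, acc in zip(self.p0, self.v0, self.a))
        return (x, y, z)

    def velocity_at(self, t: float) -> Vec:
        dt = t - self.t0
        x, y, z = (v + acc * dt for v, acc in zip(self.v0, self.a))
        return (x, y, z)

    def change_course(self, t: float, new_a: Vec) -> "Motion":
        """Re-base the record at time t; nothing is computed between course changes."""
        return Motion(t, self.position_at(t), self.velocity_at(t), new_a)


ship = Motion(t0=0.0, p0=(0.0, 0.0, 0.0), v0=(100.0, 0.0, 0.0))
print(ship.position_at(30.0))    # (3000.0, 0.0, 0.0)
turned = ship.change_course(30.0, new_a=(0.0, 10.0, 0.0))
print(turned.position_at(32.0))  # (3200.0, 20.0, 0.0)
```

Between course changes the record costs nothing to maintain; the trade-off is that every query pays a small evaluation cost, and motion without a closed form (the orbiting case mentioned above) would still need numerical updates.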

CCP Warlock

Posted - 2010.08.23 17:54:00 - [57]
 

Originally by: Darth Vapour
Edited by: Darth Vapour on 23/08/2010 17:06:38
June 25th, 2010:
Quote:
EVE is now, from a technical standpoint, in a better state than it has ever been.


And now this:

Quote:
The "rare" error happened 1.5 million times in the month of June, 2010 on TQ.


Does this mean the error occurred even more before the server came to be in the best state it ever was or is the statement in the minutes the nonsense everyone thinks it is ?



Well, technical perfection is like everything else in life, you can always do better.

But let's just say, that there's a little samizdat activity in CCP at the moment to provide an appropriately inappropriate t-shirt in order to properly memorialise that particular entry into the great list of "They can't hit an elephant from ther...." remarks.

Onwards. Upwards. To the stars!

Kesper North
Caldari
Gentlemen of Means
Gentlemen's Agreement
Posted - 2010.08.23 18:00:00 - [58]
 

Can you explain why you are using Maelstroms as a laser platform?

Evelgrivion
Gunpoint Diplomacy
Posted - 2010.08.23 18:01:00 - [59]
 

Edited by: Evelgrivion on 23/08/2010 18:01:14
Originally by: CCP Warlock
Originally by: Darth Vapour
Edited by: Darth Vapour on 23/08/2010 17:06:38
June 25th, 2010:
Quote:
EVE is now, from a technical standpoint, in a better state than it has ever been.


And now this:

Quote:
The "rare" error happened 1.5 million times in the month of June, 2010 on TQ.


Does this mean the error occurred even more before the server came to be in the best state it ever was or is the statement in the minutes the nonsense everyone thinks it is ?



Well, technical perfection is like everything else in life, you can always do better.

But let's just say, that there's a little samizdat activity in CCP at the moment to provide an appropriately inappropriate t-shirt in order to properly memorialise that particular entry into the great list of "They can't hit an elephant from ther...." remarks.

Onwards. Upwards. To the stars!



Here's an idea for a shirt quote:

"From a technical standpoint, the game has never been in ҉҉̡̢̡̢̛̛̖̗̘̙̜̝̞̟̠̖̗̘̙̜̝̞̟̠̊̋̌̍̎̏̐̑̒̓̔̊̋ ͡҉҉a better҉҉̡̢̡̢̛̛̖̗̘̙̜̝̞̟̠̖̗̘̙̜̝̞̟̠̊̋̌̍̎̏̐̑̒̓̔̊̋̌̍̎̏̐̑ ͡҉҉ sta͡҉҉ ̵̡̢̛̗̘̙̜̝̞̟̠͇̊̋̌̍̎̏̿̿̿ ҉҉̡̢̡̢̛̛̖̗̘̙̜̝̞̟̠̖̗̘̙̜̝̞̟̠̊̋̌̍̎̏̐̑̒̓̔̊̋̌̍̎̏̐͡҉)-"

Lumy
Minmatar
Sebiestor Tribe
Posted - 2010.08.23 18:01:00 - [60]
 

Edited by: Lumy on 23/08/2010 18:02:09
Originally by: CCP Warlock
Well, technical perfection is like everything else in life, you can always do better.

But let's just say, that there's a little samizdat activity in CCP at the moment to provide an appropriately inappropriate t-shirt in order to properly memorialise that particular entry into the great list of "They can't hit an elephant from ther...." remarks.

Onwards. Upwards. To the stars!

The public demands photos.





 



These forums are archived and read-only