EVE Information Portal
New Dev Blog: Fostering meaningful human interaction, through testing
 
This thread is older than 90 days and has been locked due to inactivity.


 
Pages: 1 [2] 3 4 5 6


Stick Cult
Posted - 2010.08.16 19:36:00 - [31]
 

Originally by: Genya Arikaido
Informative.

Quote:
Improvements to the Planetary Interaction UI, making it easier to use


DETAILS!! If PI gets rid of the clickfest, I may try it again.

No, I think he means before it was released. Oh god, you should've seen it before Tyrannis was released...

CCP Warlock

Posted - 2010.08.16 19:37:00 - [32]
 

Originally by: Bomberlocks
I also have a question about the testing and earlier comments from CCP devs about dynamic allocation of server resources to nodes. How is this coming along, and how do these tests influence that process?

Originally by: ccp_warlock

The mass tests generally don't have much influence on this. We monitor load during them, and use that information to look at particular systems if we see them misbehaving themselves. In general we try to load balance systems across nodes, and that's game systems as well as solar systems, but a lot of that is done manually after looking at cluster performance data. More dynamic load balancing is something we are looking at closely, but carefully. There can be some very interesting hysteresis effects if that isn't handled very carefully - the load gets moved off somewhere else, that server then gets overloaded, load gets moved back...
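The ping-pong effect described here (load moved off a node, the receiving node overloads, load moved back) is the classic failure mode of a naive threshold rebalancer. Below is a minimal sketch of one common way to damp it, not CCP's balancer: a node only sheds load above a high-water mark, a target only accepts load if it stays below a lower mark afterwards, and a system that just moved is left alone for a cooldown period. The thresholds, the `Cluster` class and the loads are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds on a node's total load, normalised so 1.0 = saturated.
HIGH_WATER = 0.85  # above this a node counts as overloaded
LOW_WATER = 0.60   # a target node must stay below this after accepting a system
COOLDOWN = 3       # rebalancing rounds a system must wait before moving again


@dataclass
class Cluster:
    nodes: dict  # node name -> {system name: load that system puts on the node}
    last_moved: dict = field(default_factory=dict)  # system -> round of last move

    def load(self, node):
        return sum(self.nodes[node].values())

    def rebalance(self, round_no):
        """Move at most one system off each overloaded node, with hysteresis."""
        moves = []
        for src in list(self.nodes):
            others = [n for n in self.nodes if n != src]
            if self.load(src) <= HIGH_WATER or not others:
                continue
            # Try the lightest systems first: they are the cheapest to migrate.
            candidates = sorted(self.nodes[src].items(), key=lambda kv: kv[1])
            for system, sys_load in candidates:
                if round_no - self.last_moved.get(system, -COOLDOWN) < COOLDOWN:
                    continue  # moved recently; leave it alone to avoid ping-pong
                dst = min(others, key=self.load)
                # The gap between HIGH_WATER and LOW_WATER is the hysteresis band:
                # a move must not simply create a new overloaded node.
                if self.load(dst) + sys_load < LOW_WATER:
                    self.nodes[dst][system] = self.nodes[src].pop(system)
                    self.last_moved[system] = round_no
                    moves.append((system, src, dst))
                    break
        return moves


# Usage with made-up loads: node A is overloaded, node B has headroom.
cluster = Cluster({"A": {"Jita": 0.5, "Perimeter": 0.2, "Niyabainen": 0.2},
                   "B": {"Amarr": 0.3}})
print(cluster.rebalance(0))  # [('Perimeter', 'A', 'B')]
print(cluster.rebalance(1))  # []  (A is now about 0.7, inside the band, so nothing moves back)
```

The band between the two marks is what stops the oscillation: after a move, neither node sits close enough to the trigger point for a small fluctuation to send the system straight back.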


Originally by: bomberlocks

Thank you very much for taking the time to answer my questions. I presume you have a lot of edge cases in this process that need very careful handling. I think a follow-up question would be: what kind of load prediction would you use to pre-empt server overloading?



My pleasure. I wish it was that simple. A lot of load prediction has to be done at the design stage, where we review designs from the perspective of how much load we expect them to create, and make sure that the cluster can handle that. Sometimes that means a re-design either in the game mechanics or their implementation, sometimes that means more machines in the cluster, often both.

Predicting load in real time and adapting to it on the fly is one of the great challenges of distributed computing. It's much harder in practice than in theory, except for a small subset of fairly tractable problems.

Originally by: CCP Warlock

Originally by: Bomberlocks
I wanted to edit my question above, but I think I'd rather make it another separate question:

Could you perhaps explain the process that takes place when a session change occurs, such as jumping through a gate? I don't mean the tiny details, just an overview of the client-server communication and of what goes on at the server during this process.


Quite a lot is the short answer. Once the client has requested a jump, the server has to arbitrate communication with all the sub-systems that are involved with that client, and make sure that they have all done what they individually need to do to transfer the client to the new system. Some of those systems will be running on the destination system, some elsewhere in the cluster, so there is a fair bit of inter-process communication involved. Since clients can individually be interacting with different systems (some clients are in Fleets, for example; most aren't), it can get quite involved.
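"Make sure they have all done what they individually need to do" is the shape of a prepare-then-commit (two-phase) handshake. The sketch below is that textbook pattern, not CCP's actual jump code: the `Subsystem` class, its `prepare`/`commit`/`abort` methods and the character id are hypothetical. If any participant fails to prepare, every participant that already prepared is rolled back, so no component ends up believing the jump happened while another believes it did not.

```python
class Subsystem:
    """A hypothetical participant in a jump (e.g. ship state, fleet, location)."""

    def __init__(self, name, will_fail=False):
        self.name = name
        self.will_fail = will_fail  # simulate a participant that cannot comply
        self.state = "idle"

    def prepare(self, char_id, dest):
        # Validate and stage the change without making it visible yet.
        if self.will_fail:
            return False
        self.state = f"prepared:{char_id}->{dest}"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "idle"


def jump(char_id, dest, subsystems):
    """Phase 1: prepare every subsystem. Phase 2: commit only if all succeeded."""
    prepared = []
    for sub in subsystems:
        if not sub.prepare(char_id, dest):
            for p in prepared:  # roll back everyone who already staged the jump
                p.abort()
            return False
        prepared.append(sub)
    for sub in prepared:
        sub.commit()
    return True


# Every participant prepares, so the jump commits everywhere.
subs = [Subsystem("ship"), Subsystem("fleet"), Subsystem("location")]
print(jump(90000001, "Perimeter", subs), [s.state for s in subs])
# True ['committed', 'committed', 'committed']

# One participant refuses, so the one that had already prepared is rolled back.
subs = [Subsystem("ship"), Subsystem("fleet", will_fail=True), Subsystem("location")]
print(jump(90000001, "Perimeter", subs), [s.state for s in subs])
# False ['idle', 'idle', 'idle']
```

This sketch runs in one process and ignores the genuinely hard part: a coordinator or participant crashing, or a message being lost, between the two phases. That is where real distributed implementations need timeouts, logging and recovery.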

Originally by: bomberlocks

If it isn't too much trouble, could you tell me how error-tolerant these processes are? I ask this in the context of, for example, a ship sticking on a gate, where the "jump session handler", if I may call it that, seems to think a jump has occurred whereas the process that handles the ship state seems to think it hasn't.


I'll be talking about that a little bit in my devblog coming up.

Meissa Anunthiel
Redshift Industrial
Rooks and Kings
Posted - 2010.08.16 19:37:00 - [33]
 

Originally by: Jack Gilligan
Nice update, but one thing I would urge you guys to do is to NOT put all your eggs in the CSM basket. As we've seen, CSM members often have their OWN agendas that do not match the goals of the player base as a whole, which makes it all the more remarkable that they were pretty much unified in their "FIX THE DAMN LAG" message.

Don't use the CSM as an excuse to not take feedback from people who aren't in it.



CCP welcomes feedback from everyone, CSM and not. So *DO* post your feedback on the mass testing.

Also, thanks a lot Tanis and the plethora of CCP people present during the mass test, keep up the good work (and get us rid of the lag pretty please with sugar on top)...

Genya Arikaido
Posted - 2010.08.16 19:39:00 - [34]
 

Originally by: Stick Cult
Originally by: Genya Arikaido
Informative.

Quote:
Improvements to the Planetary Interaction UI, making it easier to use


DETAILS!! If PI gets rid of the clickfest, I may try it again.

No, I think he means before it was released. Oh god, you should've seen it before Tyrannis was released...


I did. I was right there with everyone else on Sisi trying to figure it out and work up manufacturing lines. OMG what a mess...

CCP Tanis

Posted - 2010.08.16 19:40:00 - [35]
 

Originally by: Stick Cult
Originally by: Genya Arikaido
Informative.

Quote:
Improvements to the Planetary Interaction UI, making it easier to use


DETAILS!! If PI gets rid of the clickfest, I may try it again.

No, I think he means before it was released. Oh god, you should've seen it before Tyrannis was released...


I did indeed mean "before it was released".

That said, I know we're looking at reducing/removing the "click-fest" that's currently required in the PI UI. That's not my team though, so I do not have any details, or timetables to give. Also, I could be totally wrong. Either way, this is definitely something we're aware of and it isn't being ignored.

Genya Arikaido
Posted - 2010.08.16 19:42:00 - [36]
 

Edited by: Genya Arikaido on 16/08/2010 19:42:50
Originally by: CCP Tanis
...this is definitely something we're aware of and it isn't being ignored.


This is good. I was afraid that the logs would show nothing and that everything is working as intended. Laughing

Can you try to answer my bigger question on page 1 though, Tanis? Please?

Sarina Berghil
Minmatar
New Zion Judge Advocate
Yulai Federation
Posted - 2010.08.16 19:45:00 - [37]
 

Very interesting blog.

I sometimes wonder if you are looking into ways of controlling or 'herding' player movements, to spread out load.

It seems to me in most MMOs players have a natural instinct to stack up in the same spot, either due to game mechanics or for social reasons. Very few systems would be able to handle that I guess.

Many MMOs go to great lengths to try to design a reasonable spread of players, both to keep most of the game world active and to prevent performance bottlenecks in the long run.

Faolan Fortune
Posted - 2010.08.16 19:52:00 - [38]
 

Can't say I understood all of it, but it was a good read regardless; long, straight-to-the-point dev blogs are the best.

The Sisi installer is an idea I very much like; I've put off joining in on Sisi quite a few times just out of laziness about making it all work. I've also missed a lot of testing events because my memory sucks: I see it a week in advance and think 'Yeah I'll join in, it'll be a laugh', and then realise the day after that I missed it.
Those are my excuses for not turning up, but I think you're on the right track with some kind of 'reward' scheme. There are probably quite a few people who see it as a waste of time, from 'I could waste 3 hours doing tests or spend 3 hours making ISK' to 'I don't pay to test'. There are lots of varied reasons why people don't do things, and that's why you have to bribe them Laughing

Problem is, what can you reward?


Genya Arikaido
Posted - 2010.08.16 19:57:00 - [39]
 

Edited by: Genya Arikaido on 16/08/2010 19:58:58
Originally by: Sarina Berghil
Very interesting blog.

I sometimes wonder if you are looking into ways of controlling or 'herding' player movements, to spread out load.

It seems to me in most MMOs players have a natural instinct to stack up in the same spot, either due to game mechanics or for social reasons. Very few systems would be able to handle that I guess.

Many MMOs go to great lengths to try to design a reasonable spread of players, both to keep most of the game world active and to prevent performance bottlenecks in the long run.


A simple (in terms of design, not necessarily implementation) fix for the Jita problem would be storefronts. If you wanted or needed to trade more than 2-3 buy/sell orders in a single station at a time, you'd have to rent a storefront, whose rent would be based on the same algorithm used for offices, plus the total quantity of orders (buy and sell) that you want to use it for. (Just be sure that there are more than just 20-30 storefronts per station... omg, what a mess.)

This has 2 effects;
First, mass buyers and sellers spread out more...a LOT more. Due to distances, it may even cause new trade hubs to appear that we've never considered before. Some backwater system 6-7 jumps from Jita with an avg population of 5 would suddenly be useful because it has a lot of stations with manufacturing in it...and cheaply. This is actually the very thing that created Jita, though there the catalyst was the removal of the old highways and the much longer travel times to the former trade hub, Yulai (Never Forget! Never Forgotten! Long live Yulai!)

Second, it provides branding for players. "Chirriba's Veldspar Emporium!" sticks in the mind, and if a way was made to link these storefronts by naming them all the same should the player wish to, we could create chain stores. Imagine...for the traders in EVE, owning the EVE Wal-Mart analogue would be awesome.

Anyway...random idea/solution for a given problem. Oh, and did I mention CCP looked at these once before? Yeah, they got dropped from an expansion, promised for Later™, and forgotten. Surprised? Not really.

Stick Cult
Posted - 2010.08.16 19:59:00 - [40]
 

Originally by: Faolan Fortune
Problem is, what can you reward?

sp pools on Singularity! Very Happy

Bomberlocks
Minmatar
CTRL-Q
Posted - 2010.08.16 19:59:00 - [41]
 

Originally by: CCP Warlock
....(abbreviated for clarity)
My pleasure. I wish it was that simple. A lot of load prediction has to be done at the design stage, where we review designs from the perspective of how much load we expect them to create, and make sure that the cluster can handle that. Sometimes that means a re-design either in the game mechanics or their implementation, sometimes that means more machines in the cluster, often both.

Predicting load in real time and adapting to it on the fly is one of the great challenges of distributed computing. It's much harder in practice than in theory, except for a small subset of fairly tractable problems.


Thanks again. Someone further up brought up the issue of an overloaded system resource having a backwash effect on neighbouring systems, particularly in the 0.0 game regions. If I'm not hogging your attention too much, would it not be a feasible workaround in dynamic allocation to at least move the other resources sharing that node somewhere else when one system's resource is staggering under load? As someone said, I think players would accept a break of a minute or so while the resources were being moved if they knew what it was for.
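The workaround proposed here, evacuating the quiet neighbours rather than the overloaded system itself, can be sketched as a small greedy reassignment. This is an illustration of the idea as posed, not anything CCP has described: the function name, the system/node names and the load figures are hypothetical, and each reassignment in the result stands for one migration, i.e. one short pause for the players in that system.

```python
def evacuate_neighbours(assignment, loads, nodes, hot_system):
    """Return a new system -> node map in which every system sharing a node
    with `hot_system` is moved to the currently least-loaded other node.

    assignment: system -> node; loads: system -> load;
    nodes: every node name, including empty spares.
    """
    hot_node = assignment[hot_system]
    node_load = {n: 0.0 for n in nodes}
    for system, node in assignment.items():
        node_load[node] += loads[system]
    others = [n for n in nodes if n != hot_node]
    new_assignment = dict(assignment)
    if not others:
        return new_assignment  # nowhere to move anything to
    # Heaviest neighbours first, so they land on the emptiest nodes.
    neighbours = sorted(
        (s for s, n in assignment.items() if n == hot_node and s != hot_system),
        key=lambda s: loads[s],
        reverse=True,
    )
    for system in neighbours:
        target = min(others, key=lambda n: node_load[n])
        new_assignment[system] = target
        node_load[target] += loads[system]
    return new_assignment


# Made-up loads: a fleet fight has made "HotSys" heavy on node n1.
assignment = {"HotSys": "n1", "Quiet1": "n1", "Quiet2": "n1", "Busy": "n2"}
loads = {"HotSys": 0.9, "Quiet1": 0.1, "Quiet2": 0.05, "Busy": 0.4}
print(evacuate_neighbours(assignment, loads, ["n1", "n2", "n3"], "HotSys"))
# {'HotSys': 'n1', 'Quiet1': 'n3', 'Quiet2': 'n3', 'Busy': 'n2'}
```

Moving the small systems rather than the hot one keeps the fight itself on its node, so only the bystanders see a migration pause. The sketch does not model the cost of actually transferring session state, which is the expensive part in practice.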
Originally by: CCP Warlock

...
I'll be talking about that a little bit in my devblog coming up.

I'll be looking forward to it greatly.

CCP Tanis

Posted - 2010.08.16 20:00:00 - [42]
 

To Genya: I've pointed your question to our Core Server Team, who are far better equipped to answer than I would be.

To Sarina: You've hit on one of my pet interests, actually. I admit it, I'm a huge geek and large-scale social behavior gets my juices flowing. The "herding mentality" and how it affects EVE is something I've put quite a bit of thought into over the years. Though I cannot say with any authority what the game designers are currently thinking on that front, as I'm in QA, I can say that it will never be solved by a single thing; it's simply too ingrained a part of how humans work in large numbers. This phenomenon is certainly accounted for in parts of how we put the mass-testing program together, though.

Genya Arikaido
Posted - 2010.08.16 20:03:00 - [43]
 

Edited by: Genya Arikaido on 16/08/2010 20:04:21
Originally by: CCP Tanis
To Genya: I've pointed your question to our Core Server Team, who are far better equipped to answer than I would be.


Thanks! Looking forward to an explanation either in the form of an additional devblog or a post here, or whatever. CCP communicating with us candidly and openly (again) can only be epic on an epic scale of 4TW. Very Happy

CCP Warlock

Posted - 2010.08.16 20:04:00 - [44]
 

Edited by: CCP Warlock on 16/08/2010 20:04:51
Originally by: Genya Arikaido
Informative.

Quote:
Improvements to the Planetary Interaction UI, making it easier to use


DETAILS!! If PI gets rid of the clickfest, I may try it again.


My sole question is...what's being done about the star system resource allocation? This being where theoretically a single star system could consume all node resources in the entire cluster, moving the glass ceiling from the limits of a single node to the limits of the entire cluster's hardware. Naturally you'd want to impose some artificial limit, or only partially apply the dynamic reallocation of resources to underused systems, but if you could do this...fixing lag would be as simple as adding more blades.

From what I understand though, this isn't so much a limitation of EVE's architecture as it is of the design of the cluster's hardware architecture in the way it manages sessions between nodes. The primary problem being that transferring a session from one node to another while in the same system would create a moment of "loading...please wait" while flying around, as the session data is not globally available due to access speed problems.

Please, correct the crap out of me if I'm wrong. Most of this is just what I've gathered from devblogs and presentations at Fanfests. Though I do have some direct education in HPC in the process of earning my CompSci Degree (with minor in astrophysics...no joke, I love this stuff). I just don't necessarily get all the hardware limitations of an HPC cluster design.



Very good question. At the risk of pre-empting my blog tomorrow.

At the moment we run on single cores so the issue that Zendoren quite rightly raises isn't a problem, yet.

The hardware limitation of any HPC cluster design, or indeed of any system that requires any form of communication between processes, is communication itself. For example, the approximate RTT for a message between machines in a data centre at the moment is 0.5 ms, plus any induced computational latency on top of that. How practical it is to distribute tasks over multiple machines depends on how much communication has to be done (which ultimately becomes a design question). In the extreme "embarrassingly parallel" case, there is no communication, and so in theory no limit on the number of machines used, although in practice communicating the task and collecting the results can become an issue for very large numbers.

However, for MMOs in general and certainly for Eve, there tends to be a lot of inter-process communication going on, and the data centre delay then becomes a very real issue.
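A back-of-envelope sketch of why that delay caps how far one busy system could usefully be split. It assumes a deliberately crude model: the per-tick compute divides perfectly across nodes, but every node must complete one sequential round trip with every other node per tick, with no pipelining or overlap. The 0.5 ms RTT is the figure quoted above; the 20 ms of per-tick compute is a made-up number for illustration.

```python
RTT_MS = 0.5  # approximate in-datacentre round trip quoted in the post


def tick_time_ms(compute_ms, nodes, rtt_ms=RTT_MS, round_trips_per_peer=1):
    """Wall-clock time for one tick of work split across `nodes` machines.

    Hypothetical model: compute divides perfectly, but each node must do
    `round_trips_per_peer` sequential round trips with every other node.
    """
    compute = compute_ms / nodes
    communication = round_trips_per_peer * (nodes - 1) * rtt_ms
    return compute + communication


def best_node_count(compute_ms, max_nodes=64, **kwargs):
    """Node count in 1..max_nodes that minimises tick time under this model."""
    return min(range(1, max_nodes + 1),
               key=lambda n: tick_time_ms(compute_ms, n, **kwargs))


print(best_node_count(20.0))                          # 6
print(best_node_count(20.0, round_trips_per_peer=0))  # 64 (embarrassingly parallel)
```

Under this model, adding machines past roughly the square root of compute time divided by RTT makes each tick slower, not faster. With no communication at all, the "embarrassingly parallel" case, more nodes always help, which matches the distinction drawn above.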


Seriously Mean
Killer Koalas
R.A.G.E
Posted - 2010.08.16 20:06:00 - [45]
 

Edited by: Seriously Mean on 16/08/2010 20:07:58
nice =D
looking forward to fixes and patches, Incarna, whatever you guys make, it will be awesauce
don't believe half the alt-posting crap that went on before =)

it may not be 100%, but as a mesh networking / tella / networking guy, I can completely understand the complexity, the scalability issues and the dangers of "quick fixes" that propagate through every node....
so, take your time and make it right, and don't work too much :P you should have lives too

Genya Arikaido
Posted - 2010.08.16 20:09:00 - [46]
 

Edited by: Genya Arikaido on 16/08/2010 20:10:24
Right, and since you gave that nasty little variable a name, what's being done to reduce the data center delay to the point that running a single star system over multiple nodes becomes possible? What is that delay at now (on avg)? What would be needed to make it feasible for multi-node star systems to where it doesn't interrupt the player experience?

If this preempts your devblog for tomorrow, just say so and I'll wait for the blog. Laughing

Jack Gilligan
Caldari Provisions
Posted - 2010.08.16 20:16:00 - [47]
 

Originally by: CCP Warlock
Originally by: something somethingdark
XBOX HUGE wall of text
Actual content thats new and previously unknown : 4% (much like recent eve expansions)

anyways
here is an idea for a mass test
Load up an old Apocrypha client/server on sissi and then go compare logs


Unfortunately if logs could solve this problem it would be fixed. My eyes bleed from looking at logs so much the last few months.


Your logs show nothing. Everyone knows this.

Some day you all need to look at fixing that log server that never records stuff.

CCP Warlock

Posted - 2010.08.16 20:35:00 - [48]
 

Originally by: Genya Arikaido
Edited by: Genya Arikaido on 16/08/2010 20:10:24
Right, and since you gave that nasty little variable a name, what's being done to reduce the data center delay to the point that running a single star system over multiple nodes becomes possible? What is that delay at now (on avg)? What would be needed to make it feasible for multi-node star systems to where it doesn't interrupt the player experience?

If this preempts your devblog for tomorrow, just say so and I'll wait for the blog. Laughing



In the limit, this is always going to be a constraint, for reasons that go back to Shannon's Law. With current technology that limit is the speed of light. (Negotiating a change there might be a little iffy, given the fallout effects on other physical systems.)
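For reference, here are the two physical limits alluded to, in their standard textbook forms. The Shannon-Hartley theorem bounds the data rate $C$ (bits per second) of a channel with bandwidth $B$ (hertz) and signal-to-noise ratio $S/N$. Separately, a signal crossing distance $d$ at propagation speed $v$ takes at least $d/v$, and $v$ can never exceed the speed of light $c$. Neither formula gives the figure for any particular data centre; they are the floors that engineering works above.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
\qquad\qquad
t_{\mathrm{prop}} \ge \frac{d}{v}, \quad v \le c
```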

However, let's say hypothetically that quantum communication worked and communication between two nodes really was instantaneous. In that case, the problem would be moved to the computational time it took to process messages and respond, and all the associated queuing and consequent buffer handling. There would still be a limit on the total amount of communication the cluster could support, and that is what ultimately limits cluster scaling.
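The point that the limit moves to "the associated queuing" even with instantaneous links shows up in the simplest queueing model, M/M/1 (Poisson arrivals, exponentially distributed service times, one server). There, the mean time a message spends waiting plus being handled is W = 1/(mu - lambda), where lambda is the arrival rate and mu the service rate. This is a sketch under those idealised assumptions. Real message handlers are not M/M/1, and the 10,000 messages/s service rate is hypothetical, but the sharp knee as utilisation approaches 1 is the general behaviour.

```python
def mm1_time_in_system_ms(arrival_rate_per_s, service_rate_per_s):
    """Mean time (ms) a message spends waiting plus being handled, M/M/1 model.

    W = 1 / (mu - lambda); only finite while lambda < mu.
    """
    if arrival_rate_per_s >= service_rate_per_s:
        return float("inf")  # arrivals outpace service: the queue grows without bound
    return 1000.0 / (service_rate_per_s - arrival_rate_per_s)


SERVICE_RATE = 10_000  # hypothetical: messages per second one handler can process
for arrivals in (5_000, 9_000, 9_900, 10_000):
    utilisation = arrivals / SERVICE_RATE
    print(f"utilisation {utilisation:.2f}: "
          f"{mm1_time_in_system_ms(arrivals, SERVICE_RATE)} ms")
# utilisation 0.50: 0.2 ms
# utilisation 0.90: 1.0 ms
# utilisation 0.99: 10.0 ms
# utilisation 1.00: inf ms
```

Even with zero link latency, pushing the message handlers toward full utilisation multiplies delay, which is why the total amount of communication the cluster can absorb, not just the speed of each link, ends up bounding scale.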


Herschel Yamamoto
Agent-Orange
Nabaal Syndicate
Posted - 2010.08.16 20:38:00 - [49]
 

Nice blog. I'm definitely glad to hear about the Sisi updater - that will make my life much easier.

You mention that Sisi is getting TQ-like hardware. Does this mean that if we ever get another Armageddon Day on the test server, it might actually be able to handle the load without the extreme lag we've gotten in past tests?

Zanes Shoubje
Posted - 2010.08.16 20:39:00 - [50]
 

And here I was thinking all this mass testing was just a paid evening getting drunk while you tell a couple of hundred fools to jump through a gate while you are laughing your ass off. Now I find out you did something useful. I am disappointed. Crying or Very sad

Dierdra Vaal
Caldari
Veto.
Veto Corp
Posted - 2010.08.16 20:49:00 - [51]
 

Thanks CCP Tanis, I hope we can see more devblogs like this one (from you and others!) :)

Trebor Daehdoow
Gallente
Sane Industries Inc.
Posted - 2010.08.16 20:49:00 - [52]
 

Just a quick note to say that the back-and-forth between CCP and CSM about mass-testing has been very productive; it is the longest thread ever in our Super-Secret CSM Forum, and we hope it is the first of many such discussions. Although I still think my Evil Idea for getting huge attendance on SiSi should have been implemented... Twisted Evil
Originally by: CCP Tanis
We've just started allowing bombs during mass-tests (as of the last test on Aug 5), for exactly this reason. It adds complications with people being incidentally podded, but clones work around that well enough. In addition, we've also been using titan bridges for the last few tests. I don't think we manage to get as full use out of them as we could, but we hope that by also including players as FC's this will improve over time.

No idea if it's an easy fix, but why not just up the resists on pods during mass tests? And if you want more titan bridges, then the solution is simple: free titans for CSMs on SiSi. Very Happy

Zurakaru Ze
Posted - 2010.08.16 20:56:00 - [53]
 

I don't quite understand why the warp into system test was not repeated with the flags set. A mass session change is not like warping from point to point within the same system. And I do believe the gate and cyno jumps really need focus, no?

Stick Cult
Posted - 2010.08.16 21:09:00 - [54]
 

Originally by: Zurakaru Ze
I don't quite understand why the warp into system test was not repeated with the flags set. A mass session change is not like warping from point to point within the same system. And I do believe the gate and cyno jumps really need focus, no?

The flags were specific to module cycling.

Alain Kinsella
Minmatar
Posted - 2010.08.16 21:19:00 - [55]
 

Edited by: Alain Kinsella on 16/08/2010 21:19:40
Very interesting, thank you. And I deal with monitoring logs all the time (usually created by scripts that I maintain), so I understand your pain. ugh

Anyway, you (briefly) brought up hardware, and inter-process comms (both in and out of the server). Have you considered looking at other vendors, or at least testing them for throughput improvements? All it takes is a slight difference in how network ports and PCI-E slots are allocated (across various busses) to sometimes make a difference.

Oh, and have you looked at tools such as Splunk to help with data-mining of the server logs?

ElfeGER
Deep Core Mining Inc.
Posted - 2010.08.16 21:25:00 - [56]
 

hmm those memory graphs look a bit strange, is there no garbage collection going on?

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.16 21:34:00 - [57]
 

Edited by: Bartholomeus Crane on 16/08/2010 21:36:08
Originally by: CCP Warlock
Originally by: Genya Arikaido
Edited by: Genya Arikaido on 16/08/2010 20:10:24
Right, and since you gave that nasty little variable a name, what's being done to reduce the data center delay to the point that running a single star system over multiple nodes becomes possible? What is that delay at now (on avg)? What would be needed to make it feasible for multi-node star systems to where it doesn't interrupt the player experience?

If this preempts your devblog for tomorrow, just say so and I'll wait for the blog. Laughing


In the limit, this is always going to be a constraint, for reasons that go back to Shannon's Law. With current technology that limit is the speed of light. (Negotiating a change there might be a little iffy, given the fallout effects on other physical systems.)

However, let's say hypothetically that quantum communication worked and communication between two nodes really was instantaneous. In that case, the problem would be moved to the computational time it took to process messages and respond, and all the associated queuing and consequent buffer handling. There would still be a limit on the total amount of communication the cluster could support, and that is what ultimately limits cluster scaling.


Very well, but stepping back from the world of quantum communication and quickly scurrying back to the world where the Shannon-Hartley theorem still applies: I do hope you realise that running the bottleneck processes pseudo-sequentially on a single core will ultimately - or rather eventually - throw up quite a hard threshold, beyond which you will have to dip your toe into the wild waters that are distributed computing proper.

Even pseudo-dynamically conjuring away all other non-related processes will eventually still only leave you with a single core to run your process on, the limits of which can only be stretched so far with more efficient, or in this case, actually working code. Moore's law is all well and good, but the vagaries of real life (in this case CPU power dissipation) mean that the world has gone multi-core, and speed-up through instruction-level parallelism or clock-rate increases has basically come to an end. Eventually you'll have to, as it were, cut the cake, and multi-thread/process over multiple cores with all that that entails.
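The "hard threshold" of a single-core bottleneck is usually quantified with Amdahl's law: if only a fraction p of the work can run in parallel, n cores give at most 1 / ((1 - p) + p/n) speedup, and never more than 1 / (1 - p). A minimal sketch follows. The parallel fractions are illustrative only, not measurements of any real server code, and Amdahl's law ignores communication cost entirely, which is the other side of the balance CCP Warlock describes.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: best-case speedup when only part of the work parallelises."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative fractions only: how far extra cores get you when part stays serial.
for p in (0.5, 0.9, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (2, 8, 64))
    print(f"p = {p}: {row}")
```

Even with 90% of the work parallelised, 64 cores stay under a 10x speedup, so the serial remainder, not the core count, sets the ceiling.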

I won't deny that a balance will have to be struck between communication and computational bottlenecks, but there are many solutions available for this problem: solutions that are theoretically and practically proven and that do not necessarily rely on expensive or esoteric hardware. Has CCP spent any time thinking about or working towards such a proper distributed architecture in the long run? Or are you still fully engaged in plugging the holes of the kiddy pool you're currently swimming in? As it were ...

NatteFrost85
Ministry of War
Posted - 2010.08.16 21:42:00 - [58]
 

if you ppl get that sisi patch tool to work as it is described i might join the mass tests.

Louis deGuerre
Gallente
Malevolence.
Posted - 2010.08.16 22:49:00 - [59]
 

Nice blog, keep em coming.

Originally by: CCP Tanis
Along with the changes to communication above, we're also ramping up our allocation to mass-testing internally. We're allocating more QA resources into the test events, and building better test scenarios. We're getting more time from programmers to be able to speak directly with the CSM and key others about what exactly the issues are, what testing has found, and what our "next step" is likely to be. All told we're seeing a lot of firm commitment from both our QA and Software teams to not only keep working on the fixes, until they're done, but also to continuing to support these tests and the playerbase as we work through our endless effort to continually improve EVE's overall performance.


So...where are these allocations coming from? I thought all your guys were booked for the next 18 months?
Are you actually diverting resources toward EVE or just reallocating? ugh

Verkan Vall
Posted - 2010.08.16 22:49:00 - [60]
 

Edited by: Verkan Vall on 16/08/2010 22:54:24
Interesting and informative. The one minor complaint I have on this is that the graphs might have been formatted better; while it seems clear that the red line is memory usage and the black line with blue fill is CPU, this is not stated on the graph.


As far as participation goes, I think you may be up against two barriers:

Highsec residents, while they could easily participate, may be hesitant to join large fleets, even on the test server, without having some experience in fleets. This might be countered by including a link to a "Following Orders is Easy!" illustrated text or video in the testing news entry or mailing list, giving basic info on how to keep up with a moving fleet and follow targets once you've engaged, along with some easy combat fits.
Nullsec residents, on the other hand, are likely to be worried about having opponents invade if they decide to participate en masse. This is a bit harder to deal with; the only real method I can see is locking down sov and possibly preventing POSes from being damaged during, and probably for half an hour to an hour before and after, the event.

EDIT: As far as predicting what nodes need extra power, might it be possible to track where fleets think they're headed (broadcast destinations and mass autopilot routes)?
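The suggestion above can be sketched as a simple aggregation over announced routes: count how many pilots are expected to enter each system within the next few jumps and flag any system over a threshold for reinforcement. Using broadcast destinations and autopilot routes as input is the poster's idea. The data layout, the function names, the threshold and the routes below are hypothetical. Pilots who never set a route, or who change plans or travel by cyno, are invisible to it.

```python
from collections import Counter


def forecast_arrivals(routes, horizon_jumps):
    """Count pilots expected to enter each system within `horizon_jumps` jumps.

    routes: pilot id -> ordered list of systems still ahead on that pilot's
    broadcast destination or autopilot route (next system first).
    """
    expected = Counter()
    for path in routes.values():
        for system in path[:horizon_jumps]:
            expected[system] += 1
    return expected


def systems_to_reinforce(routes, horizon_jumps, threshold):
    """Systems whose forecast arrivals reach `threshold`, busiest first."""
    forecast = forecast_arrivals(routes, horizon_jumps)
    return [system for system, count in forecast.most_common() if count >= threshold]


# Made-up routes for four pilots.
routes = {
    1: ["Amamake", "Rancer"],
    2: ["Amamake", "Rancer", "Jita"],
    3: ["Amamake"],
    4: ["Tama"],
}
print(systems_to_reinforce(routes, horizon_jumps=2, threshold=2))  # ['Amamake', 'Rancer']
```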





 


The new forums are live

Please adjust your bookmarks to https://forums.eveonline.com

These forums are archived and read-only