New Dev Blog: Fostering meaningful human interaction, through testing
 
This thread is older than 90 days and has been locked due to inactivity.


 

Author Topic

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.17 20:07:00 - [121]
 

And as to particular models, or papers explaining them, I have, and there are plenty of those. Each with their particular applications, constraints etc. But in my previous post I've already had to make a lot of assumptions on what is under the hood of the EVE server, and I've never seen a blog, for example, on the architecture on the EVE server, where the bottlenecks are, and what and how functionalities are serviced, and where.

So at the moment I simply have no way of knowing what is appropriate or not.

You have to remember that these blogs are the first inkling the players got as to what goes on in Iceland when it comes to these matters. Before, the only thing we could piece together from all over is that there is a central database storing the state of the EVE world with some expensive hardware attached to it, and that there is something called a SOL system that uses stackless Python on a single core (way to go, Global Interpreter Lock), with some Destiny Balls thrown in for good measure. Where exactly, I don't really know though.

So, I'm already speaking from a severe disadvantage here, and I can't possibly provide a solution to a problem I know practically nothing about. Maybe the coming blogs will provide some more information though. We'll see ...

Stick Cult
Posted - 2010.08.17 20:46:00 - [122]
 

I hate to be so annoying, but could someone from CCP answer what I said back in post #20, a simple yes or no is fine...
Quote:
(I've said this before, but I want to put it where it will be read by devs..) But, I'd settle for less. Following the 2 day downtime, we got 100k skillpoint pools. What about this: for every mass test you get some amount of skillpoints (2-5 million) on your Singularity account. It's little enough that players won't leave TQ just to go fly their titans on the test server, but enough to encourage participation. In the end, you'd also end up with more people on the test server, which ultimately leads to ~more testing~, which is always a good thing. Can a dev say "no this will never happen" so I can stop talking about it?

CCP Warlock

Posted - 2010.08.17 21:08:00 - [123]
 

Cutting to get under the character limit. I have no problem with the grumpiness, I think it's deserved.
Originally by: Bartholomeus Crane

Well, that depends on what you call distributed computing. Is a mail-server distributed computing? A central database interacting with many clients? A single web-server being polled by many web-browsers?

Personally I don't think so if any of these applications runs on its own, and communication is merely a two-way interaction from the client to the server.

To me distributed computing requires partitioning a single task into several sub-tasks, each being 'solved' on single nodes/cores. I have no desire to discuss definitions here, as there are many for distributed computing (and its various flavours), but mere communication between processes, or mere number of clients, is not enough to qualify. The communication requirements between processes is what counts, as is the requirement to partition the problem space. [snip]


I suspect we agree on what constitutes "real" distributed systems, rather than what I tend to personally think of as the easy, client-server case. But 'mere communication'? Communication is what creates all the issues in these systems, whether it's explicit (messaging) or implicit (two processes accessing shared memory).
Originally by: Bartholomeus Crane

Even if the various functionalities are combined in one outlet (client), as they (and other functionalities) are for the EVE-client, that doesn't make it a distributed program. Not when each process is independent of the others except in outlet, as I believe they are in the architecture used in EVE (but correct me if I'm wrong).

Simply put, the fact that you farm out the market to the central shared state memory (the database) while you farm out 'spaceships in space' to a SOL node somewhere else, doesn't make the problem of running EVE a distributed problem, as you can easily splice off the event-trace for the one from the other, and they essentially never need to talk to each other again. And this is why, in short, I think that at the moment, the EVE server is not a proper distributed program. It is just a rather large collection of functionalities that are handled at different nodes in the cluster.



If you mean distributed in the sense of some of the truly de-centralised protocols (the BGP routing protocol is the first to come to mind), then no, it's not. As I mention in my blog, the best way to think of Eve is as a superposition of multiple distributed systems. Some of these are client-server in structure, some aren't. The market systems, for example, run on their own nodes, and communicate with the clients and other systems as they need to. That's relatively easy because markets themselves are a distributed system.
Originally by: Bartholomeus Crane

Now, there is nothing inherently wrong with that, unless you look at the inherent non-scalability of handling the different functionalities on one node, and on one node alone. At some point, the computational requirements for that node will surpass what is available. For some functionalities, if implemented correctly, this point will probably not be reached in the foreseeable future. For others, ready solutions will be available (distributed databases, distributed mail-services, etc.)... <snip>


As you say, there are very real limits on what client-server architectures can do. But let me suggest, there are also equally real, if rather larger, limits on what distributed architectures can do, especially when you consider the full implications of the Fischer consensus problem.

Eve is a distributed system and we can fairly arbitrarily move CPU load around the cluster; markets can run on SOLs, for example, it's a cluster optimization not to. Where we can't, as in fleet fights, it's because of real-time constraints.
Originally by: Bartholomeus Crane

But this leads to two questions:


Just one in my case - why is the character limit so low on these posts :)

See below...

CCP Warlock

Posted - 2010.08.17 21:26:00 - [124]
 

Originally by: Bartholomeus Crane

1. What took you so long to (publicly) acknowledge this? (as the inevitability of this should have been obvious years ago); and
2. Has work been started on this architecture, and if not, why not? (have you really been this preoccupied with fixes?)



The base architecture of Eve, especially the cluster handling in conjunction with the Stackless tasklet infrastructure, gives us all we need in terms of building blocks for scaling the system. It then comes down to how we use and develop that capability.

So we already have the architecture that you're essentially talking about. The problem is scaling it, both to larger clusters and to multi-core single systems. Work has been in progress for some time in terms of laying the groundwork for those changes, but this isn't the sort of thing that can be done without a lot of preparation, and testing. In some cases, you have to build the systems to build the systems with some of this stuff. It's also necessary to think well ahead, since the lead time on work like this can be considerable.
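The Stackless tasklet infrastructure mentioned above can be sketched in miniature. This is a rough illustration only, emulated with plain generators since the `stackless` module is not in the standard library, and it is not CCP's actual code: many cooperative tasklets interleave inside a single OS thread, which is the building block being scaled.

```python
# A minimal sketch of cooperative "tasklets" in the style of Stackless
# Python, emulated with plain generators (the stackless module is not
# part of the standard library). All names here are illustrative.
from collections import deque

class Scheduler:
    """Round-robin scheduler: runs each tasklet until its next yield."""
    def __init__(self):
        self.ready = deque()

    def spawn(self, gen):
        self.ready.append(gen)

    def run(self):
        trace = []
        while self.ready:
            tasklet = self.ready.popleft()
            try:
                trace.append(next(tasklet))  # run until the next yield
                self.ready.append(tasklet)   # still alive: requeue it
            except StopIteration:
                pass                         # tasklet finished
        return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"  # cooperative yield point

sched = Scheduler()
sched.spawn(worker("market", 2))
sched.spawn(worker("combat", 3))
print(sched.run())  # tasklets interleave within one OS thread
```

The scaling problem the post describes is then exactly this: one such scheduler occupies one core, so spreading load means spreading schedulers across nodes (and eventually cores).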

Better internal monitoring for the cluster certainly got a big push from the long lag problem, but was also something we needed to improve. Certainly a lot of us have lost sleep over the issue (literally when we're monitoring late night fleet fights), but for example, the thin clients were something we needed regardless.


CCP Zulu

Posted - 2010.08.17 21:50:00 - [125]
 

Originally by: Stick Cult
I hate to be so annoying, but could someone from CCP answer what I said back in post #20, a simple yes or no is fine...
Quote:
(I've said this before, but I want to put it where it will be read by devs..) But, I'd settle for less. Following the 2 day downtime, we got 100k skillpoint pools. What about this: for every mass test you get some amount of skillpoints (2-5 million) on your Singularity account. It's little enough that players won't leave TQ just to go fly their titans on the test server, but enough to encourage participation. In the end, you'd also end up with more people on the test server, which ultimately leads to ~more testing~, which is always a good thing. Can a dev say "no this will never happen" so I can stop talking about it?



That's an interesting suggestion. I'm going to forward it to those responsible for Singularity and see if they're OK with it.

Tlar Sanqua
Gallente
Gallente Defence Initiative
Posted - 2010.08.17 22:03:00 - [126]
 

Edited by: Tlar Sanqua on 17/08/2010 22:04:07
The last paragraph of the blog has raised my eyebrows in hope. All of these blogs are focused on the lag monster - which is great and has to be the number one issue.

But the outcry was over far more than that, and I'm not seeing any movement on those grounds. The general question of incomplete expansions hasn't been addressed. And I don't just mean in terms of bugs (otherwise I'll get the "we spend 20% of our time fixing those"), but in terms of features being iterated on or finished. I could list them all off again, but I would really appreciate a further blog on the reiteration of each expansion and where it stands. Any chance?

Trebor Daehdoow
Gallente
Sane Industries Inc.
Posted - 2010.08.18 00:46:00 - [127]
 

Originally by: Bomberlocks
One of the CSM members, Vuk Lau, I think it was, stated that fleets will eventually always grow to the maximum number of supported players, and that CCP will probably only solve the lag in a realistic fashion through game design mechanics in the end, i.e. a hard limit on how many people can participate in any given system.

I don't know if Vuk said this, but I am certainly of the opinion that the only long-term fix for lag is game design changes that organically limit engagement sizes -- though I oppose hard limits. In fact, this was one of the core points I made in my election manifesto.

One of the more interesting conversations I had at the Summit was a dinner conversation with Meissa (Stephan). We agreed on the basic idea of organic fleet limitations, but disagreed on how to go about it. He was in favor of changes to sov that required multiple simultaneous engagements (either in the same system or in neighboring systems), so that attackers and defenders would have to split their megafleets into smaller ones. So instead of a 1000v1000 battle, you'd have 4 250v250's.

I was in favor of changes to visibility and fog-of-war that would make super-large fleets both hard to effectively command and also not much more effective than smaller subfleets, so that well-coordinated smaller fleets could have a good chance of defeating a larger, clumsy blob. In other words, the bigger the fleet got, the smaller the value of adding another ship.

We ended up by agreeing that both ideas had merit, just disagreeing about which one was the best. Cool

Anyway, the next day, Warlock gave her presentation, and there was one graph (page 12 IIRC) that came up that had me looking over at Stephan with a "See, told you so!" look... and he was giving me the same look!

And in truth, we were both right AFAICT. The long-term key to lag-fighting is reducing the number of interacting objects (roughly, the number of ships). Stephan thinks physically splitting up the fleets is the big win, and I think logically splitting them up is a better idea. His idea does have the merit of probably being easier to implement in the shorter-term. I would not be surprised if both eventually get tried.

Having written all of this, no doubt Stephan will write a reply explaining how I totally misunderstood everything he said during that dinner.

Rok Qhang'Rawl
Joint Espionage and Defence Industries
Posted - 2010.08.18 01:50:00 - [128]
 

::applause::

Great blog, great discussion. CCP enthusiasm for the product shining through here. Very Happy

And I am very much looking forward to the Sisi Patcher!
[but :nudge: how long before it is available for Macs?]

T'Amber
Garoun Investment Bank
Posted - 2010.08.18 02:39:00 - [129]
 

Very HappyVery HappyVery HappyVery Happy

You should get a job in marketing :)
Great work.


Cheap Dude
Posted - 2010.08.18 06:00:00 - [130]
 

Well.. glad to have a blog, don't get me wrong.. but stop using words like "EVE players" and "EVE community". The blog felt like it was rewritten by some PR dude.

Also, I'd like to see some deeper talk about the testing. It makes for a good read. This blog was written so the less technical could read it also. Right now I am looking at 2 graphs with CPU and memory usage. I hope you guys have more stats, like CPU queue length and such.

Omal Oma
Shadowed Command
Fatal Ascension
Posted - 2010.08.18 11:29:00 - [131]
 

Hello!

Something I'd like to see in the future...

Brackets shown are based on overview selected types.

and/or

Hide/Show Friendly Brackets
Hide/Show Hostile Brackets
Hide/Show Friendly Drone Brackets
Hide/Show Hostile Drone Brackets
Hide/Show Structure Brackets

I many times would like to visually see where a hostile vessel is in relation to me, but really don't need to see their drones. The ability to turn structures on and off would be very helpful during POS warfare, or in a fleet fight, to disable the POS brackets but still be able to see friendly ships.

Thanks for reading and again, thanks for the updates on the blogs. It's cool to hear you guys are all hard at work. I will do my best to make it to Mass Testing, but my Alliance is hard at work defending our space...

Regards,
Omal

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.18 11:51:00 - [132]
 

This is going to be a multi-post reply, again to get under the character limit.
Originally by: CCP Warlock
Originally by: Bartholomeus Crane

Well, that depends on what you call distributed computing. Is a mail-server distributed computing? A central database interacting with many clients? A single web-server being polled by many web-browsers?

Personally I don't think so if any of these applications runs on its own, and communication is merely a two-way interaction from the client to the server.

To me distributed computing requires partitioning a single task into several sub-tasks, each being 'solved' on single nodes/cores. I have no desire to discuss definitions here, as there are many for distributed computing (and its various flavours), but mere communication between processes, or mere number of clients, is not enough to qualify. The communication requirements between processes is what counts, as is the requirement to partition the problem space. [snip]

I suspect we agree on what constitutes "real" distributed systems, rather than what I tend to personally think of as the easy, client-server case. But 'mere communication'? Communication is what creates all the issues in these systems, whether it's explicit (messaging) or implicit (two processes accessing shared memory).


The 'mere communication' is a reference to the earlier mention of inter-communicating processes over 200 nodes with ~50,000 clients. The point here being that the fact that different processes are communicating with each other, and with many clients, on its own, does not a distributed system make. In this sense 'mere communication' is simply not enough. The hierarchical client-server topology is then merely an evocative means to exemplify this point.

But, in my view, Information Theory has a tendency to make too much of topologies alone, leaving the requirements placed on communication by the system somewhat ignored, or rather, assumed.
For example, even if the topology were instead a full mesh, or, more appropriately (and far more interestingly), a scale-free small-world topology, that in itself would not make it a distributed program either. The topologies would 'merely' give the system enhanced, or different, communication capabilities. But if the requirements of the system using the topology still do not use these enhanced capabilities, or if the topologies are 'merely' designed to ease communication bottlenecks (a problem in hierarchical topologies), that still says nothing of the distributed nature of the system itself. It can still be a monolithic single-process using the topology to have a one-to-one relationship with other (monolithic single-process) services.

The focus is on distributed computing and, in my view, not on communication or communication protocols. The communication protocols and the communication topology flow from the requirements thrown up by partitioning the computational load, and are 'merely' designed to facilitate or ease what is required there. As such, I would argue that although communication creates issues of its own (mostly focussed on how to do the communication in the most appropriate way), the communication method and the topology used are, or should be, in turn based on the requirements of the underlying (distributed) system. They are, or should be, a means to an end, and although they throw up interesting conundrums to solve by themselves, and although they do throw up constraints on what is possible or efficient (meaning there is, or should be, a feedback between the two conceptual entities), they in themselves simply do not constitute a distributed problem.
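The "cutting the cake" notion running through this exchange can be made concrete with a toy sketch: one task is partitioned into sub-tasks, each solved on its own core, and the communication pattern falls out of that partitioning rather than the other way around. This is purely illustrative, not a claim about how EVE does it.

```python
# A toy illustration of partitioning a single task into sub-tasks that
# are solved in parallel on separate cores ("cutting the cake"), as
# opposed to mere client-server communication where requests are
# independent and nothing is ever partitioned.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

def distributed_sum(n, workers=4):
    # Partition [0, n) into `workers` contiguous chunks; the chunk
    # boundaries ARE the communication requirements of the system.
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # The partitioned answer matches the single-core answer.
    print(distributed_sum(1_000_000) == sum(range(1_000_000)))
```

The protocol here (send bounds down, a partial sum back) is dictated entirely by how the cake was cut, which is the poster's point.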

Unless of course you wish to describe the problem of handling the communication itself as a distributed problem in itself. Which, under certain (you could say most) circumstances is true, but also wasn't implied earlier. And for this type of recursive discussion of semantics I have neither the time nor the inclination.

Omal Oma
Shadowed Command
Fatal Ascension
Posted - 2010.08.18 12:03:00 - [133]
 

Originally by: Trebor Daehdoow
really good stuffs :)
We were discussing this exact thing in our TS3 server the other day.

A shared health infrastructure system where multiple nodes need to be hit simultaneously.

The counter was brought up that repping would be horrible and simply "buffing" repair modules would imbalance fleet encounters when repairing other ships.

The solution was given to add a skill that allows for the repairing of structures and a bonus to doing this in fleet fights. Modules which repair would do so differently on structures and the amount of ships needed to do so could be lessened in order to help reduce the lag on the split fleets.

Other ideas being passed around were ideas to decrease the need for things like fighters/drones.

1. Double the damage they do, halve the amount you can field.

2. A new module that is similar to triage, but only works for structures. It doesn't make your ship dead in the water, allows you to be mobile but increases your ability to repair structures, removes the ability to control drones and doesn't make you a sitting duck.

Basically, find ways to make caps/supers/etc... more mobile in systems. They're already there and can't exactly just jump through the gates. Their out is to be cyno'd. Forcing them to be focused on attracts a crowd, thus adding to "grid lag" and overall poor game performance.

Overall, the consensus was that the things we're repairing and destroying have far too much health, forcing us to be concentrated in a single area for too long. While I understand these things should be difficult to destroy, having that much health on an ihub/tower only forces the combat to be concentrated in one location.

The majority of the people I spoke to said they'd welcome more "timers" to compensate for the reduced health.

So, fleets could be shorter, faster paced and more mobile... For example... If it currently takes 3 reinforcement timers... split the health of the specific timers into halves and make it 6 timers... but with each timer having far less health to repair/destroy.

There were a lot more ideas... the biggest winner was some form of a shared health system and systems who had Sov V were bonused with either higher resists or more health... benefits for holding sov longer etc...

All I got for now... sorry for the rambling :)

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.18 12:23:00 - [134]
 

Originally by: CCP Warlock
Originally by: Bartholomeus Crane

Even if the various functionalities are combined in one outlet (client), as they (and other functionalities) are for the EVE-client, that doesn't make it a distributed program. Not when each process is independent of the others except in outlet, as I believe they are in the architecture used in EVE (but correct me if I'm wrong).

Simply put, the fact that you farm out the market to the central shared state memory (the database) while you farm out 'spaceships in space' to a SOL node somewhere else, doesn't make the problem of running EVE a distributed problem, as you can easily splice off the event-trace for the one from the other, and they essentially never need to talk to each other again. And this is why, in short, I think that at the moment, the EVE server is not a proper distributed program. It is just a rather large collection of functionalities that are handled at different nodes in the cluster.


If you mean distributed in the sense of some of the truly de-centralised protocols (the BGP routing protocol is the first to come to mind), then no, it's not. As I mention in my blog, the best way to think of Eve is as a superposition of multiple distributed systems. Some of these are client-server in structure, some aren't. The market systems, for example, run on their own nodes, and communicate with the clients and other systems as they need to. That's relatively easy because markets themselves are a distributed system.


Actually, as I said above, I'm not particularly focussed on communication itself, or the protocols used to bring that about. Although interesting in their own right, to me, they are just a means to an end. They do bring their own constraints into the mix for the underlying system, but the focus, because I think that is where the real bottleneck lies, or eventually will be, is still primarily concerned with the underlying system.

And for me, the EVE server is not a super-position of multiple distributed systems, but rather a super-position of multiple monolithic systems. Each system in the conglomerate handles its own specific functionality without much need for communication between them, outside what can be rather straightforwardly provided for by the database acting as the shared state memory, whose access is purely event-based.

You call the market a distributed system for example. I beg to differ. The market in EVE is in essence pre-partitioned. No 'internal' partition is required by design. Nothing wrong with that, in fact, it throws up artefacts in the logical world that are desired. But, and this is important conceptually, the different market-nodes are spread out over the computational resource 'merely' to provide the computational resource to mask a one-to-one relationship from the client (through proxy I assume) to a one-to-one relationship to the shared state memory (database, also perhaps through proxy). As far as I'm aware there is no need for there to be a market-to-market relationship. Topologically speaking this is a three-tiered hierarchical network with no inner partition of the computational load for a market. As I see it, and correct me if I'm wrong, this is again a client-server architecture with an extra inner tier. How is this distributed computing? Colloquially speaking, there is, again, no cutting of the cake because the cake came pre-cut.
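The "pre-cut cake" argument above can be sketched in a few lines: requests are routed by a static region-to-node assignment, each node talks only up to its clients and down to the shared database, and no node-to-node traffic ever occurs. All names here are invented for illustration; this is a reading of the post's argument, not CCP's actual design.

```python
# A sketch of a pre-partitioned (three-tier) market: a static
# region -> node mapping routes each request, and nodes only ever talk
# to the shared database tier, never to each other. Illustrative only.

class MarketNode:
    def __init__(self, name, database):
        self.name = name
        self.db = database          # shared state memory (the database)

    def place_order(self, region, item, price):
        # One-to-one with the client above, one-to-one with the DB
        # below: no node-to-node communication is ever required.
        self.db.setdefault(region, []).append((item, price))
        return self.name

database = {}                        # stand-in for the central DB
nodes = {"Jita": MarketNode("sol-01", database),
         "Amarr": MarketNode("sol-02", database)}

def route(region):
    # Static, pre-partitioned assignment: decided up front, never
    # rebalanced -- the cake came pre-cut.
    return nodes[region]

handled_by = route("Jita").place_order("Jita", "Tritanium", 4.2)
print(handled_by)  # the partition was decided in advance
```

In this picture nothing is ever partitioned at runtime, which is why the post argues "distributed" is the wrong word for it.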

And to re-iterate, there is nothing wrong with this, as in this case, the whole market system is backed up by a shared state memory system (database) anyway, and there are pretty straightforward solutions to be found if a bottleneck occurs there (distributed databases for example). But to call the market system distributed, in my opinion, goes too far.

Verkan Vall
Posted - 2010.08.18 12:56:00 - [135]
 

Skillpoint rewards: Even 1 mil skillpoints is probably too high; 50-100k, or 10% of total skillpoints if under 1m, might be a bit more balanced. Or do it as a lottery: 1 player gets 10 mil, the rest get 100k.

Discouraging blobs: Add a 0.01% chance per ship that a shot targets a random ship on the grid. With 100 allied pilots on grid, there'd be a 1% chance of hitting one of them. With 1000, it would be 10%. Of course, you'd want to suppress this in highsec, and have it appear on killmails as 'mistargeted', rather than attributed to the source pilot. The exact percentage could be tweaked; I wouldn't go above 0.05%.
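The percentages above are linear estimates (n shots times p per shot); as a quick check, the exact probability of at least one mistarget, 1 - (1 - p)^n, barely differs at these fleet sizes:

```python
# Checking the mistarget arithmetic: with a 0.01% (p = 0.0001) chance
# per ship on grid, compare the linear estimate n * p with the exact
# probability 1 - (1 - p)**n of at least one mistarget.
p = 0.0001

def linear(n):
    return n * p

def exact(n):
    return 1 - (1 - p) ** n

for n in (100, 1000):
    print(n, round(linear(n), 4), round(exact(n), 4))
# 100 ships: ~1% either way; 1000 ships: 10% linear vs ~9.5% exact
```

So the suggested numbers are self-consistent, and the linear approximation is fine for tuning at this scale.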

Reducing grid load: Would it be possible to not create wrecks and corpses until after combat has stopped, or cooled down to only a few hostile ships? This could reduce the number of items displayed substantially, and also would give a strong incentive to holding the field.

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.18 12:59:00 - [136]
 

Originally by: CCP Warlock
Originally by: Bartholomeus Crane

Now, there is nothing inherently wrong with that, unless you look at the inherent non-scalability of handling the different functionalities on one node, and on one node alone. At some point, the computational requirements for that node will surpass what is available. For some functionalities, if implemented correctly, this point will probably not be reached in the foreseeable future. For others, ready solutions will be available (distributed databases, distributed mail-services, etc.)... <snip>


As you say, there are very real limits on what client-server architectures can do. But let me suggest, there are also equally real, if rather larger, limits on what distributed architectures can do, especially when you consider the full implications of the Fischer consensus problem.

Eve is a distributed system and we can fairly arbitrarily move CPU load around the cluster; markets can run on SOLs, for example, it's a cluster optimization not to. Where we can't, as in fleet fights, it's because of real-time constraints.


Naturally there are limits on what distributed architectures can do. In fact, some problems can never be efficiently distributed, but require distribution only when computational resource requirements outrun what is available on a single core. So there are two trade-offs, each in their own domain: one of efficiency (can I gain through cutting the cake?), and one of efficacy (should I cut the cake so I can still solve the problem?). But cutting the cake is a requirement in my view, and is also where, in my view, things become interesting. In the case of EVE, it is also where the focus should be, with the communication strategy (topology, protocol) following behind.

And as far as I know, you are still incapable of dynamically moving load around the cluster, so your load-balancing act is still entirely static. Barely adequate for a logical world as unpredictable and dynamic as EVE. There is nothing wrong with setting up your monolithic services on different parts of the cluster as such, but one can hardly call that a load-balancing algorithm now, can we? Moreover, since none of the services can be 'load-balanced' without moving it whole, the granularity also, one could say, leaves something to be desired. Let's face the ugly truth here: the whole distributed system [sic] is rather old-fashioned, if not an over-statement. For one, it lacks the dynamic interaction that is basically par for the course elsewhere. I'm glad to hear that more resources have been made available for developing it (for now, mostly debugging though), but there are, let's face it, still many functionalities missing for it to be considered a distributed system in what I view as the true meaning of the word. What worries me most is that this fact remains unacknowledged, or is seen as some near-mystical future goal, while in actual fact most of the things we're talking about here were common currency even when the EVE system was being developed, and many, many developments have been made since.
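The dynamic, finer-grained balancing the post says is missing can be sketched with a textbook greedy scheduler: instead of pinning whole services to nodes up front, each unit of load goes to whichever node is currently least loaded. This is the standard longest-processing-time heuristic, shown purely as a contrast to static placement, not as EVE's actual scheduler.

```python
# A minimal sketch of dynamic least-loaded placement, as a contrast to
# static "pin each service to a node" balancing. Greedy LPT heuristic;
# system names are invented for illustration.
import heapq

def balance(loads, nodes=3):
    # Min-heap of (current_load, node_id): each job lands on the node
    # that is least loaded at the moment it is assigned.
    heap = [(0, i) for i in range(nodes)]
    assignment = {i: [] for i in range(nodes)}
    for job, cost in sorted(loads, key=lambda x: -x[1]):  # big jobs first
        load, node = heapq.heappop(heap)
        assignment[node].append(job)
        heapq.heappush(heap, (load + cost, node))
    return assignment

jobs = [("Jita", 90), ("Amarr", 40), ("Rens", 35), ("Dodixie", 30)]
print(balance(jobs))
```

The granularity point carries over directly: the finer the units of `loads`, the closer the heap stays to even, whereas moving whole monolithic services gives you exactly one, very coarse, knob.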

What I'm getting at here is that CCP, at the highest levels, need to recognise that what they are debugging now is basically old hat, and that it is in their (and subsequently our) interest that, even after the current 'troubles' have been addressed, a firm commitment be made to progressively move this old lady into the contemporary world of computer science. Because if this is not done, we'll be in the same pickle pretty soon, and you'll be writing yet another set of blogs to placate a set of disappointed players yet again. Much has been made of the term technical debt lately. This technical debt is, what, 10 years overdue, and so is, in perception at least, any pro-active initiative on this. And in my honest opinion, this supersedes a focus on new features, however shiny, by a long, long way, as it goes directly to the long-term viability of EVE.

Tankamos
Posted - 2010.08.18 13:15:00 - [137]
 

Originally by: CCP Warlock
.... The problem is scaling it, both to larger clusters and to multi-core single systems.


Does this mean that EVE is running on a single thread per machine? To me that sounds like a real neglect of CPU. Because if you already have multi-threading, it's not that big a step towards multi-core as well.

Jebidus Skari
Comply Or Die
Drunk 'n' Disorderly
Posted - 2010.08.18 13:25:00 - [138]
 

Edited by: Jebidus Skari on 18/08/2010 13:42:20
Good info. I think that's been the problem; people think that because you're not saying anything, nothing is happening..

Anyway, can the devs say for certain that the software architecture or platform isn't the problem, i.e. Python?

I mean, this may have been fine 5 years ago, when there wasn't much load, but now could be a different story.

What progress have the devs made in investigating 'poor coding' techniques, both in Python and the database environment? As much as devs hate to admit it, there are always improvements and tweaks to be made that can improve performance dramatically.

For example the new mail system: if you open it, it takes ages to load. Is this because every single email is being loaded, or is it paged? If all emails are being loaded, then why not just page the data? Things like that.
Or is the actual SQL or Python code inefficient? I know it can be a nightmare to go through all this, but it can make a dramatic difference.
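The paging idea suggested above is straightforward to sketch: fetch mail headers one page at a time with keyset pagination instead of loading every mail on open. Table and column names here are invented for illustration; this says nothing about EVE's actual schema.

```python
# A sketch of paging a mail list: keyset pagination fetches one page
# of headers per query instead of loading everything on open. The
# schema is invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany("INSERT INTO mail (subject) VALUES (?)",
                 [(f"mail {i}",) for i in range(1, 101)])

def page(after_id=0, size=20):
    # Keyset pagination: the WHERE clause seeks past the last row seen,
    # so each page is O(page size) with no OFFSET scan of skipped rows.
    return conn.execute(
        "SELECT id, subject FROM mail WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, size)).fetchall()

first = page()
second = page(after_id=first[-1][0])
print(first[0], second[0])  # only 20 headers fetched per call
```

Keyset (seek-based) paging also stays fast on late pages, which is where OFFSET-based paging degrades.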

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.18 13:29:00 - [139]
 

And finally ...

Originally by: CCP Warlock
Originally by: Bartholomeus Crane

1. What took you so long to (publicly) acknowledge this? (as the inevitability of this should have been obvious years ago); and
2. Has work been started on this architecture, and if not, why not? (have you really been this preoccupied with fixes?)



The base architecture of Eve, especially the cluster handling in conjunction with the Stackless tasklet infrastructure, gives us all we need in terms of building blocks for scaling the system. It then comes down to how we use and develop that capability.

So we already have the architecture that you're essentially talking about. The problem is scaling it, both to larger clusters and to multi-core single systems. Work has been in progress for some time in terms of laying the groundwork for those changes, but this isn't the sort of thing that can be done without a lot of preparation, and testing. In some cases, you have to build the systems to build the systems with some of this stuff. It's also necessary to think well ahead, since the lead time on work like this can be considerable.

Better internal monitoring for the cluster certainly got a big push from the long lag problem, but was also something we needed to improve. Certainly a lot of us have lost sleep over the issue (literally when we're monitoring late night fleet fights), but for example, the thin clients were something we needed regardless.


And much of this I can neither confirm nor deny. I simply have never been given enough information to do either. I'm quite sure that the basic architecture, or layout, will provide a framework for extension, but the paradigm shift of moving from monolithic service provision, however multi-threaded, towards a distributed multi-core system, including all the architectural support structure required to facilitate this, is not something to underestimate. And, with the danger of reverting to cynicism again, the apparent inability of CCP to get this rather straightforward architecture to work right does make one wonder.

The development of the analytical and testing tools for looking at the current system is certainly not lost when progressively moving to the new architecture, even if, it has to be said, they come about somewhat late in the day. But there are many hurdles still to overcome, and many choices, it seems to me, to be made. These require, beyond mere time and testing, sufficiently educated talent to comprehend and enact, and that means investment, and investment over the long term. It is good to hear that you have been thinking about such things, even though this isn't reflected in perception as such. But is there any solid commitment, by anyone really, that can be given of this actually being a priority, even if long-term, with the appropriate investments attached? Because at the moment previous actions by CCP do invite one to think that this fad will blow over again, and that resources will subsequently be refocused on 'other' issues without any real progress on this underlying issue being made. And that will simply not work, and time is running out.

Is there anything that can be said about this?

Anyone? Anyone? Bueller?

Xiaodown
Guiding Hand Social Club
Posted - 2010.08.18 13:45:00 - [140]
 

Originally by: Taudia
This is a much better question. I haven't heard of a large scale fleet fight that didn't have monstrous lag in months, so the problem (if my impression is correct) probably occurs in every fleet fight, which is a clue as to where the problem (in part, at least) may lie. No doubt CCP are already well aware of this correlation though.


It could be something that they've done with the code base; it could also be a normalization of tactics across New Eden.

Remember, a "new and exciting" way of fighting is only novel for a few months before everyone else learns how to do it, and how to counter it.

So, it could be a situation where there were "mostly non-laggy" fleet fights, but some particular thing that one group of marauders did would trigger the lag. As time goes on, people adopt the marauders' tactics, triggering lag "all the time".

For example, what if "large fleet fights without T3 ships" still worked fine with no lag? Would anyone know that there was a scenario where there was a lag free fleet fight? Has there *been* a large fleet fight in the recent months without any T3 ships attending? I'm sure that's not it, but it could be something like that.

~X


Wolfcheck
The Ice Cartel
Posted - 2010.08.18 16:02:00 - [141]
 

Originally by: Bartholomeus Crane
And much of this I can neither confirm nor deny. I simply have never been given enough information to do either. I'm quite sure that the basic architecture, or layout, will provide a framework for extension, but the paradigm shift of moving from monolithic service provision, however multi-threaded, towards a distributed multi-core system, including all the architectural support structure required to facilitate this, is not something to underestimate. And, with the danger of reverting to cynicism again, the apparent inability of CCP to get this rather straightforward architecture to work right does make one wonder.


It seems to me that you are giving arbitrary definitions, assuming that the EVE cluster adheres to them, no: claiming that it does, and then proclaiming its "monolithic-ness" based on those assumptions and definitions, yet you say yourself that you have not (been given) enough information about it.

TBH, I'm not much interested in how you see the EVE cluster, and I have a feeling that if CCP was, they'd be hiring you (or offering you a consulting contract of sorts).

I've got my own technical know-how, but I fail to understand exactly what you're pointing at. You claim that EVE is a series of monolithic systems, yet you just barely hint at "underlying structures". Every attempt by Warlock to get to a specific point you have thwarted by retreating into a vague abstract consideration of theoretical systems.
You say EVE's not a distributed computing system, yet when objections are made you just dismiss them and refuse to consider the subject, preferring vague objections.

I'd say asking questions to understand the issue is one thing. Shooting off abstract assertions on distributed computing theory just to make yourself look good is another. And to be totally honest, your efforts here look to me to be pretty far on the "look good" side of the scale.

So, if you're genuinely interested in a technology discussion, care to explain why _exactly_ you think EVE's a "pile of monolithic systems" as opposed to a "pile of distributed systems"?

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.18 17:51:00 - [142]
 

Well, Wolfcheck, the reason why I specifically do not apply arbitrary definitions, and instead try to explain my view of the system, is that there are many definitions in this area, and every other person that you talk to about them seems to disagree about what they really mean. So, to me, it seems rather meaningless to discuss whether this is a distributed system, a parallel system, a concurrent system, or a networked system, if no one can agree on what each of these terms really means.

Now, if you say that it takes up a lot of space not to use (spurious) definitions and instead discuss the salient properties of the system and then apply a description, then I wholeheartedly agree. But there is little else I can do but invite counter-argument to my opinions/description. Perhaps that will lead to a better understanding of what the system really is. At the moment, as I said, we, the players, simply don't know much about it, and by comparing perception to reality (or rather the two perceptions), some progress may be made.

As such you might not care much about my opinions, and you have every right not to, but you might be interested to find if they are shared, or not, by CCP (Warlock). But I don't know, you might not be interested in that either.

To answer your question, I hoped I was clearer about it than apparently I am. I think that the EVE server is a collection of monolithic services because, apart from sharing a single shared-state memory in the database, there is little or no communication between the different services, and each service runs in its own process, on a single core. As I said, I could be wrong about that; I don't know enough of the system to say so with absolute certainty, but I have as yet heard nothing to the contrary, so for now I go with that theory.

To me, that is not a distributed system, just as much as a cluster running many independent web-servers is not a distributed system. And that is not just because of the lack of a communication requirement between the services, but because the independent services themselves are not partitioned over different core/nodes of the cluster. The example of the markets, as I tried to explain, is in that respect not a distributed service either, because the markets in EVE come pre-partitioned on single cores and they don't (necessarily) need to communicate with each other directly and are 'merely' front-ends for the database (internally probably similarly partitioned) they rely on. An analogy there would be two independent web-servers sharing a single database in a cluster. Those two web-servers, in my view, still don't make for a distributed system.

Now you may argue that you disagree with my definition of a distributed system, but I think you'll find that the requirement to partition the computational load (with the communication requirements that entails) is not entirely unreasonable before something can be called a distributed system. Now, I know that there are those who have a vested interest in calling everything that involves more than two computers a distributed system, but that doesn't mean I have to go along with that interpretation. In fact, I think it is far too broad, far too vague, to the point of being meaningless, and this, again, goes back to what I said about those definitions earlier.

As for trying to look good, or trying to get a contract out of this: I can assure you that I already look pretty good, quite handsome in fact, and that I'm already under contract in this field, and quite busy doing work that I enjoy quite a lot. Furthermore, I doubt that my esteemed and learned colleagues will be much impressed with me posting some words on an internet spaceship forum. And given that I had to write this clarification, I doubt that I'll be able to claim any brownie points for fulfilling my educational obligations either. I just happen to like this game, and solving these types of problems.

Flying Tiger
I love you all
Posted - 2010.08.18 17:57:00 - [143]
 

Originally by: Omal Oma
The counter was brought up that repping would be horrible and simply "buffing" repair modules would imbalance fleet encounters when repairing other ships.


I would say give each module a bay full of rep drones. Some random time after the module gets damaged the repair drones come out and work on it. If you don't want those modules repaired then shoot the drones.

Omal Oma
Shadowed Command
Fatal Ascension
Posted - 2010.08.19 02:54:00 - [144]
 

Originally by: Flying Tiger
Originally by: Omal Oma
The counter was brought up that repping would be horrible and simply "buffing" repair modules would imbalance fleet encounters when repairing other ships.


I would say give each module a bay full of rep drones. Some random time after the module gets damaged the repair drones come out and work on it. If you don't want those modules repaired then shoot the drones.
nooooooo

The object is to lessen the use of drones, not increase it.


Estimated Prophet
Ye Olde Curiosity Shoppe and Trading Company
EVE Trade Consortium
Posted - 2010.08.19 06:10:00 - [145]
 

Originally by: Omal Oma
Originally by: Flying Tiger

The object is to lessen the use of drones, not increase.


No, the object should be to reduce/remove the lag caused by drones.

Wolfcheck
The Ice Cartel
Posted - 2010.08.19 08:00:00 - [146]
 

Quote:
To answer your question, I hoped I was clearer about it than apparently I am. I think that the EVE server is a collection of monolithic services because, apart from sharing a single shared-state memory in the database, there is little or no communication between the different services, and each service runs in its own process, on a single core.<snip>
To me, that is not a distributed system, just as much as a cluster running many independent web-servers is not a distributed system. And that is not just because of the lack of a communication requirement between the services, but because the independent services themselves are not partitioned over different core/nodes of the cluster.


Depending on the depth at which you decide to look, I can agree or disagree with what you say here.
A webfarm is just a number of webservers "serving" the same website (or a group of them, or a domain, or whatnot). It is distributed computing in the sense that the same website is served by different "cores", which may or may not be on the same machine, or even the same cluster. The total load is distributed.
On a "per session" basis, it isn't distributed: each session is served by an independent "thread", "core" or process, self-contained.

Pretty much in the same way, from what I gather (and I'll take your assumptions as good), EVE is not distributed if you look at the underlying structure, in that every single task is performed (according to your assumptions and an interpretation of Warlock's words) by a single core (multi-threaded, not multi-cored). And that is being addressed, it appears.
Yet it IS distributed in that one request (sell this item) is served by numerous tasks on different areas of the cluster (where are you? from the "central db"; what standings and skills do you have? from the "character services"; what is the item's average price? from the "regional market cluster"), thus ensuring the minimal possible load on each of them.
It is well known that distributing computation is not always profitable in terms of time taken. While it makes the system more easily scalable, the communication becomes the bottleneck... and that is exactly what CCP (was it Warlock, or Fallout?) has been pointing at as a reason for some of the issues. So it could be argued that the problem is distribution itself.

To my understanding, CCP seems to identify the issue causing "long lag" and other latencies as lying in concurrent access more than in computational limits. If that's the bottleneck, and it sounds pretty reasonable to me, distributing the load more won't help; in fact, it could possibly worsen the problem.

Quote:
Now you may argue that you disagree with my definition of a distributed system, but I think you'll find that the requirement to partition the computational load (with the communication requirements that entails) is not entirely unreasonable before something can be called a distributed system.


It's ok. I just wanted to fix on a vocabulary by which we can discuss *something*. I had the impression that every time Warlock tried to answer your definition you'd change it slightly, so as to define the item in a way that defied his explanation.

Quote:
As for trying to look good, or trying to get a contract out of this. I can assure you that I already look pretty good, quite hansom in fact; and that I'm already under contract in this field, and quite busy doing work that I enjoy quite a lot.


Lol. Well answered, and you'll maybe accept my apologies if I've been a bit blunt.

Quote:
I just happen to like this game, and solving these types of problems.


Same. That's why I wanted to try and define a point that I think was kept a bit too vague to reach any informational value. Maybe it was just me.

Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.19 12:01:00 - [147]
 

Yes, to a certain extent, it depends on the level you look at it from. For example, take a single computer running a web-server, a mail-server, and a chat-server. Is that single computer doing distributed computing? In my view, no. None of these things are communicating with each other, and only a single computer is involved. Now, is the OS of that single computer doing distributed computing? No, not if only a single core is involved. And even if multiple cores are involved I think it is questionable, because although the OS is (presumably) moving processes around to multiple cores, those processes still don't need to talk to each other. But at least the OS is capable of moving different processes to different cores dynamically. And that is still functionality the EVE server does not have.

Similarly for your explanation of a market interaction. Yes, in that example different processes are communicating with each other on different nodes, but each of those non-database processes (character services, market service, etc.) is just a front-end hiding the underlying database. And for good reason, because it reduces the strain of every service having to talk to the database directly. You could see those services as a cache, so that the database doesn't have to respond to every event. So you've distributed away a 'cache' from the database to reduce communication strain. Nothing wrong with that, but it doesn't partition the computational load of the database itself over different nodes, and so it is not distributed computing. That may seem narrow, but I did try to apply the definition as consistently as I can, given the circumstances, and at least, I hope, the meaning of the definition is clear.
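The "service as a cache in front of the database" idea described here can be sketched in a few lines. Everything below is invented for illustration (no claim is made that it resembles CCP's code); it just shows why the front-end reduces strain on the shared database:

```python
class Database:
    """Stand-in for the single shared-state database."""
    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0  # count of real database hits

    def load(self, key):
        self.queries += 1
        return self.rows[key]

class CachingService:
    """Front-end service that answers repeat reads locally."""
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get(self, key):
        if key not in self.cache:        # miss: ask the database once
            self.cache[key] = self.db.load(key)
        return self.cache[key]           # hit: served by the front-end

# Invented key and value, purely illustrative.
db = Database({"item_price:34": 5.2})
svc = CachingService(db)
for _ in range(1000):
    svc.get("item_price:34")
print(db.queries)  # 1: the database answered once, not a thousand times
```

The computational load of the database itself is untouched; only the read traffic reaching it shrinks, which is exactly the distinction being drawn above.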

But why harp on and on about this distinction? Because of the corollary. Monolithic systems, however much they communicate with other monolithic systems, are inherently unscalable. For some of the systems in the EVE server this doesn't matter, as we're probably not near where this will be a problem for some time to come. For other systems, I suspect we are approaching this threshold. And that threshold is hard. You can't go beyond it, really, unless you change the problem itself. And, although some suggestions have been floating around on that subject, that, in the end, I suspect, will only move the goal-posts a bit, not remove them altogether. So I suspect that eventually the step towards distributed computing, at least for some services, has to be made.

Now, you say this is already being done. I have seen no word of that. I've read 'thinking about it', 'we do plan to do this sometime', and other things. But nothing solid. Sure, the architecture allows for such a progression, and that's good, but given that we're apparently already hitting the CPU resource threshold, this is not something that can be put off for long.

To be sure, in the short term the current system needs to be fixed, no doubt about that. And also, the 'replacement' system is not going to be developed overnight; it can't be. But if this game is supposed to be around for a long time, as Oveur clearly stated, this medium- to long-term goal has to be acknowledged, and resources set aside for it, now (or soon). You can't expect players to invest in a game in the long term, but not commit to providing the infrastructure for that. Just as you can't ask me, or you, or anyone else, to think up possibilities or solutions for that long-term infrastructure/architecture/system/service/whatever prior to making that commitment.

And for now, we don't have any real commitment to 'fixing things', and until we get it, I see this purely as a temporary thing, with the distinct possibility of it blowing over if the immediate need has gone. Thinking about it is just not good enough without at least some certainty that something will be done about it. Give the commitment and I'm happy to think along, but the commitment has to come first. Otherwise I'll just be wasting time.

CCP Warlock

Posted - 2010.08.19 12:18:00 - [148]
 

Edited by: CCP Warlock on 19/08/2010 12:21:59
Originally by: Bartholomeus Crane
Well, Wolfcheck, the reason why I specifically do not apply arbitrary definitions, and instead try to explain my view of the system, is that there are many definitions in this area, and every other person that you talk to about them seems to disagree about what they really mean. So, to me, it seems rather meaningless to discuss whether this is a distributed system, a parallel system, a concurrent system, or a networked system, if no one can agree on what each of these terms really means.


For innocent bystanders, what Bartholomeus says above is absolutely correct. There is enormous confusion of terminology in this field, and this certainly doesn't help when trying to discuss the intricacies of what is one of the most complex areas of computer science today. This is also why I tend to ask for specific examples, since it makes it easier to pinpoint some of the general confuscation.

My definition of a distributed system is any set of independent processes that use communication to work on some form of shared task. This is a little broader than the current Wikipedia definition of "A distributed system consists of multiple autonomous computers that communicate through a computer network.", but it's essentially the same thing. Interestingly, a strictly parallel system with no communication at all would not, under this definition, be a distributed system. However, it can be argued that the communication of tasks to the independent processes, and the gathering of results back, does in fact make it a distributed system, albeit a very simple client-server one.


Originally by: Bartholomeus Crane

To answer your question, I hoped I was clearer about it than apparently I am. I think that the EVE server is a collection of monolithic services because, apart from sharing a single shared-state memory in the database, there is little or no communication between the different services, and each service runs in its own process, on a single core. As I said, I could be wrong about that; I don't know enough of the system to say so with absolute certainty, but I have as yet heard nothing to the contrary, so for now I go with that theory.


I think some of the issue here boils down to what level of abstraction we are talking about. A system can be monolithic at one layer of abstraction, almost invariably it is at the top level, but distributed below it. The chat service is "monolithic" in the sense that there's one chat channel, and client-server at that local point. It's also completely distributed across the entire cluster: SOLs handle their own local chat, and other chat channels are off on their own machines. Local chat membership also relies on communication with other systems. Trading markets (in EVE and in real life) are, as you say, intrinsically a simple 3-tier hierarchical system. But there is implicit communication between markets as money flows between them. So markets could also be described as a distributed system providing a real-time solution of continuously varying supply-demand-price relationships.

It's certainly true that the database does maintain a single point of state cluster-wide, which is very nice to have if you've worked in systems without that luxury, but it's a scarce resource. We massively cache and avoid using it unless we have to. As does everybody else. Distributed databases are a whole little holy war all of their own, but the Fischer consensus problem tells us something very important there: fundamentally, they're an oxymoron. It is impossible to guarantee consensus with asynchronously connected databases, so you have to work around it.

...


CCP Warlock

Posted - 2010.08.19 12:46:00 - [149]
 

Originally by: Bartholomeus Crane

To me, that is not a distributed system, just as much as a cluster running many independent web-servers is not a distributed system. And that is not just because of the lack of a communication requirement between the services, but because the independent services themselves are not partitioned over different core/nodes of the cluster. The example of the markets, as I tried to explain, is in that respect not a distributed service either, because the markets in EVE come pre-partitioned on single cores and they don't (necessarily) need to communicate with each other directly and are 'merely' front-ends for the database (internally probably similarly partitioned) they rely on. An analogy there would be two independent web-servers sharing a single database in a cluster. Those two web-servers, in my view, still don't make for a distributed system.


I disagree with your characterisation of web servers for example as not being distributed systems, although I'm not unsympathetic with it. I agree there is a clear distinction between the simple client-server, and strictly hierarchically based systems; and the more complex mesh and partial mesh systems, like say packet switched networks, but I make it based on the topology of the inter-node communications for the system. I believe this is a useful way to make the distinction because it then leads to clear mathematical results on the communication limits that apply to the different topologies, and that is what emerges as a very distinctive limiting factor as these systems scale to large numbers.
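The communication limits of these topologies are easy to put numbers on. As a back-of-envelope illustration (node counts here are arbitrary examples, not CCP figures), a full mesh of n nodes has n(n-1)/2 potential links, while a hub-and-spoke star has only n-1:

```python
# Link counts for two communication topologies as a cluster scales.
def mesh_links(n):
    """Potential links in a full mesh: every node paired with every other."""
    return n * (n - 1) // 2

def star_links(n):
    """Links in a hub-and-spoke star: one per non-hub node."""
    return n - 1

for n in (10, 200, 1000):
    print(n, mesh_links(n), star_links(n))
# At the ~200 nodes mentioned for the cluster, a full mesh has
# 19,900 potential links versus 199 for a star.
```

The quadratic growth of the mesh is precisely the kind of mathematically clear limiting factor being referred to: the topology, not the per-node computation, dictates how communication cost scales.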

It is also I think what you're getting at with the monolithic comments on EVE. I would also agree, if EVE was a monolithic, hierarchically organised program as you describe, then yes it wouldn't scale. It's not. EVE is based on a full mesh communication topology between all 200 odd nodes in the cluster, with clients connecting in a client-server relationship sure, but to the entire cluster. At the highest level of abstraction, clients make RPC calls, which can be executed anywhere, and the client has no awareness of the underlying routing and message handling that allows that to happen. Through the engineering elegance of Stackless those calls are handled in their own light weight process layer, which runs on each node. Those calls are structured within different services true, but this is a reflection of software organisation and structure, the services themselves can be run fairly arbitrarily across the cluster, and do indeed communicate with each other. At any given instant, individual clients are potentially communicating with several nodes in the cluster to provide game functionality, distribute load, and scale to the numbers we do, and that to me is what distributed computing is all about.
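The location-transparent RPC idea described above, where the client names a service and the cluster decides which node executes the call, can be sketched as follows. All service names, node names, and the placement table are invented for illustration; this is not CCP's actual routing:

```python
import random

class Node:
    """Stand-in for one cluster node."""
    def __init__(self, name):
        self.name = name

    def call(self, service, method, *args):
        # A real node would dispatch into the service's tasklets here.
        return (self.name, service, method, args)

class ClusterRouter:
    """Routes an RPC to some node hosting the named service."""
    def __init__(self, nodes, placement):
        self.nodes = {n.name: n for n in nodes}
        self.placement = placement  # service -> list of node names

    def rpc(self, service, method, *args):
        node_name = random.choice(self.placement[service])
        return self.nodes[node_name].call(service, method, *args)

router = ClusterRouter(
    [Node("sol-01"), Node("sol-02"), Node("market-01")],
    {"market": ["market-01"], "chat": ["sol-01", "sol-02"]},
)
node, service, method, args = router.rpc("market", "quote", "tritanium")
print(node, method)  # market-01 quote
```

The caller never consults the placement table itself, so services can be moved between nodes without changing client code, which is the property being claimed for the cluster.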

You've suggested several times that Eve needs to move into "the contemporary world of computer science", but you've provided no specific examples of what you mean by this. If you could, it would probably help the discussion. Distributed systems as a field, in the fullest sense of the word, goes back over 50 years now, and if a circuit-switched telephony guy reads this, we'll see an argument for it going back much further than that. The new distributed systems like Eve are built using the older distributed systems like packet-switched networks, and share many of their characteristics, and their fundamental theories. A fair amount of what is "new" at the moment is quite often rediscovery from the perspective of the hard-core networking folks.

For innocent bystanders again. This is a very complex field. Whatever disagreements I might raise, I do very much appreciate Bartholomeus and other posters here taking their time to discuss it.


Bartholomeus Crane
Gallente
The Crane Family
Posted - 2010.08.19 13:55:00 - [150]
 

Well, that is the connectionist view of distributed computing. Not that I dismiss that view of distributed computing, but in my view it over-emphasises the importance of communication in distributed computing, and, following Information Theory in that respect, it tends to neglect the importance of partitioning the problem-space and the computational aspect of that. In my view, while communication still plays an important role in distributed computing because its various methods provide constraints on how, what, and where to partition, the fact that processes communicate, on its own, is not a, or the sole, qualifier for calling something a distributed program. In my opinion, therefore, the Wikipedia definition of distributed computing is too broad, not restrictive enough.

But it is true that at different levels of abstraction you can call a system either monolithic or distributed according to the various definitions of what a distributed system is supposed to be. Looking at the whole architecture, then yes, you can argue that most systems are monolithic, but then what have we gained in insight into those systems when we do that?

But at the architectural level of the EVE cluster, we can also see a partition of the whole functionality into distinct and separable functional entities, communicating with each other, or not, or with the shared memory system (the database) as an entity of its own. From the connectionist view, that is enough to call the whole system distributed already, for we have split it over different computers and the parts are communicating with each other. Even using the more restricted definition, this is arguably a distributed system, because the overall system is partitioned and communication between the various parts is necessary.

But then again, what have we gained in insight? Not that much, I would say. The whole system, in essence, came pre-partitioned, pre-packaged if you will, into separate functional entities, and the partition is essentially one of convenience. The distribution, if you boil it down, exists in the interface, and is static there, different static setups (during downtime, for example) being possible notwithstanding. Although there exists a connectionist/communication problem in making every process, or sub-functionality, talk to each other correctly and efficiently, interesting in its own right, I would consider that a distributed problem on the same level as using divide and conquer for partitioning a single program (the whole functionality) into separate functions, just with fussier communication methods.

Instead, I think it is more worthwhile to take the architecture as a given, see the communication problems I suspect you're dealing with now as a useful application of the divide and conquer paradigm, and instead consider what distributed computing can provide for the individual functional entities. And on that level of abstraction, I think it safe to assume that none of the functional entities is distributed.

That may sound picky, or like trying to be right regardless, but that's not the reason I want to do this. The reason is that distributed computing can provide a lot of things if we look at the architecture as a collection of monolithic processes.

One thing we can do is apply divide and conquer to the efficacy problem. That is to say, assuming that the communication layer is working correctly, we can ignore a lot of those functional entities because they do not constitute a bottleneck. Not really a distributed computing thing, but we've reduced the overall problem (getting better, or any, performance) a lot with that. Then we can address things on the architectural and the functional level.

For one, the static setup distribution is a restriction. If dynamic movement of the functional entities to computational resources were possible, we'd not only increase the utilisation of the available resources a lot, but probably move the scalability goalposts a lot too.
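The dynamic placement being asked for here can be illustrated with a toy greedy rebalancer: heaviest services first, each assigned to the currently least-loaded node. The service names and load figures below are invented for illustration:

```python
def rebalance(service_load, node_count):
    """Greedy assignment: heaviest services first, each placed on the
    node with the smallest load so far. Returns (placement, node loads)."""
    node_loads = [0] * node_count
    placement = {}
    for svc, load in sorted(service_load.items(), key=lambda kv: -kv[1]):
        target = min(range(node_count), key=lambda i: node_loads[i])
        placement[svc] = target
        node_loads[target] += load
    return placement, node_loads

# Invented per-service load estimates (arbitrary units).
services = {"jita-market": 9, "chat": 2, "mail": 1, "big-fight-sol": 8}
placement, loads = rebalance(services, node_count=2)
print(placement, loads)  # both nodes end up with load 10
```

A static setup fixed at downtime cannot react like this; recomputing the placement as loads shift is exactly the utilisation gain being argued for, though moving live state between nodes is of course the hard part the sketch omits.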

Continued ...

