General description
GnutellaNet works by "viral propagation". I send a message to you,
and you send it to all clients connected to you. That way, I only need to know about you to know about the entire rest of the network.
A glance at this message delivery mechanism will tell you that it generates inordinate amounts of traffic. Take, for example, the defaults for Gnutella 0.54. It maintains 25
active connections with a TTL of 7 (TTL means Time To Live, or the number of times a message can be passed on before it "dies"). In the worst of worlds, this means 25^7, or 6103515625 (about 6 billion)
messages resulting from just one message!
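To check the arithmetic, here is a minimal Python sketch. It assumes, as the estimate above does, that every client passes the message on to a full set of 25 connections at every hop, and that the TTL is 7 (the exponent in 25^7). The function name is made up for illustration.

```python
def worst_case_messages(fanout: int, ttl: int) -> int:
    """Worst-case estimate from the text: traffic multiplies by `fanout` at each hop."""
    return fanout ** ttl


# Assumed defaults: 25 connections, TTL of 7.
print(worst_case_messages(25, 7))  # 6103515625
```

Dropping the TTL by just one hop, to 6, would already cut the worst case by a factor of 25.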
Well, okay. In truth it isn't that bad. There are fewer than two thousand Gnutella clients on the GnutellaNet at any one time. That means that long before the TTL expires on our hypothetical message, every client on the GnutellaNet will have seen it.
Obviously, once a client sees a message, it's unnecessary for it to process the message again. The original Gnutella designers, in recognition of this, engineered each message to contain a GUID
(Globally Unique Identifier) which allows Gnutella clients to uniquely identify each message on the network.
So how do Gnutella clients take advantage of the GUID? Each Gnutella client maintains a short memory of the GUIDs it has seen.
For example, I will remember each message I have received. I forward each message I receive as appropriate, unless I have already seen the message. If I have seen the message, that means I have already forwarded it, so everyone I forwarded it to has already seen it, and so on. So I just forget about the duplicate and save everyone the trouble.
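The "short memory" idea can be sketched in a few lines of Python. This is only an illustration, not how any particular Gnutella client does it: the names SeenCache and handle_message, the cache capacity, and the rule that a message with a TTL of 1 is not passed on any further are all assumptions made for the example.

```python
from collections import OrderedDict
import uuid


class SeenCache:
    """Bounded memory of recently seen GUIDs; the oldest entry is evicted first."""

    def __init__(self, capacity=1000):  # capacity is an arbitrary, assumed value
        self.capacity = capacity
        self._seen = OrderedDict()

    def check_and_add(self, guid):
        """Return True if the GUID is new (and remember it), False if it is a duplicate."""
        if guid in self._seen:
            return False
        self._seen[guid] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest GUID
        return True


def handle_message(cache, guid, ttl, sender, neighbors):
    """Return the neighbors a message should be forwarded to (empty if it is dropped)."""
    if not cache.check_and_add(guid):
        return []  # already seen, so already forwarded: drop the duplicate
    if ttl <= 1:
        return []  # assumed TTL rule: no hops left, the message "dies" here
    return [n for n in neighbors if n != sender]  # never echo back to the sender


cache = SeenCache()
guid = uuid.uuid4().bytes  # a 16-byte identifier standing in for a message GUID
print(handle_message(cache, guid, 7, "a", ["a", "b", "c"]))  # ['b', 'c']
print(handle_message(cache, guid, 6, "b", ["a", "b", "c"]))  # []
```

The memory is kept short on purpose: a GUID only needs to be remembered for as long as copies of that message could still be circulating.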
Topology
The GnutellaNet has no hierarchy. Every server is equal. Every server is also a client. So everyone contributes. Well, as in all egalitarian systems, some servers are more equal than others.
Servers running on fast connections can support more traffic. They become a hub for others, and therefore get their requests answered much more quickly. Servers on slow connections are relegated to the backwaters of the GnutellaNet, and get search results much more slowly. And if they pretend to be fast, they get flooded to death.
But there's more to it than that.
Each Gnutella server knows only about the servers that it is directly connected to. All other servers are invisible, unless they announce themselves by answering a PING or by replying to a QUERY. This provides amazing anonymity.
Unfortunately, the combination of having no hierarchy and the lack of a definitive source for a server list means that the network is not easily described. It is not a tree (since there is no hierarchy) and it is cyclic. Being cyclic means there is a lot of needless network traffic. Clients today do not do much to reduce the traffic, but for the GnutellaNet to scale, developers will need to start thinking about that.
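To see what the cycles cost, here is a small Python simulation. It is a toy model under stated assumptions, not a real client: the function flood, the two four-server example networks, and the TTL handling are all invented for the illustration. Each server drops messages it has already seen, as described earlier, but a duplicate still has to cross the wire before it can be dropped.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Flood one message from `origin`; return (deliveries, duplicates)."""
    seen = {origin}
    # Each queue entry is (receiving node, sending node, hops left).
    queue = deque((n, origin, ttl) for n in graph[origin])
    deliveries = duplicates = 0
    while queue:
        node, sender, hops_left = queue.popleft()
        deliveries += 1
        if node in seen:
            duplicates += 1  # wasted traffic: this node already had the message
            continue
        seen.add(node)
        if hops_left > 1:
            for n in graph[node]:
                if n != sender:
                    queue.append((n, node, hops_left - 1))
    return deliveries, duplicates


# Four servers in a ring (cyclic) versus the same four in a line (a tree).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(flood(ring, 0, 7))  # (5, 2)
print(flood(line, 0, 7))  # (3, 0)
```

Even in this tiny ring, two of the five transmissions are pure waste, while the line reaches everyone in three.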