Everything is in network byte order unless otherwise noted. Byte order of the GUID is not important.
Apparently, there is some confusion as to what "\r" and "\n" are. Well, \r is carriage return, or 0x0d, and \n is newline, or 0x0a. This is standard ASCII, but there it is, from "man ascii".
Keep in mind that every message you send can be replied to by multiple hosts. Hence, Ping is used to discover hosts, as the Pong (Ping reply) contains host information.
Throughout this document, the terms server and client are interchangeable. Gnutella clients are Gnutella servers.
Thanks to capnbry for his efforts in decoding the protocol and posting it.
GnutellaNet works by "viral propagation". I send a message to you, and you send it to all clients connected to you. That way, I only need to know about you to know about the entire rest of the network.
A simple glance at this message delivery mechanism will tell you that it generates inordinate amounts of traffic. Take for example the defaults for Gnutella 0.54. It defaults to maintaining 25 active connections with a TTL of 7 (TTL means Time To Live, or the number of times a message can be passed on before it "dies"). In the worst of worlds, this means 25^7, or 6103515625 (6 billion) messages resulting from just one message!
Well, okay. In truth it isn't that bad. In reality, there are fewer than two thousand Gnutella clients on the GnutellaNet at any one time. That means that long before the TTL expires on our hypothetical message, every client on the GnutellaNet will have seen our message.
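To make the arithmetic concrete, here is a tiny sketch of the worst-case count. It assumes a TTL of 7 (inferred from the 25^7 figure) and takes the simplification above at face value: every hop multiplies the message count by the number of connections and nobody drops duplicates. The function name is made up for illustration.

```python
CONNECTIONS = 25  # default active connections cited for Gnutella 0.54
TTL = 7           # assumption: inferred from the 25^7 figure in the text


def worst_case_messages(connections: int, ttl: int) -> int:
    """Messages at the final hop if every client forwards to `connections`
    peers at each of `ttl` hops and no duplicate is ever dropped."""
    return connections ** ttl


if __name__ == "__main__":
    print(worst_case_messages(CONNECTIONS, TTL))  # 6103515625
```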
Obviously, once a client sees a message, it's unnecessary for it to process the message again. The original Gnutella designers, in recognition of this, engineered each message to contain a GUID (Globally Unique Identifier) which allows Gnutella clients to uniquely identify each message on the network.
So how do Gnutella clients take advantage of the GUID? Each Gnutella client maintains a short memory of the GUIDs it has seen. For example, I will remember each message I have received. I forward each message I receive as appropriate, unless I have already seen the message. If I have seen the message, that means I have already forwarded it, so everyone I forwarded it to has already seen it, and so on. So I just forget about the duplicate and save everyone the trouble.
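The "short memory" idea can be sketched roughly like this. This is not how any particular Gnutella client does it; the class name, the capacity of 1000, and the convention that `ttl` counts the remaining forwards are all assumptions made for illustration.

```python
from collections import OrderedDict


class SeenCache:
    """Bounded memory of recently seen message GUIDs (oldest evicted first)."""

    def __init__(self, capacity: int = 1000):  # capacity is an arbitrary choice
        self.capacity = capacity
        self._seen = OrderedDict()

    def check_and_remember(self, guid: bytes) -> bool:
        """Return True if `guid` is new (and remember it), False if duplicate."""
        if guid in self._seen:
            return False
        self._seen[guid] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest GUID
        return True


def handle_message(cache, guid, ttl, sender, peers):
    """Return the peers this message should be forwarded to.

    Assumption: `ttl` is the number of forwards the message has left.
    Duplicates and expired messages are forwarded to nobody.
    """
    if not cache.check_and_remember(guid):
        return []  # already seen, so already forwarded: drop it
    if ttl <= 0:
        return []  # the message "dies" here
    return [peer for peer in peers if peer != sender]
```

The memory is deliberately bounded: a GUID only needs to be remembered for as long as copies of its message are still bouncing around the network.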
The GnutellaNet has no hierarchy. Every server is equal. Every server is also a client. So everyone contributes. Well, as in all egalitarian systems, some servers are more equal than others. Servers running on fast connections can support more traffic. They become a hub for others, and therefore get their requests answered much more quickly. Servers on slow connections are relegated to the backwaters of the GnutellaNet, and get search results much more slowly. And if they pretend to be fast, they get flooded to death.
But there's more to it than that.
Each Gnutella server only knows about the servers that it is directly connected to. All other servers are invisible, unless they announce themselves by answering a PING or by replying to a QUERY. This provides amazing anonymity.
Unfortunately, the combination of having no hierarchy and the lack of a definitive source for a server list means that the network is not easily described. It is not a tree (since there is no hierarchy) and it is cyclic. Being cyclic means there is a lot of needless network traffic. Clients today do not do much to reduce the traffic, but for the GnutellaNet to scale, developers will need to start thinking about that.
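A toy simulation shows what the cycles cost. It floods one message through a small hand-written graph, with each client dropping duplicates, and counts how many transmissions happen versus how many clients are actually reached. The graph shapes, the `flood` function, and the reading of `ttl` as "hops the message may travel" are all assumptions for illustration.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Flood one message from `origin`; return (transmissions, clients_reached).

    `graph` maps each client to the list of clients it is connected to.
    A client forwards a message to every neighbour except the one it came
    from, and silently drops any message it has already seen.
    """
    seen = {origin}
    queue = deque((origin, peer, ttl) for peer in graph[origin])
    transmissions = 0
    while queue:
        sender, receiver, remaining = queue.popleft()
        transmissions += 1
        if receiver in seen:
            continue  # duplicate: wasted transmission
        seen.add(receiver)
        if remaining > 1:
            for peer in graph[receiver]:
                if peer != sender:
                    queue.append((receiver, peer, remaining - 1))
    return transmissions, len(seen) - 1


# Three clients in a cycle versus the same three in a line (a tree).
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
```

In the triangle, reaching two clients takes four transmissions; in the line it takes two. The extra two are exactly the duplicates that the cycle creates.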
Connecting to a server
After making the initial connection to the server, you must handshake. Currently, the handshake is very simple. The connecting client says:
    GNUTELLA CONNECT/0.4\n\n

The accepting server responds:

    GNUTELLA OK\n\n

After that, it's all data.
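Here is a minimal sketch of the connecting side of that handshake, using only the two strings given above. The helper names are invented, error handling is kept to the bare minimum, and `sock` is assumed to be an already connected TCP socket.

```python
import socket

CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"
OK_RESPONSE = b"GNUTELLA OK\n\n"


def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than asked for."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection during handshake")
        buf += chunk
    return buf


def client_handshake(sock: socket.socket) -> None:
    """Send the connect string and insist on the exact OK reply."""
    sock.sendall(CONNECT_REQUEST)
    reply = read_exact(sock, len(OK_RESPONSE))
    if reply != OK_RESPONSE:
        raise ConnectionError("unexpected handshake reply: %r" % reply)
```

Something like `client_handshake(socket.create_connection((host, port)))` would then leave you with a socket that is ready for data, where `host` and `port` are whatever server you happen to know about.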
Downloading from a server
Downloading files from a server is extremely easy. It's HTTP.
The downloading client requests the file in the normal way:
    GET /get/1234/strawberry-rhubarb-pies.rcp HTTP/1.0\r\n
    Connection: Keep-Alive\r\n
    Range: bytes=0-\r\n
    \r\n

As you can see, Gnutella supports the range parameter for resuming partial downloads. The 1234 is the file index (see HITS section, below), and "strawberry-rhubarb-pies.rcp" is the filename.

The server will respond with normal HTTP headers. For example:

    HTTP 200 OK\r\n
    Server: Gnutella\r\n
    Content-type:application/binary\r\n
    Content-length: 948\r\n
    \r\n

The important bit is the "Content-Length" header. That tells you how much data to expect. After you get your fill, close the socket.
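The request and the one header you care about can be handled with a few lines. This sketch only builds the request bytes and pulls the length out of a response header block. Both function names are made up, and it assumes the filename is plain ASCII.

```python
def build_get_request(file_index: int, filename: str, start: int = 0) -> bytes:
    """Build the download request; `start` is the byte offset to resume from."""
    lines = [
        "GET /get/%d/%s HTTP/1.0" % (file_index, filename),
        "Connection: Keep-Alive",
        "Range: bytes=%d-" % start,
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def content_length(header_block: bytes) -> int:
    """Return the Content-length value from a response header block.

    The header name is matched case-insensitively, and the status line
    (the first line) is skipped.
    """
    for line in header_block.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            return int(value.strip())
    raise ValueError("no Content-length header in response")
```

To resume a partial download, pass the number of bytes you already have as `start`. Once you have the length, read that many bytes of body and close the socket.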
Header
Payload: ping (function 0x00)
Payload: pong (ping reply) (function 0x01)
Payload: query (function 0x80)
Payload: query hits (query reply) (function 0x81)
Payload: push request (function 0x40)