Gnut Manual: Ideas for Advanced Use




6. Advanced Topics


6.1 High Availability Connection Point

Gnut can be used as a publicized connection point into the main gnutella network. Just set max_incoming to around 10 or 20, and set redirect to 1. It is probably best not to share any files in this case, as the gnutella traffic itself is liable to consume large amounts of bandwidth.
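The relevant settings can go straight into your .gnutrc startup file. A minimal sketch follows; the variable names come from this manual, but the `set` syntax and the exact values are assumptions you should tune for your bandwidth:

```text
# .gnutrc for a publicized connection point
set max_incoming 15    # accept many incoming connections
set redirect 1         # redirect further hosts elsewhere
```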


6.2 Scriptable Activity

Using expect, it should be possible to automate certain types of tasks in gnut. For example, if there is a file with a certain name that you are trying to get, but the server it is on is not often available, you could write an expect script that runs gnut, issues a search for the exact filename, verifies that a file is found, and starts (or resumes) a download.

There is also a lot that can be done with the recently-added load and eval commands and backquote command substitution. One user suggested putting an eval command in their .gnutrc startup file that loads a blacklist file from an FTP server somewhere and then runs it -- an easy way to stay up to date on which sites have been blacklisted.
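A minimal .gnutrc sketch of that idea -- the URL is a placeholder, and the use of wget inside backquotes is an assumption; any command that prints gnut commands on its standard output would do:

```text
# Fetch the latest blacklist as a series of gnut commands and run them.
eval `wget -q -O - ftp://example.com/pub/gnut-blacklist.txt`
```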


6.3 Network Topology

gnut users often ask how to maximize their search results. In general, increasing min_connections and ttl increases the number of hosts your searches will reach, but there is more to it than that.

If you run the update command twice in a row, giving it ten minutes to run each time, you'll see that the number of hosts changes, often significantly.

The results change because the network is always changing: connections close, and new connections get created. Every change affects the number of hosts you can reach.

The biggest single factor affecting the number of hosts you can reach is whether two of your neighbors are connected to each other, forming a small loop:

                 k   l
                  \ /
        j -- d     e -- m
              \   /
               YOU
          u   /   \   n
           \ /     \ /
  t -- i -- B ----- C -- f -- o
           / \
          /   \
         h     g
        / \   / \
       s   r q   p

Notice how there are two paths from you to C. Also, note the six hosts (d, e, f, g, h, i) that form the outer ring. If there weren't a loop, nodes B and C would each have an additional branch they could use to reach out to more nodes:

                 k   l
                  \ /
        j -- d     e -- m
              \   /
               YOU
          u   /   \   n
           \ /     \ /
  t -- i -- B       C -- f -- o
           / \     / \
          /   \   /   \
    s -- h     v w     g -- p
        /     /| |\     \
       r     x y z 1     q

In general, the more loops there are, the fewer nodes you can reach.

However, you also need loops for the nodes to be able to reach each other. In the first diagram, all the nodes can reach each other in 5 hops. In the second diagram, some nodes (like s and p, or x and q) are 6 hops away from each other. If their ttl was 5, they wouldn't be able to search each other's shared files. Also, almost all of the traffic would be going through you. Such a network is called "unbalanced" because the data traffic is unevenly distributed.
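The hop counts above can be checked mechanically. Here is a small Python sketch that rebuilds both diagrams as adjacency lists (my reading of the ASCII art -- treat the edge lists as assumptions) and measures hop distances with a breadth-first search:

```python
from collections import deque

def dist(adj, a, b):
    """Hop count between nodes a and b via breadth-first search."""
    seen = {a: 0}
    q = deque([a])
    while q:
        n = q.popleft()
        if n == b:
            return seen[n]
        for m in adj[n]:
            if m not in seen:
                seen[m] = seen[n] + 1
                q.append(m)
    return None  # unreachable

def build(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# First diagram: B and C are joined both to YOU and to each other.
looped = build([("YOU","d"),("YOU","e"),("YOU","B"),("YOU","C"),
    ("d","j"),("e","k"),("e","l"),("e","m"),("B","C"),("B","i"),
    ("B","u"),("B","h"),("B","g"),("C","n"),("C","f"),("f","o"),
    ("i","t"),("h","s"),("h","r"),("g","q"),("g","p")])

# Second diagram: the B -- C link is gone; each node spends that
# connection slot on a new branch (v under B, w under C) instead.
tree = build([("YOU","d"),("YOU","e"),("YOU","B"),("YOU","C"),
    ("d","j"),("e","k"),("e","l"),("e","m"),("B","i"),("B","u"),
    ("B","h"),("B","v"),("C","n"),("C","f"),("C","w"),("C","g"),
    ("f","o"),("i","t"),("h","s"),("h","r"),("v","x"),("v","y"),
    ("w","z"),("w","1"),("g","q"),("g","p")])

def diameter(adj):
    """Largest hop count between any pair of nodes."""
    return max(dist(adj, a, b) for a in adj for b in adj)

print(len(looped), diameter(looped))  # 21 nodes, every pair within 5 hops
print(len(tree), diameter(tree))      # 27 nodes, some pairs now 6 hops apart
print(dist(looped, "s", "p"), dist(tree, "s", "p"))  # 4 vs 6
```

So the second diagram reaches more nodes, but stretches some pairs past a TTL of 5, exactly the trade-off described above.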

This looks like a contradiction: loops seem to both increase and decrease the number of reachable nodes. The ideal balance is for there to be lots of loops of large size, and few small loops. In fact, the most efficient large-scale multiprocessor supercomputers do exactly that: examples are the hypercube, N-dimensional torus, and butterfly networks.

gnut dynamically adjusts its network connections so as to avoid small loops. It does this by watching the duplicate-packet statistics on all of its connections and closing the connections with the most duplicate packets. It does not deliberately form large loops, but it actively breaks small loops when it finds itself part of one, and large loops always form naturally.
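The pruning heuristic can be sketched in a few lines of Python. This is an illustration of the idea, not gnut's actual code: packets are modeled as (connection, packet GUID) pairs, and a duplicate is charged to the connection that delivered the later copy.

```python
from collections import Counter

def worst_connection(packets):
    """Return the connection that relayed the most duplicate packets.

    packets: iterable of (connection_id, packet_guid) pairs, in
    arrival order. Returns None if no duplicates were seen.
    """
    seen = set()        # GUIDs observed so far
    dupes = Counter()   # duplicate count per connection
    for conn, guid in packets:
        if guid in seen:
            dupes[conn] += 1
        else:
            seen.add(guid)
    return dupes.most_common(1)[0][0] if dupes else None

# Connection 2 keeps re-delivering packets we already have, which
# suggests it closes a small loop -- it is the one to drop.
traffic = [(1, "a"), (2, "b"), (2, "a"), (3, "c"), (2, "c"), (1, "b")]
print(worst_connection(traffic))  # -> 2
```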

Small loops would be rare if all Gnutella clients picked hosts randomly, but many clients don't. Instead, they connect to hosts in whatever order they appear in the host list, and the host list usually comes from PONG packets, so nearby hosts usually end up at the beginning of the list.


6.4 Network Efficiency -- Capacity, Bottlenecks, Responsiveness

Most users care about responsiveness (how long a search takes), which is a function of message propagation time. It is also useful to consider network capacity (total bandwidth).

The average message propagation time (the time for a message to get from sender to recipient) is the TTL times the average link delay time. The average link delay time is the total size of the buffers in the link divided by the link bandwidth (each end has a buffer, and there is a little buffering in the routers in between). The network capacity (total bandwidth) is roughly the number of links times the bandwidth of a link, where the number of links is the number of nodes times the number of links per node (the valence) divided by 2.

Obviously, all these numbers vary from one node to another, but for TTL we can use the smallest value used by a large number of nodes (probably still 7, the default in the original clients), and for the number of links per node we can use an average, which is probably about 5 (the original default was 4, but many users have learned to increase their number of connections). The buffer size is harder to estimate, but it's probably around 32K per link. Decreasing the buffer size would greatly improve responsiveness (the speed of searches and replies) but would decrease reliability, because the network would be less able to absorb localized momentary congestion. All clients drop packets when an outgoing buffer fills up (if they didn't, they would allocate more and more memory, and searches would get slower and slower the longer you ran the client).
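Plugging those numbers in gives a rough feel for the latency involved. A back-of-envelope sketch in Python -- the 32K buffer is the estimate stated above, and the link throughput is my own assumption of roughly what a slow modem sustains:

```python
# Worst-case search propagation estimate from the relationships above:
# propagation time = TTL * (buffer size / link bandwidth).
TTL = 7                    # default in the original clients
BUFFER_BYTES = 32 * 1024   # assumed per-link buffer size (32K)
LINK_BYTES_PER_SEC = 5000  # assumed per-link throughput (slow modem)

link_delay = BUFFER_BYTES / LINK_BYTES_PER_SEC  # delay with a full buffer
propagation = TTL * link_delay

print(round(link_delay, 1), round(propagation, 1))  # 6.6 45.9
```

With full buffers, a search can take the better part of a minute to travel its full TTL, which is why shrinking the buffers would make searches feel so much faster.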




These gnut pages are hosted by gnutelliums.com
