Network Topology

FIXME Update this from drc-server.git

Goals

  • Redundancy - there should be no node in the network that is depending on a single connection to the network
  • Self healing - the network should be able to self-heal after failure of one or more nodes
  • Minimal loss - in fulfilling all other requirements, effectiveness should not be compromised

Design

The intention is to construct a uniform adaptive partial mesh overlay network. This approach has been chosen instead of a full mesh network because:

  • Once a link is broken in a full mesh network it becomes a partial mesh network anyway
  • The internet is not reliable enough for a full mesh network to persist on top of it, making the network in its fully connected state a rare occurrence
  • A partial mesh configuration is inevitable, so the added complexity of the software to handle the additional full mesh configuration is redundant
  • The number of links present in a full mesh network is exponential based on the number of nodes, which in any reasonably sized network will incur too much overhead and complexity

Each node in the adaptive partial mesh will have (almost) the same number of links into the network, based on the natural log of the number of nodes (this information is available since every server will know about every other server on the network). This counteracts the exponential link count that would be present in a full mesh.

Deciding on the number of persistent links to have to every node is a difficult problem. Using a fixed number of links means that some nodes cannot satisfy this requirement in some configurations or node counts.

Possible link count algorithm

  • x = ceiling(log(n)), link count is x or x+1 (n = number of nodes)
    • Only seek x connections, accept up to x+1 if initiated externally
      • This solves the problem where for some n, it is not possible to make exactly x connections to every node - implicitly, the number of nodes stuck on x-1 links must be less than n, and this provision provides n “spare” links
    • If a connection is made which results in x+1 links, check adjacent nodes for also having x+1 links - remove the link to the first node also in this state
      • Stops needless amount of redundancy, while still allowing all nodes to have at least x links

The number of links scales at roughly n*log(n)

  • Network built using priority queues
    • Highest priority is least availability (from the point of view of the node initiating the connection)
      • Availability measured in hop count (no route = infinite)
      • New connection always made to most “remote” node
        • Implicitly minimises maximum hop count

This all introduces a new problem - how do we get the hop counts?

  • Broadcast-on-connect?
    • Message contains node ID, hop count incremented on each hop

Possible connection protocol

  1. Attempt connection to most distant node, traversing priority queue until a connection is achieved
  2. n1 asks n2 for route table
  3. n2 sends route table to n1
  4. n1 merges the route table with its own
    1. The hop count for every node is incremented in the received data (to account for the hop the data has just made)
    2. Any route entries with a hop count lower than the local hop count to the same node causes an update in the local table
      1. Hop count → new hop count
      2. Gateway → n2
    3. Merged route table is sent to all adjacent nodes (those connected to n1) except n2

When a route table is received by a node:

  1. Merge with local route table as described above
  2. If the local table was modified, rebroadcast as above

This shouldn't result in an infinite update cycle because eventually all nodes will have a full route table that needs no more updates.

Useful links:

Related links

distributed_relay_chat/topology/root.txt · Last modified: 2009/05/03 14:30 by alan
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki