Distributed Hash Table

This is an unrefined attempt by Chris Riley to outline a DHT-based services database concept.

Requirements

  • Services have two main databases
    • User database
    • Channel database
  • Database's must be upto date when accessed.
  • If a node goes offline no data should be lost
  • Isolated nodes should still be able to function
  • Reconnecting nodes shouldn't request too much data
  • Data should be distributed to allow for scaling and load balancing
  • Lookup times should be minimised.
  • No 'bot' clients should be required however a msg nickserv/chanserv/operserv command could be implemented to smooth over transfer of users from regular irc.

Storage

  • Any database back end must be usable to store the data.
    • posgres sql and sql lite should be preferable.

Distributed Hash table

  • Every item in the databases will have a hash value. This dosn't have to be unique, however a secure hash function such as md5 or sha-1 should be used to ensure evenly distributed values.
    • computed off the username for the user table and channel name for the channel table
  • each node also has an id value computed with the same hash function using the ip:port of the node.
  • each node has a complete list of all nodes and all online nodes.

Booting up

first time

  • first time a node connects to the network it has no data at all.
  • it should ask all other nodes for data

Nb. if the network is totally new, there may not be any data ANYWHERE. A sarc implementation should have a method to deal with this. A -i have no data- command could help with this. Upon detection of this condition a secure solution should be used.

reconnections

Upon reconnection a node marks all its data as invalid. It should then determine which items of data it has authority over and build an authority chain for them. Using the authority chain it should ensure that it has the most upto date data for these items.

disconnection

if (a) node(s) get(s) disconnected from the main network (detected by less than 51% of all known nodes being offline) the service database goes into read only mode. In this mode no writes may be made to the service database. Other restrictions may also be appropriate. A tempoary database should be used for storing some information (G-lines, channel bans) this data can either be merged or discarded upon a reconnect.

Authority

Authority is determined as follows:

Node calculates the hash of the data item. Node builds an authority chain based on hamming distance between node id hash and data hash In the case of two nodes having equal distance, the node with the lowest id will have authority. (may give uneven distribution… will need testing as to how often this condition occurs…)

There are a few conditions that also need handling:

  • node with authority has only recently joined the network and may not have upto date information.
    • in this case, it is the responsibility of the node with authority to get the upto date data and return it.
  • add others as they arise

authority chains

An authority chain is an ordered list of nodes in order of authority.

dealing with data requests

When data is requested from a node it should build an authority chain. Nodes can only respond with data if either it is at the top of the chain OR it is being asked by a node higher up in the chain.

Transactions/events/operations

  • self connecting
  • self disconnecting
  • other connecting
  • other disconnecting
  • lookup
    • look in local data - do we have it?
    • if so return to asking node
    • inc message hop count
    • if not ask any nodes closer than we are.
  • lookup callback
    • dec message hopcount
    • return data to node that asked us for it
    • store data locally
  • store
    • store data locally
    • forward data to all nodes closer than me.
  • lookup local
    • same as lookup without returning data.
  • store local

Integrity/authority

  • data always changes first at the node with authority over the data
  • nodes have a complete node list.
  • nodes will ALWAYS know who has authority over what data
  • if node with authority is offline, NEXT closest node has authority.
  • every node stores as much data as possible BUT reads it from the node with authority.
  • authority moves implicitly as nodes join and part. THIS IS BAD!!!!!
    • move authority based on node UPTIME if a new node with a closer distance to data has only recently joined the network - based on next point, we should still ask that node for it but also ask the next authority for it if we want a speed increase.
    • OR !!! when a node joins the network, it starts with an empty table/marks all entries as invalid. When data is requested from it which it dosn't have/is invalid it requests it from the previous authoritive source.
  • nodes should over time attempt to build an entire database of upto date information.
    • updates are sent out to EVERY node.
    • every node stores all data given to it. So authority can move in an instant if a node disconnects.
  • if a node PLANS to disconnect, it IMPLICITLY sends update commands for all its data to the network
  • all nodes must strive to get a complete set of the data
    • store any data passed through them from an authoritive node
    • store any data they request
    • if nothing much is going on, request data that they may GET authority over if a node fails that isn't in their current cache*
      • how do we know about data we dont have? guessing is inefficient…
    • flip it arround. Most authoritive node over data asks NEXT authoritive node for said data. 'hey do you have x?' 'no…' 'here you go'

How do we deal with a total network failure?

  • data must persist.
  • if a node connects and it has a copy of some data and the last authority over that data DOSN'T have it (we can also check a few links back in the chain just to be safe) node assumes its data is correct.
  • if a node has NO data it should probably try to get some pretty quick.
  • user data has a lesser priority.
  • some data (opers?) should be sent to all servers on (re)connect by the server that currently has authority over that bit of data.
  • data has a correctness value associated with it. The LOWER the value the greater the need for that data to be correct when it is used
    • each time a node accesses its local cache of data the correctness is decreased by 1
      • if correctness < 0 we must refresh it from the authoritive node.
    • ie it dosn't matter much if someone is kicked from a channel that they aren't on.
      • it matters a lot more if someone can use oper cmds after being deopered
distributed_relay_chat/services/distributed_hash_table.txt · Last modified: 2009/05/02 15:21 by alan
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki