dWeb DHT


DWDHT, an acronym for "dWeb DHT," is a protocol that forms a distributed hash table of dataset identifiers (dWeb network addresses) and the peers that are announcing them. Each database identifier and its announcing peers form what is referred to as a "swarm," which can be queried for connected peers or joined (announced) by a specified amount of peers. dWeb's DHT is literally how a particular dataset is discovered and downloaded from the peers that are openly seeding it (announcing it). Each peer who is announcing a dataset on a long-term basis, by design, becomes a DHT node and is responsible for storing a specific portion of dWeb's DHT. Put another way, each DHT node stores a specific portion of the dWeb's dataset identifiers, along with the peers who currently possess the data related to each of those data identifiers.

While each DHT node stores various swarms, it also stores information regarding other DHT nodes, identified by node identifiers, which can be mathematically determined to possess information regarding a specific swarm. dWeb's DHT is based on the Kademilia Distributed Hash Table popularized by BitTorrent and others. Nodes and data in dWeb's DHT are assigned 160-bit integers as IDs, while swarms are stored in the form of key-value pairs, where the key is a 32 byte value generated by a one-way hash function, such as SHA-1 (normally a dWeb network address), where the value is an object containing public and LAN-based peers that are announcing the key (the dataset identifier).

Node & Data Distribution

The DHT defines the distance between 2 DHT nodes i and j by the bitwise exclusive OR operations (XOR), i.e., d(i,j) = i (XOR) j. This distance means for any given key i and a distance L > 0, there can only be a single key j that satisfies d(i,j) = L [ZHAN13].

All key-value pairs are stored on k nodes whose UIDs are closest to the actual key. k is an important parameter that helps determine data redundancy and how stable the DHT is at any time. Each node i always maintains multiple k-buckets. Each k-bucket stores a list of other DHT nodes, which are organized in an order that reflects the most recently active nodes. The node that is most recently active is stored at the tail, while the least active node is stored at the head.

The node whose distance from node i is in the range of [pow(2,m), pow(2,m+1)] is stored in the mth k-bucket (node that 0 < m < 160). The nodes in the k-buckets are regarded as the neighbors of the node i. A dWeb DHT node dynamically updates its neighbors upon receiving any messages from them. This process can be better explained more specifically, when node i receives a message from another DHT node j, which is located in the mth k-bucket, where the k-bucket of node i, will be updated in the following way:

-If j already exists in the k-bucket, i moves j to the tail of the list, as node j is the most recently seen. -If j is not in the k-bucket and the bucket has fewer than k nodes, node i inserts j at the tail of the list. -If the bucket is full, i PINGs the node at the head of that particular k-bucket. -If the head node responds, node i makes it to the tail and ignores node j. -Otherwise, i removes the head node and inserts j at the tail.

DHT Primitives

-PING - Probes a DHT node to check whether it's online or not. -STORE - Used to store a key-value pair. -FIND_NODE - Finds a set of nodes that are closest to a given node. -FIND_VALUE - Operates like FIND_NODE but returns a stored value.

These RPC-like primitives work in a recursive way, which improves the efficiency of dWeb's DHT. A lookup procedure is initiated by the FIND_NODE and FIND_VALUE primitives, where the lookup initiator chooses B nodes from its closest k-buckets and send many parallel FIND_NODE requests to these B nodes. If the node being searched for in a given iteration is not found, the lookup initiator resends the FIND_NODE to the nodes that were found during the previous recursive operation and repeats this iterative functionality.

A KV pair may be stored on multiple DHT nodes. Thanks to the recursive procedure explained above, the key-value pair spreads across the DHT network every hour. This process insures that multiple replicas of data exist across the network. Every key-value is deleted 24 hours after it is initially pushed into the network.

Joining and Leaving

When node i joins dWeb's DHT, it is assumed that it is aware of or knows about node j. The joining process consists of multiple steps as follows: -1. Node i inserts j into its k-buckets -2. Node i starts a node lookup procedure for its own ID, where i is made aware of other newer nodes. -3. Node i updates the k-buckets

During this process, node i strengthens its k-buckets and inserts itself into other nodes' k-buckets. When other nodes leave or fail, they DO NOT notify any other node. There is no need for a special procedure to cope with node departures, as these mechanisms insure that leaving nodes will be removed from the k-buckets.

Swarm Announcement

Announcing a swarm means that you're either: announcing a dWeb network address that doesn't exist on dWeb's DHT and therefore become the first peer related to the address; or announcing a dWeb network address that does exist on dWeb's DHT and are added to a list of peers who are announcing the address.

A swarm entry in dWeb's DHT, takes on the following format:

Key: 8f0ab2... (32 byte hexadecimal address)
= = = = = = = = = = = = = = = = = = = = = = = = =
node: { host, port },
peers: [{ host, port }, { host, port }]
localPeers: [{ host, port }, { host, port }]

If a peer is announcing a dWeb network address that matches a key in the DHT, the entry is mutated to include the peer's announced IP address and port. When announcing a dWeb network address, a peer must set their public port and public IP address during the announcement and can choose to set a LAN-based address and port as well, so that those on the same public IP who are performing a DHT lookup, can retrieve local announcers for a particular dWeb address. It's important to note that when announcing a dWeb network address on dWeb's DHT, the announcer becomes a DHT node on the network by design.

Swarm Lookups

A swarm can be looked up by its dWeb network address, which is a 32 byte buffer (normally a hash of something). When performing a lookup, the querying peer is temporarily an announcing peer, but is removed as an announcing peer (unannounces), once the lookup is finalized.

A lookup for a particular dWeb key, returns the following data:

// The dWeb DHT node that is returning this data
node: { host, port }
// List of Peers
peers: [{ host, port }, ...]
// List of LAN Peers
localPeers: [{ host, port }, ...]

DHT Bootstrap Nodes

dWeb's network utilized various bootstrap nodes to launch the initial dWeb network. These 3 bootstrap nodes are as follows:

-dht1.dwebx.net -dht2.dwebx.net -dht3.dwebx.net

These nodes going down would not affect the dWeb or a user's ability to announce or lookup dWeb network addresses, due to the way other DHT nodes on the network share data and information regarding each other. dWeb developers have the option of launching their own bootstrap nodes so that their apps have a point of entry into the dWeb's network of DHT nodes. Those nodes would initially use a set of already existing nodes to gain access to the DHT and would no longer need access to the nodes used for bootstrapping. Any node on the DHT can be used to bootstrap a new node.

DWDHT Reference Implementation

The DWDHT reference implementation was written in JavaScript and was used to launch the initial dWeb DHT. You can find it here.

You can use this reference implementation to allow dWeb-based applications (desktop, mobile or web) to act as DHT nodes.

You can also launch your own dWeb DHT node via the command-line, by using the DHT CLI here.

dWeb Swarm API

The dWeb swarm API, also known as "Swarm Programming" is a high level API built on top of the DWDHT Protocol used for finding and connecting to peers of a particular swarm and interacting with the dWeb DHT.

It expands the default set of parameters when creating a DHT node, adds specific swarm events where callbacks can be used to react to specific swarm-based events and introduces several functions, like join, leave, and connect, which make it easy to programmatically interact with the dWeb's DHT.

DHT Node Initiation Parameters

Using the dWeb Swarm API, a DHT node can be created with the dwebswarm() function, using the following parameters:

-bootstrap - An array of bootstrap servers used to initiate the dWeb DHT node. -ephemeral - A boolean that is set to false if this is intended to be a long running DHT node. -maxPeers - The total amount of peers that the node initiator will connect to. -maxServerSockets - The number to restrict the number of server socket based peer connections. Set to Infinity by default. -maxClientSockets - The number to restrict the number of client socket based peer connections. Set to Infinity by default. -validatePeer - A function for applying filters before connecting to a peer. -queue - An object for configuring peer management behavior. --requeue - An array of backoff times in milliseconds every time a failing peer connection is retried. --forget - An object for when to forget certain peer characteristics and treat them as fresh connections again. ---unresponsive - How long to wait before forgetting that a peer has become unresponsive. ---banned - How long to wait before forgetting that a peer has been banned. ---multiplex - Set to true in order to reuse existing connections between peers across multiple dWeb network addresses.

Swarm Event Emitters

Using the dWeb Swarm API, applications can listen for the following events:

-connection - A new connection has been created. Event should be handled by using the socket. This emits an info object that describes the connection using the following details:

-type (string) - Should be either "tcp" or "udp."
-client (boolean) - If true, the connection was initiated by this node.
-topics (array) - The list of dWeb network addresses associated with this connection, if "multiplex" was set to true, during DHT node initiation.
-peer (object) - Object describing peer (port, host, LAN peer details, referrer and the dWeb network address the peer was discovered under).

Using the info object, the info.ban() method can be called to ban the connected peer and will keep the DHT node from connecting to the peer in the future. The info.backoff() method can be called to get the DHT node to backoff from connecting to the peer.

-disconnection - A connection has been dropped. Emits an info object that is identical to the connection info object, describing the dropped connection.

-peer - A new peer has been discovered on the network and has been queued for connection. Emits a peer object, including the port, host referrer, and dWeb network address peer was discovered under. Also includes a boolean on whether the peer is a LAN-based peer.

-peer-rejected - A peer has been rejected as a connection candidate. Emits a peer object that is identical to the peer event's peer object, describing the rejected peer.

-updated - Emitted once a discovery cycle for a particular dWeb network address has completed. Emits a key object that identifies the dWeb network address the discovery cycle is related to.

For more information on dWeb Swarm's events, take a look at the README.md file within the reference implementation of the dWeb Swarm API here.

Swarm API Functions

There are several functions (methods in the reference JavaScript implementation) that can be used to interact with a dWeb DHT node. These functions and their required parameters are described below.

join - Join the swarm for a given topic (dWeb network address). This will cause peer to be discovered for the topic (peer event).
join() Parameters

-topic (Buffer) - The dWeb network address to list under. Must be 32 bytes in length. -options (Object) --announce (Boolean) - List this peer under the topic as a connectable target. Defaults to false. --lookup (Boolean) - Look for peers in the topic and attempt to connect to them. If announce is false, this automatically becomes true. -onion - A function that is called when your topic has been fully announced to the local network and the DHT.

connect() - Establish a connection to a given peer. You usually won't need to use this function, because the DHT node connects to discovered peers automatically. Accepts a peer object (identical to the peer object emitted by the peer event) for the peer the function is to establish a connection with. Also has a callback function that emits a connection error (err), and the established TCP or UDP socket (socket) and details regarding the established connection (details).
leave - Leave the swarm for a given dWeb network address.
leave() Parameters

-topic (Buffer) - The identifier of the peer-group to delist from. -onleave - A function that is called when your topic has been fully unannounced to the local network and the DHT.

For more in-depth examples on how to call these functions within dWeb-based applications, view the README.md file within the reference JavaScript implementation here.