P2P Peer Discovery

January 14, 2020

Decentralized applications need a way to discover peers: other machines in the network that collectively host some content. Each node in the network must be able to find other nodes that store the data it wants. Abstractly, P2P discovery is a function that takes a topic and returns a list of peers, usually as IP address and port pairs.

ips = discover("32d225818f3928d4f17ed4893108f630d59023ccbbda196262ecd936e4033421")

This abstraction not only makes discovery transport-agnostic, but also applies across P2P technologies: IPFS topics are content-addressable hashes, dat uses a hashed reader key, and BitTorrent trackers group peers by a torrent's infohash.

Once discovery occurs, communication with the network (outside of the list of peers) is no longer needed. This allows different technologies to share the same discovery mechanism (such as the Mainline DHT) while leaving the syncing and transferring of data up to the implementation. Each discovery mechanism has a different set of trade-offs, and software may use multiple techniques at once. Peers can be discovered via multicast DNS on the local network, trackers, distributed hash tables, and other mechanisms.

Multicast DNS

Multicast DNS (mDNS) uses DNS over a local network. Instead of querying a name server, nodes multicast the topics they're interested in to the local network. Its reach is limited to potential peers on the same local network, but it depends on no centralized resources. A well-known example of using mDNS to find devices on a local network is Apple's Bonjour.
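As a rough illustration of what goes on the wire, the sketch below builds a single DNS PTR question of the kind an mDNS client multicasts to 224.0.0.251 on UDP port 5353. The service name `_p2p-topic._udp.local` is a made-up placeholder, and how a topic maps onto a service name is an assumption that varies by application. Sending is left as a comment so the sketch runs without network access.

```python
import struct

MDNS_GROUP = "224.0.0.251"  # IPv4 multicast group used by mDNS
MDNS_PORT = 5353            # UDP port used by mDNS


def encode_name(name: str) -> bytes:
    """Encode a dotted DNS name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.strip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid DNS label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ptr_query(service: str) -> bytes:
    """Build a one-question DNS query asking for PTR records of `service`."""
    # Header: ID=0, flags=0 (standard query), QDCOUNT=1, other counts 0.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    # Question: QNAME, then QTYPE=12 (PTR) and QCLASS=1 (IN).
    question = encode_name(service) + struct.pack("!2H", 12, 1)
    return header + question


# Hypothetical service name standing in for a topic.
packet = build_ptr_query("_p2p-topic._udp.local")

# To actually send it (not done here):
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(packet, (MDNS_GROUP, MDNS_PORT))
```

Peers that advertise the same service name would answer with records pointing at their address and port, which is what makes this usable as a `discover(topic)` backend on a LAN.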

Implementations

Trackers

Trackers are servers to which peers announce themselves and the topics they're subscribed to, and from which they receive a list of peers for those topics. This can be implemented as a centralized DNS discovery server. Trackers have large reach, since anyone on the internet can join, but a single server holds the mapping of all topics to all IPs, making it a risky central point vulnerable to surveillance, downtime, and legal pressure. Private trackers can be used to mitigate some of these issues.
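The announce-and-list behavior can be sketched as a small in-memory class. This is only a toy model: `InMemoryTracker` and its `announce` method are hypothetical names, and a real tracker would sit behind a network protocol and expire stale peers.

```python
from collections import defaultdict

Peer = tuple[str, int]  # (ip, port)


class InMemoryTracker:
    """Toy tracker: one central mapping from topic to the peers announced for it."""

    def __init__(self) -> None:
        self._peers: dict[str, set[Peer]] = defaultdict(set)

    def announce(self, topic: str, peer: Peer) -> list[Peer]:
        """Register `peer` under `topic` and return the other peers already there."""
        others = sorted(self._peers[topic] - {peer})
        self._peers[topic].add(peer)
        return others


tracker = InMemoryTracker()
first = tracker.announce("topic-a", ("10.0.0.1", 6881))
second = tracker.announce("topic-a", ("10.0.0.2", 6881))
```

The single `_peers` dictionary is the point of the example: whoever operates the tracker can see, and lose, every topic-to-IP mapping at once.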

Implementations

Distributed Hash Tables

Hash tables are key-value stores that can set a value under a key and return that value given the same key, both in $O(1)$ (constant) time. This storage can be distributed across a network, for resilience or to hold sufficiently large amounts of data, using a distributed hash table (DHT). DHTs are hash tables sharded across several nodes in a network, each able to set and get a key by communicating with up to $O(\log n)$ other nodes. Requesting a value from a peer returns either the requested data or the address of another peer that has the data or is closer to the node that does. In P2P networks, DHT nodes map a key to a list of peers subscribed to that key. Kademlia DHTs are often used in P2P for their efficient routing; they were popularized by BitTorrent's Mainline DHT and are also seen in hyperswarm, IPFS, and Ethereum.
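Kademlia defines "closer" as the XOR of two IDs interpreted as an integer. The sketch below shows that metric and a deliberately simplified greedy lookup over an in-memory network: each node either returns the value or points to the known node nearest the key. Real Kademlia keeps k-buckets and queries several nodes in parallel; the names `closest_nodes` and `greedy_lookup` and the 4-bit IDs are illustrative assumptions.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: XOR of the two IDs, read as an integer."""
    return a ^ b


def closest_nodes(key: int, known: list[int], k: int = 3) -> list[int]:
    """Return up to k known node IDs, nearest to `key` first."""
    return sorted(known, key=lambda n: xor_distance(n, key))[:k]


def greedy_lookup(key, start, routing, store):
    """Follow 'closer peer' referrals until a node holds `key` or no progress is made.

    routing: node ID -> list of node IDs that node knows about
    store:   node ID -> {key: value} held by that node
    Returns (value or None, number of hops taken).
    """
    current, hops = start, 0
    while True:
        if key in store.get(current, {}):
            return store[current][key], hops
        nearest = closest_nodes(key, routing.get(current, []), k=1)
        # Stop if nobody known is strictly closer than the current node.
        if not nearest or xor_distance(nearest[0], key) >= xor_distance(current, key):
            return None, hops
        current, hops = nearest[0], hops + 1


# Tiny 4-bit example network (made-up IDs and addresses).
routing = {0b0001: [0b0100], 0b0100: [0b0110, 0b0001],
           0b0110: [0b0111, 0b0100], 0b0111: [0b0110]}
store = {0b0111: {0b0111: [("10.0.0.5", 6881)]}}
peers, hops = greedy_lookup(0b0111, 0b0001, routing, store)
```

Because every hop strictly reduces the XOR distance, the loop always terminates, and each referral moves the query nearer to the node responsible for the key.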

Nodes in the Mainline DHT network communicate over UDP, commonly on port 6881, sending bencoded KRPC messages as defined by the DHT protocol (BEP 5). The many interoperable implementations expose a minimal interface (ping, find_node, get_peers, announce_peer) to get IP/port information for a list of peers subscribed to a hash.
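To make the message format concrete, here is a minimal sketch of a bencoder, a `get_peers` query built with it, and a decoder for the 6-byte compact peer entries that BEP 5 uses in responses. The function names are illustrative, the 20-byte IDs are placeholders, and no packet is actually sent.

```python
import socket
import struct


def bencode(value) -> bytes:
    """Encode ints, bytes/str, lists, and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def get_peers_query(node_id: bytes, info_hash: bytes, tid: bytes = b"aa") -> bytes:
    """Build a KRPC get_peers query: t = transaction ID, y = 'q' marks a query."""
    return bencode({
        "t": tid,
        "y": "q",
        "q": "get_peers",
        "a": {"id": node_id, "info_hash": info_hash},
    })


def decode_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode compact peer info: 4 bytes of IPv4 address + 2 bytes of big-endian port."""
    if len(blob) % 6:
        raise ValueError("compact peer data must be a multiple of 6 bytes")
    return [
        (socket.inet_ntoa(blob[i:i + 4]), struct.unpack("!H", blob[i + 4:i + 6])[0])
        for i in range(0, len(blob), 6)
    ]


query = get_peers_query(b"abcdefghij0123456789", b"mnopqrstuvwxyz123456")
```

A node that knows peers for the info hash replies with a list of such compact entries; otherwise it returns nodes closer to the hash, and the client repeats the query against them.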

Applications will often connect to a bootstrap node, available at a well-known address or IP (like Mainline DHT's primary bootstrap node at router.utorrent.com:6881), to join the network for the first time. Clients then store node addresses from previous sessions and attempt to rejoin through them rather than through the initial bootstrap node.
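One way to express that preference is to order the addresses a client will try: cached nodes first, well-known bootstrap nodes as the fallback. This is a sketch under assumptions; `join_candidates` and the `max_cached` limit are hypothetical, and persisting the cache between sessions is left out.

```python
Address = tuple[str, int]  # (host, port)

# Well-known entry point mentioned above; used only when the cache is exhausted.
BOOTSTRAP_NODES: list[Address] = [("router.utorrent.com", 6881)]


def join_candidates(
    cached: list[Address],
    bootstrap: list[Address] = BOOTSTRAP_NODES,
    max_cached: int = 8,
) -> list[Address]:
    """Return addresses to try in order: recent cached nodes, then bootstrap nodes."""
    ordered: list[Address] = []
    for addr in [*cached[:max_cached], *bootstrap]:
        if addr not in ordered:  # drop duplicates while keeping order
            ordered.append(addr)
    return ordered


fresh_install = join_candidates([])
returning_client = join_candidates([("10.0.0.7", 6881), ("10.0.0.7", 6881)])
```

A client would walk this list, pinging each address until one responds, and fall through to the bootstrap node only if none of the cached nodes is still reachable.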

The Mainline DHT is capable of supporting networks of tens of millions of nodes, with the bootstrap node handling 20,000 requests per second.

DHTs are broadly resilient due to their distributed nature, although peers' IP addresses and the hashes they subscribe to are still spread across, and visible to, nodes on the network. Some implementations are vulnerable to Sybil and other attacks, although there are defenses (BEP 42: DHT Security extension), discussions (@hyperswarm/dht: Secure and privacy preserving DHT), and extensions (S/Kademlia security extensions) aimed at improving privacy and security. DHTs may also be restricted to a local network (Deploy a private IPFS network) for decentralized and private, though limited, discovery.

Implementations

libp2p-kad-dht
WebTorrent's DHT
BitTorrent's DHT Bootstrap
@hyperswarm/dht
Open DHT, similar to Mainline DHT, offers extensibility and an optional public-key identity layer.
Kademlia/Mainline DHT in Go

Other

Google's Nearby and Apple's Multipeer Connectivity APIs find nearby peers by abstracting over several transports, using Bluetooth, Bluetooth LE, Wi-Fi, and ultrasonic modem sounds when needed. These technologies may be used for discovery as well as messaging, although neither solution is open, and some P2P applications may still require communication over a different channel.

Takeaway

Each technique has its own strengths and weaknesses. The ideal discovery mechanism depends on the use case, and since discovery can be abstracted as ips = discover(topic), the same underlying P2P technology can handle discovery differently depending on the application (e.g. a torrent application can use mDNS, trackers, or a DHT). To increase the likelihood of finding peers, applications will typically use several of these techniques simultaneously.
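A sketch of that idea: treat each mechanism as a function from topic to peers, run them concurrently, and merge the results. The backends here are stand-in stubs with made-up addresses; `discover`, `Backend`, and the choice to skip a backend that raises `OSError` are assumptions for illustration, not an existing library's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

Peer = tuple[str, int]  # (ip, port)
Backend = Callable[[str], Iterable[Peer]]


def _run(backend: Backend, topic: str) -> list[Peer]:
    """Materialize a backend's results inside the worker thread."""
    return list(backend(topic))


def discover(topic: str, backends: list[Backend]) -> list[Peer]:
    """Query every backend concurrently and merge the peers, without duplicates."""
    peers: list[Peer] = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(_run, backend, topic) for backend in backends]
        for future in futures:  # iterate in backend order for stable output
            try:
                found = future.result()
            except OSError:
                continue  # one unreachable mechanism should not block the rest
            for peer in found:
                if peer not in peers:
                    peers.append(peer)
    return peers


# Stub backends standing in for mDNS, a tracker, and a DHT.
def lan_stub(topic: str) -> list[Peer]:
    return [("192.168.1.20", 6881)]


def tracker_stub(topic: str) -> list[Peer]:
    raise OSError("tracker unreachable")


def dht_stub(topic: str) -> list[Peer]:
    return [("203.0.113.9", 6881), ("192.168.1.20", 6881)]


ips = discover("example-topic", [lan_stub, tracker_stub, dht_stub])
```

Since the caller only sees a list of peers, mechanisms can be added or removed without touching the code that syncs or transfers data.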

Discovery may only be needed on an application's initial run. Often, IP addresses discovered from previous sessions may be used instead of a discovery mechanism. For example, rather than re-querying a DHT's bootstrap node, a previously known "good" node may be used instead if it's still available, or there may be no need to search on a local network if a peer is still available with the same address.
