Jordan Santell immersive engineer

A Decentralized Web Primer: Dat

What is a dat?

A dat, or Dat archive, is a set of files and dat metadata. A dat folder can contain files of any type, which can be synced to other users.

Dat Terminology

Dats are like zip archives seeded by peers, similar in spirit to BitTorrent. Originally designed for very large scientific datasets, dats are updateable with versioned changes, and may be hosted and replicated across many peers directly with no central authority. Often, dat refers to just the protocol and swarm network, which interact with dat archives.

Dat is a protocol for sharing data between computers. Dat's strengths are that data is hosted and distributed by many computers on the network, that it can work offline or with poor connectivity, that the original uploader can add or modify data while keeping a full history and that it can handle large amounts of data.

How Dat Works

The creator of the dat archive can update its contents and changes are propagated by peers. As these archives are a collection of versioned files, dats are being used as a distributed way of hosting web pages and other content. A dat archive can contain just a single file that is shared once directly between two parties, or shared media between a group. Dats can be hosted by many peers around the world, privately between a single person's devices as a data backup, or simply as a local archive that never leaves its host machine.

Dats are identified by a URL using the dat protocol and a 64-character key. A path can be appended after the key to reference a specific file in the archive.

dat://6f1760f8f9286729c57a0a9f3de4499a55a567b3b727a8a3d66a978af65eb8ba/path/to/foo.jpg
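To make the URL shape concrete, here is a minimal sketch that splits a dat:// URL into its key and path. The names `parse_dat_url` and `DatUrl` are made up for illustration and are not part of any Dat library. The sketch assumes the key is written as 64 hexadecimal characters, as in the example above.

```python
import re
from typing import NamedTuple, Optional

# "dat://", then a 64-character hex key, then an optional path.
DAT_URL = re.compile(r"dat://([0-9a-fA-F]{64})(/.*)?")


class DatUrl(NamedTuple):
    key: str   # 64 hex characters
    path: str  # path inside the archive, "/" when none is given


def parse_dat_url(url: str) -> Optional[DatUrl]:
    """Return the key and path of a dat:// URL, or None if it is malformed."""
    match = DAT_URL.fullmatch(url)
    if match is None:
        return None
    return DatUrl(key=match.group(1).lower(), path=match.group(2) or "/")


parsed = parse_dat_url(
    "dat://6f1760f8f9286729c57a0a9f3de4499a55a567b3b727a8a3d66a978af65eb8ba/path/to/foo.jpg"
)
print(parsed.key)
print(parsed.path)
```

A URL with a key of the wrong length, or with a different scheme, makes `parse_dat_url` return `None` instead of raising.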

Dats can be created via the command line (CLI), Dat Desktop, or Beaker, an Electron-based web browser that supports dat:// URLs in addition to http:// and https:// URLs. Web content in a dat archive can be viewed as a decentralized website in Beaker, which provides tools for inspecting contents, customizing seed/peer settings, and managing local dat archives. New dat archives can be created from templates or local directories, or forked from another dat. Beaker can update, view, and host these archives, which makes it very different from traditional "read-only" web browsers.

A read/write web

In 1989 one of the main objectives of the WWW was to be a space for sharing information. It seemed evident that it should be a space in which anyone could be creative, to which anyone could contribute. The first browser was actually a browser/editor, which allowed one to edit any page, and save it back to the web if one had access rights.

Creator of the World Wide Web, Tim Berners-Lee,

Dat is a decentralized way of securely distributing versioned data. When that data is web content viewed on a web browser (Beaker), the tight loop of writing changes to a dat and hosting those changes approaches the ideals of a read-write web.

The web was never just supposed to be a one-way publishing system, but the first decade of the web has been dominated by a tool which has been read-only – the web browser. The goal now is to convert the web into a two-way system. Ordinary people should be able to write to the web, just as easily as they can browse and read it.

Richard MacManus,

Technical

Each dat is associated with an Ed25519 key pair: a reader (public) key and a writer (private) key. Dats are identified by a URL using the dat protocol with the 64-character reader key and an optional path.

Initially, a dat can be modified only by its owner, who holds the writer key and can add or remove files in the archive. These changes are recorded in the dat as an append-only log. When updating a dat archive to the latest version, peers request only the changes rather than redownloading unmodified files. Dat now has Multi-Writer support (demo), allowing an archive owner to grant write access to another archive owner and resolving conflicts with CRDTs, although not all interfaces that use dat expose this functionality.
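The append-only log idea can be illustrated with a toy model. This is a simplified sketch, not Dat's actual on-disk or wire format: `ArchiveLog`, `Change`, and the "put"/"del" operation names are hypothetical, and the version is simply taken to be the number of entries in the log.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Change:
    op: str    # "put" (add or update a file) or "del" (remove a file)
    path: str


@dataclass
class ArchiveLog:
    """Toy append-only log: entries are only ever appended, never edited."""
    entries: List[Change] = field(default_factory=list)

    @property
    def version(self) -> int:
        return len(self.entries)

    def append(self, op: str, path: str) -> int:
        if op not in ("put", "del"):
            raise ValueError("op must be 'put' or 'del'")
        self.entries.append(Change(op, path))
        return self.version

    def changes_since(self, version: int) -> List[Change]:
        """What a peer at `version` needs in order to catch up."""
        return self.entries[version:]

    def files_at(self, version: int) -> Set[str]:
        """Replay the log up to `version` to list the files that existed then."""
        files: Set[str] = set()
        for change in self.entries[:version]:
            if change.op == "put":
                files.add(change.path)
            else:
                files.discard(change.path)
        return files


log = ArchiveLog()
log.append("put", "/index.html")   # version 1
log.append("put", "/cat.jpg")      # version 2
log.append("del", "/cat.jpg")      # version 3

# A peer that already has version 1 asks only for what came after it.
missing = log.changes_since(1)
print(missing)
print(log.files_at(2), log.files_at(3))
```

Because nothing is ever rewritten, older versions stay addressable: `files_at(2)` still lists the file that was deleted in version 3.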

Privacy & Security

Contents transferred over the Dat protocol are encrypted end to end (dat cryptography), and the protocol ensures their integrity using a Merkle tree.
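As a rough illustration of how a Merkle tree protects integrity, the sketch below hashes chunks of data into a single root. It is a generic binary Merkle tree, not Dat's exact tree layout or hashing scheme: the function name `merkle_root`, the leaf/parent prefixes, and the choice to promote an unpaired node are all assumptions made for this example.

```python
import hashlib
from typing import List


def _hash(data: bytes) -> bytes:
    # BLAKE2b with a 32-byte digest, from the standard library.
    return hashlib.blake2b(data, digest_size=32).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Fold a list of chunks into one root hash."""
    if not chunks:
        raise ValueError("need at least one chunk")
    # Prefix leaves and parents differently so a leaf can never be
    # confused with an inner node.
    level = [_hash(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(_hash(b"\x01" + level[i] + level[i + 1]))
            else:
                # Unpaired node: carry it up unchanged (an assumption of this sketch).
                next_level.append(level[i])
        level = next_level
    return level[0]


original = [b"index.html contents", b"style.css contents", b"cat.jpg contents"]
tampered = [b"index.html contents", b"style.css CHANGED", b"cat.jpg contents"]

root = merkle_root(original)
print(root.hex())
print(merkle_root(tampered) == root)
```

Changing any chunk changes the root, so a peer that trusts the root can detect altered data received from any other peer.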

Each dat is shared over a network only with those who have the reader key (i.e. the dat://... URL). Discovery is done via a hashed form of the reader key (the discovery key) so as not to leak the reader key. Broadly, anyone with the URL can view a dat and becomes a peer, replicating and seeding (hosting) that dat.
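The discovery key can be sketched as a one-way hash of the reader key. The construction below (a keyed BLAKE2b hash of the constant string "hypercore", using the reader key as the hash key) is my understanding of the hypercore implementation and should be treated as an assumption rather than a reference. The function name `discovery_key` is hypothetical.

```python
import hashlib


def discovery_key(reader_key_hex: str) -> str:
    """Derive a value that can be announced publicly without revealing the reader key."""
    reader_key = bytes.fromhex(reader_key_hex)
    if len(reader_key) != 32:
        raise ValueError("reader key must be 32 bytes (64 hex characters)")
    # Assumed construction: BLAKE2b-256 over b"hypercore", keyed with the reader key.
    return hashlib.blake2b(b"hypercore", key=reader_key, digest_size=32).hexdigest()


reader_key = "6f1760f8f9286729c57a0a9f3de4499a55a567b3b727a8a3d66a978af65eb8ba"
announced = discovery_key(reader_key)
print(announced)
```

Peers who already know the reader key can compute the same value and find each other, while an observer who sees only the announced value cannot work backwards to the reader key.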

As the protocol communicates peer to peer, it is not anonymous: peers can see other peers' IP addresses. However, peers must know the URL (i.e. have the reader key) to snoop, although hosting services or megapeers could record the IP addresses of all peers for known dats.

Hyperdrive

Under the hood, Dat is implemented with the lower-level Node module hyperdrive, a series of hypercores, which handles most of the P2P append-only storage magic. Dat and hyperdrive may at times be used interchangeably in discussions and resources, and although there are differences between the two, the distinction is most likely relevant only to developers. Mathias Buus Madsen presented the hyperdrive implementation used in Dat in the talk "Sharing files and data with friends using a P2P shared folder".

Hyperswarm

Currently, Beaker and core dat (dat-node) use discovery-swarm (a swarm of discovery-channel) for discovery. discovery-channel uses multicast DNS to find peers on the local network via dns-discovery, which also queries dat's tracker servers (discovery1.datprotocol.com, discovery2.datprotocol.com) for peers. discovery-channel may also use the BitTorrent (Mainline) DHT network via bittorrent-dht, but this appears to be disabled in current implementations.

Beaker has announced hyperswarm, a new discovery mechanism (@hyperswarm/discovery) using multicast-dns for finding local peers and a hole punching DHT (@hyperswarm/dht), with improvements on DHT privacy being discussed.

Resilience

As dats are hosted in a distributed fashion, they cannot be removed by some central authority. In an internet-connected world, this comes at the price of availability compared to content on traditional web servers: the Dat client must be running and available in order to host content. If a dat is freshly created (no other peers) on a laptop via Beaker and its URL is shared, then once that laptop is unavailable it is no longer possible to download the contents of that dat. There is no central server to request the data from, only that laptop. Reliance on a single machine's uptime decreases once others have downloaded the content and are seeding it.

For private notes, having no other peers is a feature. For teams in a professional environment, there's most likely sufficient connectivity during working hours. For smaller or personal web pages, relying on a single, non-dedicated machine to host will be unreliable. There are tools for setting up a server to seed content, as well as services that offer high-availability seeding of archives. As the dat key (url) cannot change, hosting providers (peers) can be swapped (or duplicated) without any reader-visible changes.
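The availability trade-off can be put into rough numbers with a back-of-the-envelope model. The model is my own simplifying assumption, not something Dat defines: each peer is online for some fraction of the time, independently of the others, and the dat is reachable whenever at least one seeding peer is online. The function name `availability` and the uptime figures are illustrative only.

```python
from typing import Iterable


def availability(uptimes: Iterable[float]) -> float:
    """Chance that at least one peer is online, assuming independent uptimes."""
    all_offline = 1.0
    for uptime in uptimes:
        if not 0.0 <= uptime <= 1.0:
            raise ValueError("uptime must be between 0 and 1")
        all_offline *= 1.0 - uptime
    return 1.0 - all_offline


# A laptop that is online 30% of the time, on its own.
laptop_only = availability([0.3])

# The same laptop plus two other casual peers, each online half the time.
with_peers = availability([0.3, 0.5, 0.5])

# The same laptop plus one dedicated seeder that is almost always up.
with_seeder = availability([0.3, 0.99])

print(laptop_only, with_peers, with_seeder)
```

Under these assumptions a few casual peers raise availability from 30% to a little over 80%, and a single dedicated seeder does better than both. Real peers are not independent (they may share time zones or networks), so the numbers are only indicative.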

Use Cases

Other notes

Resources