White paper: data in pryv.io

This white paper outlines how pryv.io manages data and why we designed it this way, hopefully shedding light on pryv.io's unique value proposition from the data model perspective.

It's targeted at both technical and non-technical audiences with a basic understanding of data storage and networks.

Privacy at the core

Pryv.io was designed from day one around privacy and decentralization. We’ll assume you, the reader of this paper, have a pretty good idea why privacy matters. And decentralization is ultimately necessary to achieve privacy, because a centralized architecture implies centralized control, which means people do not truly own their data. So pryv.io stores each account’s data separately: each account has its own database, which can be moved around (onto its own dedicated server if needed).

  • No big collections holding the information of all users.
  • No design headaches with confidentiality.
  • A sensible, natural way to store data.
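
As an illustration of the per-account layout, here is a minimal Python sketch. It assumes SQLite in-memory databases purely for demonstration; the `AccountStore` class, its methods and the `records` table are hypothetical names, and nothing here describes the storage engine pryv.io actually uses.

```python
import sqlite3


class AccountStore:
    """Keeps one independent database per account (illustrative only)."""

    def __init__(self):
        self._databases = {}  # account name -> its own connection

    def database_for(self, account):
        # Create the account's database on first use; it is never shared.
        if account not in self._databases:
            db = sqlite3.connect(":memory:")
            db.execute("CREATE TABLE records (time REAL, payload TEXT)")
            self._databases[account] = db
        return self._databases[account]

    def add_record(self, account, time, payload):
        self.database_for(account).execute(
            "INSERT INTO records VALUES (?, ?)", (time, payload)
        )

    def count_records(self, account):
        cursor = self.database_for(account).execute("SELECT COUNT(*) FROM records")
        return cursor.fetchone()[0]


store = AccountStore()
store.add_record("alice", 1700000000.0, "first record")
store.add_record("alice", 1700000060.0, "second record")
store.add_record("bob", 1700000120.0, "another record")
```

Because each account lives in its own database, relocating an account means relocating one database, and no single query can span several accounts by accident.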

A data structure designed for everyone

But can people really be said to own their data if they can’t understand it? We don’t think so.
So we strove to design a data model that makes sense to both developers and users, based on a few simple concepts, with neither too little nor too much abstraction.

Events: the base unit of data

Pryv stores things that happen in time as _events_.

  • A blood pressure measurement.
  • A photo.
  • A note.
  • Anything.

To differentiate photos from notes, every event has a _type_.

To let data be easily understood and exchanged across systems, we provide an open directory of standard types, which we recommend developers use when interoperability is a concern (it rarely isn’t). It’s worth pointing out that pryv.io events are so easy to use and interoperable because we kept the types low-level, holding all necessary contextual information at the organizational level (our next topic). For example: we have no such things as “heart rate” events; instead in pryv.io you’ll use `frequency/bpm` events classified under the “heart rate” context.
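
The relationship between events, types and context can be sketched in a few lines of Python. This is a simplified model, not the pryv.io API: the `Event` class and its field names are assumptions made for illustration, as is the `note/txt` type string; only `frequency/bpm` is taken from this paper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    """One thing that happened in time (hypothetical field names)."""
    time: float    # Unix timestamp, in seconds
    type: str      # low-level type, e.g. "frequency/bpm"
    content: Any   # payload whose shape is implied by the type
    stream: str    # context the event is classified under


events = [
    # A heart rate reading is a plain frequency; "heart rate" is its context.
    Event(time=1700000000.0, type="frequency/bpm", content=72, stream="heart-rate"),
    Event(time=1700000060.0, type="note/txt", content="Felt fine after the run.", stream="diary"),
]


def of_type(items, event_type):
    """Return the events that carry the given type."""
    return [event for event in items if event.type == event_type]


heart_rates = of_type(events, "frequency/bpm")
```

Keeping the type low-level means any app that understands `frequency/bpm` can read the value, whether or not it knows about the context it was recorded in.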

Tags and streams: ready-to-use organization

At this point something might become apparent:

Pryv.io clearly sets itself apart from the many solutions out there which provide unopinionated (fully freely-structured) object storage. Those certainly have their use sometimes, but if we really want to tackle the issues of data ownership and interoperability, we need to balance flexibility with familiarity. So when designing pryv.io we paved a few cowpaths – which brought along other benefits for developers, such as a more expressive API.

No-nonsense access management

We now have an idea of what data looks like in pryv.io (and why), so how do we access it? Every connection to Pryv data is mediated by an _access_.

No surprises here either: we strove to keep it straightforward. Account owners connecting via a trusted app use _owner_ (personal) accesses; for other apps they use _delegate_ (app) accesses; to share data with other people they create _shared_ accesses. As expected, the latter two only provide a limited view of the account’s data. That view – or subset – is primarily based on context (streams and tags), because that’s how we humans handle privacy in most cases. So controlling who sees what is just a natural aspect of properly contextualising data.
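
To make the idea of a context-based view concrete, here is a minimal Python sketch. It is a simplification under stated assumptions: the `Access` and `Event` classes and the `visible_events` helper are hypothetical, permissions are modelled on streams only (tags are left out), and the real pryv.io permission model may well be richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stream: str      # context the event is classified under
    type: str
    content: object


@dataclass(frozen=True)
class Access:
    name: str
    kind: str                        # "owner", "delegate" or "shared"
    streams: frozenset = frozenset() # streams this access may read

    def can_read(self, event):
        # Owner accesses see everything; the other kinds see only
        # the streams they were explicitly granted.
        if self.kind == "owner":
            return True
        return event.stream in self.streams


def visible_events(access, events):
    """Return the subset of the account's events this access can see."""
    return [event for event in events if access.can_read(event)]


events = [
    Event("heart-rate", "frequency/bpm", 72),
    Event("diary", "note/txt", "Private thoughts."),
    Event("diary", "note/txt", "More private thoughts."),
]

owner = Access(name="trusted app", kind="owner")
doctor = Access(name="my doctor", kind="shared", streams=frozenset({"heart-rate"}))
```

Here `visible_events(owner, events)` returns all three events, while `visible_events(doctor, events)` returns only the heart rate reading: the sharing decision follows directly from how the data was contextualised.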

Cross-account indexing/searching and aggregation

Pryv.io keeps every account’s data separate from other accounts’, so how do you work with data across multiple accounts?

You do it in what we consider the Right Way: “client”-side, probably on a middleware service. So for example if you want medical statistics over a number of patients, you’ll ask the patients to grant your app/service access, aggregate each patient’s data subset into your own database, then compute your stats and/or maybe expose the aggregated data to other apps through an API.

How’s that a feature, you ask?

We understand it might not look like one today, but we believe it actually is – because there is no way around it if you take privacy seriously. That’s just how we think personal data will work in the future. So you might consider building things this way to be a wise investment.
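
The patient statistics example above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: `ACCOUNTS` is in-memory sample data standing in for separate accounts, and `fetch_granted_events` is a hypothetical placeholder for the network request a middleware service would make with the access each patient granted.

```python
from statistics import mean

# Sample data standing in for three separate accounts. Each list holds
# only the subset of events the patient agreed to share.
ACCOUNTS = {
    "patient-a": [
        {"type": "frequency/bpm", "content": 60},
        {"type": "frequency/bpm", "content": 70},
    ],
    "patient-b": [{"type": "frequency/bpm", "content": 80}],
    "patient-c": [],  # access granted, but no data recorded yet
}


def fetch_granted_events(account_id):
    """Hypothetical placeholder for a per-account API request."""
    return ACCOUNTS[account_id]


def aggregate(account_ids, event_type):
    """Copy matching values from every account into one local store."""
    store = {}
    for account_id in account_ids:
        values = [
            event["content"]
            for event in fetch_granted_events(account_id)
            if event["type"] == event_type
        ]
        if values:
            store[account_id] = values
    return store


# The aggregated store lives on the middleware side, not in any account.
store = aggregate(ACCOUNTS, "frequency/bpm")
per_patient_mean = {account: mean(values) for account, values in store.items()}
overall_mean = mean(value for values in store.values() for value in values)
```

The accounts are queried one at a time and the statistics are computed on the aggregating side, so each patient's account only ever hands out the subset its owner agreed to share.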

Future-friendly

More and more people realize there’s a problem with how most apps and services on today’s internet deal with personal data. It’s not that they’re ill-intentioned, but that they’re built on a client-server architecture that’s inherently at odds with privacy and proper data ownership.

As a growing number of us work to change the rules of the game, we can see a truly decentralised web coming that fully realizes the promise and design of the internet. Pryv.io is our contribution to that change and while it still comprises centralised components as of today, its data model is fundamentally ready for tomorrow’s web. Building on Pryv.io means your apps will be, too.

Author: Simon Goumaz, co-founder Pryv SA