The Cloud, Evolved

Posted on 02-19-2013

Sans Filesystem, totally Semantic

Background

I’ve messed with Evernote sync and so much more: Google App Engine, Twitter’s API, the magical service called Dropbox. I’ve grown up with the internet, and we store everything online, but it’s messy and disorganized. I have to remember whether I stored something in Google Drive or on my own computer, or whether it’s in iCloud or stuck on Box. There’s no single paradigm that organizes it all and defines how it works. That’s part of the beauty of it, perhaps, but I believe the cloud can be, and will be, so much more. So here is my way of making sense of “The Cloud”.1

Proposition: The Core

[Image: Preliminary Doodle. The doodle that started it all.]

Imagine “The Cloud” not as a fantastical cloud somewhere high above us but rather as a file cabinet. Not necessarily mundane, but part of our lives and accessible. Apple (before 2012) would say “it just works”. I write essays and store them in the file cabinet; if I want to share something about my work with the world, I store that in the file cabinet too. I want to collaborate on a project with the team so I put that in the file cabinet; if I want to look up the definition of “Just Deserts” I ask the file cabinet. Importantly, the file cabinet handles all of these requests, and as a user I don’t need to mess with URLs or directories or anything like that. But most importantly, there is only one file cabinet, for everything. Where iCloud fails is that there is a separate datastore for every application, which makes it a pain to share data between applications. The file cabinet is, on the other hand, the datastore for all, including strangers halfway across the world.

That’s why “The Cloud” should instead be called “The Core”. If networks are truly data-centric (and they are, because they exist only as a means to manipulate, store, or communicate data), the lone file cabinet is at “The Core” of the network. From the user’s perspective, that is all we need to know. We use it to store and retrieve data. It is the internet. Of course, such a system needs to be implemented, and I think this is how:

“The Core” is made up of four parts:

  1. The Actual Storage
    • The Actual Storage is the layer made up of the actual servers or databases that keep a hard copy of the data. These services (e.g. an evolution of Dropbox, or Amazon) store and encrypt the data. Users can pick which storage services to store data with (including private services).
    • Treating multiple data stores as one requires the common datastore to be flat. There is no logical organization of data at this stage (regarding “The Core”); the organization of information is implemented on the third level.
    • Data aggregation services also live at this level - services that don’t create new data, but rather manipulate existing data to present it in new ways (like Facebook, or Twitter, for instance).
  2. The Core Protocol
    • The Core Protocol defines the high-level API that the data stores implement so that “The Core” functions as a unified system. It defines a means of authentication and the privacy spectrum, enabling “The Core” to consolidate the storage of personal and public files.
    • The Core Protocol tells services how to expose methods for modifying their data such that they can be utilized by the Retriever.
  3. The Retriever
    • The Retriever is the ‘brain’ that interprets user requests and provides the interface through which the user interacts with “The Core”. It functions like a search engine, retrieving my Film Comp notes when I ask for them, or showing me the news when I ask for it.
  4. The Saver
    • The Saver’s responsibility is to save user data in the appropriate “Actual Storage” service and modify existing data. While The Retriever “gets”, The Saver “puts”.
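
To make the four parts a little more concrete, here is a minimal Python sketch of how they might fit together. Nothing above specifies an API, so every name in it (`StorageService`, `MemoryStore`, `put`, `find`, and so on) is invented for illustration, and the "protocol" is reduced to two methods that every storage service is assumed to offer.

```python
from typing import Protocol


class StorageService(Protocol):
    """What a hypothetical Core Protocol might ask of each storage service."""

    def put(self, item: dict) -> None: ...
    def find(self, **attrs: str) -> list[dict]: ...


class MemoryStore:
    """A toy flat store (The Actual Storage): a plain list, no folders or paths."""

    def __init__(self) -> None:
        self.items: list[dict] = []

    def put(self, item: dict) -> None:
        self.items.append(item)

    def find(self, **attrs: str) -> list[dict]:
        # An item matches when it carries every requested attribute.
        return [i for i in self.items
                if all(i.get(k) == v for k, v in attrs.items())]


class Saver:
    """'Puts': writes an item to the storage service the user picked."""

    def __init__(self, stores: dict[str, StorageService]) -> None:
        self.stores = stores

    def save(self, item: dict, store_name: str) -> None:
        self.stores[store_name].put(item)


class Retriever:
    """'Gets': asks every store and merges the answers into one result."""

    def __init__(self, stores: dict[str, StorageService]) -> None:
        self.stores = stores

    def get(self, **attrs: str) -> list[dict]:
        results: list[dict] = []
        for store in self.stores.values():
            results.extend(store.find(**attrs))
        return results


stores: dict[str, StorageService] = {
    "dropbox-like": MemoryStore(),
    "private": MemoryStore(),
}
saver, retriever = Saver(stores), Retriever(stores)
saver.save({"kind": "notes", "course": "Film Comp"}, "private")
saver.save({"kind": "news", "topic": "tech"}, "dropbox-like")
found = retriever.get(kind="notes")
```

The user only ever talks to the Saver and the Retriever. Which service actually holds the Film Comp notes never shows up in the request.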

Organizing Data without File Hierarchies

This is where I want to backtrack a little and say that no, The Core is not a file cabinet, because at this point, the metaphor falls apart. A file cabinet groups information into file hierarchies (categorizing files into different folders, and different shelves) but The Core doesn’t mandate that. Instead of walking up to a file cabinet and saying, “Here’s a PDF. Put it in the sheet music folder!” I would rather walk up to The Core and say, “Here’s something to store! Store it!”

Of course, that raises the question: how do I retrieve my copy of Chopin’s Ballade No. 1 from The Core? The answer is straightforward. The Core knew that what it stored was sheet music for Chopin’s Ballade No. 1 belonging to me. The identifying information is not stored as part of a directory structure or URL but rather intrinsically, in the stored entity. So when I ask The Core, “My copy of Chopin’s Ballade No. 1 please!” The Retriever can search through the data stores and pull up exactly what I want.
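
A small Python sketch of what "identifying information stored in the entity" could look like. The `Entity` class, the `meta` dictionary, and the attribute names are assumptions made up for this example; how The Core would actually work out that a PDF is Chopin is exactly the part left open.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A stored item: raw bytes plus the description that travels with it."""
    payload: bytes
    meta: dict[str, str] = field(default_factory=dict)


def matches(entity: Entity, query: dict[str, str]) -> bool:
    """True when every attribute in the query is present on the entity."""
    return all(entity.meta.get(k) == v for k, v in query.items())


# One flat collection: no folders, no paths, no URLs.
core = [
    Entity(b"%PDF...", {"owner": "me", "type": "sheet music",
                        "composer": "Chopin", "work": "Ballade No. 1"}),
    Entity(b"%PDF...", {"owner": "me", "type": "sheet music",
                        "composer": "Chopin", "work": "Nocturne Op. 9 No. 2"}),
    Entity(b"...", {"owner": "me", "type": "essay",
                    "title": "The Cloud, Evolved"}),
]

# "My copy of Chopin's Ballade No. 1 please!"
query = {"owner": "me", "composer": "Chopin", "work": "Ballade No. 1"}
hits = [e for e in core if matches(e, query)]
```

The same collection answers "all my sheet music" with `{"owner": "me", "type": "sheet music"}`, without any sheet music folder existing anywhere.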

On the other hand, with a file cabinet, I would have to say “Go to the sheet music folder and get me [that PDF]”. That may forever be in the power users’ repertoire, but I believe we shouldn’t have to care about the sheet music folder. It should “just work”. The servers could use file hierarchies behind the scenes and I have nothing against that; admittedly, I know little about how Dropbox and other services work. But for consumers and creators like me, I think the file hierarchy is a thing of the present and past, while search is the future, if not already the present.

The hard part is how to do this: how to make The Core intelligent enough to identify my PDF as a copy of Chopin’s Ballade No. 1. I don’t offer a solution. But this is the beginning of a semantic Web, and as far as I can tell it will happen under the direction of Tim Berners-Lee. Until we get there, it wouldn’t be too bad to adopt a hybrid approach, and tell The Core, “Here’s my copy of the sheet music for Chopin’s Ballade No. 1!”

The takeaway is that The Core not only stores my data - it interprets it for me.

By the way, if you don’t want a more technical explanation, it’s totally fine to stop reading here. Just follow me on Twitter and scroll down to leave a comment!

The Core In-Depth

[Image: Flowers and Mountains. A wholly unrelated doodle meant to brighten and supplement the potentially monotonous text.]

The Actual Storage

The Actual Storage (TAS) is behind the scenes and (potentially) provided by companies like Amazon or Dropbox, or by private servers or government organizations (for maximum privacy, for instance). They store data. They run independent services that each comply with the Core Protocol so that they work with any Core-connected interface. Once again, though, the storage is ultimately flat from The Core’s perspective, and TAS services need not provide any level of organization.

The encryption and security of data would also be handled at the TAS level. The Core Protocol allows TAS services to determine whether a request is authenticated, but it is up to each TAS to encrypt the data and do whatever is necessary to maintain privacy and security, which creates a selling point for specific TAS services.

Services like Facebook and Twitter also hook in here. Why? Because the TAS is the level at which data is not only stored but potentially interpreted. Facebook-like services don’t create new data - they only interpret existing data, such as my Likes, my Tweets, my Friends, and (at least for now) assemble them into Timelines and networks which can be retrieved by the Retriever.

The Core Protocol

Remember that The Core is just an intelligent cabinet that spits out what you want; the Core Protocol makes it happen. I think the Core Protocol needs to do six things:

  1. Unify all TAS services under one umbrella
    • By treating them as one datastore - a single repository - we go back to the idea that The Core is the only cabinet.
  2. Expose and interface the umbrella with the Retriever and Saver
    • The data would have to be communicated in some way, so The Core Protocol would define a) how the Saver passes data to the TAS services, b) how the Retriever requests data from the TAS services, and c) how the Saver manages existing data on the TAS services.
  3. Establish a spectrum of privacy for data
    • The idea is that all of the stored information will be sorted into a spectrum of privacy (dependent on who is trying to access the data). Each user will have a spectrum. For instance, my personal essays would be stored on the most private end of my spectrum, and my public portfolio would be on the most public end of my spectrum. My friend’s shared class project would be in the middle of the spectrum and Twitter’s data would be on the public end of the spectrum. I can see everything on my spectrum and nothing that’s not on my spectrum.
    • A piece of data’s location on a person’s spectrum is determined by how many other people have that data on their own respective spectrums. By shifting the location of my essays, for instance, on the spectrum, I can share it with more or fewer people.
    • Of course, privacy isn’t exactly linear, so I’m not averse to the idea that instead of a spectrum we have a sort of “grouping of privacy” where data is sorted into groups instead of along a spectrum.
  4. Establish a means of authentication for data
    • The spectrum of privacy necessitates a system of users with regard to The Core, and the Core Protocol will have to handle authentication (which the Retriever, TAS, and Saver will also all have to take into account).
  5. Ensure that the data handled by The Core is semantically recognizable
    • The Core will fail with non-semantic, meaningless data because then the data cannot be organized and retrieved.
  6. Give every file a unique identifier
    • There remains the necessity to be able to reference specific files or pieces of information with ease; with the abolition of paths, links for instance would have to depend on identifiers for the time being. I am hopeful that the unique identifier would one day no longer be necessary, once the Retriever is sophisticated enough to be able to speedily identify specific files with straightforward queries en masse. Until that day, a unique identifier, I think, is a must.
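
Points 3 and 6 are the easiest to pin down in code, so here is a rough Python sketch of them. It assumes one particular reading of the spectrum: an item's position is the fraction of other users who can see it, from 0.0 (only the owner) to 1.0 (everyone). The `Item` class, its fields, and the `spectrum` function are hypothetical names, and a random UUID stands in for the unique identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    owner: str
    public: bool = False
    shared_with: set[str] = field(default_factory=set)
    # Random identifier standing in for a path or URL (point 6).
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def visible_to(self, user: str) -> bool:
        return self.public or user == self.owner or user in self.shared_with

    def position(self, total_users: int) -> float:
        """0.0 = only the owner can see it, 1.0 = everyone can."""
        if self.public:
            return 1.0
        return len(self.shared_with) / max(total_users - 1, 1)


def spectrum(items: list[Item], user: str, total_users: int) -> list[Item]:
    """Everything `user` may see, ordered from most private to most public."""
    visible = [i for i in items if i.visible_to(user)]
    return sorted(visible, key=lambda i: i.position(total_users))


items = [
    Item("Public portfolio", "me", public=True),
    Item("Class project", "friend", shared_with={"me"}),
    Item("Personal essay", "me"),
    Item("Friend's diary", "friend"),
]
mine = [i.title for i in spectrum(items, "me", total_users=100)]
# mine == ["Personal essay", "Class project", "Public portfolio"]
```

Sharing the essay is then just `items[2].shared_with.add("friend")`, which nudges it away from the private end. The "grouping of privacy" alternative would replace `position` with named groups, but the visibility check would stay the same.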

The Retriever

The Retriever is my idea of the future Google. It is a search and delivery engine. It makes sense of my query, for instance, “My sheet music”, and by utilizing the Core Protocol’s semantic prowess, and potentially working with TAS services, the Retriever finds and delivers the results to me (all my sheet music).
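
As a deliberately naive illustration of "making sense of my query", here is a tiny rule-based interpreter in Python. A real Retriever would need far more than string matching. The `interpret` function, the filter keys (`owner`, `type`, `text`), and the list of known types are all assumptions invented for this sketch.

```python
def interpret(query: str, user: str, known_types: set[str]) -> dict[str, str]:
    """Turn a short request into attribute filters a store could match on."""
    words = query.lower().strip().rstrip("!.?").split()
    filters: dict[str, str] = {}

    # A leading "my" is read as "owned by the person asking".
    if words and words[0] == "my":
        filters["owner"] = user
        words = words[1:]

    phrase = " ".join(words)
    if phrase in known_types:
        filters["type"] = phrase   # a kind of data The Core recognizes
    elif phrase:
        filters["text"] = phrase   # otherwise fall back to free-text search
    return filters


types = {"sheet music", "essay"}
sheet_music = interpret("My sheet music", "vervious", types)
# sheet_music == {"owner": "vervious", "type": "sheet music"}
twitter = interpret("Twitter", "vervious", types)
# twitter == {"text": "twitter"}
```

The resulting filters are what would then be handed to the storage services through the Core Protocol.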

There is so much potential here for powerful innovations, and fields such as speech recognition and artificial intelligence all play a role with The Retriever. My only concern is that if data stored in The Core is encrypted, it would be difficult to index, crawl, or pre-process that information. Hence, new advances in technology would be a prerequisite for the smooth functioning of a system which would have to potentially decrypt private data every time I make a query.

Or I could just give my private key to The Retriever and all would be fine and dandy.

The Saver

The Saver is, once again, responsible for adding data to The Core and manipulating data already on it. This would include functions like changing an essay’s position on the privacy spectrum (sharing it with other people), changing an essay’s TAS location (moving from, let’s say, Dropbox to a private, university TAS), or deleting data. The Saver would have to work in conjunction with the Retriever to identify which essay to modify, but I think it’s pretty straightforward.
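
The three operations named above (share, move, delete) can be sketched in a few lines of Python. This is a toy model under loud assumptions: each storage service is just a dictionary, items are found by a made-up identifier like `"essay-1"`, and the `locate` method stands in for the lookup that the Retriever would really do.

```python
class Saver:
    """Toy Saver: each storage service is a dict mapping item id -> record."""

    def __init__(self, service_names: list[str]) -> None:
        self.services: dict[str, dict[str, dict]] = {n: {} for n in service_names}

    def locate(self, item_id: str) -> str:
        """The lookup a Retriever would do: which service holds this item?"""
        for name, records in self.services.items():
            if item_id in records:
                return name
        raise KeyError(item_id)

    def add(self, service: str, item_id: str, record: dict) -> None:
        self.services[service][item_id] = record

    def share(self, item_id: str, user: str) -> None:
        """Move an item toward the public end by adding one more viewer."""
        record = self.services[self.locate(item_id)][item_id]
        record.setdefault("shared_with", set()).add(user)

    def move(self, item_id: str, target: str) -> None:
        """Change which storage service holds the item."""
        destination = self.services[target]  # fail before touching the data
        source = self.locate(item_id)
        if source != target:
            destination[item_id] = self.services[source].pop(item_id)

    def delete(self, item_id: str) -> None:
        del self.services[self.locate(item_id)][item_id]


saver = Saver(["dropbox-like", "university"])
saver.add("dropbox-like", "essay-1", {"title": "Film Comp essay"})
saver.share("essay-1", "friend")
saver.move("essay-1", "university")
where = saver.locate("essay-1")
```

Note that `move` looks up the destination before removing anything from the source, so a typo in the service name cannot lose the essay.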

Usage and Experience

What using the Core would feel like (on iPad)

Let’s say The Core already works. It would function not unlike Dropbox on iPad, even with the app-centric design of iOS. I have to say that Dropbox is the closest thing to The Core right now, and it’s pretty awesome. But Dropbox isn’t ubiquitous, and its reliance on a filesystem as an interface is, I think, reasonable for the current state of technology but not for the future.

The Core would turn iOS into a hybrid data/app-centric type of system, not unlike OS X or other desktop operating systems. It would be built into the system in the way that NSOpenPanel and NSSavePanel are on OS X, for instance, and it would necessitate the redesign of applications (reducing the number of proprietary formats and Core Data-driven stores, now that applications can access data of any type). I think that anything application-centric, though, is ultimately a dead end.

A Core-centric Operating System

[Image: Paper OS. Sketch of “Paper OS” that revolves around The Core.]

For The Core to really shine, Core-integrated operating systems must put The Core at their center. It’s important to note that it’s not about sync - The Core is the only repository there is, so there’s nothing local to sync from the user’s perspective. With a Core-centric OS the applications don’t matter as much. Here’s what I think it would be like:

  1. There’s “The Core”. The home screen.
    • The search screen. The root of all navigation, the fundamental part of the OS that leads to everything else. It would be like Google.com, an interface for the Retriever.
    • If I need to open an essay, I just go to “The Core” and ask it for the essay I need and I open it. Things are organized not by app but by query.
  2. Applications will still exist.
    • But they won’t be as important, because they share file types. I hope to never “run” Microsoft Word. I just want to open a text document, and if MS Word can open it for me, it will.
  3. Websites are treated like essays.
    • You also open websites through “The Core”, for instance, by searching “Twitter”. Instead of Microsoft Word opening “Twitter-aggregated data” a web browser would open it.
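
The idea in points 2 and 3, that I ask for data and the OS finds something able to open it, can be sketched as a small dispatch table in Python. The `Shell` class, the data type names, and the two stand-in "applications" are all hypothetical, made up for this example.

```python
from typing import Callable

Handler = Callable[[dict], str]


class Shell:
    """Toy OS shell that maps data types to whatever can open them."""

    def __init__(self) -> None:
        self.handlers: dict[str, Handler] = {}

    def register(self, data_type: str, handler: Handler) -> None:
        self.handlers[data_type] = handler

    def open(self, data: dict) -> str:
        handler = self.handlers.get(data["type"])
        if handler is None:
            raise LookupError(f"no application can open {data['type']!r}")
        return handler(data)


def word_processor(data: dict) -> str:   # stands in for a native app
    return f"word processor shows: {data['body']}"


def web_view(data: dict) -> str:         # stands in for a web app
    return f"web view shows {len(data['tweets'])} tweets"


shell = Shell()
shell.register("text", word_processor)
shell.register("tweet list", web_view)

essay = shell.open({"type": "text", "body": "The Cloud, Evolved"})
feed = shell.open({"type": "tweet list", "tweets": ["hello", "world"]})
```

From the shell's side there is no difference between the native handler and the web one. It only cares that something has registered for the type of data I asked for.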

Applications

The current model of applications - web apps, local apps - presents a problem. Are web applications treated as simple files? Do local, native applications still exist? How do applications integrate into The Core? At what level of The Core do applications and services function?

  1. Are web applications treated as simple files?
    • What needs to be made clear is the distinction between data (the individual tweets) and the application that handles that data (Twitter.com, or Twitter for Mac, etc). The Core stores and interprets data, and data alone. Apps are not services.
    • That means the distinction between web applications and local applications becomes unimportant. Once the OS has Tweet data, it searches for an application to read the Tweet data. Whether the application runs natively or lives in something like a WebView is not of concern to The Core.
    • By the way, Twitter is an amazing link-haven. You should follow me @Vervious.
  2. Do local, native applications still exist?
    • I don’t know, but I’m thinking about it.
  3. How do applications integrate into The Core?
    • This could be another whole essay, but I believe the OS is only a tool for interacting with data. And applications are only tools used by the OS to achieve that purpose.
    • A second question arises though - is “Recent Tweets” really the type of data meant to be stored in The Core? It isn’t like a text document, as it’s more of an aggregation of other data than something static. I think that’s the wrong way to approach it.
    • The Core stores data, not individual documents or files. There’s no bright line that separates one “piece” of data from another. A single Tweet could be a piece of data, but a list of Tweets may be as well; this ultimately ties in well with the underlying assumption that The Core not only stores data, but also interprets it.
  4. At what level of The Core do applications and services function?
    • Applications function at the level of the OS and merely allow users to consume and create data.
    • Services like Facebook and Twitter, though, function to provide data. They hook in either as clients (users that read and write data to The Core) or at The Actual Storage level. The latter is the more realistic, as the services only need to generate pieces of data to return upon request (as opposed to every time a Tweet is posted, etc).

The Social Network

I envision The Core as merely a storage place that allows services to run inside the cabinet to aggregate and make sense of data. The social network and the social graph are a whole other story. The graph is another type of data, and I’m sure that The Core will be used to great effect to store and organize that data. But the social graph isn’t the focus of The Core and neither is The Core the focus of the social graph. They may live together in harmony, and honestly, I haven’t thought about it much yet.2 Thanks for reading, and please leave a comment!

Footnotes

1 On another note, the punctuation bugs me. I want to keep the period outside the quotation because it just doesn’t make sense as “The Cloud.”

2 Next potential essay topic!
