Friday, August 15, 2008

No Rest for the Wicked

Representational State Transfer (REST) is a popular architectural style used in the construction of systems whose components are distributed across a network. First conceived by Roy T. Fielding in his famous doctoral dissertation Architectural Styles and the Design of Network-based Software Architectures, REST as an architectural style has become popular through its application in Web architectures with three pervasive technologies: HTTP, XML, and URI.

Developers have mistaken these technologies, which are elements of the Web architecture, for REST itself. The result, unfortunately, is that REST has now become synonymous with building applications that use HTTP to transfer XML representations of resources identified by URIs around a network. Developers argue the pros and cons of something as wacky as tunneling SOAP-wrapped RMI calls through GET vs. POST as if it were a question of REST philosophy! This is way off the mark. While some of these ideas can be put to use in a way that results in well-designed systems, the problem is that developers jumped on the bandwagon too early and left behind the true nature and timeless value of the REST architectural style.

Bloggers seem to be saying that to be RESTful, you must transfer content around the system using HTTP GET, PUT, POST, and DELETE. You must identify resources with URIs, and representations of those resources must be transferred in XML. So here we go again: the Majors of the world have socially engineered the Boxer masses once again. But the solution doesn't always fit the problem, does it? One could argue that solutions never fit the problem - that it is a subjective notion. But true practitioners know what I mean. Solutions forced into problem contexts can quickly metastasize into poor architecture decisions and signal the beginning of a brutish Hobbesian future for REST proper, RESTful system development, and all those involved (or at least those who get blamed for it).

The classic problem rears its head once again. The business ends up committing far more time and resources than originally budgeted, and there is no going back. Critical design flaws emerge and the system becomes very costly to maintain. Well into production, when its progenitors are long gone, the remaining team struggles to maintain a poorly performing system that is difficult and frustrating to use. It is so brittle and hard-coded that the only alternative to the depressing task of maintenance is a complete rewrite. Unfortunately, many companies find themselves in this situation, and all of it could have been prevented if a little more thinking had gone into the design before the project started. Instead, REST gets written off as a passing fad by sellers of the latest snake oil on the market.

Introduction to REST

REST is an architectural style composed of a collection of recurring architectural themes that transcend the constraints of any specific set of technologies or protocols used to build a system. According to Fielding in section 5.2 of his dissertation, the REST style is an abstraction of the architectural elements within a distributed hypermedia system.

REST ignores the details of component implementation and protocol syntax in order to focus on the roles of components, the constraints upon their interaction with other components, and their interpretation of significant data elements.

The Web architecture is a single application (just one of infinitely many possible examples) of the REST architectural style. To dig a little deeper into what this means, let's explore the metaphysics of REST, look at a message passing architecture as another example of applied REST, and then use parts of Fielding's dissertation and subsequent writings to show beyond a reasonable doubt that I'm not completely off my rocker.

REST as Applied Perdurantism

Okay, okay, don't let the word scare you away. Resources are a key abstraction in the REST style. More specifically, a resource R is a temporally varying membership function M_R(t), which for time t maps to a set of values. The values are resource representations and/or identifiers. See? No mention of XML files on Web servers or HTTP to communicate representations of them. Great, this is music to my ears! I happen to be a fan of the school of metaphysical perdurantism. Just like Web architecture is applied REST, REST is applied perdurantism. Let's waste a little time exploring this concept further.

The idea behind perdurantism is that material objects extend through space by having different spatial parts in different places, and they likewise persist by having temporal parts that extend through time. So objects are four-dimensional entities: at any given instant an object is made up of parts that occupy the three dimensions of space, and the succession of those instants makes up the fourth dimension, time.

Object identity takes on an interesting characteristic with this approach: noticeable changes in the set of spatial parts making up an object at any given instant could be indexed over time and identified via epochs. So under perdurantism, person X from moment of conception until wisdom teeth are extracted, and person X from that point until now are considered the same person, despite the fact that many changes have occurred before, during, and after the extraction.

How about an example that doesn't make you cringe: a piece of code known as Foo.java maintains the same identity from conception through all its revisions to the most recent version. We still call it Foo.java. To reference a specific revision or epoch is what Fielding is getting at with his "temporally varying membership function" M_R(t) stuff, where revision r or time t maps to a set of spatial parts. In short, line 15 of Foo.java is just as much a part as version 15 of Foo.java; they just reference different subsets of its set of parts (one spatial and one temporal).
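
To make the time-varying function a little more concrete, here is a minimal Python sketch. It is not taken from Fielding's dissertation: the Resource class, its commit and members_at methods, and the sample revisions are all hypothetical names invented for illustration, and it simplifies the "set of values" down to a single snapshot per point in time.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource that keeps one identity across all of its revisions."""
    name: str
    # Parallel lists: the time each revision took effect, and its lines.
    times: list = field(default_factory=list)
    revisions: list = field(default_factory=list)

    def commit(self, t, lines):
        """Record a new temporal part; times are assumed to be increasing."""
        self.times.append(t)
        self.revisions.append(tuple(lines))  # tuples keep snapshots immutable

    def members_at(self, t):
        """Stand-in for M_R(t): the snapshot in effect at time t, or None."""
        i = bisect_right(self.times, t)
        return self.revisions[i - 1] if i else None


# Hypothetical history for Foo.java: two revisions, at t=1 and t=5.
foo = Resource("Foo.java")
foo.commit(1, ["class Foo {}"])
foo.commit(5, ["class Foo {", "  int x;", "}"])

early = foo.members_at(3)  # the revision in effect at t=3
late = foo.members_at(9)   # the revision in effect at t=9
```

The name never changes while the value it maps to does, which is the point of the example: identity belongs to the resource, not to any one snapshot of it.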

How does this concept apply to resources in REST? Resources are a composite of the set of all spatiotemporal parts for a material object, and representations are immutable reflections on a subset of the set of parts that have identity in a resource, and whose content can take on different forms when transferred between components. Moreover, resource identifiers are abstractions that allow components involved in a transfer to identify and select subsets of the resource's parts to be transferred.

Looking back at section 5.2 of the dissertation, Fielding says a resource identifier is used to identify the particular resource involved in an interaction between components. Rather than enforce a particular standard, the author chooses a resource identifier that best fits the nature of the concept being identified. So maybe the author chooses to identify line 15 from the Foo.java file resource as Foo.java#15 and version 15 of Foo.java as Foo.java?v=15.
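
As a rough sketch of how a component might resolve identifiers of that shape, the Python below uses the standard urllib.parse module to split the identifier and then selects the matching parts. The REVISIONS store, the select_parts function, and the convention that the fragment is a 1-based line number and v is a version number are assumptions made for this illustration only.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical revision store: version number -> lines of Foo.java.
REVISIONS = {
    14: ["class Foo {", "}"],
    15: ["class Foo {", "  int x;", "}"],
}
LATEST = max(REVISIONS)


def select_parts(identifier):
    """Return the lines selected by an identifier such as Foo.java?v=15#2."""
    parts = urlsplit(identifier)
    query = parse_qs(parts.query)
    # Temporal selection: ?v=N picks a revision, defaulting to the latest.
    version = int(query["v"][0]) if "v" in query else LATEST
    lines = REVISIONS[version]
    # Spatial selection: #N picks a single 1-based line within that revision.
    if parts.fragment:
        return [lines[int(parts.fragment) - 1]]
    return list(lines)


whole_old_version = select_parts("Foo.java?v=14")
one_line_of_latest = select_parts("Foo.java#2")
```

The query selects a temporal part and the fragment selects a spatial part, so the two can be combined in one identifier, as in Foo.java?v=14#2.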

This fits nicely with perdurance, since each resource has its own composition of spatial and temporal parts that can be identified. Components exchanging representations of these resources need a priori knowledge of the resource and how to identify parts of it. While REST does not mandate use of a particular standard, use of the URI as a standard for identification is certainly useful, especially since its adoption is somewhat universal and has far transcended the Web.

There is an ongoing debate in the philosophical community about the notion of an object, its parts, and identity in terms of perdurantism vs. other competing schools such as endurantism. That discourse is not limited to architectural styles for building distributed systems, nor is it limited to objects materially composed of atoms, bits, or any hybrid thereof.

Just as the REST architectural style for distributed systems can be seen as an application of perdurantism, and just as the Web architecture is an application of REST, a message passing architecture can be an application of REST as well.

Message Passing Architecture as Applied REST

Message passing architectures, also referred to as publish-subscribe or pub/sub architectures, have become popular over the years in large, event-driven distributed applications. Rather than require components to actively identify and pull representations of resources from other components, the data is more efficiently pushed out to them based on a previously established indication of interest, usually via subscription. We'll use this as our non-Web example of applied REST.

Subscribers typically subscribe to a subject (i.e., the resource identifier), and publishers publish notifications, events, or data updates through this subject. An example would be a real-time market data feed, where the subject is composed of segments that identify a particular spatial part of the service (e.g., execution venue and symbol), and temporal parts published are real-time quote and trade data updates. Some publishers, especially those providing trade and quote feed services, communicate session-level sequence numbers and allow subscribers to submit sequence inquiries if updates are missed.
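
The subscription and sequence-inquiry behavior described above can be sketched in a few lines of Python. This is an in-process toy, not a real feed protocol: the Publisher and Subscriber classes, the replay method, the drop flag used to simulate a lost message, and the subject "VENUE.SYM" are all hypothetical.

```python
from collections import defaultdict


class Publisher:
    """Keeps per-subject history so missed updates can be replayed."""

    def __init__(self):
        self.history = defaultdict(list)      # subject -> list of updates
        self.subscribers = defaultdict(list)  # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self.subscribers[subject].append(callback)

    def publish(self, subject, update, drop=False):
        """Assign the next sequence number; drop=True simulates a lost message."""
        self.history[subject].append(update)
        seq = len(self.history[subject])
        if not drop:
            for callback in self.subscribers[subject]:
                callback(seq, update)

    def replay(self, subject, first, last):
        """Answer a sequence inquiry for updates first..last inclusive."""
        return [self.history[subject][s - 1] for s in range(first, last + 1)]


class Subscriber:
    def __init__(self, publisher, subject):
        self.publisher, self.subject = publisher, subject
        self.last_seq = 0
        self.received = []
        publisher.subscribe(subject, self.on_update)

    def on_update(self, seq, update):
        if seq > self.last_seq + 1:
            # Gap detected: ask the publisher for the updates we missed.
            self.received.extend(
                self.publisher.replay(self.subject, self.last_seq + 1, seq - 1)
            )
        self.received.append(update)
        self.last_seq = seq


pub = Publisher()
sub = Subscriber(pub, "VENUE.SYM")
pub.publish("VENUE.SYM", "quote-1")
pub.publish("VENUE.SYM", "quote-2", drop=True)  # never reaches the subscriber
pub.publish("VENUE.SYM", "quote-3")             # triggers the sequence inquiry
```

The subscriber never pulls on its own initiative; it only asks for data when the sequence numbers show that something pushed to it went missing.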

This message passing architecture can meet the constraints of the REST architectural style. Fielding's dissertation provides web examples. Let us look at the architectural elements of REST and explore some examples for our message passing architecture:

Data Elements
  • Resource: order, basket, order book, trade, trade blotter, execution report, symbol, montage, exchange, NBBO, etc.
  • Resource Identifier: URI to identify the spatiotemporal "shape" of data to be transferred between components, as a subset of all parts that comprise a given resource.
  • Resource Metadata: caching semantics, version info, sequence number, identity of new resource, content type mappings; made available in a metadata directory service that exists in a system configuration manager.
  • Representation: full initial or current data snapshots; delta updates communicated to subscribers; updates with before and after images that include in and/or out of focus disposition.
  • Representation Metadata: content type, sending time, sequence number, checksum, encryption method, content length, etc.
  • Control Data: message type, computer or location identifier for sender or target (or any intermediate component in between), retransmission semantics such as possible duplicate, possible resend, original sending time, last sequence number processed.
Connectors
  • Client: facilitates connectivity for initiator of communication such as a subscriber or requester; hides details such as use of connection retry policies to handle reconnects.
  • Server: same as client connector, except facilitates connectivity for receiver of client communication such as a publisher or request handler.
  • Cache: located in both the client and server to reduce latency for data dependencies that require injection of external content to send or process requests.
  • Resolver: resolves computer or location identifiers into IP addresses and ports of target or intermediate components.
  • Tunnel: SOCKS proxy creates an SSL tunnel to a message router on behalf of any clients behind a firewall.
Components
  • User Agent: initiator of communication such as front-end GUIs for front office sales & position traders, middle office, system administrators, and test simulators.
  • Proxy: instance of a content-based router selected by initiator components to broker queued requests or perform pub/sub message routing.
  • Gateway: sits within same physical machine as the origin server and encapsulates process configuration and management of fail-over routing of content to backup message routers as well as routing requests to be processed by components operating at various nice levels.
  • Origin Server: the ultimate resource managers such as the system configuration manager, order manager, position manager, master blotter, market data server, market manager, etc.
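
One way to picture how the data elements listed above might travel together is a message envelope that keeps the representation, its metadata, and the control data separate. The Python sketch below is only an illustration under that assumption; the class names, field names, the make_message and is_intact helpers, and the choice of SHA-256 for the checksum are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationMetadata:
    content_type: str
    sequence_number: int
    content_length: int
    checksum: str


@dataclass(frozen=True)
class ControlData:
    message_type: str
    sender_id: str
    target_id: str
    possible_duplicate: bool = False


@dataclass(frozen=True)
class Message:
    control: ControlData
    metadata: RepresentationMetadata
    representation: bytes  # e.g. a snapshot or a delta update


def make_message(body, seq, sender, target, message_type="snapshot"):
    """Wrap a representation with metadata derived from its bytes."""
    meta = RepresentationMetadata(
        content_type="text/plain",
        sequence_number=seq,
        content_length=len(body),
        checksum=hashlib.sha256(body).hexdigest(),
    )
    return Message(ControlData(message_type, sender, target), meta, body)


def is_intact(message):
    """Check the representation against its own length and checksum."""
    body = message.representation
    return (
        len(body) == message.metadata.content_length
        and hashlib.sha256(body).hexdigest() == message.metadata.checksum
    )


msg = make_message(b"bid=1", seq=7, sender="md-server", target="gui-1")
```

Because the dataclasses are frozen, a message cannot be altered after it is built, which mirrors the idea that a representation is an immutable reflection of the resource at the time it was taken.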

Conclusion

The above should provide enough substance to show that a message passing architecture can fit within the constraints of the REST architectural style. Sure, we could perform an exhaustive analysis of the actual constraints and confirm that the above architecture conforms, but my focus is not REST purity. It's exploring how distributed systems can be architected using the REST style. If you have any doubt remaining, look at what Fielding himself has to say about this topic.

In future blog postings, I hope to explore this message passing architecture in the form of a content-based router: a key component of a RESTful architecture that transfers representations between REST components and has no dependency on HTTP GET or POST. I'll address how requests for representations are queued and dispatched to request handlers, and how representations are published to interested subscribers, without jumping on the REST bandwagon and misusing the REST architectural style by confining it to the Web architecture.