JSONPath: from blog post to RFC in 17 years
- Glyn Normington RFC 9535 Editor
21 Feb 2024
Today the JSONPath RFC (RFC 9535) proposed standard was published, precisely 17 years after Stefan Gössner wrote his influential blog post JSONPath – XPath for JSON that resulted in some 50 implementations in various languages.
As one of the editors of RFC 9535, I’ve been reflecting on JSONPath's journey to, and through, the IETF process.
Stefan recently shared his perspective with me:
“After my online publication 'JSONPath – XPath for JSON' in 2007, there was initially a long period of silence. My interests had meanwhile shifted more towards web applications in mechanical engineering when I received requests from Glyn, Tim [Bray] and Carsten [Bormann] to jointly publish an IETF standard. Looking back, I am glad to have joined this Working Group. The new insights into the collaborative processes of the IETF, the discussions and meetings were very stimulating. Surprisingly, different points of view always led to a common consensus and finally to the current IETF standard.”
Tim Bray, one of the chairs of the Working Group, added this:
“Not everyone who stumbles into the IETF process likes it, and I’m glad Glyn wasn’t one of those who bounce off. The process – no voting, consensus calls, reviews by a wider and wider community – feels like the most natural thing in the world to me. I’m happy with the way the document came out. As the standard RFC-series language says, it represents 'the consensus of the IETF'. That’s a powerful statement; the Internet, as specified piecewise in a few thousand RFCs, is one of Homo sapiens’ largest creations. The community owes thanks to the editors and the rest of the Working Group.”
Introduction to JSON and JSONPath
According to its defining RFC 8259, JavaScript Object Notation (JSON) is “a lightweight, text-based, language-independent data interchange format”. It is widely used for storing and transmitting data. A JSON value is one of the following:
- a string, e.g. "red"
- a number, e.g. 23, 8.95, 3e8, 1e-10
- one of the values true, false, or null
- an ordered collection of values, known as an array, e.g. [2, "cat", 4]
- an unordered collection of name/value pairs, known as an object, e.g. {"m": 10, "n": false}.
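The kinds of value listed above can be seen by decoding a few JSON texts. This is a minimal sketch using Python's standard json module; the helper name kind and the sample list are illustrative assumptions, not anything defined by RFC 8259.

```python
import json

# Each JSON text below is one of the kinds of value listed above.
samples = ['"red"', '8.95', '3e8', 'true', 'null',
           '[2, "cat", 4]', '{"m": 10, "n": false}']


def kind(value):
    """Name the JSON kind of a value decoded by Python's json module."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool) or value is None:
        return "literal"  # true, false, or null
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


kinds = [kind(json.loads(text)) for text in samples]
print(kinds)
```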
A value can be thought of as a tree of nodes, each of which is a value. Consider a simple example of a JSON value representing the contents of a store with two books and a bicycle:
The value is an object containing a single value named “store”. The “store” value is an object containing two values: an array named “book” and an object named “bicycle”. Each book in the array, as well as the bicycle, is an object with various values consisting of strings and numbers. The corresponding tree of nodes is shown below:
JSONPath provides a way of selecting and extracting nodes from a JSON value. For example, the JSONPath query $.store.book[0] applied to the value above selects the first book (by Nigel Rees) whereas the query $.store.book[*].price extracts the prices of books: 8.95 and 22.99 (since [*] selects all the nodes of an array). The query $..price extracts all the prices: 8.95, 22.99, and 399 (although not necessarily in that order; .. selects the nodes contained in a value, known as the descendants of the value).
The result of a query can contain any number of nodes and so is considered to be a (possibly empty) list, known as a nodelist.
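To make the three queries concrete, here is a rough sketch that evaluates them by hand in plain Python, without a JSONPath library. The store value is an assumption pieced together from the description: only the author Nigel Rees and the three prices come from the text, while the titles, the second author, and the bicycle colour are made up. The helpers children and descendants are likewise hypothetical names.

```python
# Hypothetical store value: two books and a bicycle, as described in the text.
# Titles, the second author, and the colour are invented for illustration.
store = {
    "store": {
        "book": [
            {"author": "Nigel Rees", "title": "First Book", "price": 8.95},
            {"author": "A. N. Other", "title": "Second Book", "price": 22.99},
        ],
        "bicycle": {"color": "red", "price": 399},
    }
}


def children(value):
    """Return the child values of an array or object (none for other values)."""
    if isinstance(value, list):
        return list(value)
    if isinstance(value, dict):
        return list(value.values())
    return []


def descendants(value):
    """Return every value nested inside `value`, parents before their children."""
    found = []
    for child in children(value):
        found.append(child)
        found.extend(descendants(child))
    return found


# $.store.book[0] : the first element of the "book" array
first_book = [store["store"]["book"][0]]

# $.store.book[*].price : the "price" member of every element of the array
book_prices = [b["price"] for b in children(store["store"]["book"])
               if isinstance(b, dict) and "price" in b]

# $..price : the "price" member of the root value and of each of its descendants
all_prices = [v["price"] for v in [store] + descendants(store)
              if isinstance(v, dict) and "price" in v]

print(first_book, book_prices, all_prices)
```

Each result is a Python list standing in for a nodelist. A real implementation would also track the location of each node, which this sketch leaves out.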
The uses of JSONPath include:
- Selecting a specific node in a JSON value
- Retrieving a set of nodes from a JSON value, based on specific criteria
- Navigating through complex JSON values to retrieve the required data.
For more information, please refer to the RFC. With the basics of JSON and JSONPath covered, the following is my personal perspective.
Working Group activity
A few years ago, I couldn't find a JSONPath implementation that documented the syntax and semantics clearly, so I wrote my own. Towards the end of this effort, I decided there was a need for a standard.
In 2020, I settled on the IETF as a standards body and gathered together a small group of JSONPath implementers (found via Christoph Burgmer's excellent JSONPath comparison project). Together we cooked up an initial spec and published it as an IETF Internet-Draft.
Around the same time, Stefan and IETF veteran Carsten Bormann submitted their own draft, based closely on Stefan's original blog post. In September 2020, we joined forces in a Working Group (WG) with the help of Tim, James Gruessing (who co-chaired the WG), and others. After agreeing a charter, we merged our drafts and set about iterating on the spec.
The WG met online and never in person, but I think we got to know each other fairly well by interacting on the mailing list, issues, and pull requests. The WG needed to support regular expression matching for filtering objects and arrays, so Carsten and Tim defined a subset of most other regular expression flavours (“I-Regexp”) that was ultimately published as RFC 9485.
The WG is now dormant. The mailing list is still open for business. The spec repository lists topics deferred by the WG or raised subsequently. I don't plan to work on another version, although others may.
Sticking to the charter
The WG charter included the following:
"The WG will develop a standards-track JSONPath specification that is technically sound and complete, based on the common semantics and other aspects of existing implementations. Where there are differences, the working group will analyze those differences and make choices that rough consensus considers technically best, with an aim toward minimizing disruption among the different JSONPath implementations."
There was a constant tension in trading off technical soundness against common semantics. Where there was a clear consensus among implementations, the spec tended to take that line. But where there were variations in behaviour, we had to choose what we thought made the best technical sense.
At several points, suggestions were made for going beyond the clear consensus. Many of these were captured as issues labelled revisit-after-base-done. In general, we avoided innovation.
There was also pressure to give some implementations, such as the popular Jayway Java implementation, higher precedence than others. But biasing the spec towards particular implementations would not have been consistent with the charter.
Then there was pressure to adopt features unique to particular projects, such as JMESPath which goes way beyond the original blog post, but is carefully thought through and well-documented. Again, this would not have been consistent with the charter.
Extension mechanism
One area where we had to innovate was the extension mechanism for functions inside filter expressions. We wanted it to be possible to define functions without having to change the spec, but there wasn't a common precedent to follow.
The main complexity was defining what constitutes a valid, as opposed to a merely syntactically well-formed, function expression. We introduced a type system to define validity, but there was extensive debate around:
- subtyping and implicit type conversion,
- functions to perform explicit type conversions,
- the usefulness of a SingleNodeType type for the result of a singular query (a JSONPath expression that necessarily outputs at most one node), and
- whether to attempt to retrofit the type system to the rest of the spec.
To avoid unnecessary complexity, we ditched subtyping and the SingleNodeType, defined just one implicit type conversion (from NodesType to LogicalType), and limited the type system to function expressions.
The final type system, consisting of just three types, seems to work well:
The solid arrows denote “is an instance of”; the dashed arrow denotes the implicit conversion from NodesType to LogicalType.
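As a rough illustration of the three types and the single implicit conversion, the sketch below models them as Python classes. The type names follow the spec, but the field names and the to_logical helper are assumptions made for this example. The rule it encodes is that a non-empty nodelist converts to logical true and an empty one to logical false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueType:
    """A JSON value (the spec also allows a special 'Nothing'; omitted here)."""
    value: object


@dataclass(frozen=True)
class LogicalType:
    """A logical true or false, distinct from the JSON values true and false."""
    truth: bool


@dataclass(frozen=True)
class NodesType:
    """A nodelist: the (possibly empty) result of a query."""
    nodes: tuple


def to_logical(arg):
    """Apply the only implicit conversion: NodesType to LogicalType."""
    if isinstance(arg, LogicalType):
        return arg  # already logical, nothing to convert
    if isinstance(arg, NodesType):
        # A nodelist is 'true' exactly when it contains at least one node.
        return LogicalType(len(arg.nodes) > 0)
    raise TypeError("no implicit conversion from ValueType to LogicalType")


print(to_logical(NodesType(nodes=(8.95,))))
print(to_logical(NodesType(nodes=())))
```

Passing a ValueType where a LogicalType is expected raises an error in this sketch, which mirrors the decision to keep implicit conversions to a minimum: even the JSON value false is not silently treated as logical false.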
IETF openness
This was my first experience of developing an IETF standard and I was delighted by the truly open approach. As I was accustomed to working on open source projects, the openness felt reassuringly familiar and it was this that initially attracted me. Anyone is welcome to submit an Internet-Draft, get involved in a WG, or provide one-off pieces of feedback. Indeed, there is no concept of membership of a WG, or even of the IETF. For example, the JSONPath Interim drafts, mailing lists, and spec repository are all publicly visible.
The journey of a spec through the IETF approval process is transparent: see, for example, the handling of an objection during the last call process for JSONPath and the archive of the Authors’ Final Review (AUTH48) process.
There was also no corporate involvement in the JSONPath WG (although my previous employer, VMware, sponsored some of the early work on the spec without interfering in any way). This was refreshing as I had previously represented my employers in other standards bodies and occasionally found the politics tiresome.
Consensus decision-making
Another interesting facet of the IETF was the notion of rough consensus, which helped the Working Group avoid stalemate on a few occasions. My understanding of rough consensus is simply that unanimous agreement isn't always necessary, but where there are objections, the reasons for these objections are explored by the group and a decision taken on the basis of this exploration.
This approach to decision-making suits the IETF and its WGs: since they don't have members, voting would be meaningless. It also avoids “design by committee” where politics impacts technical quality.
Adoption
Now that there is a standard JSONPath, it will be interesting to see whether, and how quickly, implementations start to adopt the standard. The standard already has the following implementations:
- JsonPath.Net in C#
- jpt in Ruby
- serde_json_path in Rust
As new programming languages are created and need support for JSONPath, hopefully these will be written to conform to the standard.
Another potential use for the standard is in documenting non-standard implementations in terms of which features do, or do not, match the standard. At the very least the standard provides some well-defined terms for discussing JSONPath implementations.
Clearly, with some 50 implementations in the wild, convergence is likely to be a slow process. But the beauty of an IETF standard is that it is a permanent document that should stay the course.
Conclusion
It has been fascinating seeing the development of an RFC from scratch. I am indebted to everyone who worked on this: those who used my Go implementation and reviewed its documentation, those who contributed to a Rust implementation and a test suite (both described below), and, of course, my fellow editors, the rest of the WG, the RFC Editor, and others in the IETF who helped bring the RFC to fruition.
Follow on work
There has been good progress in developing a Compliance Test Suite for JSONPath and some possible beginnings of a Reference Implementation, both described below.
Towards a Compliance Test Suite
There is an incomplete Compliance Test Suite (CTS). The most notable omission from my perspective is testing for non-deterministic behaviour. Currently, all the tests in the CTS produce deterministic results. So any valid implementation should pass the CTS. However, the non-deterministic aspects of the spec (which stem from the unordered nature of JSON objects) are not tested by the CTS.
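One conceivable way to test a non-deterministic query is to accept any of the orderings the spec permits. The sketch below does this for the query $.* applied to an object, whose member values may be returned in any order. It is only an illustration of the idea under that assumption, not a description of how the CTS works; acceptable_results and passes are hypothetical names.

```python
from itertools import permutations


def acceptable_results(obj):
    """All orderings of an object's member values that `$.*` may return."""
    return [list(p) for p in permutations(obj.values())]


def passes(actual, obj):
    """Check an implementation's result against every permitted ordering."""
    return actual in acceptable_results(obj)


document = {"a": 1, "b": 2}
print(passes([1, 2], document))  # members in document order
print(passes([2, 1], document))  # members in the opposite order
print(passes([1], document))     # a member is missing
```

Enumerating permutations is only practical for small objects. For larger ones, comparing the results as multisets would be a cheaper check.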
The implementations mentioned above have been tested against the CTS.
Carsten's Ruby implementation of RFC 9485 contains some basic tests, but a full CTS has not been developed. Perhaps a CTS could be based on the XML Schema Test Suite.
Towards a Reference Implementation
I started a Rust implementation that I hoped could be used as a Reference Implementation. However, this effort is now dormant: I ended up spending much more time editing the draft than I anticipated, and the parsing code, using the pest parser, was surprisingly laborious to implement (compared to my previous hand-crafted parser in Go).
The point of using the pest parser was that the parsing expression grammar (PEG) was cleanly separated from the implementation and was therefore easier to compare to the ABNF in the spec than, for example, a hand-crafted parser.
Carsten's Ruby implementation uses his abnftt tool to generate a PEG parser from the ABNF in the spec. Given that the WG tried to keep the ABNF “PEG-compatible” by ordering choices suitably, the parser is highly likely to match the spec. This may be a more promising approach for a Reference Implementation than manually comparing a hand-coded PEG to the ABNF.
Carsten also wrote a Ruby implementation of RFC 9485, again based on abnftt. The code is available on GitHub. For usage information, issue:
$ gem install iregexp
$ iregexp --help
This is a reference implementation of RFC 9485 only insofar as it can be used to express I-Regexp regular expressions in other popular flavours such as PCRE or JavaScript regular expressions.
Editor's note: This post claims the RFC and the post itself were published on 21 February 2024, the date of the post, because of the timezone in which the RFC publication occurred and because of the whole number of years since Stefan Gössner's original blog post. The RFC was actually published in the early hours of the morning of 22 February in the UK, the timezone of this post's author, with this post published later on 22 February. The original version of this post was published on the author's blog.