JWT Profile for OAuth2 Access Tokens with Vittorio Bertocci
In this episode of Identity, Unlocked, the interviewer becomes the interviewee. Our usual host and Auth0's Principal Architect, Vittorio Bertocci, is interviewed by Matias Woloski, CTO and Co-Founder of Auth0. During this discussion, they focus on the JWT Profile for OAuth 2 Access Tokens.
Season 2 of Identity, Unlocked is sponsored by the OpenID Foundation
Like this episode? Be sure to leave a five-star review and share Identity, Unlocked with your community! You can connect with Vittorio on Twitter at @vibronet, or Auth0 at @auth0.
Recording: Buongiorno everybody and welcome. This is Identity Unlocked and I'm your host Vittorio Bertocci. Identity Unlocked is the podcast that discusses identity specifications and trends from a developer perspective. Identity Unlocked is powered by Auth0. The season is sponsored by the OpenID Foundation.
Matias Woloski: Hello everybody. This is Matias Woloski, the CTO and co-founder of Auth0. I'm your host for today's Identity Unlocked episode. We wanted to choose something different for today's episode, the last one of season two, so we decided to turn the tables and interview Vittorio, rather than the other way around. In this episode, we will focus on the JWT Profile for OAuth 2 Access Tokens, and we are chatting with Vittorio Bertocci, Principal Architect at Auth0 and author of the specification. Welcome, Vittorio.
Vittorio Bertocci: Thanks, Matias, for having me.
Matias Woloski: Thanks for joining me today. As is tradition, let's start with how you ended up working in identity. It's a very interesting story that I already know, but I would like our audience to hear it.
Vittorio Bertocci: Sorry. I know, Matias, you've heard this a hundred times, so please be patient as I repeat it once again. Like most of the people who have shown up at this microphone for this show, my trajectory to identity was completely random. I really didn't want to do this. I wanted to do computational geometry. I studied it at university, for geographical terrain representation, and I wanted to do research on that. And then the year 2000 came and the dotcom bubble popped. I found myself in Milan having to pay my rent, and the only company that was hiring was Microsoft. So I applied and they took me in, even though I'd never used Microsoft before. And interestingly, of course, we didn't do anything about visualization. I ended up working with a customer that had both Microsoft and IBM and really wanted them to interoperate. So they were using Web Services, which was a big thing in the year 2000. I was using a product called Web Services Enhancements, which was in beta. It protected web services, and it used SAML as the token for protecting the calls. So we did this entire project. It interoperated, it worked really well. And then when the product came out of beta, the people in Redmond decided to pull support for SAML. But we had already sold the thing to the customer, and so the poor consultant had to do something. That something was that I re-implemented support for SAML from scratch. And that somehow created this idea that if you have a question about identity, you ask Vittorio. So people started asking me questions, and I didn't want to turn them down, so I Googled my answers. Eventually it became a reality: I learned identity just by doing that. At the time I also had a blog, and it was 2003, so it was very early on. People in corp noticed and offered me a position. And so I moved to corp. I started working as an evangelist for enterprises.
So not specifically identity, but identity was the most complex topic. No one wanted to touch it with a 10-foot pole, and so I played with it. Long story short, I worked more and more with it. And I made friends along the way, including you, Matias; we worked together on a lot of interesting identity projects. Then I eventually moved to the product team, where I started working on the SDKs and developer features: Visual Studio support for identity, Azure, Office. And after some time I decided that I really wanted to get closer specifically to the developer audience for identity. So I made this leap and joined Auth0 about three years ago, and here we are today.
Matias Woloski: Awesome. But there is a little secret that people don't know, although I tweeted about this a couple of years ago. In 2006, when I was starting to get into... We were doing software consulting and I started to get into this whole identity thing. The first thing that really drew me into this whole topic was your video on WS-Trust, explaining at the whiteboard in the way that you love doing. And I was like, "Wow, this is a super interesting topic, I want to know more about it." And here we are. I don't know, what is it? 15 years later, working together and improving the status quo of identity. So it's great to have you, and it's always been a pleasure to work with you and learn from you.
Vittorio Bertocci: People cannot see it because it's only audio, but I am now red as a bell pepper. Thank you.
Matias Woloski: All right. Well, let's get into the topic of today, which is the JSON Web Token Profile for OAuth 2 Access Tokens. Mouthful spec name. So why don't you tell us a little bit about it? The problem statement and the value proposition of it.
Vittorio Bertocci: It is a mouthful, but the idea is really very, very simple. The idea is that if you are using JWT as the format for your access tokens, which is something that I'd say most people in the industry already do, this specification tells you how. It gives you a list of minimal claims that should show up. It tells you how to emit that token, depending on specific aspects of the request. And most importantly, I'd say, it tells you how to validate an incoming token; it gives you very specific rules on how to do that. And it has sections on security and privacy that tell you what to watch out for if you decide to go that route.
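To make the "minimal claims" and "validation rules" concrete, here is a minimal sketch in Python of the structural checks a resource server might run on an already-decoded token. It assumes the claim list and the `at+jwt` type value as they appear in the profile as later published (RFC 9068); the function name, issuer, and API identifier are hypothetical, and signature verification, which the profile also requires, is deliberately left out.

```python
import time

# Claims the profile lists as required (assumption: RFC 9068, section 2.2).
REQUIRED_CLAIMS = ("iss", "exp", "aud", "sub", "client_id", "iat", "jti")
# Accepted values for the "typ" header of a JWT access token.
ACCEPTED_TYP = ("at+jwt", "application/at+jwt")


def check_access_token(header, payload, expected_issuer, expected_audience, now=None):
    """Return a list of problems; an empty list means the structural checks passed.

    NOTE: this sketch does NOT verify the signature. A real resource server
    must do that first, using the issuer's published keys.
    """
    now = time.time() if now is None else now
    problems = []

    # The typ header distinguishes an access token from, e.g., an ID token.
    if str(header.get("typ", "")).lower() not in ACCEPTED_TYP:
        problems.append("typ is not at+jwt")

    for claim in REQUIRED_CLAIMS:
        if claim not in payload:
            problems.append(f"missing claim: {claim}")

    if payload.get("iss") != expected_issuer:
        problems.append("unexpected issuer")

    # "aud" may be a single string or an array of strings.
    aud = payload.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if expected_audience not in audiences:
        problems.append("token not intended for this API")

    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token expired or exp invalid")

    return problems


if __name__ == "__main__":
    header = {"typ": "at+jwt", "alg": "RS256"}
    payload = {
        "iss": "https://issuer.example.com/",  # hypothetical issuer
        "aud": "https://api.example.com",      # hypothetical API identifier
        "sub": "user-1",
        "client_id": "client-1",
        "iat": 1000,
        "exp": 2000,
        "jti": "abc",
    }
    print(check_access_token(header, payload,
                             "https://issuer.example.com/",
                             "https://api.example.com", now=1500))
```

Returning a list of problems rather than a boolean is only a design choice for the sketch: it makes it easy to see which rule an ID token sent in place of an access token would trip (the type header and the audience).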
Matias Woloski: I remember when we started talking about this internally, this was probably a year and a half ago, and now this is becoming a new spec, it took some time. But can you walk us through how a spec goes from idea to an actual spec?
Vittorio Bertocci: I think it is reasonably typical, as in lots of other specs went through the same route. For this particular problem, this was a problem that bugged me for most of the time I worked on OAuth, because OAuth is very generic on the matter of what goes inside the access token. They just say, "Well, the access token is supposed to perform this function," but then it's completely underspecified. And in the normal scenario it's okay. Because in the original scenario that the authors had in mind, the authorization server and the resource server are co-located. And so the access token can be just a reference to a database that both entities can look at. So it made sense for them not to impose any structure on this. But if you look at the market, Auth0 is just one instance of this common pattern in which you have a piece of software that performs the authorization server function, and the thing that consumes the token is in a completely different place. And so once it receives the token, it doesn't have the opportunity to just look up the same database. It has to validate this thing. And of course, there are ways of doing this; there was the introspection spec that came out, and similar. But if you observe what people actually do, they were actually using JWT all over the place. Pretty much everyone that I worked with was using JWT. But the thing is, with no guidance, everyone was doing pretty much a different thing. There is common guidance for ID tokens, which smell similar, although they are not used the same way. And so people did kind of an ID token. So in this particular case, this spec came out of my frustration at observing that everyone is doing almost the same thing, functionally the same thing, but the syntax gets in the way and so we cannot really interoperate. And also a lot of people that are not super deep in the spec made the occasional questionable choice. For example: I need to send a JWT, and the closest thing is an ID token.
So I send an ID token, without realizing that there are security issues in sending an ID token in lieu of that. And so the idea was, okay, let's see what happens. So I started doing some research. I contacted my friends at Identity Server, like Dominick and Brock. I contacted Brian at Ping Identity. I contacted Karl at Okta, and I contacted my friends in Azure AD. And I asked them, "Can you please send me an example of an access token for users? And an example of an access token for applications," like for client credentials. And then two years ago, actually in March, over two years ago, there was this wonderful conference in Stuttgart. It was the OAuth Security Workshop. And all the big brains of the space were there. And so I did a little impromptu presentation showing them these slides, saying, "Okay, here is a table with all the claims that all these vendors that I mentioned use." There was also AWS, which I got from the documentation. And you can see that this stuff is almost the same, so why don't we do a profile? And there I got an explosion of feedback, a lot of interesting stuff. But the point was, why don't you write a proposal and present it at the IETF meeting? Which was two weeks later. So on the plane back, I furiously wrote the draft of this specification, which has to follow a really arcane format. There's XML2RFC, which is interesting and very moody. But anyway, I spent the entire flight back writing this thing, and then I did a little presentation, I submitted it. And the working group voted for adopting it. So this is typically the first step in the lifecycle of a new spec. Someone has an idea, writes what is called an Internet Draft. There is a tool in the IETF which allows you to upload the thing. And then this thing appears as an individual draft, which at that stage means nothing. It just means this person wrote this, instead of on a napkin, on something that got uploaded.
But in fact, then at this point, it is in a place in which it can be easily referenced and discussed.
Matias Woloski: I've seen what you're describing. When I started reading all these specs earlier in my career, I was confused about what is actually an official one versus not. How do I tell if something is... I guess it's the whole draft thing, but a draft always stays a draft forever, it seems. So how do I pick up what is actually serious and making progress versus not?
Vittorio Bertocci: So there, the main difference is in the URL. Let's say that the moment in which you submit a draft, at the first stage, it is called an individual draft. And typically in the name of the file, hence in the URL, there is your surname. And so the first submission was draft-bertocci-oauth-access-token-jwt. Then once the group meets in person, or at a virtual IETF, or on the mailing list and similar, and consensus is reached that the working group wants to adopt this and actually work to improve it and make it become a standard, then this thing changes name. Someone like... let's say an officer. An officer in the IETF goes and says, "Okay, now let's transform this thing from draft-bertocci-oauth-access-token-jwt to draft-ietf-oauth-access-token-jwt." And then when you go to the Datatracker, which is another page in the IETF, you see these lanes in which you see all the various versions, and you see that the lane starts with individual and then becomes working group. Some stuff remains individual because it's not... The fact that it doesn't become a working group item is not necessarily an indictment that it's bad. It might simply be that it's a bit niche, that it's a bit specific, or that it talks about stuff which is not specifically for one working group, but maybe crosses others, or maybe the times aren't right for that. But in general, the main difference is that if you see IETF in the name of the spec, then it is something that is being worked on. And then when you actually open the file, there will be different degrees of draft. And then when it actually says RFC instead of draft in the name, that means that it went through all the pain which is necessary to turn it into a spec. And then it is officially a spec.
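The naming rule described here is mechanical enough to sketch in a few lines of Python. This is an illustration only: the function name `classify` is made up, the draft names are written in the hyphenated lowercase form the IETF uses for file names, and other document streams that also use a `draft-` prefix are ignored.

```python
import re


def classify(name):
    """Guess the maturity of an IETF document from its name.

    Simplification: only the three stages discussed in the episode are
    distinguished; other streams (for example IRTF drafts) are not handled.
    """
    name = name.strip().lower()
    if re.fullmatch(r"rfc\s?\d+", name):
        return "rfc"                  # went through the whole process
    if name.startswith("draft-ietf-"):
        return "working group draft"  # adopted and being worked on
    if name.startswith("draft-"):
        return "individual draft"     # typically draft-<surname>-...
    return "unknown"


if __name__ == "__main__":
    for doc in ("draft-bertocci-oauth-access-token-jwt-00",
                "draft-ietf-oauth-access-token-jwt-13",
                "RFC 6749"):
        print(doc, "->", classify(doc))
```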
Matias Woloski: So we are getting a bit off track, but I think this is interesting for people to understand. For someone to write one of these things, anyone can do it, but there's certain tooling that you have to use, right? How does that work?
Vittorio Bertocci: Anyone can write it. The main thing that makes this a bit funny is that there is a specific syntax that you are supposed to use for writing RFCs. And when I started this, two years ago, the common thing to do was to use a specific XML dialect. This XML dialect is used for structuring your document, and every Internet Draft always has the same shape. It has a header, which has to contain specific fields; whenever you make a reference to something, there is a specific tag you use for doing this reference. And then when you author this stuff, you can use tools for validating whether the stuff is working or not working, whether you are making violations. So there is a lot of work in terms of learning the syntax, so to say. So it's not just a matter of having an idea; you also have to learn the tools a bit. One thing I'd say is that the community is absolutely amazing. If you are trying to do this and you're stumbling on issues, the people that work in the space, the people that you see answering emails on the mailing list, are incredibly kind, super open to help. And in fact, here I have to publicly acknowledge Brian Campbell, who has been incredibly patient while I was learning this stuff. And now there is another alternative, which is Markdown. There are actually two flavors of Markdown that you can use for achieving the same result. It is significantly simpler. And on that, I have to thank Torsten Lodderstedt. I know I am butchering his surname. Sorry, Torsten. Sorry. And Daniel Fett. They both work at Yes.com and they really helped me with Markdown, which I'm using for other specs that I'm working on.
Matias Woloski: Okay. Let's go back to the original track of the JWT profile for OAuth 2 access tokens. So you mentioned before that this created an explosion of feedback and ideas. Can you talk a little bit about that? What triggered the amount of debate?
Vittorio Bertocci: Basically, during that first presentation in Stuttgart, it was first of all the notion of actually providing something that gives structure to an access token. For a while people have been used to not relying on a structure. And so some people were worried that by bringing this in we would violate some principles. And then we debated, and concluded... No. As long as we maintain the layering that we normally have, as in the client keeps not looking inside the token, but only the recipient does, then we'll be fine. And then there were other things. For example, one thing that I observed some vendors do is express in the token the way in which the token was obtained. Did the client authenticate itself or not? And so I added it as one of the things that I was suggesting we should use. And people pushed back, saying, "No. Too much confusion." In general, this was a harbinger of things to come. The feedback at that point was still pretty gentle. As in, people in front of a beer, we were in Stuttgart after all, discussing things we are passionate about. But the moment in which this thing became something that we started working on as a working group, then the floodgates really did open. We started really debating hard on things like... For example, I wanted to make the request only include one resource. Because if you have more than one resource, and your token has more than one resource in the audience, then you get into trouble: you might have scopes that have the same string for multiple resources, and so you don't know which resource a given scope applies to. And so we debated stuff like that. Or for example, the metadata that you use for declaring the keys that you will use for validating stuff, which is similar to what you have in OpenID Connect. At first, I thought that one could differentiate the keys that you use for access tokens versus ID tokens. And instead I had a dialogue with Annabel Backman from AWS.
Who thought hard about it, though that was probably before I even mentioned this in the spec. And basically, she demonstrated very clearly that today, without extending the metadata, you cannot reliably do that. The moment in which you have keys in your metadata, there is no way for the resource server to really differentiate. And to me, that example is one of the big values of the process of doing standards. Because I'll be completely honest: I worked with access tokens in JWT with hundreds of customers. I answered tons of questions on Stack Overflow. I presented and explained this many, many times. So once I started writing this thing, I was expecting I would mostly write down my experience, and my experience would be it. And instead, in the process I really learned that I actually know very little. Let's say that there are so many scenarios that I didn't experience, but nonetheless, they are super valuable. And without this process, without putting this stuff in front of experts, without giving them the time to contribute their experience, those things would have been missed. I'm still not crazy about how slowly and how painfully things move in the context of standards. Not every interaction is as productive. Sometimes there will be someone who didn't follow the discussion at all, and will show up and say something. You have to say, "Here. There's a link to this other discussion we already had," and maybe they don't even read it. Or there is someone who has a pet peeve and says, "I want to save the world, and we should try to use this particular spec for saving the world." And you say, "Well, saving the world is out of scope." So not every interaction is productive, but the ones that are productive are absolutely key for truly giving the community the right guidance. This has been incredibly humbling for me. Truly, I've learned a lot about this topic, but the meta-learning is: yes, this process is truly necessary.
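The multi-resource problem raised earlier in this answer can be illustrated with a small Python sketch. Everything here is hypothetical: the two API identifiers, the scope catalog, and the function name are made up for the example, and the `scope` claim is assumed to be a space-delimited string.

```python
def ambiguous_scopes(payload, scopes_by_resource):
    """Return the granted scopes that more than one audience defines.

    scopes_by_resource maps a resource identifier to the set of scope
    strings that resource understands (a hypothetical catalog).
    """
    aud = payload.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    granted = payload.get("scope", "").split()

    result = {}
    for scope in granted:
        owners = [r for r in audiences
                  if scope in scopes_by_resource.get(r, set())]
        if len(owners) > 1:
            # The same string is meaningful to several resources, so the
            # token alone does not say which one the grant applies to.
            result[scope] = owners
    return result


if __name__ == "__main__":
    catalog = {
        "https://calendar.example.com": {"read", "write"},
        "https://mail.example.com": {"read", "send"},
    }
    token = {
        "aud": ["https://calendar.example.com", "https://mail.example.com"],
        "scope": "read send",
    }
    print(ambiguous_scopes(token, catalog))
```

With a single audience the function always returns an empty result, which is the intuition behind asking for one resource per request.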
Matias Woloski: Yeah. That's super interesting. I've seen some of those debates in the mailing list and how it can be frustrating at points. But then when you get the practitioners debating and getting their point of view, that's when things are good. So back to this value proposition because just to round up like the whole idea of this. When you started thinking about this and if you now fast forward five years from now and think all these resource servers, and middlewares, and API gateways adopted this profile, what is going to be easier? What that's going to mean for everyone?
Vittorio Bertocci: I think there are a number of potential advantages to this. The main one is interop. Let's say that today we have a number of middlewares out there which do their best to provide the ability to validate incoming access tokens, but in fact there is a limit to what we can do. Say that we mimic what we do for ID tokens; but access tokens are different. There are scopes and other indications in there. So having a truly interoperable SDK, having authorization servers that emit tokens which are truly interoperable, should allow us to take even more concerns away from the developer, who can focus on creating the app instead of thinking, "Oh, this particular provider uses scopes with a string called 'scp'. Whereas this other one uses a JSON array and it's called 'scope'." Those are the obvious kinds of things we can get rid of, and we can have a generic SDK for that. The other effect that I'm really hoping we can achieve is to stop people from using ID tokens instead of access tokens. Let's say that apart from very, very special cases, normally it's a bad idea. Because the audience of the ID token is the client, and so if you start using the ID token against an API, which has a different identifier, now every token that you can get from your client can be used with any API that accepts this. And so it kind of defeats the purpose of having an audience. Because now if that token gets leaked, it can be used with way too many APIs. I know a lot of people in the industry do that officially. Kubernetes, for example: they need a JWT and so they use an ID token, but there are those challenges. The other thing is that ID tokens cannot be sender constrained, because they are not designed to be sent away from the client. Today that's still a bit academic, because we don't have a lot of sender constraint out there, but the hope is that it will become more and more common.
And so by adding access tokens, we'll be able to actually do that more reliably. And then finally, I think that we can help with this by creating a more reliable structure for asking for tokens and for sending information out. Because today scopes are the only construct that people know that can do authorization in transit. There are other specs, like SCIM and similar, that speak about data at rest. As in, how do I represent groups, roles, and similar. But then when it comes to actually placing those in a token, which a lot of people do, everyone has to do it in a slightly different way. And very often you'll see people abusing scopes and using scopes to express that kind of information. And so by officially separating those, we'll hopefully have a better way, again, of creating SDKs, of creating gateways that can consume that information automatically, so that we can get authorization a bit more streamlined. Instead of today, everyone having to reinvent the wheel. And finally, the last thing I'd say is that a lot of people include identity information in the access token. Because if you are in a scenario in which you are sending the access token as a first party, the access token is the only artifact you are receiving. And so if you want identity information, unless you do something out of band, you've got to put it in there. And people do put it in there, but the privacy implications, the consent prompts that you need to meet in that case, are not always understood. And so that spec gave me the opportunity to gather the collective intelligence of the community and place recommendations in there, so that hopefully people, when they design their systems, can now have more concrete, actionable guidance to get things right and preserve privacy.
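The "scp" versus "scope" mismatch, and the idea of keeping scopes apart from groups and roles, can be sketched as the kind of shim a generic SDK has to carry today. This is an illustrative Python sketch under stated assumptions: the function names are made up, the two claim names are the ones quoted in the episode, and `roles` and `groups` are assumed to be separate array claims, in the spirit of the SCIM-style attributes mentioned above.

```python
def normalize_scopes(payload):
    """Return the granted scopes as a list, whatever shape the issuer used.

    Handles a space-delimited string or a JSON array, under either the
    "scope" or the "scp" claim name.
    """
    for name in ("scope", "scp"):
        if name in payload:
            value = payload[name]
            if isinstance(value, str):
                return value.split()
            return [str(item) for item in value]
    return []


def authorization_view(payload):
    """Split what the client was granted (scopes) from what the user is
    (roles, groups), instead of overloading scopes with both."""
    return {
        "scopes": normalize_scopes(payload),
        "roles": list(payload.get("roles", [])),
        "groups": list(payload.get("groups", [])),
    }


if __name__ == "__main__":
    provider_a = {"scp": "read:orders write:orders"}
    provider_b = {"scope": ["read:orders", "write:orders"], "roles": ["admin"]}
    print(authorization_view(provider_a))
    print(authorization_view(provider_b))
```

If every issuer followed one profile, the first function would collapse to a single `split()` on one claim name, which is the interoperability gain being described.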
Matias Woloski: Yeah. This is a great explanation. Thank you. So do you want to close with some call to action for this?
Vittorio Bertocci: Absolutely. So with mixed feelings, I would invite people to follow the discussion for this particular spec on the list. We have just entered the IESG phase, and so it's, again, open for feedback. So if you use JWT as the format for your access tokens, by all means, please take a look and see if there is something that you'd like to include as a scenario, if you want something different, and similar. I say mixed feelings because I really want to be able to close this thing, but at the same time, I really want it to be right. There's no point in getting to an RFC and then it doesn't work for the scenarios it's meant to work for. The second call I'd like to issue is a bit more complex, but I want to do it anyway. If you look at most of the interactions that you see in there, you'll see that we are mostly guys. We are mostly middle-aged, mostly white guys, and I think that's a shame. And I think that the process is not always very friendly, but there are no technical constraints that stop anyone from subscribing to a mailing list and contributing to a mailing list. You don't have to be an expert on everything. As long as the scenario that is being discussed, not just for this spec, for any spec, is a scenario that you have experience with, don't worry about knowing everything else. You know what your experience with the thing is. And your experience is practical, and hence it is important, it is interesting for everyone. So we are missing your voice substantially. And so here, my call to action would be: please participate. And if you need help, if you need a Virgil that guides you through Hell and says, "Here is how you do this," I'm always open. Find ways of connecting with me. I know that others in the community are in the same place. So please, if you want to contribute, do not hesitate, get in touch, and we'll do what we can to help you, because we really want to hear your voice.
Matias Woloski: Excellent. That's a really nice call to action. And you really gave us a lot to think about today with this new spec. And join the mailing list. Don't give too much feedback, just enough so that it gets you perfect status and the opportunity to also get mentored by and work together with Vittorio on these types of things. And if you want to participate in this mailing list, you need to get to it, because it's a lot of white guys talking. This is an opportunity you should definitely take. So thanks a lot, Vittorio, for your time today. This was a bit of a special one. We reversed roles, and it's an honor to be the host of this Identity Unlocked podcast. Which, by the way, was a bit of a baby that we created two years ago in some brainstorming sessions in a one-on-one. So I'm super happy to see how this is evolving, and the amount of topics that you're talking about, and the amount of knowledge that we are sharing through these vehicles. So I'm super happy with it, and again, thank you.
Vittorio Bertocci: Thank you so much for interviewing me today and for backing the initiative until now. Thank you.
Recording: The OpenID Foundation is a proud sponsor of the Identity Unlocked podcast. Since its formation in 2007, the foundation has been committed to promoting, protecting, and advancing the OpenID community and its technologies. Please consider joining the foundation and contributing to current working groups. To learn more about the OIDF, please visit www.openid.net. Thanks everyone for listening. Subscribe to our podcast on your favorite app, or at identityunlocked.com. Until next time, I'm Vittorio Bertocci and this is Identity Unlocked. Music for this podcast composed and performed by Marcelo Wolowski. Identity Unlocked is powered by Auth0. Copyright 2020, Auth0 Incorporated. All rights reserved.