A Lap Around the OAuth2 Security BCP with Daniel Fett
Buongiorno everybody and welcome. This is Identity, Unlocked, and I'm your host, Vittorio Bertocci. Identity, Unlocked is the podcast that discusses identity specs and trends from a developer perspective. Identity, Unlocked is powered by Auth0. In this episode, we focus on the security BCP for OAuth 2.0, and our esteemed guest today is Daniel Fett, Security Specialist at yes.com and author of various key IETF specs. Welcome, Daniel.
Daniel Fett: Hi, Vittorio.
Vittorio Bertocci: Thanks for joining me today. Can we start with how you ended up working in identity?
Daniel Fett: Sure. I think that's actually quite a good question. Back in 2015 I was doing my PhD at the University of Stuttgart; actually at Trier at that time, and then later I joined Stuttgart. At that time we were developing a technique for the formal analysis of web security, that is, using mathematical and logical approaches to security in order to analyze the security of protocols like TLS or, in this case, web protocols. One area that we found interesting is various kinds of authentication protocols. Do you remember Mozilla BrowserID?
Vittorio Bertocci: I do, I do.
Daniel Fett: That was actually the first research object for us. We analyzed it and found various security bugs in there, but it was a really interesting system. So we published two papers on that topic and then another paper, but people kept asking, "Have you looked at OAuth yet?" At every conference where we presented something cool on BrowserID they said, "Have you looked at OAuth?" We were like, no, not yet. In all honesty, we thought that OAuth would be boring. In BrowserID you had postMessage communication, you had various iframes and windows and so on, so it was really complicated and super interesting to analyze. We thought, well, OAuth is essentially two redirects. I mean, what could go wrong there? There were so many researchers and so many papers on OAuth already, and we worried that we would not find anything interesting there. You know, if you want to publish a paper you need to find something, and the best thing you can find is an attack. Then we said, okay, let's do OAuth so that people stop asking us about OAuth. We analyzed OAuth using our formal analysis technique and actually found several bugs there. Most of them were not that critical, but the mix-up attack was really interesting. So we found the mix-up attack, and this is how we got in contact with the OAuth working group at the IETF. We asked them whether they thought this was a real problem and they said yes. We scrambled for an emergency meeting in Darmstadt, Germany, and there I got to know all the people from the group. This is how all of this stuff started for me.
Vittorio Bertocci: That's nice. So you ended up being hooked because now your name is on a number of specs. So clearly you liked what you found and you are now a very active member of that community.
Daniel Fett: That's right. Yeah. Also, it's a great community.
Vittorio Bertocci: Oh yeah. It's a nice cast of characters. Speaking of which, one question I ask all of our guests, given that we have all been in this space for a while, is how you and I actually met. Do you remember how we met?
Daniel Fett: I'm actually not so sure when we met, because you read the names on the mailing list, so the names were there already. I'm not sure when we met in person for the first time, but I think, was it in Stuttgart?
Vittorio Bertocci: I believe that in person, we actually met for the first time in Stuttgart. Yeah.
Daniel Fett: I think so as well. Yeah. That was the OAuth Security Workshop.
Vittorio Bertocci: That's a really nice event. All right, fantastic. So we definitely have the absolute best person to talk about the topic of the day, which is the security BCP. Before we get into the meat of it, I'd like to ask you, what is a BCP document and how is that different from the core specification?
Daniel Fett: To my understanding, a best current practice document means that the document is not really a completely new approach or a new technique or, say, a new standard, but it says how to use existing standards and techniques in the best way possible, or in the best way that is known today. That, to my understanding, is what a BCP should do. It should tell you how to use something securely and in the best-designed way, as you would do it today.
Vittorio Bertocci: I see. So it provides practices, as the name suggests, but doesn't necessarily override the core specification. It tells you the best way of using that specification, but doesn't really change it. If you are doing something different from that, it's not like you lose compliance. It's just maybe not very wise, but you are still compliant as long as you follow the core.
Daniel Fett: In our case, in the OAuth case, we have the situation that OAuth really gives you a lot of flexibility. You have many, many options to do things, some of which are documented in the original RFC 6749 and RFC 6750. But out of all of these options, we now know that some of them are not as secure as others, and there might be other things you need to be doing to have a secure implementation. This is what we want to document in that RFC.
Vittorio Bertocci: So that's the goal of this BCP. So what's your involvement in that BCP?
Daniel Fett: I'm a coauthor; Torsten is the editor. Since I started working on that document, which was three years ago or something like that, I have moved through all the parts of the document and improved it here and there. In the beginning, of course, my focus was on the attacks that we found back when I was at the university, but in the meantime I have touched most parts of the document at some point.
Vittorio Bertocci: Very nice. We are all very grateful that you are working on that. My understanding is that it's a long laundry list of things: practices, things to do, things to avoid, and similar. It's quite a comprehensive document, so we won't have time to go through the entire thing today, but what I wanted to ask you is: if you were to name the top three most impactful recommendations in that BCP, which ones would those be?
Daniel Fett: Yeah. As I said, there are a lot of recommendations in there. We really tried to cover all the aspects, but I think there are three very impactful and very important things. The first one would be the recommendation to not use the implicit grant any longer. The second one would be: if you use the authorization code grant, then you should also use PKCE. And the third one would be to use sender-constraining for the access tokens when that is possible, or at least have some kind of rotation for the refresh tokens; these, of course, should also be sender-constrained otherwise.
Vittorio Bertocci: Oh wow. Definitely big ones. Okay. So let's pick all of those apart and see if we can understand what is the risk that people incur if they don't follow the recommendation and what's the meat. Like, what's the impact of the recommendation. The first one you mentioned is the recommendation to stop using the implicit grant. What are the problems with implicit grant?
Daniel Fett: Yeah. As you know, the implicit grant means that in OAuth the authorization server creates the access token and sends it directly to the browser that requested the access token. This is a pattern that you would usually use in a single-page application, and traditionally this was the only pattern available for single-page applications, because when OAuth was written, things like cross-origin resource sharing were not yet a thing, so the authorization server had to send the access token directly to the browser if you wanted to use the access token in the browser.
Vittorio Bertocci: With "directly" you mean from the authorization endpoint and by placing it in the URL somehow.
Daniel Fett: Exactly. After the user authorizes the application to access their data, the authorization server just sends a URL to the browser where the access token is contained in the URL. With the experience that we now have, we see that there are several problems with that from a security standpoint. From a usability standpoint I would say it's fine, but from a security standpoint there are several problems. The first thing is that whether you actually need the access token in the browser or not, you will have the access token in the browser. There are several applications where the access token is not actually needed in the browser, so the browser would just send the access token back to the server of the client. Nonetheless, you end up with the access token in the browser. When the access token is in the browser, it's susceptible to things like cross-site scripting, our favorite web security problem, which still has not gone away. So you have this really powerful token in the browser, and I think that's just not a good place for it.
Vittorio Bertocci: This is going to be controversial, I'm telling you, because many developers do like to get an access token in JavaScript, although as security people we'd prefer that not to happen. The reality is that many applications do require it. So we tell them that the implicit grant is an insecure way of delivering that access token, and that there are better ways of doing it. I agree with you, of course, that having the access token in the browser is less than ideal from a security perspective, but we need to face the fact that people need it, and so the best we can do is to give them a way to deliver that token more securely. Right?
Daniel Fett: Yeah. That brings us to another problem with the implicit flow, and that is that the access token is delivered in the URL through the browser. That is the cause of many troubles. URLs are just not a good place to keep anything secret. For example, URLs end up in your web browser history. On mobile devices they can be logged as well. They might end up in web proxies on either side of the connection. And of course, the big topic with OAuth: if you have a redirection that doesn't go to the intended receiver but somewhere else, and we still see these kinds of problems quite often, then the access token ends up at some other place. At attacker.com instead of...
Vittorio Bertocci: Right. Referer header.
Daniel Fett: Yeah. The Referer header. There are just so many ways that things in URLs can leak, and this is really why you should use the authorization code flow instead. With the authorization code flow, you just have the authorization code in the URL. It's still a secret, it's still in the URL, but it's short-lived and you can do many other things to protect this authorization code against misuse.
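To make the difference concrete, here is a minimal Python sketch of the two kinds of authorization requests. The endpoint, client ID, and redirect URI are hypothetical placeholders for illustration, not values from the episode:

```python
from urllib.parse import urlencode

# Hypothetical values, for illustration only.
AUTHZ_ENDPOINT = "https://as.example.com/authorize"

def build_authorization_url(response_type: str, state: str) -> str:
    """Build an OAuth 2.0 authorization request URL."""
    params = {
        "response_type": response_type,  # "token" = implicit, "code" = code flow
        "client_id": "my-client",
        "redirect_uri": "https://client.example.com/callback",
        "scope": "read",
        "state": state,
    }
    return AUTHZ_ENDPOINT + "?" + urlencode(params)

# Implicit grant: the access token itself comes back in the redirect URL.
implicit_url = build_authorization_url("token", "xyz")

# Recommended instead: authorization code grant, where only a
# short-lived, single-use code travels in the URL.
code_url = build_authorization_url("code", "xyz")
```

The only difference on the wire at this step is the `response_type` parameter; the security difference is entirely in what comes back in the redirect.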
Vittorio Bertocci: That makes a lot of sense. I like what you said earlier that when the implicit flow was first conceived the use of CORS wasn't particularly widespread. You do need CORS in order to do the authorization code flow from the browser. Right?
Daniel Fett: Absolutely. Because we need to do the backend call, so the application in the browser gets the authorization code and then has to call the token endpoint at a maybe completely different origin to exchange the authorization code for the access token.
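That backend call is an ordinary form-encoded POST to the token endpoint. A rough sketch of the request body a browser-based client might build (all identifiers and values are made up for illustration; a real app would send this with `fetch` and rely on the authorization server's CORS headers):

```python
from urllib.parse import urlencode

def build_token_request(code: str) -> str:
    """Form-encoded body for exchanging an authorization code for tokens."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": "https://client.example.com/callback",
        "client_id": "my-client",
    })

# The code value here is a placeholder standing in for a real
# authorization code received in the redirect.
body = build_token_request("SplxlOBeZQQYbYS6WxSbIA")
```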
Vittorio Bertocci: That's great. Perfect. That's really useful. One thing I think it's useful to clarify is out there, there are tens of thousands of applications using the implicit flow today, and clearly it's not great, but at the same time, I guess that we are not telling people that we found new vulnerabilities and they need to drop everything they have and move to the new system. Right? I guess this is more like a best practice, but it's no reason to panic for people that are already using the implicit flow and taking steps to mitigate its biggest issues.
Daniel Fett: That's right. This is not a new security flaw, as you said, but we found that this is a repeated source of problems. If you have a secure application, chances are that you're not directly affected by this. But there's no real defense in depth: if the URL leaks, somebody else gets the access token and it's game over. We can try to protect against that, and many websites do so very successfully, but we've just seen that there's no second layer of defense there. In the beginning, OAuth was conceived for, let's say, social media applications or something like that, maybe for your company's login. Depending on how exactly you use it, these are not really high-stakes environments. Now we see OAuth being used for financial transactions. We see it being used for so many applications that we didn't even think about back when it was standardized, so you now need more than just one layer of security. If you're talking about financial transactions, if a bank transfer depends on the access token, you really don't want it to leak somewhere because of some cross-site scripting in the application or something like that.
Vittorio Bertocci: Makes complete sense. So, defense in depth: for new apps, you want people to use the new practice and not build instant legacy, and for high-value transactions this is a good defense in depth. Fantastic. One very last thing I want to mention, because it's totally a pet peeve of mine. When people say implicit, they just put everything in the same cauldron. There are all the scenarios we described, in which the token ends up in the URL, which are all super bad. But there is one scenario where implicit is used where things appear to be still okay. That's when people do OpenID Connect with a form POST, where they basically use OpenID Connect just like SAML: instead of using the back channel, you send the token using a POST. Those flows are still okay and not touched by this recommendation. Correct?
Daniel Fett: Well, the access token doesn't end up in the URL then so you'll already exclude many of the problems. Or, essentially the main problems that we see with the implicit grant.
Vittorio Bertocci: Yeah. There is only the ID token, and only in the POST, so no URLs. Okay, perfect. Good. I just wanted to make sure I clarify this, because a lot of people that are not familiar with this space, as soon as they hear that implicit has been deprecated, immediately also worry about that flow. I think it's important to clarify that part. Fantastic. Great. Thank you. This was the best explanation I've heard so far about why implicit shouldn't be used. What was the second item on your list?
Daniel Fett: The second item would be Proof Key for Code Exchange, or PKCE. That's RFC 7636. When I first read it, it was like, okay, nice. Nowadays I would say it's really a must-have, at least for any new implementation of OAuth.
Vittorio Bertocci: The recommendation here in the BCP is: wherever you use an authorization code grant, you should use PKCE. My understanding is that this is different from the original scope of PKCE, which was specifically for native apps and similar public clients. So tell me a bit more about what PKCE does and why it's a good idea to use it absolutely everywhere now.
Daniel Fett: Yeah. Originally, PKCE was intended just for public clients, and the idea there is to protect the authorization code. In the authorization code grant, as I said, this authorization code comes back from the authorization server through the user's browser in the URL. As we saw before, URLs can leak in various places, so we'd like to protect that code. We'd like to ensure that not just anybody can take that code, go to the token endpoint, and exchange it for the access token, because otherwise we wouldn't gain any security here.
Vittorio Bertocci: Now I'm remembering. The first time I heard about it was when there was this big push for native clients to start using the system browser instead of an embedded browser. Now you had the problem that the code was acquired by the system browser and then had to come back to your app, as opposed to everything happening in the embedded browser. So there was the opportunity for a rogue app to inject itself between the system browser and your target app and steal the code. That was the main scenario back in the day.
Daniel Fett: Exactly. If you look at it closely, it's just one other way the authorization response can leak. This is also really important. Yeah.
Vittorio Bertocci: I see. So now we want to use it everywhere. To put it another way: I understand why it's useful when you could have this app that gets in the middle, but what about all the other scenarios? For example, say that I am a website, a confidential client, and I am using the authorization code flow to make a call to an API from my backend code. How does PKCE help in that scenario?
Daniel Fett: For a long time, the assumption was that if you steal a code that is bound to a confidential client, nobody will be able to use that code, because as an attacker you can't just take the code, go to the token endpoint, send the code, and get the token; you would need to authenticate yourself to the token endpoint. That was the assumption. One of the takeaways of our mix-up attack research was that this assumption is not correct, because if you have this code, you can, as an attacker... I'm speaking as the attacker now. I have stolen an authorization code for a confidential client from somewhere. What I can do is a so-called code injection attack. I go to the same client, assuming it's a website, and I just start a new OAuth flow with the same client and the same authorization server. I go through the first steps of the flow until I get to the authorization response, and in the authorization response, which now contains the code that is bound to my account at the authorization server, I exchange the code. I take away the code that was issued for my account and fill in the code that I stole from my victim. I send the code that I stole in the authorization response to the original client, and the original client will now exchange that code for me at the token endpoint.
Vittorio Bertocci: Wow, that's bad.
Daniel Fett: Yeah. So now I have a client that is under my control because it's my session, that uses the authorization code or the access token that was bound to that authorization code, and that I stole from somewhere else. This is the code injection attack.
Vittorio Bertocci: Wow, that's powerful. Of course, PKCE defeats this attack, right?
Daniel Fett: Exactly, and that's the point. With PKCE, the client for each run will invent a new code verifier and derive a code challenge from it, and the authorization server will only accept a code together with the verifier for the same code challenge. That means that if I now try to inject another code, it will not be bound to the correct code challenge.
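The per-run verifier and challenge that Daniel describes can be sketched in a few lines of Python, using the S256 method from RFC 7636. This is an illustration of the mechanism, not a complete client or server:

```python
import base64
import hashlib
import secrets

def make_code_verifier() -> str:
    """Random, high-entropy code verifier (43-128 characters per RFC 7636)."""
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()

def derive_code_challenge(verifier: str) -> str:
    """S256 challenge: BASE64URL(SHA256(verifier)), without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

# Client: invent a fresh verifier per flow and send only the challenge in the
# authorization request; the AS binds the issued code to that challenge.
verifier = make_code_verifier()
challenge = derive_code_challenge(verifier)

# AS, at the token endpoint: recompute the challenge from the presented
# verifier and compare it with the one the code was bound to.
def as_accepts(bound_challenge: str, presented_verifier: str) -> bool:
    return secrets.compare_digest(bound_challenge,
                                  derive_code_challenge(presented_verifier))
```

An injected code fails exactly here: it was bound to a different challenge, so no verifier the legitimate client holds will hash to it.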
Vittorio Bertocci: Nice. That's an excellent reason for using PKCE in server-side flows as well. Now I'm going to touch on something that has been very controversial on the IETF mailing list. You already know what I'm referring to: if my system is not pure OAuth but I am implementing OpenID Connect, and I'm doing an authorization code flow or a hybrid flow, anything that deals with codes and similar, and I'm using Nonce, should I change my system to also support PKCE? Or is Nonce enough? What do you think?
Daniel Fett: In fact, Nonce works in a similar way to PKCE, so you can actually use Nonce to some extent in place of PKCE. There are some caveats. You need to be careful that you actually have a confidential client; for public clients this doesn't work, it's not secure. But if you have a confidential client, you get a similar level of security. Nonetheless, I would recommend using PKCE in new deployments, at least, all the time, because then it's easier to enforce: as an authorization server you can just check that PKCE is used all the time. There's also an attack that I discovered where you might be able to trick an authorization server into accepting a code that was bound with PKCE in a flow where it's not expecting PKCE. So you can exfiltrate a code from one flow and use it in another flow. That only happens if the same authorization server supports flows both with and without PKCE. So at the authorization server you really need to make sure that you enforce it all the time.
Vittorio Bertocci: I see. So the idea is PKCE in general gives you better coverage, and if everyone would enforce PKCE all the time, then these kinds of attacks would go away. So for the people that today are already supporting Nonce, they seem to be more or less covered for that part, but for all the new work like the BCP is just going to recommend blanket support for PKCE absolutely everywhere.
Daniel Fett: We are going to recommend that, yeah. We say that if you want to actually use Nonce, then you really need to make sure that Nonce is actually used and also checked by the client. That is an important point, because Nonce has to be checked on the client, and we have seen that clients are often not as well implemented as authorization servers are. So as an authorization server, across the whole ecosystem, you need to make sure that Nonce is actually checked at the client.
Vittorio Bertocci: That's an excellent point: if you use Nonce, you are also trusting that the client has been implemented in the right way, and as a provider it's hard to ensure that. You kind of need to trust them. So that's an excellent point. There was one last thing. Ah, yes. If we use PKCE, we can basically stop using the state as the mechanism for protecting ourselves from cross-site request forgery. So now, basically, I no longer need to put anything unique in the state and I can use the state just for remembering stuff between calls, right?
Daniel Fett: Yeah. That's something that we also have in the security BCP now, because we noticed that if you use PKCE, it really gives you all the functionality that you get with the state, with one exception: error messages. If you get back an error message from the authorization server, PKCE just doesn't cover it. There's no CSRF protection for error messages. I don't think that this is a big problem, or at least I couldn't come up with any kind of attack except for maybe disrupting the user flow there, but this is something you should know. Except for the error responses, you get the same CSRF coverage with PKCE, and it's easier to use, I think, because we need to use it anyway. So you don't need to come up with a separate state; state can now be used for essentially anything you like. Of course, you need to keep in mind that state is not integrity protected, so when you send the state to the server and get back another value, you don't really know whether it's actually the same state that you sent before or not.
Vittorio Bertocci: I see. So basically, if you want it integrity protected, you've got to do it on your own. What you place in there, you've got to sign in your own way; it's outside of the protocol. Fantastic. This is great. In fact, it's so great that we are running out of time. So let's go on to the last point, which you mentioned was sender-constraining. What's the recommendation that the BCP makes around sender-constraining?
Daniel Fett: On a high level: sender-constrain your tokens. That is, the access token and, depending on how you use it, also the refresh token should be bound to one party, the client that is supposed to use the token. This is the big picture.
Vittorio Bertocci: I see. When we say sender-constraining, in practice, what do we really mean?
Daniel Fett: There are two practical ways to do that right now. One would be to use MTLS, mutual TLS. That means that the client comes up with a certificate, which doesn't need to be from some certificate authority; it can just be a self-signed certificate. On the TLS connection to the token endpoint, the client uses that certificate to authenticate itself to the authorization server. The authorization server binds the access token to the public key of that certificate, and later on, when the client wants to use the access token, it has to present the same certificate, and of course prove that it owns the private key, on the next TLS connection, so to say.
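Concretely, in the MTLS profile (RFC 8705) this binding is typically expressed as a SHA-256 thumbprint of the client certificate, carried in the access token's `cnf` claim as `x5t#S256`. A sketch of both sides of the check, with stand-in bytes in place of a real DER-encoded certificate:

```python
import base64
import hashlib

def cert_thumbprint(cert_der: bytes) -> str:
    """Base64url-encoded SHA-256 thumbprint, as used in cnf["x5t#S256"]."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

# Authorization server: at token issuance, record the thumbprint of the
# certificate presented on the mutual-TLS connection.
client_cert = b"stand-in for the client's DER-encoded certificate"
cnf = {"x5t#S256": cert_thumbprint(client_cert)}

# Resource server: the token is only good on a TLS connection whose
# client certificate has the same thumbprint.
def token_usable(cnf_claim: dict, presented_cert: bytes) -> bool:
    return cnf_claim.get("x5t#S256") == cert_thumbprint(presented_cert)
```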
Vittorio Bertocci: Great. So if someone steals the access token but they don't have that certificate, they just cannot use it. So we defeat the usual bearer token vulnerability.
Daniel Fett: Exactly. This also means that if you use MTLS, you take this proof that you are the rightful owner, or user, of the access token out of the application layer to some extent, which, depending on your application, might make it harder or easier to use. But I think it's an interesting approach. If you want to do it on the application layer, there's a specification called DPoP, Demonstrating Proof-of-Possession.
Vittorio Bertocci: Yeah. We had an entire episode with Brian Campbell all about DPoP, so I'd encourage our listeners to go and endure 30 minutes of me chatting with Brian about it.
Daniel Fett: Great.
Vittorio Bertocci: You also had a very important part on the DPoP spec, right?
Daniel Fett: I think I'm the editor of the DPoP spec. Yeah.
Vittorio Bertocci: Yeah. Okay. Doesn't get more important than that. Okay, great. Fantastic. This one is for public clients. For confidential clients we have other tricks for doing sender-constraining, right?
Daniel Fett: Yeah. Especially for the refresh token: the refresh token is already bound to one specific sender, because the sender has to authenticate itself at the token endpoint. So essentially we had all of this functionality already.
Vittorio Bertocci: So for refresh tokens, you don't need to use either MTLS or DPoP. When you are a confidential client, the fact that you have to use your credentials is enough.
Daniel Fett: Exactly. Yeah. You get the same kind of security there.
Vittorio Bertocci: That's great. So the tricky part of the recommendation is that today MTLS and DPoP are not particularly available in public implementations, right?
Daniel Fett: Yeah. Both of them mean that you need to use some crypto on the client, and it might be hard or impossible to put this stuff into your TLS layer, depending on your environment; it might be hard to sign something. So another method that you can use is refresh token rotation, where once a refresh token has been used, the authorization server issues a new refresh token to the client. The nice thing there is that if somebody is able to exfiltrate a refresh token and uses it, the same refresh token will be used twice at the authorization server. In any case, whether the attacker uses the refresh token before the legitimate user or afterwards, it doesn't matter: the authorization server sees the same token twice and can then essentially revoke all the tokens that have been issued in the whole process, the access tokens and the refresh tokens and so on, and cancel essentially the whole grant. This makes this somewhat secure.
Vittorio Bertocci: That's really powerful. And that's also the thing that makes it okay to use a refresh token in the browser. I remember, when I had less white in my beard, that using a refresh token inside a browser was one of the things we had to stop people from doing: oh, no, no, no, absolute blasphemy, too dangerous. But now that we can do rotation, we are kind of okay with people doing that, right?
Daniel Fett: Rotation is not really a new thing, but we can now do it because we have the authorization code flow and CORS in the browser and so on. It can be tricky, though. You have the problem that if a request doesn't arrive at the authorization server, or you think the request didn't arrive at the authorization server, and then you send the same request again with the same refresh token, you're essentially acting the same way an attacker would. I'm not sure how big this problem is in practice. So you might want to look into DPoP.
Vittorio Bertocci: Yeah, that is definitely messy. The other thing is that when you introduce rotation, there is a new class of errors that you need to deal with. If you correctly redeem refresh token N but fail to receive token N plus one, the next time you try you will be stuck, because you don't have any tokens that still work. So I guess that's why this is an opt-in feature for many: if you look at many practical implementations, they don't have it on by default.
Daniel Fett: I would hope that we see DPoP in these libraries more often.
Vittorio Bertocci: So DPoP doesn't suffer from these kinds of shortcomings.
Daniel Fett: Exactly. With DPoP, essentially you just sign your requests with the same key all the time, and an attacker that exfiltrates the refresh token or the access token would not be able to do the same, because they don't have the private key.
Vittorio Bertocci: That's great. So basically this last recommendation, use sender-constraining wherever possible, is something that will probably come to fruition in practice once more implementations start to support the necessary components, such as DPoP.
Daniel Fett: Yep, exactly. I think PKCE and, of course, the authorization code flow are already widely supported. Sender-constraining is a bit more exotic.
Vittorio Bertocci: Yeah. That makes sense. Sometimes people that are not used to the language used in these specs look at them and really want to comply as much as possible, but then they do their Google searches and find out that some parts are not doable yet. So I think it's important to set expectations. I think that in the last 30 minutes, you did a beautiful job of making those very important recommendations accessible to everyone. I want to thank you for your time. It was really, really interesting, and I hope I'll have opportunities for you to come back on the show and perhaps cover some of the other things that you're working on.
Daniel Fett: Thank you for the invitation. It was a pleasure talking to you.
Vittorio Bertocci: Thank you, and thanks everyone for tuning in. Subscribe to our podcast on your favorite app or at IdentityUnlocked.com. Until next time, I'm Vittorio Bertocci and this is Identity, Unlocked. Music for this podcast composed and performed by Marcelo Woloski. Identity, Unlocked is powered by Auth0.
DESCRIPTION
On the fourth episode of Identity, Unlocked, host Vittorio Bertocci, principal architect at Auth0 is joined by Daniel Fett, a security specialist at yes.com. Daniel received his PhD from the University of Stuttgart through research on the formal analysis of web protocols. Daniel joins the podcast today to talk about the security BCP document.
A BCP document is a document that describes the best current practices for any given field. Fett is a co-author and has been working on the OAuth 2.0 Security Best Current Practice document. This document gives the update on the best industry practices, but it does not override the core specifications. Instead, this document provides additional information and practices in OAuth. While there are many great recommendations in this document, three of the most important that stand out to Daniel are:
1. Recommendations to not use the implicit grant any longer.
2. If the authorization code grant is used, PKCE should also be used.
3. Use sender constraining for the access tokens when possible, or, at least have rotation for the refresh tokens
The implicit grant is the flow in which the authorization server creates the access token and sends it directly to the browser that requested it. Although this is convenient from a usability standpoint, there are several problems from a security perspective, and Daniel unpacks several of the security concerns of the implicit flow. Next, he and Vittorio discuss using PKCE with the authorization code grant. Among other things, PKCE helps combat code injection attacks, in which an attacker who has stolen an authorization code injects it into their own session with the client. With PKCE, the client for each run invents a new code verifier and code challenge, and only a code bound to the same code challenge will be accepted. This means that, if an attacker tries to inject another code, it will not be bound to the correct code challenge. Finally, Daniel explains the BCP recommendations around sender-constraining: he recommends sender-constraining access tokens whenever possible.
Make sure to subscribe and join us for the next episode, where John Bradley discusses WebAuthn and FIDO2.
Music composed and performed by Marcelo Woloski.