Intro
A user of a web form noticed any password that includes an accented character is rejected. He came to use as the operator of the web application firewall for a fix.
More details
The web server was behind an F5 device running ASM – application security manager. The reported error that we saw was Failed to convert character. What does it all mean?
One suggestion is that the policy may have the wrong language, but the application language of this policy is unicode (utf-8), just like all our others we set up. And they don’t have any issues. I see where I can remove the block on this particular input violation, but that seems kind of an extreme measure, like throwing out the baby with the bathwater.
I wondered about a more granular way to deal with this?
Check characters on this parameter value is already disabled I notice, so we can’t further loosen there.
Ask the expert
So I ask someone who speaks a foreign language and has to deal with this stuff a lot more than I do. He responds:
Looking at the website I think that form just defaults to ISO-8859-1 instead of UTF-8 and that causes your problem.
Umlauts or accented letters are double byte encoded in UTF-8 and single byte in ISO-8859-1
To confirm the problem with the form, he enters an “ä” as the username, which the event log shows encoded to %E4 which is not a valid UTF-8 sequence.
Our takeaway
To repeat a key learning from this little problem:
Umlauts or accented letters are double byte encoded in UTF-8 and single byte in ISO-8859-1
So the web form itself was the problem in this case; and I went back to the user/developer with this informatoin.
So he fixed it?
Well, turns out his submission form was a private page he quickly threw together to test another problem, the real problem, when he noticed this particular issue.
So, yes, his form needed to mention utf-8 if he were going to properly encode accented characters, but that did not resolve the real issue, which remains unresolved.
It happens that way sometimes.
But, yes, the problem reported to us was resolved by the developer based on our feedback, so at least we have that success.
Conclusion
If like me, your eyes glaze over when someone mentions ISO-8859-1 versus UTF-8, the differences are pretty stark, easy-to-understand, and, just sometimes, really, important! I think ISO-8859-1 will represent some of the popular accented characters in positions 128 – 255, but not utf-8. utf-8 will use additional bytes to represent characters outside of the Latin alphabet plus the usual special characters.
We’ll call this one Case Closed!
References and related
I like to do a man ascii on any linux system to see the representation of the various Latin characters. I had to install the man-pages package on my RHEL system before that man page was available on my system.