Background
TIS had many email addresses that were considered invalid. In most cases this was due to additional whitespace, but some records also had multiple email addresses concatenated together in an unsupported format.
These issues had been benign for a long time, until TSS started using the email field for profile lookups. Because the email must be an exact match for the profile to be found, many email records had to be sanitised.
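The Java sketch below roughly illustrates the kind of clean-up involved. It strips whitespace and flags values that contain more than one @, which suggests several concatenated addresses. `EmailSanitiser` and its methods are hypothetical names, and this is not the sanitisation that was actually run. Removing all embedded whitespace also assumes whitespace is never intended in a stored address.

```java
/**
 * Hypothetical sketch of clean-up applied before an exact-match lookup.
 * Not TIS/TSS code; names are illustrative only.
 */
final class EmailSanitiser {

    private EmailSanitiser() {
    }

    /**
     * Removes leading, trailing and embedded whitespace (spaces, tabs, line breaks).
     * Assumption: whitespace is never a legitimate part of a stored address.
     */
    static String stripWhitespace(String raw) {
        if (raw == null) {
            return null;
        }
        return raw.replaceAll("\\s+", "");
    }

    /** A value with more than one '@' is treated as several concatenated addresses. */
    static boolean looksConcatenated(String value) {
        return value != null && value.indexOf('@') != value.lastIndexOf('@');
    }
}
```

Assume the lookup compares values the way `String.equals` does. In that case, `"jane.doe@example.com "` would not find a profile stored as `"jane.doe@example.com"` until the trailing space is removed.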
We implemented stricter validation in TIS and TSS to prevent this problem from recurring. However, because the initial values for the email field come from Oriel, we also wanted to understand what validation Oriel performs.
Current State
TIS uses an extremely strict and complex regex-based validation:
private static final String REGEX_EMAIL = "^$|[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";
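The sketch below shows one typical way to apply a constant like this with `java.util.regex.Pattern`. The regex is copied unchanged from TIS. `TisEmailCheck` and `isValid` are hypothetical names, and TIS may not invoke the pattern in exactly this way.

```java
import java.util.regex.Pattern;

/** Hypothetical wrapper around the TIS REGEX_EMAIL constant (regex copied verbatim). */
final class TisEmailCheck {

    private static final String REGEX_EMAIL = "^$|[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

    // Compiled once and reused. matcher(...).matches() requires the WHOLE value
    // to match, and the "^$" alternative means an empty string is accepted.
    private static final Pattern EMAIL = Pattern.compile(REGEX_EMAIL);

    private TisEmailCheck() {
    }

    static boolean isValid(String value) {
        return value != null && EMAIL.matcher(value).matches();
    }
}
```

Under this sketch, a value with a trailing space is rejected, and so are two addresses joined with `;`. The character classes only allow ASCII letters, digits and a fixed set of symbols, so an internationalised address such as `jörg@example.com` is rejected too. The domain part needs at least one dot, so `jane@localhost` also fails.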
Oriel uses much lighter-touch validation, requiring only that the value contains an @ symbol.
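To show the gap between the two approaches, here is a minimal Java sketch of an Oriel-style check. It assumes the rule is exactly "contains an @" as described above; Oriel's actual implementation is unknown, and `OrielStyleCheck` is a hypothetical name.

```java
/**
 * Hypothetical Oriel-style check. Assumption: the rule is literally
 * "the value contains an @"; Oriel's real implementation is not known.
 */
final class OrielStyleCheck {

    private OrielStyleCheck() {
    }

    static boolean isAccepted(String value) {
        return value != null && value.contains("@");
    }
}
```

Values such as `" jane.doe@example.com"` and `"a@example.com;b@example.com"` pass this check but fail the TIS regex. This is one way values accepted upstream could later fail stricter validation.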
The standards governing which characters an email address may contain, and how to validate them, are extremely complicated, and there is no universally agreed set of validation requirements.
Next Steps
With no widely agreed standard, and given the difficulty of getting Oriel to change its validation, we have instead recognised that our own validation may need to be relaxed. The primary aim is to avoid rejecting potentially valid addresses coming from Oriel. A secondary aim is to avoid rejecting real addresses that our validation simply hasn't accounted for.
This relaxation could take the form of a simplified validity check, or of downgrading validation failures from a hard error to a soft warning. However, as we have not identified any real-world examples that fall outside our validation, we do not feel this change needs to be made immediately. Instead, it has been documented here for future reference.
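The Java sketch below combines both options. A simplified rule still produces a hard error, while failing the strict rule only produces a warning. Everything here is hypothetical: `RelaxedEmailValidator` and `EmailCheckResult` are illustrative names. The minimal rule (exactly one @, non-empty parts either side, no whitespace) is an assumption, not an agreed TIS/TSS requirement. The strict check is passed in as a `Predicate`, so the existing `REGEX_EMAIL` could be reused unchanged.

```java
import java.util.function.Predicate;

/** Outcome of the hypothetical relaxed validation. */
enum EmailCheckResult { VALID, WARNING, ERROR }

/**
 * Hypothetical sketch: hard error only for clearly broken values,
 * soft warning when the strict rule fails.
 */
final class RelaxedEmailValidator {

    private final Predicate<String> strictCheck;

    /** strictCheck would typically be a match against the existing REGEX_EMAIL. */
    RelaxedEmailValidator(Predicate<String> strictCheck) {
        this.strictCheck = strictCheck;
    }

    EmailCheckResult check(String value) {
        if (value == null || value.isEmpty()) {
            return EmailCheckResult.VALID; // mirrors the "^$" branch of REGEX_EMAIL
        }
        int at = value.indexOf('@');
        boolean minimalOk = at > 0                              // non-empty local part
                && at == value.lastIndexOf('@')                 // exactly one '@'
                && at < value.length() - 1                      // non-empty domain part
                && value.chars().noneMatch(Character::isWhitespace);
        if (!minimalOk) {
            // Whitespace or concatenated addresses would still break exact-match lookups.
            return EmailCheckResult.ERROR;
        }
        // Plausible but unusual: accept the value and flag it rather than reject it.
        return strictCheck.test(value) ? EmailCheckResult.VALID : EmailCheckResult.WARNING;
    }
}
```

For example, `new RelaxedEmailValidator(v -> Pattern.matches(REGEX_EMAIL, v))` would classify `jörg@example.com` as a warning rather than an error. A trailing space or two joined addresses would still be an error.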