Email Validation

Status

Decided

Decision leader

@Andy Dingley

Contributors

 

Date

Sep 1, 2022

Outcome

Degree of email validation, in light of validation levels in place on Oriel

Background

TIS held many email addresses that were considered invalid. Most were caused by additional whitespace, but some records had multiple email addresses concatenated together in a way that was not supported.

These issues had been benign for a long time, until TSS started using the email field for profile lookups. The email must be an exact match or the profile will not be found, so many email records had to be sanitised.
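As a rough illustration of the kind of clean-up described above, the sketch below trims surrounding whitespace and flags values that appear to hold more than one address. The class and method names are hypothetical, and treating "more than one @" as a sign of concatenated addresses is an assumption rather than the rule actually used during the sanitisation.

```java
import java.util.Optional;

// Hypothetical helper, not taken from the TIS or TSS code base.
final class EmailCleanupSketch {

    /** Counts the '@' characters in the value. */
    static long countAtSigns(String value) {
        return value.chars().filter(c -> c == '@').count();
    }

    /**
     * Returns the trimmed value when it looks like a single address,
     * or an empty Optional when it needs manual review
     * (null, blank, no '@', or more than one '@').
     */
    static Optional<String> cleanForLookup(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // trim() removes leading and trailing characters up to U+0020,
        // which covers spaces, tabs and line breaks.
        String trimmed = raw.trim();
        if (trimmed.isEmpty() || countAtSigns(trimmed) != 1) {
            return Optional.empty();
        }
        return Optional.of(trimmed);
    }
}
```

Because the TSS lookup needs an exact match, even a single trailing space is enough for a profile not to be found, which is why trimming is the first step here.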

We implemented stricter validation in TIS and TSS to avoid this problem in the future. However, since the initial values for the email field come from Oriel, we also wanted to understand what validation Oriel performs.

Current State

TIS uses an extremely strict and complex regex-based validation.

private static final String REGEX_EMAIL = "^$|[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";


Oriel uses much lighter-touch validation, requiring only an @ symbol in the value.
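To make the gap between the two approaches concrete, the sketch below places the TIS pattern next to a check modelled on the description of Oriel. The class and method names are hypothetical, the Oriel check is inferred only from the sentence above, and it is assumed that TIS applies the pattern to the whole value (full-match semantics).

```java
import java.util.regex.Pattern;

// Hypothetical comparison harness, not taken from TIS or Oriel.
final class ValidationGapSketch {

    // Pattern copied from the TIS constant quoted above.
    private static final String REGEX_EMAIL = "^$|[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";

    private static final Pattern STRICT = Pattern.compile(REGEX_EMAIL);

    /** Strict check: the whole value must match the TIS pattern (assumed usage). */
    static boolean passesStrictCheck(String value) {
        return STRICT.matcher(value).matches();
    }

    /** Lenient check modelled on the description of Oriel: only an '@' is required. */
    static boolean passesLenientCheck(String value) {
        return value.contains("@");
    }
}
```

Under these assumptions, a value such as " user@example.com" (leading space) or "user@localhost" (no dot in the domain) passes the lenient check but fails the strict one. The empty string behaves the other way round: the `^$` alternative lets it through the strict pattern, while it contains no @.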

The standards around email address characters and their validation are extremely complicated, and there are no universally agreed validation requirements.

Next Steps

With no widely agreed standards, and given the difficulty of requesting a change to Oriel’s validation, we have instead recognised that our validation may need to be relaxed. This is primarily so that we don’t reject potentially valid addresses coming from Oriel, but also so that we don’t reject real addresses that our validation simply hasn’t accounted for.

This relaxation could take the form of a simplified validity check, or of reducing the validation from a hard error to a soft warning. However, as we have identified no real-world examples of active email addresses that fall outside our validation, we do not feel this change needs to be made immediately. Instead, it has been documented here for future reference.
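Should the relaxation ever be needed, one possible shape is sketched below: a simplified hard check combined with the existing strict check downgraded to a warning. Everything here is hypothetical, including the class name, the `Outcome` values and the choice of "contains an @" as the hard minimum. The strict check is passed in as a predicate so the sketch does not depend on any particular pattern.

```java
import java.util.function.Predicate;

// Hypothetical sketch of a relaxed validator, not an agreed design.
final class RelaxedValidationSketch {

    enum Outcome { ACCEPTED, ACCEPTED_WITH_WARNING, REJECTED }

    /**
     * Hard error only when the value has no '@' at all; a failure of the
     * stricter check is reported as a warning instead of a rejection.
     */
    static Outcome validate(String email, Predicate<String> strictCheck) {
        if (email == null || !email.contains("@")) {
            return Outcome.REJECTED;
        }
        return strictCheck.test(email)
                ? Outcome.ACCEPTED
                : Outcome.ACCEPTED_WITH_WARNING;
    }
}
```

Note that this sketch rejects empty values, whereas the current TIS pattern accepts an empty string, so the handling of blank emails would need to be decided separately.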