php - Validate Australian DVA numbers (which may have variable length)

I'm validating Australian DVA numbers. The rules are:

  1. String length should be 8 or 9
  2. First char should be N, V, Q, W, S or T
  3. The next part should be letters or spaces, up to 3 characters
  4. The next part should be digits, up to 6 of them
  5. If the string length is 9, the last char must be a letter; if 8, it must be a digit // This is the tricky part

Here is my current attempt and it's working fine:

if (strlen($value) == 9 && preg_match("/^[NVQWST][A-Z\s]{1,3}[0-9]{1,6}[A-Z]$/", $value)) {
    return true;
}
if (strlen($value) == 8 && preg_match("/^[NVQWST][A-Z\s]{1,3}[0-9]{1,6}$/", $value)) {
    return true;
}
return false;

My question: Is there any way that I can combine these conditions in 1 regex check?

Answer

Solution:

You can use

^(?=.{8,9}$)[NVQWST][A-Z\s]{1,3}[0-9]{1,6}(?:(?<=^.{8})[A-Z])?$

Details

  • ^ - start of a string
  • (?=.{8,9}$) - the string should contain 8 or 9 chars (other than line break chars, but the pattern won't match them)
  • [NVQWST] - N, V, Q, W, S or T
  • [A-Z\s]{1,3} - one, two or three uppercase letters or whitespace
  • [0-9]{1,6} - one to six digits
  • (?:(?<=^.{8})[A-Z])? - an optional occurrence of an uppercase ASCII letter if it is the ninth character in a string
  • $ - end of string.
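
As a quick check, the pattern can be dropped into preg_match() and compared with the two-step check from the question. This is only a sketch, and the function names below are made up for illustration. One thing it shows: a 9-character value that ends in a digit (for example NAB123456) passes the combined pattern but is rejected by the two-step check, because the trailing letter is optional. If rule 5 has to hold exactly, one possible variant (an assumption on my part, not part of the answer above) moves the length and last-character rule into the lookahead, as in isValidDvaStrict().

```php
<?php
// Two-step check, copied from the question and wrapped in a function.
function isValidDvaTwoStep(string $value): bool
{
    if (strlen($value) == 9 && preg_match('/^[NVQWST][A-Z\s]{1,3}[0-9]{1,6}[A-Z]$/', $value)) {
        return true;
    }
    if (strlen($value) == 8 && preg_match('/^[NVQWST][A-Z\s]{1,3}[0-9]{1,6}$/', $value)) {
        return true;
    }
    return false;
}

// Single-regex check using the pattern from the answer above.
function isValidDvaCombined(string $value): bool
{
    $pattern = '/^(?=.{8,9}$)[NVQWST][A-Z\s]{1,3}[0-9]{1,6}(?:(?<=^.{8})[A-Z])?$/';
    return preg_match($pattern, $value) === 1;
}

// Hypothetical stricter variant: the lookahead requires either
// 8 chars ending in a digit or 9 chars ending in a letter.
function isValidDvaStrict(string $value): bool
{
    $pattern = '/^(?=.{7}[0-9]$|.{8}[A-Z]$)[NVQWST][A-Z\s]{1,3}[0-9]{1,6}[A-Z]?$/';
    return preg_match($pattern, $value) === 1;
}

// Print the result of each check for a few sample values.
foreach (['NX123456', 'NSM12345A', 'NX12345A', 'NAB123456'] as $sample) {
    printf(
        "%-10s twoStep=%s combined=%s strict=%s\n",
        $sample,
        var_export(isValidDvaTwoStep($sample), true),
        var_export(isValidDvaCombined($sample), true),
        var_export(isValidDvaStrict($sample), true)
    );
}
```

Only the last sample, NAB123456, separates the three checks: the combined pattern accepts it, the other two reject it.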

Answer

Solution:

Based on rules and details pulled from online references, I've crafted a comprehensive and strict regex to validate Australian DVA numbers.

$regex = <<<REGEX
/
^
([NVQWST])
(?|
  ([ ANPVX])(\d{1,6})

  |(
     BG
    |CN
    |ET
    |F[RW]
    |G[RW]
    |I[QTV]
    |JA
    |K[MO]
    |MO
    |N[FGKX]
    |P[KOX]
    |R[DMU]
    |S[AELMORS]
    |U[BS]
    |YU
  )(\d{1,5})

  |(
    (?:A(?:FG|GX|LX|R[GX])
      |B(?:A[GL]|CG|G[GKX]|RX|U[GRX])
      |C(?:AM|CG|HX|IX|LK|N[KSX]|ON|YP|Z[GX])
      |D(?:EG|N[KX])
      |E(?:G[GXY]|SX|T[KX])
      |F(?:I[JX]|R[GKX])
      |G(?:HA|R[EGKX])
      |H(?:K[SX]|L[GKX]|UX)
      |I(?:DA|ND|SR|T[GKX])
      |K(?:OS|SH|UG|YA)
      |L(?:AX|BX|XK)
      |M(?:A[LRU]|LS|OG|TX|WI)
      |N(?:BA|CG|GR|IG|RD|S[MSW]|W[GKX])
      |OMG
      |P(?:A[DGLMX]|C[AGRV]|H[KSX]|L[GX]|MS|S[MW]|WO)
      |QAG
      |R(?:DX|U[GX])
      |S(?:A[GX]|CG|EG|IN|PG|UD|W[KP]|Y[GRX])
      |T(?:H[KS]|R[GK]|ZA)
      |U(?:AG|RX|S[GKSX])
      |V(?:EX|NS)
      |Y(?:EM|GX)
      |ZIM
    )
  )(\d{1,4})
)
([A-Z]?)
$
/x
REGEX;

The first character signifies the state/territory.

  • N = New South Wales (includes Australian Capital Territory)
  • V = Victoria
  • Q = Queensland
  • W = Western Australia
  • S = South Australia (includes Northern Territory)
  • T = Tasmania

My pattern intentionally uses "branch reset" capture groups so that the match array can be easily used to pad the inner "file number" with leading digits when desired.
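
To illustrate the padding idea, here is a rough sketch. It uses a shortened copy of the pattern (only a handful of war codes are kept so the example stays readable), and the constant DVA_REGEX_SHORT and the function padDvaNumber() are names I made up. It assumes the file number should be left-padded with zeros so that state + war code + digits comes to 8 characters, which is my reading of the digit limits in the pattern.

```php
<?php
// Shortened, illustrative copy of the pattern: most war codes are omitted.
const DVA_REGEX_SHORT = <<<'REGEX'
/
^
([NVQWST])
(?|
  ([ ANPVX])(\d{1,6})
  |(BG|CN|KO|SM)(\d{1,5})
  |(AFG|ZIM)(\d{1,4})
)
([A-Z]?)
$
/x
REGEX;

// Hypothetical helper: returns the zero-padded DVA number, or null if invalid.
function padDvaNumber(string $dva): ?string
{
    if (preg_match(DVA_REGEX_SHORT, $dva, $m) !== 1) {
        return null;
    }
    // Thanks to the branch reset group, the war code is always in $m[2]
    // and the file number in $m[3], whichever alternative matched.
    $state   = $m[1];
    $warCode = $m[2];
    $digits  = $m[3];
    $suffix  = $m[4] ?? '';

    // Assumed rule: 1 (state) + war code + digits = 8 characters.
    $width = 7 - strlen($warCode);

    return $state . $warCode . str_pad($digits, $width, '0', STR_PAD_LEFT) . $suffix;
}

echo padDvaNumber('NX5'), "\n";      // NX000005
echo padDvaNumber('QSM123A'), "\n";  // QSM00123A
echo padDvaNumber('VAFG12'), "\n";   // VAFG0012
var_dump(padDvaNumber('XX123'));     // NULL (X is not a state letter)
```

Because the groups keep the same numbers in every branch, the padding code does not need to know which alternative matched.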

If your application requires the DVAs to be zero padded to 8 or 9 characters, then this is a tighter pattern to enforce that.

Yes, I did this all on my phone.
No, I didn't type it all out manually.
I scraped the one webpage and used regex to format the content into array syntax for my lookup array.
Then I compacted the war code abbreviations into groups and character classes.
