Tutorial: Is something wrong with my regex?



Question:

I made an XML Schema and I have this in it.

<xs:element name="Email">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Some of the emails in one of my XML documents fail validation, and I get this error:

'Email' element is invalid - The value 'Some_Name@hotmail.com' is invalid according to its datatype 'String' - The Pattern constraint failed. LineNumber: 15404 LinePostion: 32

Comparing the emails that passed with the ones that failed, I noticed that every failing one contains an underscore (_). I'm not sure whether that is the cause.

Edit

So I changed my regex to this:

 <xs:pattern value="[\w_]+([-+.'][\w_]+)*@[\w_]+([-.][\w_]+)*\.[\w_]+([-.][\w_]+)*"/>  

It now works, but I don't understand why \w is not matching the underscore.
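To reproduce the behaviour outside the schema validator, here is a minimal Python sketch. Two assumptions: XSD patterns are implicitly anchored to the whole value, so `re.fullmatch` stands in for the validator; and XSD's `\w` is approximated by `[^\W_]` (Python's Unicode `\w` minus the underscore), which agrees with XSD for letters, digits and the underscore but not for every character. The names `XSD_W` and `matches` are hypothetical.

```python
import re

EMAIL = "Some_Name@hotmail.com"

# Patterns from the question, copied verbatim from the schema.
ORIGINAL = r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
FIXED = r"[\w_]+([-+.'][\w_]+)*@[\w_]+([-.][\w_]+)*\.[\w_]+([-.][\w_]+)*"

# Assumed approximation of XSD's \w: Python's Unicode \w minus the underscore.
# (Exact only for letters, digits and "_"; XSD's \w also admits symbols like "+".)
XSD_W = r"[^\W_]"
ORIGINAL_XSD = ORIGINAL.replace(r"\w", XSD_W)

def matches(pattern: str, value: str) -> bool:
    # XSD patterns are implicitly anchored to the whole value, so use fullmatch.
    return re.fullmatch(pattern, value) is not None

print(matches(ORIGINAL, EMAIL))      # True: Python's own \w already includes "_"
print(matches(ORIGINAL_XSD, EMAIL))  # False: the XSD-like \w stops at "_"
# In Python [\w_] is just \w, which approximates XSD's [\w_], so FIXED runs as-is.
print(matches(FIXED, EMAIL))         # True
```

The same pattern text gives different answers depending on which definition of `\w` the engine uses, which is why testing it in a general-purpose regex tool can be misleading.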


Solution:1

The W3C Recommendation on datatypes defines \w as:

[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)

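The underscore is Unicode 'LOW LINE' (U+005F), general category Punctuation, Connector [Pc], so it falls under \p{P} and is excluded from \w.

So XML Schema defines character classes strictly by Unicode general category, unlike most other regex flavors (e.g. Perl, Java, .NET), where \w includes the underscore.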
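The category rule can be checked directly. This is a minimal Python sketch, assuming the XSD 1.0 definition quoted above (a character matches `\w` unless its Unicode general category starts with P, Z or C). `xsd_word_char` is a hypothetical helper, and Python's `unicodedata` tables may follow a newer Unicode version than a given validator.

```python
import re
import unicodedata

def xsd_word_char(ch: str) -> bool:
    r"""Hypothetical helper: does `ch` match \w under the XSD 1.0 definition?"""
    # \w = every character except punctuation (P*), separators (Z*) and other (C*).
    return unicodedata.category(ch)[0] not in "PZC"

for ch in ["a", "7", "_", "-", "+", " "]:
    python_word = re.fullmatch(r"\w", ch) is not None
    print(f"{ch!r}: category={unicodedata.category(ch)}, "
          f"XSD \\w={xsd_word_char(ch)}, Python \\w={python_word}")
```

The underscore (category Pc) is a word character for Python but not for XSD; conversely, "+" (category Sm) is a word character for XSD but not for Python, so the two classes differ in both directions.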

But for an e-mail regexp you should use strict ASCII, such as [0-9A-Za-z_-] instead of \w (I bet an e-mail address with non-Latin characters is invalid anyway :) ). Better still, find a proven regexp, or check the RFC that defines the e-mail address format (RFC 5322).
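A minimal Python sketch of the ASCII-only suggestion, assuming the answer's class `[0-9A-Za-z_-]` is substituted for every `\w` in the question's pattern. It is a simplification for illustration, not a full validator for the standard address format; `ASCII_EMAIL` and `is_ascii_email` are hypothetical names.

```python
import re

# The answer's ASCII-only class, substituted for every \w in the question's pattern.
WORD = r"[0-9A-Za-z_-]"
ASCII_EMAIL = re.compile(
    rf"{WORD}+([-+.']{WORD}+)*@{WORD}+([-.]{WORD}+)*\.{WORD}+([-.]{WORD}+)*"
)

def is_ascii_email(value: str) -> bool:
    # fullmatch mirrors XSD's implicit anchoring of the whole value.
    return ASCII_EMAIL.fullmatch(value) is not None

print(is_ascii_email("Some_Name@hotmail.com"))  # True
print(is_ascii_email("josé@example.com"))       # False: "é" is outside the ASCII class
```

Because the class is spelled out explicitly, the pattern behaves the same in an XSD validator and in other regex engines, with no dependence on how each one defines `\w`.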


Solution:2

This is odd, because \w normally matches the underscore. Try adding _ wherever you expect an underscore, by changing those \w to [\w_].


Solution:3

It could very well be, because your regex won't recognize an email with an underscore.

Check out this topic: Using a regular expression to validate an email address

It's one I have bookmarked for how useful it is.


Solution:4

Yes. Your pattern does not match the underscore character. Just try adding it:

\w+([-+.'_]\w+)*...  
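Solution 4 shows only the start of its pattern (the rest is elided), so this Python sketch tests just that local-part fragment against the text before the @. It again assumes XSD's `\w` can be approximated by `[^\W_]`; `LOCAL_PART` and `local_part_ok` are hypothetical names. It shows that putting _ in the separator class accepts an underscore between word runs, but still rejects a doubled underscore or one at the start or end of the local part.

```python
import re

# Assumed approximation of XSD's \w: Python's Unicode \w minus the underscore.
XSD_W = r"[^\W_]"
# Only the local-part fragment shown in Solution 4, with _ added to the separators.
LOCAL_PART = re.compile(rf"{XSD_W}+([-+.'_]{XSD_W}+)*")

def local_part_ok(local: str) -> bool:
    # fullmatch mirrors XSD's implicit anchoring of the whole value.
    return LOCAL_PART.fullmatch(local) is not None

for local in ["Some_Name", "Some__Name", "_name", "name_"]:
    print(local, local_part_ok(local))
# Some_Name prints True; the other three print False
```

This differs from the asker's [\w_] approach, which treats _ as an ordinary word character and therefore also accepts those three edge cases.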


Solution:5

Something is in fact strange: the \w character class includes underscores, as Rubular shows, so your email should validate. Is it possible there's another problem, such as a stray space? A further problem is that no regular expression correctly accepts all email addresses and nothing else; this Stack Overflow question has a good answer on that. There may be a better way to validate email addresses than this schema regex.

