Tutorial: Regex that checks upper or lower case characters with or without accents


How can I make the following regular expression ignore whitespace, so that spaces are kept in the output?

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z]", "", $_REQUEST["bar"]);

Input: Ingeniería Eléctrica'*;<42

Current Output: IngenieríaEléctrica

Desired Output: Ingeniería Eléctrica

I tried adding /s \s\s* \s+ /\s+/ /s /t /r among others and they all failed.

Objective: A regex that will accept only strings with upper or lower case characters, with or without (Spanish) accents.

Thank you !


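I see no reason why adding \s would not work once you switch to preg_replace, where \s matches any whitespace character. Note the u modifier, which makes PCRE treat the pattern and the input as UTF-8 rather than as single bytes.

$foo = preg_replace("/[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]/u", "", $_REQUEST["bar"]);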


I believe this should work:

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z ]", "", $_REQUEST["bar"]);


ereg_replace uses POSIX Extended Regular Expressions, where character sets are written as POSIX bracket expressions.

Now the important thing to know is that inside bracket expressions, \ is not a meta-character and therefore \s won't work.

But you can use the POSIX character class [:space:] inside the POSIX bracket expression to achieve the same effect:

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z[:space:]]", "", $_REQUEST["bar"]);

As you can see, this differs from the (I think) better-known Perl syntax. Since the POSIX regular expression functions are deprecated as of PHP 5.3 (and were removed in PHP 7.0), you really should use the Perl-compatible preg_* functions instead.
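
For comparison, here is a minimal sketch of the same whitelist written for the Perl-compatible functions. It assumes the script file and the input are both UTF-8, and it uses a hard-coded $input as a stand-in for $_REQUEST["bar"]. It also shows that PCRE accepts the POSIX class [:space:] inside a bracket expression, so either spelling should give the same result:

```php
<?php
// Hypothetical input standing in for $_REQUEST["bar"]; assumed to be UTF-8.
$input = "Ingeniería Eléctrica'*;<42";

// Perl-style escape: inside a PCRE character class, \s means "any whitespace".
// The u modifier makes PCRE treat pattern and subject as UTF-8.
$withEscape = preg_replace('/[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]/u', '', $input);

// POSIX character classes are also accepted inside PCRE bracket expressions.
$withPosixClass = preg_replace('/[^áéíóúÁÉÍÓÚñÑa-zA-Z[:space:]]/u', '', $input);

var_dump($withEscape === $withPosixClass); // bool(true)
echo $withEscape, "\n";                    // Ingeniería Eléctrica
```

Single-quoted PHP strings are used here so that the backslash in \s reaches the regex engine unchanged.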


All the answers so far fail to point out that your method of matching the accented characters is a hack, and an incomplete one: for instance, no grave accents are matched.

The best way is to use the mbstring extension:

mb_regex_encoding("UTF-8"); // or whatever encoding you're using
var_dump(mb_ereg_replace("[^\\w\\s]|[0-9]", "", "Ingeniería Eléctrica'*;<42", "z"));


  string(22) "Ingeniería Eléctrica"
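
If the mbstring extension is not available, a similar effect can probably be had with PCRE Unicode properties. This is a different technique from the mb_ereg_replace call above, not a drop-in equivalent: \p{L} matches letters only, so underscores (which \w lets through) are removed as well. The function name keep_letters_and_spaces is hypothetical, and the sketch assumes PHP 7 or later and UTF-8 input:

```php
<?php
// Hypothetical helper (not from the answers above): keep only Unicode
// letters and whitespace, dropping digits, punctuation and underscores.
function keep_letters_and_spaces(string $text): string
{
    // \p{L} matches a letter in any script, accented or not.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace returns null on failure (e.g. invalid UTF-8), so fall
    // back to an empty string in that case.
    return preg_replace('/[^\p{L}\s]/u', '', $text) ?? '';
}

echo keep_letters_and_spaces("Ingeniería Eléctrica'*;<42"), "\n"; // Ingeniería Eléctrica
echo keep_letters_and_spaces("très_bien 2x"), "\n";               // trèsbien x
```

Because the class is defined by Unicode category rather than by an explicit list, grave accents, umlauts and other accented letters are kept without having to enumerate them.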
