Tutorial :Should I avoid regular expressions? [closed]



Question:

Someone I know has been telling me that RegEx should be avoided, as it is heavyweight or involves heavy processing. Is this true? This made a clap in my ears, ringing my eardrums up until now.

I don't know why he told me that. Could it have been from experience or merely 3rd-hand information (you know what I mean...)?

So, stated plainly, why should I avoid regular expressions?

I want information from the masters in the SO community to share their ideas with me. Thanks guys!


Solution:1

Don't avoid them. They're an excellent tool, and when used appropriately can save you a lot of time and effort. Moreover, a good implementation used carefully should not be particularly CPU-intensive.


Solution:2

If you can easily do the same thing with common string operations, then you should avoid using a regular expression.

In most situations regular expressions are used where the same operation would require a substantial amount of common string operations, then there is of course no point in avoiding regular expressions.


Solution:3

Overhyped? No. They're extremely powerful and flexible.

Overused? Absolutely. Particularly when it comes to parsing HTML (which frequently comes up here).

This is another of those "right tool for the job" scenarios. Some go too far and try to use it for everything.

You are right though in that you can do many things with substring and/or split. You will often reach a point with those where what you're doing will get so complicated that you have to change method or you just end up writing too much fragile code. Regexes are (relatively) easy to expand.

But hand written code will nearly always be faster. A good example of this is Putting char into a java string for each N characters. The regex solution is terser but has some issues that a hand written loop doesn't and is much slower.


Solution:4

You can substitute "regex" in your question with pretty much any technology and you'll find people who poorly understand the technology or too lazy to learn the technology making such claims.

There is nothing heavy-weight about regular expressions. The most common way that programmers get themselves into trouble using regular expressions is that they try to do too much with a single regular expression. If you use regular expressions for what they're intended (simple pattern matching), you'll be hard-pressed to write procedural code that's more efficient than the equivalent regular expression. Given decent proficiency with regular expressions, the regular expression takes much less time to write, is easier to read, and can be pasted into tools such as RegexBuddy for visualization.


Solution:5

As a basic parser or validator, use a regular expression unless the parsing or validation code you would otherwise write would be easier to read.

For complex parsers (i.e. recursive descent parsers) use regex only to validate lexical elements, not to find them.

The bottom line is, the best regex engines are well tuned for validation work, and in some cases may be more efficient than the code you yourself could write, and in others your code would perform better. Write your code using handwritten state machines or regex as you see fit, but change from regex to handwritten code if performance tests show you that regex is significantly inefficient.


Solution:6

"When you have a hammer, everything looks like a nail."

Regular expressions are a very useful tool; but I agree that they're not necessary for every single place they're used. One positive factor to them is that because they tend to be complex and very heavily used where they are, the algorithms to apply regular expressions tend to be rather well optimized. That said, the overhead involved in learning the regular expressions can be... high. Very high.

Are regular expressions the best tool to use for every applicable situation? Probably not, but on the other hand, if you work with string validation and search all the time, you probably use regular expressions a lot; and once you do, you already have the knowledge necessary to use the tool probably more efficiently and quickly than any other tool. But if you don't have that experience, learning it is effectively a drag on your productivity for that implementation. So I think it depends on the amount of time you're willing to put into learning a new paradigm, and the level of rush involved in your project. Overall, I think regular expressions are very worth learning, but at the same time, that learning process can, frankly, suck.


Solution:7

I think that if you learn programming in language that speaks regular expressions natively you'll gravitate toward them because they just solve so many problems. IE, you may never learn to use split because regexec() can solve a wider set of problems and once you get used to it, why look anywhere else?

On the other hand, I bet C and C++ programmers will for the most part look at other options first, since it's not built into the language.


Solution:8

You know, given the fact that I'm what many people call "young", I've heard too much criticism about RegEx. You know, "he had a problem and tried to use regex, now he has two problems".

Seriously, I don't get it. It is a tool like any other. If you need a simple website with some text, you don't need PHP/ASP.NET/STG44. Still, no discussion on whether any of those should be avoided. How odd.

In my experience, RegEx is probably the most useful tool I've ever encountered as a developer. It's the most useful tool when it comes to #1 security issue: parsing user input. I has saved me hours if not days of coding and creating potentially buggy (read: crappy) code.

With modern CPUs, I don't see what's the performance issue here. I'm quite willing to sacrifice some cycles for some quality and security. (It's not always the case, though, but I think those cases are rare.)

Still, RegEx is very powerful. With great power, comes great responsibility. It doesn't mean you'll use it whenever you can. Only where it's power is worth using.

As someone mentioned above, HTML parsing with RegEx is like a Russian roulette with a fully loaded gun. Don't overdo anything, RegEx included.


Solution:9

Overhyped? No

Under-Utilized Properly? Yes


Solution:10

You should also avoid floating-point numbers at all cost. That is when you're programming in an embedded-environment.

Seriously: if you're in normal software-development you should actually use regex if you need to do something that can't be achieved with simpler string-operations. I'd say that any normal programmer won't be able to implement something that's best done using regexps in a way that is faster than the correspondig regular expression. Once compiled, a regular expression works as a state-maschine that is optimized to near perfection.


Solution:11

If more people knew how to use a decent parser generator, there would be fewer people using regular expressions.


Solution:12

In my belief, they are overused by people quite a bit (I've had this discussion a number of times on SO).

But they are a very useful construct because they deliver a lot of expressive power in a very small piece of code.

You only have to look at an example such as a Western Australian car registration number. The RE would be

re.match("[1-9] [A-Z]{3} [0-9]{3}")  

whilst the code to check this would be substantially longer, in either a simple 9-if-statement or slightly better looping version.

I hardly ever use complex REs in my code because:

  • I know how the RE engines work and I can use domain knowledge to code up faster solutions (that 9-if variant would almost certainly be faster than a one-shot RE compile/execute cycle); and
  • I find code more readable if it's broken up logically and commented. This isn't easy with most REs (although I have seen one that allows inline comments).

I have seen people suggest the use of REs for extracting a fixed-size substring at a fixed location. Why these people don't just use substring() is beyond me. My personal thought is that they're just trying to show how clever they are (but it rarely works).


Solution:13

Don't avoid it, but ask youself if they're the best tool for the task you have to solve. Maybe sometimes regex are difficult to use or debug, but they're really usefull in some situations. The question is to use the apropiate tool for each task, and usually this is not obvious.


Solution:14

There is a very good reason to use regular expressions in scripting languages (such as Ruby, Python, Perl, JavaScript and Lua): parsing a string with carefully optimized regular expression executes faster than the equivalent custom while loop which scans the string character-by-character. For compiled languages (such as C and C++, and also C# and Java most of the time) usually the opposite is true: the custom while loop executes faster.

One more reason why regular expressions are so popular: they express the programmer's intention in an extremely compact way: a single-line regexp can do as much as a 10- or 20-line while loop.


Solution:15

Overhyped? No, if you have ever taken a Parsing or Compiler course, you would understand that this is like saying addition and multiplication is overhyped for math problems.

It is a system for solving parsing problems.

some problems are simpler and don't require regular expressions, some are harder and require better tools.


Solution:16

I've seen so many people argue about whether a given regex is correct or not that I'm starting to think the best way to write one is to ask how to do it on StackOverflow and then let the regex gurus fight it out.


I think they're especially useful in JavaScript. JavaScript is transmitted (so should be small) and interpreted from text (although this is changing in the new browsers with V8 and JIT compilation), so a good internal regex engine has a chance to be faster than an algorithm.

I'd say if there is a clear and easy way to do it with string operations, use the string operations. But if you can do a nice regex instead of writing your own state machine interpreter, use the regex.


Solution:17

Regular Expressions are one of the most useful things programmers can learn, they allow to speed up and minimize your code if you know how to handle them.


Solution:18

Regular expressions are often easier to understand than the non-regex equivalent, especially in a language with native regular expressions, especially in a code section where other things that need to be done with regexes are present.

That doesn't meant they're not overused. The only time string.match(/\?/) is better than string.contains('?') is if it's significantly more readable with the surrounding code, or if you know that .contains is implemented with regexes anyway


Solution:19

I often use regex in my IDE to quick fix code. Try to do the following without regex.

glVector3f( -1.0f, 1.0f, 1.0f ); -> glVector3f( center.x - 1.0f, center.y + 1.0f, center.z + 1.0f );

Without regex, it's a pain, but WITH regex...

s/glVector3f\((.*?),(.*?),(.*?)\)/glVector3f(point.x+$1,point.y+$2,point.z+$3)/g  

Awesome.


Solution:20

I'd agree that regular expressions are sometimes used inappropriately. Certainly for very simple cases like what you're describing, but also for cases where a more powerful parser is needed.

One consideration is that sometimes you have a condition that needs to do something simple like test for presence of a question mark character. But it's often true that the condition becomes more complex. For example, to find a question mark character that isn't preceded by a space or beginning-of-line, and isn't followed by an alphanumeric character. Or the character may be either a question mark or the Spanish "¿" (which may appear at the start of a word). You get the idea.

If conditions are expected to evolve into something that's less simple to do with a plain call to String.contains("?"), then it could be easier to code it using a very simple regular expression from the start.


Solution:21

It comes down to the right tool for the job.

I usually hear two arguments against regular expressions: 1) They're computationally inefficient, and 2) They're hard to understand.

Honestly, I can't understand how either are legitimate claims.

1) This may be true in an academic sense. A complex expression can double back on itself may times over. Does it really matter though? How many millions of computations a second can a server processor do these days? I've dealt with some crazy expressions, and I've never seen a regexp be the bottle neck. By far it's DB interaction, followed by bandwidth.

2) Hard for about a week. The most complicated regexp is no more complex than HTML - it's just a familiarity problem. If you needed HTML once every 3 months, would you get it 100% each time? Work with them on a daily basis and they're just as clear as any other language syntax.

I write validation software. REGEXP's are second nature. Every fifth line of code has a regexp, and for the life of me I can't understand why people make a big deal about them. I've never seen a regexp slow down processing, and I've seen even the most dull 'programmers' pick up the syntax.

Regexp's are powerful, efficient, and useful. Why avoid them?


Solution:22

I wouldn't say avoid them entirely, as they are QUITE handy at times. However, it is important to realize the fundamental mechanisms underneath. Depending on your implementation, you could have up to exponential run-time for a search, but since searches are usually bounded by some constant number of backtraces, you can end up with the slowest linear run-time you ever saw.

If you want the best answer, you'll have to examine your particular implementation as well as the data you intend to search on.

From memory, wikipedia has a decent article on regular expressions and the underlying algorithms.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »