Tutorial: regex to match EOF



Question:

I have some data that look like this

john, dave, chris
rick, sam, bob
joe, milt, paul

I'm using this regex to match the names

/(\w.+?)(\r\n|\n|,)/  

This works for the most part, but the file ends abruptly after the last word, so the last value doesn't end in \r\n, \n or a comma; it ends with EOF. Is there a way to match EOF in a regex so I can put it right in that second group?


Solution:1

The answer to this question is \Z. It took me a while to figure out, but it works now. Note that, conversely, \A matches the beginning of the whole string (as opposed to ^ and $, which match the beginning and end of a single line in multi-line mode).


Solution:2

EOF is not actually a character. If you have a multi-line string, then '$' will match the end of the string as well as the end of a line.

In Perl and its brethren, \A and \Z match the beginning and end of the string, totally ignoring line-breaks.

GNU extensions to POSIX regexes use \` and \' for the same things.
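
To make the difference between line anchors and string anchors concrete, here is a small Ruby sketch. Ruby is used only because its \A and \Z behave like Perl's. The sample text, the constant and the method names are made up for illustration. Note that in Ruby ^ and $ always anchor at line boundaries, with no modifier needed.

```ruby
# Illustrative input: two lines, no trailing newline.
ANCHOR_TEXT = "john, dave\nrick, sam"

# ^ anchors at the start of every line.
def line_initial_words(text)
  text.scan(/^\w+/)
end

# \A anchors only at the start of the whole string.
def string_initial_words(text)
  text.scan(/\A\w+/)
end

# $ anchors at the end of every line.
def line_final_words(text)
  text.scan(/\w+$/)
end

# \Z anchors only at the end of the string (or just before a final newline).
def string_final_words(text)
  text.scan(/\w+\Z/)
end

p line_initial_words(ANCHOR_TEXT)    # => ["john", "rick"]
p string_initial_words(ANCHOR_TEXT)  # => ["john"]
p line_final_words(ANCHOR_TEXT)      # => ["dave", "sam"]
p string_final_words(ANCHOR_TEXT)    # => ["sam"]
```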


Solution:3

In Visual Studio, you can find EOF like so: $(?![\r\n]). This works whether your line endings are CR, CRLF, or just LF.

As a bonus, you can ensure all your code files have a final newline marker like so:

  Find What: (?<![\r\n])$(?![\r\n])
  Replace With: \r\n
  Use Regular Expressions: checked
  Look at these file types: *.cs, *.cshtml, *.js

How this works:

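Find any line end (a zero-width match) that is not preceded by CR or LF, and is also not followed by CR or LF. Every line end inside the file is followed by a line break, and the end of a file that already has a final newline is preceded by one, so the only position left is the end of a file whose last line has no terminator.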

Note that you should Replace With your desired line-ending character, be it CR, LF, or CRLF.
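
Outside Visual Studio, the same idea can be tried on a string. The following is only a sketch in Ruby, assuming Ruby's $ (which anchors before each \n and at the end of the string) is close enough to the editor's behavior. The constant and method names are hypothetical.

```ruby
# Pattern from the answer: a line end that is neither preceded
# nor followed by CR or LF.
FINAL_NEWLINE_MISSING = /(?<![\r\n])$(?![\r\n])/

# Returns text with eol appended when it does not already end in CR or LF.
# The block form of sub inserts eol literally.
def ensure_final_newline(text, eol = "\n")
  text.sub(FINAL_NEWLINE_MISSING) { eol }
end

p ensure_final_newline("a\nb")                  # => "a\nb\n"
p ensure_final_newline("a\nb\n")                # => "a\nb\n"
p ensure_final_newline("a\r\nb", "\r\n")        # => "a\r\nb\r\n"
```

One caveat of this sketch: an empty string also matches, so an empty file would gain a line ending.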


Solution:4

Contrast the behavior of Ryan's suggested \Z with \z:

  $ perl -we 'my $corpus = "hello\n"; $corpus =~ s/\Z/world/g; print(":$corpus:\n")'
  :helloworld
  world:
  $ perl -we 'my $corpus = "hello\n"; $corpus =~ s/\z/world/g; print(":$corpus:\n")'
  :hello
  world:
  $

perlre sez:

  \Z  Match only at end of string, or before newline at the end
  \z  Match only at end of string

A translation of the test case into Ruby (1.8.7, 1.9.2) behaves the same.
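
For reference, one possible Ruby translation of the test case might look like this. It is a sketch, not necessarily the exact script used above, and the method names are invented for illustration.

```ruby
# \Z matches before a final newline and again at the very end,
# so a global substitution inserts the text twice.
def append_at_upper_z(corpus)
  corpus.gsub(/\Z/, "world")
end

# \z matches only at the very end of the string.
def append_at_lower_z(corpus)
  corpus.gsub(/\z/, "world")
end

p append_at_upper_z("hello\n")  # => "helloworld\nworld"
p append_at_lower_z("hello\n")  # => "hello\nworld"
```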


Solution:5

Do you really have to capture the line separators? If not, this regex should be all you need:

/\w+/  

That's assuming all the substrings you want to match consist entirely of word characters, like in your example.
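
As a quick check of that idea, here is a Ruby sketch run against the sample data from the question. The constant and method names are illustrative only.

```ruby
# Sample data from the question; the last line has no trailing newline.
WORD_SAMPLE = "john, dave, chris\nrick, sam, bob\njoe, milt, paul"

# Collect every run of word characters, ignoring separators entirely.
def words(text)
  text.scan(/\w+/)
end

p words(WORD_SAMPLE)
# => ["john", "dave", "chris", "rick", "sam", "bob", "joe", "milt", "paul"]
```

Because the separators are never part of the match, the missing line break after the last name does not matter.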


Solution:6

Maybe try $ (EOL/EOF) instead of (\r\n|\n)?

/\"(.+?)\".+?(\w.+?)$/  


Solution:7

Assuming you are using a modifier that makes the regex treat the input as one whole string rather than line by line (and if \n matches for you, you are), just add one more alternative for the end of the string: (\r\n|\n|,|$)
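
The effect of the extra alternative can be seen in a short Ruby sketch using the question's data. The constant and method names are hypothetical, and in Ruby $ needs no modifier because it always matches at the end of a line or of the string.

```ruby
# Sample data from the question; the last line has no trailing newline.
NAMES_TEXT = "john, dave, chris\nrick, sam, bob\njoe, milt, paul"

# The question's original regex: every name must be followed by a separator.
WITHOUT_END = /(\w.+?)(\r\n|\n|,)/
# The same regex with $ added as a fourth alternative.
WITH_END = /(\w.+?)(\r\n|\n|,|$)/

# scan returns [name, separator] pairs; keep only the name.
def names(text, regex)
  text.scan(regex).map(&:first)
end

p names(NAMES_TEXT, WITHOUT_END).last  # => "milt"
p names(NAMES_TEXT, WITH_END).last     # => "paul"
```

Without the $ alternative the final name is silently dropped, which is the symptom described in the question.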


Solution:8

Recently I was looking for something like this, but for JavaScript.

I'm putting it here so that anyone with the same issue can benefit.

var matchEndOfInput = /$(?![\r\n])/gm;  

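Basically, this matches an end of line that is not followed by a carriage return or newline character, which leaves only the very end of the input. In essence this is the JavaScript counterpart of \z (it is stricter than \Z, which would also match just before a final newline).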


Solution:9

/(\w.+?)(\r\n|\n|,|$)/

