Tutorial: Split/tokenize/scan a string being aware of quotation marks


Is there a default or easy way in Java to split strings while taking quotation marks or other symbols into account?

For example, given this text:

    There's "a man" that live next door 'in my neighborhood', "and he gets me down..."

I would like to get these tokens:

    There's
    a man
    that
    live
    next
    door
    in my neighborhood
    and he gets me down


Something like this works for your input:

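    import java.util.Scanner;
    import java.util.regex.Pattern;

    public class QuoteAwareScan {
        public static void main(String[] args) {
            String text = "There's \"a man\" that live next door "
                + "'in my neighborhood', \"and he gets me down...\"";
            Scanner sc = new Scanner(text);
            Pattern pattern = Pattern.compile(
                "\"[^\"]*\"" +     // double quoted token
                "|'[^']*'" +       // single quoted token
                "|[A-Za-z']+"      // everything else (words, apostrophes allowed)
            );
            String token;
            // findInLine returns null once no further match exists on this line
            while ((token = sc.findInLine(pattern)) != null) {
                System.out.println("[" + token + "]");
            }
        }
    }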

The above prints (as seen on ideone.com):

    [There's]
    ["a man"]
    [that]
    [live]
    [next]
    [door]
    ['in my neighborhood']
    ["and he gets me down..."]

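It uses Scanner.findInLine, which finds the next match of the pattern and skips any input that doesn't match (the spaces and the comma). The regex pattern is one of: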

    "[^"]*"      # double quoted token
    '[^']*'      # single quoted token
    [A-Za-z']+   # everything else
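The question's desired tokens have the surrounding quotes removed, while the loop above keeps them. A minimal sketch, assuming java.util.regex.Matcher is used directly instead of Scanner, puts each alternative in its own capturing group and keeps only the group that matched. The class name UnquotedTokens and the tokenize method are illustrative, not from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnquotedTokens {
    // Same three alternatives as above, but each quoted form captures
    // only the text between its quotes.
    private static final Pattern TOKEN = Pattern.compile(
        "\"([^\"]*)\"" +   // group 1: contents of a double-quoted token
        "|'([^']*)'" +     // group 2: contents of a single-quoted token
        "|([A-Za-z']+)"    // group 3: a bare word (may contain apostrophes)
    );

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Exactly one group participates in each match; keep that one.
            if (m.group(1) != null) {
                tokens.add(m.group(1));
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
            + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : tokenize(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Note that text inside the quotes is kept as-is, so the last token is still "and he gets me down..." with its trailing dots.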

This certainly won't work in every case; nested quotes, for example, will be tricky.
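Another limitation of the findInLine loop: findInLine only searches up to the next line separator, so on multi-line input the loop stops at the end of the first line. A sketch of one workaround, assuming the input may contain newlines, is Scanner.findWithinHorizon with a horizon of 0, which searches the rest of the input without bound. MultiLineTokens and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class MultiLineTokens {
    // The same alternation as the original answer.
    static final Pattern TOKEN = Pattern.compile(
        "\"[^\"]*\"" +     // double quoted token
        "|'[^']*'" +       // single quoted token
        "|[A-Za-z']+"      // everything else
    );

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            String token;
            // A horizon of 0 means "no limit", so the search is not
            // cut off at the first line separator.
            while ((token = sc.findWithinHorizon(TOKEN, 0)) != null) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door\n"
            + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : tokenize(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With the same two-line input, a findInLine loop would stop after [door].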



Doubtful. Based on your logic, you need to distinguish between an apostrophe and a single quote, e.g. There's versus 'in my neighborhood'.

You'd have to develop some kind of pairing logic to get the result you describe above. I'm thinking of regular expressions, or some kind of two-part parse.
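The pairing logic can also be sketched as a small hand-written scanner instead of a regex. It rests on a heuristic that is our assumption, not a general rule: a ' or " opens a quoted token only at the start of a token, and a quote closes it only when it is not directly followed by a letter. An apostrophe inside a word such as There's is therefore kept as part of the word. PairingTokenizer and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class PairingTokenizer {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                int close = findClosingQuote(text, i + 1, c);
                if (close >= 0) {
                    tokens.add(text.substring(i + 1, close)); // contents without quotes
                    i = close + 1;
                    continue;
                }
                // No matching quote: fall through and treat it as ordinary text.
            }
            if (Character.isLetter(c) || c == '\'') {
                // A bare word; apostrophes inside it (There's) are kept.
                int start = i;
                while (i < n && (Character.isLetter(text.charAt(i)) || text.charAt(i) == '\'')) {
                    i++;
                }
                tokens.add(text.substring(start, i));
                continue;
            }
            i++; // skip whitespace, commas and other separators
        }
        return tokens;
    }

    // Returns the index of the quote that closes a token opened just before
    // 'from', or -1 if there is none. A quote followed directly by a letter
    // (as in "it's") is taken to be an apostrophe, not a closing quote.
    private static int findClosingQuote(String text, int from, char quote) {
        for (int j = from; j < text.length(); j++) {
            if (text.charAt(j) == quote
                    && (j + 1 == text.length() || !Character.isLetter(text.charAt(j + 1)))) {
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
            + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : tokenize(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Unlike the regex's '[^']*' alternative, which would stop at the apostrophe in 'it's fine', this sketch returns it's fine as a single token. It is still only a heuristic; a quoted phrase that ends in a word like 'rock'n'roll' can fool it.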
