Tutorial: Get an attribute value from HTML code in Java



Question:

I have an HTML string and I want to get the value of one attribute (id) from it. Can you help me with how to do that?

String msHTMLFile = "<ABBR class='HighlightClass' id='highlight40001' style=\"BACKGROUND-COLOR: yellow\" >Fetal/Neonatal Morbidity and Mortality</ABBR>";  

The result should be: highlight40001


Solution:1

Try using this regular expression pattern:

\bid='([^']*)'  

Then extract the string captured by group 1. This is not foolproof; using regex to parse HTML never is. You can make the regex more complex so that it handles more cases (double-quoted values, for example), or you can just use an HTML parser. I recommend the latter.
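As a minimal sketch of how the pattern above could be applied in Java with java.util.regex: the class name IdAttributeRegex and the method name extractId are made up for illustration and are not part of the original answer. It only matches a single-quoted id, exactly as the pattern is written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class IdAttributeRegex {
    // Same pattern as in the answer; the backslash is doubled for the Java string literal.
    // Group 1 captures everything between the single quotes.
    private static final Pattern ID_PATTERN = Pattern.compile("\\bid='([^']*)'");

    // Returns the captured id value, or null if no single-quoted id attribute is found.
    static String extractId(String html) {
        Matcher matcher = ID_PATTERN.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String msHTMLFile = "<ABBR class='HighlightClass' id='highlight40001' style=\"BACKGROUND-COLOR: yellow\" >Fetal/Neonatal Morbidity and Mortality</ABBR>";
        System.out.println(extractId(msHTMLFile)); // prints highlight40001
    }
}
```

Note that \b also matches after a hyphen, so an attribute such as data-id='x' would be picked up too. That is one of the reasons a parser is the safer choice.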


Solution:2

This is not very clean either, but it should work for you. You can treat the string as XML and parse it using JAXB:

ABBR.java:

import javax.xml.bind.annotation.XmlAttribute;

public class ABBR
{
    @XmlAttribute public String id;
}

Main.java:

[..]
String msHTMLFile = "<ABBR class='HighlightClass' id='highlight40001' style=\"BACKGROUND-COLOR: yellow\" >Fetal/Neonatal Morbidity and Mortality</ABBR>";
ABBR obj = JAXB.unmarshal(new StringReader(msHTMLFile), ABBR.class);
System.out.println(obj.id);
[..]


Solution:3

If you're lucky and your HTML source produces XML-compliant HTML, JAXB or other XML parsers will do fine with it. A lot of people aren't writing particularly well-formed HTML (unclosed tags, etc.), though. For that kind of input, some of my coworkers have gotten good results parsing HTML with HotSAX: http://sourceforge.net/projects/hotsax/
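To illustrate the "other XML parsers" route, here is a rough sketch using the DOM parser that ships with the JDK (javax.xml.parsers). It is not HotSAX, and it assumes the input is well-formed XML, which the sample string from the question happens to be. The class name IdAttributeDom and the method extractId are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
class IdAttributeDom {
    // Parses the string as XML and returns the root element's id attribute.
    // Returns "" if the attribute is missing; throws IllegalArgumentException
    // if the input is not well-formed XML (unclosed tags, etc.).
    static String extractId(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("id");
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String msHTMLFile = "<ABBR class='HighlightClass' id='highlight40001' style=\"BACKGROUND-COLOR: yellow\" >Fetal/Neonatal Morbidity and Mortality</ABBR>";
        System.out.println(extractId(msHTMLFile)); // prints highlight40001
    }
}
```

Unlike the regex, this does not care whether the attribute uses single or double quotes, but it will reject typical real-world HTML that is not valid XML.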

