Tutorial: Programmatic way to place a website into a new Word file… in Java


Is it possible to programmatically place the contents of a web page into a Word file?

To further complicate this, I'd like to do these steps in Java (using JNI if I must).

Here are the steps I want to do programmatically, followed by ways that I would do this manually today:

  1. Provide a method with a URL (Manually: Open page in Firefox)
  2. Copy the contents of that URL (Manually: Ctrl-A to select all)
  3. Create a new Word document (Manually: Open Microsoft Word)
  4. Paste the contents of the URL into Word (Manually: Ctrl-V to paste)
  5. Save the Word file (Manually: Save the Word file)
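Steps 1, 2 and 5 can be sketched with nothing but the JDK. This is only a rough sketch under a few assumptions: the class and method names (`PageSaver`, `fetch`, `savePage`) are made up for illustration, the page is assumed to be UTF-8, and the result is saved as a plain HTML file rather than a real Word document. Word can generally open an HTML file directly, but producing a true .doc/.docx still needs a library such as Apache POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class PageSaver {

    // Steps 1-2: read everything the URL serves into a String (assumes UTF-8).
    static String fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Step 5: write the fetched HTML to the target file, replacing any old content.
    static void savePage(URL url, Path target) throws IOException {
        String html = fetch(url);
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }
}
```

A call might look like `PageSaver.savePage(new URL("http://example.com/"), Paths.get("page.html"))`, where both the URL and the file name are placeholders.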


IMHO you would do better to download the page over HTTP, create a new Word file using Apache POI, and copy the content of the HTTP response into that file.
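One caveat with this approach: the HTTP response is HTML, so copying it verbatim would put the markup itself into the document. A possible way around that, sketched here with the HTML parser that ships with the JDK (Swing), is to pull out just the text first. The class name `HtmlText` and the method `extract` are hypothetical, and this is one option among several, not what the answer above necessarily had in mind.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of any library.
final class HtmlText {

    // Returns the text chunks of the page in document order, without the tags.
    static List<String> extract(String html) throws IOException {
        final List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text between tags.
                chunks.add(new String(data));
            }
        };
        // true = ignore any charset declared in the page; the input is already a String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return chunks;
    }
}
```

Each returned chunk could then be written as a paragraph or run into the Word file with whatever library creates the document.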


HtmlUnit can be used to programmatically open the page (posing as Firefox if necessary), and Apache POI can be used to create a Microsoft Word file (in Word 97 format).


This article describes a way to manipulate MS-Word doc files from within Java, just using string replace, or XSLT.

As for grabbing the content of a URL, that is the simpler part of the task, and a plain URLConnection from the JDK is enough:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;

    public class Util {

        // Returns the body served at urlString, or null if the request fails.
        public String httpGet(String urlString) {
            String resultData = null;
            try {
                URL url = new URL(urlString);
                URLConnection conn = url.openConnection();
                conn.connect();

                // Assumes the page is UTF-8; check the Content-Type header if it may not be.
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                    StringBuilder sb = new StringBuilder();
                    String line;
                    while ((line = br.readLine()) != null) {
                        // readLine() strips the line terminator, so put one back.
                        sb.append(line).append('\n');
                    }
                    resultData = sb.toString();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return resultData;
        }
    }

