Tutorial: Java: I've created a list of Word objects holding the word and its frequency, but I'm having trouble updating the frequency



Question:

I'm working on a project which has a dictionary of words and I'm extracting them and adding them to an ArrayList as word objects. I have a class called Word as below.

What I'm wondering is: how do I access these Word objects to update the frequency? As part of this project, each unique word must appear in the list only once, and its frequency must equal the number of times that word occurs in the dictionary.

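public class Word {
    private final String word;
    private int freq;

    Word(String word) {
        this.word = word;
        this.freq = 0;
    }

    public String getWord() {
        return word;
    }

    public int getFreq() {
        return freq;
    }

    // Despite the name, this increments the frequency by one.
    public void setFreq() {
        freq = freq + 1;
    }
}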

This is how I am adding the word objects to the ArrayList...I think it's ok?

String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    word.add(new Word(newWord));
    count++;
}


Solution:1

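Instead of an ArrayList, use a Bag (also called a multiset; Apache Commons Collections provides one, for example). A bag stores each distinct element once and keeps the counts for you.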
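
The JDK itself has no Bag type, so to illustrate what a bag does, here is a minimal sketch built on a HashMap. The class name SimpleBag and its methods add, getCount and uniqueWords are hypothetical names chosen for this example; a library Bag offers a richer API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal bag: maps each distinct word to its occurrence count.
class SimpleBag {
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one more occurrence of the given word.
    public void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Returns how often the word was added, or 0 if it was never added.
    public int getCount(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Returns the distinct words seen so far (in no particular order).
    public Set<String> uniqueWords() {
        return counts.keySet();
    }

    public static void main(String[] args) {
        SimpleBag bag = new SimpleBag();
        for (String w : "a brown cat eats a white cat".split("\\s+")) {
            bag.add(w);
        }
        for (String w : bag.uniqueWords()) {
            System.out.println(bag.getCount(w) + " " + w);
        }
    }
}
```

With this approach the separate Word class is not needed, because the map entry itself carries the frequency.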


Solution:2

Use a map from each word (a String) to its Word object. A HashSet might look sufficient, but a set offers no way to retrieve the stored Word in order to update it, and a HashSet is backed by a HashMap internally anyway. The following code adds each word to the map once and updates its frequency as occurrences are found.

// Create the map once, before reading any lines.
Map<String, Word> wordsMap = new HashMap<String, Word>();

String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    Word existingWord = wordsMap.get(newWord);
    if (existingWord == null) {
        // First occurrence: create the Word and register it.
        existingWord = new Word(newWord);
        wordsMap.put(newWord, existingWord);
    }
    // Count this occurrence (a new Word starts at frequency 0).
    existingWord.setFreq();
    count++;
}
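
On Java 8 or later, the look-up-or-create step can be folded into Map.computeIfAbsent. The sketch below is one possible self-contained variant, not the answer's original code: it reproduces the Word class from the question as a nested class, uses String.split instead of StringTokenizer, and the names WordCounter and countWords are hypothetical. It also shows how to get back the list of unique Word objects the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WordCounter {
    // Same shape as the Word class from the question.
    static class Word {
        private final String word;
        private int freq;

        Word(String word) { this.word = word; this.freq = 0; }
        public String getWord() { return word; }
        public int getFreq() { return freq; }
        public void setFreq() { freq = freq + 1; }
    }

    // Hypothetical helper: returns one Word per distinct word, in first-seen order.
    public static List<Word> countWords(List<String> lines) {
        Map<String, Word> wordsMap = new LinkedHashMap<>();
        for (String line : lines) {
            String cleaned = line.replaceAll("[^a-zA-Z\\s]", "").toLowerCase().trim();
            if (cleaned.isEmpty()) {
                continue; // nothing left after stripping non-letters
            }
            for (String token : cleaned.split("\\s+")) {
                // Create the Word on first sight, then count this occurrence.
                wordsMap.computeIfAbsent(token, Word::new).setFreq();
            }
        }
        return new ArrayList<>(wordsMap.values());
    }

    public static void main(String[] args) {
        List<Word> words = countWords(Arrays.asList("The cat sat.", "The dog sat!"));
        for (Word w : words) {
            System.out.println(w.getFreq() + " " + w.getWord());
        }
    }
}
```

A LinkedHashMap is used here only so that the resulting list keeps the order in which words were first seen; a plain HashMap works equally well if order does not matter.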


Solution:3

I would solve the problem with the following code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Word {

  private final String word;
  private int frequency;

  public Word(String word) {
    this.word = word;
    this.frequency = 0;
  }

  public String getWord() {
    return word;
  }

  public int getFrequency() {
    return frequency;
  }

  public void increaseFrequency() {
    frequency++;
  }

I didn't call this method setFrequency because it is not a real setter method. For a real setter method, you would pass it exactly one parameter.

  public static List<Word> histogram(String sentence) {  

First, compute the frequency of the individual words.

    String[] words = sentence.split("\\W+");
    Map<String, Word> histo = new HashMap<String, Word>();
    for (String word : words) {
      Word w = histo.get(word);
      if (w == null) {
        w = new Word(word);
        histo.put(word, w);
      }
      w.increaseFrequency();
    }

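Then, sort the words so that those with a higher frequency appear first. Words with the same frequency are ordered by String.compareTo, which compares character codes and is therefore only almost alphabetical (for example, all uppercase letters sort before lowercase ones).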

    List<Word> ordered = new ArrayList<Word>(histo.values());
    Collections.sort(ordered, new Comparator<Word>() {
      public int compare(Word a, Word b) {
        int fa = a.getFrequency();
        int fb = b.getFrequency();
        if (fa < fb)
          return 1;
        if (fa > fb)
          return -1;
        return a.getWord().compareTo(b.getWord());
      }
    });

    return ordered;
  }

Finally, test the code with a simple example.

  public static void main(String[] args) {
    List<Word> freq = histogram("a brown cat eats a white cat.");
    for (Word word : freq) {
      System.out.printf("%4d %s\n", word.getFrequency(), word.getWord());
    }
  }
}
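
On Java 8 or later, the anonymous comparator in histogram can also be written as a comparator chain. The following is a sketch of that alternative, not part of the original answer: the class name WordOrdering and the constant BY_FREQUENCY_THEN_WORD are hypothetical, and the nested Word class (with a two-argument constructor added purely for brevity) is a stand-in so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class WordOrdering {
    // Stand-in for the Word class above; the frequency is passed in
    // directly here only to keep the example short.
    static class Word {
        private final String word;
        private final int frequency;

        Word(String word, int frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        public String getWord() { return word; }
        public int getFrequency() { return frequency; }
    }

    // Higher frequency first; ties are broken by String.compareTo on the word.
    static final Comparator<Word> BY_FREQUENCY_THEN_WORD =
        Comparator.comparingInt(Word::getFrequency)
                  .reversed()
                  .thenComparing(Word::getWord);

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>(Arrays.asList(
            new Word("white", 1), new Word("cat", 2), new Word("a", 2),
            new Word("eats", 1), new Word("brown", 1)));
        words.sort(BY_FREQUENCY_THEN_WORD);
        for (Word w : words) {
            System.out.printf("%4d %s%n", w.getFrequency(), w.getWord());
        }
    }
}
```

The reversed() call applies only to the frequency comparison because it comes before thenComparing, so the tie-break on the word itself stays in ascending order.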


Solution:4

You can use a Multiset<String> from Google Collections (now part of Guava) instead of the Word class; its count method returns the number of occurrences of an element.

