Tutorial :Is encapsulating Strings as byte[] in order to save memory overkill ? (Java)


Was recently reviewing some Java Swing code and saw this:

byte[] fooReference;    String getFoo() {     returns new String(fooReference);   }    void setFoo(String foo) {    this.fooReference = foo.getBytes();  }  

The above can be useful to save on your memory foot print or so I'm told.

Is this overkill is anyone else encapsulating their Strings in this way?


That's a really, really bad idea. Don't use the platform default encoding. There's nothing to say that if you call setFoo and then getFoo that you'll get back the same data.

If you must do something like this, then use UTF-8 which can represent the whole of Unicode for certain... but I really wouldn't do it. It potentially saves some memory, but at the cost of performing conversions unnecessarily for most of the time - and being error-prone, in terms of failing to use an appropriate encoding.

I dare say there are some applications where this would be appropriate, but for 99.99% of them, it's a terrible idea.


This is not really useful:
1. You are copying the string every time getFoo or setFoo are called, therefore increasing both CPU and memory usage
2. It's obscure


A little historical excursion...

Using byte arrays instead of String objects actually used to have some considerable advantages in the early days of Java (1.0/1.1) if you could be sure that you would never need anything outside of ISO-8859-1. With the VMs of that time it was more than 10 times faster to use drawBytes() compared to drawString() and it actually does save memory which was still very scarce at that time and applets used to have a hard coded memory barrier of 32 and later 64 MB anyway. Not only is a byte[] smaller than the embedded char[] of String objects but you could also save the comparatively heavy String object itself which did make quite a difference if you had lots of short strings. Besides that accessing a plain byte array is also faster than using the accessor methods of String with all their extra bounds checks.

But since drawBytes ceased to be any faster in Java 1.2 and since current JITs are much better than the Symantec JIT of that time the remaining minimal performance advantage of byte[] arrays over strings is no longer worth the hassle. The memory advantage is still there and it might thus still be an option in some very rare extreme scenarios but nowadays it's nothing that should be considered if it's not really necessary.


It may well be overkill, and it may even consume more memory, since you now have two copies of the string. How long the actual string lives depends upon the client, but as with many such hacks, it smells a lot like premature optimization.


If you anticipate that you'll have a lot of identical strings, another much better way you can save memory is with the String.intern() method.


Each call to getFoo() is instantiating a new String. How is this saving memory? If anything you're adding additional overhead for your garbage collector to go and clean up these new instances when these new references become unreferenced


This does indeed not make any sense. If it were a compile time constant which you don't need to massage back to a String, then it would make a bit more sense. You still have the character encoding problem.

It would make more sense to me if it were a char[] constant. In real world there are several JSP compilers which optimizes String constants away into a char[] which in turn can easily be written to a Writer#write(char[]). This is finally "slightly" more efficient, but those little bits counts a lot in large and heavily used applications like Google Search and so on.

Tomcat's JSP compiler Jasper does this as well. Check the genStringAsCharArray setting. It does then like so

static final char[] text1 = "some static text".toCharArray();  

instead of

static final String text1 = "some static text";  

which ends up with less overhead. It doesn't need a whole String instance around those characters.


If, after profiling your code, you find that memory usage for strings is a problem, you're much better off using a general string compressor and storing compressed strings, rather than trying to use UTF-8 strings for the minor reduction in space they give you. With English language strings, you can generally compress them to 1-2 bits per character; most other languages are probably similar. Getting to <1 bit per character is hard, but possible if you have a lot of data.

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »