JSON encoding issue

Hi,

I have an issue with the org.json.me#json Java library and Java Strings containing Chinese characters. The following code builds a JSON String containing Chinese characters. Everything seems fine when I print the character array. However, if I store this JSON String in a file and then read it back, the result is not equal to the original String (the two character arrays differ). Can you help me understand why?

Best regards,

Jean Morin

Source code:

package jsontest;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.json.me.JSONException;
import org.json.me.JSONObject;

public class JSONTest {

	public static void main(String[] args) throws FileNotFoundException, IOException, JSONException {
		// create a simple JSON String using chinese characters
		String jsonString = createSimpleJSONString("message", "MyMessage_雷斯考");
		System.out.println("JSON characters (Original String):");
		printStringCharArray(jsonString);

		// write JSON message with chinese characters in a file
		File f = createNewFile("test.txt"); //$NON-NLS-1$
		writeInFile(f, jsonString.getBytes());

		// read JSON message previously written in the file
		byte[] readJsonStringBytes = readBytesFromFile(f);
		String readJsonString = new String(readJsonStringBytes);
		System.out.println();
		System.out.println("JSON characters (String read from file):");
		printStringCharArray(readJsonString);

		System.out.println();
		if (readJsonString.equals(jsonString)) {
			System.out.println("Strings are equals");
		} else {
			System.out.println("Strings are not equals");
		}
	}

	private static String createSimpleJSONString(String key, String value) throws JSONException {
		JSONObject jsonObj = new JSONObject();
		jsonObj.put(key, value);
		return jsonObj.toString();
	}

	private static void writeInFile(File f, byte[] data) throws FileNotFoundException, IOException {
		try (FileOutputStream fos = new FileOutputStream(f)) {
			fos.write(data);
		}
	}

	private static byte[] readBytesFromFile(File file) throws FileNotFoundException, IOException {
		byte[] buffer = new byte[32];
		ByteArrayOutputStream bos = new ByteArrayOutputStream();
		try (FileInputStream fis = new FileInputStream(file)) {
			int nbRead = -1;
			while ((nbRead = fis.read(buffer)) != -1) {
				bos.write(buffer, 0, nbRead);
			}
		}
		return bos.toByteArray();
	}

	private static File createNewFile(String path) throws IOException {
		File f = new File(path);
		if (f.exists()) {
			f.delete();
		}
		f.createNewFile();
		return f;
	}

	private static void printStringCharArray(String s) {
		for (char c : s.toCharArray()) {
			System.out.print("0x" + Integer.toHexString(c & 0xffff) + " ");
		}
		System.out.println();
	}

}

Console output:

JSON characters (Original String):
0x7b 0x22 0x6d 0x65 0x73 0x73 0x61 0x67 0x65 0x22 0x3a 0x22 0x4d 0x79 0x4d 0x65 0x73 0x73 0x61 0x67 0x65 0x5f 0x96f7 0x65af 0x8003 0x22 0x7d 

JSON characters (String read from file):
0x7b 0x22 0x6d 0x65 0x73 0x73 0x61 0x67 0x65 0x22 0x3a 0x22 0x4d 0x79 0x4d 0x65 0x73 0x73 0x61 0x67 0x65 0x5f 0xfff7 0xffaf 0x3 0x22 0x7d 

Strings are not equals

Hi Jean,

Comparing the two traces, we can see that the character 0x96f7 has been turned into 0xfff7.

This can be explained as follows:

First of all, when a JSONObject is filled with a value that is a Java String, the characters of the String are not escaped using the \uxxxx notation. Only special characters such as quotes, backslashes, and control characters (tab, line feed, and so on) are escaped. Thus, jsonObj.toString() returns a String with the same Chinese characters, as we can see in your output (Original String).

The MicroEJ default encoding is ISO-8859-1, as in J2ME, so calling jsonString.getBytes() without an argument is not correct when you know that jsonString contains Chinese characters, which ISO-8859-1 cannot represent. The high byte of the character 0x96f7 is lost and only the low byte (0xf7) is stored in the byte array.

Then, after reading the bytes from the file, calling new String(readJsonStringBytes) also uses the default ISO-8859-1 encoding to build the String. The byte 0xf7 is extended to the Java character 0xfff7. Note that there is a separate issue here: the byte should be extended to 0x00f7, without sign extension. However, this does not affect the answer to your question.
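To make the two steps above concrete, here is a minimal sketch that reproduces the values seen in your trace using plain casts. It is only a simulation of the behavior described, not the actual MicroEJ encoder or decoder; the class and method names (TruncationDemo, encodeLowByte, decodeSignExtended, decodeUnsigned) are hypothetical and chosen for illustration.

```java
public class TruncationDemo {

    // Simulates the lossy encoding step: only the low 8 bits of the char are kept.
    public static byte encodeLowByte(char c) {
        return (byte) c;
    }

    // Simulates the decoding step seen in the trace: the byte is sign-extended to 16 bits.
    public static char decodeSignExtended(byte b) {
        return (char) b;
    }

    // What an ISO-8859-1 decoder is expected to do: treat the byte as unsigned.
    public static char decodeUnsigned(byte b) {
        return (char) (b & 0xff);
    }

    public static void main(String[] args) {
        // The three Chinese characters of the original message.
        char[] chars = { '\u96f7', '\u65af', '\u8003' };
        for (char c : chars) {
            byte b = encodeLowByte(c);
            System.out.println("0x" + Integer.toHexString(c)
                    + " -> byte 0x" + Integer.toHexString(b & 0xff)
                    + " -> sign-extended 0x" + Integer.toHexString(decodeSignExtended(b))
                    + ", unsigned 0x" + Integer.toHexString(decodeUnsigned(b)));
        }
    }
}
```

The sign-extended values are 0xfff7, 0xffaf and 0x3, which match the second trace. The last character, 0x8003, has a low byte below 0x80, so sign extension does not change it.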

Finally, since your JSON String contains Chinese characters, you must specify an encoding that supports them, such as UTF-8, rather than relying on the default one. More generally, it is good practice to always specify the encoding explicitly when converting a String to bytes and bytes back to a String, in order to prevent portability issues.
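As an illustration of why UTF-8 is lossless here, the following sketch encodes the three Chinese characters on their own and prints the resulting bytes. It assumes a standard Java runtime with UTF-8 available; the class and helper names (Utf8Demo, utf8Bytes, fromUtf8, toHex) are hypothetical.

```java
import java.io.UnsupportedEncodingException;

public class Utf8Demo {

    // Encodes a String with an explicitly named charset instead of the platform default.
    public static byte[] utf8Bytes(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    // Decodes bytes with the same explicitly named charset.
    public static String fromUtf8(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    // Formats bytes as space-separated hexadecimal values.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u96f7\u65af\u8003";
        byte[] encoded = utf8Bytes(s);
        System.out.println(toHex(encoded));              // prints "e9 9b b7 e6 96 af e8 80 83"
        System.out.println(fromUtf8(encoded).equals(s)); // prints "true"
    }
}
```

Each of these characters takes three bytes in UTF-8, so both the high and low parts of the 16-bit value survive the round trip.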

I modified your code to specify UTF-8 on both the encoding and the decoding lines:

package jsontest;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.json.me.JSONException;
import org.json.me.JSONObject;

public class JSONTest {

	public static void main(String[] args) throws FileNotFoundException, IOException, JSONException {
		// create a simple JSON String using chinese characters
		String jsonString = createSimpleJSONString("message", "MyMessage_雷斯考");
		System.out.println("JSON characters (Original String):");
		printStringCharArray(jsonString);

		// write JSON message with chinese characters in a file
		File f = createNewFile("test.txt"); //$NON-NLS-1$
		writeInFile(f, jsonString.getBytes("UTF-8"));

		// read JSON message previously written in the file
		byte[] readJsonStringBytes = readBytesFromFile(f);
		String readJsonString = new String(readJsonStringBytes, "UTF-8");
		System.out.println();
		System.out.println("JSON characters (String read from file):");
		printStringCharArray(readJsonString);

		System.out.println();
		if (readJsonString.equals(jsonString)) {
			System.out.println("Strings are equals");
		} else {
			System.out.println("Strings are not equals");
		}
	}

	private static String createSimpleJSONString(String key, String value) throws JSONException {
		JSONObject jsonObj = new JSONObject();
		jsonObj.put(key, value);
		return jsonObj.toString();
	}

	private static void writeInFile(File f, byte[] data) throws FileNotFoundException, IOException {
		try (FileOutputStream fos = new FileOutputStream(f)) {
			fos.write(data);
		}
	}

	private static byte[] readBytesFromFile(File file) throws FileNotFoundException, IOException {
		byte[] buffer = new byte[32];
		ByteArrayOutputStream bos = new ByteArrayOutputStream();
		try (FileInputStream fis = new FileInputStream(file)) {
			int nbRead = -1;
			while ((nbRead = fis.read(buffer)) != -1) {
				bos.write(buffer, 0, nbRead);
			}
		}
		return bos.toByteArray();
	}

	private static File createNewFile(String path) throws IOException {
		File f = new File(path);
		if (f.exists()) {
			f.delete();
		}
		f.createNewFile();
		return f;
	}

	private static void printStringCharArray(String s) {
		for (char c : s.toCharArray()) {
			System.out.print("0x" + Integer.toHexString(c & 0xffff) + " ");
		}
		System.out.println();
	}

}

Before running the code, ensure that the option Libraries > EDC > Embed UTF-8 encoding is checked (cldc.encoding.utf8.included=true); otherwise an UnsupportedEncodingException will be thrown.
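If you want to detect a missing encoding at startup rather than in the middle of a file operation, a small check along these lines may help. This is only a sketch: the class and method names (EncodingCheck, isEncodingSupported) are hypothetical, and it relies solely on String.getBytes(String) throwing UnsupportedEncodingException for an unknown encoding name.

```java
import java.io.UnsupportedEncodingException;

public class EncodingCheck {

    // Returns true if the runtime can encode Strings with the named encoding.
    public static boolean isEncodingSupported(String encodingName) {
        try {
            "a".getBytes(encodingName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isEncodingSupported("UTF-8")) {
            System.out.println("UTF-8 is available");
        } else {
            System.out.println("UTF-8 is not embedded, check the EDC option");
        }
    }
}
```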

The modified code now produces the correct output:

JSON characters (Original String):
0x7b 0x22 0x6d 0x65 0x73 0x73 0x61 0x67 0x65 0x22 0x3a 0x22 0x4d 0x79 0x4d 0x65 0x73 0x73 0x61 0x67 0x65 0x5f 0x96f7 0x65af 0x8003 0x22 0x7d 

JSON characters (String read from file):
0x7b 0x22 0x6d 0x65 0x73 0x73 0x61 0x67 0x65 0x22 0x3a 0x22 0x4d 0x79 0x4d 0x65 0x73 0x73 0x61 0x67 0x65 0x5f 0x96f7 0x65af 0x8003 0x22 0x7d 

Strings are equals