2 Examples to Convert Byte[] Array to String in Java
Converting a byte array to a String seems easy, but what is hard is doing it correctly. Many programmers make the mistake of ignoring character encoding whenever bytes are converted into a String or char, or vice versa. As programmers, we all know that computers only understand binary data, i.e. 0 and 1. Everything we see and use, e.g. images, text files, movies, or any other multimedia, is stored in the form of bytes, but what is more important is the process of encoding or decoding bytes to characters. Data conversion is an important topic in any programming interview, and because of the trickiness of character encoding, this question is one of the most popular String interview questions in Java interviews. While reading a String from an input source, e.g. XML files, HTTP requests, network ports, or databases, you must pay attention to which character encoding (e.g. UTF-8, UTF-16, or ISO 8859-1) it is encoded in. If you do not use the same character encoding while converting bytes to a String, you will end up with a corrupt String which may contain totally wrong values. You might have seen '?' or square boxes after converting byte[] to String; those appear because your current character encoding does not support those values, and it simply shows garbage instead.
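To make the '?' and box symptom concrete, here is a minimal JDK-only sketch (the class name ReplacementCharDemo is just for illustration). It assumes the bytes were written as ISO-8859-1, where 'é' is the single byte 0xE9. Decoding that byte as UTF-8 is malformed input, so the String constructor silently substitutes U+FFFD, the replacement character usually drawn as '?' or a box, while a CharsetDecoder configured with CodingErrorAction.REPORT fails loudly instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {

    // Lenient: the String constructor silently replaces malformed bytes with U+FFFD
    static String lenientUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: a decoder set to REPORT throws instead of substituting
    static String strictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "Credit" with an e-acute, written as ISO-8859-1: the accent is the single byte 0xE9
        byte[] latin1 = "Cr\u00e9dit".getBytes(StandardCharsets.ISO_8859_1);

        String garbled = lenientUtf8(latin1);
        System.out.println(garbled.contains("\uFFFD")); // true
        try {
            strictUtf8(latin1);
        } catch (CharacterCodingException e) {
            System.out.println("Rejected: " + e);
        }
    }
}
```

The lenient behaviour is what you get from new String(bytes, charset); the strict decoder is worth considering when silently corrupted text is worse than an exception.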
I tried to understand why programmers make character encoding mistakes so often, and my little research and own experience suggest two reasons: first, not dealing enough with internationalization and character encodings, and second, the fact that ASCII characters are supported by almost all popular encoding schemes and have the same values in each. We mostly deal with encodings like UTF-8 and Cp1252 (also known as Windows-1252), which display ASCII characters (mostly letters and digits) correctly without fail, so a mismatch goes unnoticed even if you use a different encoding scheme. The real issue comes when your text contains special characters, e.g. 'é', which is often used in French names. If your platform's character encoding doesn't recognize that character, you will see either a different character or some garbage, and sadly, until you get your hands burned, you are unlikely to be careful with character encoding. In Java, things are a little more tricky because many IO classes, e.g. InputStreamReader, use the platform's character encoding by default (since JDK 18 that default is UTF-8, per JEP 400; earlier versions use the platform encoding). What this means is that if you run your program on a different machine, you will likely get different output because of the different character encoding used on that machine. In this article, we will learn how to convert byte[] to String in Java both by using the JDK API and with the help of Guava and Apache Commons.
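The ASCII point is easy to check with the JDK alone. This sketch (class name hypothetical) encodes the same text with UTF-8 and ISO-8859-1. ISO-8859-1 is used as a stand-in for Cp1252 because it is one of the standard charsets every Java platform must support, and both map 'é' to the byte 0xE9. The pure-ASCII word produces identical bytes in both encodings, which is why such bugs stay hidden, while 'é' takes two bytes in UTF-8 and one in ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiCompatibilityDemo {

    // True if both charsets encode the text to exactly the same bytes
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        String ascii = "Paris";
        String accented = "Cr\u00e9dit"; // "Credit" spelled with an e-acute (U+00E9)

        // ASCII text: identical bytes, so a wrong charset goes unnoticed
        System.out.println(sameBytes(ascii, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));    // true
        // Accented text: the encodings diverge
        System.out.println(sameBytes(accented, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)); // false
        System.out.println(accented.getBytes(StandardCharsets.UTF_8).length);      // 7
        System.out.println(accented.getBytes(StandardCharsets.ISO_8859_1).length); // 6
    }
}
```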
How to convert byte[] to String in Java
There are multiple ways to convert a byte array to a String in Java: you can either use methods from the JDK, or you can use open-source libraries like Apache Commons and Google Guava. These APIs provide at least two sets of methods to create a String from a byte array: one which uses the default platform encoding, and another which takes a character encoding. You should always use the latter and not rely on the platform encoding. I know it could be the same, or you might not have faced any problem so far, but it's better to be safe than sorry. As I pointed out in my last post about printing a byte array as a Hex String, it's also one of the best practices to specify the character encoding while converting bytes to characters in any programming language. It is also possible that your byte array contains non-printable ASCII characters. Let's first see the JDK's way of converting byte[] to String:
1) You can use the constructor of String, which takes a byte array and a character encoding:
```java
String str = new String(bytes, "UTF-8");
```
This is the right way to convert bytes to String, provided you know for sure that the bytes are encoded in the character encoding you are using.
2) If you are reading a byte array from a text file, e.g. an XML or HTML document, you can use the Apache Commons IO library to convert the FileInputStream to a String directly. This method also buffers the input internally, so there is no need to use another BufferedInputStream.
```java
String fromStream = IOUtils.toString(fileInputStream, "UTF-8");
```
In order to correctly convert those byte arrays into a String, you must first discover the correct character encoding by reading metadata, e.g. Content-Type, <?xml encoding="…"> etc., depending on the format/protocol of the data you are reading. This is one of the reasons I recommend using XML parsers, e.g. SAX or DOM parsers, to read XML files: they take care of character encoding by themselves.
Some programmers also recommend specifying the character encoding with a Charset rather than a String, e.g. StandardCharsets.UTF_8 instead of "UTF-8", mainly to avoid UnsupportedEncodingException in the worst case. There are six standard Charset implementations guaranteed to be supported by every implementation of the Java platform, and you can use them instead of spelling out the encoding name as a String. In short, always prefer StandardCharsets.UTF_8 over "UTF-8" (and StandardCharsets.ISO_8859_1 over "ISO-8859-1"), as shown below:
```java
String str = IOUtils.toString(fis, StandardCharsets.UTF_8);
```
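A minimal JDK-only sketch of the two String constructor overloads discussed above (class and method names are illustrative). The String-name overload declares the checked UnsupportedEncodingException, and a misspelled name is only detected at runtime; the Charset overload has neither problem.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class DecodeWithCharset {

    // Charset overload: no checked exception, and no charset name to misspell
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Name overload: the charset is looked up by name at runtime and
    // UnsupportedEncodingException is thrown if the name is unknown
    static String decodeByName(byte[] bytes, String charsetName)
            throws UnsupportedEncodingException {
        return new String(bytes, charsetName);
    }

    public static void main(String[] args) {
        byte[] utf8 = "Soci\u00e9t\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(utf8));
        try {
            System.out.println(decodeByName(utf8, "UTF-8"));
            decodeByName(utf8, "UFT-8"); // typo: only detected at runtime
        } catch (UnsupportedEncodingException e) {
            System.out.println("Unknown charset name: " + e.getMessage());
        }
    }
}
```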
Other standard charsets supported by the Java platform are:
- StandardCharsets.ISO_8859_1
- StandardCharsets.US_ASCII
- StandardCharsets.UTF_16
- StandardCharsets.UTF_16BE
- StandardCharsets.UTF_16LE
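To see why the choice among these constants matters, this sketch (class name hypothetical, Java 9+ for List.of) encodes the single character 'é' with each of the six standard charsets and prints the resulting bytes in hex. It relies on two documented JDK behaviours: UTF-16 writes a big-endian byte-order mark when encoding, and String.getBytes replaces a character the charset cannot represent (here 'é' in US-ASCII) with '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class StandardCharsetBytes {

    // Formats bytes as space-separated two-digit hex values, e.g. "C3 A9"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    static String encodeAsHex(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        List<Charset> charsets = List.of(
                StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
                StandardCharsets.UTF_8, StandardCharsets.UTF_16,
                StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE);
        for (Charset cs : charsets) {
            System.out.printf("%-10s %s%n", cs.name(), encodeAsHex(eAcute, cs));
        }
    }
}
```

The six lines show 3F, E9, C3 A9, FE FF 00 E9, 00 E9 and E9 00 respectively: the same character, six different byte sequences.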
If you are reading bytes from an input stream, you can also check my earlier post about 5 ways to convert InputStream to String in Java for details.
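If you would rather not add Commons IO for this, the JDK can do the same job. A minimal sketch, assuming Java 9 or later for InputStream.readAllBytes(); the caller still has to know the charset and close the stream. A ByteArrayInputStream stands in for a FileInputStream so the example runs without a file, and the class name is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamToString {

    // Reads the whole stream and decodes it with the given charset (Java 9+)
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<name>Soci\u00e9t\u00e9 G\u00e9n\u00e9rale</name>".getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes the stream; readAllBytes() does not
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(readAll(in, StandardCharsets.UTF_8));
        }
    }
}
```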
Original XML
Here is our sample XML snippet to demonstrate the issues with using the default character encoding. This file contains the letter 'é', which is not displayed correctly in Eclipse because its default character encoding is Cp1252.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<banks>
    <bank>
        <name>Industrial &amp; Commercial Bank of China</name>
        <headquarters>Beijing, China</headquarters>
    </bank>
    <bank>
        <name>Crédit Agricole SA</name>
        <headquarters>Montrouge, France</headquarters>
    </bank>
    <bank>
        <name>Société Générale</name>
        <headquarters>Paris, Île-de-France, France</headquarters>
    </bank>
</banks>
```
And this is what happens when you convert a byte array to a String without specifying the character encoding, e.g.:
```java
String str = new String(filedata);
```
This will use the platform's default character encoding, which is Cp1252 in this case because we are running this program in the Eclipse IDE. You can see that the letter 'é' is not displayed correctly:
```
<?xml version="1.0" encoding="UTF-8"?>
<banks>
    <bank>
        <name>Industrial &amp; Commercial Bank of China</name>
        <headquarters>Beijing, China</headquarters>
    </bank>
    <bank>
        <name>CrÃ©dit Agricole SA</name>
        <headquarters>Montrouge, France</headquarters>
    </bank>
    <bank>
        <name>SociÃ©tÃ© GÃ©nÃ©rale</name>
        <headquarters>Paris, ÃŽle-de-France, France</headquarters>
    </bank>
</banks>
```
To fix this, specify the character encoding while creating the String from the byte array, e.g.
```java
String str = new String(filedata, "UTF-8");
```
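The corruption shown above can be reproduced without Eclipse or a Cp1252 machine. This sketch (class name illustrative) decodes UTF-8 bytes with ISO-8859-1, used here as a portable stand-in for Cp1252 because the two agree on the bytes involved (0xC3 is 'Ã' and 0xA9 is '©' in both), and then decodes the same bytes with the correct charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The two-byte UTF-8 sequence C3 A9 encodes the e-acute
        byte[] filedata = "Cr\u00e9dit Agricole SA".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each byte of the two-byte sequence becomes its own character
        System.out.println(decode(filedata, StandardCharsets.ISO_8859_1));
        // Right charset: matches how the bytes were written
        System.out.println(decode(filedata, StandardCharsets.UTF_8));
    }
}
```

The first line prints CrÃ©dit Agricole SA, the same garbling as in the Eclipse output, and the second prints Crédit Agricole SA.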
By the way, let me make it clear that even though I have read XML files using an InputStream here, it's not a good practice; in fact, it's a bad practice. You should always use proper XML parsers for reading XML documents. If you don't know how, please check this tutorial. Since this example is mostly meant to show you why character encoding matters, I have chosen an example which was easily available and looks more practical.
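Following that advice, a parser can be handed the raw bytes directly and will read the encoding declaration itself. A minimal sketch using the JDK's built-in DOM API (javax.xml.parsers); the inline XML is a trimmed copy of the sample above, the class and method names are illustrative, and error handling is reduced to a throws clause.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class ParseXmlBytes {

    // Returns the text of every <name> element; the parser picks the charset
    // from the XML declaration (or a byte-order mark), not the platform default
    static String[] bankNames(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        NodeList names = doc.getElementsByTagName("name");
        String[] result = new String[names.getLength()];
        for (int i = 0; i < names.getLength(); i++) {
            result[i] = names.item(i).getTextContent();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<banks><bank><name>Industrial &amp; Commercial Bank of China</name></bank>"
                + "<bank><name>Cr\u00e9dit Agricole SA</name></bank>"
                + "<bank><name>Soci\u00e9t\u00e9 G\u00e9n\u00e9rale</name></bank></banks>";
        byte[] filedata = xml.getBytes(StandardCharsets.UTF_8);
        for (String name : bankNames(filedata)) {
            System.out.println(name);
        }
    }
}
```

As a bonus, the parser also resolves the &amp; entity, so the first name comes back as Industrial & Commercial Bank of China.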
Java Program to Convert Byte Array to String in Java
Here is our sample program to show why relying on the default character encoding is a bad idea and why you must use a character encoding while converting a byte array to String in Java. In this program, we use the Apache Commons IOUtils class to read a file directly into a byte array. Note that IOUtils.toByteArray() does not close the stream it reads from, so the program opens the file in a try-with-resources block to avoid leaking file descriptors. Now, how you create the String from that array is the key: if you provide the right character encoding, you will get the correct output; otherwise you will get output that looks nearly right but is wrong.
```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.IOUtils;

/**
 * Java Program to convert byte array to String. In this example, we first
 * read an XML file with character encoding "UTF-8" into a byte array and then
 * create a String from that. When you don't specify a character encoding, Java
 * uses the platform's default encoding, which may not be the right one if the
 * file is an XML document coming from another system, an email, or a plain text
 * file fetched from an HTTP server etc. You must first find the correct
 * character encoding and then use it while converting the byte array to String.
 *
 * @author Javin Paul
 */
public class ByteArrayToString {

    public static void main(String[] args) throws IOException {
        System.out.println("Platform Encoding : " + System.getProperty("file.encoding"));

        // IOUtils.toByteArray() does not close the stream, so try-with-resources does it
        try (FileInputStream fis = new FileInputStream("info.xml")) {
            // Using Apache Commons IOUtils to read the file into a byte array
            byte[] filedata = IOUtils.toByteArray(fis);

            // Decode with the file's actual encoding, not the platform default
            String str = new String(filedata, StandardCharsets.UTF_8);
            System.out.println(str);
        }
    }
}
```
Output :
```
Platform Encoding : Cp1252
<?xml version="1.0" encoding="UTF-8"?>
<banks>
    <bank>
        <name>Industrial &amp; Commercial Bank of China</name>
        <headquarters>Beijing, China</headquarters>
    </bank>
    <bank>
        <name>Crédit Agricole SA</name>
        <headquarters>Montrouge, France</headquarters>
    </bank>
    <bank>
        <name>Société Générale</name>
        <headquarters>Paris, Île-de-France, France</headquarters>
    </bank>
</banks>
```
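For comparison, here is a JDK-only sketch of the same program without Commons IO, assuming Java 7 or later for Files.readAllBytes and StandardCharsets. It keeps the article's file name info.xml; the helper method name is illustrative.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ByteArrayToStringJdk {

    // Files.readAllBytes opens and closes the file itself
    static String readFile(Path path, Charset charset) throws IOException {
        byte[] filedata = Files.readAllBytes(path);
        return new String(filedata, charset);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Default charset : " + Charset.defaultCharset());
        System.out.println(readFile(Paths.get("info.xml"), StandardCharsets.UTF_8));
    }
}
```

On Java 11 or later, Files.readString(path, StandardCharsets.UTF_8) collapses the two steps into one call.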
Things to Remember and Best Practices
Always remember: using a character encoding while converting a byte array to a String is not just a best practice but a mandatory thing, irrespective of programming language. You can also take note of the following things, which will help you avoid a couple of nasty issues:
- Use the character encoding from the source, e.g. Content-Type in HTML files, or <?xml encoding="…">.
- Use XML parsers to parse XML files instead of finding the character encoding and reading it via InputStream; some things are best left for demo code only.
- Prefer Charset constants e.g. StandardCharsets.UTF_16 instead of String "UTF-16"
- Never rely on platform's default encoding scheme
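The first tip, taking the encoding from the source, might look like this for an HTTP-style Content-Type value. This is a hypothetical helper, not a library API: it does simple string parsing of the charset parameter rather than full media-type parsing, and falls back to a caller-supplied default when the parameter is missing, empty, or names an unsupported charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

class ContentTypeCharset {

    // Hypothetical helper: returns the charset named by the "charset" parameter
    // of a value such as "text/html; charset=ISO-8859-1", or the fallback
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (!param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            String name = param.substring("charset=".length()).trim();
            // Strip optional quotes, as in charset="UTF-8"
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            if (name.isEmpty()) {
                return fallback;
            }
            try {
                return Charset.forName(name);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] body = "Cr\u00e9dit".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(cs.name() + " -> " + new String(body, cs));
    }
}
```

Falling back to an explicit default, rather than Charset.defaultCharset(), keeps the result independent of the machine the code runs on.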
These rules should also be applied when you convert character data to bytes, e.g. converting a String to a byte array using the String.getBytes() method. The no-argument version uses the platform's default character encoding; instead, you should use the overloaded version which takes a character encoding.
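The same rule in the String-to-bytes direction, as a short JDK-only sketch (class name illustrative): encode and decode with the same explicit charset and the text survives the round trip; encode with a charset that cannot represent 'é' (US-ASCII here) and getBytes silently substitutes '?', so the character is lost for good.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTrip {

    // Encodes, then decodes, with the same explicit charset
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // explicit charset, not getBytes()
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String name = "Soci\u00e9t\u00e9 G\u00e9n\u00e9rale";
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // true
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII));           // Soci?t? G?n?rale
    }
}
```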
That's all on how to convert a byte array to String in Java. As you can see, the Java API, especially the java.lang.String class, provides methods and constructors that take a byte[] and return a String (or vice versa), but by default they rely on the platform's character encoding, which may not be correct if the byte array is created from XML files, HTTP request data, or network protocols. You should always get the right encoding from the source itself. If you'd like to read more about what every programmer should know about String, you can check out this article.
Further Learning
Data Structures and Algorithms: Deep Dive Using Java
Algorithms and Data Structures - Part 1 and 2
Data Structures in Java 9 by Heinz Kabutz
![Character Encoding, Converting Byte array to String in Java](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwlA5qIpOmo-6ovbYe7wJrQA0mXgQ3OYqy3AafHxV8WCEGt9DwfUjtcr5kJLZ3wx4N_4k1iWp93KqymaZxxsbtKk2ceZaEfweehzvCgx5J5kuKqvTYIeFMPzSPDBKWp2n20w37hZwa7nuy/s1600/Character+Encoding,+Converting+Byte+array+to+String+in+Java.png)