Sorting the characters in a utf-16 string in java



tl;dr

Java needs two chars (a surrogate pair) to represent a supplementary character in UTF-16. Sorting the char[] with Arrays.sort reorders the individual chars and breaks those pairs apart. Should I convert the char[] to an int[] of code points, or is there a better way?

Details

Java strings are UTF-16, but the Character class wraps a single 16-bit char. A character outside the Basic Multilingual Plane therefore takes an array of two chars (32 bits).

Sorting a String that contains such characters with the built-in sort corrupts the data. (Arrays.sort uses dual-pivot quicksort on primitive arrays, and Collections.sort delegates to Arrays.sort for the heavy lifting.)

To be specific: do you convert the char[] to an int[], or is there a better way to sort?

    import java.util.Arrays;

    public class Main {
        public static void main(String[] args) {
            // Code points U+1F601, U+1F613, U+1F62D (all outside the BMP)
            int[] utfCodes = {128513, 128531, 128557};
            String emojis = new String(utfCodes, 0, 3);
            System.out.println("Initial String: " + emojis);

            // Sorts individual UTF-16 code units, which splits the surrogate pairs
            char[] chars = emojis.toCharArray();
            Arrays.sort(chars);
            System.out.println("Sorted String: " + new String(chars));
        }
    }

Output:

    Initial String: 😁😓😭
    Sorted String: ??😁??






java string sorting utf-16






edited 3 hours ago by jtahlborn






asked 3 hours ago by dingy




  • This is what we call a "Collation". You should use a library for this because there are many collations to choose from.

    – Guillaume F.
    3 hours ago
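
If locale-aware ordering (collation) is what you need, the JDK's own java.text.Collator may be enough for simple cases. The following is only a minimal sketch under that assumption: the class name CollatedSort and the method sortByCollation are made up for illustration, and the resulting order depends on the locale passed in and on the collation rules shipped with your JDK.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.stream.Collectors;

public class CollatedSort {
    // Hypothetical helper: orders the code points of s using the
    // collation rules of the given locale instead of numeric order.
    static String sortByCollation(String s, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return s.codePoints()
                // Turn each code point into its own String so that
                // surrogate pairs stay together while being compared.
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .sorted(collator)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(sortByCollation("bAa", Locale.ENGLISH));
    }
}
```

A plain numeric sort of the code points would put "A" before "a" because of their values; a collator decides the order of such pairs by its own rules, which is the point of the comment above.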

















3 Answers
I looked around for a bit and couldn't find a clean way to sort an array in groups of two elements without using a library.

Luckily, the code points of the String are exactly what you used to create it in this example, so you can simply sort those and build a new String from the result.

    public class Main {
        public static void main(String[] args) {
            int[] utfCodes = {128531, 128557, 128513};
            String emojis = new String(utfCodes, 0, 3);
            System.out.println("Initial String: " + emojis);

            // Sort whole code points rather than individual chars
            int[] codePoints = emojis.codePoints().sorted().toArray();
            System.out.println("Sorted String: " + new String(codePoints, 0, codePoints.length));
        }
    }

Output:

    Initial String: 😓😭😁
    Sorted String: 😁😓😭

I switched the order of the characters in your example because they were already sorted.
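
The same idea can be wrapped in a small reusable method. This is a sketch assuming Java 8 or later; the class name CodePointSort and the method sortCodePoints are illustrative names, not part of any library.

```java
public class CodePointSort {
    // Hypothetical helper: returns a new String whose code points are
    // in ascending numeric order. Surrogate pairs are kept intact
    // because the sort works on int code points, not on chars.
    static String sortCodePoints(String s) {
        int[] codePoints = s.codePoints().sorted().toArray();
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128531, 128557, 128513}, 0, 3);
        System.out.println(sortCodePoints(emojis)); // 😁😓😭
        System.out.println(sortCodePoints("cba"));  // abc
    }
}
```

Note that this orders by numeric code point value only; it makes no attempt at locale-aware ordering.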







edited 3 hours ago
answered 3 hours ago by Jacob G.












  • Haha.. my string was already sorted... I couldn't tell because I couldn't sort (pun intended). I should move to java8 =)

    – dingy
    44 mins ago

















If you are using Java 8 or later, this is a simple way to sort the characters in a string without breaking multi-char code points:

    int[] codepoints = someString.codePoints().sorted().toArray();
    String sorted = new String(codepoints, 0, codepoints.length);

Prior to Java 8, I think you need to either loop over the code points in the original string yourself or use a third-party library method.

Fortunately, sorting the code points in a String is uncommon enough that the clunkiness and inefficiency of the solutions above are rarely a concern.

(When was the last time you tested for anagrams of emojis?)
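
For the pre-Java 8 case, the loop could look roughly like the following. It is a sketch that only relies on methods available since Java 5 (String.codePointAt, String.codePointCount, Character.charCount); the class name LegacyCodePointSort and the method sortCodePoints are illustrative names.

```java
import java.util.Arrays;

public class LegacyCodePointSort {
    // Hypothetical helper: collects the code points with an explicit
    // loop, sorts them numerically, and builds a new String.
    static String sortCodePoints(String s) {
        int[] codePoints = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            codePoints[n++] = cp;
            // Advance by 1 char for BMP characters, 2 for surrogate pairs
            i += Character.charCount(cp);
        }
        Arrays.sort(codePoints);
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128531, 128557, 128513}, 0, 3);
        System.out.println(sortCodePoints(emojis));
    }
}
```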







edited 1 hour ago
answered 2 hours ago by Stephen C












  • Thanks for the reply. I was looking at Java 7's documentation; I should move to Java 8. BTW, I am from China and making an app where I need to sort strings in Mandarin. Just kidding, but it's a valid use case. I stumbled upon this while trying to understand how Java works with UTF-16. Since the other answers are the same, I'll select the one which came earliest. Thanks again!

    – dingy
    49 mins ago

  • I didn't say invalid. I said uncommon. (And the fact that you had to make up a use-case only reinforces my point ... :-) )

    – Stephen C
    45 mins ago

  • Hmm.. sorry.. my bad =)

    – dingy
    42 mins ago

  • See also: chinese.stackexchange.com/questions/24053/chinese-anagrams. (First answer: "Why do you need that? We never use that in China.")

    – Stephen C
    14 mins ago


















We can't rely on char for Unicode, because a single Java char cannot hold every Unicode character.

In the early days of Java, Unicode code points were always 16 bits (exactly one char). However, the Unicode specification was later extended with supplementary characters, which need more than 16 bits. In UTF-16, and therefore in Java, such a character takes two chars (a surrogate pair), so Unicode characters are now variable-width. Unfortunately, it was too late to change Java's char type without breaking a ton of production code.

So the best way to manipulate Unicode characters is to work with code points directly, e.g. using String.codePointAt(index) (Java 5 and above) or the String.codePoints() stream (Java 8 and above).
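
To make the difference between chars and code points concrete, here is a small sketch using U+1F601 (decimal 128513, the first emoji from the question). The class name CharVsCodePoint is illustrative only.

```java
public class CharVsCodePoint {
    public static void main(String[] args) {
        // One supplementary character, built from its code point
        String s = new String(Character.toChars(0x1F601));

        System.out.println(s.length());                              // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));         // 1 (actual character)
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
        System.out.println(s.codePointAt(0) == 0x1F601);             // true
    }
}
```

Sorting a char[] treats the two surrogates as independent values, which is why the pairs in the question ended up separated.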







share|improve this answer








New contributor




peekay is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this answer



share|improve this answer






New contributor




peekay is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









answered 2 hours ago by peekay
































