Finding collisions of the first few bits of a SHA-1 hashUsing Levenstein distance to compare stringsReview of reservoir samplingKarp-Rabin with bitwise hashMinimum window substring which includes all the characters from substringOptimizing Java SHA-512 String-to-Hash GeneratorHackerEarth - SimpleFunctionLeetcode 49: Group Anagrams - Hash function design talkFollow-up 1: Compare 2 unordered, rooted trees for shape-isomorphismChecking whether one string contains all the characters of another stringFinding the lowest possible number of changes to turn one string into another
Why are Stein manifolds/spaces the analog of affine varieties/schemes in algebraic geometry?
How to patch glass cuts in a bicycle tire?
Python program to take in two strings and print the larger string
What is the use case for non-breathable waterproof pants?
Should there be an "a" before "ten years imprisonment"?
What did the 'turbo' button actually do?
What Armor Optimization applies to a Mithral full plate?
Translation of “with that”
Mercedes C180 (W204) dash symbol
Shorten or merge multiple lines of `&> /dev/null &`
Dealing with spaghetti codebase, manager asks for things I can't deliver
Is it legal to have an abortion in another state or abroad?
Public transport tickets in UK for two weeks
Which European Languages are not Indo-European?
How to cut a climbing rope?
SFDX: where can set Field-level security and accessibility?
Function argument returning void or non-void type
Writing style before Elements of Style
How to politely tell someone they did not hit "reply to all" in an email?
Why did Theresa May offer a vote on a second Brexit referendum?
What does kpsewhich stand for?
What is the meaning of "<&3" and "done < file11 3< file22"
USPS Back Room - Trespassing?
Can my floppy disk still work without a shutter spring?
Finding collisions of the first few bits of a SHA-1 hash
Using Levenstein distance to compare stringsReview of reservoir samplingKarp-Rabin with bitwise hashMinimum window substring which includes all the characters from substringOptimizing Java SHA-512 String-to-Hash GeneratorHackerEarth - SimpleFunctionLeetcode 49: Group Anagrams - Hash function design talkFollow-up 1: Compare 2 unordered, rooted trees for shape-isomorphismChecking whether one string contains all the characters of another stringFinding the lowest possible number of changes to turn one string into another
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
$begingroup$
My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.
How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)
Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found
public static boolean findCollision(int x1, int x2)
String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";
//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);
String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);
if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.
public static void main(String[] args)
Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;
while (true)
while(true)
x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);
if (x1 != x2)
break;
if (findCollision(x1, x2) == true)
break;
counter++;
System.out.println("nNumber of trials: " + counter);
If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.
java time-limit-exceeded cryptography hashcode
New contributor
$endgroup$
add a comment |
$begingroup$
My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.
How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)
Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found
public static boolean findCollision(int x1, int x2)
String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";
//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);
String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);
if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.
public static void main(String[] args)
Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;
while (true)
while(true)
x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);
if (x1 != x2)
break;
if (findCollision(x1, x2) == true)
break;
counter++;
System.out.println("nNumber of trials: " + counter);
If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.
java time-limit-exceeded cryptography hashcode
New contributor
$endgroup$
$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
5 hours ago
add a comment |
$begingroup$
My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.
How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)
Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found
public static boolean findCollision(int x1, int x2)
String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";
//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);
String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);
if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.
public static void main(String[] args)
Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;
while (true)
while(true)
x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);
if (x1 != x2)
break;
if (findCollision(x1, x2) == true)
break;
counter++;
System.out.println("nNumber of trials: " + counter);
If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.
java time-limit-exceeded cryptography hashcode
New contributor
$endgroup$
My objective is to find a hash collision of my modified hash function. Assuming my modified hash only outputs the first 36 bits of SHA-1. As we know, SHA-1 is a 160-bit hash value, hence, we only need to output 9 characters out of 40 characters for comparison.
How i begin my program is by hashing a string (I have a SHA-1 algorithm running and i will name it as sha1. I have also ensured that the output of the SHA-1 algorithm is correct.)
Firstly, i would hardcode 2 string and extract out 9 characters out of 40 characters since i require only the first 36 bits of SHA-1. The function below basically return a true if a collision is found and false if a collision is not found
public static boolean findCollision(int x1, int x2)
String message1 = "I tried " + x1 + " iteration to find a collision";
String message2 = "I tried " + x2 + " iteration to find a collision";
//hashing my string and extracting 9 characters
String message_hash1 = sha1(message1);
String message_hash2 = sha1(message2);
String modified_hash1 = message_hash1.substring(0, 9);
String modified_hash2 = message_hash2.substring(0, 9);
if (modified_hash1.equals(modified_hash2))
return true;
else
return false;
Lastly, i will have a main function that will random Integers up to MAX_VALUE in a infinite loop and will break out if and only if a hash is found.
public static void main(String[] args)
Random random = new Random();
int x1 = 0;
int x2 = 0;
int counter = 0;
while (true)
while(true)
x1 = random.nextInt(Integer.MAX_VALUE);
x2 = random.nextInt(Integer.MAX_VALUE);
if (x1 != x2)
break;
if (findCollision(x1, x2) == true)
break;
counter++;
System.out.println("nNumber of trials: " + counter);
If i tried taking only the first 24 bits of SHA-1, i could easily find a collision. However, i'm unable to find a collision for 36 bits instead despite running it for hours. Hence, I'm wondering what is other alternative way for me to find a collision with just 36 bits of SHA-1.
java time-limit-exceeded cryptography hashcode
java time-limit-exceeded cryptography hashcode
New contributor
New contributor
edited 4 hours ago
200_success
132k20160427
132k20160427
New contributor
asked 5 hours ago
Astral ZhangAstral Zhang
262
262
New contributor
New contributor
$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
5 hours ago
add a comment |
$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
5 hours ago
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.
However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)
New contributor
$endgroup$
add a comment |
$begingroup$
I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.
Considering your have 36 bits of data, it means you have a total number of possibilities of $2^36 = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.
So that's easy! All you need to do is run this (pseudocode):
hashset = (new hashset that uses your hashing algorithm)
for(i = 0; i < 68719476736 + 1; i++)
if hashset.contains(string(i)) break;
hashset.add(string(i));
print("This took " + i + " tried!")
With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.
My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.
So the question is, what could we do to make this better?
We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.
We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.
What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.
The pseudocode would look like this :
generator = RandomBigIntGenerator()
numbers = []
for(i = 0; i < 308652; i++)
numbers.add(generator.random(1,68719476736+1))
hashset = (With your hashing function)
for n in numbers
if hashset.contains(n)
print("yay done");
break;
hashset.add(m);
All in all, you can hope to have collisions, but you need computing power.
In a code review point of view :
- You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.
- Don't be too random when you create your strings.
- Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)
$endgroup$
$begingroup$
I will try this birthday paradox method right now. Hopefully it will work.
$endgroup$
– Astral Zhang
4 hours ago
$begingroup$
i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
how do you obtain this value 308652?
$endgroup$
– Astral Zhang
2 hours ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f220756%2ffinding-collisions-of-the-first-few-bits-of-a-sha-1-hash%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.
However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)
New contributor
$endgroup$
add a comment |
$begingroup$
With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.
However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)
New contributor
$endgroup$
add a comment |
$begingroup$
With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.
However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)
New contributor
$endgroup$
With 24 bits, there are approximately 16.8 million possible hashes, so on average, you'd have to try 8.4 million pairs until you find a collision. With 36 bits, the numbers have to be multiplied by 4096, which yields 68 respectively 34 billion. You can cut this in half by not computing two hashes during each iteration, but instead computing one before the loop and keeping it constant.
However, that is probably still more time than you want to spend. One way to reduce that time is to utilize the Birthday Paradox (https://en.m.wikipedia.org/wiki/Birthday_problem). By computing a list of hashes and comparing each one with each other your chances of finding a collision are much higher. I won't try to type a complete algorithm on this phone screen, though :-)
New contributor
New contributor
answered 4 hours ago
Hans-Martin MosnerHans-Martin Mosner
1212
1212
New contributor
New contributor
add a comment |
add a comment |
$begingroup$
I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.
Considering your have 36 bits of data, it means you have a total number of possibilities of $2^36 = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.
So that's easy! All you need to do is run this (pseudocode):
hashset = (new hashset that uses your hashing algorithm)
for(i = 0; i < 68719476736 + 1; i++)
if hashset.contains(string(i)) break;
hashset.add(string(i));
print("This took " + i + " tried!")
With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.
My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.
So the question is, what could we do to make this better?
We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.
We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.
What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.
The pseudocode would look like this :
generator = RandomBigIntGenerator()
numbers = []
for(i = 0; i < 308652; i++)
numbers.add(generator.random(1,68719476736+1))
hashset = (With your hashing function)
for n in numbers
if hashset.contains(n)
print("yay done");
break;
hashset.add(m);
All in all, you can hope to have collisions, but you need computing power.
In a code review point of view :
- You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.
- Don't be too random when you create your strings.
- Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)
$endgroup$
$begingroup$
I will try this birthday paradox method right now. Hopefully it will work.
$endgroup$
– Astral Zhang
4 hours ago
$begingroup$
i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
how do you obtain this value 308652?
$endgroup$
– Astral Zhang
2 hours ago
add a comment |
$begingroup$
I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.
Considering your have 36 bits of data, it means you have a total number of possibilities of $2^36 = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.
So that's easy! All you need to do is run this (pseudocode):
hashset = (new hashset that uses your hashing algorithm)
for(i = 0; i < 68719476736 + 1; i++)
if hashset.contains(string(i)) break;
hashset.add(string(i));
print("This took " + i + " tried!")
With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.
My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.
So the question is, what could we do to make this better?
We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.
We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.
What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.
The pseudocode would look like this :
generator = RandomBigIntGenerator()
numbers = []
for(i = 0; i < 308652; i++)
numbers.add(generator.random(1,68719476736+1))
hashset = (With your hashing function)
for n in numbers
if hashset.contains(n)
print("yay done");
break;
hashset.add(m);
All in all, you can hope to have collisions, but you need computing power.
In a code review point of view :
- You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.
- Don't be too random when you create your strings.
- Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)
$endgroup$
$begingroup$
I will try this birthday paradox method right now. Hopefully it will work.
$endgroup$
– Astral Zhang
4 hours ago
$begingroup$
i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
how do you obtain this value 308652?
$endgroup$
– Astral Zhang
2 hours ago
add a comment |
$begingroup$
I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.
Considering your have 36 bits of data, it means you have a total number of possibilities of $2^36 = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.
So that's easy! All you need to do is run this (pseudocode):
hashset = (new hashset that uses your hashing algorithm)
for(i = 0; i < 68719476736 + 1; i++)
if hashset.contains(string(i)) break;
hashset.add(string(i));
print("This took " + i + " tried!")
With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.
My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.
So the question is, what could we do to make this better?
We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.
We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.
What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.
The pseudocode would look like this :
generator = RandomBigIntGenerator()
numbers = []
for(i = 0; i < 308652; i++)
numbers.add(generator.random(1,68719476736+1))
hashset = (With your hashing function)
for n in numbers
if hashset.contains(n)
print("yay done");
break;
hashset.add(m);
All in all, you can hope to have collisions, but you need computing power.
In a code review point of view :
- You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.
- Don't be too random when you create your strings.
- Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)
$endgroup$
I realized that while typing this, you should look into the Birthday Attack, which will probably much more elaborated than my answer, lol.
Considering your have 36 bits of data, it means you have a total number of possibilities of $2^36 = 68719476736$. Based on the Pigeonhole principle, if you compared the hashed of 68719476736 different strings, you'd get a collision.
So that's easy! All you need to do is run this (pseudocode):
hashset = (new hashset that uses your hashing algorithm)
for(i = 0; i < 68719476736 + 1; i++)
if hashset.contains(string(i)) break;
hashset.add(string(i));
print("This took " + i + " tried!")
With this, you're guaranteed to get a collision. But there's a problem. If every iteration took 100 milliseconds, you'd need about 217 years to get a solution. Tell your teacher your great-great-grandchildren will get back to you with a solution. You could also buy 2000 computers to run this in ~40 days.
My algorithm is pretty much guaranteed to finish (assuming none of the computers crash in any way), yours isn't. This is more of a lesson on "how to test things", which is valuable for developers. When you want to test something, try not to be random. The thing with your algorithm is that maybe you wouldn't ever get collisions. I understand that you kind of need randomness if you want this to finish running this one day.
So the question is, what could we do to make this better?
We could check about how many different strings we'd need to get a good enough chance there's a collision. It's kind of the Birthday Problem which gives us an easy way to calculate what are the chances n different persons have the same birthday in a group of m persons.
We can use the formula under Approximation of number of people and adapt it to our problem to understand that if we had 68719476736 + 1 strings in our possession (for example, every number between 1 and 68719476736 + 1), you'd need to pick 308652 of those to have about 50% chances of having a collision.
What you could try is randomly take 308652 numbers between 1 and 68719476736 + 1 and hash them to find a collision. Repeat this as long as you don't have a collision.
The pseudocode would look like this :
generator = RandomBigIntGenerator()
numbers = []
for(i = 0; i < 308652; i++)
numbers.add(generator.random(1,68719476736+1))
hashset = (With your hashing function)
for n in numbers
if hashset.contains(n)
print("yay done");
break;
hashset.add(m);
All in all, you can hope to have collisions, but you need computing power.
In a code review point of view :
- You need to keep track of the strings you already tried, they give you valuable information that can actually help you find collision.
- Don't be too random when you create your strings.
- Understand that 68719476736 is a biiggg number, but still so much smaller than the 1461501637330902918203684832716283019655932542976 possible values the SHA1 have :)
answered 4 hours ago
IEatBagelsIEatBagels
9,08223579
9,08223579
$begingroup$
I will try this birthday paradox method right now. Hopefully it will work.
$endgroup$
– Astral Zhang
4 hours ago
$begingroup$
i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
how do you obtain this value 308652?
$endgroup$
– Astral Zhang
2 hours ago
add a comment |
$begingroup$
I will try this birthday paradox method right now. Hopefully it will work.
$endgroup$
– Astral Zhang
4 hours ago
$begingroup$
i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
how do you obtain this value 308652?
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
I will try this birthday paradox method right now. Hopefully it will work.
$endgroup$
– Astral Zhang
4 hours ago
$begingroup$
I will try this birthday paradox method right now. Hopefully it will work.
$endgroup$
– Astral Zhang
4 hours ago
$begingroup$
i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
i now have a problem while implementing this solution. If my final output of my program has to display the two original string and it's hash values, then using hashset doesn't work because hashset doesn't allow us to store 2 datatype. From your recommended algorithm, it mentioned to store the hash value directly and compare. If it's done that way, then there's no way i can identify the number used to generate that particular collision hash.
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
how do you obtain this value 308652?
$endgroup$
– Astral Zhang
2 hours ago
$begingroup$
how do you obtain this value 308652?
$endgroup$
– Astral Zhang
2 hours ago
add a comment |
Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.
Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.
Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.
Astral Zhang is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f220756%2ffinding-collisions-of-the-first-few-bits-of-a-sha-1-hash%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
Out of curiosity, why are you trying to do this? Is it because you want to see if your modified algorithm is good?
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
of course not. this modified algorithm is never good because by decreasing the hash length is never a good thing. It's an assignment for my class work and i'm ran out of ideas on how to achieve a hash collision. that's the main objective of this assignment.
$endgroup$
– Astral Zhang
5 hours ago
$begingroup$
You should mention this in your question, I believe it's important :)
$endgroup$
– IEatBagels
5 hours ago
$begingroup$
i have done so. thanks!
$endgroup$
– Astral Zhang
5 hours ago