Sunday, March 21, 2010

Compare files using MD5

When you receive files from different sources, the same content can arrive under different filenames. Looking for duplicates by filename alone is therefore not sufficient. There are several ways to compare files by their data instead.

Usage of Message Digest (MD5)

To find duplicate files even after they have been renamed, the content of the files has to be read and compared. Once a file's content is available as bytes, it can be hashed with the MD5 algorithm, and the resulting hash string can be used for comparison: files with identical content produce identical hashes. MD5 is a widely used cryptographic hash function with a 128-bit hash value, and is also commonly used to check the integrity of files.
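The idea above can be sketched in a few lines. This is only an illustration, written in Python for brevity and not taken from the linked article; the function names md5_of_file and find_duplicates and the 64 KB chunk size are assumptions chosen for the example. It hashes every file under a folder and groups files that share the same MD5 string.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Group files under `folder` by MD5 digest; keep groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            groups[md5_of_file(path)].append(path)
    # Only digests shared by several files indicate duplicate content.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_duplicates on a folder returns a list of groups, each holding the paths whose content hashed to the same value regardless of filename. A matching MD5 is a strong indication of identical content, but MD5 is not collision-resistant against deliberately crafted files, so a byte-by-byte comparison of each group is a reasonable final check when certainty matters.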


MD5 in .NET Framework

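.NET Framework has rich support for cryptography in the System.Security.Cryptography namespace, which makes it easy to compute hashes and to encrypt or decrypt data with a variety of algorithms. To compute an MD5 hash, call the ComputeHash() method on an MD5 instance, passing either a byte array or a stream.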

Read more here: Find duplicate files
