Safe Password Hashing

This section explains the reasons for using hashing functions to secure passwords, and how to do so effectively.

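Why should I hash passwords supplied by users of my application?
Why are common hashing functions such as md5() and sha1() unsuitable for passwords?
How should passwords be hashed, if the common hash functions are not suitable?
What is a salt?
How are salts stored?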

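Why should I hash passwords supplied by users of my application?

Password hashing is one of the most basic security considerations when designing any application or service that accepts passwords from users. Without hashing, every stored password can be stolen if the data store is compromised. If users reuse the same password elsewhere, the leak endangers not only this application or service but also their accounts on other services.

By applying a hashing algorithm to users' passwords before storing them, it becomes implausible for an attacker to determine the original password, while the application can still compare the stored hash against a password supplied later.

It is important to note, however, that hashing passwords only protects them from being compromised in the data store. It does not necessarily protect them from being intercepted by malicious code injected into the application or service itself.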

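Why are common hashing functions such as md5() and sha1() unsuitable for passwords?

Hashing algorithms such as MD5, SHA1 and SHA256 are designed to be very fast and efficient. With modern techniques and computer hardware, an attacker can brute-force the output of these algorithms, trying candidate inputs until one produces the same hash, in order to determine the original input.

Because of how quickly a modern computer can "reverse" these hashing algorithms in this way, many security professionals strongly advise against using them for password hashing.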
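The speed gap can be seen with a rough timing sketch. The sample password, the round count and the bcrypt cost of 10 are assumptions chosen for illustration, and the absolute numbers depend entirely on the machine running it.

```php
<?php
// Illustrative timing only; absolute numbers depend on the machine and PHP build.
$password = 'correct horse battery staple'; // hypothetical sample password
$rounds = 100000;

// Time many MD5 digests, as a brute-force loop would compute them.
$start = microtime(true);
for ($i = 0; $i < $rounds; $i++) {
    md5($password . $i);
}
$md5PerHash = (microtime(true) - $start) / $rounds;

// Time a single bcrypt hash at cost 10.
$start = microtime(true);
password_hash($password, PASSWORD_BCRYPT, ['cost' => 10]);
$bcryptPerHash = microtime(true) - $start;

printf("md5:    %.9f s per hash\n", $md5PerHash);
printf("bcrypt: %.9f s per hash (cost 10)\n", $bcryptPerHash);
```

On typical hardware the per-hash time for bcrypt is several orders of magnitude larger, and that slowness is exactly what makes each brute-force guess expensive.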

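How should passwords be hashed, if the common hash functions are not suitable?

When hashing passwords, two factors must be considered: computational expense and the salt. The more computationally expensive the hashing algorithm, the longer it takes to brute-force its output.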
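As a minimal sketch of how computational expense can be tuned, the loop below hashes one password with bcrypt at three cost settings and times each call. The cost values 8, 10 and 12 and the sample password are assumptions for illustration. A suitable cost for a real deployment has to be measured on the target hardware.

```php
<?php
$password = 'correct horse battery staple'; // hypothetical sample password
$timings = [];

foreach ([8, 10, 12] as $cost) {
    $start = microtime(true);
    // Each increment of the bcrypt cost doubles the amount of work.
    $hash = password_hash($password, PASSWORD_BCRYPT, ['cost' => $cost]);
    $timings[$cost] = microtime(true) - $start;
    printf("cost %2d: %.3f s  %s\n", $cost, $timings[$cost], $hash);
}
```

The cost is recorded inside each hash, so hashes created with different costs can coexist in the same database and still be verified.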

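PHP provides a native password hashing API that safely handles both hashing and verifying passwords.

The suggested algorithm for hashing passwords is bcrypt (built on the Blowfish cipher, and therefore often simply called Blowfish), which is also the default algorithm of the password hashing API. It is significantly more computationally expensive than MD5 or SHA1, while still being scalable through its adjustable cost parameter.

The crypt() function is also available for password hashing, but it is only recommended for interoperability with other systems. Instead, it is strongly encouraged to use the native password hashing API whenever possible.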
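A minimal sketch of the native API in a register-then-login flow follows. The variable names and the sample password are hypothetical. Storage and retrieval of the hash are left out.

```php
<?php
// At registration: hash the password and store only the hash.
$submitted = 'correct horse battery staple'; // hypothetical user input
$stored = password_hash($submitted, PASSWORD_DEFAULT);

// At login: compare the supplied password against the stored hash.
$loginOk = password_verify('correct horse battery staple', $stored);
$loginBad = password_verify('wrong guess', $stored);

// After a successful login, upgrade the hash if the default algorithm
// or its options have changed since the hash was created.
if ($loginOk && password_needs_rehash($stored, PASSWORD_DEFAULT)) {
    $stored = password_hash($submitted, PASSWORD_DEFAULT);
}

var_dump($loginOk, $loginBad);
```

Because PASSWORD_DEFAULT may change between PHP releases, the column holding the hash should allow for results longer than the 60 characters bcrypt currently produces.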

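What is a salt?

A cryptographic salt is data added during the hashing process so that the output cannot be looked up in a list of pre-calculated hashes and their inputs, known as a rainbow table.

In simpler terms, a salt is a small amount of additional data that makes hashes significantly more difficult to crack. A number of online services provide extensive lists of pre-computed hashes together with their original inputs. Using a salt makes it implausible or impossible to find the resulting hash in one of these lists.

password_hash() creates a random salt if one isn't provided, and this is generally the easiest and most secure approach. The salt option was deprecated in PHP 7.0 and is ignored as of PHP 8.0, so a random salt is always generated there.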
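The effect of the random salt can be checked directly: hashing the same password twice yields two different strings, yet both verify. This is a small sketch with a hypothetical sample password.

```php
<?php
$password = 'correct horse battery staple'; // hypothetical sample password

// No salt is passed; password_hash() generates a fresh random one each time.
$first = password_hash($password, PASSWORD_DEFAULT);
$second = password_hash($password, PASSWORD_DEFAULT);

var_dump($first === $second);                    // bool(false)
var_dump(password_verify($password, $first));    // bool(true)
var_dump(password_verify($password, $second));   // bool(true)
```

Since identical passwords no longer produce identical hashes, a pre-computed table built for one salt is of no use against a hash that uses another.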

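How are salts stored?

When using password_hash() or crypt(), the salt is returned as part of the generated hash. The complete return value can be stored verbatim in the database, because it contains all the information needed and can be passed directly to password_verify() when verifying passwords.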

Warning

password_verify() should always be used instead of re-hashing and comparing the result to a stored hash, in order to avoid timing attacks.

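The following diagram shows the format of a return value from crypt() or password_hash(). As can be seen, the values are self-contained, with all the information on the algorithm and salt required for future password verification.

Figure: The components of the value returned by password_hash() and crypt(): in order, the chosen algorithm, the algorithm's options, the salt used, and the hashed password.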
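For a bcrypt hash, the four components can be pulled apart by position, as sketched below. The offsets assume the 60-character bcrypt layout with a two-digit cost. Other algorithms such as Argon2 use a different layout, so this is an illustration and not a general parser. The $parts array and its keys are hypothetical names.

```php
<?php
$hash = password_hash('correct horse battery staple', PASSWORD_BCRYPT, ['cost' => 10]);

// Split the bcrypt string by fixed offsets (assumes the 60-character layout).
$parts = [
    'algorithm' => substr($hash, 0, 4),   // algorithm identifier, "$2y$"
    'options'   => substr($hash, 4, 3),   // cost followed by "$"
    'salt'      => substr($hash, 7, 22),  // 22-character encoded salt
    'digest'    => substr($hash, 29),     // 31-character encoded hash
];
print_r($parts);

// The API can report the algorithm and options without manual parsing.
$info = password_get_info($hash);
print_r($info);
```

In application code password_get_info() and password_verify() are the safer route, since they do not depend on the string layout of any one algorithm.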


User Contributed Notes (3 notes)

alf dot henrik at ascdevel dot com

12 years ago

I feel like I should comment on some of the claims being posted as replies here.

For starters, speed IS an issue with MD5 in particular and also SHA1. I've written my own MD5 bruteforce application just for the fun of it, and using only my CPU I can easily check a hash against about 200 million hashes per second. The main reason for this speed is that for most attempts you can bypass 19 out of 64 steps in the algorithm. For longer input (> 16 characters) it won't apply, but I'm sure there are some ways around it.

If you search online you'll see people claiming to be able to check against billions of hashes per second using GPUs. I wouldn't be surprised if it's possible to reach 100 billion per second on a single computer alone these days, and it's only going to get worse. It would require a watt monster with 4 dual high-end GPUs or something, but still possible.

Here's why 100 billion per second is an issue:
Assume most passwords are drawn from a set of 96 characters. A password with 8 characters would then have 96^8 = 7.21389578984e+15 combinations.
At 100 billion guesses per second it would take 7.21389578984e+15 / 1e+11 = ~72,139 seconds, and 72,139 / 3600 = ~20 hours, to figure out what it actually says. Keep in mind that you'll need to add the numbers for 1-7 characters as well. 20 hours is not a lot if you want to target a single user.
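The estimate above can be reproduced with a few lines of arithmetic. The 96-character alphabet and the rate of 100 billion guesses per second are the assumptions stated in this note, not measured values.

```php
<?php
$alphabet = 96;        // assumed number of possible characters
$guessesPerSecond = 100e9; // assumed attacker speed: 100 billion per second

// Passwords of exactly 8 characters.
$exactlyEight = $alphabet ** 8;

// Passwords of 1 to 8 characters, as the note suggests adding.
$upToEight = 0;
for ($length = 1; $length <= 8; $length++) {
    $upToEight += $alphabet ** $length;
}

$hoursExactlyEight = $exactlyEight / $guessesPerSecond / 3600;
$hoursUpToEight = $upToEight / $guessesPerSecond / 3600;

printf("8 characters:   %d combinations, %.1f hours\n", $exactlyEight, $hoursExactlyEight);
printf("1-8 characters: %d combinations, %.1f hours\n", $upToEight, $hoursUpToEight);
```

Adding the shorter lengths changes the total by only about one percent, because each extra character multiplies the search space by 96.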

So in essence:
There's a reason why newer hash algorithms are specifically designed not to be easily implemented on GPUs.

Oh, and I can see there's someone mentioning MD5 and rainbow tables. If you read the numbers here, I hope you realize how incredibly stupid and useless rainbow tables have become in terms of MD5. Unless the input to MD5 is really huge, you're just not going to be able to compete with GPUs here. By the time a storage media is able to produce far beyond 3TB/s, the CPUs and GPUs will have reached much higher speeds.

As for SHA1, my belief is that it's about a third slower than MD5. I can't verify this myself, but it seems to be the case judging by the numbers presented for MD5 and SHA1. The issue with speed is basically very much the same here as well.

The moral here:
Please do as told. Don't ever use MD5 and SHA1 for hashing passwords ever again. We all know passwords aren't going to be that long for most people, and that's a major disadvantage. Adding long salts will help for sure, but unless you want to add some hundred bytes of salt, there are going to be fast bruteforce applications out there ready to reverse engineer your passwords or your users' passwords.

swardx at gmail dot com

10 years ago

A great read..

https://nakedsecurity.sophos.com/2013/11/20/serious-security-how-to-store-your-users-passwords-safely/

Serious Security: How to store your users’ passwords safely

In summary, here is our minimum recommendation for safe storage of your users’ passwords:

    Use a strong random number generator to create a salt of 16 bytes or longer.
    Feed the salt and the password into the PBKDF2 algorithm.
    Use HMAC-SHA-256 as the core hash inside PBKDF2.
    Perform 20,000 iterations or more. (June 2016.)
    Take 32 bytes (256 bits) of output from PBKDF2 as the final password hash.
    Store the iteration count, the salt and the final hash in your password database.
    Increase your iteration count regularly to keep up with faster cracking tools.

Whatever you do, don’t try to knit your own password storage algorithm.
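The recipe quoted above could be sketched in PHP with hash_pbkdf2(), as below. The function names pbkdf2_hash_password() and pbkdf2_verify_password() and the array layout of the stored record are hypothetical, and the 20,000 iterations are the dated minimum from the quoted article. This only illustrates the quoted steps. The manual text above recommends the native password hashing API instead.

```php
<?php
// Hypothetical helper: derive a PBKDF2-HMAC-SHA-256 record for storage.
function pbkdf2_hash_password(string $password, int $iterations = 20000): array
{
    $salt = random_bytes(16); // 16 random bytes from a CSPRNG
    // Request 32 raw bytes (256 bits) of output.
    $hash = hash_pbkdf2('sha256', $password, $salt, $iterations, 32, true);

    // Store the iteration count, the salt and the final hash together.
    return [
        'iterations' => $iterations,
        'salt' => bin2hex($salt),
        'hash' => bin2hex($hash),
    ];
}

// Hypothetical helper: recompute with the stored parameters and compare.
function pbkdf2_verify_password(string $password, array $record): bool
{
    $candidate = hash_pbkdf2(
        'sha256',
        $password,
        hex2bin($record['salt']),
        $record['iterations'],
        32,
        true
    );

    // hash_equals() compares in constant time.
    return hash_equals(hex2bin($record['hash']), $candidate);
}

$record = pbkdf2_hash_password('correct horse battery staple');
var_dump(pbkdf2_verify_password('correct horse battery staple', $record));
var_dump(pbkdf2_verify_password('wrong guess', $record));
```

Keeping the iteration count in the record is what allows it to be raised later without invalidating existing entries.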

tamas at microwizard dot com

4 years ago

While reading the comments, some old math lessons came to mind and got me thinking. Using constants in a mathematical algorithm does not change the complexity of the algorithm itself.

The reason for salting is to defeat rainbow tables (sorry guys, this is the only reason), because they speed up (shortcut) the "actual" processing power.
(((Longer stored hashes AND longer passwords increase the complexity of cracking, NOT adding a salt ALONE.)))

PHP's salting functions return all the information needed for checking passwords, therefore this information should be treated as a constant from a broader point of view. It is also a target for rainbow tables (sure: for much, much larger ones).

What is the solution?
The solution is to store the password hash and the salt in different places.
The implementation is yours. Any two different places will be good enough.

Yes, it will make problems for hackers. He/she needs to understand your system. No speed-up for password cracking will work for him/her without reimplementing your whole system.

These are my two cents.