What can be learned from the break-in of Matrix.org?
It's been a short news story that matrix.org's largest server was hacked into, and they have reached one homeserver of the decentralised chat network and was able to access information generally closed from the large public (but accessibe for a server operator).
As a nice touch the person who's been behind the breakin was nice enough to share his/her thoughts about the case, and I believe (despite the fact that GitHub have been deleted the comments) there's lot to learn from the useful advices of out black hat colleague.
Let me share them with you, a bit trimmed and rephrased here and there.
The original break-in was made possible by a Jenkins bugs (CVE-2019-1003000, CVE-2019-1003001 and CVE-2019-1003002) which made it possible to run arbitrary code on the server. This happens.
-
Complete compromise could have been avoided if developers were prohibited from using ForwardAgent yes or not using -A in their SSH commands. The flaws with agent forwarding are well documented.
This is a reference on various posts calling for avoiding AgentForward and use ProxyCommand (ProxyJump) instead.
- Escalation could have been avoided if developers only had the access they absolutely required and did not have root access to all of the servers. I would like to take a moment to thank whichever developer forwarded their agent to Flywheel. Without you, none of this would have been possible.
- Once I was in the network, a copy of your wiki really helped me out and I found that someone was forwarding 22226 to Flywheel. With jenkins access, this allowed me to add my own key to the host and make myself at home. There appeared to be no legitimate reason for this port forward, especially since jenkinstunnel was being used to establish the communication between Themis and Flywheel.
- * I was able to login to all servers via an internet address. There should be no good reason to have your management ports exposed to the entire internet. Consider restricting access to production to either a vpn or a bastion host.
- * On each host, I tried to avoid writing directly to authorized_keys, because after a thorough peak at matrix-ansible-private I realized that access could have been removed any time an employee added a new key or did something else to redeploy users. But sshd_config allowed me to keep keys in authorized_keys2 and not have to worry about ansible locking me out.
- * The internal-config repository contained sensitive data, and the whole repository was often cloned onto hosts and left there for long periods of time, even if most of the configs were not used on that host. Hosts should only have the configs necessary for them to function, and nothing else. Kudos on using Passbolt. Things could have gotten real messy, otherwise.
- * Let's face it, I'm not a very sophisticated attacker. There was no crazy malware or rootkits. It was ssh agent forwarding and authorized_keys2, through and through. Well okay, and that jenkins 0ld-day. This could have been detected by better monitoring of log files and alerting on anomalous behavior. Compromise began well over a month ago, consider deploying an elastic stack and collecting logs centrally for your production environment.
- * There I was, just going about my business, looking for ways I could get higher levels of access and explore your network more, when I stumbled across GPG keys that were used for signing your debian packages. It gave me many nefarious ideas. I would recommend that you don't keep any signing keys on production hosts, and instead do all of your signing in a secure environment.
- * 2FA is often touted as one of the best steps you can take for securing your servers, and for good reason! If you'd deployed google's free authenticator module (sudo apt install libpam-google-authenticator), I would have never been able to ssh into any of those servers. Alternatively, for extra security, you could require yubikeys to access production infrastructure. Yubikeys are cool. Just make sure you don't leave it plugged in all the time, your hardware token doesn't do as much for you when it's always plugged in and ready for me to use. Alternate-Alternatively, if you had used a 2FA solution like Duo, you could have gotten a push notification the first time I tried to ssh to any of your hosts, and you would have caught me on day one. I'm sure you can setup push notifications for watching google-authenticator attempts as well, which could have at least given you a heads up that something fishy was going on. Anyways, that's all for now. I hope this series of issues has given you some good ideas for how to prevent this level of compromise in the future. Security doesn't work retroactively, but I believe in you and I think you'll come back from this even stronger than before.