Meltdown/Spectre - Aftermath Edition
Incident Report for Pantheon Operations
Resolved
This incident has been resolved.
Posted Feb 06, 2018 - 15:39 PST
Update
This is a follow-up to last week’s update on the Meltdown/Spectre vulnerabilities (http://pantheon.statuspage.io/incidents/x9dmhz368xfz).

As we mentioned, Pantheon is committed to ensuring that security is not compromised, so we made patching the infrastructure that runs your PHP code a high priority. After updating the OS kernels, we observed a large increase in CPU utilization. Even after we added resources to the platform, a number of customers reported degraded performance on their websites.

We added even more CPU resources to the platform while our engineers worked diligently to find ways to restore site performance to pre-update levels. That work has paid off, and we have some good news to share with you. The patched kernel (https://fedoramagazine.org/kpti-new-kernel-feature-mitigate-meltdown/), combined with subsequent optimizations, is showing strong results, and we are seeing significant improvements across all customer websites. As noted earlier, we may not be able to fully match the performance we had before Meltdown/Spectre, but we are committed to making things as fast as they can possibly be.

In the coming days, our engineers will publish a blog post explaining this work in depth. Check out our blog (https://pantheon.io/blog) if you are interested in learning more.

We appreciate your feedback and your continued engagement with us.

Pantheon
Posted Jan 18, 2018 - 14:07 PST
Monitoring
In the wake of the recent disclosure of the Meltdown/Spectre vulnerabilities, Pantheon has taken steps to ensure the security of our customers’ sites. We applied the appropriate patches to the servers running your PHP code. Patching was the right thing to do for your business; however, it came with a penalty. This penalty is not limited to your site or to Pantheon. It is a global impact: CPU performance on patched machines has dropped by anywhere from 10% to 30%. We understand this hit is difficult to accept, but security cannot be ignored.

We are working diligently to minimize the impact on our customers. For example, we have already provisioned additional capacity so that your website and users get the best possible throughput in the post-Meltdown world. Comparing last week’s New Relic graphs (pre-Meltdown) directly with this week’s performance (post-Meltdown) is not an apples-to-apples comparison. We all have to acknowledge that CPUs are now slower, and that affects your website’s performance.

If you think the performance of your site has degraded significantly and it is impeding your users’ ability to conduct business with you, please share those insights with us. We cannot promise next steps right now, but we are working cross-functionally to deliver the best results for you. If you haven’t already done so, upgrading to PHP 7.x is one step you can take to improve performance.

Now is the perfect time to prune redundant and duplicate modules from your codebase. Check the status page on your site’s dashboard for guidance on unused modules or plugins. Anything that adds unnecessary overhead, such as UI modules, can often be deactivated in production.

If you are interested in learning more, Cloudflare (https://blog.cloudflare.com/meltdown-spectre-non-technical/) and The Register (https://www.theregister.co.uk/2018/01/09/meltdown_spectre_slowdown/) both offer good explanations.

Pantheon Support
Posted Jan 10, 2018 - 16:16 PST