observability tools
Engineering AI Systems for Autonomy and Resilience with Krishna Sai
Enterprise IT systems have grown into sprawling, highly distributed environments spanning cloud infrastructure, applications, data platforms, and increasingly AI-driven workloads.
Infrastructure Monitoring with Mark Carter
At Google, the job of a site reliability engineer involves building tools to automate infrastructure operations. If a server crashes, automation is already in place to create a replacement.