Matthew Rose
Skills and Technologies
- Cloud Platforms: AWS, Azure
- Containerization & Orchestration: Docker, Kubernetes, ArgoCD
- Infrastructure as Code: Terraform, CloudFormation
- Programming Languages: Go, Python, Shell Scripting
- Monitoring & Logging: Grafana, DataDog, Prometheus, OpenTelemetry, Loki
- CI/CD Tools: Jenkins, GitHub Actions, various build tools
- Networking Protocols: TCP/IP, BGP, DNS, HTTP/S, TLS
- Operating Systems: macOS, Windows, Linux (Ubuntu, RHEL)
Experience
Staff Reliability Engineer
data.world, Austin, TX – March 2024 – Present
- Led the transformation of legacy infrastructure and application code to support multi-architecture Docker builds, yielding a 20% performance increase
- Implemented support for multiple AWS capacity providers within a single ECS cluster, creating distinct compute environments that streamlined AMI deployments and facilitated the CPU architecture transition
- Developed AWS Lambda functions with Golang and Docker that use AWS Config and Systems Manager to replace Lacework, saving ~$100,000 annually
- Automated the build and deployment processes for Dockerized AWS Lambdas, demonstrating the feasibility and effectiveness of managing a monorepo Terraform repository that supports diverse application code
- Created a Terraform and Golang-based project to perform image scanning on newly built AMIs, ensuring no images with vulnerabilities were released. Implemented alerting mechanisms to notify relevant teams through various channels
Senior Site Reliability Engineer
data.world, Austin, TX – March 2023 – March 2024
- Designed and implemented an enterprise-scale Terraform solution by developing robust modules for a private registry, integrating Golang-based testing with Terratest, and configuring Terraform Cloud, thereby standardizing and modernizing infrastructure deployments across the organization
- Led the migration of legacy Infrastructure as Code from CloudFormation to Terraform, fostering a collaborative environment for IaC changes, improving visibility and auditability, and bolstering security through seamless OIDC integration
- Introduced comprehensive testing protocols for critical infrastructure using Golang and Terratest, uncovering previously unidentified issues and implementing effective solutions to close security and performance gaps
- Developed and maintained reusable GitHub Actions and workflows deployed enterprise-wide, promoting DRY principles and minimizing errors in routine tasks
Senior Site Reliability Engineer
OpenGov – May 2021 – March 2023
- Applied statistical methods to analyze CPU and memory consumption across 400+ databases, optimizing their distribution within Azure Elastic Pools to balance resource utilization and reduce the number of required pools, improving scalability and cost efficiency
- Responded to and resolved production incidents, conducting thorough investigations and prioritizing mitigation efforts to maintain high system reliability and minimal downtime
- Established and maintained comprehensive alerts and monitoring systems using Grafana, Azure Monitor, and New Relic to track critical SLIs and SLOs across infrastructure and applications, ensuring proactive issue detection and system health
Principal Software Engineer
Northrop Grumman, Huntsville, AL – October 2020 – May 2021
- Researched and prototyped advanced DevOps practices to assess their applicability in highly secure, classified on-premises environments, ensuring deployments meet strict compliance and security standards
- Utilized Docker, Docker Compose, the ELK stack, and Jenkins CI/CD pipelines to deliver software seamlessly to classified environments, establishing robust logging, monitoring, and automated deployment processes that enhance system reliability
Software Engineer
Northrop Grumman, Huntsville, AL – February 2019 – October 2020
- Implemented DevOps practices for legacy systems to streamline release cycles and accelerate system updates, contributing to enhanced reliability and faster incident response
- Automated the build and delivery processes across multiple operating systems (RHEL and Solaris), reducing manual intervention and ensuring consistent, reliable application deployments
- Designed and maintained Jenkins pipelines to improve visibility into the CI/CD process, ensuring rigorous monitoring and consistent delivery of critical applications
Previous Experience
- Associate Systems Software Engineer, Abaco Systems, Huntsville, AL – May 2018 – February 2019
- Software Engineering Aide, Radiance Technologies, Huntsville, AL – November 2015 – May 2018
Education
- Bachelor of Computer Science – 2018
University of Alabama in Huntsville
- Bachelor of Music – 2011
University of Montevallo