Our client is a well-funded startup in the cybersecurity space that is creating a platform to help companies better understand and explore online communities. The Site Reliability Engineer will own the technical vision for the company's infrastructure, assess the current cloud provider and make recommendations, migrate infrastructure to a new cloud-based platform that suits the business, oversee process management, and set the standard for how the company handles outages.
- Analyze, design, and implement the cloud strategy to enable a flexible and highly available environment
- Own the overall health of the infrastructure
- Evaluate the existing environment, then research and recommend options for migrating to a new cloud provider
- Create a platform to help understand and explore online communities.
- Work with a rapidly growing team of software engineers, front-end engineers, and data scientists.
- Build out systems that can scale and are reliable.
- Collaborate closely across teams; there is a lot to do and no shortage of exciting work.
Required Skills & Qualifications
- 3+ years managing infrastructure
- Experience with cloud-based distributed systems including Azure, AWS, and/or GCP
- Experience in an Agile project delivery lifecycle
- Experience with Docker and a swarm/cluster management framework (Kubernetes, Mesos, etc.)
- Experience building scalable products and data pipelines
- Experience with the following is a plus: Ansible, Jenkins, Terraform, Logstash
- Experience working at a startup
- Skills and personality to operate effectively in a very fast-paced, complex business
- Knowledge of data warehousing and data processing technologies and trends