
Site Reliability Engineer/System Administrator at ENGIE
- Kenya
- Permanent
- Full-time

We are seeking a talented and experienced System Administrator/Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems and services. You will collaborate with cross-functional teams to implement and maintain robust infrastructure solutions, focusing on automation, monitoring, and incident response. The ideal candidate is passionate about optimizing and enhancing system reliability, possesses strong problem-solving skills, and is committed to driving excellence in operational practices.

- Infrastructure Automation:
  - Develop and maintain automation tools and scripts for provisioning, configuration, and deployment.
  - Implement infrastructure as code (IaC) practices to ensure consistency and reproducibility.
- Monitoring and Incident Response:
  - Set up and maintain monitoring systems to detect and respond to performance issues and outages.
  - Participate in on-call rotations: respond promptly to incidents, troubleshoot them, and implement solutions to prevent recurrence.
- Performance Optimization:
  - Optimize system performance through continuous analysis and tuning.
- Reliability Engineering:
  - Implement best practices for reliability, such as error budgets, SLIs/SLOs, and blameless post-mortems.
  - Work towards minimizing manual intervention through automation.
- System Administration:
  - Manage and maintain server infrastructure, including installation, configuration, and troubleshooting of operating systems.
  - Implement and maintain security measures, such as firewalls and intrusion detection systems.
  - Perform regular system backups and recovery procedures.
- Collaboration and Communication:
  - Collaborate with cross-functional teams to align infrastructure and operational requirements.
  - Provide technical guidance and support to colleagues in areas related to reliability.
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Site Reliability Engineer or System Administrator.
- Strong Linux and Bash scripting skills.
- Proficiency in cloud platforms (e.g., AWS, Azure, GCP, Linode, DigitalOcean).
- Experience with container and orchestration tools (e.g., Kubernetes, Docker, LXD).
- In-depth knowledge of networking, security, and system administration.
- Familiarity with infrastructure as code tools (e.g., Terraform, Ansible).
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- Experience with CI/CD pipelines and related tools.
- Knowledge of distributed systems and microservices architecture.
- Familiarity with observability tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with programming languages (e.g., Python, Ruby).