Job Description
Snapshot
The Site Reliability Engineer IV will collaborate with data and platform engineers, data scientists, and business teams to create infrastructure designs, monitor the health of our pipelines, and guide the development and implementation of cloud applications, systems and processes. You should be familiar with:
- SME - Cloud and virtualization-based technologies (AWS/Azure/VMware or similar)
- Cloud infrastructure and application deployment automation (Ansible/Terraform/CloudFormation)
- Strong knowledge of big data
- Infrastructure monitoring tools (ELK stack or similar)
- Strong knowledge of developing and implementing helpdesk and IT operations best practices, including expert knowledge of security, storage, data protection, and disaster recovery protocols.
- Experience with Apache web server configuration and management.
- Management, deployment and troubleshooting of Java and Tomcat applications.
- Experience analyzing and evaluating the security of new and existing IT systems and the procedures to protect information system assets from intentional or inadvertent modification, disclosure, or destruction.
- Experience analyzing and evaluating the design and operating effectiveness of the information technology and security controls that are in place.
- SME - Big data technologies (CDH/MapR or similar)
- CI/CD tooling experience (Atlassian or similar)
- Java application server administration (Tomcat or similar)
- Containerization technologies (Docker/Kubernetes or similar)
- SME - Operating system administration (Linux/Red Hat/CentOS)
- SME - Configuration management tools (Chef/Puppet or similar)
- SME - Scripting (Bash) and programming (Python/Ruby or similar)
How you will help:
- Day-to-day operations of all in-house developed, open-source, and commercial applications
- Ensure system availability, performance, capacity, and security
- Monitor and address incidents, events, and problems in a timely manner
- Develop procedures to automate building and deployment of systems and tasks
- Execute system administration of hosting platforms on private and public infrastructure
- Automate infrastructure and application deployment tasks
- Mentor and manage a team of 15+ full-stack IT Operations engineers. Facilitate training on leadership, ownership, and advanced technologies.
- Manage and coordinate the day-to-day planning, design and implementation of services, infrastructure automation, application deployments, change requests, infrastructure projects and customer implementations.
- Lead the timely resolution and prevention of application issues to maintain high customer satisfaction.
- Responsible for IT service delivery and remotely managing application infrastructure in the US, Europe, and Asia by collaborating with SMO, PMO SCV, Network Engineering, EOC and DB teams.
- End-to-end Incident and Change Management responsibilities using ITIL standards within a follow-the-sun context.
- Provide service to Synchronoss customers as part of the 24x7 operations support team.
- Manage communication with internal and customer stakeholders.
Who we have in mind:
- Minimum of 10 years of varied experience in operational roles supporting large, complex 24x7 platforms.
- Minimum 5 years of relevant hands-on technical management experience with systems architects/administrators as well as a record of individual technical achievement.
- Good understanding of, and experience working with, different technical architectures and hosting models (IaaS, PaaS, SaaS).
- Good understanding of PCI, SSAE 16, ISO or equivalent certifications.
- Strong background in infrastructure automation and application deployment principles.
- Experience with AWS, private cloud and other public cloud offerings.
- Expert knowledge of operational support processes, reporting, and related capacity and performance monitoring tools.
- Experience in delivery of large-scale Linux/UNIX, Oracle/MySQL, and JVM-based API services.
- Excellent skills in collaborative and participatory teamwork across global geo-regions.
- Excellent written, verbal, and presentation skills.
- Proficient in Bash scripting and a programming language such as Python or Ruby
- At least 8 years of experience with the Software Development Life Cycle (SDLC)
- At least 8 years of experience in Build and CI/CD technologies
- At least 8 years of experience deploying Tomcat/Apache/WebLogic applications on premises and in the cloud
- At least 2 years of experience with containerized environments and orchestration tools
- Familiarity with distributed file systems
- 2-3 years of experience leading a team of 6-8
- 2-3 years of experience delivering and running projects related to SaaS operations