Site Reliability Engineer (SRE) Lead

CloudMD Markham, ON

Who We Are

At CloudMD, we are revolutionizing healthcare delivery through technology and a patient-centric approach, ensuring continuity of care remains at the forefront of our mission. By harnessing the power of healthcare technology, we are constructing an integrated platform that caters to every step of a patient's healthcare journey, ultimately granting improved access to care and enhanced outcomes. Within our technical team, we uphold a pragmatic approach to software design, development, and deployment. Our primary focus lies in creating software solutions that excel in real-life scenarios, addressing genuine needs and challenges. Recognizing the significance of well-defined and repeatable development and release processes, we prioritize consistency, simplicity, and automation wherever possible. This commitment allows us to continually advance our services and exceed expectations.

 

Who You'll Work With

As the Site Reliability Engineer (SRE) Lead, you will manage, lead, and mentor a small team while working hands-on with the Microsoft Azure cloud. Reporting to the Director of Software Engineering, you will work with a close-knit team of diverse professionals from Engineering, Product, and Design.

 

What You'll Be Doing

Join CloudMD as a Site Reliability Engineer (SRE) Lead and be at the forefront of revolutionizing the digital healthcare landscape. As a valued team member, you will be instrumental in designing and implementing the container-based cloud infrastructure behind our products, with a primary focus on our flagship platform, Kii Health. This comprehensive platform brings together various CloudMD services, including mental health services, iCBT, virtual medical care, employee health assistance programs, and more. Embracing a collaborative, team-driven approach, you will work in an environment centred around sprints and Jira. Kii Health is powered by modern Java Quarkus services at its core, complemented by a Vue.js frontend, and hosted on Azure Kubernetes, all deployed using automated CI/CD processes. Integrations between our internal services and third-party partners are built using the OIDC and SAML 2.0 protocols.

To thrive in this role, the ideal candidate should have experience developing and maintaining large-scale enterprise applications, along with an SRE cloud background using Python, Go, or similar languages. Familiarity with contemporary TypeScript/JavaScript frameworks like Vue.js, React, or Angular would be advantageous, though not mandatory.

You will play a critical role in implementing and maintaining Azure infrastructure while ensuring the reliability, availability, and performance of the data platform services. As an SRE Azure Architect, you will collaborate with cross-functional teams to build scalable and efficient solutions that align with our business goals.

  • Azure Infrastructure Design: Analyze and optimize the design and architecture of the Azure-based Enterprise Data and Analytics platform to meet the organization's performance, scalability, security, and cost-efficiency requirements.
  • Reliability Engineering: Implement SRE principles to improve the reliability and availability of services by designing automated monitoring, alerting, and incident response systems. 
  • Infrastructure as Code (IaC): Utilize Infrastructure as Code tools (e.g., Terraform, ARM templates, YAML, Shell) to automate the provisioning and management of Azure resources.
  • Performance Optimization: Identify and address performance bottlenecks within the Azure environment through monitoring, analysis, and tuning of infrastructure components.
  • High Availability and Disaster Recovery: Design and implement solutions for high availability and disaster recovery across Azure regions and availability zones.
  • Automation: Develop and maintain automation scripts and tools to streamline deployment, scaling, and management of Azure resources. Build and manage DevSecOps pipeline automation using Azure DevOps, GitHub, etc.
  • Collaboration: Collaborate with development, operations, and security teams to ensure smooth deployment and operation of applications on Azure infrastructure.
  • Incident Response: Participate in on-call rotations, responding to incidents, diagnosing and resolving issues promptly, and conducting post-incident reviews.

What You Need to Be Successful

  • Bachelor's degree in Computer Science, Information Technology, or related field. Master's degree is an asset. 
  • 5+ years of cloud SRE experience, preferably in Microsoft Azure.
  • Professional certifications such as Microsoft Certified: Azure Solutions Architect Expert, Microsoft Certified: Azure DevOps Engineer Expert, or relevant SRE certifications.
  • Extensive experience designing, implementing, and managing Azure-based solutions in a production environment.
  • Strong background in Infrastructure as Code (IaC) practices and tools, such as ARM templates and Terraform.
  • Proficiency in scripting and automation using languages like PowerShell, Python, or Bash.
  • Hands-on experience with Databricks clusters, Azure Data Factory (ADF), Azure Data Lake, and Apache Spark (UI and command line), including the ability to analyze, debug, and deliver insights from driver and executor logs.
  • Deep understanding of SRE methodologies, including monitoring, alerting, incident management, and capacity planning.
  • Knowledge of Cloud Security capabilities and frameworks.
  • Knowledge of network architecture, security best practices, and compliance standards within the Azure ecosystem.
  • Excellent problem-solving skills and the ability to troubleshoot complex technical issues efficiently.
  • Strong communication skills and the ability to work collaboratively in cross-functional teams. 
  • Prior experience mentoring or leading junior SREs or engineers is a plus.
  • Experience collaborating with other developers, product managers, and stakeholders.

 

Additional Skills Required

  • Application development experience and background.
  • Familiarity with common stacks, such as MEAN, MERN, LAMP, etc.
  • Knowledge of multiple programming languages and front-end libraries, such as Python, Go, HTML, CSS, JavaScript, jQuery, React, Angular, etc.
  • Knowledge of multiple database technologies, such as MySQL, MongoDB, PostgreSQL, etc.
  • Knowledge of web servers, such as Apache, Nginx, etc.
  • Knowledge of web development tools, such as Git, Webpack, Babel, etc.
  • Knowledge of web development best practices, such as Agile methodologies, RESTful APIs, etc.
  • Knowledge of Docker, Kubernetes, and Helm.
  • Ability to work independently and with a team.
  • Ability to learn new technologies quickly.
  • Ability to solve complex problems creatively.
  • Attention to detail and quality.

We thank everyone who is interested in our role. We'll reach out to candidates directly if we are interested in moving forward.

CloudMD is an equal opportunity employer. We do not discriminate on the basis of race, ancestry, religion, color, national origin, gender, sexual orientation, gender orientation or expression, political belief, age, marital status, or disability status.

CloudMD is also committed to fostering a culture of belonging, which includes ensuring an accessible work environment and employment practices. If you require an accommodation in completing any pre-employment assessments or applications, interviewing or otherwise participating in the recruitment process, please email recruiting@cloudmd.ca.

Protecting the safety and welfare of employees, clients, and patients that use our services is of utmost importance to us. For this reason, final applicants will be asked to undergo a background check.

 
 