Senior Site Reliability Engineer

Full Time
Baltimore, MD 21244
Job description

The Civilian Health Solutions Group has an opening for a Sr. Site Reliability Engineer to support a large healthcare contract.

Leidos is hiring an energetic, motivated, innovative individual to be a part of our team supporting the Centers for Medicare & Medicaid Services (CMS) in Baltimore, MD. The Site Reliability Engineer works closely with the Program team to manage, maintain, and optimize the applications’ data and infrastructure that support CMS and the public. You will deliver solutions that ultimately ensure that the functions of Medicare, Medicaid, and Marketplace are carried out for US citizens and contribute to efforts to reduce healthcare costs.

The role of a Site Reliability Engineer will require you to develop highly innovative solutions through research and integration of best practices. You will influence the development of solutions that affect strategic project and program goals and business results while also leading the work of other technical staff. You will resolve highly complex problems using significant application of technical knowledge, conceptualizing, reasoning, and interpretation. You will interact daily with various technical resources across different vendors that are fulfilling technical requirements for the customer.

The current work environment is remote, leveraging tools such as Slack, Microsoft Teams, and Zoom.

Primary Responsibilities

  • The successful candidate will be a member of a cross-functional team composed of well-rounded engineers who rapidly learn new skills and work across multiple disciplines to carry out end-to-end delivery of infrastructure services.

  • The Site Reliability Engineer works closely with the Integrated Service Delivery (ISD) teams to manage, maintain, and optimize the applications’ data and infrastructure that support CMS and public health. You will maintain and deliver solutions that ultimately ensure that the functions of Medicare, Medicaid, and the Healthcare.gov Marketplace are carried out for US citizens and contribute to efforts to reduce healthcare costs.

  • Work closely with Leidos Engineering and Operations staff as well as the customer’s application owners to solve technical problems at the network, system, and application levels.

  • Serve as the technical lead and point of contact in all areas of telemetry and observability.

  • Help design, test, and deploy technical solutions that are innovative and that leverage new technologies and new methods to shape customer operations.

  • Responsible and accountable for managing and following up on incidents, changes, and application release problems through the management channels.

  • Participate in on-call rotation and respond to incident alerts.

  • Build software and systems while managing the platform infrastructure and applications.

  • Create and maintain continuous integration/continuous delivery (CI/CD) pipelines.

  • Focus on proactivity and enablement of self-healing systems.

  • Serve as the expert in the creation of KPIs and alerting thresholds for meaningful metrics related to the health and performance of the applications the team manages.

  • Ensure the availability, reliability, security, and performance of all resources across various applications, and report on them to owners in a timely manner.

  • Must be a team player, but able to work independently on large, complex projects and assignments in a fast-paced environment.

  • Provide leadership in problem determination/analysis, isolating system problems utilizing diagnostic and system management tools.

  • Always provide professional and courteous service with excellent verbal and written communication skills.

  • Model inclusive leadership to teammates by building diversity into activities and meetings.

Basic Qualifications:

  • BS degree in computer science or an equivalent, highly technical discipline. Experience may be substituted for a degree.

  • 5+ years of technical engineering experience relevant to the responsibilities of the Site Reliability Engineer position.

  • Thorough understanding of microservice based architecture.

  • Thorough understanding of coding best practices, including the ability to code in a variety of languages (e.g., Python, Golang, Ruby, C/C++) in both structured and object-oriented styles.

  • Proficient in programming languages for automation (e.g., Python) and shell scripting (e.g., Bash).

  • Deep knowledge of version control (e.g., Git) and ability to create GitOps practices.

  • Extensive experience with configuring and maintaining monitoring and alerting tools such as Nagios, CloudWatch, Grafana, Prometheus, Splunk ITSI.

  • Proficient in incident management tools (e.g., Splunk On-Call, PagerDuty).

  • Experience with a variety of relational and non-relational databases/RDS (e.g., DynamoDB, MongoDB, Cosmos DB, PostgreSQL).

  • Strong and relevant experience in cloud technologies, cloud services, IaC, cloud storage, cloud networking and cloud security.

  • Strong knowledge and experience with Cloud IaaS, PaaS, and SaaS offerings.

  • Strong experience with automation and CI/CD tools (e.g., Argo, Jenkins, Travis, Ansible).

  • Knowledge of cloud-based security tools, best practices and policies including demonstrated experience protecting all layers of the application stack.

  • Knowledge of the Software Development Life Cycle (SDLC).

  • Excellent writing and verbal communication skills.

  • Ability to manage conflict effectively.

  • Ability to adapt and be productive in a fast-paced dynamic environment.

  • Excellent communication and collaboration skills supporting multiple stakeholders and business operations.

  • Self-starter, self-managed, and a team player.

Preferred Qualifications

  • Cloud certification (e.g., AWS Solutions Architect Associate, Azure Administrator).

  • Monitoring certification (e.g., Splunk, Prometheus, Datadog).

  • Experience with containerization and orchestration tools (e.g., Kubernetes, Docker).

  • Experience with orchestrating ChatOps.

  • Experience with setting up self-healing components within an application’s infrastructure.

  • Agile-based knowledge and skill, including experience with Scrum ceremonies and work management tools (e.g., Jira, Confluence).

  • Security skills: knowledge of information assurance compliance and information security basics within CMS.

Required Clearance

  • Ability to obtain a Public Trust clearance.

All candidates supporting the CMS programs must have lived in the United States for at least three (3) of the last five (5) years in order to be considered.

Pay Range:

$97,500.00 - $150,000.00 - $202,500.00

The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.

#Remote
