ESOM ECO OTG Problem Solving Engineer
Veterans Engineering is seeking a Problem-Solving Engineer (PSE) for our project supporting the VA. The successful candidate will work with the operation triage group, incident and event management, problem management, and DevOps teams to guide the investigation and resolution of major enterprise IT incidents. Using multiple monitoring tools, logs, customer reports, and other inputs, you will assist in diagnosing culpable systems or components and recommending corrective actions. You will also detect, investigate, and diagnose system problems and defects across enterprise-level applications and technology stacks. Upon resolution of incidents, you will provide retrospective analysis and improvement recommendations to help prevent similar future outages.
As the PSE, you will be on the front line of the VA enterprise, working with system and application owners and incident management teams. You will leverage comprehensive workflows and application processes within multiple system environments and work across technology and development teams to diagnose outages and recommend changes to increase reliability. You will work with application developers, system administrators, cybersecurity/identity and access management staff, and network administrators to troubleshoot performance issues and outages.
This position allows remote delivery anywhere within the U.S., including the District of Columbia, and may include shift work support during weekends, holidays, or off-hours, as required.
Requirements:
- 10+ years of experience in one or more Technology Areas (Network, Windows, Desktop, Unix/Linux, AWS or Azure Cloud, WebSphere Middleware, Java/JS Development, Microsoft or Oracle Database)
- 5+ years of experience working with key indicators for IT system operability, reliability, application performance and code quality
- Experience deploying, maintaining, and troubleshooting complex applications at an enterprise scale while working with cross-functional teams
- 3+ years of monitoring and troubleshooting experience with one or more of the following APM tools: SolarWinds, AppDynamics, Dynatrace, Aternity, or ServiceNow Operator Workspace
- Experience monitoring and troubleshooting application logging using Splunk
- Experience in service virtualization, AWS or Azure Cloud technologies, and SaaS and PaaS implementation
- Possession of strong critical thinking and error assessment capabilities
- Experience with using Microsoft Office, including Word, Excel, and PowerPoint
- Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
- Master’s Degree in Computer Science, Engineering, or Equivalent and 10 total years of experience; or 20 total years of relevant experience in lieu of a degree
Desired Skills:
- Experience with test-driven development, distributed systems, microservices and cloud-native application implementation
- Experience with the following tools: ScienceLogic SL-1, Riverbed – Aternity, and ServiceNow
- Possession of excellent written and verbal communication skills
- Experience working in an Operations Center or other fast-moving operations environment
- Experience with virtual team management