Site Reliability Engineer

IronNet Cybersecurity Inc., Fulton, MD or McLean, VA

Job Description

As a Site Reliability Engineer at IronNet, you will be a technology leader on a dynamic team that continuously improves the reliability and scalability of our AWS and on-prem deployment platforms, ensuring the highest levels of uptime and performance.

Responsibilities

  • Troubleshoot issues across IronNet’s large-scale, complex product stack, including hardware, software, applications, and networks. Perform deep dives into both systemic and latent reliability issues, and partner with software and systems engineers across the organization to produce and roll out fixes.
  • Lead post-sales site surveys and implement scalable, robust solutions that meet the needs of IronNet customers.
  • Drive standardization, enhanced monitoring, troubleshooting best practices, and automation efforts across the organization.
  • Recommend and make improvements to our existing production and staging environments.

Technical Requirements

  • Fundamental knowledge of operating systems, networking, and distributed systems.
  • Expert-level Linux systems administration and management, including Linux/UNIX performance tuning.
  • Deep understanding of Ethernet, VLAN, IPv4/IPv6, ARP, DHCP, DNS, and TCP. Comfortable configuring DNS, DHCP, and LAN/WAN technologies.
  • Demonstrable knowledge of TCP/IP, HTTP, and web application security, and experience supporting multi-tier web application architectures.
  • Expert-level knowledge of at least one public or private cloud technology, such as Amazon Web Services (AWS) or OpenStack.
  • Familiarity with OS container technologies: Docker, LXC, and namespaces/cgroups.
  • Solid understanding of systems, application, and service design, including messaging protocols and behavior, caching strategies, and software design practices.
  • Practical, solid knowledge of shell scripting and at least one higher-level language (Python or Ruby preferred). Ability to develop clean, tested, and maintainable automation and other tools in one or more of Python, Ruby, Perl, or Go.
  • Experience with distributed compute (e.g., Spark or Hadoop), storage (relational databases such as Postgres or MySQL, and horizontally scalable non-relational databases such as HBase, Riak, or Cassandra), and search infrastructure (such as Elasticsearch or Solr/Lucene).

Other Requirements

  • Minimum of 7 years managing services in an internet-scale *nix environment.
  • Previous application operations experience (a.k.a. "site reliability engineering" or "production engineering"), or experience in a large-scale 24/7 production environment as a software engineer, systems administrator, operations engineer, release engineer, or similar role.
  • Comfortable interfacing with customers as well as with engineering, product, and security teams. Excellent written communication, interpersonal, and documentation skills.
  • Must be adaptable and able to prioritize tasks and work independently.
  • Excellent troubleshooting, diagnostic, and analytical skills.
  • B.S. in computer science or a similar field desirable.
  • Must be able to travel up to 50% of the time and provide after-hours support for customer environments.
