The Site Reliability Engineer (SRE) works independently and is responsible for the overall performance, stability, reliability, and scalability of enterprise-wide, internet-facing systems. The SRE ensures that the organization's complex, web-scale systems are healthy, monitored, automated, and designed to scale. The SRE will direct and lead continuous improvement efforts and the incident response of cross-functional support teams to troubleshoot and address database, OS, application, network, and any other issues.
PRIMARY DUTIES AND RESPONSIBILITIES
Essential Functions:
· Serves as a subject matter expert in several technical domains, including infrastructure, storage design, operating systems, networking, and engineering.
· Leads and drives continuous improvement initiatives within areas of domain expertise.
· Uses technical expertise to review and make recommendations for engineering reliability into the code, infrastructure, OS, network, and processes used to ensure the application is always fast, available, and scalable.
· Oversees the monitoring of software performance, packet flow, hardware, and how code interacts in support of managing services; predicts and prevents failures.
· As a subject matter expert, makes recommendations to development teams to ensure the availability, speed, scalability, and efficiency of services by engineering reliability into software and monitoring systems.
· Responds to and resolves emergent service problems; builds custom tools to automate daily functions and prevent problem recurrence.
· Works in close contact with Architecture, Development, and Infrastructure teams on software and system performance analysis and tuning, service capacity planning, and demand forecasting; coordinates the efforts of cross-functional teams to design and implement solutions.

Required Knowledge and Skills
Education:
· 10+ years engineering and/or administering a high-volume or critical production service environment running on a UNIX/Linux platform.
· Strong working knowledge of C, C++, or Java, and of Shell, Perl, or Python.
· Hands-on experience with Apache, JBoss, Tomcat, load balancers (F5), and firewalls.
· Understanding of IP networking, network devices, and common topologies.
· Proven technical troubleshooting and performance tuning experience.
· Excellent analytical skills, coupled with a strong sense of ownership, urgency, and drive.
· Ability to troubleshoot and resolve customer problems as they arise with a high degree of independence, and to manage multiple task assignments.
· Excellent written communication skills.
Preferred Knowledge and Skills
· Working knowledge in the following areas:
o PureData (DB2)
o MongoDB
o Redis
o Qpid (MRG)
o DataPower

Working Conditions

Decision Making, Autonomy, Budgeting Authority & Supervisory Responsibility
Scope of impact of decisions:
Result of decisions:
Problem Solving:
Knowledge Level:
Level of Autonomy:
Budget Responsibility:
Management:
Please call or email
Janelle Razzino
Razzino Associates, Inc.
Suite E-1
(O) 201-722-3111
(c) 201-925-6086
(f) 201-722-3113
http://www.razzinoassociates.com/
Posted by: John Rechenberg <jar1@optimum.net>