
SRE Director, Software Production Management and Reliability Engineering
- Bangalore, Karnataka
- Permanent
- Full-time
At Morgan Stanley India, we support the Firm’s global businesses, with critical presence across Institutional Securities, Wealth Management, and Investment Management, as well as in the Firm’s infrastructure functions of Technology, Operations, Finance, Risk Management, Legal and Corporate & Enterprise Services. Morgan Stanley has been rooted in India since 1993, with campuses in both Mumbai and Bengaluru. We empower our multi-faceted and talented teams to advance their careers and make a global impact on the business. For those who show passion and grit in their work, there’s ample opportunity to move across the businesses.
Interested in joining a team that’s eager to create, innovate and make an impact on the world? Read on…
What you’ll do in the role:
- Proactively detect, troubleshoot, and resolve all issues affecting production applications, coordinating with and escalating to development and external teams where necessary. The team owns every issue escalated to it until the issue is resolved or a workaround is provided so that end users can continue working.
- Develop and continually revise (in partnership with other teams where necessary) suitable policies and procedures to ensure that appropriate application development standards guide the development of systems deployed to Production.
- Act as gatekeeper of the Production environment, ensuring that Change Implementation Management guidelines and policies are followed for all systems deployed to Production.
- Service all requests for data or other activities that require access to Production systems.
- Work with development teams at the appropriate stages of application development to ensure that new systems and projects meet the Production standard.
- Maintain and grow a body of knowledge that is accessible to all team members, ensuring that information about support-related activities and issues is readily available. The goal is to improve self-reliance and reduce dependency on development or external team resources for the initial troubleshooting and resolution of problems.
- As a team member with expertise in deep analytical triage, you will provide subject matter expertise in debugging, issue analysis and troubleshooting, working with business and technical colleagues to provide reviews and recommendations that prevent future application issues. You will produce guidance documentation, standards and procedures, product assessments, and training material, and work with the various application and infrastructure support teams to ensure they document every troubleshooting step in the Morgan Stanley knowledge base system so that issues are resolved faster. You will serve as a seasoned, proficient technical resource, providing technical knowledge in outage management and proactive solutions that improve the user experience.
What you’ll bring to the role:
- At least 4 years of relevant experience would generally be expected to have acquired the skills required for this role
- Minimum of 7 years of extensive experience with Mainframe (batch and backend processing), SQL, MSSQL and Teradata technologies.
- 5+ years of experience handling mainframe and Autosys job abends. Good understanding of COBOL, JCL, and Mainframe and distributed DB2 technologies.
- SQL/Databases (MSSQL/SYBASE/DB2): Understanding of tables, views, indexes, and stored procedures, and the ability to understand them by reading their definitions. Familiarity with SQL constructs. Understanding of transactions, query plan analysis and database troubleshooting.
- Unix / Linux: Experience supporting Unix-based applications, including troubleshooting in a Unix environment.
- Shell Scripting: Ability to write shell scripts from scratch and to understand existing scripts by reading them.
- Autosys: Ability to create and debug Autosys jobs and dependencies. Ability to analyze a complex job stream and correct any inconsistencies, errors or omissions and point out potential problems.
- Experience with languages such as Java, COBOL, PowerShell, PHP, Python, Perl, and Ruby
- Experience with observability tools such as Prometheus, Grafana, Loki, Kibana, and Splunk
- 5+ years of experience in a production environment with a solid software development background and understanding of performance tuning, end-to-end troubleshooting, networking fundamentals and appropriate attention to detail.
- Have administrative competence in at least one major programming language or platform (for example: COBOL, JCL, Perl, PowerShell, Python, Java or C#)
- Experience with or knowledge of distributed web hosting services, databases and MQ processing (e.g., Tomcat, WebSphere, Microsoft IIS, DB2, MSSQL)
- Experience in developing and/or supporting Distributed, Mainframe batch and ETL technologies.
- Experience working with job schedulers such as TWS, comet and Autosys.
- Good working knowledge of cloud engineering. Understanding of private cloud principles and exposure to public cloud offerings such as AWS, Azure or similar technology is preferred
- Willingness to embrace Agile and DevOps/SRE concepts.
- Windows: Basic understanding of the Windows environment
- Experience with on-call incident response and the ability to respond to emergencies on a 24/7 basis
- Proven ability to understand and troubleshoot complex problems under pressure
- Hands-on experience administering large-scale, high-availability systems and the tools to monitor performance and availability
- Solid analytical skills and experience with problem determination and resolution/recovery processes
- Ability to interface and cultivate excellent working relationships with technology teams, business analysts, and vendors
- Ability to learn new technologies quickly in a fast-paced environment
- Have strong organizational skills and the ability to manage multiple tasks and high-pressure situations for outage handling, management, or resolution
- Are driven to learn about new technologies, techniques and what it takes to be an integral member of this team
- Experience creating technical architecture documentation
- Excellent communication and writing skills for technical discussions at all levels of management
- BS/MS or equivalent, preferably in a quantitative discipline (Computer Science, Computer Engineering, EE, Math, Physics).