Senior Staff Systems Engineer
ServiceNow
- Hyderabad, Telangana
- Permanent
- Full-time
- Software Development
◦ Build scalable, reliable backend services/APIs/microservices with clear SLAs and contracts.
- Code Quality & Standards
◦ Participate in design/code reviews; enforce linters, static analysis, and style guides.
- Systems Design & Architecture
◦ Integrate with relational/NoSQL stores, streams/queues, and cloud-native services.
- Troubleshooting & Reliability
◦ Instrument services with OpenTelemetry and Prometheus, build Grafana dashboards, and define actionable alerts tied to SLOs.
- Collaboration & Delivery
◦ Author crisp design docs/ADRs and runbooks; provide thoughtful PR reviews and mentorship.
- Cloud Integration
◦ Build with Docker/containerd and operate Kubernetes (ingress, autoscaling, secrets, config).
- CI/CD & Automation
◦ Codify infra with Terraform (modules/workspaces) and adopt GitOps (Argo CD/Flux).
◦ Create internal CLIs/SDKs and scripts (Go/Python/shell) to remove toil and speed dev workflows.
- Security & Compliance
◦ Bake in security checks (SAST/DAST, dependency scanning, SBOM generation and artifact signing) and follow CIS Benchmarks.
- AI & Intelligent Automation
◦ Deliver AI/RAG features and embed AI in ops (log/alert triage, ChatOps).
◦ Ensure quality, safety, and cost control via eval sets, telemetry and token tracking, and PII-safe redaction/rate limits.
Qualifications
To be successful in this role you have:
- Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
- 12+ years building/operating high-scale backend APIs and platforms with end-to-end ownership.
- Go expertise: idiomatic design; goroutines/channels; context/timeouts; sync patterns; memory/GC tuning; pprof/benchmarks; race detector; generics; clean error handling.
- API design & productization: REST/gRPC; OpenAPI/Protobuf; versioning; pagination/filtering; partial responses; idempotency keys; retries/backoff; circuit breakers; rate limiting/quotas; SDK generation; deprecation strategy.
- Data & caching: PostgreSQL/MySQL schema design, migrations, transactions, pooling, replicas; Redis (cache-aside/write-through), locks, TTLs; messaging with Kafka/SQS/Pub/Sub, DLQs, “exactly-once-ish” via idempotency.
- Reliability & observability: structured logs (slog/zap), request/correlation IDs, OpenTelemetry traces/metrics/logs, dashboards & SLOs/error budgets; feature flags; blue-green/canary; chaos/soak tests.
- API security: OAuth2/OIDC/JWT, mTLS, RBAC/ABAC, input validation, OWASP API best practices; secrets via AWS Secrets Manager/Azure Key Vault/GCP Secret Manager; KMS and audit logging.
- Kubernetes ops: EKS/AKS/GKE; cluster upgrades; HPA/VPA; PDBs; network policies; ingress/gateways; storage classes.
- Cloud depth: Strong expertise in one of AWS, Azure, or GCP, with working knowledge of the others.
- CI/CD & supply chain: GitHub Actions/GitLab/Jenkins; multi-stage pipelines; contract/integration tests; k6/Vegeta load tests; linters/static analysis; SBOM & cosign; admission controls/policy-as-code.
- Communication & leadership: clear design docs/ADRs, constructive reviews, mentorship, crisp incident comms.
- AIOps: pragmatic use of embeddings/vector stores and RAG for runbooks/KB, with evals, cost/latency tracking, and PII-safe guardrails.