Platform & Orchestration Engineer
Rimini Street
- Hyderabad, Telangana
- Permanent
- Full-time
- Deploy, configure, and operate Restate clusters in production Kubernetes environments (multi-node, cross-AZ replication, S3 snapshots).
- Design and implement Virtual Object patterns for business entities (invoices, purchase orders, vendors) — each a 'living' durable object with its own state and lifecycle.
- Build awakeable patterns for human-in-the-loop approval workflows that can suspend for days and resume on external signals.
- Implement durable timer patterns for non-event detection (overdue approvals, missing payments, SLA breaches).
- Design idempotency strategies per ERP system to guarantee exactly-once writes for financial transactions.
- Monitor and optimize Restate performance using the embedded SQL engine, UI, and CLI tooling.
- Manage service versioning and deployment draining to ensure zero-downtime upgrades of workflow logic.
- Design and build the workflow-as-data engine that enables runtime-configurable business processes stored as JSON/YAML definitions in PostgreSQL.
- Create parameterized workflow templates for each Rimini Solution (Finance, Procurement, Supplier, Expense, Support) with customer-adjustable decision points.
- Implement the generic Workflow Engine Virtual Object that interprets workflow definitions at runtime — approval thresholds, routing rules, escalation timing, autonomy tier per step.
- Build the foundation for a future visual workflow designer by ensuring all workflow logic is externalized from compiled code.
- Own the OPA (Open Policy Agent) policy-as-code layer — write, test, and deploy Rego policies for risk-tiered autonomy (LOW/MEDIUM/HIGH).
- Build the dual-stream audit bridge: operational logs (OpenTelemetry → Jaeger/Prometheus) and compliance logs (structured records → PostgreSQL with 7-year retention).
- Ensure every ctx.run() step in Restate handlers produces a corresponding structured audit record capturing who, what, when, authorization decision, and before/after state.
- Implement the circuit breaker hierarchy (rate limiter → cost governor → error rate breaker → objective breaker → kill switch).
- Implement the four-layer proactive architecture: Sensing (events/timers/MCP Sampling) → Governance (OPA) → Execution (Restate + reactive agents) → Audit (dual-stream logging).
- Build the 'Self-Waking Virtual Object' pattern enabling millions of monitoring agents at near-zero cost through durable sleep.
- Integrate ERP event sources (SAP Event Mesh, Oracle EBS Business Event System) with deterministic filter pipelines.
- Implement trigger recursion detection and prevention ('Did I cause this?' deduplication checks).
- 6+ years of software engineering experience with strong Java or Kotlin proficiency.
- 3+ years working with distributed systems — event sourcing, state machines, message queues, or workflow engines.
- Production experience with at least one durable execution or workflow orchestration platform (Temporal, Restate, AWS Step Functions, Cadence, Conductor, Camunda, or similar).
- Hands-on Kubernetes experience — deploying StatefulSets, managing persistent storage, configuring operators.
- Experience with event-driven architectures (Kafka, RabbitMQ, or cloud-native event services).
- Understanding of exactly-once semantics, idempotency patterns, and distributed transaction strategies (Saga, compensating transactions).
- Experience implementing audit logging and compliance controls in regulated environments (SOX, SOC 2, GDPR) is strongly preferred.
- Java 21 / Quarkus: Core platform language and framework; must be proficient.
- Restate or Temporal: Durable execution platforms; production experience with either, plus willingness to become a Restate expert.
- Kubernetes: Cluster operations, Helm charts, operators, StatefulSets, persistent volumes, pod affinity/anti-affinity.
- PostgreSQL: Audit tables, structured compliance logging, query optimization.
- OPA / Rego: Policy-as-code authoring and testing (or strong willingness to learn).
- OpenTelemetry: Distributed tracing instrumentation, span creation, context propagation.
- Event Streaming: Kafka or equivalent; producing/consuming events, partitioning, consumer groups.
- RocksDB: Understanding of embedded storage (Restate's internal store); helpful but not required.
- Experience with ERP systems (SAP, Oracle EBS, JD Edwards) — even basic familiarity is valuable.
- DBOS Transact experience (PostgreSQL-based durable execution — our fallback option).
- Workflow engine design — building configurable workflow definitions, visual builders, or BPM-style tools.
- AI/LLM integration — understanding agent loops, tool calling patterns, non-deterministic execution flows.
- Redis — caching patterns for permission manifests and OPA policy evaluation results.
- Grafana / Prometheus — building operational dashboards for workflow health monitoring.
- Air-gap deployment experience — self-hostable architectures without cloud dependencies.
- Company: We dream big and innovate boldly.
- Colleagues: We work with extraordinary people who create a culture of mutual respect and collaboration.
- Clients: We relentlessly pursue solutions that help clients achieve their goals. Our unmatched client care is rooted in our passion for exceptional service.
- Community: We believe in leaving the world a better place than we found it. Through the Rimini Street Foundation, we've made a positive impact on six continents, supporting over 425 charities.
- Nasdaq-listed under ticker symbol
- Over 2,000 team members in 23 countries
- US and international recognition for industry leadership and philanthropic efforts. See all of our awards and recognitions here: https://www.riministreet.com/company/awards/