Integration Design

Integration Design Document

Project: {{PROJECT_NAME}}
Integration: {{INTEGRATION_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |

1. Integration Overview & Context

Integration Name: {{INTEGRATION_NAME}}
Type: Synchronous (REST/gRPC) | Asynchronous (Events/Queue) | Bidirectional | File-based

Business Purpose: {{WHY_THIS_INTEGRATION_EXISTS}}

Criticality: Critical | High | Medium | Low

Parties:

| Party | System | Team | Contact |
|---|---|---|---|
| Consumer (caller) | {{CONSUMER_SYSTEM}} | {{TEAM_A}} | {{CONTACT_A}} |
| Provider (server) | {{PROVIDER_SYSTEM}} | {{TEAM_B}} | {{CONTACT_B}} |

2. Integration Topology Diagram

flowchart LR
    subgraph ConsumerSide["Consumer — {{CONSUMER_SYSTEM}}"]
        C_SVC[{{ConsumerService}}]
        C_CB[Circuit Breaker]
        C_RETRY[Retry Handler]
    end

    subgraph Integration["Integration Layer"]
        GW[API Gateway / Load Balancer]
        Q[Message Queue\n{{QUEUE_NAME}}]
        DLQ[Dead Letter Queue\n{{DLQ_NAME}}]
    end

    subgraph ProviderSide["Provider — {{PROVIDER_SYSTEM}}"]
        P_SVC[{{ProviderService}}]
        P_DB[(Provider DB)]
        P_WORKER[Event Worker]
    end

    C_SVC --> C_CB
    C_CB --> C_RETRY
    C_RETRY -->|HTTPS REST| GW
    GW --> P_SVC
    P_SVC --> P_DB

    P_WORKER -->|Publish| Q
    Q -->|Consume| C_SVC
    Q -->|Failed| DLQ
    DLQ -->|Alert| AlertSystem[PagerDuty]

3. Service Contracts

3.1 Integration: {{INTEGRATION_NAME_1}}

Protocol: REST/HTTPS | gRPC | GraphQL | WebSocket | AMQP
Direction: {{CONSUMER}} → {{PROVIDER}}
Idempotency: YES (send the X-Idempotency-Key header) | NO

Authentication

| Method | Details |
|---|---|
| Type | Bearer JWT |
| Header | Authorization: Bearer {{TOKEN}} |
| Key rotation | Every {{ROTATION_PERIOD}}, coordinated via {{ROTATION_PROCESS}} |
| Token endpoint | {{AUTH_ENDPOINT}} (if OAuth2) |

Request Contract

Endpoint: {{HTTP_METHOD}} {{BASE_URL}}/{{PATH}}

Headers:

Authorization: Bearer {{JWT_OR_API_KEY}}
Content-Type: application/json
Accept: application/json
X-Request-ID: {{UUID}}
X-Idempotency-Key: {{IDEMPOTENCY_KEY}}

Request Body:

{
  "{{field1}}": "{{type}} — {{description}}",
  "{{field2}}": "{{type}} — {{description}}",
  "metadata": {
    "sourceSystem": "{{CONSUMER_SYSTEM_ID}}",
    "timestamp": "ISO8601"
  }
}
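As an illustration of the request contract above, the following sketch assembles the headers and body on the consumer side. The function name `build_request` and its parameters are hypothetical. It also assumes that X-Request-ID is generated fresh for each HTTP call while X-Idempotency-Key is reused across retries of the same logical operation, which this template does not state explicitly.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional, Tuple


def build_request(payload: dict, token: str, source_system: str,
                  idempotency_key: Optional[str] = None) -> Tuple[dict, str]:
    """Return (headers, json_body) shaped like the request contract."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
        # Assumption: a new ID per HTTP call, so each attempt is traceable.
        "X-Request-ID": str(uuid.uuid4()),
        # Assumption: the caller passes the same key again when retrying.
        "X-Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    body = dict(payload)
    body["metadata"] = {
        "sourceSystem": source_system,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
    }
    return headers, json.dumps(body)
```

A caller would generate the idempotency key once per business operation, store it, and pass it unchanged on every retry.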

Successful Response 200 / 201:

{
  "{{responseField1}}": "{{type}}",
  "{{responseField2}}": "{{type}}",
  "requestId": "echo of X-Request-ID"
}

Error Handling

| HTTP Status | Error Code | Consumer Action |
|---|---|---|
| 400 | VALIDATION_ERROR | Log error, do NOT retry; fix the request |
| 401 | UNAUTHORIZED | Refresh token, retry once |
| 403 | FORBIDDEN | Alert engineering, do NOT retry |
| 404 | NOT_FOUND | Log, do NOT retry; check the resource ID |
| 409 | CONFLICT | Log and skip; with an idempotency key, treat as already processed |
| 422 | BUSINESS_RULE | Log error, do NOT retry; escalate |
| 429 | RATE_LIMITED | Back off per the Retry-After header, then retry |
| 500 | INTERNAL_ERROR | Retry with exponential backoff |
| 502/503 | UNAVAILABLE | Retry with exponential backoff; failures count toward the circuit breaker, which fails fast while open |
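The error table can be encoded as a lookup so that every call site reacts the same way. This is a minimal sketch: the `Action` enum and `consumer_action` function are hypothetical names, 502 and 503 are classified as retryable to match the retry policy (the circuit breaker is assumed to handle fail-fast separately), and the fallback for unlisted status codes is an assumption.

```python
from enum import Enum


class Action(Enum):
    FAIL = "log, do not retry"
    REFRESH_AND_RETRY_ONCE = "refresh token, retry once"
    SKIP = "log and skip"
    WAIT_RETRY_AFTER = "back off per Retry-After"
    RETRY_BACKOFF = "retry with exponential backoff"


_ACTIONS = {
    400: Action.FAIL,
    401: Action.REFRESH_AND_RETRY_ONCE,
    403: Action.FAIL,
    404: Action.FAIL,
    409: Action.SKIP,
    422: Action.FAIL,
    429: Action.WAIT_RETRY_AFTER,
    500: Action.RETRY_BACKOFF,
    502: Action.RETRY_BACKOFF,
    503: Action.RETRY_BACKOFF,
}


def consumer_action(status: int) -> Action:
    """Map an HTTP status to the consumer action from the error table."""
    if status in _ACTIONS:
        return _ACTIONS[status]
    # Assumption: unlisted 5xx codes are transient, everything else is not.
    return Action.RETRY_BACKOFF if status >= 500 else Action.FAIL
```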

Retry Policy

Max retries: {{MAX_RETRIES}} (retry only on 500, 502, 503, 429, network errors)
Strategy: Exponential backoff with jitter
Delays: [{{DELAY_1}}ms, {{DELAY_2}}ms, {{DELAY_3}}ms]
Timeout per attempt: {{TIMEOUT_MS}}ms
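One way to realise "exponential backoff with jitter" is the full-jitter variant, where each delay is drawn uniformly between zero and an exponentially growing cap. The sketch below is illustrative only: `RetryableError`, `backoff_delays`, `call_with_retry`, and the default values are hypothetical, and the real delays come from the placeholders above.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for 429, 500, 502, 503 and network failures."""


def backoff_delays(base_ms: int, max_retries: int, cap_ms: int,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full jitter: delay i is uniform in [0, min(cap_ms, base_ms * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap_ms, base_ms * 2 ** attempt))
            for attempt in range(max_retries)]


def call_with_retry(op: Callable[[], T], base_ms: int = 200,
                    max_retries: int = 3, cap_ms: int = 5000,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying up to max_retries times on RetryableError."""
    for delay_ms in backoff_delays(base_ms, max_retries, cap_ms):
        try:
            return op()
        except RetryableError:
            sleep(delay_ms / 1000)
    return op()  # final attempt; any error propagates to the caller
```

Jitter spreads retries from many consumers over time, so a recovering provider is not hit by synchronised bursts.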

Circuit Breaker Configuration

Failure threshold: {{FAILURE_PERCENT}}% failures in {{WINDOW_SECONDS}}s window
Open duration: {{OPEN_DURATION_SECONDS}}s
Half-open test: 1 request
Alert on: Circuit open for > {{ALERT_THRESHOLD_SECONDS}}s
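The configuration above maps onto a small state machine: closed, open, and half-open with a single probe request. The following is a simplified, single-threaded sketch under stated assumptions: the class and its `min_calls` parameter are hypothetical, every exception counts as a failure, and a production service would more likely use an established resilience library.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised instead of calling the provider while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_percent, window_s, open_s,
                 min_calls=5, clock=time.monotonic):
        self.failure_percent = failure_percent
        self.window_s = window_s
        self.open_s = open_s
        self.min_calls = min_calls  # assumption: ignore very small samples
        self.clock = clock
        self.calls = deque()        # (timestamp, succeeded) within the window
        self.opened_at = None       # None means the circuit is closed
        self.probing = False        # True while the half-open probe runs

    def call(self, op):
        now = self.clock()
        if self.opened_at is not None:
            if now - self.opened_at < self.open_s or self.probing:
                raise CircuitOpenError("failing fast")
            self.probing = True     # half-open: allow exactly one request
        try:
            result = op()
        except Exception:
            self._record(now, False)
            raise
        self._record(now, True)
        return result

    def _record(self, now, ok):
        if self.probing:
            # The probe result alone decides: close on success, reopen on failure.
            self.probing = False
            self.opened_at = None if ok else now
            self.calls.clear()
            return
        self.calls.append((now, ok))
        while self.calls and now - self.calls[0][0] > self.window_s:
            self.calls.popleft()
        failures = sum(1 for _, succeeded in self.calls if not succeeded)
        if (len(self.calls) >= self.min_calls
                and 100 * failures / len(self.calls) >= self.failure_percent):
            self.opened_at = now
```

The `opened_at` timestamp can also feed the "circuit open for longer than the alert threshold" alert.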

Rate Limiting

| Limit | Value | Window | Action when exceeded |
|---|---|---|---|
| Requests per minute | {{RPM}} | 60s sliding | HTTP 429, Retry-After |
| Burst limit | {{BURST}} | 1s | HTTP 429 immediately |
| Daily quota | {{DAILY}} | 24h | HTTP 429, contact support |
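On the consumer side, honouring a 429 means reading the Retry-After header, which HTTP allows to be either a number of seconds or an HTTP-date. A minimal parsing sketch follows; the function name `retry_after_seconds` is hypothetical, and malformed values are left to raise so the caller can fall back to its normal backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a wait time in seconds."""
    value = value.strip()
    if value.isdigit():
        return float(value)             # delay-seconds form, e.g. "120"
    when = parsedate_to_datetime(value)  # HTTP-date form; raises if malformed
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the caller may retry immediately.
    return max(0.0, (when - now).total_seconds())
```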

Timeout Configuration

| Timeout Type | Value | Notes |
|---|---|---|
| Connection timeout | {{CONN_TIMEOUT_MS}}ms | Time to establish connection |
| Read timeout | {{READ_TIMEOUT_MS}}ms | Time to receive first byte |
| Total request timeout | {{TOTAL_TIMEOUT_MS}}ms | End-to-end budget |
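Because the total request timeout is an end-to-end budget, each retry attempt should receive only what is left of it. The sketch below shows one way to track that; the `Deadline` class is a hypothetical helper, not part of any library named in this document.

```python
import time


class Deadline:
    """Tracks the end-to-end budget shared by all attempts of one request."""

    def __init__(self, total_ms: float, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + total_ms / 1000

    def remaining_ms(self) -> float:
        return max(0.0, (self._expires - self._clock()) * 1000)

    def attempt_timeout_ms(self, per_attempt_ms: float) -> float:
        """Timeout for the next attempt, capped by the remaining budget."""
        remaining = self.remaining_ms()
        if remaining <= 0:
            raise TimeoutError("total request budget exhausted")
        return min(per_attempt_ms, remaining)
```

With a 2000 ms budget and an 800 ms per-attempt timeout, an attempt that starts 1750 ms in would get only 250 ms.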

3.2 Integration: {{INTEGRATION_NAME_2}} (if applicable)

Protocol: gRPC
Service definition:

service {{ServiceName}} {
  rpc {{MethodName}} ({{RequestMessage}}) returns ({{ResponseMessage}});
  rpc {{StreamMethodName}} ({{RequestMessage}}) returns (stream {{ResponseMessage}});
}

message {{RequestMessage}} {
  string id = 1;
  string tenant_id = 2;
  {{FieldType}} {{field_name}} = 3;
}

message {{ResponseMessage}} {
  string id = 1;
  {{FieldType}} {{field_name}} = 2;
  google.protobuf.Timestamp created_at = 3;
}

4. Event-Driven Integrations

4.1 Event Schemas (CloudEvents 1.0)

Event: {{entity}}.{{ACTION}}

Published by: {{PUBLISHER_SYSTEM}}
Consumed by: {{CONSUMER_SYSTEM_1}}, {{CONSUMER_SYSTEM_2}}

{
  "specversion": "1.0",
  "type": "{{REVERSE_DNS_EVENT_TYPE}}",
  "source": "https://{{SYSTEM_DOMAIN}}/{{resource}}",
  "id": "{{UUID}}",
  "time": "2024-01-01T00:00:00Z",
  "datacontenttype": "application/json",
  "subject": "{{RESOURCE_ID}}",
  "data": {
    "entityId": "UUID of affected resource",
    "tenantId": "UUID of tenant",
    "actorId": "UUID of user who triggered event",
    "{{DOMAIN_FIELD_1}}": "domain-specific data",
    "{{DOMAIN_FIELD_2}}": "domain-specific data",
    "previousState": null,
    "newState": "{{STATE}}"
  }
}
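A publisher can build this envelope in one place and check the attributes that CloudEvents 1.0 marks as required (`specversion`, `id`, `source`, `type`) before sending. The sketch below is illustrative: `make_event` and `validate_event` are hypothetical helpers, and a real service might use a CloudEvents SDK instead.

```python
import uuid
from datetime import datetime, timezone
from typing import List

REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_event(event_type: str, source: str, subject: str, data: dict) -> dict:
    """Build an envelope with the same attributes as the schema above."""
    return {
        "specversion": "1.0",
        "type": event_type,
        "source": source,
        "id": str(uuid.uuid4()),
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "datacontenttype": "application/json",
        "subject": subject,
        "data": data,
    }


def validate_event(event: dict) -> List[str]:
    """Return a list of problems; an empty list means the envelope passes."""
    problems = [f"missing or empty attribute: {name}"
                for name in REQUIRED_ATTRIBUTES if not event.get(name)]
    if event.get("specversion") not in (None, "", "1.0"):
        problems.append("unsupported specversion: " + str(event["specversion"]))
    return problems
```

Consumers can run the same validation on receipt and route failing messages to the dead letter queue.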

4.2 Topics / Queues

| Topic/Queue | Partitions | Retention | Consumers | Producer |
|---|---|---|---|---|
| {{TOPIC_NAME_1}} | {{N}} | {{RETENTION}} | {{CONSUMER_GROUPS}} | {{PRODUCER_SERVICE}} |
| {{TOPIC_NAME_2}} | {{N}} | {{RETENTION}} | {{CONSUMER_GROUPS}} | {{PRODUCER_SERVICE}} |

4.3 Ordering Guarantees

| Integration | Ordering | Scope | Notes |
|---|---|---|---|
| {{INTEGRATION_1}} | Strict order | Per tenantId | Kafka partition by tenantId |
| {{INTEGRATION_2}} | Best-effort | Global | Approximately FIFO queue; strict ordering not guaranteed |
| {{INTEGRATION_3}} | No ordering | N/A | Independent events |
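Strict per-tenant ordering relies on every event for a tenant landing on the same partition. The sketch below illustrates the idea with a stable hash of `tenantId`. It is only a stand-in: Kafka's default partitioner hashes the message key with murmur2, so in practice setting the message key to `tenantId` is enough, and `partition_for` is a hypothetical name.

```python
import hashlib


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Deterministically map a tenant to one partition."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # The first 8 bytes give plenty of entropy for a modulo.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Changing the partition count remaps tenants, so ordering holds only while the count stays fixed.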

4.4 Idempotency Strategy

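For each consumed event:
1. Check the processed_events table: SELECT 1 FROM processed_events WHERE event_id = $1 AND consumer_group = $2
2. If found: log "Duplicate event skipped" and ACK (do not reprocess)
3. If not found: process the event
4. On success: INSERT INTO processed_events (event_id, consumer_group, processed_at), in the same transaction as step 3 where possible, with a unique constraint on (event_id, consumer_group) to guard against concurrent consumers
5. ACK the message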

Deduplication window: {{DEDUP_WINDOW}} (keep processed_events for this duration)
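The steps above can be sketched with SQLite standing in for the consumer's database. The function name `handle_once` and the table definition are assumptions for illustration. The processing and the INSERT share one transaction, which gives exactly-once effects only when `process` writes through the same connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_events (
    event_id       TEXT NOT NULL,
    consumer_group TEXT NOT NULL,
    processed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (event_id, consumer_group)
)
"""


def handle_once(conn: sqlite3.Connection, event_id: str,
                consumer_group: str, process) -> bool:
    """Run process() at most once per (event_id, consumer_group).

    Returns True if the event was processed and False if it was a
    duplicate. The caller ACKs the message in both cases.
    """
    with conn:  # commits on success, rolls back if process() raises
        seen = conn.execute(
            "SELECT 1 FROM processed_events "
            "WHERE event_id = ? AND consumer_group = ?",
            (event_id, consumer_group),
        ).fetchone()
        if seen:
            return False  # duplicate: skip processing
        process()
        # The primary key rejects a concurrent duplicate insert.
        conn.execute(
            "INSERT INTO processed_events (event_id, consumer_group) "
            "VALUES (?, ?)",
            (event_id, consumer_group),
        )
    return True
```

A periodic job would delete rows older than the deduplication window.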

5. Data Consistency Patterns

5.1 Consistency Model

Model: Strong | Eventual | Causal
Acceptable lag: {{MAX_LAG_SECONDS}}s

5.2 Saga Pattern (if used for distributed transactions)

sequenceDiagram
    autonumber
    participant O as Orchestrator
    participant S1 as {{SERVICE_1}}
    participant S2 as {{SERVICE_2}}
    participant S3 as {{SERVICE_3}}

    O->>S1: Execute Step 1
    S1-->>O: Step 1 succeeded {result1}
    O->>S2: Execute Step 2 (with result1)
    S2-->>O: Step 2 succeeded {result2}
    O->>S3: Execute Step 3 (with result2)
    S3-->>O: Step 3 FAILED

    Note over O: Compensating transactions (reverse order)
    O->>S2: Compensate Step 2
    S2-->>O: Compensated
    O->>S1: Compensate Step 1
    S1-->>O: Compensated
    O-->>Client: Transaction rolled back

Compensation strategies:

| Step | Compensation | Notes |
|---|---|---|
| {{STEP_1}} | {{COMPENSATION_1}} | {{NOTES}} |
| {{STEP_2}} | {{COMPENSATION_2}} | {{NOTES}} |
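The orchestration flow in the sequence diagram can be sketched as a loop that remembers completed steps and, on failure, runs their compensations in reverse order. This is a minimal in-process illustration with hypothetical names (`run_saga`, `SagaFailed`). A real orchestrator would also persist saga state and retry failed compensations.

```python
from typing import Any, Callable, List, Tuple

# (step name, action taking the previous result, compensation taking this step's result)
Step = Tuple[str, Callable[[Any], Any], Callable[[Any], None]]


class SagaFailed(Exception):
    """Raised after all completed steps have been compensated."""


def run_saga(steps: List[Step], initial: Any = None) -> Any:
    completed = []  # (name, compensate, result) in execution order
    result = initial
    for name, action, compensate in steps:
        try:
            result = action(result)
        except Exception as exc:
            # Undo completed steps, most recent first.
            for _done_name, undo, done_result in reversed(completed):
                undo(done_result)
            raise SagaFailed(f"step {name!r} failed; transaction rolled back") from exc
        completed.append((name, compensate, result))
    return result
```

The failed step itself is not compensated, since it never completed; only the steps before it are.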

6. Integration Testing Strategy

6.1 Contract Testing (Pact)

6.2 Integration Test Environments

| Environment | Purpose | Trigger |
|---|---|---|
| Local | Dev testing with mocked provider | Manual |
| Staging | Full integration with staging provider | Every PR merge |
| Production | Synthetic monitoring | Every 5 minutes |

6.3 Test Scenarios

Happy path:

Error scenarios:


7. Monitoring & Alerting

7.1 Key Metrics

| Metric | Type | Alert Condition | Severity |
|---|---|---|---|
| integration_{{name}}_requests_total | Counter | | |
| integration_{{name}}_error_rate | Gauge | > {{THRESHOLD}}% for 5m | HIGH |
| integration_{{name}}_latency_p99_ms | Histogram | > {{THRESHOLD}}ms for 5m | MEDIUM |
| integration_{{name}}_circuit_open | Gauge | == 1 | CRITICAL |
| integration_{{name}}_dlq_depth | Gauge | > 0 | HIGH |
| integration_{{name}}_consumer_lag | Gauge | > {{LAG_THRESHOLD}} | HIGH |

7.2 Distributed Tracing

7.3 Alert Routing

| Condition | Alert Channel | Escalation |
|---|---|---|
| Circuit breaker open | PagerDuty {{TEAM_A}} + Slack #{{CHANNEL}} | On-call engineer |
| DLQ depth > 0 | Slack #{{CHANNEL}} | Investigate within 1h |
| Error rate > {{THRESHOLD}}% | PagerDuty | On-call engineer |

Approval

| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Consumer Team Lead | | | |
| Provider Team Lead | | | |
| Platform/Infra | | | |
| Approver | | | |

Revision #4
Created 2026-02-24 14:52:30 UTC by John
Updated 2026-05-25 07:32:01 UTC by John