Portfolio · Applying for OpenAI Support Engineering

I ship production OpenAI integrations and I want to support the developers who do too.

I’m Chad McCluskey. I run Stack Consulting AI, where I build voice agents, RAG systems, and automation pipelines on top of OpenAI’s APIs for small businesses. Below are three live demos hitting real OpenAI endpoints, plus a written debugging walkthrough for a real-world Realtime API issue.

01
Live in production

Realtime API voice agent over FreeSWITCH

OpenAI Realtime API · function calling · WebSocket

Production AI receptionist running on a self-hosted FreeSWITCH PBX with the OpenAI Realtime API as the conversational core. Calls hit Telnyx → FreeSWITCH → my Lua dialplan handler → Realtime WebSocket session. Function calls bridge the model back to Google Calendar (booking), HubSpot (lead logging), and a SIP transfer endpoint (warm hand-off to a human).

Sub-1-second answer time. Bookings up 40% in 90 days at one client. The same stack powers the live demo on this site’s homepage — visitors enter a phone number, FreeSWITCH places an outbound call, the Realtime agent qualifies them.

Architecture
Caller
  ↓ PSTN
Telnyx (SIP trunk)
  ↓ SIP/RTP
FreeSWITCH (Lua dialplan)
  ↓ WebSocket (audio frames + JSON events)
OpenAI Realtime API · gpt-realtime
  ↓ function calls
   ├── calendar.book_slot
   ├── crm.log_lead
   └── transfer.warm(extension)
freeswitch/realtime_bridge.ts (excerpt)
const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime" +
    "?model=gpt-realtime",
  {
    headers: {
      Authorization: `Bearer ${OPENAI_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      voice: "alloy",
      tools: [
        {
          type: "function",
          name: "book_slot",
          parameters: bookingSchema,
        },
        {
          type: "function",
          name: "transfer_human",
          parameters: transferSchema,
        },
      ],
    },
  }));
});
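The excerpt above registers the tools; the bridge also has to answer the model's function calls. A minimal sketch of that dispatch, assuming the event names from the Realtime API docs (`response.function_call_arguments.done` carrying a `call_id` and stringified `arguments`, answered with a `conversation.item.create` of type `function_call_output` followed by `response.create`). The handler map and its contents are illustrative, not the production bridge code:

```typescript
// Hypothetical dispatcher: parse one Realtime server event and, for a
// completed function call, produce the two client events to send back.
type Handler = (args: Record<string, unknown>) => Record<string, unknown>;

function dispatchFunctionCall(
  raw: string,
  handlers: Record<string, Handler>,
): string[] {
  const event = JSON.parse(raw);
  // The model signals a finished tool call with this event type,
  // carrying the tool name, a call_id, and stringified arguments.
  if (event.type !== "response.function_call_arguments.done") return [];
  const handler = handlers[event.name];
  if (!handler) return [];
  const result = handler(JSON.parse(event.arguments));
  return [
    // 1) attach the tool result to the conversation...
    JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(result),
      },
    }),
    // 2) ...then ask the model to keep talking with it.
    JSON.stringify({ type: "response.create" }),
  ];
}
```

Wired up, it's one line in the message handler: `ws.on("message", (d) => dispatchFunctionCall(d.toString(), handlers).forEach((m) => ws.send(m)))`. Keeping the dispatch pure makes it trivial to unit-test without a live socket.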
02
Interactive demo

RAG knowledgebase over real Stack Consulting docs

OpenAI Embeddings · Chat Completions · cosine similarity

Ten chunks of real Stack Consulting AI service documentation are embedded with text-embedding-3-small on first request and cached in memory for the lifetime of the serverless function instance. Each user question is embedded, scored against the cached corpus via cosine similarity, and the top-3 chunks are injected into the system prompt for gpt-4o-mini.

Response panel shows retrieved sources with similarity scores, token counts split across embed/prompt/completion, and total latency. Embed cache “cold” on first request, “hit” thereafter.

  • No vector DB — in-memory cosine for 10 chunks is faster than a network round-trip
  • Source citations enforced via prompt instruction
  • Graceful 503 if OPENAI_API_KEY missing

Code: app/api/portfolio/kb-chat/route.ts + lib/portfolio/kb.ts
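The retrieval core described above fits in a few lines, which is the whole argument against a vector DB at this scale. An illustrative sketch (embeddings as plain number arrays; these names are not the actual contents of lib/portfolio/kb.ts):

```typescript
// Cosine similarity + top-k over a small in-memory corpus.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Chunk { text: string; embedding: number[]; }

// Score every chunk against the query embedding, keep the best k.
function topK(query: number[], corpus: Chunk[], k = 3) {
  return corpus
    .map((c) => ({ text: c.text, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For 10 chunks this is a handful of microseconds; any hosted vector store would spend orders of magnitude more on the network round-trip alone.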

03
Interactive demo

Structured-output lead triage with simulated routing

OpenAI Chat Completions · response_format json_schema (strict)

Inbound lead message → triage to a strict JSON schema (priority, intent category, summary, suggested next step, deal-size estimate, contact-method preference, tags). Server then computes a routing fanout plan: which Slack channels, which email templates, which CRM pipeline.

Fanout is simulated — nothing actually fires. In production this same triage feeds a real GoHighLevel webhook + Slack incoming-webhook + transactional email service.
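The fanout step downstream of the strict-schema triage is plain deterministic code. A hedged sketch of that shape; the field names mirror the schema described above, but the channel names, thresholds, and pipeline labels here are invented for illustration:

```typescript
// Hypothetical routing fanout: map a triage result (already validated
// by the strict json_schema response_format) to a simulated plan.
interface Triage {
  priority: "low" | "medium" | "high";
  intent: string;
  deal_size_estimate: number; // USD, from the model's structured output
}

interface RoutingPlan {
  slackChannels: string[];
  emailTemplate: string;
  crmPipeline: string;
}

function planRouting(t: Triage): RoutingPlan {
  const slackChannels = ["#leads"]; // every lead lands here
  if (t.priority === "high") slackChannels.push("#leads-urgent");
  if (t.deal_size_estimate >= 10_000) slackChannels.push("#big-deals");
  return {
    slackChannels,
    emailTemplate: t.priority === "high" ? "fast-response" : "standard",
    crmPipeline: t.intent === "support" ? "existing-clients" : "new-business",
  };
}
```

Because the model's output is schema-enforced, this function never has to defend against missing fields or free-text priorities; that's the practical payoff of strict mode.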

Submit a lead to see the structured output + routing plan.
04
Support walkthrough

How I’d debug: “Realtime sessions cut off after 90 seconds”

Customer support · debugging · technical writing

TICKET #SE-4471 · Priority: P2 · openai/realtime

“Hi — we built a Realtime API voice bot using the WebSocket transport. Calls work great for the first ~90 seconds, then the WebSocket closes silently. No error event, no close reason. We’re on Node 20 behind an AWS NLB. Reproduces consistently. Help?”

1. Clarifying questions before debugging

  • Is the 90s timing exact, or fuzzy? Exact-ish (87s, 89s, 91s) suggests an upstream idle timeout. Variable (40s–180s) suggests something else.
  • Are audio frames actively flowing in both directions during that 90s window? If the user goes quiet and the model isn’t speaking, the socket is technically idle from a TCP perspective.
  • Any logs at the NLB? The NLB TCP idle timeout (tcp.idle_timeout.seconds) defaults to 350s, but it's configurable down to 60s, and I've seen it lowered on hardened configs.
  • Is the WebSocket library sending pings? The ws Node package needs explicit ping intervals or the underlying TCP stays silent.

2. Hypotheses, ranked

  1. NLB or upstream proxy idle-timeout killing the TCP connection while the WS thinks it’s alive. (most likely — 90s is a smell)
  2. No WS keepalive pings — OpenAI’s Realtime API doesn’t require pings, but intermediate hops may.
  3. Client-side audio buffer underrun leaves the bot sending no frames; combined with #1, the connection drops silently.
  4. Server runtime restart (Lambda cold-warm cycles, container OOM); 90s aligns with some platform timeouts but customer is on Node 20 long-running — less likely.
  5. OpenAI-side limits (max_response_output_tokens, the built-in max session length): the documented session cap is far above 90 seconds, so unlikely as root cause, but worth confirming.

3. Most likely cause

90 seconds is a classic NLB idle-timeout signature. The Realtime WebSocket carries audio frames continuously when the agent is speaking, but during caller silence plus model thinking, the socket can go quiet at the TCP layer for longer than the timeout window. The NLB drops the connection, and the client never gets a clean close because the NLB doesn't send a close frame.

Two-part fix: (a) raise the NLB TCP idle timeout well above the longest expected silence window (the 350s default or higher), and (b) keep the WebSocket genuinely active with periodic application-level pings.

4. Fix + code

Application-level keepalive on the Node ws client:

import WebSocket from "ws";

const ws = new WebSocket(REALTIME_URL, { headers });

let pingTimer: NodeJS.Timeout;

ws.on("open", () => {
  // 30s ping cadence stays well below any
  // sane NLB / ALB / CloudFront idle window.
  pingTimer = setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) ws.ping();
  }, 30_000);
});

ws.on("pong", () => {
  // Optional: track RTT for observability.
});

ws.on("close", (code, reason) => {
  clearInterval(pingTimer);
  console.warn("realtime ws closed", { code, reason: reason.toString() });
  // Reconnect with exponential backoff if mid-call.
});

ws.on("error", (err) => {
  console.error("realtime ws error", err);
});

Plus AWS CLI for the NLB. The idle timeout lives on the listener, not the target group, as the attribute tcp.idle_timeout.seconds (default 350s, configurable from 60s up to 6000s):

aws elbv2 describe-listener-attributes \
  --listener-arn $LISTENER_ARN

aws elbv2 modify-listener-attributes \
  --listener-arn $LISTENER_ARN \
  --attributes Key=tcp.idle_timeout.seconds,Value=6000
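The close handler above leaves reconnection as a comment. A minimal sketch of the backoff schedule I'd pair with it; the base and cap values are illustrative, not prescribed by any API:

```typescript
// Exponential backoff with full jitter: attempt 0 draws from [0, 1s),
// attempt 1 from [0, 2s), doubling up to a 30s cap. Jitter spreads
// reconnects out so a fleet of callers doesn't stampede the endpoint.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}
```

In the close handler, that becomes setTimeout(reconnect, backoffDelayMs(attempt++)), resetting attempt to 0 once a session survives past its first response.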

5. Customer-facing reply

To: [email protected] · Subject: Re: SE-4471 · Realtime sessions closing at 90s

Hi —

90 seconds is a strong fingerprint for a TCP idle timeout on intermediate infrastructure, not the Realtime API itself. The Realtime session has no 90-second cap, and a clean close from our side would carry a close frame with a code; the silent drop you're seeing means the connection was severed below the WebSocket layer.

Two changes that almost always resolve this:

  1. Raise your NLB listener's TCP idle timeout (tcp.idle_timeout.seconds) to at least the 350s default, ideally higher.
  2. Send WebSocket pings every 30s from the client. Code sample below.

If you can grab the close code/reason from a long-form log capture and the NLB CloudWatch metrics for that target group, I can confirm we’re looking at the same root cause. Happy to jump on a 15-min call if it moves faster.

— Chad

About

Why I want to do support engineering at OpenAI.

I’ve been building on top of the OpenAI APIs since the GPT-3 days — Realtime in production, Embeddings + Chat Completions for RAG, Whisper for transcript pipelines, function calling and structured outputs for orchestration. Most of my time is spent in the messy middle: customer infra, auth handoffs, debugging long-tail failures.

Support engineering is the role where that experience actually helps people. When a developer files a ticket about a Realtime session dying at 90s, they don’t need someone reading from a script — they need someone who’s shipped that exact stack and can debug from symptoms to root cause. I want to be that person at the shop building the APIs.

Get in touch

Based in South Orange County, California. Open to remote with periodic on-site at OpenAI SF.