Portfolio · Applying for OpenAI Support Engineering

I ship production OpenAI integrations and I want to support the developers who do too.

I’m Chad McCluskey. I run Stack Consulting AI, where I build voice agents, RAG systems, and automation pipelines on top of OpenAI’s APIs for small businesses. Below are three live demos hitting real OpenAI endpoints, plus a written debugging walkthrough for a real-world Realtime API issue.

01
Live in production

Realtime API voice agent over FreeSWITCH

OpenAI Realtime API · function calling · WebSocket

Production AI receptionist running on a self-hosted FreeSWITCH PBX with the OpenAI Realtime API as the conversational core. Calls hit Telnyx → FreeSWITCH → my Lua dialplan handler → Realtime WebSocket session. Function calls bridge the model back to Google Calendar (booking), HubSpot (lead logging), and a SIP transfer endpoint (warm hand-off to a human).

Sub-1-second answer time. Bookings up 40% in 90 days at one client. The same stack powers the live demo on this site’s homepage — visitors enter a phone number, FreeSWITCH places an outbound call, the Realtime agent qualifies them.

Architecture
Caller
  ↓ PSTN
Telnyx (SIP trunk)
  ↓ SIP/RTP
FreeSWITCH (Lua dialplan)
  ↓ WebSocket (audio frames + JSON events)
OpenAI Realtime API · gpt-realtime
  ↓ function calls
   ├── calendar.book_slot
   ├── crm.log_lead
   └── transfer.warm(extension)
freeswitch/realtime_bridge.ts (excerpt)
const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime" +
    "?model=gpt-realtime",
  {
    headers: {
      Authorization: `Bearer ${OPENAI_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      voice: "alloy",
      tools: [
        {
          type: "function",
          name: "book_slot",
          parameters: bookingSchema,
        },
        {
          type: "function",
          name: "transfer_human",
          parameters: transferSchema,
        },
      ],
    },
  }));
});
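The excerpt above registers the tools; the bridge also has to answer the model's function calls. A minimal sketch of that dispatch, assuming the event names from the Realtime API docs (`response.function_call_arguments.done` carrying a `call_id` and stringified `arguments`, answered with a `conversation.item.create` of type `function_call_output` followed by `response.create`). The handler map and its contents are illustrative, not the production bridge code:

```typescript
// Hypothetical dispatcher: parse one Realtime server event and, for a
// completed function call, produce the two client events to send back.
type Handler = (args: Record<string, unknown>) => Record<string, unknown>;

function dispatchFunctionCall(
  raw: string,
  handlers: Record<string, Handler>,
): string[] {
  const event = JSON.parse(raw);
  // The model signals a finished tool call with this event type,
  // carrying the tool name, a call_id, and stringified arguments.
  if (event.type !== "response.function_call_arguments.done") return [];
  const handler = handlers[event.name];
  if (!handler) return [];
  const result = handler(JSON.parse(event.arguments));
  return [
    // 1) attach the tool result to the conversation...
    JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(result),
      },
    }),
    // 2) ...then ask the model to keep talking with it.
    JSON.stringify({ type: "response.create" }),
  ];
}
```

Wired up, it's one line in the message handler: `ws.on("message", (d) => dispatchFunctionCall(d.toString(), handlers).forEach((m) => ws.send(m)))`. Keeping the dispatch pure makes it trivial to unit-test without a live socket.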
02
Interactive demo

RAG knowledgebase over real Stack Consulting docs

OpenAI Embeddings · Chat Completions · cosine similarity

Ten chunks of real Stack Consulting AI service documentation are embedded with text-embedding-3-small on first request and cached in memory for the lifetime of the serverless function instance. Each user question is embedded, scored against the cached corpus via cosine similarity, and the top-3 chunks are injected into the system prompt for gpt-4o-mini.

Response panel shows retrieved sources with similarity scores, token counts split across embed/prompt/completion, and total latency. Embed cache “cold” on first request, “hit” thereafter.

  • No vector DB — in-memory cosine for 10 chunks is faster than a network round-trip
  • Source citations enforced via prompt instruction
  • Graceful 503 if OPENAI_API_KEY missing

Code: app/api/portfolio/kb-chat/route.ts + lib/portfolio/kb.ts
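The retrieval core described above fits in a few lines, which is the whole argument against a vector DB at this scale. An illustrative sketch (embeddings as plain number arrays; these names are not the actual contents of lib/portfolio/kb.ts):

```typescript
// Cosine similarity + top-k over a small in-memory corpus.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Chunk { text: string; embedding: number[]; }

// Score every chunk against the query embedding, keep the best k.
function topK(query: number[], corpus: Chunk[], k = 3) {
  return corpus
    .map((c) => ({ text: c.text, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For 10 chunks this is a handful of microseconds; any hosted vector store would spend orders of magnitude more on the network round-trip alone.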

03
Interactive demo

Structured-output lead triage with simulated routing

OpenAI Chat Completions · response_format json_schema (strict)

Inbound lead message → triage to a strict JSON schema (priority, intent category, summary, suggested next step, deal-size estimate, contact-method preference, tags). Server then computes a routing fanout plan: which Slack channels, which email templates, which CRM pipeline.

Fanout is simulated — nothing actually fires. In production this same triage feeds a real GoHighLevel webhook + Slack incoming-webhook + transactional email service.
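The fanout step downstream of the strict-schema triage is plain deterministic code. A hedged sketch of that shape; the field names mirror the schema described above, but the channel names, thresholds, and pipeline labels here are invented for illustration:

```typescript
// Hypothetical routing fanout: map a triage result (already validated
// by the strict json_schema response_format) to a simulated plan.
interface Triage {
  priority: "low" | "medium" | "high";
  intent: string;
  deal_size_estimate: number; // USD, from the model's structured output
}

interface RoutingPlan {
  slackChannels: string[];
  emailTemplate: string;
  crmPipeline: string;
}

function planRouting(t: Triage): RoutingPlan {
  const slackChannels = ["#leads"]; // every lead lands here
  if (t.priority === "high") slackChannels.push("#leads-urgent");
  if (t.deal_size_estimate >= 10_000) slackChannels.push("#big-deals");
  return {
    slackChannels,
    emailTemplate: t.priority === "high" ? "fast-response" : "standard",
    crmPipeline: t.intent === "support" ? "existing-clients" : "new-business",
  };
}
```

Because the model's output is schema-enforced, this function never has to defend against missing fields or free-text priorities; that's the practical payoff of strict mode.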

Submit a lead to see the structured output + routing plan.
04
Support walkthrough

How I’d debug: “Realtime sessions cut off after 90 seconds”

Customer support · debugging · technical writing

TICKET #SE-4471 · Priority: P2 · openai/realtime

“Hi — we built a Realtime API voice bot using the WebSocket transport. Calls work great for the first ~90 seconds, then the WebSocket closes silently. No error event, no close reason. We’re on Node 20 behind an AWS NLB. Reproduces consistently. Help?”

1. Clarifying questions before debugging

  • Is the 90s timing exact, or fuzzy? Exact-ish (87s, 89s, 91s) suggests an upstream idle timeout. Variable (40s–180s) suggests something else.
  • Are audio frames actively flowing in both directions during that 90s window? If the user goes quiet and the model isn’t speaking, the socket is technically idle from a TCP perspective.
  • Any logs at the NLB? The NLB TCP idle timeout (tcp.idle_timeout.seconds) defaults to 350s, but it's configurable down to 60s, and I've seen it lowered on hardened configs.
  • Is the WebSocket library sending pings? The ws Node package needs explicit ping intervals or the underlying TCP stays silent.

2. Hypotheses, ranked

  1. NLB or upstream proxy idle-timeout killing the TCP connection while the WS thinks it’s alive. (most likely — 90s is a smell)
  2. No WS keepalive pings — OpenAI’s Realtime API doesn’t require pings, but intermediate hops may.
  3. Client-side audio buffer underrun leaves the bot sending no frames; combined with #1, the connection drops silently.
  4. Server runtime restart (Lambda cold-warm cycles, container OOM); 90s aligns with some platform timeouts but customer is on Node 20 long-running — less likely.
  5. OpenAI-side limits (max_response_output_tokens, the built-in max session length): the documented session cap is far above 90 seconds, so unlikely as root cause, but worth confirming.

3. Most likely cause

90 seconds is a classic NLB idle-timeout signature. The Realtime WebSocket carries audio frames continuously when the agent is speaking, but during caller silence plus model thinking, the socket can go quiet at the TCP layer for longer than the timeout window. The NLB drops the connection, and the client never gets a clean close because the NLB doesn't send a close frame.

Two-part fix: (a) raise the NLB TCP idle timeout well above the longest expected silence window (the 350s default or higher), and (b) keep the WebSocket genuinely active with periodic application-level pings.

4. Fix + code

Application-level keepalive on the Node ws client:

import WebSocket from "ws";

const ws = new WebSocket(REALTIME_URL, { headers });

let pingTimer: NodeJS.Timeout;

ws.on("open", () => {
  // 30s ping cadence stays well below any
  // sane NLB / ALB / CloudFront idle window.
  pingTimer = setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) ws.ping();
  }, 30_000);
});

ws.on("pong", () => {
  // Optional: track RTT for observability.
});

ws.on("close", (code, reason) => {
  clearInterval(pingTimer);
  console.warn("realtime ws closed", { code, reason: reason.toString() });
  // Reconnect with exponential backoff if mid-call.
});

ws.on("error", (err) => {
  console.error("realtime ws error", err);
});

Plus AWS CLI for the NLB. The idle timeout lives on the listener, not the target group, as the attribute tcp.idle_timeout.seconds (default 350s, configurable from 60s up to 6000s):

aws elbv2 describe-listener-attributes \
  --listener-arn $LISTENER_ARN

aws elbv2 modify-listener-attributes \
  --listener-arn $LISTENER_ARN \
  --attributes Key=tcp.idle_timeout.seconds,Value=6000
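The close handler above leaves reconnection as a comment. A minimal sketch of the backoff schedule I'd pair with it; the base and cap values are illustrative, not prescribed by any API:

```typescript
// Exponential backoff with full jitter: attempt 0 draws from [0, 1s),
// attempt 1 from [0, 2s), doubling up to a 30s cap. Jitter spreads
// reconnects out so a fleet of callers doesn't stampede the endpoint.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}
```

In the close handler, that becomes setTimeout(reconnect, backoffDelayMs(attempt++)), resetting attempt to 0 once a session survives past its first response.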

5. Customer-facing reply

To: [email protected] · Subject: Re: SE-4471 · Realtime sessions closing at 90s

Hi —

90 seconds is a strong fingerprint for a TCP idle timeout on intermediate infrastructure, not the Realtime API itself. The Realtime session has no 90-second cap, and a clean close from our side would carry a close frame with a code; the silent drop you're seeing means the connection was severed below the WebSocket layer.

Two changes that almost always resolve this:

  1. Raise your NLB listener's TCP idle timeout (tcp.idle_timeout.seconds) to at least the 350s default, ideally higher.
  2. Send WebSocket pings every 30s from the client. Code sample below.

If you can grab the close code/reason from a long-form log capture and the NLB CloudWatch metrics for that target group, I can confirm we’re looking at the same root cause. Happy to jump on a 15-min call if it moves faster.

— Chad

About

Why I want to do support engineering at OpenAI.

I’ve been building on top of the OpenAI APIs since the GPT-3 days — Realtime in production, Embeddings + Chat Completions for RAG, Whisper for transcript pipelines, function calling and structured outputs for orchestration. Most of my time is spent in the messy middle: customer infra, auth handoffs, debugging long-tail failures.

Support engineering is the role where that experience actually helps people. When a developer files a ticket about a Realtime session dying at 90s, they don’t need someone reading from a script — they need someone who’s shipped that exact stack and can debug from symptoms to root cause. I want to be that person at the shop building the APIs.

Get in touch

Based in South Orange County, California. Open to remote with periodic on-site at OpenAI SF.