Telehealth Platform Integration

Capture audio from telehealth sessions and automatically generate SOAP notes. Covers audio capture, upload, and delivery patterns.

Updated March 19, 2026

Note: Clinical accuracy disclaimer: All generated notes must be reviewed by a licensed healthcare provider before use in patient care. Example outputs in this guide are illustrative.

Architecture overview

Telehealth platforms already capture the most valuable input for clinical documentation: the audio of the provider-patient encounter. SOAPNoteAPI turns that recording into a structured SOAP note, so you do not have to build your own transcription or NLP pipeline. The integration flow is: video call ends -> extract audio recording -> upload to SOAPNoteAPI -> receive structured SOAP note -> present to provider for review.

1. Provider and patient have a video call on your platform.
2. Your platform records the session audio (via your video provider API or browser MediaRecorder).
3. After the call ends, upload the audio to SOAPNoteAPI via PUT /v1/note/audio.
4. SOAPNoteAPI transcribes the audio and generates a SOAP note.
5. Receive the completed note via webhook or polling.
6. Display the note to the provider for review and approval.
7. Provider edits if needed, then files the note to the patient chart.
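
One way to keep track of where each encounter sits in this flow is a small status machine in your backend. The sketch below is illustrative only: apart from "processing", which the backend examples in this guide use, the status names and the `advance` helper are hypothetical and not part of SOAPNoteAPI.

```python
from enum import Enum


class EncounterStatus(Enum):
    # Hypothetical status names for your own encounter records.
    IN_CALL = "in_call"                    # steps 1-2: call in progress, audio being recorded
    PROCESSING = "processing"              # steps 3-4: audio uploaded, note being generated
    READY_FOR_REVIEW = "ready_for_review"  # steps 5-6: note received, awaiting provider review
    FILED = "filed"                        # step 7: provider approved and filed the note
    FAILED = "failed"                      # upload or note generation failed


# Allowed transitions. A note can only be filed after it has been reviewed.
ALLOWED = {
    EncounterStatus.IN_CALL: {EncounterStatus.PROCESSING, EncounterStatus.FAILED},
    EncounterStatus.PROCESSING: {EncounterStatus.READY_FOR_REVIEW, EncounterStatus.FAILED},
    EncounterStatus.READY_FOR_REVIEW: {EncounterStatus.FILED},
    EncounterStatus.FAILED: {EncounterStatus.PROCESSING},  # retry the upload
    EncounterStatus.FILED: set(),
}


def advance(current, target):
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target
```

Keeping the transition table explicit makes it hard to file a generated note that no provider has reviewed.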

Audio capture strategies

How you capture the audio depends on your video infrastructure. There are two primary approaches: using your video provider's recording API, or capturing audio in the browser with MediaRecorder.

Video provider recording APIs

Most video providers offer server-side recording that produces a single audio/video file after the call ends. This is the recommended approach because it captures both speakers reliably and does not depend on browser capabilities.

Twilio Video -- Use the Composition API to generate audio-only recordings. Set audioSources to capture all participants. The recording is available as a download URL.
Daily.co -- Enable cloud recording. After the session ends, use the recordings API to get the download URL. Supports audio-only extraction.
Zoom -- Use the cloud recording feature. Audio-only files are available via the Zoom API after processing. Webhook events notify when recordings are ready.
Vonage (OpenTok) -- Use the archiving API with outputMode: "composed" for a single mixed audio file.
Custom WebRTC -- If you run your own SRTP/WebRTC stack, use a media server (Janus, mediasoup) to record the mixed audio stream to a file.
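
If your provider only returns a composed audio/video file (for example an MP4), you can drop the video stream before uploading to keep the upload small. This is a minimal sketch, assuming the ffmpeg CLI is installed on your server; the helper names are illustrative and not part of any provider SDK.

```python
import subprocess
from pathlib import Path


def build_extract_command(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg command that drops the video stream and keeps the audio as-is."""
    return [
        "ffmpeg",
        "-y",            # overwrite the target if it already exists
        "-i", str(source),
        "-vn",           # exclude video from the output
        "-c:a", "copy",  # copy the audio stream without re-encoding
        str(target),
    ]


def extract_audio(source: Path, target: Path) -> Path:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_extract_command(source, target), check=True, capture_output=True)
    return target
```

Stream copy avoids a second lossy encode, but it only works when the audio codec fits the target container (for example AAC audio from an MP4 into an .m4a file). If it does not, replace `copy` with an explicit encoder.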

Browser MediaRecorder API

If your video provider does not offer server-side recording, you can capture audio directly in the browser. This approach works with any video solution, but recording runs in the provider's browser, so the provider (not the patient) needs a browser that supports MediaRecorder.

JavaScript

// Capture audio from the call in the browser
// Assumes you have a MediaStream from your video call
const audioStream = new MediaStream(
  callStream.getAudioTracks() // Extract audio tracks from the video call stream
);

const recorder = new MediaRecorder(audioStream, {
  mimeType: "audio/webm;codecs=opus",
});

const chunks = [];
recorder.ondataavailable = (e) => chunks.push(e.data);

recorder.onstop = async () => {
  const audioBlob = new Blob(chunks, { type: "audio/webm" });

  // Single-request upload via PUT /v1/note/audio
  const formData = new FormData();
  formData.append("audio", audioBlob, "telehealth-session.webm");
  formData.append("metadata", JSON.stringify({
    specialty: "nurse_practitioner",
    context: {
      patient_info: { name: "Jane Doe", age: 45 },
      patient_history: "Follow-up for hypertension management.",
    },
  }));

  const response = await fetch("https://api.soapnoteapi.com/v1/note/audio", {
    method: "PUT",
    headers: { "Authorization": "Bearer YOUR_API_KEY" },
    body: formData,
  });

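  // Stop here if the upload was rejected; the body will not contain a noteId
  if (!response.ok) {
    console.error("Audio upload failed with HTTP status", response.status);
    return;
  }

  const { noteId } = await response.json();
  console.log("Audio uploaded. Note ID:", noteId);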
  // Poll GET /v1/audio/status/:noteId or receive via webhook
};

// Start recording when the call begins
recorder.start();

// Stop recording when the call ends
// recorder.stop();

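Warning: MediaRecorder records only the tracks in the MediaStream you pass to it. Both the provider's microphone and the patient's remote audio must be part of that stream, or the recording will contain only one side of the conversation. If the two sides arrive on separate streams or peer connections, mix them into a single track before recording (for example with the Web Audio API, by connecting each source to one MediaStreamAudioDestinationNode and recording its stream).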

Audio format and quality recommendations

Format: MP3, M4A, WAV, OGG, WEBM, FLAC, or MP4. For telehealth, WEBM (from MediaRecorder) or M4A (from video provider APIs) are most common.
Sample rate: 16 kHz or higher. Most video providers record at 48 kHz, which is excellent.
Channels: Mono or stereo. SOAPNoteAPI handles both. Stereo can help with speaker separation.
Bitrate: 64 kbps or higher for speech. Lower bitrates degrade transcription accuracy.
Duration: Telehealth sessions typically run 15-60 minutes, well within the 4-hour maximum.
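
A quick pre-upload check against these recommendations can catch obviously unsuitable files before they are sent. The sketch below is an assumption about how you might do this on your side, not a SOAPNoteAPI feature: it checks the file extension for every format, but inspects sample rate and duration only for WAV, because that is the only listed format the Python standard library can read.

```python
import wave
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".webm", ".flac", ".mp4"}
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4-hour maximum
MIN_SAMPLE_RATE_HZ = 16_000         # recommended minimum sample rate


def check_audio_file(path):
    """Return a list of problems found; an empty list means none were detected."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return [f"unsupported format: {suffix or '(none)'}"]

    problems = []
    # Other formats would need an external tool such as ffprobe to inspect.
    if suffix == ".wav":
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            duration = wav.getnframes() / rate
        if rate < MIN_SAMPLE_RATE_HZ:
            problems.append(f"sample rate {rate} Hz is below 16 kHz")
        if duration > MAX_DURATION_SECONDS:
            problems.append(f"duration {duration:.0f} s exceeds the 4-hour maximum")
    return problems
```

Returning a list rather than raising lets you log every problem at once and decide per encounter whether to upload anyway.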

Multi-speaker handling

Telehealth recordings contain both the provider and the patient speaking. SOAPNoteAPI's Whisper-based transcription handles multi-speaker audio natively. The AI model identifies clinical content from both speakers and organizes it into the correct SOAP sections -- patient-reported symptoms go to Subjective, provider-observed findings go to Objective.

Mixed audio works well. A single recording with both speakers produces accurate notes because the transcript captures the full clinical conversation.
Separate tracks are not required. You do not need to label or separate provider vs. patient audio before uploading.
Background noise is tolerated. Mild noise typical of telehealth sessions (notifications, typing, children) is handled by Whisper transcription and does not significantly affect note quality.

Post-call workflow

The recommended production workflow uses webhooks to avoid polling. Your backend receives the completed note and stores it for provider review.

Backend integration (Node.js)

JavaScript

// After the telehealth call ends, your backend does the following.
// Requires Node.js 18+ (global fetch, FormData, and Blob).

// 1. Download the recording from your video provider
const recordingUrl = await getRecordingUrl(callId); // Your video provider SDK
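const recordingResponse = await fetch(recordingUrl);
if (!recordingResponse.ok) {
  throw new Error(`Recording download failed: HTTP ${recordingResponse.status}`);
}
const audioBuffer = await recordingResponse.arrayBuffer();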

// 2. Upload to SOAPNoteAPI (single request via PUT /v1/note/audio)
const formData = new FormData();
formData.append("audio", new Blob([audioBuffer], { type: "audio/mpeg" }), `call-${callId}.mp3`);
formData.append("metadata", JSON.stringify({
  specialty: encounter.providerSpecialty,
  context: {
    patient_info: {
      name: encounter.patientName,
      age: encounter.patientAge,
      gender: encounter.patientGender,
      medications: encounter.currentMedications,
      allergies: encounter.knownAllergies,
    },
    patient_history: encounter.priorVisitSummary,
  },
  include_billing_codes: true,
  include_patient_summary: true,
}));

const response = await fetch("https://api.soapnoteapi.com/v1/note/audio", {
  method: "PUT",
  headers: { "Authorization": `Bearer ${process.env.SOAPNOTEAPI_KEY}` },
  body: formData,
});

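// Fail loudly on HTTP errors instead of storing an undefined noteId
if (!response.ok) {
  throw new Error(`SOAPNoteAPI upload failed: HTTP ${response.status}`);
}
const { noteId } = await response.json();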

// 3. Store the noteId with the encounter record
await db.encounters.update(callId, { soapnoteNoteId: noteId, status: "processing" });

// 4. The webhook handler (separate endpoint) receives the completed note
// See the Webhook Integration guide for the handler implementation

Python backend equivalent

Python

import os
import json
import requests

api_key = os.environ["SOAPNOTEAPI_KEY"]

# 1. Download recording from your video provider
recording_bytes = download_recording(call_id)  # Your video provider SDK

# 2. Upload to SOAPNoteAPI (single request via PUT /v1/note/audio)
metadata = json.dumps({
    "specialty": encounter["provider_specialty"],
    "context": {
        "patient_info": {
            "name": encounter["patient_name"],
            "age": encounter["patient_age"],
        },
        "patient_history": encounter["prior_visit_summary"],
    },
    "include_billing_codes": True,
})

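response = requests.put(
    "https://api.soapnoteapi.com/v1/note/audio",
    headers={"Authorization": f"Bearer {api_key}"},
    files={
        "audio": (f"call-{call_id}.mp3", recording_bytes, "audio/mpeg"),
        "metadata": (None, metadata, "application/json"),
    },
    timeout=120,  # requests has no default timeout; tune this for your recording sizes
)
response.raise_for_status()  # fail loudly on HTTP errors before reading noteId

note_id = response.json()["noteId"]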

# 3. Store noteId with encounter
db.encounters.update(call_id, soap_note_id=note_id, status="processing")

# 4. Webhook handler receives the result (see Webhook Integration guide)
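
If you cannot receive webhooks (for example in local development), polling is the fallback. The sketch below shows one way to poll with a timeout. The response shape is an assumption: the "status" field and its "completed" and "failed" values are hypothetical, so check the API reference for the real schema. `fetch_status` is a callable you supply, which keeps the loop independent of any HTTP library.

```python
import time


def wait_for_note(fetch_status, note_id, timeout_s=300, interval_s=5, sleep=time.sleep):
    """Poll until the note finishes or the timeout elapses.

    fetch_status(note_id) must return a dict with a "status" key.
    The "completed" and "failed" values are assumptions, not documented API values.
    """
    waited = 0
    while True:
        result = fetch_status(note_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"Note {note_id} failed: {result}")
        if waited >= timeout_s:
            raise TimeoutError(f"Note {note_id} not ready after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s
```

In production, `fetch_status` would wrap a GET request to /v1/audio/status/:noteId with your bearer token and return the parsed JSON body. Webhooks remain the better choice at scale because they avoid repeated requests per note.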

Real-time vs. post-call processing

Post-call (recommended): Upload the full recording after the call ends. Simpler to implement, produces better notes because the full context is available, and does not require real-time audio streaming. The note is ready within 1-2 minutes of upload.
Real-time: Currently, SOAPNoteAPI does not support live audio streaming during a call. Upload the complete recording after the session ends.
Hybrid: For long sessions (60+ minutes), you can split the audio into segments and upload each segment. However, a single full-session recording generally produces better note quality because the AI has full context.

Warning: Recording consent is your responsibility. SOAPNoteAPI processes audio you upload -- it does not manage consent workflows. Ensure your platform complies with applicable recording consent laws before sending audio to SOAPNoteAPI.

Inform patients that the session will be recorded for clinical documentation purposes.
Obtain and document consent before recording begins. Many jurisdictions require consent from every party to the conversation (often called two-party or all-party consent).
Display a clear recording indicator during the telehealth session.
Provide patients the option to decline recording. Have a fallback workflow (provider types a transcript manually or uses shorthand notes).
Check your jurisdiction. Recording consent laws vary by state and country. Consult your legal team.
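
These requirements can be enforced in code by refusing to upload any recording that lacks documented consent. This is a minimal sketch: the `ConsentRecord` shape and the `may_upload_recording` helper are hypothetical, and the actual legal requirements depend on your jurisdiction.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ConsentRecord:
    # Hypothetical shape for a consent record stored by your platform.
    patient_consented: bool
    recorded_at: Optional[datetime]  # when consent was documented, if at all


def may_upload_recording(consent: Optional[ConsentRecord], recording_started_at: datetime) -> bool:
    """Allow upload only if consent was documented before recording began."""
    if consent is None or not consent.patient_consented:
        return False
    if consent.recorded_at is None:
        return False
    return consent.recorded_at <= recording_started_at
```

Calling this check immediately before the upload step means a declined or undocumented consent routes the encounter to your fallback workflow instead of sending audio.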

HIPAA considerations for telehealth audio

Audio uploads are encrypted in transit (TLS 1.2+) and at rest (AWS KMS).
SOAPNoteAPI does not store raw audio files permanently. Audio is processed and deleted after transcription.
Generated notes follow the standard data retention policy (see the HIPAA Compliance guide).
Ensure your BAA with SOAPNoteAPI is executed before sending patient audio. Contact support@soapnoteapi.com to request a BAA.

Need help? Contact support@soapnoteapi.com