Every 45-minute meeting generates 20 minutes of note-writing. Multiply that by 5 meetings a day across a 10-person team, and you are burning 16+ hours daily on documentation that nobody reads properly anyway.
We built this workflow for our own team first, then deployed variations for three clients. Total cost per meeting: approximately ₹27 (∼$0.32) in API fees, as broken down in Step 4. Compare that to Otter.ai at $16.99/month or Fireflies.ai at $19/month. Those tools also do not push summaries into your CRM, create Google Docs, or trigger Slack notifications.
This tutorial walks through the complete build. You will have a working, importable n8n workflow by the end.
What You Will Learn
- Complete n8n workflow: recording → transcription → summarization → multi-output distribution
- OpenAI Whisper API configuration and the 25MB chunking gotcha
- GPT-4o structured output prompting for meeting summaries
- Cost comparison against commercial meeting note tools
- Speaker diarization workarounds (Whisper's biggest limitation)
Prerequisites
- n8n instance (cloud or self-hosted)
- OpenAI API key with Whisper and GPT-4o access
- Google Workspace account (for Docs output)
- Slack workspace (for notification output)
- Optional: HubSpot/CRM for note integration (see our HubSpot + n8n guide)
The Architecture
The workflow follows a linear pipeline with parallel outputs at the end:
Trigger → Audio Processing → Transcription (Whisper) → Summarization (GPT-4o) → Outputs (Google Docs + Slack + CRM)
Total execution time for a 45-minute recording: 3-5 minutes depending on file size and API latency.
Step 1: Set Up the Trigger
You have two options:
Option A — Google Drive monitoring (automated): Use n8n's Google Drive Trigger node. Configure it to watch a specific folder (e.g., "Meeting Recordings") and trigger on new file uploads. Set polling interval to 1 minute.
This is ideal when your recording tool (Zoom, Google Meet, or Loom) auto-uploads to Drive.
Option B — Manual upload via n8n form (on-demand): Create an n8n Form Trigger that accepts a file upload. Add fields for meeting title, attendees (comma-separated), and optional context.
We use Option B for ad-hoc recordings and Option A for regularly scheduled meetings.
Step 2: Audio Processing (The 25MB Gotcha)
This is where most implementations break. OpenAI's Whisper API has a 25MB file size limit. A 45-minute meeting recorded at standard quality is typically 30-80MB.
The fix: Add a Code node before the Whisper call that checks file size. If the file exceeds 25MB, use an Execute Command node to run FFmpeg:
ffmpeg -i input.webm -vn -acodec libmp3lame -ab 64k -ar 16000 output.mp3
This does three things:
- Strips video (-vn): you do not need video for transcription
- Reduces bitrate to 64kbps: more than sufficient for speech
- Downsamples to 16kHz: Whisper's optimal sample rate
A 60MB webm file typically compresses to 5-8MB MP3 with this configuration. Transcription accuracy remains at 99%+ for clear speech.
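The size check and the FFmpeg call can be sketched outside n8n as follows. This is a minimal illustration, not the workflow's actual Code node: the function names are hypothetical, and the limit is taken as a conservative decimal reading of 25MB, which is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: treat "25MB" conservatively as 25,000,000 bytes.
WHISPER_LIMIT_BYTES = 25_000_000


def needs_compression(size_bytes: int, limit: int = WHISPER_LIMIT_BYTES) -> bool:
    """Return True when the file is too large to upload as-is."""
    return size_bytes > limit


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build the FFmpeg invocation shown above as an argument list."""
    return [
        "ffmpeg", "-i", src,
        "-vn",                    # drop the video stream
        "-acodec", "libmp3lame",  # encode audio as MP3
        "-ab", "64k",             # 64kbps audio bitrate
        "-ar", "16000",           # resample to 16kHz
        dst,
    ]


def prepare_audio(path: str) -> list[str] | None:
    """Return the command to run, or None if the file can be sent directly."""
    source = Path(path)
    if needs_compression(source.stat().st_size):
        return build_ffmpeg_command(path, str(source.with_suffix(".mp3")))
    return None
```

Returning the command as a list rather than a single string avoids shell-quoting problems when a recording's filename contains spaces.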
Critical note: If your n8n instance is on a minimal VPS (1-2GB RAM), FFmpeg processing of large files will spike memory. Either allocate 4GB+ RAM or process in chunks using FFmpeg's segment feature.
Step 3: Transcription with OpenAI Whisper
Add an HTTP Request node configured for the Whisper API:
- Method: POST
- URL: https://api.openai.com/v1/audio/transcriptions
- Authentication: Header Auth with your OpenAI API key
- Body: Form-data with file (binary from previous node) and model set to whisper-1
- Optional parameters:
  - language: Set explicitly for better accuracy (e.g., en for English)
  - response_format: verbose_json for timestamps, text for plain text
  - timestamp_granularities: segment for paragraph-level timestamps
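For testing the same request outside n8n, a rough Python equivalent of the HTTP Request node might look like this. It is a sketch under assumptions: the helper names and the 300-second timeout are illustrative, and it relies on the third-party requests library. Note that the API expects the form field to be spelled timestamp_granularities[] and only honours it when response_format is verbose_json.

```python
from __future__ import annotations

WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_form_fields(language: str | None = "en",
                      with_timestamps: bool = True) -> dict[str, str]:
    """Assemble the non-file form fields listed above."""
    fields = {"model": "whisper-1"}
    if language:
        fields["language"] = language
    if with_timestamps:
        fields["response_format"] = "verbose_json"
        fields["timestamp_granularities[]"] = "segment"
    else:
        fields["response_format"] = "text"
    return fields


def transcribe(audio_path: str, api_key: str) -> dict:
    """POST the audio file as multipart form-data and return the parsed JSON."""
    import requests  # third-party; imported lazily so the helper above works without it

    with open(audio_path, "rb") as audio:
        response = requests.post(
            WHISPER_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data=build_form_fields(),
            files={"file": audio},
            timeout=300,  # assumed value; long recordings take a while
        )
    response.raise_for_status()
    return response.json()
```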
Cost: $0.006 per minute of audio. A 45-minute meeting costs $0.27 for transcription.
Language support: Whisper handles 98 languages with high accuracy. For our clients in India, Dubai, and Singapore, this is a significant advantage — team meetings often switch between English, Hindi, and Arabic.
The speaker diarization gap: Whisper does not natively identify who is speaking. If you need "Speaker 1 said X, Speaker 2 responded Y," you have two options:
- pyannote.audio (open-source): Run it as a pre-processing step before Whisper. Requires a GPU-enabled server.
- AssemblyAI: Offers built-in diarization at $0.01/minute. 67% more expensive than Whisper but includes speaker labels.
For most business use cases, speaker identification is nice-to-have, not essential. Action items and decisions matter more than attribution.
Step 4: Summarization with GPT-4o
This is where the value multiplies. Feed the Whisper transcript to GPT-4o with a structured output prompt:
You are a meeting notes assistant. Given the following transcript, produce a structured summary in this exact JSON format:
{
"meeting_title": "<inferred from context>",
"date": "<ISO 8601>",
"duration_minutes": <number>,
"summary": "<3-5 sentence executive summary>",
"decisions_made": ["<decision 1>", "<decision 2>"],
"action_items": [
{"task": "<description>", "owner": "<name or Unknown>", "deadline": "<if mentioned, else null>"}
],
"key_discussion_points": ["<point 1>", "<point 2>"],
"follow_up_date": "<if mentioned, else null>",
"open_questions": ["<unresolved question 1>"]
}
Transcript:
{{$json.text}}
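Because the output nodes depend on this exact shape, it can help to validate the model's reply before distributing it. The following is a minimal sketch of such a check; parse_summary is a hypothetical helper, not part of n8n or the OpenAI API.

```python
import json

REQUIRED_KEYS = {
    "meeting_title", "date", "duration_minutes", "summary", "decisions_made",
    "action_items", "key_discussion_points", "follow_up_date", "open_questions",
}
ACTION_ITEM_KEYS = {"task", "owner", "deadline"}


def parse_summary(raw: str) -> dict:
    """Parse the model's reply and check it matches the requested shape."""
    summary = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(summary, dict):
        raise ValueError("summary must be a JSON object")
    missing = REQUIRED_KEYS - set(summary)
    if missing:
        raise ValueError(f"summary is missing keys: {sorted(missing)}")
    for item in summary["action_items"]:
        if not isinstance(item, dict) or not ACTION_ITEM_KEYS <= set(item):
            raise ValueError(f"malformed action item: {item!r}")
    return summary
```

Failing loudly here means a malformed reply stops the workflow instead of producing a half-empty Google Doc or Slack post.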
Cost: GPT-4o processes a 45-minute transcript (~8,000 tokens input) for approximately $0.012 input + $0.04 output = $0.052.
Total cost per meeting: $0.27 (Whisper) + $0.052 (GPT-4o) = $0.32 (∼₹27)
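The arithmetic behind these figures can be reproduced in a few lines. All rates are the ones quoted in this article; the ₹27 per-meeting figure is the rounded rupee value used above, not a live exchange rate.

```python
WHISPER_USD_PER_MINUTE = 0.006
GPT4O_USD_PER_MEETING = 0.012 + 0.04  # input + output, as quoted above
INR_PER_MEETING = 27                  # rounded per-meeting cost in rupees


def whisper_cost_usd(minutes: float) -> float:
    """Transcription cost for a recording of the given length."""
    return round(minutes * WHISPER_USD_PER_MINUTE, 3)


def meeting_cost_usd(minutes: float = 45) -> float:
    """Transcription plus summarization cost for one meeting."""
    return round(whisper_cost_usd(minutes) + GPT4O_USD_PER_MEETING, 3)


def monthly_cost_inr(meetings: int) -> int:
    """Monthly API spend in rupees for a given meeting count."""
    return meetings * INR_PER_MEETING
```

Note that the GPT-4o figure is treated as a flat per-meeting cost here; in practice it scales with transcript length.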
Step 5: Multi-Output Distribution
The summarized JSON feeds into three parallel output nodes:
Output 1: Google Docs
Use n8n's Google Docs node to create a new document in a shared "Meeting Notes" folder. Format the JSON into readable markdown with headers for each section.
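One way to turn the summary JSON into headed markdown is sketched below. The section titles and the render_markdown name are illustrative choices, and the function assumes the JSON has already been parsed into a dict with the keys from the Step 4 prompt.

```python
def render_markdown(summary: dict) -> str:
    """Render the structured summary as markdown with one header per section."""
    lines = [f"# {summary['meeting_title']}", "", "## Summary", summary["summary"], ""]

    lines.append("## Decisions")
    lines += [f"- {decision}" for decision in summary["decisions_made"]]
    lines.append("")

    lines.append("## Action Items")
    for item in summary["action_items"]:
        deadline = item["deadline"] or "no deadline"
        lines.append(f"- {item['task']} (owner: {item['owner']}, due: {deadline})")
    lines.append("")

    lines.append("## Key Discussion Points")
    lines += [f"- {point}" for point in summary["key_discussion_points"]]
    lines.append("")

    lines.append("## Open Questions")
    lines += [f"- {question}" for question in summary["open_questions"]]
    return "\n".join(lines)
```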
Output 2: Slack Notification
Post to a #meeting-notes channel with a condensed version: meeting title, executive summary, action items with owners, and a link to the full Google Doc.
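The condensed message could be assembled along these lines. This is a sketch only: build_slack_message is a hypothetical helper, and it uses Slack's mrkdwn conventions of *text* for bold and <url|label> for links.

```python
def build_slack_message(summary: dict, doc_url: str) -> str:
    """Build a short Slack post: title, summary, action items, link to the doc."""
    lines = [f"*{summary['meeting_title']}*", summary["summary"], "", "*Action items*"]
    if summary["action_items"]:
        for item in summary["action_items"]:
            lines.append(f"• {item['task']} ({item['owner']})")
    else:
        lines.append("• None recorded")
    lines.append("")
    lines.append(f"<{doc_url}|Full notes in Google Docs>")
    return "\n".join(lines)
```

Keeping decisions and discussion points out of the Slack post keeps it scannable; readers who want the detail follow the link.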
Output 3: CRM Note (Optional)
If the meeting is a client call, use the attendee email to look up the HubSpot/CRM contact and create an engagement note. This integrates directly with the HubSpot automation workflow we covered.
Cost Comparison: DIY vs. Commercial Tools
| | This Workflow | Otter.ai Pro | Fireflies.ai Pro | Grain |
|---|---|---|---|---|
| Monthly cost (50 meetings) | ₹1,350 (~$16) | $16.99 | $19 | $19 |
| Monthly cost (200 meetings) | ₹5,400 (~$65) | $30 (Business) | $39 (Business) | $29 |
| Custom outputs (CRM, Slack, Docs) | Yes | No | Limited | Limited |
| Self-hosted / data privacy | Yes (n8n self-hosted) | No | No | No |
| Language support | 98 languages | ~30 | ~60 | English-focused |
| Speaker diarization | Requires add-on | Built-in | Built-in | Built-in |
The breakeven point: if your team has fewer than 50 meetings/month and does not need custom integrations, a commercial tool is simpler. If you need CRM integration, custom formatting, data sovereignty, or handle 100+ meetings/month, build your own.
Our opinionated take: if you are paying $20/month for an AI meeting notes tool and your team has fewer than 50 meetings/month, you are getting reasonable value. But the moment you need those summaries flowing into your CRM, creating Jira tickets, or triggering follow-up workflows, you need n8n-based automation.
Common Issues and Fixes
Audio quality degrades accuracy: Noisy recordings drop Whisper accuracy from 99% to 80-85%. Solutions: use a dedicated microphone, enable noise suppression in your recording tool, and consider preprocessing with FFmpeg's anlmdn noise reduction filter.
Large meetings time out: Recordings over 2 hours should be split into 30-minute chunks using FFmpeg before sending to Whisper. Process chunks sequentially and concatenate the transcripts in order.
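A sketch of the split-and-rejoin step, using FFmpeg's segment muxer. It assumes the input is already an audio-only MP3 (for example, the output of Step 2), so streams can be copied without re-encoding; the function names and the chunk filename pattern are illustrative.

```python
from __future__ import annotations


def build_segment_command(src: str, pattern: str = "chunk_%03d.mp3",
                          seconds: int = 1800) -> list[str]:
    """Build an FFmpeg command that splits audio into fixed-length chunks."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",                # use the segment muxer
        "-segment_time", str(seconds),  # 1800s = 30-minute chunks
        "-c", "copy",                   # copy the stream, no re-encode
        pattern,
    ]


def concatenate_transcripts(chunks: dict[str, str]) -> str:
    """Join per-chunk transcripts in filename order.

    Zero-padded names such as chunk_000.mp3 sort correctly as plain strings.
    """
    return "\n".join(chunks[name].strip() for name in sorted(chunks))
```

Fixed-length splitting can cut a sentence in half at a chunk boundary, so expect an occasional garbled phrase where two transcripts meet.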
GPT-4o hallucinates action items: This happens when the transcript is ambiguous. Add a validation prompt: "Only include action items that were explicitly stated as tasks, commitments, or next steps. Do not infer implied actions."
Google Drive trigger misses files: Set the polling interval to 1 minute, not the default 5 minutes. For critical workflows, use a Google Drive Push Notification (webhook) instead of polling.
Written by Rishabh Sethia, Founder & CEO
Rishabh Sethia is the founder and CEO of Innovatrix Infotech, a Kolkata-based digital engineering agency. He leads a team that delivers web development, mobile apps, Shopify stores, and AI automation for startups and SMBs across India and beyond.