AI Automation

Build an AI Meeting Summarizer with n8n and Whisper in 2026 (Step-by-Step)

Build a production AI meeting summarizer with n8n and Whisper for ₹27/meeting. Step-by-step tutorial with structured output, multi-destination delivery, and cost comparison vs Otter.ai and Fireflies.

Rishabh Sethia · Founder & CEO · 1 December 2025 · 13 min read · 1.5k words
#n8n#whisper#meeting summarizer#ai automation#openai

Every 45-minute meeting generates roughly 20 minutes of note-writing. Multiply that by 5 meetings a day per person across a 10-person team, and you are burning 16+ hours of collective time daily on documentation that nobody reads properly anyway.

We built this workflow for our own team first, then deployed variations for three clients. Total cost per meeting: approximately ₹27 (~$0.32). Compare that to Otter.ai at $16.99/month or Fireflies.ai at $19/month — and those tools do not push summaries into your CRM, create Google Docs, or trigger Slack notifications.

This tutorial walks through the complete build. You will have a working, importable n8n workflow by the end.

What You Will Learn

  • Complete n8n workflow: recording → transcription → summarization → multi-output distribution
  • OpenAI Whisper API configuration and the 25MB chunking gotcha
  • GPT-4o structured output prompting for meeting summaries
  • Cost comparison against commercial meeting note tools
  • Speaker diarization workarounds (Whisper's biggest limitation)

Prerequisites

  • n8n instance (cloud or self-hosted)
  • OpenAI API key with Whisper and GPT-4o access
  • Google Workspace account (for Docs output)
  • Slack workspace (for notification output)
  • Optional: HubSpot/CRM for note integration (see our HubSpot + n8n guide)

The Architecture

The workflow follows a linear pipeline with parallel outputs at the end:

Trigger → Audio Processing → Transcription (Whisper) → Summarization (GPT-4o) → Outputs (Google Docs + Slack + CRM)

Total execution time for a 45-minute recording: 3-5 minutes depending on file size and API latency.

Step 1: Set Up the Trigger

You have two options:

Option A — Google Drive monitoring (automated): Use n8n's Google Drive Trigger node. Configure it to watch a specific folder (e.g., "Meeting Recordings") and trigger on new file uploads. Set polling interval to 1 minute.

This is ideal when your recording tool (Zoom, Google Meet, or Loom) auto-uploads to Drive.

Option B — Manual upload via n8n form (on-demand): Create an n8n Form Trigger that accepts a file upload. Add fields for meeting title, attendees (comma-separated), and optional context.

We use Option B for ad-hoc recordings and Option A for regularly scheduled meetings.

Step 2: Audio Processing (The 25MB Gotcha)

This is where most implementations break. OpenAI's Whisper API has a 25MB file size limit. A 45-minute meeting recorded at standard quality is typically 30-80MB.

The fix: Add a Code node before the Whisper call that checks file size. If the file exceeds 25MB, use an Execute Command node to run FFmpeg:

ffmpeg -i input.webm -vn -acodec libmp3lame -ab 64k -ar 16000 output.mp3

This does three things:

  1. Strips video (-vn) — you do not need video for transcription
  2. Reduces bitrate to 64kbps — more than sufficient for speech
  3. Downsamples to 16kHz — Whisper's optimal sample rate

A 60MB webm file typically compresses to 5-8MB MP3 with this configuration. Transcription accuracy remains at 99%+ for clear speech.

Critical note: If your n8n instance is on a minimal VPS (1-2GB RAM), FFmpeg processing of large files will spike memory. Either allocate 4GB+ RAM or process in chunks using FFmpeg's segment feature.
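Outside n8n, the size check and compression step can be sketched in Python, shelling out to the FFmpeg command above. This is a minimal sketch; the function name and output path are illustrative, and in the actual workflow the check lives in a Code node followed by an Execute Command node:

```python
import os
import subprocess

WHISPER_LIMIT_BYTES = 25 * 1024 * 1024  # OpenAI Whisper API hard limit

def compress_if_needed(input_path: str, output_path: str = "output.mp3") -> str:
    """Return a path to an audio file that fits under the 25MB Whisper limit.

    Files already under the limit pass through untouched; larger files are
    stripped of video and re-encoded to 64kbps / 16kHz MP3 via FFmpeg.
    """
    if os.path.getsize(input_path) <= WHISPER_LIMIT_BYTES:
        return input_path
    subprocess.run(
        ["ffmpeg", "-y", "-i", input_path,
         "-vn",                    # strip the video stream
         "-acodec", "libmp3lame",  # encode audio as MP3
         "-ab", "64k",             # 64kbps is plenty for speech
         "-ar", "16000",           # Whisper's preferred sample rate
         output_path],
        check=True,
    )
    return output_path
```

The pass-through branch matters: re-encoding a file that is already small enough wastes CPU and, on lossy-to-lossy conversion, a little accuracy.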

Step 3: Transcription with OpenAI Whisper

Add an HTTP Request node configured for the Whisper API:

  • Method: POST
  • URL: https://api.openai.com/v1/audio/transcriptions
  • Authentication: Header Auth with your OpenAI API key
  • Body: Form-data with file (binary from previous node) and model set to whisper-1
  • Optional parameters:
    • language: Set explicitly for better accuracy (e.g., en for English)
    • response_format: verbose_json for timestamps, text for plain text
    • timestamp_granularities: segment for paragraph-level timestamps
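The node configuration above maps onto a plain HTTP payload. A minimal sketch, assuming the API key sits in an OPENAI_API_KEY environment variable (the binary file itself is attached as the multipart `file` field by the HTTP Request node):

```python
import os

def build_whisper_request(audio_path: str, language: str = "en") -> dict:
    """Assemble the Whisper transcription call as configured above."""
    return {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}"},
        "form_fields": {
            "model": "whisper-1",
            "language": language,               # explicit language improves accuracy
            "response_format": "verbose_json",  # include segment timestamps
            "timestamp_granularities[]": "segment",
        },
        "file_field": audio_path,               # sent as the multipart "file" part
    }
```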

Cost: $0.006 per minute of audio. A 45-minute meeting costs $0.27 for transcription.

Language support: Whisper handles 98 languages with high accuracy. For our clients in India, Dubai, and Singapore, this is a significant advantage — team meetings often switch between English, Hindi, and Arabic.

The speaker diarization gap: Whisper does not natively identify who is speaking. If you need "Speaker 1 said X, Speaker 2 responded Y," you have two options:

  1. pyannote.audio (open-source): Run it as a pre-processing step before Whisper. Requires a GPU-enabled server.
  2. AssemblyAI: Offers built-in diarization at $0.01/minute. 67% more expensive than Whisper but includes speaker labels.

For most business use cases, speaker identification is nice-to-have, not essential. Action items and decisions matter more than attribution.

Step 4: Summarization with GPT-4o

This is where the value multiplies. Feed the Whisper transcript to GPT-4o with a structured output prompt:

You are a meeting notes assistant. Given the following transcript, produce a structured summary in this exact JSON format:

{
  "meeting_title": "<inferred from context>",
  "date": "<ISO 8601>",
  "duration_minutes": <number>,
  "summary": "<3-5 sentence executive summary>",
  "decisions_made": ["<decision 1>", "<decision 2>"],
  "action_items": [
    {"task": "<description>", "owner": "<name or Unknown>", "deadline": "<if mentioned, else null>"}
  ],
  "key_discussion_points": ["<point 1>", "<point 2>"],
  "follow_up_date": "<if mentioned, else null>",
  "open_questions": ["<unresolved question 1>"]
}

Transcript:
{{$json.text}}
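One sketch of the corresponding chat completion payload, assuming the prompt above is stored as a system message. Setting response_format to json_object forces the model to emit valid JSON, so the downstream Docs/Slack/CRM nodes can parse it directly (the truncated SYSTEM_PROMPT here stands in for the full prompt above):

```python
import json
import os

SYSTEM_PROMPT = ("You are a meeting notes assistant. Given the following "
                 "transcript, produce a structured summary in the exact "
                 "JSON format described.")  # abbreviated stand-in for the prompt above

def build_summary_request(transcript: str) -> dict:
    """Build the GPT-4o chat completion payload for the summarization step."""
    return {
        "url": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": "gpt-4o",
            "response_format": {"type": "json_object"},  # guarantees parseable JSON
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Transcript:\n{transcript}"},
            ],
            "temperature": 0.2,  # low temperature keeps summaries factual
        }),
    }
```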

Cost: GPT-4o processes a 45-minute transcript (~8,000 tokens input) for approximately $0.012 input + $0.04 output = $0.052.

Total cost per meeting: $0.27 (Whisper) + $0.052 (GPT-4o) = $0.32 (∼₹27)

Step 5: Multi-Output Distribution

The summarized JSON feeds into three parallel output nodes:

Output 1: Google Docs

Use n8n's Google Docs node to create a new document in a shared "Meeting Notes" folder. Format the JSON into readable markdown with headers for each section.
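One way to sketch that formatting step (function name and layout are illustrative; in the workflow this is a Code node feeding the Google Docs node):

```python
def summary_to_doc_text(s: dict) -> str:
    """Render the GPT-4o summary JSON into the text written to the Google Doc."""
    lines = [s["meeting_title"], f"Date: {s['date']}", "",
             "Summary", s["summary"], "",
             "Decisions"]
    lines += [f"- {d}" for d in s["decisions_made"]]
    lines += ["", "Action items"]
    for item in s["action_items"]:
        deadline = item["deadline"] or "no deadline"
        lines.append(f"- {item['task']} (owner: {item['owner']}, {deadline})")
    return "\n".join(lines)
```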

Output 2: Slack Notification

Post to a #meeting-notes channel with a condensed version: meeting title, executive summary, action items with owners, and a link to the full Google Doc.
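A minimal sketch of the condensed Slack message, using Slack's mrkdwn syntax (bold via asterisks, links as <url|label>); the doc_url parameter is assumed to come from the Google Docs node's output:

```python
def summary_to_slack_message(s: dict, doc_url: str) -> str:
    """Condense the summary JSON into the post for #meeting-notes."""
    lines = [f"*{s['meeting_title']}*", s["summary"], "", "*Action items*"]
    for item in s["action_items"]:
        lines.append(f"• {item['task']} ({item['owner']})")
    lines.append(f"\n<{doc_url}|Full notes in Google Docs>")
    return "\n".join(lines)
```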

Output 3: CRM Note (Optional)

If the meeting is a client call, use the attendee email to look up the HubSpot/CRM contact and create an engagement note. This integrates directly with the HubSpot automation workflow we covered.

Cost Comparison: DIY vs. Commercial Tools

|  | This Workflow | Otter.ai Pro | Fireflies.ai Pro | Grain |
| --- | --- | --- | --- | --- |
| Monthly cost (50 meetings) | ₹1,350 (~$16) | $16.99 | $19 | $19 |
| Monthly cost (200 meetings) | ₹5,400 (~$65) | $30 (Business) | $39 (Business) | $29 |
| Custom outputs (CRM, Slack, Docs) | Yes | No | Limited | Limited |
| Self-hosted / data privacy | Yes (n8n self-hosted) | No | No | No |
| Language support | 98 languages | ~30 | ~60 | English-focused |
| Speaker diarization | Requires add-on | Built-in | Built-in | Built-in |

The breakeven point: if your team has fewer than 50 meetings/month and does not need custom integrations, a commercial tool is simpler. If you need CRM integration, custom formatting, data sovereignty, or handle 100+ meetings/month, build your own.
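The DIY side of that comparison is simple arithmetic over the per-meeting figures derived earlier ($0.006/minute for Whisper plus ~$0.052 for GPT-4o); a small helper makes the breakeven easy to recompute for your own meeting volume and length:

```python
WHISPER_PER_MIN = 0.006      # Whisper API price per audio minute
GPT4O_PER_MEETING = 0.052    # GPT-4o cost per 45-minute transcript, from above

def diy_cost_usd(meetings: int, avg_minutes: int = 45) -> float:
    """Monthly API cost of this workflow in USD."""
    per_meeting = avg_minutes * WHISPER_PER_MIN + GPT4O_PER_MEETING
    return round(meetings * per_meeting, 2)
```

At 50 meetings/month this gives $16.10, right at Otter.ai Pro's $16.99; the DIY case strengthens as volume or integration needs grow.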

Our opinionated take: if you are paying $20/month for an AI meeting notes tool and your team has fewer than 50 meetings/month, you are getting reasonable value. But the moment you need those summaries flowing into your CRM, creating Jira tickets, or triggering follow-up workflows, you need n8n-based automation.

Common Issues and Fixes

Audio quality degrades accuracy: Noisy recordings drop Whisper accuracy from 99% to 80-85%. Solutions: use a dedicated microphone, enable noise suppression in your recording tool, and consider preprocessing with FFmpeg's anlmdn noise reduction filter.

Large meetings timeout: Recordings over 2 hours should be split into 30-minute chunks using FFmpeg before sending to Whisper. Process chunks sequentially and concatenate transcripts.
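The chunking step can be sketched as an FFmpeg segment command plus a trivial concatenation (function names are illustrative; transcription of each chunk happens in between):

```python
def segment_command(input_path: str, chunk_minutes: int = 30) -> list:
    """FFmpeg command that splits a long recording into fixed-length chunks."""
    return [
        "ffmpeg", "-i", input_path,
        "-f", "segment",
        "-segment_time", str(chunk_minutes * 60),  # chunk length in seconds
        "-c", "copy",                              # split without re-encoding
        "chunk_%03d.mp3",                          # chunk_000.mp3, chunk_001.mp3, ...
    ]

def join_transcripts(chunk_texts: list) -> str:
    """Concatenate per-chunk transcripts back into one document, in order."""
    return "\n".join(chunk_texts)
```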

GPT-4o hallucinates action items: This happens when the transcript is ambiguous. Add a validation prompt: "Only include action items that were explicitly stated as tasks, commitments, or next steps. Do not infer implied actions."

Google Drive trigger misses files: Set the polling interval to 1 minute, not the default 5 minutes. For critical workflows, use a Google Drive Push Notification (webhook) instead of polling.


Written by

Rishabh Sethia, Founder & CEO

Rishabh Sethia is the founder and CEO of Innovatrix Infotech, a Kolkata-based digital engineering agency. He leads a team that delivers web development, mobile apps, Shopify stores, and AI automation for startups and SMBs across India and beyond.
