Someone told you to "fix your sitemap." Or an SEO audit flagged your robots.txt configuration. Or you've been running your Shopify store for two years and never once looked at either of these files.
All three scenarios are more common than you'd think.
Shopify handles sitemaps and robots.txt automatically, which is both a blessing and a curse. A blessing because you don't need to generate them manually. A curse because most store owners assume "automatic" means "correct" — and it doesn't always.
This guide covers what Shopify actually generates, what you should verify, and how to customise robots.txt for better crawl control.
Understanding Shopify's Auto-Generated Sitemap
Every Shopify store has a sitemap at yourstore.com/sitemap.xml. This is a sitemap index — it doesn't contain your pages directly. Instead, it points to child sitemaps:
<sitemap>
<loc>https://yourstore.com/sitemap_products_1.xml</loc>
</sitemap>
<sitemap>
<loc>https://yourstore.com/sitemap_pages_1.xml</loc>
</sitemap>
<sitemap>
<loc>https://yourstore.com/sitemap_collections_1.xml</loc>
</sitemap>
<sitemap>
<loc>https://yourstore.com/sitemap_blogs_1.xml</loc>
</sitemap>
Each child sitemap contains up to 5,000 URLs. If you have more than 5,000 products, Shopify creates sitemap_products_2.xml, and so on.
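Because of the 5,000-URL split, you can predict how many child sitemap files to expect for a given product count. A minimal sketch (the 5,000 figure comes from the text above; adjust if Shopify changes it):

```python
import math

def expected_child_sitemaps(url_count, per_file=5000):
    """Number of child sitemap files Shopify generates for a given URL count."""
    return max(1, math.ceil(url_count / per_file))

print(expected_child_sitemaps(4800))   # 1 -> sitemap_products_1.xml only
print(expected_child_sitemaps(12000))  # 3 -> sitemap_products_1.xml .. _3.xml
```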
What gets included automatically:
- All published products
- All published pages
- All published collections
- All published blog posts
- Your homepage
What gets excluded automatically:
- Draft products (status: draft)
- Password-protected pages
- Pages with the noindex tag
- Checkout pages
- Cart pages
- Account pages (login, register, order history)
This is generally correct behaviour. You don't want draft products or checkout pages in your sitemap.
Verifying Your Sitemap Is Correct
Open yourstore.com/sitemap.xml in a browser. Then spot-check each child sitemap:
- Products: Open sitemap_products_1.xml. Are all your published products listed? Are any draft or archived products accidentally appearing? Count the URLs and compare against your product count in Shopify admin.
- Collections: Open sitemap_collections_1.xml. Shopify includes all published collections. Check for collections you might not want indexed: internal collections used for automations, test collections, or duplicate collections created by apps.
- Pages: Open sitemap_pages_1.xml. Verify your important pages (About, Contact, FAQ, Policy pages) are all present.
- Blogs: Open sitemap_blogs_1.xml. Every published blog post should appear here. If you've been publishing blog content and it's not in the sitemap, check whether the blog section is set to published in Shopify admin.
Common issue: some Shopify apps create hidden pages or collections that end up in your sitemap. If you see URLs you don't recognise, investigate whether they're from an installed app.
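Counting URLs by hand gets tedious on large stores. Here's a minimal stdlib-only Python sketch that fetches a sitemap index and reports the URL count per child sitemap (yourstore.com is a placeholder for your own domain):

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_xml(url):
    """Download and parse an XML document."""
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())

def count_sitemap_urls(index_url):
    """Return {child_sitemap_url: url_count} for a sitemap index."""
    index = fetch_xml(index_url)
    counts = {}
    for loc in index.findall("sm:sitemap/sm:loc", NS):
        child_url = loc.text.strip()
        child = fetch_xml(child_url)
        counts[child_url] = len(child.findall("sm:url", NS))
    return counts

# Example usage (replace with your own domain):
# for child, n in count_sitemap_urls("https://yourstore.com/sitemap.xml").items():
#     print(f"{n:>6}  {child}")
```

Compare the product sitemap's count against your published product count in Shopify admin; a mismatch usually means drafts leaking in or app-generated URLs.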
Submitting Your Sitemap to Google Search Console
If you haven't done this, do it now. It takes 30 seconds.
- Go to Google Search Console
- Select your property (your Shopify store domain)
- Navigate to Sitemaps in the left sidebar
- Enter sitemap.xml in the "Add a new sitemap" field
- Click Submit
Google will crawl your sitemap and report back on how many URLs were discovered, how many were indexed, and any errors.
For Bing: the process is identical in Bing Webmaster Tools. Submit the same sitemap URL.
Check back in a week. If there's a large gap between "discovered" and "indexed" URLs, that signals potential quality issues with some of your pages — thin content, duplicate content, or crawl errors.
Shopify's robots.txt: What Most Developers Don't Know
Since Shopify 2.0 (Online Store 2.0 themes), the robots.txt file is customisable through a Liquid template. This was a significant change that most developers and store owners still don't know about.
Your default Shopify robots.txt is at yourstore.com/robots.txt. It looks something like this:
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /carts
Disallow: /account
Disallow: /*?*variant=
Disallow: /*?*q=
Disallow: /*?*sort_by=
Disallow: /*?*filter.*=
Disallow: /collections/*+*
Sitemap: https://yourstore.com/sitemap.xml
These defaults are sensible. Admin pages, cart, checkout, and account pages should be blocked from crawlers. The query parameter disallows prevent Google from crawling infinite variations of filtered and sorted collection pages.
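You can sanity-check how prefix rules like these apply using Python's urllib.robotparser. One caveat: robotparser implements the original robots exclusion protocol, so Google-style wildcard patterns such as /*?*variant= are not matched correctly by it; test those in Search Console's robots.txt report instead. A sketch with a subset of the prefix rules:

```python
from urllib.robotparser import RobotFileParser

# A subset of Shopify's default prefix rules. Wildcard rules
# (e.g. /*?*variant=) are Google extensions that robotparser ignores.
rules = """\
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /account
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for path in ["/products/blue-tshirt", "/admin/orders", "/cart", "/collections/summer"]:
    allowed = parser.can_fetch("Googlebot", f"https://yourstore.com{path}")
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Prefix matching means /admin also blocks /admin/orders, which is exactly the behaviour you want here.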
Customising robots.txt with robots.txt.liquid
Here's the part most guides skip. In your Shopify theme files, you can create or edit the robots.txt.liquid template.
In your theme editor: go to Online Store → Themes → Edit code → Templates → look for robots.txt.liquid.
If it doesn't exist, create it. Shopify provides a robots Liquid object whose default_groups property contains the default rules. Start by rendering the defaults, then add your custom rules:
{% comment %}
Output Shopify's default robots.txt rules
{% endcomment %}
{% for group in robots.default_groups %}
{{- group.user_agent }}
{%- for rule in group.rules %}
{{ rule }}
{%- endfor %}
{%- if group.sitemap != blank %}
{{ group.sitemap }}
{%- endif %}
{% endfor %}
{% comment %}
Custom rules below
{% endcomment %}
# Block internal search results pages
User-agent: *
Disallow: /search
Disallow: /search?
# Block tagged collection pages (common source of duplicate content)
User-agent: *
Disallow: /collections/*+*
# Block specific pages you don't want indexed
User-agent: *
Disallow: /pages/test-page
Disallow: /pages/old-landing-page
When to Customise robots.txt
Customise when:
- Internal search pages are being indexed: Shopify's /search results pages create thin content for every search query. Block them.
- Tagged collection URLs create duplicate content: URLs like /collections/t-shirts/blue+cotton are auto-generated by Shopify's tag filtering and create near-duplicate versions of your collection pages.
- App-generated pages you don't want indexed: some Shopify apps create public-facing pages (wishlists, comparison tools) that shouldn't be in Google's index.
- Temporary landing pages: If you create campaign-specific pages that should live on your store but not rank in search, disallow them.
Do not customise to:
- Block your entire /collections/ directory (you want collections indexed)
- Block legitimate product pages
- Try to remove pages from Google's index (robots.txt prevents crawling, not indexing; use a noindex meta tag for that)
Shopify's Canonical Tag Implementation
Shopify automatically adds canonical tags to prevent duplicate content issues. For most pages, this works correctly:
<link rel="canonical" href="https://yourstore.com/products/blue-tshirt">
But there are edge cases where Shopify's canonical implementation breaks:
Variant URLs: When a customer selects a variant, the URL changes to /products/blue-tshirt?variant=12345678. Shopify correctly canonicalises this back to the base product URL. But some themes or apps override this behaviour.
Collection-prefixed product URLs: Shopify creates URLs like /collections/summer/products/blue-tshirt when a product is accessed from a collection page. The canonical tag should point to /products/blue-tshirt. Verify this is working — some themes break it.
Paginated collection pages: /collections/all?page=2 should carry a self-referencing canonical that keeps the page parameter. Canonicalising every page back to page 1 tells Google to ignore the products listed on deeper pages, so check this on collections with pagination.
To check canonical tags: view the page source on any page and search for rel="canonical". The URL should be the cleanest version of that page, without tracking parameters or collection prefixes (pagination is the main exception).
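Viewing source works for spot checks; for repeat checks, a small stdlib-only Python helper can extract the canonical tag for you (the example URLs are placeholders):

```python
from html.parser import HTMLParser
import urllib.request

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical" and self.canonical is None:
            self.canonical = a.get("href")

def get_canonical(url):
    """Fetch a page and return its canonical URL, or None if absent."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Both of these should return the same clean product URL:
# get_canonical("https://yourstore.com/products/blue-tshirt")
# get_canonical("https://yourstore.com/collections/summer/products/blue-tshirt")
```

Run it against a product URL with and without the collection prefix, and with a ?variant= parameter appended; all three should report the same canonical.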
The 5 Most Common Shopify SEO Configuration Errors
These come up in almost every technical SEO audit we run on Shopify stores:
1. Sitemap never submitted to Search Console. The store has been live for a year and nobody has submitted the sitemap. Google will eventually find your pages, but submission speeds up discovery and gives you visibility into indexing issues.
2. Duplicate title tags and meta descriptions. Shopify auto-generates SEO titles from product names, but if you haven't customised them, you might have 50 product pages all with generic patterns. Each page needs a unique, keyword-optimised title tag.
3. Internal search results being indexed. Check site:yourstore.com inurl:search in Google. If you see results, your search pages are being indexed and diluting your crawl budget.
4. Tag-based collection pages creating duplicate content. Check site:yourstore.com inurl:/collections/ inurl:+ in Google. These filtered collection pages often contain the same products as the parent collection with slightly different URLs.
5. Missing or broken canonical tags on variant URLs. Open a product page, select different variants, and check if the canonical tag stays consistent. Some apps and theme customisations break this.
The Audit Workflow
Here's a practical 30-minute technical SEO check you can run on your Shopify store right now:
- Open yourstore.com/sitemap.xml and verify all child sitemaps load correctly
- Submit the sitemap to Google Search Console if you haven't already
- Open yourstore.com/robots.txt and verify the default rules are present
- Run site:yourstore.com inurl:search in Google; if results appear, add a search disallow to robots.txt
- Check canonical tags on three pages: a product page, a collection page, and a product page accessed from a collection
- Verify your homepage has a unique title tag and meta description
- Spot-check 5 product pages for unique title tags
This takes 30 minutes and catches the most impactful issues.
For the performance side of Shopify SEO, our image optimization guide covers the other half of the equation — because technical SEO configuration means nothing if your pages take 4 seconds to load.
And for the accessibility side, our WCAG compliance guide covers the overlap between accessibility and SEO — alt text, heading structure, and semantic HTML all affect both.
If you want a comprehensive technical SEO audit of your Shopify store, book a discovery call with us. At Innovatrix Infotech, technical SEO is part of every build we deliver — not an afterthought bolted on post-launch.
Rishabh Sethia is the founder and CEO of Innovatrix Infotech, a Kolkata-based digital engineering agency. He leads a team that delivers web development, mobile apps, Shopify stores, and AI automation for startups and SMBs across India and beyond.