SEOPillar Article · February 24, 2026 · 10 min read

The SEO audit checklist we use on every client site

The actual 47-point checklist we run on every website. Technical SEO, content quality, schema markup, AI readiness, and performance — nothing generic.


Every SEO audit I've seen shared online is the same recycled list of obvious stuff. "Make sure your site loads fast." Thanks, never thought of that.

Here's what we actually check when a new client site lands in our queue. This is the 47-point checklist we run at Build444. Every point has a specific pass/fail threshold. No vibes. No "it depends." Numbers.

1. Technical SEO

This is the foundation. If search engines can't crawl your site properly, nothing else matters. We check 12 things here.

Crawlability and indexing

  • robots.txt exists and doesn't accidentally block important paths. We see this more than you'd think. One client had Disallow: /products/ in production for 8 months.
  • XML sitemap is present at /sitemap.xml, returns a 200 status, and contains fewer than 50,000 URLs per file. Every URL in the sitemap should return 200. No 404s, no 301s.
  • Canonical tags on every page. Self-referencing canonicals on standard pages. Cross-domain canonicals only where intentional.
  • No noindex tags on pages that should be indexed. We crawl the full site and flag every page with noindex so the client can confirm each one.
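Once robots.txt and the sitemap have been fetched as strings, the first two checks reduce to a few lines. A minimal sketch (the function names here are ours, not a library's):

```python
import xml.etree.ElementTree as ET

def disallowed_paths(robots_txt: str) -> list[str]:
    """List every Disallow path so a human can confirm each one is intentional."""
    paths = []
    for line in robots_txt.splitlines():
        if line.strip().lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # bare "Disallow:" means "allow everything", skip it
                paths.append(path)
    return paths

def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract <loc> entries; the audit then requests each URL and flags any non-200."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in ET.fromstring(sitemap_xml).findall(".//sm:loc", ns)]

robots = "User-agent: *\nDisallow: /products/\n"
print(disallowed_paths(robots))  # surfaces the /products/ block for review
```

The point of listing every Disallow rather than pass/failing automatically is exactly the `/products/` story above: only the client knows which blocks are intentional.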

HTTPS and security headers

  • Full HTTPS with a valid certificate. No mixed content warnings.
  • HSTS header present with max-age of at least 31536000 (one year). includeSubDomains preferred.
  • CSP (Content Security Policy) header present. Doesn't need to be perfect on day one, but it needs to exist.
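Given the response headers as a dict, these checks are a few string comparisons. A sketch using the thresholds above:

```python
import re

def header_problems(headers: dict) -> list[str]:
    """Flag HSTS/CSP failures per the checklist thresholds."""
    problems = []
    hsts = headers.get("Strict-Transport-Security", "")
    match = re.search(r"max-age=(\d+)", hsts)
    if not match or int(match.group(1)) < 31536000:  # one year in seconds
        problems.append("HSTS missing or max-age under one year")
    elif "includeSubDomains" not in hsts:
        problems.append("HSTS lacks includeSubDomains (preferred, not required)")
    if "Content-Security-Policy" not in headers:
        problems.append("CSP header missing")
    return problems
```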

Core Web Vitals

These are the three metrics Google actually uses in ranking. We measure on mobile, using real Chrome User Experience Report data when available, lab data when it's not.

  • LCP (Largest Contentful Paint): Under 2.5 seconds. Anything between 2.5s and 4s gets flagged yellow. Over 4s is a hard fail.
  • CLS (Cumulative Layout Shift): Under 0.1. This one kills trust. When a page jumps around while loading, users leave. Most CLS issues come from images without explicit width/height or late-loading web fonts.
  • INP (Interaction to Next Paint): Under 200ms. This replaced FID in 2024. It measures how fast your page responds to clicks, taps, and key presses. Heavy JavaScript is the usual culprit.
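All three metrics have the same pass/yellow/fail shape. The sketch below fills in Google's published good/poor boundaries for CLS (0.1/0.25) and INP (200ms/500ms), which the checklist text doesn't spell out:

```python
# (good, poor) boundaries; values between them land in the yellow band.
THRESHOLDS = {"LCP": (2.5, 4.0), "CLS": (0.1, 0.25), "INP": (200, 500)}

def rate(metric: str, value: float) -> str:
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "pass"
    if value <= poor:
        return "yellow"
    return "fail"
```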

2. On-page SEO

On-page optimization is where most sites leak ranking potential. Small, boring fixes that compound over hundreds of pages.

Title tags

  • Every page has a unique title tag. No duplicates across the site.
  • Under 60 characters. Google truncates at roughly 580 pixels, which works out to about 60 characters for most fonts. We check pixel width, not just character count.
  • Primary keyword appears in the first half of the title. Not stuffed. Just present.
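The pixel-width check can be approximated without a browser. The per-glyph widths below are ballpark figures for an Arial-like 18px font, not real font metrics; a production audit would measure with actual font data:

```python
# Characters that render notably narrower or wider than average.
NARROW = set("ijltf.,;:!|' ")
WIDE = set("mwMW@")

def approx_title_width(title: str, px: float = 18.0) -> float:
    """Rough pixel width of a title at the given font size."""
    total = 0.0
    for ch in title:
        if ch in NARROW:
            total += px * 0.30
        elif ch in WIDE:
            total += px * 0.95
        elif ch.isupper():
            total += px * 0.72
        else:
            total += px * 0.52
    return total

def title_fits(title: str) -> bool:
    """True if the title stays under Google's ~580px truncation point."""
    return approx_title_width(title) <= 580
```

This is why pixel width beats character count: "WWW Marketing" and "illicit lilt" have similar lengths but very different widths.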

Meta descriptions

  • Every page has a unique meta description. Under 160 characters.
  • Contains a clear call to action or value proposition. Google doesn't use meta descriptions for ranking, but they affect click-through rate by 5-10%.

Heading hierarchy

  • Exactly one H1 per page. Not zero. Not three. One.
  • H1 contains the primary keyword for that page.
  • Headings follow a logical hierarchy. No jumping from H1 to H4. No using H3 because it "looked better" in the design.
  • We check for heading skips (H1 to H3 with no H2) across the entire site.
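The hierarchy checks above can be run on raw HTML with the standard library. A sketch:

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Record heading levels (1-6) in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))

def heading_issues(html: str) -> list[str]:
    collector = HeadingCollector()
    collector.feed(html)
    issues = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")
    for prev, cur in zip(collector.levels, collector.levels[1:]):
        if cur > prev + 1:  # e.g. H1 straight to H3
            issues.append(f"heading skip: H{prev} to H{cur}")
    return issues
```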

Image alt text

  • Every content image has descriptive alt text. Not "image1.jpg". Not the filename. An actual description of what's in the image.
  • Decorative images use an empty alt="" attribute so screen readers skip them.
  • Alt text under 125 characters. Longer than that and screen readers may cut it off.

3. Content quality

Bad content is the number one reason sites don't rank. We measure it, not just read it.

Readability

  • Flesch Reading Ease score of 60 or above. That's roughly an 8th-grade reading level. Most B2B sites score in the 30s. That's too hard for the web.
  • Average sentence length under 20 words. Long sentences kill comprehension on screens.
  • Paragraphs no longer than 3-4 sentences.
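Flesch Reading Ease is simple to compute; the hard part is syllable counting. The vowel-group heuristic below only approximates it (a real audit would use a dictionary-backed counter):

```python
import re

def count_syllables(word: str) -> int:
    """Approximate: count vowel groups, drop one for a trailing silent 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Standard formula: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))
```

Run it on a page's body text: short sentences with short words score high; the jargon-heavy B2B copy mentioned above lands far below the 60 threshold.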

Keyword density

  • Primary keyword density between 0.5% and 2.5% of total word count. Below 0.5% means Google might not understand what the page is about. Above 2.5% starts looking like keyword stuffing.
  • We also check for keyword cannibalization: multiple pages targeting the exact same primary keyword. When two of your pages compete for the same term, usually neither ranks well.
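Density conventions vary for multi-word keywords; the sketch below counts phrase occurrences and credits each word of the phrase, which is one common way to compute it:

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Percent of total words accounted for by the keyword phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = keyword.lower().split()
    if not words or not phrase:
        return 0.0
    hits = sum(
        1
        for i in range(len(words) - len(phrase) + 1)
        if words[i:i + len(phrase)] == phrase
    )
    return 100.0 * hits * len(phrase) / len(words)

def density_in_range(density: float) -> bool:
    return 0.5 <= density <= 2.5  # the thresholds above
```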

Thin and duplicate content

  • No pages with fewer than 300 words of unique content (excluding navigation, footers, sidebars).
  • No pages with more than 70% content similarity to another page on the same site. We use Jaccard similarity on n-gram sets, not just exact-match detection.
  • Paginated content uses plain crawlable pagination links, or a single long page with lazy loading. (Google stopped using rel="next" and rel="prev" as indexing signals in 2019, so each paginated page has to stand on its own.)
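The similarity check mentioned above can be sketched with word trigrams (n=3 is our choice here; any small n works):

```python
def ngram_set(text: str, n: int = 3) -> set:
    """Set of lowercased word n-grams."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Intersection over union of the two pages' n-gram sets."""
    set_a, set_b = ngram_set(a, n), ngram_set(b, n)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def near_duplicate(a: str, b: str) -> bool:
    return jaccard_similarity(a, b) > 0.70  # the 70% threshold above
```

Because it compares sets of phrases rather than exact strings, this catches pages that were copy-pasted and lightly reworded, which exact-match detection misses.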

4. Schema markup

Schema is structured data that helps search engines understand what your pages are about. It also powers rich results (those fancy search listings with ratings, FAQs, and prices).

Required schema for every site

  • Organization schema on the homepage with name, url, logo, and sameAs links to social profiles.
  • WebPage schema on every page with name, description, and dateModified.
  • BreadcrumbList schema that matches the visual breadcrumb navigation.

Conditional schema

  • FAQ schema on any page with a FAQ section. We verify the JSON-LD matches the visible page content. Google will penalize invisible FAQ schema.
  • Product schema on product pages with name, price, currency, availability, and aggregate ratings if they exist.
  • Article schema on blog posts with headline, author, datePublished, dateModified, and image.
  • LocalBusiness schema for any business with a physical location.
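As a concrete example, here is the Article schema from the list above built as a Python dict and serialized to JSON-LD. All values are placeholders; a real audit checks them against the rendered page:

```python
import json

article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The SEO audit checklist we use on every client site",
    "author": {"@type": "Person", "name": "Daniel Dalgaard"},
    "datePublished": "2026-02-24",
    "dateModified": "2026-02-24",
    "image": "https://example.com/cover.jpg",  # placeholder URL
}

# Embedded in the page <head> as a JSON-LD script tag:
json_ld_tag = (
    '<script type="application/ld+json">'
    + json.dumps(article_schema)
    + "</script>"
)
```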

We validate all schema using Google's Rich Results Test. No errors. Warnings are okay in some cases, but we document each one.

5. AI readiness

This is the section nobody else checks. If your site isn't set up for AI search (ChatGPT, Perplexity, Google AI Overviews), you're invisible to a growing chunk of your audience.

AI crawler access

  • robots.txt allows GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Blocking these crawlers means your content never enters AI training data or AI-generated answers.
  • We check the actual robots.txt directives, not just whether the file exists. A lot of sites block AI crawlers without realizing it because their CMS added default rules.
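Python's standard `urllib.robotparser` can run this check offline against a fetched robots.txt. A sketch (the bot list is from the checklist; example.com is a placeholder):

```python
from urllib import robotparser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers this robots.txt blocks from the given URL."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]

robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(blocked_ai_bots(robots))  # -> ['GPTBot']
```

This is exactly the "default rules" trap: a CMS-generated block for one named bot sits above a permissive wildcard, and the site owner never notices.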

Structured data for AI citations

  • Clean, complete schema markup (see section 4). AI systems pull structured data to generate citations and source links.
  • Clear, self-contained paragraphs that directly answer common questions. AI models prefer content they can excerpt without needing context from surrounding paragraphs.

AI search visibility

  • We search for the client's brand name and primary keywords in ChatGPT, Perplexity, and Google AI Overviews.
  • We document whether the site appears, whether competitors appear, and what content the AI is pulling from.
  • This baseline tells us where to focus content improvements.

6. Performance

A slow site loses money. Every 100ms of added load time reduces conversions by roughly 1%. We measure everything.

Image optimization

  • All images served in WebP or AVIF format. No PNGs for photos. No uncompressed JPEGs.
  • Images properly sized. We flag any image that's served at a resolution more than 2x its display size. A 4000px-wide image in a 400px container is wasted bandwidth.
  • Lazy loading (loading="lazy") on all below-the-fold images. Above-the-fold images (especially the LCP element) must NOT be lazy loaded.

Font loading

  • Web fonts use font-display: swap or font-display: optional. Never font-display: block, which creates invisible text during loading.
  • No more than 3 font files loaded. Each font file adds a network request and parsing time.
  • Font files preloaded with <link rel="preload"> for the most-used weight/style.

JavaScript

  • Total JS bundle size under 200KB compressed for the initial page load. We see sites shipping 800KB+ of JavaScript for what is basically a brochure site.
  • No render-blocking scripts in the <head> without async or defer.
  • Third-party scripts (analytics, chat widgets, tracking pixels) loaded after the main content. Every tracking script you add costs your users time.

What we do with the results

We don't just hand over a spreadsheet and say "good luck." Every check gets a score. We weight the scores by category, with technical SEO and AI readiness weighted heaviest. The total gives a score out of 100.

Then we build a prioritized action plan. The fixes that will move the needle the most go to the top. We estimate effort for each fix so you know what to tackle first.

If you want us to run this audit on your site, we offer it as a one-time report. You get the full 47-point analysis, scored and prioritized, delivered as a PDF within 48 hours.

No retainer required. No sales calls. Just the data.

Daniel Dalgaard

Founder of Build444. Builds websites, automations, and SEO systems for businesses that want to grow online.

