Technical SEO Baseline for a New or Rebuilt Website

TL;DR

Technical SEO isn't a 200-item checklist. It's a minimum skeleton: Google can crawl your pages, hreflang routes correctly between languages, and you can verify the indexing curve in Search Console two weeks after launch. We bucket every item as P0 (must), P1 (worth doing), or skip (audit-tool noise). For a new or just-rebuilt site, an all-green P0 list is enough to carry six months of content work. The items we most often delete from client backlogs: self-referencing canonicals on empty pagination, ItemList schema on a standalone service page, and "fix all 88 medium issues" tickets pasted straight from an audit export.

We wrote this because the same scenario keeps walking through our door. A B2B exporter migrates an old .cn site to multilingual WordPress, pays for a 200-item audit, and the team is paralyzed by a screen of red and yellow. This is the prioritization sheet we walk them through.

Who reads your site

The audience for technical SEO is three kinds of crawler and one human. Googlebot fetches HTML, harvests links and structured data, and rebuilds the index. Bingbot does similar work and quietly powers Bing plus parts of ChatGPT search. The newer AI crawlers (GPTBot, ClaudeBot, PerplexityBot) behave like overworked interns with two questions: can they get the main answer in three seconds, and does the first paragraph already address the question? The human is whoever clicks through from a Google result or AI summary and decides in three seconds whether to stay.

Naming the readers stops you from doing SEO for SEO's sake. Every robots line, schema field, and hreflang tag should answer "who is this for."

Crawlability

Everything in this section is P0. A single mistake here can drop you out of the index entirely, and that outweighs any keyword tweak you will ever make.

  • robots.txt: served at the root, plain text. User-agent: * should only Disallow paths you genuinely don't want crawled (admin, login, internal search). The classic disaster: an engineer wrote Disallow: / on staging, the deploy pipeline carried it to production, and traffic vanished for two weeks before anyone opened Search Console.
  • sitemap.xml: only 200-status, indexable, search-worthy URLs. No tag archives, paginated lists, or PDFs. For multilingual sites, one sitemap per language referenced by a sitemap index. Submit to Google Search Console and Bing Webmaster Tools.
  • Status codes: full crawl with Screaming Frog or Sitebulb before launch. Indexable pages return 200, redirects are 301 (not 302), 404s actually return 404 (not soft-404 with status 200). Rebuilds especially need every old URL 301'd to its new home — see How to Preserve SEO During a Website Rebuild.
  • Internal links: every indexable page needs at least one internal link from home or another indexed page. Orphan pages don't get crawled. WordPress themes handle this by default; custom builds often don't.
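As a rough illustration of the robots.txt item above, the sketch below uses Python's standard urllib.robotparser to test whether a robots.txt body blocks a handful of priority paths. The STAGING and PRODUCTION strings, the paths, and the function name are made-up examples. The standard-library parser only approximates how Googlebot matches rules (it does not implement Google's wildcard handling), so treat a clean result as a smoke test, not proof.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the paths that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical file contents: the staging accident versus a sane production file.
STAGING = "User-agent: *\nDisallow: /\n"
PRODUCTION = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /wp-login.php\n"

# Hypothetical priority paths that must stay crawlable.
PRIORITY_PATHS = ["/", "/en/services/", "/de/kontakt/"]

if __name__ == "__main__":
    for label, body in (("staging", STAGING), ("production", PRODUCTION)):
        print(label, "blocks:", blocked_paths(body, PRIORITY_PATHS))
```

Run against the file that is about to ship, a check like this fails the deploy when a staging Disallow: / is still in place.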

Pass criteria: zero 404s, zero 5xx, zero non-self canonicals, zero orphan pages on the crawl report. Without this, nothing below matters. Google's SEO Starter Guide treats crawlability as the first prerequisite for everything else.
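The pass criteria can be expressed as a small check over a crawl export. This is a sketch under assumptions: CrawlRow and its fields (url, status, canonical, inlinks) are hypothetical names, not the column names Screaming Frog or Sitebulb actually export, so map them to whatever your crawler produces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlRow:
    url: str
    status: int
    canonical: Optional[str]  # None when the page declares no canonical
    inlinks: int              # number of internal links pointing at this URL


def launch_blockers(rows: list[CrawlRow], home: str) -> dict[str, list[str]]:
    """Group URLs that violate the four pass criteria."""
    report: dict[str, list[str]] = {
        "404": [], "5xx": [], "non_self_canonical": [], "orphan": [],
    }
    for row in rows:
        if row.status == 404:
            report["404"].append(row.url)
        elif 500 <= row.status <= 599:
            report["5xx"].append(row.url)
        elif row.status == 200:
            if row.canonical is not None and row.canonical != row.url:
                report["non_self_canonical"].append(row.url)
            # The home page is the crawl entry point, so zero inlinks is fine there.
            if row.inlinks == 0 and row.url != home:
                report["orphan"].append(row.url)
    return report


def passes(report: dict[str, list[str]]) -> bool:
    """True only when every bucket is empty."""
    return all(not urls for urls in report.values())
```

A launch gate would call passes(launch_blockers(rows, home)) and print the non-empty buckets when it returns False.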

Page basics

Mostly P0, with a few P1s.

  • Title tag: written one page at a time, 50 to 60 characters. A "Primary keyword - qualifier | Brand" template is fine; bulk-generated titles are not. The job is to tell a human what the page is.
  • Meta description: 120 to 160 characters, written for the click. Google rewrites 60 to 70 percent of descriptions in SERPs, but your version still matters — AI search engines lean on it more directly than Google does.
  • H1: one per page, matching or close to the title. H2s as short labels, H3s for inline subdivisions. Short headings stay scannable for humans and parseable for AI summaries.
  • Canonical: every page self-references a clean URL (no UTM, no tracking). Don't canonical translated pages back to the original language — that's hreflang's job. The single most common multilingual mistake.
  • Alt text: every non-decorative image gets a real alt. Not "image" or "photo" — describe the fact: product model, factory floor, customer logo.

P1 territory: Open Graph and Twitter Card tags. Worth tuning when you're actively promoting on LinkedIn or X; defaults are fine otherwise.
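Two of the checks above are mechanical enough to script. The sketch below strips tracking parameters to derive a clean canonical URL and checks title and description lengths against the ranges given in the list. The function names and the set of tracking keys are assumptions for this example; extend the list to match the parameters your campaigns actually use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; not an exhaustive list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}


def clean_canonical(url: str) -> str:
    """Drop tracking parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def length_ok(text: str, low: int, high: int) -> bool:
    """True when the trimmed text length falls inside [low, high]."""
    return low <= len(text.strip()) <= high


def title_ok(title: str) -> bool:
    return length_ok(title, 50, 60)


def description_ok(description: str) -> bool:
    return length_ok(description, 120, 160)
```

Character counts are only a proxy for what fits in a search result, so use the length checks to flag outliers for a human to rewrite, not to auto-truncate.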

Performance

Core Web Vitals — LCP, INP, CLS — are direct ranking signals, weighted higher on mobile. This is also the one area where audit tools and reality usually agree.

Three P0s:

  • LCP under 2.5 seconds: the largest content element renders in under 2.5 seconds. Usual culprits: uncompressed hero image, webfont blocked behind third-party DNS. Compress to WebP/AVIF, self-host fonts or add font-display: swap, and most sites land under 2.5 seconds same-day.
  • INP under 200 ms: how quickly the page responds to clicks, taps, and key presses across the whole visit, not just the first one. Bad INP usually means three or four third-party scripts (live chat, heatmap, video popup) blocking the main thread.
  • CLS under 0.1: lock width and height on images and ad slots so nothing jumps as the page paints.
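The three thresholds are simple to encode as a release gate. A minimal sketch, assuming you already have measured values from field data or a lab run; the dictionary keys (lcp_s, inp_ms, cls) are hypothetical names chosen for this example. Google assesses these metrics at the 75th percentile of real-user visits, so feed in p75 values where you have them.

```python
# "Good" thresholds as listed above: LCP in seconds, INP in milliseconds, CLS unitless.
GOOD_LIMITS = {"lcp_s": 2.5, "inp_ms": 200.0, "cls": 0.1}


def failing_vitals(measured: dict[str, float]) -> list[str]:
    """Return the metric names whose measured value exceeds the 'good' limit."""
    return [name for name, limit in GOOD_LIMITS.items() if measured[name] > limit]


if __name__ == "__main__":
    # Hypothetical measurements for one page template.
    sample = {"lcp_s": 3.1, "inp_ms": 180.0, "cls": 0.02}
    print("Failing:", failing_vitals(sample))
```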

Not P0: chasing PageSpeed from 88 to 95. Those seven points cost a week splitting CSS bundles and barely move real user metrics. Spend the time on content.

For WordPress-specific caching, image pipelines, and CDN selection, the performance section of WordPress Overseas Website Architecture covers the practical setup.

Structured data

The most contentious P0. Done right, AI summaries quote you directly. Done wrong, Google starts doubting your site quality across the board.

What we tell clients to ship and what to leave out:

  • Organization: site-wide, exactly one. Legal name, logo, URL, social profiles, contact info.
  • WebSite: site-wide, with a SearchAction if you have site search.
  • Article: only on blog posts. Not on service pages.
  • Service: only on service pages. Don't dress up a blog post as a Service.
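  • FAQPage: only where there's a visible FAQ section. The answers have to render in the page HTML; schema-only answers are not eligible. Since Google's 2023 change, FAQ rich results are shown mainly for well-known government and health sites, so for most sites the markup serves machine readability, not a SERP feature. Getting it wrong won't get you penalized; you just won't earn the rich result.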

Don't ship: BreadcrumbList on one-level-deep pages, ItemList on a standalone service page, Review schema with invented testimonials. These are common audit-tool suggestions Google either ignores or, in the case of fake reviews, manually penalizes.

Validate every schema block before launch with the Schema.org Validator and Google's Rich Results Test. For per-page-type guidance — what schema fits which kind of service — see Schema Markup for Service Websites.
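To make the site-wide Organization block concrete, here is a sketch that builds the JSON-LD from a few fields and wraps it in a script tag. The company name, URLs, and the function name are placeholders invented for this example, and only a subset of the properties mentioned above is shown. Still run the output through the validators before shipping.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str]) -> str:
    """Return a <script> tag carrying Organization JSON-LD for the given fields."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,
    }
    # Escape "<" so a stray "</script>" inside a value cannot close the tag early.
    body = json.dumps(data, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">' + body + "</script>"


if __name__ == "__main__":
    print(organization_jsonld(
        name="Example Exports Ltd",                      # placeholder legal name
        url="https://example.com/",
        logo="https://example.com/logo.png",
        same_as=["https://www.linkedin.com/company/example"],
    ))
```

Generating the block from one source of truth, as opposed to hand-editing it per template, is one way to keep "exactly one" Organization block true across the site.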

Multilingual

For a multilingual site, technical SEO is mostly one thing: hreflang. And it's the thing 80 percent of the Chinese exporter sites we audit get wrong.

P0 checklist:

  • URL paths separated: subdirectories like /en/, /de/, /es/. Don't rely on ?lang=en query parameters; they confuse Google's clustering.
  • Full hreflang set on every page: self-reference, every sibling language, and x-default. Missing siblings, missing self-reference, or asymmetric pairs all generate Search Console errors.
  • hreflang and canonical don't fight: the English page canonicals to itself, not the Chinese page. The most common mistake among Chinese teams: treating Chinese as source-of-truth and pointing every translated canonical home. Google then drops the English version from the index.
  • Language switcher: lands on the equivalent URL, not the home page. Buyers and crawlers both lose context when it bounces.
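The hreflang rules above (self-reference, x-default, symmetric pairs) can be checked mechanically once you have each page's declared alternates. The sketch below assumes a hypothetical input shape: a dictionary mapping each page URL to its hreflang set, itself a dictionary of language code to URL. How you extract that from your pages or crawler is up to you.

```python
X_DEFAULT = "x-default"


def language_targets(alternates: dict[str, str]) -> set[str]:
    """URLs declared under real language codes, ignoring x-default."""
    return {url for lang, url in alternates.items() if lang != X_DEFAULT}


def hreflang_problems(pages: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (page_url, problem) pairs for the common hreflang mistakes."""
    problems: list[tuple[str, str]] = []
    for url, alternates in pages.items():
        targets = language_targets(alternates)
        if url not in targets:
            problems.append((url, "missing self-reference"))
        if X_DEFAULT not in alternates:
            problems.append((url, "missing x-default"))
        for target in sorted(targets - {url}):
            sibling = pages.get(target)
            if sibling is None:
                problems.append((url, f"points to unknown page {target}"))
            elif url not in language_targets(sibling):
                # Asymmetric pair: we link to the sibling, it does not link back.
                problems.append((url, f"no return link from {target}"))
    return problems
```

An empty result means every page in the set references itself, declares x-default, and is linked back by each sibling it points to.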

Localization itself — keywords, CTAs, proof points per market — is a content topic. The full pattern is in Multilingual Site Structure and Hreflang.

The first two weeks

Launch is when technical SEO actually starts. Write these into the launch PRD:

  • Day 1: URL Inspection on five priority pages (home, top service, lead case study, latest blog, contact). Confirm Google renders each one, picks up hreflang, and sees no canonical conflicts.
  • Day 3: submit the sitemap. Watch the gap between "Discovered" and "Indexed." For a new site, 80 percent discovered and 30 to 50 percent indexed in week one is normal.
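  • Day 7: run a mobile usability check. Search Console's Mobile Usability report was retired in late 2023, so check the priority templates on real phones or in Chrome DevTools device emulation. The two problems most common after a rebuild, content wider than the screen and clickable elements too close together, are legacy issues from old themes.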
  • Day 14: export the Page indexing report (formerly Coverage). Walk every URL in "Crawled - currently not indexed" and "Discovered - currently not indexed." A handful is normal; if those buckets keep growing in week three, it's a content quality problem, not a technical one.

For tying Search Console data together with GA4 to read the trend, see How to Measure SEO with Search Console and Analytics.

What audit tools get wrong

We've watched teams ship 2 a.m. emergency fixes because an audit tool flagged a "critical error." Most of those flags are noise. The ones that come up most often:

  • "Missing meta description": paginated archives, tag pages, and internal search results shouldn't be indexed in the first place. Don't write descriptions for pages you should be noindex-ing.
  • "Multiple H1s": HTML5 allows multiple H1s and Google handles them fine. Worth fixing only if two H1s appear in the same visible region.
  • "Page depth too deep": past three clicks isn't automatically bad. What matters is whether priority pages are reachable via clear internal link paths.
  • "Image missing alt": decorative backgrounds can carry alt="". The tool can't tell the difference; you can.

Our filter: for every flag, check whether Google Search Central documents it. If Google doesn't mention it, it's almost never worth engineering hours.

If you can only fix three things

In order:

  1. 301s and the sitemap. A rebuild that botches redirects will bleed traffic for four to six weeks before anyone notices, and recovery is slow.
  2. LCP and mobile basics. Direct ranking signal, and the area where most Chinese-host-plus-old-theme sites fall short.
  3. Structured data correctness. AI search engines reach for sites with clean schema first; bad schema costs you the slot but doesn't penalize you.

Everything else (Open Graph, breadcrumb schema, pushing PageSpeed past 95) is P1 or skip. Get content shipping, then come back.
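For the first item in that list, a redirect map can be sanity-checked before launch. This is a sketch, assuming you keep the old-to-new mapping as a plain dictionary and have a set of URLs known to return 200 on the new site; both structures, the sample paths, and the function name are hypothetical.

```python
def broken_redirects(redirects: dict[str, str], live: set[str]) -> dict[str, str]:
    """Map each problematic old URL to a short description of what is wrong."""
    problems: dict[str, str] = {}
    for old, new in redirects.items():
        if new in redirects:
            # The target is itself redirected, so the old URL takes two or more hops.
            problems[old] = "chain"
        elif new not in live:
            problems[old] = "dead target"
    return problems


if __name__ == "__main__":
    # Hypothetical redirect map from an old site to the rebuilt one.
    redirect_map = {
        "/products.asp?id=7": "/en/products/valves/",
        "/about.html": "/en/about-us/",
        "/news.html": "/about.html",
    }
    live_urls = {"/en/products/valves/", "/en/contact/"}
    for old, issue in broken_redirects(redirect_map, live_urls).items():
        print(f"{old}: {issue}")
```

This only validates the map on paper. A crawl of the old URL list against the live server is still what confirms that each one answers with a 301 to the intended target.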

Launch checklist

| Area | Must-pass items | Pass criteria |
| --- | --- | --- |
| Crawl | robots.txt, sitemap.xml | No site-wide Disallow; sitemap contains only 200-status, indexable URLs |
| Status codes | Full-site crawl | Zero 404s, zero 5xx, all 301s pointing to live targets |
| Page basics | Title, description, H1, canonical | One per page, length compliant, canonical self-references |
| Performance | LCP, INP, CLS | LCP <2.5s, INP <200ms, CLS <0.1 |
| Structured data | Organization, Service, Article | Passes Schema.org Validator and Rich Results Test |
| Multilingual | hreflang, canonical, switcher | Symmetric pairs, includes x-default, switcher hits equivalent URL |
| Index check | Search Console URL Inspection | Five priority pages all "Indexable" and rendered correctly |
| Analytics | GA4, Search Console, Bing | Reporting live, UTM doesn't pollute canonical |

If any row is uncertain, isolate it for a focused review before launch instead of finding it three weeks later when traffic dips.

FAQ

Which matters more, technical or content SEO?

Technical SEO is the foundation. Without P0 you can't get indexed, so content quality is moot. Once the foundation is green, content, E-E-A-T signals, and internal linking decide where you rank. Our pattern: drive technical to all-green, put every hour into content, re-audit technical quarterly.

What SEO tools do we recommend?

Crawl: Screaming Frog (free up to 500 URLs) or Sitebulb. Schema: Schema.org Validator and Google's Rich Results Test. Monitoring: Search Console (mandatory, free) and Bing Webmaster Tools. Ahrefs and Semrush are P1 — useful later, not required in the first three months.

Will failing Core Web Vitals hurt rankings?

Not as a manual action, but Core Web Vitals have been a page-experience ranking signal since 2021 — modest weight, real. The bigger effect: 5+ second LCP roughly doubles mobile bounce rate compared with 2-second sites, which damages dwell time and conversions and indirectly hurts SEO.

What if my schema is wrong?

Syntactically wrong schema or fields that don't match visible content lose the rich result without an algorithmic penalty. Deceptive schema — Review markup with fake testimonials on a page with no review module — is what triggers manual penalties, and recovery takes months.

Get a baseline review

If you're launching a new site or just finished a rebuild, bring your domain, Search Console access, and the launch timeline. We'll run this exact baseline against your site as part of a free initial review under our overseas website build and SEO/GEO support service, and tell you which items are P0 same-week fixes versus what can wait for the next sprint. If any term in this post is unfamiliar, the overseas website glossary covers it in plain language.