This file is an implementation specification for an IDE coding agent working inside the repository for https://scottcoff.in.
The goal is to build a repeatable diagnose → improve → diagnose workflow for a Jekyll/GitHub Pages site using:
This workflow should prioritize correctness, canonical identity, crawlability, accessibility, and stable layout over chasing small Lighthouse score changes.
Follow these rules throughout the implementation.
Prioritize issues in this order:
- scottcoff.in to scottcoffin.github.io
Stop and request human review if any proposed fix requires:
First inspect the repo. Do not edit files in this step.
Inspect the repository and report:
- _config.yml
- Gemfile
- package.json
- _includes/head.html
- _layouts/default.html
- _data/navigation.yml
- robots.txt
- sitemap.xml
- SEO_CHECKLIST.md
- /Research/, /Data_Science/, /Media/, /Expertise/, /CV/ is used
- jekyll-seo-tag is configured
- jekyll-sitemap is configured
- scottcoffin.github.io
- http://scottcoffin.github.io
- https://scottcoffin.github.io
Return a concise file-by-file implementation plan before editing.
If no package.json exists, create one. If it exists, extend it conservatively.
Add these development dependencies:
{
  "@lhci/cli": "latest",
  "lighthouse": "latest",
  "http-server": "latest"
}
Only add concurrently or wait-on if genuinely useful.
package.json scripts
Adapt commands to the actual repo if needed.
{
  "scripts": {
    "build:site": "bundle exec jekyll build",
    "serve:site": "npx http-server _site -p 4000",
    "lhci": "lhci autorun",
    "psi": "node scripts/pagespeed_insights.mjs",
    "audit:tasks": "node scripts/audit_to_tasks.mjs",
    "diagnose": "bash scripts/diagnose_seo_perf.sh",
    "diagnose:live": "RUN_PSI=1 bash scripts/diagnose_seo_perf.sh",
    "jsonld": "node scripts/validate_jsonld.mjs",
    "gsc": "node scripts/search_console_check.mjs",
    "gtmetrix": "node scripts/gtmetrix_check.mjs"
  },
  "devDependencies": {
    "@lhci/cli": "latest",
    "lighthouse": "latest",
    "http-server": "latest"
  }
}
.gitignore entries
Add report outputs to .gitignore unless the user explicitly wants to version reports:
reports/lighthouse/
reports/lhci/
reports/pagespeed/
reports/search-console/
reports/gtmetrix/
reports/*.json
reports/*.csv
reports/*.html
reports/*.log
Keep curated Markdown summaries trackable only if desired. If uncertain, ignore all generated reports and let the user decide.
Create:
lighthouserc.js
Use this as the starting configuration. Adjust paths only if the site uses different URL paths.
module.exports = {
  ci: {
    collect: {
      staticDistDir: './_site',
      url: [
        'http://localhost:4000/',
        'http://localhost:4000/Research/',
        'http://localhost:4000/Data_Science/',
        'http://localhost:4000/Media/',
        'http://localhost:4000/Expertise/'
      ],
      numberOfRuns: 3,
      settings: {
        formFactor: 'mobile',
        screenEmulation: {
          mobile: true,
          width: 390,
          height: 844,
          deviceScaleFactor: 3,
          disabled: false
        },
        throttlingMethod: 'simulate'
      }
    },
    assert: {
      assertions: {
        'categories:performance': ['warn', { minScore: 0.70 }],
        'categories:accessibility': ['error', { minScore: 0.90 }],
        'categories:best-practices': ['warn', { minScore: 0.90 }],
        'categories:seo': ['error', { minScore: 0.90 }]
      }
    },
    upload: {
      target: 'filesystem',
      outputDir: './reports/lhci'
    }
  }
};
If staticDistDir does not serve correctly in this repo, switch to a startServerCommand approach:
module.exports = {
  ci: {
    collect: {
      startServerCommand: 'bundle exec jekyll build && npx http-server _site -p 4000',
      startServerReadyPattern: 'Available on',
      url: [
        'http://localhost:4000/',
        'http://localhost:4000/Research/',
        'http://localhost:4000/Data_Science/',
        'http://localhost:4000/Media/',
        'http://localhost:4000/Expertise/'
      ],
      numberOfRuns: 3,
      settings: {
        formFactor: 'mobile',
        screenEmulation: {
          mobile: true,
          width: 390,
          height: 844,
          deviceScaleFactor: 3,
          disabled: false
        },
        throttlingMethod: 'simulate'
      }
    },
    assert: {
      assertions: {
        'categories:performance': ['warn', { minScore: 0.70 }],
        'categories:accessibility': ['error', { minScore: 0.90 }],
        'categories:best-practices': ['warn', { minScore: 0.90 }],
        'categories:seo': ['error', { minScore: 0.90 }]
      }
    },
    upload: {
      target: 'filesystem',
      outputDir: './reports/lhci'
    }
  }
};
After setup, run:
npm run lhci
If this fails, fix the configuration before proceeding.
Create:
scripts/pagespeed_insights.mjs
Purpose: run Google PageSpeed Insights against deployed public URLs and save both raw JSON and summary CSV.
- Use the built-in fetch.
- If the runtime lacks fetch, print a clear message requiring Node 18+.
PAGESPEED_API_KEY=...
[
'https://scottcoff.in/',
'https://scottcoff.in/Research/',
'https://scottcoff.in/Data_Science/',
'https://scottcoff.in/Media/',
'https://scottcoff.in/Expertise/'
]
- mobile
- desktop
- performance
- accessibility
- best-practices
- seo
reports/pagespeed/YYYY-MM-DDTHH-mm-ss/
reports/pagespeed/latest-summary.csv
reports/pagespeed/latest-summary.md
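The timestamped run directory named above could be derived as in the following minimal sketch. The function name `runDirName` is hypothetical, and using UTC is an assumption; the spec only fixes the `YYYY-MM-DDTHH-mm-ss` shape.

```javascript
// Build a run directory name such as reports/pagespeed/2025-01-02T03-04-05.
// Assumption: timestamps are in UTC (the spec does not say which zone to use).
function runDirName(date = new Date(), root = 'reports/pagespeed') {
  // toISOString() yields e.g. 2025-01-02T03:04:05.678Z. Keep the first 19
  // characters (no millis, no "Z") and swap ":" for "-" so the name is
  // valid on every common filesystem.
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, '-');
  return `${root}/${stamp}`;
}
```

The caller would create this directory for raw JSON and then overwrite the two `latest-summary` files.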
Use NA for missing values.
timestamp
url
strategy
performance_score
accessibility_score
best_practices_score
seo_score
lcp_ms
cls
inp_ms_or_na
tbt_ms
fcp_ms
speed_index_ms
total_byte_weight
render_blocking_savings_ms
unused_css_savings_bytes
unused_js_savings_bytes
image_savings_bytes
canonical_audit_score
viewport_audit_score
meta_description_audit_score
crawlable_anchors_score
http_status_code
final_url
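One way to fill these columns from a single PageSpeed Insights response is sketched below. The function names `summarizePsi` and `toCsvLine` are hypothetical. The audit ids (for example `render-blocking-resources` and `uses-optimized-images`) are assumptions based on common Lighthouse naming and can differ between Lighthouse versions, which is why every lookup falls back to NA. This sketch does not derive `http_status_code` and always emits NA for it.

```javascript
const NA = 'NA';
const orNa = (v) => (v === undefined || v === null ? NA : v);

// Flatten one PSI v5 response into a row whose keys follow the column list.
function summarizePsi(response, url, strategy, timestamp) {
  const lhr = response.lighthouseResult ?? {};
  const cats = lhr.categories ?? {};
  const audits = lhr.audits ?? {};
  const field = response.loadingExperience?.metrics ?? {};
  const value = (id) => orNa(audits[id]?.numericValue);
  const score = (id) => orNa(audits[id]?.score);
  const savedMs = (id) => orNa(audits[id]?.details?.overallSavingsMs);
  const savedBytes = (id) => orNa(audits[id]?.details?.overallSavingsBytes);
  return {
    timestamp,
    url,
    strategy,
    performance_score: orNa(cats.performance?.score),
    accessibility_score: orNa(cats.accessibility?.score),
    best_practices_score: orNa(cats['best-practices']?.score),
    seo_score: orNa(cats.seo?.score),
    lcp_ms: value('largest-contentful-paint'),
    cls: value('cumulative-layout-shift'),
    // INP comes from field data only, so low-traffic pages usually get NA.
    inp_ms_or_na: orNa(field.INTERACTION_TO_NEXT_PAINT?.percentile),
    tbt_ms: value('total-blocking-time'),
    fcp_ms: value('first-contentful-paint'),
    speed_index_ms: value('speed-index'),
    total_byte_weight: value('total-byte-weight'),
    render_blocking_savings_ms: savedMs('render-blocking-resources'),
    unused_css_savings_bytes: savedBytes('unused-css-rules'),
    unused_js_savings_bytes: savedBytes('unused-javascript'),
    image_savings_bytes: savedBytes('uses-optimized-images'),
    canonical_audit_score: score('canonical'),
    viewport_audit_score: score('viewport'),
    meta_description_audit_score: score('meta-description'),
    crawlable_anchors_score: score('crawlable-anchors'),
    http_status_code: NA, // not derived in this sketch
    final_url: orNa(lhr.finalDisplayedUrl ?? lhr.finalUrl),
  };
}

// Serialize a row, quoting values that contain commas, quotes, or newlines.
function toCsvLine(row) {
  return Object.values(row)
    .map((v) => {
      const s = String(v);
      return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
    })
    .join(',');
}
```

Because the object keys are inserted in column order, `Object.keys(row).join(',')` can serve as the CSV header line.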
Use:
https://www.googleapis.com/pagespeedonline/v5/runPagespeed
Parameters:
url=<encoded URL>
strategy=mobile|desktop
category=performance
category=accessibility
category=best-practices
category=seo
key=<optional API key>
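The request URL described by these parameters can be assembled with the standard `URLSearchParams` API, as in this minimal sketch. The function name `buildPsiUrl` is hypothetical, and the category values are copied from the list above as written.

```javascript
const PSI_ENDPOINT = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed';
const CATEGORIES = ['performance', 'accessibility', 'best-practices', 'seo'];

// Build one PSI request URL. "category" is repeated once per category, and
// "key" is only added when an API key is supplied.
function buildPsiUrl(pageUrl, strategy, apiKey) {
  const params = new URLSearchParams({ url: pageUrl, strategy });
  for (const category of CATEGORIES) params.append('category', category);
  if (apiKey) params.set('key', apiKey);
  return `${PSI_ENDPOINT}?${params.toString()}`;
}
```

A caller would typically pass `process.env.PAGESPEED_API_KEY` as the third argument and loop over every URL and both strategies.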
The script should:
Create:
scripts/audit_to_tasks.mjs
Purpose: read latest Lighthouse/PageSpeed reports and turn them into actionable engineering tasks.
Search for latest available reports from:
reports/lhci/
reports/lighthouse/
reports/pagespeed/
Create:
reports/audit-summary.md
reports/audit-tasks.json
For each URL:
Order issues by:
{
  "page": "string",
  "source": "lighthouse|pagespeed|custom",
  "category": "seo|performance|accessibility|best-practices",
  "auditId": "string",
  "title": "string",
  "description": "string",
  "score": "number|null",
  "numericSavings": "object|null",
  "affectedAssets": ["string"],
  "likelySourceFiles": ["string"],
  "recommendedFix": "string",
  "risk": "low|medium|high",
  "canAttemptAutomatically": true
}
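As a rough sketch of how one Lighthouse result could be mapped onto this task shape, the function below walks each category's `auditRefs` and emits a task for every audit scoring below 1. The name `lhrToTasks` is hypothetical. The `risk` and `canAttemptAutomatically` values are conservative placeholders rather than the spec's risk rules, and `likelySourceFiles` and `recommendedFix` are left empty for a later step to fill.

```javascript
// Convert failing audits in one Lighthouse result (LHR) into task objects.
function lhrToTasks(lhr, source = 'lighthouse') {
  const page = lhr.finalDisplayedUrl ?? lhr.finalUrl ?? lhr.requestedUrl ?? 'unknown';
  const tasks = [];
  for (const [category, cat] of Object.entries(lhr.categories ?? {})) {
    for (const ref of cat.auditRefs ?? []) {
      const audit = lhr.audits?.[ref.id];
      // A null score means "not applicable" or informative; 1 means passing.
      if (!audit || typeof audit.score !== 'number' || audit.score >= 1) continue;
      const details = audit.details ?? {};
      const savings = {};
      if (typeof details.overallSavingsMs === 'number') savings.ms = details.overallSavingsMs;
      if (typeof details.overallSavingsBytes === 'number') savings.bytes = details.overallSavingsBytes;
      const assets = (details.items ?? [])
        .map((item) => item.url)
        .filter((u) => typeof u === 'string');
      tasks.push({
        page,
        source,
        category,
        auditId: ref.id,
        title: audit.title ?? ref.id,
        description: audit.description ?? '',
        score: audit.score,
        numericSavings: Object.keys(savings).length > 0 ? savings : null,
        affectedAssets: [...new Set(assets)],
        likelySourceFiles: [], // placeholder: resolved by a later mapping step
        recommendedFix: '', // placeholder
        risk: 'medium', // placeholder default, not the spec's risk rules
        canAttemptAutomatically: false,
      });
    }
  }
  // Lowest scores first so the worst failures appear at the top.
  return tasks.sort((a, b) => a.score - b.score);
}
```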
Use these default risk rules:
- alt text
- width/height to images
- loading="lazy" to below-the-fold images
reports/audit-summary.md should include:
Create:
scripts/local_seo_check.mjs
Purpose: inspect built HTML directly for site-specific SEO problems that Lighthouse may not fully catch.
This script assumes _site exists. The diagnose script will build before running it.
For these generated files if present:
_site/index.html
_site/Research/index.html
_site/Data_Science/index.html
_site/Media/index.html
_site/Expertise/index.html
Check:
- <h1> or a clearly acceptable theme equivalent
- <title> exists
- <meta name="description"> exists
- <link rel="canonical"> exists
- https://scottcoff.in
- scottcoffin.github.io
- scottcoffin.github.io
- _site paths
- alt attributes
- width and height where feasible
- robots.txt exists
- sitemap.xml exists
- https://scottcoff.in
- scottcoffin.github.io
Write:
reports/local-seo-check.md
reports/local-seo-check.json
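A few of the per-page checks could look like the following minimal sketch, which takes the HTML of one built page as a string. The function name `checkPage` is hypothetical. The regular expressions are approximations chosen to avoid a dependency; a real HTML parser would be more robust if one is acceptable in this repo.

```javascript
const CANONICAL_ORIGIN = 'https://scottcoff.in';

// Return a list of human-readable problems found in one page's HTML.
function checkPage(html) {
  const problems = [];
  if (!/<title[^>]*>\s*[^<\s][^<]*<\/title>/i.test(html)) {
    problems.push('missing or empty <title>');
  }
  if (!/<meta\s[^>]*name=["']description["'][^>]*>/i.test(html)) {
    problems.push('missing <meta name="description">');
  }
  const canonical = html.match(/<link\s[^>]*rel=["']canonical["'][^>]*>/i);
  if (!canonical) {
    problems.push('missing <link rel="canonical">');
  } else {
    const href = canonical[0].match(/\shref=["']([^"']*)["']/i);
    const ok = href && (href[1] === CANONICAL_ORIGIN || href[1].startsWith(`${CANONICAL_ORIGIN}/`));
    if (!ok) problems.push(`canonical is not on ${CANONICAL_ORIGIN}`);
  }
  if (!/<h1[\s>]/i.test(html)) {
    problems.push('missing <h1>');
  }
  for (const img of html.match(/<img\s[^>]*>/gi) ?? []) {
    // An empty alt="" is allowed here because it marks decorative images.
    if (!/\salt\s*=/i.test(img)) problems.push(`image without alt: ${img}`);
  }
  return problems;
}
```

The script would read each generated file, call this function, and write the collected problems to the two report files.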
Add a package script:
"seo:check": "node scripts/local_seo_check.mjs"
Create:
scripts/validate_jsonld.mjs
Purpose: validate local JSON-LD syntax without claiming Google rich-result eligibility.
_site/.
<script type="application/ld+json">
...
</script>
reports/jsonld-validation.md
reports/jsonld-validation.json
The script validates JSON syntax only. It must not claim that a page is eligible for Google rich results. Google Rich Results Test remains a manual check.
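The syntax-only check could be sketched as below: extract each JSON-LD script block from a page's HTML and try to parse it. The function name `validateJsonLd` is hypothetical, and the regex-based extraction is an assumption made to keep the sketch dependency-free. The result says nothing about rich-result eligibility.

```javascript
// Parse every <script type="application/ld+json"> block in an HTML string.
// Returns one entry per block with valid=true/false and, when parseable,
// the top-level @type for reporting purposes only.
function validateJsonLd(html) {
  const pattern = /<script\s[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results = [];
  let index = 0;
  for (const match of html.matchAll(pattern)) {
    index += 1;
    try {
      const data = JSON.parse(match[1]);
      const first = Array.isArray(data) ? data[0] : data;
      results.push({ index, valid: true, type: first?.['@type'] ?? null });
    } catch (err) {
      results.push({ index, valid: false, error: err.message });
    }
  }
  return results;
}
```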
Create:
scripts/diagnose_seo_perf.sh
Make it executable if possible.
set -euo pipefail
mkdir -p reports reports/lhci reports/pagespeed
bundle exec jekyll build
node scripts/local_seo_check.mjs
node scripts/validate_jsonld.mjs
npx lhci autorun
If RUN_PSI=1, run:
node scripts/pagespeed_insights.mjs
node scripts/audit_to_tasks.mjs
reports/audit-summary.md
reports/audit-tasks.json
reports/local-seo-check.md
reports/jsonld-validation.md
reports/lhci/
reports/pagespeed/latest-summary.md
The script should exit nonzero if:
The script should not fail solely for performance warnings.
Only implement this after Search Console ownership verification is complete.
Create:
scripts/search_console_check.mjs
Query Google Search Console for:
Do not hard-code credentials.
Support:
If credentials are missing, exit with a clear setup message.
GSC_SITE_URL="sc-domain:scottcoff.in"
GSC_DAYS=28
Default to:
sc-domain:scottcoff.in
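Reading these settings might look like the minimal sketch below. The function name `gscConfig` is hypothetical, the fallback of 28 days mirrors the example value above and is an assumption, and the date range is computed in UTC ending on the day passed in.

```javascript
// Resolve Search Console settings from an env-like object.
function gscConfig(env = {}, today = new Date()) {
  const siteUrl = env.GSC_SITE_URL || 'sc-domain:scottcoff.in';
  const parsed = Number.parseInt(env.GSC_DAYS ?? '', 10);
  // Assumption: fall back to 28 days when GSC_DAYS is unset or invalid.
  const days = Number.isInteger(parsed) && parsed > 0 ? parsed : 28;
  const end = new Date(today);
  const start = new Date(today);
  start.setUTCDate(start.getUTCDate() - days);
  const isoDate = (d) => d.toISOString().slice(0, 10); // YYYY-MM-DD
  return { siteUrl, days, startDate: isoDate(start), endDate: isoDate(end) };
}
```

In the script this would be called as `gscConfig(process.env)`; credentials are handled separately and never read from source.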
[
'https://scottcoff.in/',
'https://scottcoff.in/Research/',
'https://scottcoff.in/Data_Science/',
'https://scottcoff.in/Media/',
'https://scottcoff.in/Expertise/'
]
reports/search-console/url-inspection-latest.json
reports/search-console/search-analytics-latest.csv
reports/search-console/search-console-summary.md
For the past GSC_DAYS:
Classify queries as:
branded
topical
other
Use simple rules:
- branded: scott, coffin, scott coffin, plastiverse
- topical: microplastic, pfas, toxicology, toxicokinetic, risk assessment, environmental toxicology
Add package script:
"gsc": "node scripts/search_console_check.mjs"
Do not include this in default npm run diagnose.
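The query classification rules can be sketched as a simple substring match. The function name `classifyQuery` is hypothetical, and giving branded terms precedence over topical terms is an assumption, since a query can contain both.

```javascript
const BRANDED_TERMS = ['scott', 'coffin', 'scott coffin', 'plastiverse'];
const TOPICAL_TERMS = [
  'microplastic',
  'pfas',
  'toxicology',
  'toxicokinetic',
  'risk assessment',
  'environmental toxicology',
];

// Classify a search query as branded, topical, or other.
function classifyQuery(query) {
  const q = String(query).toLowerCase();
  // Assumption: branded wins when a query matches both lists.
  if (BRANDED_TERMS.some((term) => q.includes(term))) return 'branded';
  if (TOPICAL_TERMS.some((term) => q.includes(term))) return 'topical';
  return 'other';
}
```

Substring matching is deliberately loose, so "microplastics" counts as topical, but unrelated words that contain "scott" would count as branded.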
Only implement this if the user wants an independent external performance service.
Create:
scripts/gtmetrix_check.mjs
Do not hard-code credentials.
Use:
GTMETRIX_API_KEY=...
If missing, exit with clear setup instructions.
[
'https://scottcoff.in/',
'https://scottcoff.in/Research/'
]
reports/gtmetrix/raw/
reports/gtmetrix/summary.md
Add package script:
"gtmetrix": "node scripts/gtmetrix_check.mjs"
Do not include this in default npm run diagnose.
After the tooling exists and npm run diagnose works, use this process.
Use the latest:
reports/audit-summary.md
reports/audit-tasks.json
reports/local-seo-check.md
reports/jsonld-validation.md
Choose exactly one low-risk, high-impact improvement.
Prefer issues affecting /Research/ first.
Use this order:
For the selected issue:
npm run diagnose
Stop after either:
This task is especially important for this site.
Audit and fix canonical domain leakage.
Problem to look for:
The public site should consistently use:
https://scottcoff.in
Internal navigation, canonical tags, feeds, sitemap, footer links, and social/profile site links should not point to:
https://scottcoffin.github.io
except where intentionally linking to GitHub-hosted source code or a GitHub profile/repository.
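The rule above can be checked against built HTML with a sketch like the following, which flags attribute URLs whose host is the GitHub Pages domain while leaving links to other hosts such as github.com alone. The function name `findDomainLeaks` is hypothetical, and scanning only `href`, `src`, and `content` attributes is an assumption about where leaks appear.

```javascript
const PUBLIC_ORIGIN = 'https://scottcoff.in';
const LEAK_HOST = 'scottcoffin.github.io';

// List attribute URLs in an HTML string that resolve to the leaked host.
function findDomainLeaks(html) {
  const leaks = [];
  const pattern = /\s(href|src|content)=["']([^"']*)["']/gi;
  for (const match of html.matchAll(pattern)) {
    let host;
    try {
      // Resolve relative and protocol-relative values against the public origin.
      host = new URL(match[2], PUBLIC_ORIGIN).hostname;
    } catch {
      continue; // not a parseable URL, so it cannot be a leak
    }
    if (host === LEAK_HOST) {
      leaks.push({ attribute: match[1].toLowerCase(), url: match[2] });
    }
  }
  return leaks;
}
```

Unlike a plain text search, this ignores mentions of the domain in body text and only reports URLs that a crawler would actually follow or read.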
grep -R "scottcoffin.github.io" .
https://scottcoff.in
grep -R "scottcoffin.github.io" _site || true
- _site contains no unwanted internal links to scottcoffin.github.io.
- https://scottcoff.in.
- https://scottcoff.in.
- https://scottcoff.in.
Prioritize /Research/ because it is a high-value SEO page.
Improve /Research/ while preserving publication content.
- Title: Research | Scott Coffin, PhD
- Meta description: Research by Scott Coffin, PhD on microplastics in drinking water, ecotoxicology, PFAS, New Approach Methodologies, toxicokinetics, computational toxicology, and regulatory risk assessment.
- Canonical: https://scottcoff.in/Research/
- H1: Research
Replace:
little or not toxicology testing
with:
little or no toxicology testing
Replace:
https.://doi.org/
with:
https://doi.org/
Replace malformed concatenations such as:
ShareMicroplastics
with:
Share Microplastics
Optimize images conservatively.
- /Research/, especially author/profile/sidebar images.
- width and height attributes where feasible.
- alt text.
- loading="lazy" only for below-the-fold images.
Reduce render-blocking resources conservatively.
- /Research/.
- defer.
Fix accessibility failures before marginal performance score issues.
- alt text.
Validate JSON-LD syntax locally and add structured data conservatively.
- scripts/validate_jsonld.mjs for syntax.
- Person, ProfilePage, or WebPage JSON-LD over marking up every publication unless data are complete and accurate.
Homepage:
- ProfilePage
- Person
Research page:
- WebPage
- Person as author/mainEntity
- about topics:
This requires site ownership verification first.
- scottcoff.in as a Domain property in Google Search Console using DNS TXT.
- https://scottcoff.in/sitemap.xml
- https://scottcoff.in/
- https://scottcoff.in/Research/
- https://scottcoff.in/Data_Science/
- https://scottcoff.in/Media/
- https://scottcoff.in/Expertise/
After credentials are configured, run:
npm run gsc
After deploying changes to GitHub Pages/custom domain, run:
npm run psi
or:
npm run diagnose:live
If local Lighthouse improves but PageSpeed does not:
If PageSpeed says insufficient real-user data are available, rely on lab diagnostics and Search Console until field data accumulate.
Only add CI after local diagnostics run reliably.
Create:
.github/workflows/seo-performance.yml
Suggested workflow:
name: SEO and Performance Diagnostics
on:
  pull_request:
  workflow_dispatch:
jobs:
  seo-performance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - name: Install npm dependencies
        run: npm install
      - name: Run diagnostics
        run: npm run diagnose
      - name: Upload reports
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: seo-performance-reports
          path: reports/
Do not include PageSpeed Insights, Search Console, or GTmetrix in PR CI unless credentials/secrets and rate limits are configured safely.
After implementing the workflow, report:
# SEO/performance automation implementation report
## Files changed
| File | Change |
|---|---|
## Commands added
| Command | Purpose |
|---|---|
## Diagnostics run
| Command | Result |
|---|---|
## Current baseline
| Page | Performance | Accessibility | Best practices | SEO | LCP | CLS | TBT |
|---|---:|---:|---:|---:|---:|---:|---:|
## Critical issues
## Low-risk next fixes
## Medium/high-risk issues requiring review
## Manual follow-up
- [ ] Verify Domain property in Google Search Console
- [ ] Submit sitemap
- [ ] Run URL Inspection
- [ ] Run Google Rich Results Test manually
- [ ] Run live PageSpeed after deployment
After implementation, the normal workflow should be:
npm install
npm run diagnose
After deployment:
npm run psi
After Search Console credentials are configured:
npm run gsc
For one improvement iteration, instruct the IDE agent:
Use reports/audit-tasks.json and reports/audit-summary.md to select exactly one low-risk, high-impact issue affecting /Research/. Make the minimal fix, rerun npm run diagnose, compare before/after metrics, and revert if SEO/accessibility/build status worsens.
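The revert rule in that instruction could be encoded as in this minimal sketch. The function name `shouldRevert` and the shape of the `before` and `after` objects (`buildOk`, `seo`, `accessibility`) are hypothetical; the scores are assumed to be Lighthouse category scores between 0 and 1.

```javascript
// Decide whether an iteration must be reverted: the build broke, or the
// SEO or accessibility score dropped by more than the given tolerance.
function shouldRevert(before, after, tolerance = 0) {
  const reasons = [];
  if (!after.buildOk) reasons.push('build failed');
  for (const key of ['seo', 'accessibility']) {
    if (after[key] < before[key] - tolerance) {
      reasons.push(`${key} dropped from ${before[key]} to ${after[key]}`);
    }
  }
  return { revert: reasons.length > 0, reasons };
}
```

Performance is deliberately left out of the decision, in line with the rule that performance warnings alone should not fail the workflow.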
At minimum, this implementation should add or modify:
package.json
lighthouserc.js
scripts/pagespeed_insights.mjs
scripts/audit_to_tasks.mjs
scripts/local_seo_check.mjs
scripts/validate_jsonld.mjs
scripts/diagnose_seo_perf.sh
.gitignore
Optional additions:
scripts/search_console_check.mjs
scripts/gtmetrix_check.mjs
.github/workflows/seo-performance.yml
SEO_CHECKLIST.md
Use local Lighthouse and Lighthouse CI for repeatable pre-deployment diagnostics. They are suitable for an IDE agent because they can run against the locally built site and produce machine-readable reports.
Use PageSpeed Insights API for deployed public URLs. It can be automated and can return Lighthouse-based lab diagnostics and, when available, field data.
Use Search Console only after site ownership is verified. It is the best programmatic source for Google-selected canonicals, indexing status, sitemap status, and actual search query/page performance.
Use GTmetrix optionally as a second external performance opinion. It requires API credentials and should not be part of the default local loop.
Use the Google Rich Results Test manually for Google-specific structured-data eligibility. Locally, only validate JSON-LD syntax.
The workflow is complete when:
- npm run diagnose builds the site and produces reports.
- reports/audit-summary.md exists.
- reports/audit-tasks.json exists.
- npm run psi can test deployed URLs.