Unreachable robots.txt

Here is the URL for the robots.txt: https://atlantawhereyoubelong.com/robots.txt

My live test in the URL Inspection tool says “Failed: Robots.txt unreachable”. I can reach it in the browser, and if I change my user agent to Googlebot, I can still see it. I’m not sure what’s wrong. I saw this was a previous issue on Git, so I’m hoping for a resolution here!

User-Agent: *
Disallow: /

That might be causing an issue if the wildcard takes precedence over everything else. Google’s error could also just be wrong; I can’t think of a reason it would fail when the file is available. You could try removing everything, setting up the policy around just the general user agent, and seeing if that works.
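For example, a minimal permissive robots.txt (assuming you want the whole site crawlable) would be:

User-agent: *
Disallow:

An empty Disallow value means nothing is blocked.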

I actually added that after I started getting the error, thinking I might need it for some reason. I’ll remove it, just in case.

Hi @ssynowsky-dagger,

I’ve had a look, and it seems that your robots.txt is served by an Edge Function/Middleware that fails on invocation. You can test this yourself:

→ curl -I https://atlantawhereyoubelong.com/robots.txt
HTTP/2 500 
cache-control: public, max-age=0, must-revalidate
content-type: text/plain; charset=utf-8
date: Thu, 01 Aug 2024 05:05:58 GMT
server: Vercel
strict-transport-security: max-age=63072000
x-vercel-error: EDGE_FUNCTION_INVOCATION_FAILED
x-vercel-id: fra1::pq8s7-1722488758778-c08f7de0f7fe
content-length: 61

I recommend checking your Runtime Logs; you will likely see that the locale information (the accept-language header) is missing from the request. I believe Googlebot doesn’t send this header either, and that is what’s causing the crawl to fail.

However, once you add this header to the request, the actual content is returned. The recommended mitigation is not to rely solely on the accept-language header, but instead to have a fallback for when the header is not present:

→ curl -I https://atlantawhereyoubelong.com/robots.txt -H "accept-language: en-US"         
HTTP/2 200 
accept-ranges: bytes
access-control-allow-origin: *
age: 78
cache-control: public, max-age=0, must-revalidate
content-disposition: inline
content-type: text/plain
date: Thu, 01 Aug 2024 05:10:39 GMT
etag: "8a10c1ad8c7af097f987274f026391f4"
server: Vercel
strict-transport-security: max-age=63072000
vary: RSC, Next-Router-State-Tree, Next-Router-Prefetch, Next-Url
x-matched-path: /robots.txt
x-vercel-cache: HIT
x-vercel-id: fra1::qdb49-1722489039839-5ba7582cc320
content-length: 404
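For anyone who finds this later, a minimal sketch of that fallback in a Next.js middleware.ts could look like this (the locale list, default locale, and x-locale header are illustrative assumptions, not taken from this project):

import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

const SUPPORTED_LOCALES = ["en", "es"]; // hypothetical locale list
const DEFAULT_LOCALE = "en";            // assumed fallback locale

function resolveLocale(request: NextRequest): string {
  // Googlebot (and plain curl) may omit accept-language entirely,
  // so never assume the header exists; fall back to a default instead.
  const header = request.headers.get("accept-language");
  if (!header) return DEFAULT_LOCALE;

  // Take the first language tag, e.g. "en-US,en;q=0.9" -> "en"
  const preferred = header.split(",")[0].split(";")[0].split("-")[0].trim().toLowerCase();
  return SUPPORTED_LOCALES.includes(preferred) ? preferred : DEFAULT_LOCALE;
}

export function middleware(request: NextRequest) {
  const locale = resolveLocale(request);
  // Pass the resolved locale along instead of crashing when the
  // header is missing, e.g. via a custom response header.
  const response = NextResponse.next();
  response.headers.set("x-locale", locale);
  return response;
}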

Thank you so much! Once I updated a few things in my middleware, everything worked perfectly! Now, to wait on Google haha

Glad to hear you’ve managed to solve the issue. I’ve just given it another try and can confirm that the request now responds with a 200!


This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.