
    Website indexation is one of the most pressing issues in every SEO’s work. And it is not surprising: what is the point of continuous technical optimization if your website doesn’t even exist for search engine robots?

    In this article, we will talk about 11 reasons why your website might not get indexed by search engines and show you how to use Netpeak Spider to figure out which of them is keeping your site out of the index.

    1. The entire website is disallowed in a robots.txt file

    The first and foremost reason why many websites become invisible to search engine robots is wrong directives in the robots.txt file. Strangely enough, it usually happens because of poor knowledge of directives and syntax, or simple negligence. After a website release, many webmasters and SEO specialists forget to remove a leftover Disallow: / directive and make the website visible again.

    In that case, Netpeak Spider will show you a Blocked by robots.txt issue.

    [Screenshot: the 'Blocked by robots.txt' issue in Netpeak Spider]
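
    If you prefer to double-check outside the crawler, Python’s standard library can parse a robots.txt file for you. Below is a minimal sketch using urllib.robotparser; https://example.com is a placeholder for your own domain. If can_fetch() returns False for the homepage, a site-wide disallow is still in place.

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # placeholder domain
    rp.read()  # downloads and parses the robots.txt file

    # False means the current robots.txt blocks Googlebot from the homepage
    print(rp.can_fetch("Googlebot", "https://example.com/"))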

    2. Wrong directives in Meta Robots or X-Robots-Tag

    Directives set with Meta Robots or X-Robots-Tag apply only to specific URLs and hide only selected pages. Noindex or nofollow directives added accidentally can spoil your life and successfully hide pages from search engines until a comprehensive technical SEO audit uncovers them.

    If Netpeak Spider finds a noindex directive in a page’s metadata or in its HTTP response headers, the URL will be highlighted with a ‘Blocked by Meta Robots’ or ‘Blocked by X-Robots-Tag’ issue, respectively. URLs with a nofollow directive in the <head> section of the document or in the HTTP response headers will also be marked with ‘Nofollowed by Meta Robots’ and ‘Nofollowed by X-Robots-Tag’ issues, respectively.

    [Screenshot: the 'Blocked by Meta Robots' issue in Netpeak Spider]
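
    For a quick spot check without a crawler, you can inspect both places these directives can live: the X-Robots-Tag HTTP header and the robots meta tag in the HTML. Here is a rough sketch in Python assuming the third-party requests library; the URL is a placeholder.

    import re
    import requests

    url = "https://example.com/some-page/"  # placeholder URL
    response = requests.get(url, timeout=10)

    # X-Robots-Tag is sent as an HTTP response header
    header = response.headers.get("X-Robots-Tag", "")
    if re.search(r"noindex|nofollow", header, re.I):
        print("X-Robots-Tag restricts this page:", header)

    # Meta Robots lives in the <head> section of the HTML
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', response.text, re.I)
    if meta and re.search(r"noindex|nofollow", meta.group(0), re.I):
        print("Meta Robots restricts this page:", meta.group(0))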

    3. The website is penalized by Google

    This is the most pressing problem for websites with a history. Be careful when you buy a second-hand domain: there is a big chance of getting a website with a “dark past”. In that case, the effort needed to restore Google’s trust may far outweigh the expected benefits.

    If you’re buying a website with a history, we highly recommend that you:

    • Request access to Google Analytics to analyze the traffic growth dynamics and find any possible anomalies
    • Check all the data in Google Search Console
    • Take a look at archived versions of the website in Wayback Machine
    • Perform a backlink analysis (with Serpstat or Ahrefs, for example)
    • Use Sucuri SiteCheck to make sure search engines haven’t blacklisted the website

    4. Access to .js files is limited for search robots

    If part of your website’s content becomes visible only after JavaScript rendering, search robots must have access to all .js files. Otherwise, they will not be able to see the page correctly or index the links hidden in JavaScript.

    You can check whether any JS files are hidden from search robots with Netpeak Spider. Just turn on the Check JavaScript option in the main settings menu and launch crawling as usual.

    Also, keep in mind that according to Google, it has no problems with JavaScript rendering, but the same can’t be said about other search engines. If you want to optimize your website for Yandex, Bing, Yahoo, or other search engines, we highly recommend taking a look at their official documentation on crawling JavaScript websites, as well as recent research.
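
    If you prefer a quick manual check, you can list the scripts a page relies on and test whether robots.txt lets search robots fetch them. The Python sketch below (again assuming the requests library, with example.com as a placeholder) extracts script sources with a simple regex and checks each one against your robots.txt rules.

    import re
    from urllib import robotparser
    from urllib.parse import urljoin

    import requests

    page_url = "https://example.com/"  # placeholder URL
    html = requests.get(page_url, timeout=10).text

    rp = robotparser.RobotFileParser()
    rp.set_url(urljoin(page_url, "/robots.txt"))
    rp.read()

    # Flag every script the page loads that Googlebot is not allowed to fetch
    for src in re.findall(r'<script[^>]+src=["\']([^"\']+)["\']', html, re.I):
        js_url = urljoin(page_url, src)
        if not rp.can_fetch("Googlebot", js_url):
            print("Blocked for Googlebot:", js_url)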

    5. Low website loading speed

    SEO specialists may disagree about a lot of things, but they all agree that slow page loading negatively affects SEO. It can even be the reason why impatient search engine robots ignore your website.

    To figure out which pages of your website have critically low loading speed, crawl it with Netpeak Spider. If you have such pages, they will be marked with the ‘Long Server Response Time’ issue. You can also analyze each page with Google PageSpeed without leaving the Netpeak Spider window: just right-click the URL and select ‘Open URL in service → Google PageSpeed’.

    [Screenshot: opening a URL in Google PageSpeed from Netpeak Spider]
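
    If you only need a rough idea of server response times for a handful of URLs, a few lines of Python will do. This is a simplified sketch assuming the requests library; the URL list and the 500 ms threshold are arbitrary placeholders, and it measures only the server response, not the full page render.

    import requests

    urls = [
        "https://example.com/",       # placeholder URLs
        "https://example.com/blog/",
    ]

    for url in urls:
        response = requests.get(url, timeout=30)
        # `elapsed` covers the time from sending the request to receiving the response headers
        ms = response.elapsed.total_seconds() * 1000
        if ms > 500:
            print(f"Slow server response ({ms:.0f} ms): {url}")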

    6. Rel=”canonical” leads to a page with a redirection

    SEOs use the rel=“canonical” attribute to show search engines which page is preferable for indexing. But if the canonical page redirects to another URL, only the final redirect target has a chance to be indexed by search robots, because the canonical page itself responds with a code other than 200 OK.
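
    A quick way to verify this is to pull the canonical URL out of the page and request it without following redirects: anything other than 200 OK is a problem. Here is a rough Python sketch assuming the requests library; the URL is a placeholder, and the regex assumes rel comes before href inside the tag.

    import re
    from urllib.parse import urljoin

    import requests

    page_url = "https://example.com/product?color=red"  # placeholder URL
    html = requests.get(page_url, timeout=10).text

    match = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', html, re.I)
    if match:
        canonical = urljoin(page_url, match.group(1))
        # Request the canonical URL itself, without following redirects, to see its own status code
        status = requests.get(canonical, allow_redirects=False, timeout=10).status_code
        if status != 200:
            print(f"Canonical {canonical} responds with {status} instead of 200 OK")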

    7. No internal links on a page

    If you have created one or several new pages but haven’t placed any links to them inside the website, the search engine robot will probably not find them.

    You can find such pages with Netpeak Spider. The ‘Internal PageRank Calculation’ tool detects all pages that cause link juice maldistribution, in particular pages without compliant incoming links. These are marked as ‘orphans.’
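
    For a rough manual check, you can compare the URLs declared in your sitemap with the URLs that other pages actually link to. The Python sketch below makes several simplifying assumptions: the site has a single sitemap.xml, the requests library is installed, and only pages listed in the sitemap are crawled, so it is an approximation rather than a full internal PageRank calculation.

    import re
    import xml.etree.ElementTree as ET
    from urllib.parse import urldefrag, urljoin

    import requests

    SITE = "https://example.com"  # placeholder domain

    # 1. Collect every URL declared in the sitemap
    xml = requests.get(urljoin(SITE, "/sitemap.xml"), timeout=10).text
    loc_tag = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"
    sitemap_urls = {loc.text.strip() for loc in ET.fromstring(xml).iter(loc_tag)}

    # 2. Crawl those pages and record every internal link they contain
    linked = set()
    for url in sitemap_urls:
        html = requests.get(url, timeout=10).text
        for href in re.findall(r'href=["\']([^"\'#]+)["\']', html, re.I):
            absolute = urldefrag(urljoin(url, href)).url
            if absolute.startswith(SITE):
                linked.add(absolute)

    # 3. Sitemap URLs that no crawled page links to are likely orphans
    for url in sorted(sitemap_urls - linked):
        print("Possible orphan:", url)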

    8. Search engine indexing is disabled in CMS settings

    If you’ve created a website on WordPress, you can hide it from search engine robots in one click: just activate the Search Engine Visibility option (‘Discourage search engines from indexing this site’) in the Reading settings. There is no guarantee this option alone will keep Google or other search engines away, but better safe than sorry: make sure it is switched off before launch.

    [Screenshot: the Search Engine Visibility option in WordPress settings]

    Check for similar settings in other CMSs.

    9. The website can be accessed only by authorized users

    If you’ve decided to hide your site from unauthorized visitors until it is ready, don’t forget that you’ve also hidden it from the search engines.

    By the way, authorization won’t stop Netpeak Spider if you are going to crawl your website before launch. Just open the Authentication tab in the settings menu and enter your login and password to access the site.

    [Screenshot: the Authentication tab in Netpeak Spider settings]
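
    The same applies to any script you run yourself. For instance, if the staging site sits behind HTTP Basic authentication, a request without credentials gets a 401 while an authorized one gets a 200. A tiny Python sketch, assuming the requests library and placeholder credentials:

    import requests

    url = "https://staging.example.com/"  # placeholder staging URL

    print(requests.get(url, timeout=10).status_code)                               # 401 without credentials
    print(requests.get(url, auth=("login", "password"), timeout=10).status_code)   # 200 once authorized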

    10. Access to the website is denied in .htaccess file

    In most cases, the .htaccess file is used to set up redirects, but sometimes it is also used to limit access for search engine robots. For example, if you wanted to block Google’s bot, your .htaccess file would look like this:

    RewriteEngine On

    # Return 403 Forbidden for any request whose User-Agent contains "Googlebot"
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
    RewriteRule . - [F,L]

    If you have added such rules accidentally (or on purpose), we recommend removing the lines shown above or restoring an earlier version of the .htaccess file to allow access.

    11. Incorrect Server Response Code

    Only web pages with a 200 OK server response code have a chance to get a place in a search engine’s index. Even if a page looks entirely normal, a response code other than 200 (for example, 404 or 503) means it will not be indexed by search engine robots. If you’ve detected such a problem on your website, ask a web developer for help.
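
    A quick way to spot such pages in a small batch is to request each URL and flag anything that doesn’t answer with 200. A minimal Python sketch, assuming the requests library and placeholder URLs:

    import requests

    urls = [
        "https://example.com/",          # placeholder URLs
        "https://example.com/old-page/",
    ]

    for url in urls:
        status = requests.get(url, allow_redirects=False, timeout=10).status_code
        if status != 200:
            print(f"{url} returns {status} and will not be indexed")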

    In a nutshell

    Even though some of search engines’ working mechanisms are shrouded in mystery, we can name a few main reasons why your website might not get indexed by robots:

    • The entire website is disallowed in a robots.txt file
    • Wrong directives in Meta Robots or X-Robots-Tag
    • The website is penalized by Google
    • Access to .js files is limited for search robots
    • Rel=”canonical” leads to a page with a redirection
    • No internal links on a page
    • Low website loading speed
    • Search engine indexing is disabled in CMS settings
    • The website can be accessed only by authorized users
    • Access to the website is denied in the .htaccess file
    • Incorrect server response code

    Have you ever faced the indexing issues mentioned above? What caused them, and how did you solve them? Tell us about your experience in the comments below!

    Alexandra Metiza
    Content marketer with 5+ years of experience, insane sneakerhead, experimentalist, and unstoppable fun-seeker rolled into one.
