Like any search engine, we have a bot, or crawler, which we call whosagoodblog-bot. Unlike most search engines, we use a directed crawl strategy that tries to crawl only blog posts. Our intent is to give blog readers the easiest access to the best blogs.
We adhere to robots.txt rules under the Robots Exclusion Protocol. If a site tells WhosaGoodBlog not to crawl certain URLs, WhosaGoodBlog obeys. We do not access restricted areas behind a sign-in or a paywall. We respect noindex and nofollow meta tags. Whosagoodblog-bot is clearly identified in its user-agent string.
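For site owners who want to see how a robots.txt rule would apply to our bot, here is a minimal sketch using Python's standard `urllib.robotparser`. It is an illustration, not our crawler's code. The user-agent token `whosagoodblog-bot` comes from the text above; the sample rules, the `example.com` URLs, and the helper names `build_parser` and `may_crawl` are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Bot name taken from the text; the full user-agent string is not given there.
USER_AGENT = "whosagoodblog-bot"

# Hypothetical robots.txt: keep our bot out of /private/, block all other bots.
EXAMPLE_ROBOTS = """\
User-agent: whosagoodblog-bot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, url: str) -> bool:
    """Return True if the rules allow USER_AGENT to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS)
    for url in ("https://example.com/blog/post-1",
                "https://example.com/private/draft"):
        print(url, may_crawl(rules, url))
```

With these sample rules, the blog post URL is allowed and the URL under /private/ is refused, because the group that names the bot takes precedence over the `*` group.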
Whosagoodblog-bot waits between page requests to avoid overloading servers. We do not currently support crawl-delay directives, but we always wait at least 5 seconds between requests. WhosaGoodBlog caches content during a crawl to eliminate repeated requests.
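The combination of a fixed minimum delay and a crawl-time cache can be sketched as follows. This is a simplified illustration for a single site, not our production code. Only the 5-second minimum comes from the text above; the `PoliteFetcher` class, its parameters, and the injected `fetch` function are hypothetical.

```python
import time
from typing import Callable, Dict, Optional

MIN_DELAY_SECONDS = 5.0  # minimum gap between requests, as stated in the text


class PoliteFetcher:
    """Hypothetical wrapper that spaces out requests and caches responses."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        min_delay: float = MIN_DELAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch            # function that performs the real request
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None
        self._cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        # A cached page costs the server nothing, so return it immediately.
        if url in self._cache:
            return self._cache[url]
        # Otherwise wait until min_delay has passed since the last request.
        if self._last_request is not None:
            remaining = self._min_delay - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = body
        return body
```

Passing the clock and sleep functions in as parameters is a design choice for the example: it lets the waiting logic be checked without actually pausing for 5 seconds.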
We use a directed crawl strategy, attempting to limit the crawl to blog posts only. We try to ignore contact, legal, and shopping pages, and anything else that is not a blog post. If you have a blog with a shop, no problem: we are happy to index your blog posts, and we won't crawl your shopping pages. If you are a product or service company with a blog section on your website, that is where we will direct our crawl.
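To give a feel for what a directed crawl filter might look like, here is a deliberately simple sketch that decides from the URL path alone whether a page is worth visiting. It is not our actual method, which is not described here. The segment lists and the function name `looks_like_blog_post` are assumptions chosen for the example.

```python
from urllib.parse import urlparse

# Hypothetical path segments that suggest a non-blog page.
SKIP_SEGMENTS = {"contact", "legal", "shop", "store", "cart", "checkout"}
# Hypothetical path segments that suggest a blog section.
BLOG_SEGMENTS = {"blog", "posts", "articles"}


def looks_like_blog_post(url: str) -> bool:
    """Guess from the URL path whether a page belongs to a blog section."""
    segments = [part.lower() for part in urlparse(url).path.split("/") if part]
    # Any skip segment rules the page out, even inside a blog path.
    if any(part in SKIP_SEGMENTS for part in segments):
        return False
    # Otherwise require at least one segment that points to a blog.
    return any(part in BLOG_SEGMENTS for part in segments)


if __name__ == "__main__":
    for url in ("https://example.com/blog/2024/my-post",
                "https://example.com/shop/widget",
                "https://example.com/contact"):
        print(url, looks_like_blog_post(url))
```

A real filter would need more than path keywords, since many blogs do not use a `/blog/` prefix, but the sketch shows the basic idea of steering the crawl toward posts and away from shop or contact pages.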
WhosaGoodBlog does not accept payment to crawl a site faster or more frequently.
WhosaGoodBlog only indexes information accessible to the public. For WhosaGoodBlog, ethical crawling means bringing value back to the web by directing traffic to the crawled sites.
We do not use AI in crawling or on any of the gathered data.