Spam And Bad Bot Traffic Is Always Hitting Your Website

Every day I seem to spend more and more time monitoring and blocking spam and bad bot traffic.

It doesn’t matter if your blog or website is big or small; bots are hitting it all the time.

When I check a couple of my smaller sites that receive very few visitors per day, there is always a steady stream of automated bot traffic.

But to give you an idea of how much web traffic is automated, malicious, or spam, I completed a full audit of Just Publishing Advice. Here are the results.

Monitoring spam and bad bot traffic

You probably check your traffic numbers with Google Analytics (GA).

It’s one of the best free tools to get an idea of how well your blog or website is performing.

There’s nothing better than to see a steady increase in the number of users and page views.

But what it doesn’t tell you is how many automated, suspicious or malicious visits your site receives.

If you want to discover the traffic that GA ignores or misses, you need to dig deeper with other data sources.

I use a handful of tools to monitor and protect my site from the bad actors. Luckily, most of them are free.

The only paid service I use is Statcounter, which only costs me $9.00 per month.

It collects similar data to GA, but the big advantage is that it reports IP addresses and outbound link activity.

Because of this, I can monitor and manage scrapers and automated bot hits, and check for invalid AdSense ad clicks.

Now on to the data to show you what I discovered.

 

Spam and bad bot traffic activity in detail

On average, my site receives around 3,500 real user visits per day.

I would always like to have more, but it’s not too bad.

Average daily visits

But this is not the full picture.

When I check and collect access data from other sources, the real number of hits on my site is around 11,500 per day.

spam and bad bot traffic by numbers

As you can see, there is a lot more happening on my site than most analytics tools report.

A better way to look at this data is in percentages.

spam and bad bot traffic by percentages

Of all the visits to my site each day, only 32% is real visitor traffic.

However, this number seems to be about average.

Help Net Security reported in 2021 that automated traffic makes up 64% of internet traffic.

 

How to access spam and bad bot traffic data for your site

As I mentioned before, I use mostly free tools.

These form my lines of defense against spam and bad bot traffic.

1. Cloudflare

You might think that Cloudflare is only a CDN for making your site load faster.

But that’s only a side benefit of a free account. The real advantage of using Cloudflare is security.

Its web application firewall (WAF) is my first line of defense.

Cloudflare blocking

I have masked out the IP addresses for privacy. But you can see that the total number of blocks and challenges issued for this one day is 1,728.

With the WAF, you can set your own firewall rules or use the tools to block or challenge IP addresses or ASNs.

It’s by far the best tool to manage unwelcome traffic on your site.

2. Wordfence

My second line of defense is the Wordfence plugin on my site.

It blocks any malicious traffic that might get past Cloudflare.

Wordfence blocks

The number of blocks varies from day to day. But on average, it blocks between 250 and 450 attempts each day.

3. Server protection

The last line of defense is my ISP’s Apache server.

From the access and error logs, I can scan for any untoward activity that the server has blocked. I can also check if any allowed activity looks suspicious.

Then I can use Cloudflare or Wordfence to look after any suspicious activity I find.

 

Catching spammers

Spammers are more of a nuisance than a threat.

But there are relatively easy ways to manage them.

Comment spam

Akismet is a free plugin that works quite well to combat comment spam on your blog.

Akismet anti spam

The accuracy rate is around 99.5%, so it works very well.

There are about 4,000 legitimate comments on my site. But Akismet has blocked over 70,000 spam comments!

SEO spam

Link outreach campaigns are now nothing more than spam.

It was considered a legitimate practice to ask for backlinks in years gone by.

But now, SEO tools like Semrush make it easy to automate these campaigns and send them directly to your email address.

There’s nothing you can do other than delete these emails as they arrive.

In my case, it can be 100-200 per day asking for links, guest posts, or sponsored post placement.

At that volume, I consider it nothing more than pure spam.

 

What can you do about scrapers?

web scraping

Web scraping, with Python and other tools, is becoming more and more common.

It isn’t easy to know what to do about it.

Recently, LinkedIn tried to stop web scrapers, but a US court ruled that scraping publicly available data was legal.

It’s relatively easy to find scrapers that are accessing your site. You can search your server access logs for user agents such as python-requests or python/3.
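As a rough illustration of that log search, here is a minimal Python sketch. It assumes your server writes the common Apache "combined" log format, and the function name find_scraper_hits is made up for this example. It is not the only way to do it; a simple text search of the log would find the same lines.

```python
import re

# Assumption: Apache "combined" log format, where the user agent
# is the last quoted field on each line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# User agent fragments mentioned in the article (matched case-insensitively).
SCRAPER_AGENTS = ("python-requests", "python/3")


def find_scraper_hits(lines, needles=SCRAPER_AGENTS):
    """Yield (ip, user_agent, request) for log lines with a matching user agent."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        if any(needle in agent.lower() for needle in needles):
            yield match.group("ip"), agent, match.group("request")
```

To try it on a real log, open the file and pass it in, for example list(find_scraper_hits(open("access.log"))), where access.log stands for wherever your host keeps the log.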

You can also set up a temporary Cloudflare firewall rule that issues a JavaScript challenge, using an expression such as:

(http.user_agent contains "python-requests") or (http.user_agent contains "Python/3")

But there is little you can do other than monitor it. The only time you really need to challenge or block a scraper is when it is hitting your site too often.

I had one that was hitting my site over 14,000 times per day from over 50 different IP addresses.
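To spot a scraper like that one, it helps to count hits and distinct IP addresses per user agent. The sketch below is one possible way to do it, again assuming Apache "combined" format logs. The function names and the daily limit you pass in are hypothetical and only for illustration.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" format, client IP first and user agent
# as the last quoted field on the line.
IP_AND_AGENT = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"\s*$')


def summarize_agents(lines):
    """Return {user_agent: (hit_count, distinct_ip_count)} for the log lines."""
    hits = defaultdict(int)
    ips = defaultdict(set)
    for line in lines:
        match = IP_AND_AGENT.match(line)
        if not match:
            continue  # ignore lines that do not parse
        agent = match.group("agent")
        hits[agent] += 1
        ips[agent].add(match.group("ip"))
    return {agent: (hits[agent], len(ips[agent])) for agent in hits}


def heavy_hitters(summary, max_hits_per_day):
    """Return user agents over the chosen daily limit, busiest first."""
    over_limit = [a for a, (count, _) in summary.items() if count > max_hits_per_day]
    return sorted(over_limit, key=lambda a: summary[a][0], reverse=True)
```

Run over one day of logs, a result such as (14000, 50) for a single user agent would match the pattern described above: many hits spread over many addresses. What counts as "too often" is a judgment call for your own site.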

There are legitimate reasons for scraping, such as SEO research or data gathering. But there are also content scrapers that copy, steal and republish your content.

But it’s not that easy to tell the difference.

 

Vulnerability scanners

This is another form of bot traffic that is sometimes good but mostly bad.

Web security companies naturally and helpfully scan for software, plugin, and theme vulnerabilities that can be patched and fixed.

But then there are hackers who look for the same vulnerabilities to access and control websites.

Again, it’s not easy to tell the good guys from the bad guys.

The best approach is to let Cloudflare and Wordfence manage the issue in most cases. But there are times that I have to add a manual block just to be sure.

 

Good bots and bad bots

Search engines like Google and Bing use bots to check your site. Without these, your site would never stand a chance of being indexed and your pages ranking for search.

You want your site and blog posts to rank on Google and Bing, so yes, these are really good bots.

Other good bots help you analyze your traffic. These might include Ahrefs, Semrush, and Ubersuggest, among others.

But yes, there are also bad bots, run by hackers and spammers, that don’t have your best interests in mind.

Learning how to tell the difference is not always easy. But excessively blocking bots will often do you more harm than good.

Again, all you can do is monitor, check, and then be selective about which ones you block or challenge.
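One check that can help before blocking something that claims to be a search engine bot is a reverse DNS lookup followed by a forward lookup. The sketch below shows the idea for Googlebot only. The host suffixes are an assumption based on what Google documents for its crawlers, the function name is hypothetical, and the forward lookup used here only handles IPv4 addresses.

```python
import socket

# Assumption: host name suffixes used by genuine Googlebot crawlers.
# Other search engines publish their own; add them as needed.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the host suffix, then confirm with a forward lookup.

    The two lookup arguments exist only so the function can be tried
    without network access; by default it uses real DNS.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().endswith(GOOGLEBOT_SUFFIXES):
            return False
        # The host name must resolve back to the same IP address.
        return ip in forward_lookup(host)
    except OSError:
        return False  # lookup failed, so the claim cannot be verified
```

A bot that only puts "Googlebot" in its user agent will fail this check, because its IP address will not resolve to one of those host names and back again.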

I use a couple of free online tools to help me check.

One is AbuseIPDB. You can check any IP address to see if it has been reported as abusive.

Another is Scamalytics. With this app, you can check the fraud score of an IP address.
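If you have many addresses to look up, AbuseIPDB also offers an API. The following is a minimal sketch based on my understanding of its version 2 "check" endpoint, so confirm the details against the AbuseIPDB documentation before relying on it. It assumes you have an API key from your AbuseIPDB account, and the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: AbuseIPDB API v2 "check" endpoint.
API_URL = "https://api.abuseipdb.com/api/v2/check"


def build_check_request(ip, api_key, max_age_days=90):
    """Build the HTTP request for one IP address (nothing is sent yet)."""
    query = urllib.parse.urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
    )


def parse_confidence_score(body):
    """Extract the abuse confidence score (0 to 100) from a JSON response body."""
    return json.loads(body)["data"]["abuseConfidenceScore"]


def check_ip(ip, api_key):
    """Send the request and return the abuse confidence score for the IP."""
    with urllib.request.urlopen(build_check_request(ip, api_key), timeout=10) as resp:
        return parse_confidence_score(resp.read())
```

A high score suggests the address has been widely reported, but it is still only one signal, so it is worth checking your own logs before adding a block.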

 

Conclusion

There is no way you can stop spam and bad bot traffic on your website or blog.

All you can do is monitor it and then try to manage it as best you can.

But don’t be surprised if you discover that around 65% of your site traffic is automated bots.

The latest report from Imperva confirms that bad bot activity is increasing every year.

All site owners can do, and should do, is learn how to manage the threats as effectively as possible.

Derek Haines

A Cambridge CELTA English teacher and author with a passion for writing and all forms of publishing. My days are spent writing and blogging, as well as testing and taming new technology.


2 thoughts on “Spam And Bad Bot Traffic Is Always Hitting Your Website”

  • V.M. Sang
    June 9, 2022 at 3:13 pm

    A great post, and it will be useful for a lot of people, but for non-techy people like me, it’s just too difficult.
    What are scrapers, for example? How do you use JavaScript to temporarily block stuff?
    I wanted to add, I think it was a MailChimp link, onto my site and it told me to add it to a part of the software. I had no idea how to do it so I didn’t bother.
    There seems to be some idea that if you have a website, you understand these things.

    • Derek Haines
      June 9, 2022 at 6:40 pm

      Scrapers don’t visit and read a page on your site like normal visitors. They aim to download data and code from your site.

      It could be for names, addresses, or locations. But other scrapers are after your text content to copy and republish.

      As for JavaScript blocking, you need to use a service or program like Wordfence or Cloudflare.

      I agree that you need a bit of technical know-how.

      But once you set it up, or get someone to do it for you, it’s relatively easy to protect your site.

