Google Unveils Major Spam Filter Overhaul

by John Lister on December 12, 2023 at 09:12AM EST

Google says it has significantly upgraded Gmail's spam filter to overcome a common scam tactic. It's now using AI to detect images that aren't technically text characters but are still readable by humans.

It also says the new system will reduce the number of false positives: legitimate emails mistakenly flagged as spam. That's certainly felt like an increasing problem over the past year or so.

The scammer tactic tackled with the update is called adversarial text manipulation. It exploits the fact that a key part of spam filtering involves analyzing the text in an email and looking for patterns and signals associated with unsolicited messages.

Fake Characters Deceive

Those signals can include text that appears to be written by a machine (for example, generating thousands of different emails to see which wording is most persuasive). They can also include badly translated or poorly written text. One theory is that some spam senders deliberately make their scam "obvious" to most people to filter out the skeptical and leave only the people most likely to fall for it.

The problem is that adversarial text manipulation uses code and images of special characters that resemble letters closely enough that people can read them, but that have no "meaning" to the spam filter. The message therefore gets past the filter and straight into your inbox.

For example, the word "Microsoft" might be written as "Microsof+", but in a much more convincing way using graphics.
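To make the evasion concrete, here is a minimal Python sketch. It is an illustration only, not how Gmail or RETVec works: it uses lookalike characters rather than graphics, and the blocklist, the lookalike map and the function names are all hypothetical. It shows a plain keyword check missing the disguised word, and a crude normalization step catching it.

```python
import unicodedata

# Hypothetical blocklist of terms a simple filter might watch for.
BLOCKED_TERMS = {"microsoft"}

# Hypothetical, deliberately tiny map of lookalike characters to real letters.
LOOKALIKES = {
    "+": "t",        # plus sign standing in for "t"
    "\u0455": "s",   # Cyrillic letter dze, looks like Latin "s"
    "\u043e": "o",   # Cyrillic letter o, looks like Latin "o"
}


def naive_flag(text: str) -> bool:
    """Flag the text only if a blocked term appears literally."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def normalize(text: str) -> str:
    """Fold compatibility forms (NFKC), lowercase, then swap known lookalikes."""
    folded = unicodedata.normalize("NFKC", text).lower()
    return "".join(LOOKALIKES.get(ch, ch) for ch in folded)


def normalized_flag(text: str) -> bool:
    """Flag the text if a blocked term appears after normalization."""
    cleaned = normalize(text)
    return any(term in cleaned for term in BLOCKED_TERMS)


if __name__ == "__main__":
    message = "Your Microsof+ account has been suspended"
    print(naive_flag(message))       # False: the literal word never appears
    print(normalized_flag(message))  # True: "+" is mapped back to "t"
```

A fixed lookup table like this is easy to outrun, since a spammer only needs one substitution that is not in the map, which is part of why a learned approach is attractive.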

The new system is called RETVec, standing for Resilient & Efficient Text Vectorizer. Previously, spam filters have tried to combat this using optical character recognition, the same technology used when scanning printed documents and converting them to text. (Source: techspot.com)

Context Is Key

Instead, RETVec uses image similarity as a starting point, then uses context to try to figure out the most likely meaning of each character. For example, it might individually rank consecutive characters as possibly being a "t", an "h" and an "e". Putting this information together means it can be more confident that the word is indeed "the".
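The idea of combining uncertain per-character guesses with context can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not RETVec's actual mechanism: the confidence scores, the tiny word list and the function names are invented for the example. Each position has several candidate letters, and a word list stands in for "context".

```python
from itertools import product

# Hypothetical confidence scores for three consecutive characters.
# On its own, the third character looks slightly more like "c" than "e".
GUESSES = [
    {"t": 0.60, "+": 0.40},
    {"h": 0.55, "b": 0.45},
    {"c": 0.55, "e": 0.45},
]

# Hypothetical vocabulary standing in for context.
KNOWN_WORDS = {"the", "tie", "toe"}


def greedy_word(guesses):
    """Pick the top-scoring candidate at each position, ignoring context."""
    return "".join(max(g, key=g.get) for g in guesses)


def best_known_word(guesses, vocabulary):
    """Return the highest-scoring combination that is also a known word."""
    best, best_score = None, 0.0
    for combo in product(*(g.items() for g in guesses)):
        word = "".join(ch for ch, _ in combo)
        if word not in vocabulary:
            continue
        score = 1.0
        for _, confidence in combo:
            score *= confidence  # treat positions as independent
        if score > best_score:
            best, best_score = word, score
    return best, best_score


if __name__ == "__main__":
    print(greedy_word(GUESSES))                      # thc
    print(best_known_word(GUESSES, KNOWN_WORDS)[0])  # the
```

Read character by character, the best guess is "thc", which is not a word. Restricting the combinations to known words settles on "the", mirroring the example in the paragraph above.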

Perhaps the biggest difference is that the new technology significantly reduces the number of steps needed to identify a letter. Previous spam filters used millions of parameters, whereas RETVec uses around 200,000. That requires fewer computing resources, making it more practical to use. (Source: arstechnica.com)

What's Your Opinion?

Have you spotted this type of spam? Have you noticed any changes in spam filtering recently? Are false positives a bigger problem than missed spam these days?
