Hiding email addresses

Two methods of hiding email addresses in Spip’s text fields from spam robots

On our site we need to publish some contact lists which include email addresses. In order to avoid the “harvesting” of these addresses by spam robots they need to be hidden in some way. Here are two ways of doing this, which (I hope) are adequately secure.

Both methods make use of the apres_propre “entry point”, which is provided in Spip from version 1.7 onward, to intercept the text stream.

I should say that this whole contribution comes with a Health Warning attached: I do not know a lot about PHP, and the Regular Expressions here are among the first I have written (they may well be the last, too...).

Method 1

In the file ecrire/mes_options.php3 (create this file if it does not already exist), place these lines:

/*
 *   +----------------------------------+
 *    Filter name : AntiSpam1
 *   +----------------------------------+
 *    Date : 26 May 2004
 *    Author : Paolo
 *   +-------------------------------------+
 *    Hides email addresses in Spip's
 *    text fields
 *   +-------------------------------------+
 *
 *    Please post suggestions and comments in
 *    the forum associated with the article:
 * http://www.uzine.net/spip_contrib/article.php3?id_article=537
*/

function apres_propre($string) {
  $replacement = "M";
  $tip = "Antispam measure: you need to replace ".$replacement." by @ in order to send mail.";

  // 1. force the link to lower-case letters and append the tip.
  // The tip is URL-encoded so that the link contains no spaces and no @.
  // strtr() replaces each distinct link once, so an address that appears
  // several times in the text does not receive the tip several times.
  preg_match_all("/mailto:[^\"\s>]*/", $string, $found);
  $map = array();
  foreach ($found[0] as $link) {
    $map[$link] = strtolower($link)."?body=".rawurlencode($tip);
  }
  if (count($map) > 0) {
    $string = strtr($string, $map);
  }

  // 2. replace the @ in the links
  preg_match_all("/mailto:[^\?]*/", $string, $found);
  $total = count($found[0]);
  for ($i = 0; $i < $total; $i++) {
    $string = str_replace($found[0][$i], str_replace("@", $replacement, $found[0][$i]), $string);
  }

  // 3. 'Encodes' the word "mailto:" as a mixture of decimal and hex entities
  $string = str_replace("mailto:", "&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;", $string);

  // 4. Takes out the "@" of the text if an email has been given as the text of a link
  preg_match_all("/>\S+@\S+\s?<\/a/", $string, $found);
  $total = count($found[0]);
  for ($i = 0; $i < $total; $i++) {
    $string = str_replace($found[0][$i], str_replace("@", $replacement, $found[0][$i]), $string);
  }
  return $string;
}

How does it work?

1. First, we look for every instance of mailto: and match the characters that follow, up to the next double quotation mark, whitespace character or closing angle bracket. Each match is treated as an email link.

Next the link (which may of course contain capital letters) is forced to lower-case. At the same time a “tip” is added to the link. When the visitor to the page clicks on the link, this text will be inserted into the body of the new email telling them what to do in order to make the email address valid.

2. The @ is replaced with the replacement string, which is defined at the beginning of the function and which you can change according to taste. Here, I’ve chosen a capital M. As the link contains only lower-case letters, this will be easy for the visitor to spot and replace, but hopefully incomprehensible to robots.

3. Mail robots apparently usually look for the text “mailto:” so it makes sense to change it a bit. To make it a bit more confusing the string of entities uses a mixture of hex and decimal encoding.

4. The last regular expression checks whether there is an @ character anywhere between a closing angle bracket and the following </a. This usually indicates that an email address has been given as the text of a link, so this text is converted in the same way. It would also be possible to change this text to something like “Send email”, as is done in the second method.
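
To see what the four steps do to a single link, they can be replayed outside Spip. The sketch below is not the filter itself (that is the PHP function above); it is a JavaScript re-creation for experimenting in a browser console or Node. The names hideEmailLinks, REPLACEMENT and TIP are made up for this example, and the tip is URL-encoded here, which is an assumption about how the link should be written.

```javascript
// JavaScript replay of the four AntiSpam1 steps, for experimenting only.
// hideEmailLinks, REPLACEMENT and TIP are hypothetical names for this sketch.
const REPLACEMENT = "M";
const TIP = "Antispam measure: you need to replace " + REPLACEMENT +
            " by @ in order to send mail.";

function hideEmailLinks(html) {
  // 1. Lower-case each mailto link and append the (URL-encoded) tip as the body.
  let out = html.replace(/mailto:[^"\s>]*/g,
    (link) => link.toLowerCase() + "?body=" + encodeURIComponent(TIP));

  // 2. Replace the @ in the address part of each link (everything before "?").
  out = out.replace(/mailto:[^?]*/g,
    (link) => link.replace(/@/g, REPLACEMENT));

  // 3. Write "mailto:" as a mixture of decimal and hex character entities.
  out = out.split("mailto:").join("&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;");

  // 4. Replace the @ in link text that looks like an email address.
  out = out.replace(/>\S+@\S+\s?<\/a/g,
    (text) => text.replace(/@/g, REPLACEMENT));

  return out;
}

console.log(hideEmailLinks('<a href="mailto:Me@Nowhere.net">me@nowhere.net</a>'));
```

For this sample link the result contains neither an @ nor the literal text mailto:, and both the address in the href and the visible link text read meMnowhere.net.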

Advantages of this method

-  It will be (I think) good at hiding the addresses from robots.

-  Unlike Spip’s default |antispam filter, it will not convert every @ sign in the text, but just the ones in email links (so you can still write sentences like: “C U @ 9”, she texted to her friend - ok, no great advantage!)

-  The spaces in the email addresses produced by the default |antispam filter cause quirky effects in some email software when the email links are clicked. This method avoids that.

Disadvantage of this method

-  It’s tiresome for the person clicking on the link to have to correct it manually.

Method 2

In the file ecrire/mes_options.php3 (create this file if it does not already exist), place these lines:

/*
 *   +----------------------------------+
 *    Filter name : AntiSpam2                                               
 *   +----------------------------------+
 *    Date : 26 May 2004
 *    Author : Paolo                                       
 *   +-------------------------------------+
 *    Hides email addresses in Spip's 
 *    text fields 
 *   +-------------------------------------+ 
 *  
 *    Please post suggestions and comments in
 *    the forum associated with the article:
 * http://www.uzine.net/spip_contrib/article.php3?id_article=537
*/

function apres_propre($string) {
  preg_match_all("/mailto:[^\"]*/", $string, $found);
  $total = count($found[0]);
  for ($i = 0; $i < $total; $i++) {
    $comat = strpos($found[0][$i], "@");
    // skip a match with no @: it cannot be split into two parts
    if ($comat === false) {
      continue;
    }
    // $part1 is what comes between "mailto:" (7 characters) and the @,
    // $part2 is everything after the @
    $part1 = substr($found[0][$i], 7, ($comat - 7));
    $part2 = substr($found[0][$i], ($comat + 1));
    $newstr = '#" name="'. $part2 . '" title="' . $part1 . '" onClick="location.href = dolink(this.title, this.name); return false;';
    $string = str_replace($found[0][$i], $newstr, $string);
  }
  // if an email address is given as the text of a link, change it
  preg_match_all("/>\S+@\S+\s?<\/a/", $string, $found);
  $total = count($found[0]);
  for ($i = 0; $i < $total; $i++) {
    $string = str_replace($found[0][$i], ">[Email]</a", $string);
  }
  return $string;
}

Then, in the <head> section of the templates where text with emails may appear, place these lines:

<script type="text/javascript">
<!--
function dolink(part1, part2){
  var link = 'mailto:' + part1 + '@' + part2;
  return link;
}
//-->
</script>

Alternatively, you can of course put this function in a separate .js file and link your templates to it using a line like this:

<script type="text/javascript" src="mes_scripts.js"></script>

How does it work?

The function matches strings beginning with mailto: up to the next double quotation mark. So it is important that the email links be well formed, with the href attribute enclosed in double quotation marks (email links made with Spip’s shortcut are like this).

Then the email link is jumbled up by assigning bits of it to different attributes. So a link that contains
<a href="mailto:me@nowhere.net" ...
is transformed into
<a href="#" name="nowhere.net" title="me" onClick="location.href = dolink(this.title, this.name); return false;" ...

The email is only decoded when a visitor clicks on the link.

If the text of a link contains an @ the whole text is replaced; in this case by the word [Email].
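
The split and the reassembly can be tried together in a few lines. This is only a sketch: splitAddress is a hypothetical helper that mirrors the strpos and substr calls of the PHP filter, and dolink is repeated here so that the example runs on its own.

```javascript
// Sketch of the split done by the filter and the reassembly done on click.
// splitAddress is a hypothetical helper, not part of the contrib.
function splitAddress(link) {
  const at = link.indexOf("@");
  if (at === -1) {
    return null;                       // no @, so not an email link
  }
  return {
    title: link.substring(7, at),      // between "mailto:" (7 chars) and the @
    name: link.substring(at + 1)       // everything after the @
  };
}

// Same logic as the dolink() function placed in the template's <head>.
function dolink(part1, part2) {
  return "mailto:" + part1 + "@" + part2;
}

const parts = splitAddress("mailto:me@nowhere.net");
console.log(parts.title, parts.name);
console.log(dolink(parts.title, parts.name));
```

The complete address exists only as the return value of dolink, which is why it never appears in the page source.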

Advantages of this method

-  All the advantages of the first method, plus

-  The link works when it is clicked and doesn’t need correcting manually.

Disadvantage of this method

-  The link will only work if the visitor’s browser has JavaScript enabled. Otherwise they will not be able to get at the email address at all.

Note (June 2005): This contrib has now been superseded by “Un système antispam”, published in French.

updated on 25 January 2007

Discussion

  • I also created my personal hiding method. It’s a mix of your two methods and some salt.

    The code is designed for French sites but you can adapt it. If you are interested in the code, ask me.

    • That looks good to me! I also developed my ideas a bit further after writing this contrib.

      Paolo
