PHP howto – Sanitize database inputs

When accepting data from a user, any data at all, it should be sanitized before making its way to your database.

What does this mean? Well, for one, you’re going to inspect the data and make sure that it doesn’t contain any malicious code such as ill-intentioned javascript.  Another is to prepare the data so that when it gets added to your insert/update SQL it doesn’t break the SQL (or do other nasty actions). Otherwise know as a SQL injection attack.

The technical details of the types of attacks we’re protecting against are a bit out of the scope of this post, but there are numerous resources available which will explain far better than I am able to.

After a form has been submitted (via get or post) it gets stored in the global array $_GET or $_POST.  Once we have this data, we can and should do a bunch of things to it, such as:

Stripping out malicious code

We’ll scan through the input, searching for anything that shouldn’t be there, like html code, <script> tags, etc.

]*?>.*?@si',   // Strip out javascript
    '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
    '@
]*?>.*?

@siU',    // Strip style tags properly
    '@@'         // Strip multi-line comments
);

    $output = preg_replace($search, '', $input);
    return $output;
}
?>

’slashing

This part can sometimes get tricky, but not to worry, the code’s not too bad.  Basically we’re adding a backslash before any of the following: (single-quote), (double quote), \ (backslash) and NULL characters.  Depending on your server configuration, there are a bunch of ways of getting this done.  PHP has something called magic_quotes, which does this automatically.  Note, however, that as of PHP 6 this feature has been deprecated and removed.  Another PHP function, addslashes(), is the manual version of magic_quotes.  addslashes(“Where’s Wally”); will return “Where\’s Wally”.  A better option, if your server supports it, is mysql_real_escape_string().  It performs pretty much the same function, but is apparently better.

$val) {
            $output[$var] = sanitize($val);
        }
    }
    else {
        if (get_magic_quotes_gpc()) {
            $input = stripslashes($input);
        }
        $input  = cleanInput($input);
        $output = mysql_real_escape_string($input);
    }
    return $output;
}
?>

To use, we simply pass any input to the function. The function works on single strings, as well as deep arrays.

 It's a good day!";

$_POST = sanitize($_POST);
$_GET  = sanitize($_GET);
$good_string = sanitize($bad_string);
// $good_string returns "Hi! It\'s a good day!"
?>

Typecasting

Making sure that the data we’re inserting matches the expected type;  i.e, someone’s age should be received as an integer value, and not a string.


This is a very gentle introduction to sanitizing your database input, and I would certainly recommend that you do a lot more research on these methods in order to use them correctly in your given environment.

That’s it for today. If you found this useful, of would like to improve it, comments are always appreciated!

Tags: , , , , , , , ,

55 Responses to “PHP howto – Sanitize database inputs”

  1. Neil Garb says:

    Hey Den — you have two typos in the second-last codeblock. sanatize = sanitize.

  2. Denham Coote says:

    Grrr. Thought I’d fixed that. Thanks! Fixed :)

    Oh, happy birthday again!

  3. Hey man, this post is excellent, most programmers don’t bother doing any of this. Good post

  4. Denham Coote says:

    Hey Anthony, thanks for the feedback!

  5. Freyr Helgason says:

    Hey, just wanted to say thank you. These helped me a lot in getting started with sanitizing my inputs =) Great for novices like me to read and comprehend. I don’t think there’s nearly enough emphasis put on this in my web design course and the only thing that was ever really mentioned was to check that the e-mail address was typed correctly.

    It’s people like you that make it possible to self-educate! =)

  6. Matt says:

    This is exactly what I was looking for – a function to quickly and easily sanitize POST and GET. I’m going to use this so often, it’s now in my Firefox bookmarks toolbar – too useful to be buried in my delicious bookmarks. :) Thanks for the great post!

  7. mike says:

    Your regex for stripping out javascript is redundant. You already have a regex to strip out html tags. The script tag is an html tag.

  8. Technocrat says:

    You might find http://htmlpurifier.org/ to be of use.

    Stumbled! here

  9. [...] input (yes I’ve had instances where I purposely didn’t want to sanitize incoming data). For a good tutorial on how to sanitize user input  – try this one . You can also assign input data to a session variable more easily. I.e $_SESSION["userdata"] = [...]

  10. Hey Denham,

    Nice tutorial and code, but I think it’s a better approach, to try and define what kind of input you will accept, instead of trying to define all the stuff you DON’T want to accept. Like instead of saying “I don’t want html, javascript and CSS”, It’s better to say ‘I only want to following characters that fall into these ranges: [a-z], [A-Z], [0-9] and [.,!?]” – I think this will be a much more secure approach, instead of trying to think up all the sneaky stuff, people can do, to break things…

  11. Denham Coote says:

    Mike: Thanks for the tip. :)

    Technocrat: Awesome, thank you – I’ll be sure to take a look at it.

    Michael: Absolutely. I saw this as a more general way of getting rid of common nasties, as opposed to validation of each input – I would still promote validating your inputs _after_ this has been done (ie, email addresses, etc)

  12. DaveD says:

    Wouldn’t html entities do this for you much quicker and allow the safe retention of characters that some users find useful suck as , etc. http://uk.php.net/htmlentities

  13. Denham Coote says:

    DaveD: Cool function, but what about when Bob ‘Badass’ enters his name along with a nasty script? That then gets preserved. Then when I display it on the view page, it renders the hidden (evil) script and when you visit you get your ass handed to you?

    Update: Actually, go look at the 3 latest posts on that page (May soon get modded, though) where they are talking about the exact same thing. To quote the one guy:

    “Your post is pointless. You never use htmlentities on data recieved from the user. You might only escape it when you insert it in the database user real_escape_string. Only when you display it, you change it in the template.”

    That’s not to you, Dave, just part of the discussion happening over at php.net.

  14. Sam says:

    Dont forget to filter out onClick and related, otherwise javascript can still be used.

  15. DaveD says:

    This is a good point and one worth exploring, so if we take a double quote a single quote and greater than and less than sign (probably the worst offenders), pass them to htmlentities($str, ENT_QUOTES) we get the inoffensive string back " ' < > (hope that prints okay on this site) there’s no harm in any of that?

    When you way want to escape a slash it would be for other reasons, you are working with very defined field lengths in a database and html entities may give you more characters than the original user input.

    I’m certainly not seeking in invalidate your efforts here.

  16. Mark James says:

    I understand that your function is meant to only tackle the database side of things, but in your comment you completely disregard the use of htmlentities which DaveD correctly suggests and also suggest that people fun this on their data BEFORE filtering/validating. I believe you’re mixing up filtering and escaping here in one function, which is a dangerous habit to get in to, and the function you offer might be bug-free in syntax but would certainly lead to a buggy website. For example, if you had a website to store the names of people and used your function before displaying user input back to the user, then all single quotes would end up as \’ in the HTML itself, not to mention your regular expressions would allow a wide range of malicious Javascript while stripping out input which may have been non-malicious (I know of at least one company with a name that resembles a HTML tag – this would not make it through your ’sanitizing’).

    The correct way to handle user input is to filter input and escape output.

    Filtering of values should be done to all input before it is handled with the application, although you should still consider that data to be ‘dirty’ within your application.

    Only when the data goes to an external party (i.e. the user in the form of HTML or the database in the form of SQL) should it be properly escaped/encoded.

    For people who are just discovering this topic and need some simple rules to get them into the basics, pleas ignore the function on this page and try these instead (caveat lector!):

    * All user input is dirty. Do not trust it. You can attempt to filter it (i.e. check that it is in the correct format), but do not attempt to mysql_escape or run it through any other output-specific functions at this point. (PHP5 has a filter extension for filtering)
    * Inside your PHP scripts, the data will still be dirty.
    * Before sending the data to a MySQL database, run it through the mysql_real_escape_string function (or better yet, use PHP5 and the PDO library instead of the mysql_ functions)
    * When using the data on a HTML page, run it through the htmlentities function. This will convert (almost) anything which the browser could interpret as malicious code into safe entity equivalents.

    Sorry to rain on your parade here Denham, but I do feel some major clarifications to your original post are required.

  17. Denham Coote says:

    Hi Mark

    Thanks for your really informative post. Do you have some links I can include? I’ll gladly update my post.

    If you look again, you’ll note that I didn’t disregard DaveD’s comment – I actually asked a question, and followed it with a reference, hoping that someone more knowledgeable (such as yourself, perhaps?) might shed some more light on the matter.

    I see now, though, the advantage to htmlentities in that even if the nasty input is preserved, it gets written back in a way that isn’t damaging. That’s cool, and I’ll make use of it.

    D.

  18. Denham Coote says:

    DaveD: Apologies, it seems the spam filter didn’t like your use of html – All fixed now.

    Thank you for your input – See my previous comment re htmlentities..

    D.

  19. Indy says:

    You can just use array_map() instead of recursive function call .. it does the same thing, but looks cooler.

  20. Aries-Belgium says:

    There is already a function in PHP that does that: strip_tags(). That function takes also a second parameter which is a list of parameter you like to accept:

    strip_tags( $message , “” );

    If you don’t allow html just do strip_tags without the second parameter and sanitize the message with htmlentities().

    It’s also better to clean up html when you output your message and not when you save it. Why? Well, if you later allow html or other html tags all your messages from before the changes will render the same way as new ones. Also when saving string that has gone through htmlentities() is larger because html tags are converted to their html code. So foreach tag will get converted to > and < so that are 4 characters instead of 1.

  21. [...] You have written the form HTML, the form has been submitted and all input data has been sanitized [...]

  22. [...] PHP howto – Sanitize database inputs (tags: programming PHP webdev database) Posted under Links [...]

  23. David Keech says:

    The comments here about accepting known good characters rather than rejecting known bad characters are spot on. The consequences if you get it wrong are what explains the difference. If you forget to strip one bad character then your security is compromised. If you forget to allow one good character then one user gets an error message.

    The bit I wanted to add was about addslashes() vs. mysql_real_escape_string().

    The reason mysql_real_escape_string() is better for user supplied data destined to be used in an SQL query is because it also escapes the percentage symbol and the underscore which are both special characters in the SQL syntax. The characters % and _ are used in the LIKE clause and act as wildcards.

  24. [...] You have written the form HTML, the form has been submitted and all input data has been sanitized (make sure to read the comments). [...]

  25. Andy Bailey says:

    really useful thanks, just what I needed to replace str_ireplace

  26. James says:

    hey, great job dude.
    precisely what I was looking for.

  27. eliezer says:

    that’s a handy function pal!

  28. en3r0 says:

    Great post, very helpful!

  29. [...] further useful and related article can be found here – the author desribes how to sanitise database input before applying it to your MySQL database. [...]

  30. This is a very handy light-weight way to validate input. Thanks for the code, it’ll come in very handy.

    Dwayne.
    http://probablysucks.com

  31. methode says:

    Hi,

    I have to admit that I don’t really understand why nobody, except Mark in a tiny side-note, mentions the filter_var function of PHP5.
    I really hope that the vast majority of the PHP users already upgraded their PHP to 5(+). Since it’s an inbuilt function of PHP5 it should work way better and faster than a user-written function. I know that everything has bugs, but since it was re-reviewed by hundreds of thousand of developers, it should be less buggy than any function which is written by the PHP end-users.

    Also, I didn’t encounter yet situation when the filters didn’t manage to catch a malicious code.

    For example:

    $a = 'My malicious code &lt;script src=”http://some.malicious/code.js” type=”text/javascript”&gt;&lt;/script&gt;';
    filter_var($a, FILTER_SANITIZE_STRING);
    print_r($a);
    //the result is: My malicious code

    all rights reserved -© – methode :)

    OK, i forgot what i wanted. Anyway, filters are good, you can validate with them and sanitize user input extremely easily.

    BTW, see this comment? It has >em< and >code< in the body, I guess this should be the ultimate escaping method since I’m able to put emphasis and code blocks in my comment as WP doesn’t filter them out. We should lurk over some WP filters, if they’re good for the GOV, they should be great for us, too.

    My 2c

  32. Darb says:

    You might want to read http://www.codinghorror.com/blog/archives/001172.html by the guy who runs http://stackoverflow.com

    There are good links that hang off of his post about html sanitisation. A White list approach (striptags) is much better than the blacklist approach you take, as there is much smaller room for error, and you are more likely to get false negatives than false positives, which is preferable.

  33. [...] Coote’s Blog has a great article on Stripping out malicious code for PHP, which is easy to implement and very effective. . <? function [...]

  34. [...] works on single strings, as well as deep arrays. Denham Coote?s Blog has a great article on Stripping out malicious code for PHP, which is easy to implement and very effective. PHP Code: <? [...]

  35. Reid M says:

    For selective HTML filtering and standardization using PHP, you can try the simple htmlawed — http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed

  36. wheat says:

    Thanks a million. Your article had the clearest explanation and the best code examples I’ve found. My code is better now thanks to you.

  37. Heather says:

    It appears that someone likes your code a bit too much, considering that they ripped your entire article:

    http://blogs.srijan.in/2008/11/07/xss-sanitization-for-php/

  38. [...] my search, I found a great post by Denham Coote, who really details a system which goes beyond just adding slashes or using special character [...]

  39. Jeremy says:

    Thanks very much for your code. *Hopefully* my site will now be able to stand up to SQL injections from my hacker friends or anybody more nefarious.

  40. Rev says:

    Would’nt the following work?
    $myvar = htmlspecialschars(strip_tags(trim($_POST['somevar'])));

  41. anonim says:

    Try lass.inputfilter_clean.php
    http://www.phpclasses.org/browse/file/8942.html

    It is the best I found

  42. database says:

    database…

    Are you able to make some prognoses how database will develop say during the next three – five years?…

  43. Gerard says:

    Really cool! Im new at this kind of stuff. Just observing

  44. bundyxc says:

    I love your cleanInput() function. I’ll definitely use it in my projects. Thanks!

  45. William Scholtz says:

    This is only good if you know you will always use mysql DB, We don’t always know.

  46. ugg sale uk says:

    This is only good if you know you will always use mysql DB, We don’t always know.

  47. emanu says:

    the cleanInput() function seems to work as well without the regex element @…
    the tags are removed by the other regex element, no?
    am I missing something?

  48. Ali Qamar says:

    Informative post and very informative comments as well however my vision is blurred now :-(

  49. chips zynga says:

    although I throw away a large amount of of my daytime hours on the net actively playing video games like myspace poker or mafia wars, I always like to dedicate some free time to surf a small amount of websites occasionally and I’m seriously happy to report this latest piece of writing is honestly fairly good quality and greatly improved than 1 / 2 the various poor quality stuff I read today , anyways i’m off to enjoy a couple of hands of facebook poker

  50. Der Koerper says:

    Hey thank you very much for this post! This is what I need!

    Nice work :)

Leave a Reply