Tuesday, 1 December 2009

Using MySQL to retrieve substring from the middle of a string

In MySQL it's hard to grab part of a string out of the middle of a VARCHAR or other text-based field, especially if you don't know exactly how far into the string the text you want begins, and it's of variable length, so you don't know where it ends either. Here's a way to do it.

Here's an example. The following two strings are mobile device user agent strings captured on a web server - one is for an iPhone and the other for the iPod Touch:

Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/1A542a Safari/419.3
Mozilla/5.0 (iPod; U; CPU iPhone OS 3_1_1 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Mobile/7C145

Let's say you're recording hundreds of these in a column in MySQL and you want to give a list of every AppleWebKit version out there. How do you extract just the number from after the part which says "AppleWebKit/" and before the "(KHTML..."?

To solve this problem, you can use a nested combination of MySQL's SUBSTRING and LOCATE operators:

mysql> SELECT
SUBSTRING(column_name,
LOCATE('AppleWebKit/', column_name) +12,
(LOCATE(' (KHTML', column_name)) - (LOCATE('AppleWebKit/', column_name) +12)
)
AS string_you_need
FROM table_name
;

In plain English, this says "select the substring located in column_name which i) starts at the position number which is 12 positions after the start of the text 'AppleWebKit/' and ii) ends at the position number which is the position of the beginning of the text ' (KHTML' MINUS the position number which is 12 positions after the start of the text 'AppleWebKit/' ".

This will give you a result looking like this:

+----------------------+
| string_you_need |
+----------------------+
| 420.1 |
| 528.18 |
+----------------------+

You can of course supplement this using GROUP BY and ORDER BY operators when working on a larger number of rows.


No comments:

Post a Comment