September 19, 2010

Get Your RegEx On

by Anthony Verre

A Not-So Regular Expression

Nearly every SEO/SEM knows that analytics is essential, but if you’re not measuring and uncovering the most valuable data, whether through analytics reporting or traffic segments, you’re not using the tool to your advantage.

Regular Expressions (RegEx) are one of the most powerful and least used tools in the analytics tool belt. They certainly walk the line of SEM geek-craft, but there will come a time when you need to pull out the pocket protector and thick, taped glasses to get the data you need. Here’s how Google defines a regular expression:

Regular Expressions are a set of characters you can use to match one or more strings of text. Regular Expressions support wildcard matching, letting you capture a lot of variations (in URLs for example) using a single string of characters.

The Basics of RegEx Characters: A Quick Guide

If you are familiar with or write advanced query operators for Google, Yahoo, or Bing, writing RegEx for analytics is in the same vein. Below are the characters I use most often when writing regular expressions to refine the data in analytics reports.

  1. “^” (caret): placed before a letter or keyword, the caret matches a position (the start of the string) rather than a character
    • Example: (^keyword) or (^www)
  2. “$” (dollar sign): placed after a letter or keyword, the dollar sign matches a position (the end of the string) rather than a character
    • Example: (y$)
  3. “|” (pipe delimiter): separates alternatives; the expression matches if any one of the options matches
    • Example: (keyword|keyword1|keyword2|etc.)
  4. “+” (plus sign): matches one or more of the preceding character, as many as possible
    • Example: (keyword+) matches “keyword”, “keywordd”, and so on
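
To see the four characters in action outside of an analytics tool, here is a minimal sketch using Python's `re` module. The sample strings are made up for illustration, and Python's regex engine is not the one GA uses, though these four basics should behave the same way in both.

```python
import re

# Hypothetical sample strings; any keywords or hostnames would do.
samples = [
    "www.example.com",
    "example.com/www",
    "seo agency",
    "seo agencies",
    "keyword",
    "keyworddd",
    "keywor",
]

def matching(pattern):
    """Return the samples in which the pattern is found anywhere."""
    return [s for s in samples if re.search(pattern, s)]

starts_with_www = matching(r"^www")            # caret: anchored to the start
ends_with_y = matching(r"y$")                  # dollar sign: anchored to the end
agency_or_keyword = matching(r"agency|keyword")  # pipe: either option
one_or_more_d = matching(r"^keyword+$")        # plus: one or more of the final "d"

print(starts_with_www)
print(ends_with_y)
print(agency_or_keyword)
print(one_or_more_d)
```

Note that the plus sign applies only to the character right before it, which is why "keywor" is not matched by the last pattern.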

Why You Should Be Using Regular Expressions in Your Analytics

As an SEO/SEM you need to get granular. While Google does a good job of getting you high-level data in their standard reporting, it is a bit harder to dig deep if you don’t have your handy-dandy RegEx operators. Beyond that, there are times when the sites you work with will require RegEx in order to track goals properly. It’s an essential tool in your toolkit for measuring correctly, tracking SEO strategy progress, and measuring ROI.

Using Regular Expressions to Filter Keywords in GA

The keywords report is one of those live-and-die-by reports in GA. Google is kind enough to let you segment by paid and organic, but if you want to get to the meat of your efforts, you’ll need a regular expression.

Keywords Before RegEx

Your initial keyword list will more than likely be made up mostly of branded keywords, either from users typing the website address into Google (yes, that still happens) or from queries for the corporation’s name or brand. That’s not very helpful if you need to find out whether your SEO/SEM strategy is actually working. For instance, have the onsite tweaks you made generated more traffic, or have the link building efforts you started on “Keyword X” begun boosting traffic and conversions?

The initial list you’re presented with from GA isn’t going to help you prove any of that. Using a RegEx made up of the caret (^), pipe delimiter (|), and possibly the dollar sign ($), you can create a filter that will get you to the heart of your organic traffic.
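
As a rough illustration of that kind of filter, the sketch below mimics an "exclude" filter in Python rather than in GA itself. The brand ("Acme"), the keyword rows, and the visit counts are all hypothetical; swap in your own branded terms.

```python
import re

# Hypothetical branded pattern for an imaginary brand "Acme":
# keywords starting with "www", containing "acme", or ending in ".com".
BRANDED = re.compile(r"^www|acme|\.com$", re.IGNORECASE)

# Hypothetical keyword report rows: (keyword, visits).
keyword_report = [
    ("acme widgets", 1200),
    ("www.acmewidgets.com", 450),
    ("blue widgets for sale", 310),
    ("Acme", 280),
    ("widget repair chicago", 95),
]

def non_branded(rows):
    """Keep only the rows whose keyword does not match the branded pattern."""
    return [(kw, visits) for kw, visits in rows if not BRANDED.search(kw)]

organic_focus = non_branded(keyword_report)
print(organic_focus)
```

What is left after the exclusion is the non-branded traffic, which is the part that reflects whether the SEO work is paying off.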

Regular Expressions to Track Dynamic Goals

On occasion it can’t be avoided: websites generate dynamic URLs for the “Thank You” page. When this happens you can’t simply add in the thank-you page URL. You have to write a regular expression to capture all the goals.

Let’s say you have a “Thank You” URL that looks something like this: http://[primary domain]/thank-you/contact-us?sid=259. Using a RegEx in place of the full URL (/thank-you/contact-us+), we’ve effectively taken the dynamic element out of the goal conversion and are now able to track it as accurately as possible. Strictly speaking, the (+) only repeats the final “s”; the goal still fires because the expression has no ($) anchoring it to the end of the URL, so whatever follows contact-us, session id included, is ignored. If the URL structure is more segmented and delineates more definitive sections, you can create multiple goals, still using the (+) to combat the dynamic session id.
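
Here is a small sketch of how that goal expression behaves, using Python's `re.search`, which, like an unanchored pattern, looks for the expression anywhere in the string. The request paths are hypothetical, and GA's own matching engine may differ in details.

```python
import re

# Goal pattern from the example above; the trailing "+" repeats only the final "s".
GOAL = re.compile(r"/thank-you/contact-us+")

# Hypothetical request paths with different session ids.
paths = [
    "/thank-you/contact-us?sid=259",
    "/thank-you/contact-us?sid=9001",
    "/thank-you/contact-us",
    "/thank-you/newsletter?sid=12",
    "/contact-us",
]

def goal_hits(urls):
    """Return the URLs that would count as a goal completion under this pattern."""
    return [u for u in urls if GOAL.search(u)]

hits = goal_hits(paths)
print(hits)
```

Every contact-us thank-you page counts regardless of its session id, while the newsletter thank-you page and the plain contact form do not.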

If you’re concerned that the dynamic portions of the URL helped to identify which section or which particular form was used, you can use “Reverse Goal Path” for each goal to see the exact path a user took through the site to complete the conversion. It will take more front-loaded leg work, but the peace of mind you’ll create for your client is worth the extra hour or two.

The Power to Slice Your Data Deeper

Those are two very common uses for regular expressions within your analytics reporting. With RegEx, you can’t be afraid to fail because you will. The trick is to keep testing your expressions until you get exactly the data you need.

There are great tools out there that will allow you to test your RegEx to make sure it works before you implement it on live analytics; Epik One’s Regular Expression Filter Tester is perfect for this task. It’s specific to Urchin 5 and Google Analytics. RegEx can help you slice your data any way you want; you just have to know the questions you want to ask.
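
If you'd rather script a quick first pass before reaching for a dedicated tester, something like the sketch below can help. The `check_pattern` helper and the test URLs are hypothetical, and because Python's `re` is not the engine GA uses, treat a pass here as a sanity check rather than a guarantee.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return a list of failure messages; an empty list means the pattern passed."""
    regex = re.compile(pattern)
    failures = []
    for text in should_match:
        if not regex.search(text):
            failures.append(f"expected a match: {text}")
    for text in should_not_match:
        if regex.search(text):
            failures.append(f"unexpected match: {text}")
    return failures

# Hypothetical cases: the goal page should match, a blog post about it should not.
wanted = ["/thank-you/contact-us?sid=259"]
unwanted = ["/blog/thank-you/"]

anchored = check_pattern(r"^/thank-you/", wanted, unwanted)
unanchored = check_pattern(r"/thank-you/", wanted, unwanted)

print(anchored)
print(unanchored)
```

Listing a few strings that should not match is the useful part: it catches expressions that are too greedy before they skew a live report.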

4 Comments
  1. Sep 20 2010

    Great post, I would like to add that regex are a lot easier with perl language, but are still useful in js, PHP and so on.

    • Sep 20 2010

      Bruno,

      Truthfully, I use rather simple regular expressions (and combinations of them) in analytics that don’t require much programming knowledge. But you have me intrigued, and could you recommend any books/sites that could help aspiring RegEX writers?

  2. Sep 20 2010

    I use RegEx mostly for filtering out branded keywords and to report on non-branded organic keyword performance. But I like that tip about the goal tracking on dynamic thank you pages. I used to filter branded traffic by adding more and more filters, it took forever! Then I learned about the pipe!

    brandedterm|brandedvariation|variation|variation

    When you have clients with a very strong brand this string can get VERY long.

    If you come across any other good examples/uses of RegEx please do share them.

  3. Sep 20 2010

    Tony- great idea on RegEx for dynamic pages. I’ve only dabbled in RegEx for keyword filtering. This should come in handy.
