Technical

Regular Expressions Not Equal to a Smile or a Frown

By Amy Bowser-Rollins, 10/05/2016, 11/01/2016

One of the nifty and powerful tools in our litigation support tool belt is “regular expressions.”

We use regular expressions to match patterns of data that we need to search for and then perhaps manipulate in some way. A few examples of easy-to-recognize “data patterns” are social security numbers (999-99-9999) and phone numbers (999-999-9999). Both values are consistent in their “data patterns” and both include a specific number of digits. If we need to search for a social security number that ends with 2334 or a phone number in the United States that begins with an area code of 301, we can use regular expressions to (1) isolate the correct position in the value and (2) match the exact numeric value.

Let's work through one example together.

For this example, we need to search for US phone numbers that look like either of these two patterns:

(999) 999-9999

999-999-9999

Here is an example of a regular expression to find any US phone number that includes an area code, with or without parentheses surrounding the area code:

(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}

In terms of syntax, let's define the types of items used in this regular expression. As you read through this list, try to stare at the entire regular expression as a whole and remember the two patterns we need to find.

Using brackets ([ ]) means that a single character at that position must match one of the characters listed within the brackets; [0-9] matches any one digit.
Using braces ({ }) means that the preceding bracketed item must occur exactly the number of times designated within the braces; [0-9]{3} matches exactly three digits.
One set of parentheses is being used to encapsulate an OR statement.
Another set of parentheses, each preceded by a backslash, is being used to search for the literal parentheses surrounding an area code.
A pipe (|) is being used as an OR operator.
Backslashes are being used to signify that we literally want to search for the next character; in other words, the next character is not part of the regular expression syntax; instead, it is part of the text string we are searching for.
The hyphens are literally part of the text string.
There is an intentional space in the text string, right after the closing parenthesis of the area code.
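To see the pieces in the list above working together, here is a minimal sketch in Python. Python and its built-in re module are my choice for illustration; the article does not name a tool. The helper name find_phone and the sample numbers are made up for this example.

```python
import re

# The article's pattern, written as a raw string (r"...") so the
# backslashes reach the regex engine unchanged.
PHONE = re.compile(r"(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}")

def find_phone(text):
    """Return the first phone number found in text, or None."""
    match = PHONE.search(text)
    return match.group(0) if match else None

# Made-up sample values for illustration only.
samples = [
    "(301) 555-0123",  # parentheses around the area code, then a space
    "301-555-0123",    # hyphen after the area code
    "301 555 0123",    # spaces only: neither side of the OR applies
    "555-0123",        # no area code at all
]

for sample in samples:
    print(sample, "->", find_phone(sample))
```

The first two samples are found; the last two come back as None because neither side of the pipe can match them.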

Okay, when you look at the regular expression as a whole, can you visualize where the two dashes are in the second phone number pattern (999-999-9999)?

Keep visualizing the regular expression as a whole. Can you see the counts of 3, 3 and 4 that match both phone number patterns? Each phone number is 3 digits plus 3 digits plus 4 digits, right?

Now focus in on the wider instance of open parentheses and closing parentheses.

Inside this wider parenthetical, since we know the pipe character (|) represents the OR operator, can you visualize how it is searching for the area code with or without surrounding parentheses?

This visualization technique leaves us with one last item to understand: each digit within the phone number itself can be any digit from 0 through 9.

How did you do? I bet that you can stare at that regular expression now and understand exactly what it means. No more complexity.

Learning how to use regular expressions can be intimidating because, at first glance, it seems similar to trying to learn a programming language. Yes, formulating a regular expression can get complex, but spending the time to gain a basic understanding of the syntax (like you just did with me) will assist with many searching scenarios.

If you had any aha moments during this exercise, let me know in the comments area below, okay?

Bonus Points: At the beginning of this article, I mentioned that if we wanted to search for a US phone number with an area code of 301, we could use a regular expression. Now that you understand the example above, how would you edit it to only search for phone numbers with a 301 area code? You can do this! Trust your instincts.
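As a warm-up that does not give away the bonus answer, here is a rough sketch of the other example from the beginning of the article: a social security number that ends with 2334. It is written in Python as an illustration only; the pattern, the helper name has_ssn_ending_2334 and the sample values are my own assumptions, not something taken from a real matter.

```python
import re

# Any three digits, a hyphen, any two digits, a hyphen,
# then the literal digits 2334 in the last position.
SSN_ENDING_2334 = re.compile(r"[0-9]{3}-[0-9]{2}-2334")

def has_ssn_ending_2334(text):
    """Return True if text contains a value shaped like 999-99-2334."""
    return SSN_ENDING_2334.search(text) is not None

# Made-up sample values for illustration only.
print(has_ssn_ending_2334("SSN: 123-45-2334"))
print(has_ssn_ending_2334("SSN: 123-45-6789"))
```

The positions we do not care about stay as [0-9], and the position we do care about is replaced with the exact digits. The same idea applies to the bonus question.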

NOTE: In case you're interested, I wrote another article entitled Regular Expressions – Searching for SSN Numbers Across Text Files where I explained step-by-step how I used regular expressions on-the-job.

4 Comments

RpTheHotrod says:

10/05/2016 at 11:50 am

I absolutely love using RegEx. A great fast way to grab data with delimiters and grouping them is the use of (.*?)
For example, a load file of
"Info","More Info","Even More Info"
could be quickly identified as
^"(.*?)","(.*?)","(.*?)"$
(^ = beginning of line, $ = end of line)

The contents of each set of parentheses are assigned to group numbers, which can be manipulated when replacing. Groups are numbered in the order they appear and are referenced with \# where # is the number of a group. The contents of the first (.*?) are group 1, the second group 2, and the third group 3.

From there, you can swap out all you want. If you wanted the contents of the third group to be in the place of the first, and the contents of the first group to be in the place of the third group, you’d use:

SEARCH:
^"(.*?)","(.*?)","(.*?)"$

REPLACE:
"\3","\2","\1"
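For anyone who wants to try the swap outside a text editor, here is a minimal sketch using Python's re.sub, where groups are referenced in the replacement as \1, \2 and \3. Python is my choice for illustration, and the function name swap_first_and_third is made up for this example.

```python
import re

# Three quoted, comma-separated fields on one line.
# Straight double quotes are used, as they would appear in a load file.
FIELDS = re.compile(r'^"(.*?)","(.*?)","(.*?)"$')

def swap_first_and_third(line):
    """Put group 3 where group 1 was, and group 1 where group 3 was."""
    return FIELDS.sub(r'"\3","\2","\1"', line)

print(swap_first_and_third('"Info","More Info","Even More Info"'))
# prints "Even More Info","More Info","Info"
```

A line that does not fit the three-field pattern is returned unchanged, which makes it fairly safe to run over a whole file line by line.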

This tends to be handy when having to swap out dates.
2016-10-05

This is YYYY-MM-DD, but what if you wanted it MM-DD-YYYY?

SEARCH:
^(.*?)-(.*?)-(.*?)$ --> 2016 is group 1, 10 is group 2 and 05 is group 3 (the ^ and $ make the last group capture through the end of the line)
REPLACE:
\2-\3-\1

Once you get more savvy, you can be more efficient with something like
(\d{4})[.\- \\]?(\d{1,2})[.\- \\]?(\d{1,2})
This finds a pattern of 4 digits followed by the usual suspects for date delimiters (period, dash, space, backslash, and no delimiter), then 1 or 2 digits, then delimiters, then one more set of 1 or 2 digits.
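Here is a rough sketch of that date pattern in Python, reordering YYYY-MM-DD to MM-DD-YYYY. It assumes Python's re module rather than a text editor. The hyphen inside the brackets is written as \- so it is read as a literal dash and not as a range, and the function name to_mdy is made up for this example.

```python
import re

# Group 1: four-digit year. Groups 2 and 3: one or two digits each.
# Between them: an optional period, dash, space or backslash.
DATE = re.compile(r"(\d{4})[.\- \\]?(\d{1,2})[.\- \\]?(\d{1,2})")

def to_mdy(text):
    """Rewrite year-first dates in text as month-day-year with dashes."""
    return DATE.sub(r"\2-\3-\1", text)

print(to_mdy("2016-10-05"))  # prints 10-05-2016
print(to_mdy("2016.10.05"))  # prints 10-05-2016
print(to_mdy("20161005"))    # prints 10-05-2016
```

Because the pattern is this loose, it can also rewrite long digit runs that are not dates, so it is worth spot-checking the results before replacing across a whole file.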

NOTE: Not all text editors support features such as “grouping”. I personally prefer EditPad Pro for my RegEx work.

Anyway, great article! RegEx is fantastic!

-Jared

1. Amy Bowser-Rollins says:
  
  10/05/2016 at 5:34 pm
  
  Wow, Jared, you could teach a course on regular expressions! Thanks so much for sharing your knowledge.
  
mgolab says:

10/05/2016 at 2:58 pm

Excellent, Amy. They look daunting when you see someone else’s example; however, as you say, they are straightforward once you understand what they are doing.

1. Amy Bowser-Rollins says:
  
  10/05/2016 at 5:36 pm
  
  Matthew – your recent email about this topic reminded me that I had a half-written article to finish. Ha!
  
