Data-Cleaning Demo

This demo shows a fairly advanced application of the VSFlexString control. It uses a pattern with tags to retrieve information and automatically parse each match.

Tags are created by enclosing parts of the Pattern string in curly braces. By tagging the matches, you can determine which parts of the string matched what parts of the pattern.

For example, say we have a database that contains customer names. But the same name may be stored as "John Doe", "John Francis Doe", "John F. Doe", or "Doe, John", depending on who did the data entry. We want to clean the data, converting all variations to the latter type ("Last, First").

Here is a small function that will accomplish this task:

    Private Function CleanName(name$) As String

        fs.Text = name

        fs.Pattern = "^{[A-Za-z]+}[^,]* {[A-Za-z]+}$"

        If fs.MatchCount > 0 Then

            CleanName = fs.TagString(1) & ", " & fs.TagString(0)

        Else

            CleanName = name

        End If

    End Function

The Pattern string needs some explanation. The first part, "^{[A-Za-z]+}", will match sequences of letters that start at the beginning of the string. This will be a first name or a last name. The second part, "[^,]*" will match any sequence of characters not including a comma and followed by a space. This will match the space between a first and a last name, and also the optional middle name. However, the pattern will not match names already in the "Last, First" format because of the comma. Finally, the "{[A-Za-z]+}$" part will match the last name.

Notice how the parts of the Pattern that match the first and last names are enclosed in curly brackets. This allows us to retrieve their values and replace names in "First Last" and "First Middle Last" format with the "Last, First" format we want. This is accomplished using the TagString property.

You may test the function using the Visual Basic debug (immediate mode) window:

    ? CleanName("John Doe")

    Doe, John

    ? CleanName("John    Doe")

    Doe, John

    ? CleanName("Doe, John")

    Doe, John

    ? CleanName("John F. Doe")

    Doe, John

    ? CleanName("John Francis Doe")

    Doe, John

    ? CleanName("John Francis Jr.")

    John Francis Jr.

The function works as expected. Note that the last try fails, because the last name is not supposed to contain periods. In this case, the function just returns the original string, which seems like a reasonable thing to do.

Writing the patterns is not difficult, but it does require some practice. This sample is a good starting point.