4.4. Regular Expressions

Throughout this project, regular expressions have formed the foundation of user input validation. A regular expression is a pattern with which a programmer can specify the exact syntax of valid input for a particular portion of a program. Perl's regular expression support is among the strongest of any programming language, due in part to its focus on text processing. Using these regular expressions, CsvSQL validates each command the user enters and ensures that it meets the criteria for that command before passing it on to the subroutine that will use the command's values to manipulate the actual file data.

In fact, the regular expressions used here do more than check the syntax of user input. They also pick out the actual conditions and expressions that the various subroutines will use, and assign them to variables that are later passed to those subroutines. This is done by enclosing parts of the pattern in parentheses: Perl captures the text matched by each parenthesised group and stores it in the built-in variables $1, $2, $3 and so on, numbered by the position of the opening parenthesis. These captured values form the parameters for CsvSQL's subroutines. The grouping methodology will be explained in more detail later on.
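As a minimal sketch of this capturing behaviour, the following uses a deliberately simplified pattern that is not one of CsvSQL's own. The subroutine name parse_condition and the variable names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical, simplified pattern (not taken from CsvSQL): it accepts
# "<column> <operator> <value>" and captures each of the three parts.
sub parse_condition {
    my ($input) = @_;
    if ($input =~ m/^\s*(\w+)\s*([<>!]=|[<>=])\s*(.+?)\s*$/) {
        # $1, $2 and $3 hold the text matched by the first, second
        # and third pair of parentheses respectively.
        return ($1, $2, $3);
    }
    return;    # empty list: the input failed validation
}

my ($column, $operator, $value) = parse_condition('age >= 21');
print "column=$column operator=$operator value=$value\n";
```

A single match therefore both validates the input and extracts the values a subroutine needs. Input that does not fit the pattern yields an empty list, which the caller can treat as a syntax error.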

A regular expression is best explained with an example. Below is one of the four regular expressions used to check the syntax of the SELECT statement before the command is allowed to execute.

Example 4-6. WHERE Statement Regular Expression


m/\s*((?:.+?)|"(?:.+?)")\s+where\s+(.+?)\s*([<>!]=|[<>=])\s*(.+?)\s+
                        (and|or)\s+(.+?)\s*([<>!]=|[<>=])\s*(.+?)$/i
				
The WHERE statement regular expression analyses user input from the filename of a SELECT to the end of the statement. For example, if the entered command followed the syntax SELECT * FROM file_name WHERE col_name = expr AND col_name = expr, this regular expression would check the validity of the statement from file_name onwards, inclusive. To explain this regular expression, it is best to dissect it into its constituent parts.
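Before the dissection, it may help to see the expression applied to a sample command. The sketch below is an illustration rather than CsvSQL's actual code: the sample input, the variable names and the error message are assumptions, and the /x modifier has been added so that the pattern can be wrapped across two lines as in Example 4-6 without the line break becoming part of the pattern.

```perl
use strict;
use warnings;

# The pattern from Example 4-6. The /x modifier is an addition made here so
# that the line break and indentation inside the pattern are ignored.
my $where_re = qr/\s*((?:.+?)|"(?:.+?)")\s+where\s+(.+?)\s*([<>!]=|[<>=])\s*(.+?)\s+
                  (and|or)\s+(.+?)\s*([<>!]=|[<>=])\s*(.+?)$/ix;

# Everything after "SELECT * FROM " in a sample command (assumed input).
my $input = 'file_name WHERE col_a = 1 AND col_b != 2';

# In list context a successful match returns the captured groups in order.
# These variable names are illustrative, not CsvSQL's own.
my ($file, $col1, $op1, $val1, $logic, $col2, $op2, $val2) = $input =~ $where_re;

if (defined $file) {
    print "file:             $file\n";
    print "first condition:  $col1 $op1 $val1\n";
    print "joined by:        $logic\n";
    print "second condition: $col2 $op2 $val2\n";
}
else {
    print "Syntax error in WHERE clause\n";
}
```

For this input the eight groups capture, in order, file_name, col_a, =, 1, AND, col_b, != and 2. A command with only one condition does not match this particular expression, since it requires an AND or OR followed by a second condition.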