Getting Familiar with Dart

String Processing


Learning Objectives

  • You know how to work with strings in Dart and know how to iterate over characters in a string.
  • You can look for occurrences in a string and know how to use regular expressions to find patterns in strings.
  • You can extract data from strings using regular expressions.

Strings are a basic data type in many programming languages, used to represent text. As source code is itself text, understanding how to work with strings is essential when designing and implementing programming languages.

Strings in Dart

Strings in Dart are represented by the String class, which provides methods and properties for working with strings. A string literal is written within double quotes (") or single quotes ('), while multi-line strings are created using triple quotes (''' or """).

Not all programming languages have a built-in string data type. In C, for example, strings are null-terminated arrays of characters, and string operations are provided by standard library functions (declared in string.h) rather than by the language itself.

The String class has a property length that returns the number of UTF-16 code units in the string (for most common text, this is the number of characters), and individual characters can be accessed using the index operator [], which takes an integer index and returns a string containing the character at that index.

The following example demonstrates how to create a string, print its length, and show the character at index 1.

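A minimal sketch of such a program; the strings are our own example values, and the triple-quoted string illustrates the multi-line syntax.

```dart
// Example string chosen for illustration.
const greeting = 'Hello';

// Triple quotes create a multi-line string; the line break is part of it.
const poem = '''Roses are red,
violets are blue''';

void main() {
  print(greeting.length); // 5
  print(greeting[1]); // e
  print(poem.contains('\n')); // true
}
```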

Like in many other programming languages, strings in Dart are immutable: they cannot be changed after they have been created. Immutability lets strings be shared safely between different parts of a program and allows the implementation to optimize string operations. When you "modify" a string, for example by converting it to upper case, you are actually creating a new string with the modified content.
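A small sketch of immutability in practice, using an example string of our own: the operations return new strings, and the original value stays the same.

```dart
const original = 'dart';

// toUpperCase and + both return new strings; original is left untouched.
final shouted = original.toUpperCase();
final extended = original + ' lang';

void main() {
  // String has no []= operator, so original[0] = 'D' would not compile.
  print(original); // dart
  print(shouted); // DART
  print(extended); // dart lang
}
```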

String encoding

Strings in Dart are encoded using UTF-16, which is a variable-length encoding that uses one or two 16-bit code units to represent characters. This allows Dart to represent characters from different languages and scripts, as well as special characters, such as emojis. As an example, a string like "🎯" is represented using two 16-bit code units.

This means that iterating over a string using the index operator [] may not work as expected, as some characters are represented using multiple code units. This is shown below.

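A sketch of the problem, assuming an example string of our own: length counts code units, so the loop visits the emoji as two separate halves.

```dart
// 'Hi ' is three code units; the emoji needs two (a surrogate pair).
const text = 'Hi 🎯';

void main() {
  print(text.length); // 5
  for (var i = 0; i < text.length; i++) {
    // Indices 3 and 4 hold the two halves of the emoji, which are not
    // meaningful characters on their own.
    print(text[i]);
  }
}
```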

Dart provides a property runes that returns an iterable of Unicode code points in the string. The code points are represented as integers, and you can convert them to characters using the String.fromCharCode method.

In the following, we iterate over the runes, printing each using the String.fromCharCode method.

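A minimal sketch using runes on an example string of our own; the emoji now comes out as a single element.

```dart
const text = 'Hi 🎯';

void main() {
  // runes yields Unicode code points, so the emoji is one element.
  print(text.runes.length); // 4
  for (final rune in text.runes) {
    // fromCharCode also handles code points that need two code units.
    print(String.fromCharCode(rune));
  }
}
```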

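There is also the characters package, which provides an extension on String with a characters getter for iterating over user-perceived characters (grapheme clusters), including emojis made up of several code points. The package is not part of the core libraries, so it has to be added as a dependency (for example with dart pub add characters) and imported with import 'package:characters/characters.dart'; before use.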


In these materials, for simplicity, we assume that strings do not contain characters that are represented using multiple code units. If you work with strings that do contain such characters, use the runes property or the characters package instead of the index operator.

Parsing strings to numbers

Dart provides methods for parsing strings to numbers. The int.parse method parses a string to an integer, while the double.parse method parses a string to a double. If the string cannot be parsed, a FormatException is thrown.

The following example shows how to parse a string to an integer and a double.

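A minimal sketch with example inputs of our own, including a deliberately invalid input to show the exception:

```dart
final count = int.parse('42');
final price = double.parse('19.99');

void main() {
  print(count + 1); // 43
  print(price); // 19.99

  // int.parse rejects input that is not an integer.
  try {
    int.parse('4.2');
  } on FormatException catch (e) {
    print('Could not parse: ${e.message}');
  }
}
```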

If you are unsure whether the string can be parsed, you can use the tryParse method, which returns null if the string cannot be parsed. The following example shows how to use the tryParse method to parse a string to an integer and a double.

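A sketch using tryParse with example inputs of our own; the ?? operator used for the fallback value is standard Dart null-aware syntax.

```dart
final notANumber = int.tryParse('12a');
final decimal = double.tryParse('3.5');

void main() {
  print(notANumber); // null
  print(decimal); // 3.5

  // ?? supplies a default value when parsing fails.
  final withDefault = int.tryParse('oops') ?? 0;
  print(withDefault); // 0
}
```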

The int and double types are both subtypes of the num class, which provides methods for working with numbers. If you do not know in advance whether a string contains an integer or a decimal number, you can use num.parse, which returns an int or a double depending on the input. The following example shows how to use the num.parse method to parse a string to a number.

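A minimal sketch, with our own example inputs, checking which type num.parse produced:

```dart
final whole = num.parse('42');
final fractional = num.parse('3.14');

void main() {
  // num.parse returns an int or a double depending on the input.
  print(whole is int); // true
  print(fractional is double); // true
}
```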

Looking for occurrences in strings

Checking whether a string contains another string can be done with the contains method, which returns a boolean. If we wanted to, for example, count the number of vowels in a string, we could use the following code.

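One way to write this, as a sketch: the function name countVowels is our own, and each character is checked against a string of vowels with contains.

```dart
int countVowels(String text) {
  const vowels = 'aeiou';
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    // contains checks whether the current character is one of the vowels.
    if (vowels.contains(text[i])) {
      count++;
    }
  }
  return count;
}

void main() {
  print(countVowels('Hello world')); // 3
}
```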

The contains method is case-sensitive, so if we wanted to count both upper- and lower-case vowels, we would need to convert the string to lower or upper case before checking. Converting to upper case is done with the method toUpperCase, and converting to lower case with toLowerCase.

In the following, we count the number of vowels in a string, regardless of the case of the characters.

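A sketch of this approach; the function name countVowelsIgnoringCase is our own, and the input is lowercased once before the characters are checked with contains.

```dart
int countVowelsIgnoringCase(String text) {
  const vowels = 'aeiou';
  // Lowercase once so that 'A' and 'a' are treated the same.
  final lower = text.toLowerCase();
  var count = 0;
  for (var i = 0; i < lower.length; i++) {
    if (vowels.contains(lower[i])) {
      count++;
    }
  }
  return count;
}

void main() {
  print(countVowelsIgnoringCase('HELLO World')); // 3
}
```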


The examples above look for occurrences in a string, but Dart provides many other methods for working with strings. Some of the most commonly used methods are:

  • startsWith and endsWith to check if a string starts or ends with another string.
  • indexOf and lastIndexOf to find the index of a substring in a string.
  • split to split a string into a list of substrings.
  • trim to remove leading and trailing whitespace.
  • replaceAll to replace all occurrences of a substring with another string.
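A quick tour of these methods as a sketch, using a file path of our own as the example input:

```dart
const padded = '  /home/user/notes.txt  ';
final path = padded.trim(); // removes the surrounding spaces

void main() {
  print(path); // /home/user/notes.txt
  print(path.startsWith('/')); // true
  print(path.endsWith('.txt')); // true
  print(path.indexOf('/')); // 0
  print(path.lastIndexOf('/')); // 10
  print(path.split('/')); // [, home, user, notes.txt]
  print(path.replaceAll('/', ':')); // :home:user:notes.txt
}
```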

When the task goes beyond simple string manipulation to looking for patterns, regular expressions are the usual tool.

Regular expressions

Regular expressions are a pattern-matching language for searching for patterns in strings. They are used widely both within programming and outside of it, for example in command-line tools, text editors, and spreadsheets.

Finding literals

Dart provides a RegExp class for defining and working with regular expressions. The most basic regular expression is a literal, such as a, which matches the character a. The following example shows how to use a regular expression to check whether a string contains the character a.

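A minimal sketch of checking for the literal a; the input words are our own examples.

```dart
// A raw string (r prefix) holding the literal pattern a.
final pattern = RegExp(r'a');

void main() {
  print(pattern.hasMatch('banana')); // true
  print(pattern.hasMatch('hello')); // false
}
```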

Above, we use the hasMatch method to check if the string contains the character “a”. The method returns a boolean that indicates whether the regular expression matches the string.

Note that the regular expression is defined as a raw string, which is done by prefixing the string with r. This is done to avoid having to escape backslashes in the regular expression.

Literal patterns can be of any length, although characters that have a special meaning in regular expressions (such as . or *) must be escaped with a backslash. The following example shows how to use a regular expression to find the string “ello” in a string.

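A sketch with the literal pattern ello and a few example inputs of our own:

```dart
final pattern = RegExp(r'ello');

void main() {
  print(pattern.hasMatch('Hello world')); // true
  print(pattern.hasMatch('Yellow')); // true
  print(pattern.hasMatch('hell')); // false
}
```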

Regular expressions support modifiers (also called flags), which specify options for how the expression is matched. The most commonly used modifier is i, which makes matching case-insensitive.

In Dart, this option is given with the named parameter caseSensitive of the RegExp constructor; setting it to false makes the regular expression case-insensitive, as shown below.

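A sketch comparing the default case-sensitive pattern with one created with caseSensitive: false; the inputs are our own.

```dart
final sensitive = RegExp(r'hello');
final insensitive = RegExp(r'hello', caseSensitive: false);

void main() {
  print(sensitive.hasMatch('HELLO there')); // false
  print(insensitive.hasMatch('HELLO there')); // true
}
```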

Character classes and alternation

To look for any vowel in a string, we can use a character class, which is defined using square brackets []. For example, [aeiou] matches any single one of the characters a, e, i, o, and u. The following example shows how to use a character class to check whether a string contains a vowel.

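A sketch of the character class [aeiou] with example words of our own:

```dart
// Matches any single one of the listed characters.
final vowel = RegExp(r'[aeiou]');

void main() {
  print(vowel.hasMatch('rhythm')); // false
  print(vowel.hasMatch('dart')); // true
}
```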

The above could alternatively have been written with alternation, which matches one of several alternatives. Alternation is written with a vertical bar |, as in a|e|i|o|u. The following example shows how to use alternation to check whether a string contains a vowel.

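The same check written with alternation, as a sketch with our own example words:

```dart
// Each alternative separated by | is tried in turn.
final vowel = RegExp(r'a|e|i|o|u');

void main() {
  print(vowel.hasMatch('rhythm')); // false
  print(vowel.hasMatch('dart')); // true
}
```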

Special character sequences

In addition to literals and character classes, regular expressions support escaped character sequences that match specific kinds of characters. The most commonly used are \d, which matches a digit, \w, which matches a word character (a letter, a digit, or an underscore), and \s, which matches whitespace.

The following example shows a program that checks whether a string contains a digit.

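A minimal sketch with our own example inputs:

```dart
final digit = RegExp(r'\d');

void main() {
  print(digit.hasMatch('Room 101')); // true
  print(digit.hasMatch('No numbers here')); // false
}
```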

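Try the above program without the r prefix to see what happens. As mentioned earlier, the r prefix creates a raw string, in which backslashes are not interpreted as escape characters. Without the prefix, Dart processes the backslash itself: \d is not a recognized escape sequence in a Dart string, so it turns into a plain d, and the regular expression looks for the letter d instead of a digit.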
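The effect can be seen by printing the pattern property of each RegExp. This sketch, with our own example input, also shows that doubling the backslash is the non-raw way to write \d.

```dart
// Without r, Dart turns the unrecognized escape \d into a plain d.
final withoutRaw = RegExp('\d');
// Doubling the backslash passes a real backslash to the regular expression.
final doubled = RegExp('\\d');
final raw = RegExp(r'\d');

void main() {
  print(withoutRaw.pattern); // d
  print(doubled.pattern); // \d
  print(raw.pattern); // \d
  print(withoutRaw.hasMatch('Room 101')); // false: looks for the letter d
  print(raw.hasMatch('Room 101')); // true
}
```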


The special character sequences also have negated versions, which are written with an uppercase letter: \D matches any character that is not a digit, \W matches any character that is not a word character, and \S matches any character that is not whitespace.

The following example shows how to use a regular expression to check if a string contains a non-digit character.

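A sketch with \D and two example inputs of our own:

```dart
final nonDigit = RegExp(r'\D');

void main() {
  print(nonDigit.hasMatch('12345')); // false
  print(nonDigit.hasMatch('123-45')); // true: the dash is not a digit
}
```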

In regular expressions, the character . is special: it matches any character except a newline. To look for a literal dot, it has to be escaped with a backslash (\.). The following example shows how to use a regular expression to check whether a string contains a dot.

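A sketch contrasting an unescaped . with an escaped \. on our own example inputs:

```dart
final anyCharacter = RegExp(r'.');
final literalDot = RegExp(r'\.');

void main() {
  print(anyCharacter.hasMatch('no dots here')); // true: . matches any character
  print(literalDot.hasMatch('no dots here')); // false
  print(literalDot.hasMatch('version 3.0')); // true
}
```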

Quantifiers

Quantifiers specify how many times a character or group of characters can appear. The most commonly used quantifiers are * that matches zero or more occurrences, + that matches one or more occurrences, and ? that matches zero or one occurrence.

The following example uses quantifiers to check whether a string contains “hooray” with one or more “o” characters, followed by zero or more exclamation marks.

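A sketch of the hooray check with our own example inputs; the second pattern, colou?r, is an extra illustration of the ? quantifier.

```dart
// o+ needs at least one o; !* allows any number of exclamation marks.
final hooray = RegExp(r'ho+ray!*');
// u? makes the u optional.
final color = RegExp(r'colou?r');

void main() {
  print(hooray.hasMatch('hooray')); // true
  print(hooray.hasMatch('hoooooray!!!')); // true
  print(hooray.hasMatch('hray!')); // false
  print(color.hasMatch('color')); // true
  print(color.hasMatch('colour')); // true
}
```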

The following example uses quantifiers to check whether a string contains a decimal number: one or more digits, followed by a dot, followed by one or more digits.

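A sketch of the decimal-number check with example inputs of our own:

```dart
final decimal = RegExp(r'\d+\.\d+');

void main() {
  print(decimal.hasMatch('Price: 19.99')); // true
  print(decimal.hasMatch('Price: 19')); // false
  print(decimal.hasMatch('Price: 19.')); // false: no digits after the dot
}
```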

Quantifiers can also be used to define a number of times that a character or character group should appear. This is done with curly brackets {}. For example, the regular expression a{2} matches the character a repeated 2 times, while the regular expression \d{6} matches a digit repeated 6 times.

In the following, we use a regular expression to check whether a string contains six digits followed by a dash.

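A sketch of the check with our own example inputs; the last case has seven digits before the dash.

```dart
final sixDigitsAndDash = RegExp(r'\d{6}-');

void main() {
  print(sixDigitsAndDash.hasMatch('123456-A')); // true
  print(sixDigitsAndDash.hasMatch('12345-A')); // false: only five digits
  print(sixDigitsAndDash.hasMatch('1234567-A')); // true: "234567-" matches
}
```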

If you change the input so that the dash is preceded by seven digits, the regular expression still matches. This is because hasMatch checks whether the regular expression matches any part of the string, and the last six digits together with the dash form such a match.


Non-greedy quantifiers

By default, quantifiers are greedy, which means that they match as many characters as possible. If you would want to match as few characters as possible, you can use the non-greedy quantifiers *?, +?, and ??.
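A sketch contrasting a greedy and a non-greedy quantifier, using stringMatch (which returns the first matching substring, or null) on a small HTML-like example string of our own:

```dart
const html = '<b>bold</b>';

// .+ grabs as much as possible, then backtracks to the last '>'.
final greedy = RegExp(r'<.+>');
// .+? stops at the first '>' that lets the match succeed.
final nonGreedy = RegExp(r'<.+?>');

void main() {
  print(greedy.stringMatch(html)); // <b>bold</b>
  print(nonGreedy.stringMatch(html)); // <b>
}
```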

Anchors

To match a position in a string rather than characters in it, anchors are used. The most commonly used anchors are ^, which matches the beginning of a string, and $, which matches the end of a string.

If we wanted to check that a string starts with exactly six digits followed by a dash, we could use the following regular expression.

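A sketch with our own example inputs. The last line adds $ to also pin the end of the string, using a hypothetical format of six digits, a dash, and two capital letters.

```dart
final startsWithCode = RegExp(r'^\d{6}-');

void main() {
  print(startsWithCode.hasMatch('123456-AB')); // true
  print(startsWithCode.hasMatch('X123456-AB')); // false: does not start with a digit
  print(startsWithCode.hasMatch('1234567-AB')); // false: a digit where the dash should be
  print(RegExp(r'^\d{6}-[A-Z]{2}$').hasMatch('123456-ABC')); // false: extra letter
}
```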


Groups and extracting matches

Groups treat a sequence of characters as a single unit, which allows, for example, applying a quantifier to the whole group. Groups are defined using parentheses (). The following example shows how to use a regular expression to check whether a string contains a word followed by a space and a digit.

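A sketch with our own example inputs; the second pattern, ^(ha)+$, is an extra illustration of applying a quantifier to a whole group.

```dart
// Group 1 captures the word, group 2 the digit.
final wordAndDigit = RegExp(r'(\w+) (\d)');
// + applies to the whole group "ha", not just the a.
final laughter = RegExp(r'^(ha)+$');

void main() {
  print(wordAndDigit.hasMatch('chapter 3')); // true
  print(wordAndDigit.hasMatch('chapter three')); // false
  print(laughter.hasMatch('hahaha')); // true
  print(laughter.hasMatch('hahah')); // false
}
```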

To extract the matches from a string, we use the allMatches method, which returns an iterable of Match objects. Each Match provides the start and end indices of the match (start and end) as well as the matched text.

The following example uses the allMatches method to extract pairs consisting of a word and a digit separated by a whitespace character, and then prints the matches.

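A sketch using allMatches on an example text of our own, printing the whole match, both groups, and the match positions:

```dart
const text = 'apples 3, pears 5, plums 7';
final pair = RegExp(r'(\w+)\s(\d)');

void main() {
  for (final match in pair.allMatches(text)) {
    // group(0) is the whole match; groups 1 and 2 come from the parentheses.
    print('${match.group(0)} -> word: ${match.group(1)}, '
        'digit: ${match.group(2)}, span: ${match.start}-${match.end}');
  }
}
```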

The group method of the match takes an integer index and returns the matched string for the group at that index. The index 0 returns the entire match, while the index 1 returns the first group, index 2 the second group, and so on.

The return type of group is nullable (String?): it returns null when the group did not take part in the match, for example when an optional group matched nothing. If you are sure that the group has a value, you can append the null assertion operator ! to the group method call to assert that the value is not null.
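A sketch showing a group that returns null; the pattern with an optional fractional part and the input text are our own examples.

```dart
// Group 2, the fractional part, is optional.
final number = RegExp(r'(\d+)(\.\d+)?');
final match = number.allMatches('Total: 42').first;

void main() {
  print(match.group(0)); // 42
  print(match.group(1)); // 42
  print(match.group(2)); // null: the optional group did not take part
  // Group 1 always takes part in a match, so ! is safe here.
  final digits = match.group(1)!;
  print(int.parse(digits) + 1); // 43
}
```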

If we wanted to, for example, extract all decimal numbers from a text, we could modify the regular expression to match one or more digits followed by a dot and zero or more digits. The following example shows how to extract all decimal numbers from a string.

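A sketch of the extraction with our own example text; it also shows a quirk of \d* allowing zero digits.

```dart
const text = 'Prices: 12.50, 3.99 and 100.';
final decimal = RegExp(r'\d+\.\d*');
final numbers = decimal.allMatches(text).map((m) => m.group(0)!).toList();

void main() {
  // \d* allows zero digits, so the sentence-ending dot after 100 is included.
  print(numbers); // [12.50, 3.99, 100.]
}
```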

Alternatively, if numbers may appear with or without a decimal point, we could modify the regular expression to match either one or more digits followed by a dot and zero or more digits, or just one or more digits. The following example shows how to extract all numbers from a string, regardless of whether they have a decimal point.

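A sketch with our own example text. The decimal alternative is listed first on purpose: alternatives are tried from left to right, so with \d+ first, 4.50 would be split into 4 and 50.

```dart
const text = 'Order 3 items at 4.50 each, 12 in stock';
// Try "digits, dot, digits" first, then fall back to plain digits.
final number = RegExp(r'\d+\.\d*|\d+');
final numbers = number.allMatches(text).map((m) => m.group(0)!).toList();

void main() {
  print(numbers); // [3, 4.50, 12]
}
```

An equivalent pattern is \d+(\.\d*)?, which expresses the fractional part as an optional group instead of a separate alternative.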


The RegExp class also has a method firstMatch, which returns the first match in the string if there is one and null otherwise (that is, the method returns a nullable value). The method matchAsPrefix is given a string and a start index, and returns a match only if the regular expression matches starting exactly at that index; otherwise, it returns null.

The following example uses the firstMatch method to extract the first number in a string and prints the number if one is found.

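A sketch of firstMatch; the example text is our own, chosen so that it contains two numbers.

```dart
const text = 'Stuff 123.45 and 678.90 Stuff';
final decimal = RegExp(r'\d+\.\d+');

void main() {
  final match = decimal.firstMatch(text);
  // firstMatch returns null when nothing matches, so check before use.
  if (match != null) {
    print(match.group(0)); // 123.45
  }
}
```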

The following shows the same using the matchAsPrefix method, starting from index 17, which is where the substring “678.90 Stuff” begins.

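A sketch of matchAsPrefix; the example text is our own assumption, chosen so that index 17 is where “678.90 Stuff” begins.

```dart
const text = 'Stuff 123.45 and 678.90 Stuff';
final decimal = RegExp(r'\d+\.\d+');

void main() {
  // The match must start exactly at index 17.
  final match = decimal.matchAsPrefix(text, 17);
  print(match?.group(0)); // 678.90
  // With the default start index 0, the text begins with 'S', so no match.
  print(decimal.matchAsPrefix(text)); // null
}
```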

Try changing the index to 16, 18, and 19 to see how the method behaves when the index points before the start of the number or into the middle of it.


Regular expressions and generative AI

If you are unfamiliar with regular expression syntax, it may seem daunting at first. Complex regular expressions can be daunting even for experienced developers.

Generative AI tools are good at transforming strings from one form to another, and they can be used to generate regular expressions. Even so, it is good to have a basic understanding of regular expressions, so that you can read, check, and adjust the ones the tools produce.