
isNumber Function in C++

Let’s assume we need to import records from a text file. Here is an example of such a record:

1234, Hakan Haberdar, 3.75, Houston, 77001

After reading the record line, the next step is probably to convert the strings “1234”, “3.75”, and “77001” to the corresponding numeric types. Before doing so, you may want to find out whether a given string contains a valid number at all. This is a very common problem. In Java, for example, the Apache Commons Lang library provides NumberUtils.isNumber, which checks whether a string contains a valid number. There is no such function in standard C++. Instead, you can follow different approaches. For example, you can use Boost's lexical_cast, which is not part of the standard library but is very widely used, or you can build this functionality from standard C++ facilities. Either way, it is good practice to have a function that tests whether a string contains a valid number. Keep in mind that there are many ways to write one; here we follow a simple approach for educational purposes.
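For comparison, here is a minimal sketch that relies only on standard facilities, assuming C++11 and the default "C" locale. The name isNumberViaStream is hypothetical. It accepts a string only if std::istringstream can read a double from it and no characters are left over. Note that it does not follow the rules defined below: it rejects thousands separators such as "1,234" and accepts forms such as "1e5".

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true if the whole string can be read as a double.
// Uses only the standard library; thousands separators are not accepted.
bool isNumberViaStream(const std::string &text)
{
   std::istringstream in(text);
   double value = 0.0;
   in >> value;                     // try to parse a floating-point value
   return !in.fail() && in.eof();   // parsed, and no characters left over
}
```

With this helper, "3.75" and "77001" pass, while "Houston" and "1,234" fail. That is exactly why a hand-written function is needed if thousands separators must be supported.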

First of all, let’s define what a valid number is for us.

What is a Valid Number?

We divide the numbers into two categories: integers and floating point numbers.

Integers

Here are some examples of valid integers:

·        1

·        12

·        123

·        1234

·        1,234

If you want to use a comma to separate groups of thousands, you must follow these rules.

Rule 1 for Integers Having Comma as Thousand Separator:

If the number contains a comma, the distance between the first comma and the first digit cannot be larger than 3; in other words, at most three digits may come before the first comma. For example, 1000,000 is not a valid number.

   Character:  1  0  0  0  ,  0  0  0
   Index:      0  1  2  3  4  5  6  7

Comma index - first digit index = 4 - 0 = 4

For example, 10,000 is a valid number.

Rule 2 for Integers Having Comma as Thousand Separator:

Similarly, the distance between the last comma and the last digit must be exactly 3; in other words, exactly three digits must follow the last comma. For example, 1,00 is not a valid number.

   Character:  1  ,  0  0
   Index:      0  1  2  3

Last digit index - comma index = 3 - 1 = 2

For example, 1,000 is a valid number.
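The index arithmetic behind Rules 1 and 2 can be checked in isolation. The following is a minimal sketch, not part of the final function: the helper name integerCommaRulesHold is hypothetical, and it assumes the string contains only digits and commas (no sign, no period), so the first digit is at index 0. Rule 2 is checked as "exactly 3", matching the != 3 test used at the end of the function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: checks integer comma Rules 1 and 2 only.
// Assumes 's' contains only digits and commas, so the first digit is at index 0.
bool integerCommaRulesHold(const std::string &s)
{
   const std::size_t first = s.find(',');
   if (first == std::string::npos)
   {
      return true;                    // no comma, so the rules do not apply
   }
   const std::size_t last = s.rfind(',');
   // Rule 1: comma index - first digit index (0) must be between 1 and 3.
   if (first == 0 || first > 3)
   {
      return false;
   }
   // Rule 2: last digit index - last comma index must be exactly 3.
   return (s.length() - 1) - last == 3;
}
```

For 1000,000 the first comma is at index 4, so Rule 1 fails; for 1,00 the last digit is only 2 positions after the comma, so Rule 2 fails.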

Floating Point Numbers

Here are some examples of valid floating-point numbers:

·        1.0 (there must be exactly one period)

·        0.1

·        1.23

·        1,234.0

As with integers, if you want to use a comma to separate groups of thousands, there are some rules you need to follow.

Rule 1 for Floating-Point Numbers Having Comma as Thousand Separator:

If the number contains a comma, the distance between the first comma and the first digit cannot be larger than 3; at most three digits may come before the first comma. For example, 1234,567.8 is not a valid number.

   Character:  1  2  3  4  ,  5  6  7  .  8
   Index:      0  1  2  3  4  5  6  7  8  9

Comma index - first digit index = 4 - 0 = 4

For example, 10,000.1 is a valid number.

Rule 2 for Floating-Point Numbers Having Comma as Thousand Separator:

Similarly, the distance between the last comma and the period must be exactly 4; in other words, exactly three digits must sit between them. For example, 1,24.5 is not a valid number.

   Character:  1  ,  2  4  .  5
   Index:      0  1  2  3  4  5

Period index - comma index = 4 - 1 = 3

For example, 1,234.5 is a valid number.

Rule 3 for Floating-Point Numbers Having Comma as Thousand Separator:

There should not be a comma after the period. For example, 1.000,000 is not a valid number.
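Rules 2 and 3 for floating-point numbers can likewise be expressed as index comparisons. This is a sketch under the assumption that the string has at most one period; the helper name floatCommaRulesHold is hypothetical and only these two rules are checked.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: checks floating-point comma Rules 2 and 3 only.
bool floatCommaRulesHold(const std::string &s)
{
   const std::size_t dot = s.find('.');
   const std::size_t lastComma = s.rfind(',');
   if (dot == std::string::npos || lastComma == std::string::npos)
   {
      return true;                    // the rules need both a comma and a period
   }
   // Rule 3: no comma may appear after the period.
   if (lastComma > dot)
   {
      return false;
   }
   // Rule 2: period index - last comma index must be exactly 4.
   return dot - lastComma == 4;
}
```

For 1,234.5 the period is at index 5 and the comma at index 1, so 5 - 1 = 4 and the check passes; for 1.000,000 the comma comes after the period and Rule 3 fails.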

Rule 4 for BOTH Integers and Floating-Point Numbers Having Comma as Thousand Separator:

This rule applies to both types of numbers. If there is more than one comma, the distance between two consecutive commas must be exactly 4 (three digits between them). This check applies only when the string contains at least two commas. For example:

   Character:  1  ,  2  3  4  ,  5  6  7  .  9
   Index:      0  1  2  3  4  5  6  7  8  9  10

Second comma index - first comma index = 5 - 1 = 4

 

   Character:  1  ,  2  3  4  ,  5  6  7
   Index:      0  1  2  3  4  5  6  7  8

Second comma index - first comma index = 5 - 1 = 4
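Rule 4 only compares neighbouring commas, so it can be checked by walking from one comma to the next with std::string::find. A minimal sketch, with the hypothetical name consecutiveCommasHold, that checks this rule and nothing else:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: checks Rule 4 only. Every pair of consecutive
// commas must be exactly 4 indices apart (three digits between them).
bool consecutiveCommasHold(const std::string &s)
{
   std::size_t previous = s.find(',');
   if (previous == std::string::npos)
   {
      return true;                    // no comma at all: nothing to check
   }
   for (std::size_t current = s.find(',', previous + 1);
        current != std::string::npos;
        current = s.find(',', current + 1))
   {
      if (current - previous != 4)
      {
         return false;                // wrong number of digits between commas
      }
      previous = current;
   }
   return true;                       // zero or one comma pair, or all pairs OK
}
```

Both examples above pass (5 - 1 = 4), while a string such as 1,23,567 fails because its commas are only 3 positions apart.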

As these rules show, the most difficult part of the problem is handling the comma separator.

Time Complexity

There are several possible algorithms for this problem. Our goal is an algorithm whose time complexity is O(n), where n is the number of characters in the string (i.e., its length). This is called linear time complexity: we make a single pass over the string. If you like, you can use predefined C++ functions and try other approaches.

The First Character of the String

We first focus on the first character of the string. All possible cases for the first character fall into 4 groups:

1.     Digit

2.     + (unary plus) symbol

3.     - (unary minus) symbol

4.     If the character is anything else, this is not a valid number.

Rest of the Characters

We process the rest of the string character by character. All possible cases for the remaining characters also fall into 4 groups:

1.     Digit

2.     . (period) symbol (must be at most one, cannot be the last character)

3.     , (comma) symbol (read the rules above)

4.     If the character is anything else, this is not a valid number.
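Both lists above amount to sorting each character into a small number of groups; the character's position then decides which groups are allowed. As a sketch (the CharKind enumeration and the classify function are hypothetical names, not part of the tutorial's code), the classification could look like this:

```cpp
// Hypothetical classification of a single character into the groups above.
enum class CharKind { Digit, Sign, Period, Comma, Other };

CharKind classify(char ch)
{
   if (ch >= '0' && ch <= '9')
   {
      return CharKind::Digit;
   }
   switch (ch)
   {
      case '+':
      case '-':
         return CharKind::Sign;       // allowed only as the first character
      case '.':
         return CharKind::Period;     // allowed only after the first character
      case ',':
         return CharKind::Comma;      // allowed only after the first character
      default:
         return CharKind::Other;      // never allowed
   }
}
```

The tutorial's code below performs the same tests inline with if branches instead of a separate function.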

The golden rule is simple: as soon as we come across a character that makes the string an invalid number, we stop processing and return false.

Implementation of the Function

After reading this tutorial, you will probably stop using the thousands separator. :) There are different kinds of constraints here. If you look at the rules above, you will see that some of them can be checked while we are processing the string, and others only after we have finished processing it.

Constraints that can be checked at the end of the processing:

·        Period cannot be the last character.

·        The distance between the last comma and the last digit of an integer must be exactly 3.

·        +, +.2, -.12, and - are not valid numbers, so there must be at least one integral digit.

Note that we can check the distance rule between the comma and the period during processing, because both the position of the last comma and the position of the period are known at the moment we reach the period. We will now work through the constraints one by one, starting with the first character. But first, we declare the function prototype.

bool isNumber(const std::string &text);

First Character

Let’s remember the constraints for the first character:

1.     It may be a digit. If so, we start counting the digits of the integral part, because we need this count for the distance rule between the first digit and the first comma.

2.     It may be a + symbol. If so, there is nothing else to do.

3.     It may be a - symbol (minus sign). If so, there is nothing else to do.

4.     If the first character is anything else, this is not a valid number. There is no need to continue.

Two if branches cover all four cases. Before that, we check whether the string is empty.

   if(text.empty())
   {
      return false;
   }
   int intCount=0;
   if( text[0] >= '0' && text[0] <= '9')
   {
      intCount++; // first digit of the integral part
   }
   else if( text[0] != '+' && text[0] != '-')
   {
      return false; // not a digit and not a sign
   }

 

It is hard to believe, but that is all we need for the first character.
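To try this step on its own, the fragment can be wrapped in a function. This is a sketch for experimentation only: checkFirstCharacter is a hypothetical name, and a lone sign such as "-" still passes here because the "at least one integral digit" check happens at the very end.

```cpp
#include <string>

// Hypothetical wrapper around the first-character step: returns false if the
// string cannot be a number, and stores the digit count so far in intCount.
bool checkFirstCharacter(const std::string &text, int &intCount)
{
   intCount = 0;
   if (text.empty())
   {
      return false;                   // an empty string is never a number
   }
   if (text[0] >= '0' && text[0] <= '9')
   {
      intCount++;                     // first digit of the integral part
   }
   else if (text[0] != '+' && text[0] != '-')
   {
      return false;                   // not a digit and not a sign
   }
   return true;
}
```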

The Rest of the Characters

Here is the long list of the constraints for the rest of the characters:

1.     The character can be a digit. If so, we count it toward the integral part, because we need this count for the distance rule between the first digit and the first comma. Only digits before the period (i.e., the integral part) are counted, so we need to know whether we have already seen a period. That is why we need a variable that stores the number of periods.

2.     The character can be a period, but there must be at most one, so we do need a variable that stores the number of periods. Because the period marks the end of the integral part, this is also where we check the rule for the last comma. This means that we need another variable to keep track of the position of the current (most recent) comma. Keep in mind that the comma and period constraints are checked only if a comma or a period actually appears in the string. Moreover, the period cannot be the last character, but that constraint has to wait until the end. For now, we just save the position of the period.

3.     The character can be a comma (, symbol). There are very important constraints on the use of the comma separator. For the first comma, one to three digits must come before it (Rule 1). If there is more than one comma, the distance between two consecutive commas must be exactly 4, so we need a second variable to keep track of the position of the previous comma. There cannot be a comma after the period. Moreover, the distance between the last comma and the last digit of an integer must be exactly 3, but this constraint also has to wait until the end.

4.     If the character is anything else, this is not a valid number. We simply return false.

This case is more complicated, and we need 4 if branches to cover all of it:

   int dotCount=0, commaCount=0, previousCommaPosition=-1, currentCommaPosition=-1, dotPosition=-1;
   for(int c=1; c<static_cast<int>(text.length()); c++)
   {
      if( text[c] >= '0' && text[c] <= '9')
      {
         if(dotCount==0)
         {
            intCount++; // count digits of the integral part only
         }
      }
      else if( text[c]=='.')
      {
         dotCount++;
         dotPosition = c;
         if( dotCount > 1)
         {
            return false; // at most one period
         }
         if( commaCount>0)
         {  // check the last comma for floating point numbers
            if(dotPosition-currentCommaPosition!=4) // 1,000.5 is valid, 1,00.5 is not
            {
               return false;
            }
         }
      }
      else if( text[c]==',')
      {
         if(dotCount>0)
         {
            return false; // no comma after dot
         }
         commaCount++;
         previousCommaPosition = currentCommaPosition;
         currentCommaPosition = c;
         if(commaCount>1)
         {  // If this is not the first comma,
            // the distance between two consecutive commas should be 4
            if(currentCommaPosition-previousCommaPosition!=4)
            {
               return false;
            }
         }
         else if(commaCount==1)
         {  // If this is the first comma, 1 to 3 digits must precede it: 1, or 10, or 100,
            if( intCount==0 || intCount>3 )
            {
               return false; // -,000 or 1000,000 --> false
            }
         }
      }
      else
      {
         return false; // any other character
      }
   }

 

If all the characters have been processed and the program has reached this point, we still need to check 3 final constraints:

1.     Period cannot be the last character.

2.     The distance between the last comma and the last digit of an integer must be exactly 3.

3.     +, +.2, -.12, and - are not valid numbers, so there must be at least one integral digit.

We need 3 separate if branches:

   // +, -, +.2 and -.12 have no integral digit, so they are not valid numbers
   if( intCount == 0 )
   {
      return false;
   }
   // the period cannot be the last character
   if( dotPosition>0 && dotPosition==static_cast<int>(text.length())-1 )
   {
      return false;
   }
   // if there is no period but there is comma, we need to check the last comma's distance to the end
   if( commaCount>0 && dotCount == 0)
   {
      if( static_cast<int>(text.length())-1-currentCommaPosition != 3) // check the last comma for integers
      {
         return false;
      }
   }
   return true; // every constraint is satisfied
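Putting the three fragments together gives a complete function. The version below is a sketch assembled from the code above, with two small changes that are our own assumptions rather than part of the original listing: the first comma must be preceded by at least one digit (so that a string such as -,000 is rejected), and every return statement uses a bool value. It should compile as standard C++11.

```cpp
#include <string>

// Sketch assembled from the three fragments above. Two assumed changes:
// the first comma needs at least one digit before it, and all returns are bool.
bool isNumber(const std::string &text)
{
   if (text.empty())
   {
      return false;
   }

   // First character: a digit, '+' or '-'.
   int intCount = 0;                 // digits in the integral part
   if (text[0] >= '0' && text[0] <= '9')
   {
      intCount++;
   }
   else if (text[0] != '+' && text[0] != '-')
   {
      return false;
   }

   // Remaining characters, in a single pass.
   const int length = static_cast<int>(text.length());
   int dotCount = 0, commaCount = 0;
   int previousCommaPosition = -1, currentCommaPosition = -1, dotPosition = -1;
   for (int c = 1; c < length; c++)
   {
      if (text[c] >= '0' && text[c] <= '9')
      {
         if (dotCount == 0)
         {
            intCount++;              // only digits before the period count
         }
      }
      else if (text[c] == '.')
      {
         dotCount++;
         dotPosition = c;
         if (dotCount > 1)
         {
            return false;            // at most one period
         }
         if (commaCount > 0 && dotPosition - currentCommaPosition != 4)
         {
            return false;            // three digits between last comma and period
         }
      }
      else if (text[c] == ',')
      {
         if (dotCount > 0)
         {
            return false;            // no comma after the period
         }
         commaCount++;
         previousCommaPosition = currentCommaPosition;
         currentCommaPosition = c;
         if (commaCount > 1)
         {
            if (currentCommaPosition - previousCommaPosition != 4)
            {
               return false;         // consecutive commas must be 4 apart
            }
         }
         else if (intCount == 0 || intCount > 3)
         {
            return false;            // one to three digits before the first comma
         }
      }
      else
      {
         return false;               // any other character
      }
   }

   // Final checks after the whole string has been processed.
   if (intCount == 0)
   {
      return false;                  // "+", "-", "+.2", "-.12"
   }
   if (dotPosition == length - 1)
   {
      return false;                  // period cannot be the last character
   }
   if (commaCount > 0 && dotCount == 0 && length - 1 - currentCommaPosition != 3)
   {
      return false;                  // integer: exactly three digits after last comma
   }
   return true;
}
```

For example, with this version isNumber("-1,234.5") and isNumber("1,234,567.9") return true, while isNumber("1,24.5"), isNumber("1.") and isNumber("-,000") return false. The loop still visits each character once, so the running time stays O(n).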

 


This is the end of the tutorial. I hope you enjoyed it. Please feel free to contact me at haberdar at gmail dot com.

How to Cite this Document

You can use anything here as long as you give credit as follows: 

Hakan Haberdar, "isNumber Function in C++", Computer Science Tutorials [online], (Accessed MM-DD-YYYY) Available from: http://www.haberdar.org/is-number-function-tutorial.htm