In this post, I show how to convert string variables to numeric in Stata. In the Data Editor, string variables are shown in red. If a numeric variable is stored as a string variable in Stata, we have several ways to convert it to a numeric variable. Let's start with the destring command.

1. The destring command

The destring command might be the first choice for converting string variables to numeric if we have a limited number of non-numeric characters. With this command, we can either generate a new variable or replace the existing one. Here is an example:

    * First create dummy data
    clear
    input str15(price return)
    "120.25" ".10"
    "122.25" ".12"
    end

    * Now destring the two variables
    destring price, replace
    destring return, replace
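The example above overwrites the original variables. A minimal sketch of the other route mentioned above, generating new variables instead, is given below; it assumes the same dummy data, and the new names price_num and return_num are purely illustrative. destring accepts a list of variables, with one new name per variable in generate().

```stata
* Same dummy data as in the example above
clear
input str15(price return)
"120.25" ".10"
"122.25" ".12"
end

* Keep the original strings and store the numeric versions in new
* variables (price_num and return_num are hypothetical names)
destring price return, generate(price_num return_num)

* The original variables are still strings; the new ones are numeric
describe price price_num return return_num
list
```

Generating new variables is handy when you want to compare the converted values against the original strings before dropping them.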

1.1 The ignore option of destring

If there are non-numeric characters in our variable, the destring command will not convert it and will instead report an error such as

price: contains nonnumeric characters; no replace

For example, our data might contain comma separators (such as 12,000.25), and destring will then produce the above error. In such cases, we can use the ignore(",") option, which tells Stata to ignore the given character, i.e. ",". See the following example:

    * First create dummy data
    clear
    input str15(price return)
    "12,000.25" ".10"
    "12,200.25" ".12"
    end

    * Now destring price, ignoring the comma separators
    destring price, replace ignore(",")
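To see the difference ignore() makes, here is a minimal sketch on the same dummy data that first tries destring without the option (wrapped in capture noisily so the message is displayed but the do-file keeps running) and then with it. The confirm lines are only there to check the variable type at each step.

```stata
* Same dummy data as in the example above
clear
input str15(price return)
"12,000.25" ".10"
"12,200.25" ".12"
end

* Without ignore(), destring reports the nonnumeric characters
* and leaves price as a string
capture noisily destring price, replace
confirm string variable price

* With ignore(","), the commas are stripped and price becomes numeric
destring price, replace ignore(",")
confirm numeric variable price
list price
```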

1.2 List all non-numeric characters

We can list all non-numeric values using the tabulate command and the real() function, which returns missing for any string it cannot interpret as a number. Suppose that our variable strvar contains non-numeric values:

    tabulate strvar if missing(real(strvar))

Suppose that the above code produces the following table of non-numeric values:

    . tabulate strvar if missing(real(strvar))

         strvar |      Freq.     Percent        Cum.
    ------------+-----------------------------------
         #error |          1        0.00        0.00
            #na |          1        0.00        0.00
              . |    171,106      100.00      100.00
            14* |          1        0.00      100.00
    ------------+-----------------------------------

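The above table shows that there are three entries with non-numeric characters in our variable: #error, #na, and the * in 14*. The observations shown as "." are ordinary missing values, which destring converts to numeric missing without complaint. We can specify the offending characters in the ignore() option. Note that ignore() removes each listed character wherever it appears rather than matching whole strings, so #error and #na end up empty (and hence missing) while 14* becomes 14. The following code creates a new variable numvar: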

    destring strvar, ignore("#error" "#na" "*") gen(numvar)
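A self-contained sketch of this workflow on a handful of made-up values (the data below are hypothetical, chosen to mimic the table above) follows. Because ignore() works character by character, this version simply lists each distinct offending character once in a single string; that is an assumption-free way to write the option, but it also means letters such as e would be stripped from every value, which would be unsafe if the variable held numbers in scientific notation such as 1e5.

```stata
* Hypothetical dummy data mimicking the table above
clear
input str10 strvar
"3.5"
"#error"
"#na"
"."
"14*"
end

* List the values that real() cannot convert to a number
tabulate strvar if missing(real(strvar))

* Remove the characters # e r o n a * and store the result in numvar
destring strvar, ignore("#erona*") generate(numvar)
list strvar numvar
```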

2. The real() function

The destring command is cautious in the sense that it does not silently convert data to missing values. Instead, it gives you an error message when the variable contains non-numeric characters. If you are sure that observations with non-numeric characters are not needed, you can use the real() function with the generate command:

    generate newvar = real(strvar)

The above code uses a brute force. It converts string values to numeric values. And if there are non-numeric characters? Those observations are set to missing values.