bioCancer: Interactive Multi-Omics Cancers Data Visualization and Analysis

All transformations applied in the Tools > Transform tab can be logged. If, for example, you apply a log transformation to numeric variables the following code is generated and put in the Transform command log window at the bottom of your screen when you click the Store button.

## transform variable
r_data[["epiGenomics"]] <- mutate_each(r_data[["epiGenomics"]], funs(log), ext = "_log", mRNA, Met450)

This is an important feature if you need to recreate your results or you want to re-run a report with new, but similar, data. Even more important is that there is a record of the steps taken to generate all results.

To add commands contained in the command log window to a report in R > Report click the icon.

If a filter has been specified it will be ignored for (most) functions available in Tools > Transform. To create a new dataset based on a filter navigate to the Workspace > View tab and click the Store button. Alternatively, create new dataset based on a Filter select Holdout sample from the Transformation type dropdown.

When you select Type from the Transformation type drop-down another drop-down menu is shown that will allow you to change the type (or class) of one or more variables. For example, you can change a variable of type integer to a variable of type factor. Click the Store button to change variable(s) in the data set. A description of the transformations included in bioCancer is provided below.

As factor: convert a variable to type factor (i.e., a categorical variable)
As number: convert a variable to type numeric
As integer: convert a variable to type integer
As character: convert a variable to type character (i.e., strings)
As date (mdy): convert a variable to a date if the dates are ordered as month-day-year
As date (dmy): convert a variable to a date if the dates are ordered as day-month-year
As date (ymd): convert a variable to a date if the dates are ordered as year-month-day
As date/time (mdy_hms): convert a variable to a date if the dates are ordered as month-day-year-hour-minute-second
As date/time (mdy_hm): convert a variable to a date if the dates are ordered as month-day-year-hour-minute
As date/time (dmy_hms): See mdy_hms
As date/time (dmy_hm): See mdy_hm
As date/time (ymd_hms): See mdy_hms
As date/time (ymd_hm): See mdy_hm

When you select Transform from the Transformation type drop-down another drop-down menu is shown that will allow you to apply common transformations to one or more variables in the data. For example, to take the (natural) log of a variable select the variable(s) you want to transform and choose Log from the Apply function drop-down. A new variable is created with the extension specified in the 'Variable name extensiontext input (e.g,._log). Make sure to pressreturnafter changing the extension. Click theStore` button to add the variable(s) to the data set. A description of the transformation functions included in bioCancer is provided below.

Ln: create a natural log-transformed version of the selected variable (i.e., log(x) or ln(x))
Square: multiply a variable by itself (i.e., x^2 or square(x))
Square-root: take the square-root of a variable (i.e., x^.5)
Absolute: Absolute value of a variable (i.e., abs(x))
Center: create a new variable with a mean of zero (i.e., x - mean(x))
Standardize: create a new variable with a mean of zero and standard deviation of one (i.e., (x - mean(x))/sd(x))
Invert: 1/x
Median split: create a new factor with two levels (Above and Below) that splits the variable values at the median
Deciles: create a new factor with 10 levels (deciles) that splits the variable values at the 10th, 20th, ..., 90th percentiles.

Choose Create from the Transformation type drop-down. This is the most flexible command to create new or transformed variables. However, it also requires some basic knowledge of R-syntax. A new variable can be any function of other variables in the (active) dataset. Some examples are given below. In each example the name to the left of the = sign is the name of the new variable. To the right of the = sign you can include other variable names and basic R-functions. After you type the command press return to create the new variable. If the result is as expected press Store to add it to the dataset.

Note: If one or more variables is selected from the list in Select variables they will be used to group the data before creating the new variable (see example 1. below). If this is not the intended result make sure that no variables are selected when creating the new variable

Create a new variable p that is equal to the mean of price. To calculate the mean of price per group (e.g., per level of clarity) simply select clarity from the Select variables list

z = mean(x)
Create a new variable z that is the difference between variables x and y

z = x - y
Create a new variable z that is a transformation of variable x but with mean equal to zero (note that this transformation is also available in the Transform drop-down as Center):

z = x - mean(x)
Create a new logical variable z that takes on the value TRUE when x > y and FALSE otherwise

z = x > y
Create a new logical z that takes on the value TRUE when x is equal to y and FALSE otherwise

z = x == y
Create a variable z that is equal to x lagged by 3 periods

z = lag(x,3)
Create a categorical variable with two levels

z = ifelse(x < y, 'smaller', 'bigger')
Create a categorical variable with three levels. An alternative approach would be to use the Recode function described below

z = ifelse(x < 60, '< 60', ifelse(x > 65, '> 65', '60-65'))
Convert an outlier to a missing value. For example, if we want to remove the maximum value from a variable called xmRNA that is equal to 400 we could use an ifelse statement and enter the command below in the Create box. Press return and Store to add the new xmRNA_rc variable. Note that if we had entered xmTNA on the left-hand side of the = sign the original variable would have been overwritten

xmRNA_rc = ifelse(xmRNA > 400, NA, sales)

Similarly, if a respondent with ID 3 provided information in the wrong scale on a survey (e.g., income in \$1s rather than in \$1000s) we could use an ifelse statement and enter the command below in the Create box. As before, press return and Store to add the new sales_rc variable

income_rc = ifelse(ID == 3, income/1000, income)
If multiple respondents made the same scaling mistake (e.g., those with ID 1, 3, and 15) we again use Create and enter:

income_rc = ifelse(ID %in% c(1, 3, 15), income/1000, income)
If you have a date in a format not available through the Type menu you can use the parse_date_time function. For a date formated as "2-1-14" you would specify the command below (note that this format will also be parsed correctly by the mdy function in the Type menu)

date = parse_date_time(x, "%m%d%y")
Determine the time difference between two dates/times in seconds

time_diff = as_duration(time2 - time1)
Extract the month from a date variable

month = month(date)
Other attributes that can be extracted from a date or date-time variable are minute, hour, day, week, quarter, year, wday (for weekday). For wday and month it can be convenient to add label = TRUE to the call. For example, to extract the weekday from a date variable and use a label rather than a number

weekday = wday(date, label = TRUE)
Calculate the distance between two locations using lat-long information

trip_distance = as_distance(lat1, long1, lat2, long2)
Calculate quintiles for a variable recency by using the xtile command as shown below. To create deciles replace "5" by "10".

rec_iq = xtile(recency, 5)

Note: For examples 7, 8, and 15 above you may need to change the new variable to type factor before using it for further analysis (see Type above)

The Bin option is a convenience function for the xtile command mentioned above when you want to create multiple quitile/decile/... variables. To calculate quintiles enter "5" as the Nr bins. The reverse option replaces 1 by 5, 2 by 4, ..., 5 by 1. Choose an appropriate variable name extension for the new variables.

It is possible to manipulate your data in a spreadsheet (e.g., Excel or Google sheets) and copy-and-paste the data back into Radiant. If you don't have the original data in a spreadsheet already use the clipboard feature in Workspace > Datasets so you can paste it into the spreadsheet or click the download icon on the top right of your screen in the Workspace > View tab. Apply your transformations in the spreadsheet program and then copy the new variable(s), with a header label, to the clipboard (i.e., CTRL-C on windows and CMD-C on mac). Select Clipboard from the Transformation type drop-down and paste the new data into the Paste from spreadsheet box. It is key that new variable(s) have the same number of observations as the data in Radiant. To add the new variables to the data click Store.

Note: Using the clipboard feature for data transformation is discouraged because it is not reproducible.

Choose Normalize from the Transformation type drop-down to standardize one or more variables. For example, in the diamonds data we may want to express price of a diamond per-carat. Select carat as the normalizing variable and price in the Select variable(s) box. You will see summary statistics for the new variable (e.g., price_carat) in the main panel. Store changes by clicking the Store button.

To use the recode feature select the variable you want to change and choose Recode from the Transformation type drop-down. Provide one or more recode commands, separated by a ;, and press return to see the newly created variable. Note that you can specify the names for the recoded variable in the Recoded variable name input box (press return to submit changes). Finally, click Store to add the new variable to the data. Some examples are given below.

Values below 20 are set to 'Low' and all others to 'High'

lo:20 = 'Low'; else = 'High'
Values above 20 are set to 'High' and all others to 'Low'

20:hi = 'High'; else = 'Low'
Values 1 through 12 are set to 'A', 13:24 to 'B', and the remainder to 'C'

1:12 = 'A'; 13:24 = 'B'; else = 'C'
Collapse age categories for a cross-tab analysis. In the example below '<25' and '25-34' are recoded to '<35', '35-44' and '35-44' are recoded to '35-54', and '55-64' and '>64' are recoded to '>54'

'<25' = '<35'; '25-34' = '<35'; '35-44' = '35-54'; '45-54' = '35-54'; '55-64' = '>54'; '>64' = '>54'
To exclude a particular value (e.g., an outlier in the data) for subsequent analyses we can recode it to a missing value. For example, if we want to remove the maximum value from a variable called FreqMut that is equal to 102 we would (1) select the variable FreqMut in the Select variable(s) box and enter the command below in the Recode box. Press return and Store to add the recoded variable to the data

102 = NA
To recode specific numeric values (e.g., carat) to a new value (1) select the variable carat in the Select variable(s) box and enter the command below in the Recode box to set the value for carat to 2 in all rows where carat is currently larger than or equal to 2. Press return and Store to add the recoded variable to the data

2:hi = 2

Note: Never use a = symbol in a label when using the recode function (e.g., 50:hi = '>= 50') as this will cause an error.

Choose Rename from the Transformation type drop-down, select one or more variables, and enter new names for them in the rename box shown. Separate each name by a ,. Press return to see the variables with their new names on screen and press Store to alter the variable names in the original data.

Choose Replace from the Transformation type drop-down if you want to replace existing variables in the data with new ones created using, for example, Create, Transform, Clipboard, etc.. Select one or more variables to overwrite and the same number of replacement variables. Press Store to alter the data.

It is possible to manipulate your data in a spreadsheet (e.g., Excel or Google sheets) and copy-and-paste the data back into bioCancer. If you don't have the original data in a spreadsheet already use the clipboard feature in Workspace > Datasets so you can paste it into the spreadsheet or click the download icon on the top right of your screen in the Workspace > View tab. Apply your transformations in the spreadsheet program and then copy the new variable(s), with a header label, to the clipboard (i.e., CTRL-C on windows and CMD-C on mac). Select Clipboard from the Transformation type drop-down and paste the new data into the Paste from spreadsheet box. It is key that new variable(s) have the same number of observations as the data in bioCancer. To add the new variables to the data click Store.

Note: Using the clipboard feature for data transformation is discouraged because it is not reproducible.

Choose Normalize from the Transformation type drop-down to standardize one or more variables. For example, in the epiGenomics data we may want to express mRNA of a Genes per-FreqMut. Select FreqMut as the normalizing variable and mRNA in the Select variable(s) box. You will see summary statistics for the new variable (e.g., mRNA_FreqMut) in the main panel. Store changes by clicking the Store button.

If a (single) variable of type factor is selected in Select variable(s), choose Reorder/Remove levels from the Transformation type drop-down to reorder and/or remove levels. Drag-and-drop levels to reorder them or click the $\times$ to remove them. Press Store to commit the changes. To temporarily exclude levels from the data use the Filter box (see the help file linked in the Data > View tab).

Choose Reorder/Remove columns from the Transformation type drop-down. Drag-and-drop variables to reorder them in the data. To remove a variable click the $\times$ next to the label. Press Store to commit the changes.

Choose Remove missing from the Transformation type drop-down to eliminate rows with one or more missing values. If all variables are selected a row with a missing values in any column will be removed. If one or more variables are selected only those rows will be removed with missing values for the selected variables. Press Store to change the data. If missing values were present you will see the number of observations in the data summary change (i.e., the value of n changes).

It is common to have one or more variables in a dataset that should have only unique values (i.e., no duplicates). Customers id's, for example, should be unique unless the dataset contains multiple orders for the same customer. In that case the combination of customer id and order id should be unique. To remove duplicate select one or more variables to determine uniqueness. Choose Remove duplicates from the Transformation type drop-down and check how the summary statistics change. Press Store to change the data. If there are duplicate rows you will see the number of observations in the data summary change (i.e., the value of n and n_distinct will change).

If there are duplicates in the data use Show duplicates to get a better sense for the data points that have the same value in multiple rows. If you want to explore duplicates using the View tab make sure to Store them in a different dataset (i.e., make sure not to overwrite the data you are working on). If you choose to show duplicates based on all columns in the data only one of the duplicate rows will be shown. These rows are exactly the same so showing 2 or 3 isn't helpful. If, however, we look for duplicates based on a subset of the available variables Radiant will generate a dataset with all rows that are deemed similar.

To create a variable that can be used to randomly filter a dataset for analysis (or training) and holdout, select Training variable from the Transformation type dropdown. Specify either the number of observations to use for training (i.e., Size > 1) or a proportion of observations to select (i.e., Size > 1). The new variable will have a value "1" for training and "0" holdout.

To create a holdout sample based on (the reverse of) a filter select Holdout sample from the Transformation type dropdown. By default the opposite of the active filter is used. For example, is analysis is conducted on all observations where data < "2015-1-1" then the holdout sample will be based on the filter data >= "2015-1-1" if the Reverse filter box is checked.

kmezhoud/bioCancer documentation built on Feb. 17, 2024, 10:29 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

kmezhoud/bioCancer
Interactive Multi-Omics Cancers Data Visualization and Analysis

inst/app/tools/help/transform.md
In kmezhoud/bioCancer: Interactive Multi-Omics Cancers Data Visualization and Analysis

Transform command log

Filter

Type

Transform

Create

Bin

Clipboard

Normalize

Recode

Rename

Replace

Clipboard

Normalize

Reorder or remove levels

Reorder or remove columns

Remove missing values

Remove duplicates

Show duplicates

Training variable

Holdout sample

R Package Documentation

Browse R Packages

We want your feedback!

kmezhoud/bioCancer Interactive Multi-Omics Cancers Data Visualization and Analysis

inst/app/tools/help/transform.md In kmezhoud/bioCancer: Interactive Multi-Omics Cancers Data Visualization and Analysis

Transform command log

Filter

Type

Transform

Create

Bin

Clipboard

Normalize

Recode

Rename

Replace

Clipboard

Normalize

Reorder or remove levels

Reorder or remove columns

Remove missing values

Remove duplicates

Show duplicates

Training variable

Holdout sample

R Package Documentation

Browse R Packages

We want your feedback!

kmezhoud/bioCancer
Interactive Multi-Omics Cancers Data Visualization and Analysis

inst/app/tools/help/transform.md
In kmezhoud/bioCancer: Interactive Multi-Omics Cancers Data Visualization and Analysis