class06: R projects
Background
Functions are at the heart of using R. Everything we do involves calling and using functions, from data input and analysis to results output.
All functions in R have at least 3 things:
- A name, the thing we use to call the function.
- One or more input arguments, which are comma separated.
- The body, the lines of code between curly brackets {} that do the work of the function.
A first function
Let’s write a silly wee function to add some numbers:
add <- function(x) {
  x + 1
}
Let’s try it out:
add(100)
[1] 101
Will this work?
add(c(100,200,300))
[1] 101 201 301
Modify the function to be more useful and add more than just 1:
add <- function(x, y=1) {
  x + y
}
add(100, 10)
[1] 110
Will this work?
add(100,0)
[1] 100
plot(1:10, col="blue", type="b")
log(10, base=10)
[1] 1
Note to Kate: Input arguments can be either required or optional. The latter have a fall-back default that is specified in the function definition with an equals sign.
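The difference between required and optional arguments can be sketched with the same two-argument add() used in these notes. This is only an illustration: the variable names `default_used`, `override`, and `missing_x` are made up for the example, and tryCatch() is used so the deliberate error does not stop the script.

```r
# Same shape as the add() function in these notes:
# x is required, y is optional with a default of 1
add <- function(x, y = 1) {
  x + y
}

default_used <- add(100)          # y falls back to its default of 1
override     <- add(100, y = 10)  # the caller supplies y explicitly

# Leaving out the required argument x is an error; tryCatch() captures
# the error message instead of halting
missing_x <- tryCatch(add(y = 5), error = function(e) conditionMessage(e))

print(default_used)
print(override)
print(missing_x)
```

The first two calls succeed, while the third returns an error message because x has no default to fall back on.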
# add(x=100, y=200, z=300)   # fails: add() has no argument called z
A second function
All functions in R look like this:
name <- function(arg) {
  body
}
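The parts of this template can be inspected directly with base R. A minimal sketch, assuming a hypothetical function `add_y` that exists only for this illustration:

```r
# Hypothetical example function, used only to illustrate the template
add_y <- function(x, y = 1) {
  x + y
}

# The input arguments: x has no default, y defaults to 1
arg_names <- names(formals(add_y))
print(arg_names)

# The body: the code between the curly brackets
print(body(add_y))

# The name is simply the variable the function object is assigned to
print(is.function(add_y))
```

formals() and body() are base R accessors, so nothing beyond a standard R installation is assumed here.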
The sample() function in R …
sample(1:10, size=4)
[1] 3 9 5 6
Q. Return 12 numbers picked randomly from the input 1:10
sample(1:10, size=12, replace=TRUE)
[1] 2 10 9 6 3 3 3 2 1 3 2 1
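Why replace=TRUE is needed here can be sketched as follows. The seed value and the variable names `too_many` and `with_repeats` are arbitrary choices for this illustration, not part of the class code.

```r
# Asking for more values than the input holds fails without replacement;
# tryCatch() turns the error into NULL so the script keeps running
too_many <- tryCatch(sample(1:10, size = 12), error = function(e) NULL)
print(is.null(too_many))

# With replace = TRUE every draw is made from the full input,
# so repeated values are allowed
set.seed(42)  # arbitrary seed, only to make this sketch reproducible
with_repeats <- sample(1:10, size = 12, replace = TRUE)
print(length(with_repeats))

# 12 draws from 10 possible values must contain at least one repeat
print(anyDuplicated(with_repeats) > 0)
```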
Q. Write the code to generate a random 12-nucleotide-long DNA sequence.
nucleotide <- c("A", "T", "G", "C")
sample(nucleotide, size=12, replace=TRUE)
[1] "G" "C" "C" "T" "G" "T" "A" "A" "C" "C" "G" "T"
- Write a first version of a function called generate_dna() that generates a random DNA sequence of a user-specified length n.
name <- function(arg) {
  body
}
generate_dna <- function(n=6) {
  nucleotide <- c("A", "T", "G", "C")
  sample(nucleotide, size=n, replace=TRUE)
}
generate_dna(100)
[1] "G" "A" "G" "A" "G" "T" "A" "T" "T" "T" "A" "G" "T" "G" "A" "T" "G" "T"
[19] "T" "G" "A" "T" "C" "T" "C" "T" "C" "T" "G" "C" "G" "G" "T" "G" "C" "A"
[37] "T" "T" "G" "G" "C" "T" "T" "C" "A" "A" "T" "G" "T" "T" "C" "A" "G" "A"
[55] "C" "T" "C" "C" "C" "T" "T" "C" "G" "G" "T" "G" "G" "A" "G" "C" "C" "C"
[73] "T" "T" "A" "T" "A" "G" "C" "A" "G" "C" "G" "G" "T" "G" "G" "T" "G" "A"
[91] "C" "A" "G" "T" "C" "T" "A" "T" "G" "T"
Q. Modify your function to return a FASTA-like sequence, so rather than [1] "G" "T" "T" "G" "T" "C" "G" "A" "G" "G" "A" "G" we want "GTTG…"
generate_dna <- function(n=6) {
  nucleotide <- c("A", "T", "G", "C")
  sally <- sample(nucleotide, size=n, replace=TRUE)
  sally <- paste(sally, collapse="")
  return(sally)
}
By default, a function returns the value of the last expression it evaluates, so the explicit return() here is optional.
generate_dna(10)
[1] "CTCCGATAGG"
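The point about return values can be checked with two small helpers. Both `join_explicit` and `join_implicit` are hypothetical names introduced only for this sketch; they do the paste() step of generate_dna() on a fixed input so the result is predictable.

```r
# Explicit return(), in the style of generate_dna() above
join_explicit <- function(bases) {
  joined <- paste(bases, collapse = "")
  return(joined)
}

# Implicit return: the value of the last expression is the result
join_implicit <- function(bases) {
  paste(bases, collapse = "")
}

bases <- c("G", "T", "T", "G")
print(join_explicit(bases))
print(join_implicit(bases))
```

Both calls give the same single string, so which style to use is largely a matter of readability.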
Q. Give the user an option to return the sequence either in FASTA format or as a standard multi-element vector.
generate_dna <- function(n=6, fasta=TRUE) {
  nucleotide <- c("A", "T", "G", "C")
  sally <- sample(nucleotide, size=n, replace=TRUE)
  if (fasta) {
    sally <- paste(sally, collapse="")
    cat("Helllooooooo")
  }
  return(sally)
}
generate_dna(10, fasta=TRUE)
Helllooooooo
[1] "TATCACAATC"
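How the two output formats differ in practice can be sketched as below. This version of generate_dna() is a simplified copy for illustration: it leaves out the cat() message and uses a variable called `bases`, both of which are choices made for this example rather than the class code.

```r
# Simplified copy of generate_dna() without the printed message
generate_dna <- function(n = 6, fasta = TRUE) {
  nucleotide <- c("A", "T", "G", "C")
  bases <- sample(nucleotide, size = n, replace = TRUE)
  if (fasta) {
    bases <- paste(bases, collapse = "")
  }
  bases
}

as_string <- generate_dna(10)                 # one string of 10 characters
as_vector <- generate_dna(10, fasta = FALSE)  # ten one-character elements

print(nchar(as_string))
print(length(as_vector))

# The vector form is convenient for counting how often each base occurs
print(table(as_vector))
```

The string form suits pasting into a search tool, while the vector form is easier to work with for per-base calculations.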
A new col function
Q. Write a function called generate_protein() that generates a protein sequence of a user-specified length in FASTA-like format.
Q. Use your new generate_protein() function to generate sequences between 6 and 12 amino acids in length and check if any of these are unique in nature (i.e. not found in the NR database at NCBI).
generate_protein <- function(n, fasta=TRUE) {
  amino_acids <- c("A","R","N","D","C",
                   "E","Q","G","H","I",
                   "L","K","M","F","P",
                   "S","T","W","Y","V")
  more <- sample(amino_acids, size=n, replace=TRUE)
  if (fasta) {
    more <- paste(more, collapse="")
  }
  return(more)
}
generate_protein(6, TRUE)
[1] "IQRNYN"
generate_protein(7, TRUE)
[1] "DRGVYIW"
generate_protein(8, TRUE)
[1] "SCWATMGQ"
generate_protein(9, TRUE)
[1] "NTIYACMMY"
generate_protein(10, TRUE)
[1] "ICTNKLKMFV"
generate_protein(11, TRUE)
[1] "VHTIVQMQVSD"
generate_protein(12, TRUE)
[1] "QSSSGPNSQSSS"
Or we could do a for() loop:
for (i in 6:12) {
  cat(i, "\n")
}
6
7
8
9
10
11
12
for (i in 6:12) {
  cat(">", i, "\n", sep="")
  cat(generate_protein(i), "\n")
}
>6
DAYCYF
>7
HRPDLGC
>8
QSVAVKMF
>9
NPYLFNYDK
>10
ESTLTRHLEG
>11
MTCNLEYNVCH
>12
HESHETEVKLKP
id AGKRT
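As an alternative to the for() loop, the same FASTA-style records could be built as a character vector with vapply(). This is only a sketch: `lengths_wanted`, `seqs`, and `fasta_lines` are names invented for the example, and generate_protein() is repeated so the block runs on its own.

```r
# Repeated from the notes so this block is self-contained
generate_protein <- function(n, fasta = TRUE) {
  amino_acids <- c("A", "R", "N", "D", "C",
                   "E", "Q", "G", "H", "I",
                   "L", "K", "M", "F", "P",
                   "S", "T", "W", "Y", "V")
  more <- sample(amino_acids, size = n, replace = TRUE)
  if (fasta) {
    more <- paste(more, collapse = "")
  }
  more
}

lengths_wanted <- 6:12

# One sequence per requested length; character(1) states that each call
# is expected to return a single string
seqs <- vapply(lengths_wanted, generate_protein, character(1))

# Header line (">" plus the length) followed by the sequence
fasta_lines <- paste0(">", lengths_wanted, "\n", seqs)
cat(fasta_lines, sep = "\n")
```

Keeping the records in a vector, rather than printing them inside a loop, makes it possible to reuse them later, for example by passing `fasta_lines` to writeLines() to save them to a file.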