class06: R projects

Author

Kate Ruiz (PID: A17671200)

Background

Functions are at the heart of using R. Everything we do involves calling and using functions (from data input and analysis to results output).

All functions in R have at least three things:

  1. A name: the thing we use to call the function.
  2. Input arguments (usually one or more), separated by commas.
  3. The body: the lines of code between curly brackets {} that do the work of the function.
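R can show these parts for any function. `formals()` returns the input arguments, with any defaults, and `body()` returns the code between the curly brackets. Here is a minimal sketch using `add_two()`, a hypothetical function made up only for this illustration:

```r
# add_two() is a hypothetical example function, not part of the class code
add_two <- function(x, y = 2) {
  x + y
}

names(formals(add_two))  # the input arguments: "x" "y"
formals(add_two)$y       # the default value of y: 2
body(add_two)            # the body: the code between the curly brackets
add_two(5)               # calling the function by its name: 7
```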

A first function

Let’s write a silly wee function to add some numbers:

add <- function(x) {
  x + 1 
}

Let’s try it out

add(100)
[1] 101

Will this work with a vector of numbers?

add(c(100,200,300))
[1] 101 201 301
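This works because `+` is vectorized in R. It operates element by element on the whole vector, so `add()` needs no loop. The sketch below compares it with an explicit loop. `add_loop()` is a hypothetical helper written only for comparison:

```r
add <- function(x) {
  x + 1
}

# Explicit loop version, written only for comparison
add_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i] + 1
  }
  out
}

v <- c(100, 200, 300)
add(v)       # 101 201 301
add_loop(v)  # same result, but more code
```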

Let's modify it to be more useful and add more than just 1:

add <- function(x, y=1) {
  x + y 
}
add(100, 10)
[1] 110

Will this work?

add(100,0)
[1] 100
plot(1:10, col="blue", type="b")

log(10, base=10)
[1] 1

Note to Kate: Input arguments can be either required or optional. The latter have a fall-back default value that is specified in the function definition with an equals sign (e.g. y=1).
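Here is a small sketch of required versus optional arguments, using the two-argument `add()` from this section. `tryCatch()` captures the error from the missing required argument so that the script keeps running:

```r
add <- function(x, y = 1) {
  x + y
}

add(100)               # y falls back to its default of 1: 101
add(100, 10)           # y supplied by position: 110
add(y = 10, x = 100)   # arguments matched by name, so order does not matter: 110

# x has no default, so leaving it out is an error
msg <- tryCatch(add(), error = function(e) conditionMessage(e))
msg  # argument "x" is missing, with no default
```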

# Commented out: add() has no argument z, so R stops with an "unused argument" error
#add(x=100, y=200, z=300)
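The commented-out call fails because `add()` only defines `x` and `y`. The following minimal sketch captures that error with `tryCatch()` instead of letting it stop the script:

```r
add <- function(x, y = 1) {
  x + y
}

# Capture the error message instead of halting
msg <- tryCatch(add(x = 100, y = 200, z = 300),
                error = function(e) conditionMessage(e))
msg  # unused argument (z = 300)
```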

A second function

All functions in R look like this:

name <- function(arg) {
  body
}

The sample() function in R …

sample(1:10, size=4)
[1] 3 9 5 6

Q. Return 12 numbers picked randomly from the input 1:10

sample(1:10, size=12, replace=TRUE)
 [1]  2 10  9  6  3  3  3  2  1  3  2  1
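`replace=TRUE` is needed here because `sample()` draws without replacement by default. Without replacement, it cannot return more values than the input contains. This small sketch shows both cases:

```r
# Without replacement: at most 10 draws from 1:10, and no value repeats
x <- sample(1:10, size = 10)
length(unique(x))  # 10

# Asking for 12 draws without replacement is an error
msg <- tryCatch(sample(1:10, size = 12), error = function(e) conditionMessage(e))
msg  # cannot take a sample larger than the population when 'replace = FALSE'

# With replacement, values can repeat, so 12 draws are fine
y <- sample(1:10, size = 12, replace = TRUE)
length(y)  # 12
```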

Q. Write the code to generate a random 12-nucleotide-long DNA sequence.

nucleotide <- c("A", "T", "G", "C")
sample(nucleotide, size=12, replace=TRUE)
 [1] "G" "C" "C" "T" "G" "T" "A" "A" "C" "C" "G" "T"
Q. Write a first version of a function called generate_dna() that generates a random DNA sequence of user-specified length n.
generate_dna <- function(n=6) {
  # The four DNA bases to sample from
  nucleotide <- c("A", "T", "G", "C")
  # Draw n bases with replacement so any base can repeat
  sample(nucleotide, size=n, replace=TRUE)
}
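Because `sample()` is random, each call to `generate_dna()` returns a different sequence. If you need the same "random" sequence again, for example when re-rendering this document, you can fix the random number generator with `set.seed()`. This is a minimal sketch, and the seed value 42 is an arbitrary choice:

```r
generate_dna <- function(n = 6) {
  nucleotide <- c("A", "T", "G", "C")
  sample(nucleotide, size = n, replace = TRUE)
}

set.seed(42)               # 42 is an arbitrary seed
first <- generate_dna(12)
set.seed(42)               # reset to the same seed
second <- generate_dna(12)
identical(first, second)   # TRUE: same seed, same sequence
```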
generate_dna(100)
  [1] "G" "A" "G" "A" "G" "T" "A" "T" "T" "T" "A" "G" "T" "G" "A" "T" "G" "T"
 [19] "T" "G" "A" "T" "C" "T" "C" "T" "C" "T" "G" "C" "G" "G" "T" "G" "C" "A"
 [37] "T" "T" "G" "G" "C" "T" "T" "C" "A" "A" "T" "G" "T" "T" "C" "A" "G" "A"
 [55] "C" "T" "C" "C" "C" "T" "T" "C" "G" "G" "T" "G" "G" "A" "G" "C" "C" "C"
 [73] "T" "T" "A" "T" "A" "G" "C" "A" "G" "C" "G" "G" "T" "G" "G" "T" "G" "A"
 [91] "C" "A" "G" "T" "C" "T" "A" "T" "G" "T"

Q. Modify your function to return a FASTA-like sequence, so rather than [1] "G" "T" "T" "G" "T" "C" "G" "A" "G" "G" "A" "G" we want "GTTG…".

generate_dna <- function(n=6) {
  nucleotide <- c("A", "T", "G", "C")
  sally <- sample(nucleotide, size=n, replace=TRUE)
  # Collapse the vector of single letters into one string
  sally <- paste(sally, collapse="")
  return(sally)
}

By default, a function returns the value of the last expression it evaluates, so the explicit return() is optional.
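Here is a small sketch of that rule. `collapse_explicit()` and `collapse_implicit()` are hypothetical helpers that differ only in whether they call `return()`:

```r
# Explicit return()
collapse_explicit <- function(v) {
  out <- paste(v, collapse = "")
  return(out)
}

# Implicit return: the last expression evaluated is the return value
collapse_implicit <- function(v) {
  paste(v, collapse = "")
}

bases <- c("G", "T", "T", "G")
collapse_explicit(bases)  # "GTTG"
collapse_implicit(bases)  # "GTTG"
```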

generate_dna(10)
[1] "CTCCGATAGG"

Q. Give the user an option to return either a FASTA-format output sequence or a standard multi-element vector.

generate_dna <- function(n=6, fasta=TRUE) {
  nucleotide <- c("A", "T", "G", "C")
  sally <- sample(nucleotide, size=n, replace=TRUE)

  # Only collapse to a single FASTA-like string when fasta=TRUE;
  # otherwise return the multi-element vector
  if (fasta) {
    sally <- paste(sally, collapse="")
  }
  return(sally)
}
generate_dna(10, fasta=TRUE)
[1] "TATCACAATC"
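The following sketch checks both output formats of the `fasta` option. With `fasta = FALSE`, the function returns the vector of single letters unchanged:

```r
generate_dna <- function(n = 6, fasta = TRUE) {
  nucleotide <- c("A", "T", "G", "C")
  sally <- sample(nucleotide, size = n, replace = TRUE)
  if (fasta) {
    sally <- paste(sally, collapse = "")
  }
  return(sally)
}

s1 <- generate_dna(10)                 # one string of 10 letters
s2 <- generate_dna(10, fasta = FALSE)  # a vector of 10 single letters
c(length(s1), nchar(s1))  # 1 10
length(s2)                # 10
```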

A new cool function

Q. Write a function called generate_protein() that generates a protein sequence of user-specified length in FASTA-like format.

Q. Use your new generate_protein() function to generate sequences between 6 and 12 amino acids in length, and check whether any of these are unique in nature (i.e. whether or not they are found in the NR database at NCBI).

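generate_protein <- function(n, fasta=TRUE) {
  # One-letter codes for the 20 standard amino acids
  amino_acids <- c("A","R","N","D","C",
                   "E","Q","G","H","I",
                   "L","K","M","F","P",
                   "S","T","W","Y","V")
  # Draw n residues with replacement
  more <- sample(amino_acids, size=n, replace=TRUE)
  # Collapse to a single FASTA-like string if requested
  if (fasta) {
    more <- paste(more, collapse="")
  }
  return(more)
}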
generate_protein(6, TRUE)
[1] "IQRNYN"
generate_protein(7, TRUE)
[1] "DRGVYIW"
generate_protein(8, TRUE)
[1] "SCWATMGQ"
generate_protein(9, TRUE)
[1] "NTIYACMMY"
generate_protein(10, TRUE)
[1] "ICTNKLKMFV"
generate_protein(11, TRUE)
[1] "VHTIVQMQVSD"
generate_protein(12, TRUE)
[1] "QSSSGPNSQSSS"
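Instead of seven separate calls, `vapply()` can call `generate_protein()` once for each length and collect the results. This is a sketch that redefines the function so it runs on its own. `character(1)` tells `vapply()` that each call returns a single string:

```r
generate_protein <- function(n, fasta = TRUE) {
  amino_acids <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
                   "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  more <- sample(amino_acids, size = n, replace = TRUE)
  if (fasta) {
    more <- paste(more, collapse = "")
  }
  return(more)
}

# One call per length 6..12, each returning one string
proteins <- vapply(6:12, generate_protein, character(1))
nchar(proteins)  # 6 7 8 9 10 11 12
```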

Or we could do a for() loop:

for (i in 6:12) {
  cat(i, "\n")
}
6 
7 
8 
9 
10 
11 
12 
for (i in 6:12) {
  # FASTA header line, e.g. ">6"
  cat(">", i, "\n", sep="")
  # The sequence itself on the next line
  cat(generate_protein(i), "\n")
}
>6
DAYCYF 
>7
HRPDLGC 
>8
QSVAVKMF 
>9
NPYLFNYDK 
>10
ESTLTRHLEG 
>11
MTCNLEYNVCH 
>12
HESHETEVKLKP 
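To check these sequences against NR, one option is to save the same FASTA text to a file and upload it to NCBI BLAST (blastp). The following is a minimal sketch. The file name `my_proteins.fa` and the use of `tempdir()` are assumptions made for illustration:

```r
generate_protein <- function(n, fasta = TRUE) {
  amino_acids <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
                   "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  more <- sample(amino_acids, size = n, replace = TRUE)
  if (fasta) {
    more <- paste(more, collapse = "")
  }
  return(more)
}

lens <- 6:12
seqs <- vapply(lens, generate_protein, character(1))

# Interleave header lines (">6", ">7", ...) with their sequences
fasta_lines <- as.vector(rbind(paste0(">", lens), seqs))

out_file <- file.path(tempdir(), "my_proteins.fa")  # hypothetical file name
writeLines(fasta_lines, out_file)
head(readLines(out_file), 4)
```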

id AGKRT