Perl Program to Calculate GC Content

Perl is well suited to this task. Install ActivePerl on your PC and write a Perl script that extracts the data from the text file and performs the calculation. Typical goals are:

- to calculate the GC content, %(G+C), of a given set of FASTA sequences
- to calculate the observed frequency of CpG ('CG') in a DNA sequence, normalized by its expected frequency
- to calculate and report the occurrence of 'TATA' boxes in each DNA sequence given in a FASTA (nucleotide) file

A common version of the exercise is to open and read a .fasta file, read each sequence, compute its relative G+C nucleotide content, and then write the names of the genes and their respective G+C content to a tab-delimited file.

GC content is the percentage of the genome (or DNA fragment) that is "G" or "C". To compute it, count the occurrences of the letters "G" and "C" and divide by the length of the string in question. We will be using data from chr8 of the human genome version 19 from the UCSC genome repository.
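The workflow described above (read a FASTA file, compute G+C per sequence, write a tab-delimited table) can be sketched as follows. This is a minimal illustration in Python rather than Perl, and every name in it (`gc_fraction`, `cpg_obs_exp`, `read_fasta`, `gc_table`) is hypothetical. The CpG ratio uses the common form count(CG) * N / (count(C) * count(G)), which is an assumption, since the text does not give a formula.

```python
import io


def gc_fraction(seq):
    """Return (G + C) / sequence length, or None for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return None
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio (assumed form: CG * N / (C * G))."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return None  # expected frequency is zero, ratio undefined
    return seq.count("CG") * len(seq) / (n_c * n_g)


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_table(fasta_handle, out_handle):
    """Write one 'name<TAB>GC percent' row per FASTA record."""
    out_handle.write("name\tgc_percent\n")
    for name, seq in read_fasta(fasta_handle):
        gc = gc_fraction(seq)
        value = "NA" if gc is None else f"{100 * gc:.2f}"
        out_handle.write(f"{name}\t{value}\n")


if __name__ == "__main__":
    demo = io.StringIO(">gene1 example\nATGC\nGGCC\n>gene2\nATAT\n")
    out = io.StringIO()
    gc_table(demo, out)
    print(out.getvalue(), end="")
```

With real data, the two handles would be ordinary files opened with `open(...)`. Note that this sketch divides by the full sequence length, as in the definition above, so runs of N lower the reported percentage.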

Description

Calculates the fraction of G+C bases of the input nucleic acid sequence(s). It reads in nucleic acid sequences, sums the number of 'g' and 'c' bases and writes out the result as the fraction (in the interval 0.0 to 1.0) to the total number of 'a', 'c', 'g' and 't' bases. Global G+C content GC, G+C in the first position of the codon bases GC1, G+C in the second position of the codon bases GC2, and G+C in the third position of the codon bases GC3 can be computed. All functions can take ambiguous bases into account when requested.
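The position-specific quantities GC1, GC2 and GC3 described above can be illustrated with a short Python sketch. It is not the seqinr implementation: the function name `gc_pos` is hypothetical, ambiguous bases are simply ignored, and bases from an incomplete trailing codon are counted, which is an assumption.

```python
def gc_pos(seq, pos, frame=0):
    """G+C fraction at codon position pos (1, 2 or 3) in reading frame 0, 1 or 2.

    Only a, c, g and t are counted; returns None if none of them occur
    at the requested position.
    """
    if pos not in (1, 2, 3):
        raise ValueError("pos must be 1, 2 or 3")
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.lower()
    # Every third base, starting at the requested codon position.
    bases = seq[frame + pos - 1::3]
    n_gc = bases.count("g") + bases.count("c")
    n_at = bases.count("a") + bases.count("t")
    total = n_gc + n_at
    return n_gc / total if total else None


if __name__ == "__main__":
    coding = "atggcc"  # codons: atg gcc
    for position in (1, 2, 3):
        print(position, gc_pos(coding, position))
```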

Usage

Arguments

seq - a nucleic acid sequence as a vector of single characters

frame - for coding sequences, an integer (0, 1, 2) giving the frame

forceToLower - logical. If TRUE, force sequence characters to lower case. Set this to FALSE to save time if your sequence is already in lower case (CPU time is approximately divided by 3 when turned off)

exact - logical. If TRUE, ambiguous bases are taken into account when computing the G+C content (see Details). Set this to FALSE to save time if you can neglect ambiguous bases in your sequence (CPU time is approximately divided by 3 when turned off)

NA.GC - what should be returned when the G+C content is impossible to compute from the data, for instance with NNNNNNN. This behaviour can differ when the argument exact is TRUE: for instance, the G+C content of WWSS is NA by default, but is 0.5 when exact is set to TRUE

... - arguments passed to the function GC

pos - for coding sequences, the codon position (1, 2, 3) that should be taken into account to compute the G+C content

oldGC - logical defaulting to FALSE: should the G+C content be computed as in seqinR <= 1.0-6, that is, as the sum of 'g' and 'c' bases divided by the length of the sequence? As from seqinR >= 1.1-3, this argument is deprecated and a warning is issued.

alphabet - alphabet used. This allows you to choose the ambiguous bases used during the G+C calculation.

Value

GC returns the fraction of G+C (in [0,1]) as a numeric vector of length one. GCpos returns the G+C content at position pos. GC1, GC2 and GC3 are wrappers for GCpos with the argument pos set to 1, 2, and 3, respectively. NA is returned when seq is NA. NA.GC, which defaults to NA, is returned when the G+C content cannot be computed from the data.

Details

When exact is set to TRUE, the G+C content is estimated with ambiguous bases taken into account. Note that this is time-expensive. A first pass is made on the non-ambiguous bases to estimate the probabilities of the four bases in the sequence. These are then used to weight the contributions of ambiguous bases to the G+C content. Let nx denote the total number of bases 'x' in the sequence. For instance, suppose that there are nb bases 'b', where 'b' stands for 'not a', that is, for 'c', 'g' or 't'. The contribution of the 'b' bases to the GC base count will be:

nb*(nc + ng)/(nc + ng + nt)

The contribution of 'b' bases to the AT base count will be:

nb*nt/(nc + ng + nt)

The contributions of all ambiguous bases to the AT and GC counts are weighted in a similar way, and the G+C content is then computed as ngc/(nat + ngc).
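The weighting scheme described above can be sketched in Python as follows. This is an illustration of the stated formulas, not the seqinr code: the names `IUPAC` and `gc_exact` are hypothetical, 'n' is left out of the alphabet, and the choice to skip an ambiguous code when none of its possible bases was observed is an assumption. Codes that are purely G/C ('s') or purely A/T ('w') are counted in full, which matches the WWSS case mentioned earlier (0.5 when exact is TRUE).

```python
# IUPAC ambiguity codes mapped to the plain bases they can stand for.
IUPAC = {
    "s": "cg", "w": "at", "m": "ac", "k": "gt", "r": "ag", "y": "ct",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt",
}


def gc_exact(seq):
    """G+C fraction with ambiguous bases weighted by observed base counts.

    Returns None when the fraction cannot be computed (e.g. 'nnnn').
    """
    seq = seq.lower()
    # First pass: counts of the four non-ambiguous bases.
    n = {base: seq.count(base) for base in "acgt"}
    n_gc = float(n["c"] + n["g"])
    n_at = float(n["a"] + n["t"])
    # Second pass: distribute each ambiguous code between GC and AT.
    for code, members in IUPAC.items():
        count = seq.count(code)
        if count == 0:
            continue
        gc_members = [b for b in members if b in "cg"]
        if len(gc_members) == len(members):
            n_gc += count  # 's': always G or C
        elif not gc_members:
            n_at += count  # 'w': always A or T
        else:
            denom = sum(n[b] for b in members)
            if denom == 0:
                continue  # assumption: no information, so skip this code
            gc_share = sum(n[b] for b in gc_members) / denom
            n_gc += count * gc_share
            n_at += count * (1.0 - gc_share)
    total = n_gc + n_at
    return n_gc / total if total else None


if __name__ == "__main__":
    # nc = 2, ng = 1, nt = 1, nb = 1: 'b' adds 1 * 3/4 to GC and 1 * 1/4 to AT.
    print(gc_exact("ccgtb"))
```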

References

citation('seqinr').

The program codonW used here for comparison is available at http://codonw.sourceforge.net/.

See Also

You can use s2c to convert a string into a vector of single characters and tolower to convert upper-case characters into lower-case characters. Do not confuse GC with gc, which is for garbage collection.

Examples

VecScreen (National Center for Biotechnology Information) - screens your DNA sequence for potential vector sequence. Well worth running before doing any other analysis.

Base composition - consider WORDCOUNT (EMBOSS Suite) which gives one the option of choosing the 'word size', and GEMS (Genomatix, Germany). The latter provides a nice output of mono-, di- and trinucleotide frequencies. Select 'create statistics' and 'start task' to get to the sequence entry page.

Genomics %G~C Content Calculator (Science Buddies.org) - simple calculator for mol%G+C that also counts the individual bases.
Compositional heterogeneity - Graphe: ADN riche en: (Atelier BioInformatique l'Université de Provence, France). N.B. In French but obvious (Soumettre = Submit). Presents AT, GC or single-base enrichment in the sequence in graphic format. A simpler version is GC Content Plot Online.
GraphDNA - DNA Skew Graphing (Viral Bioinformatics Resource Center, University of Victoria, Canada) - this Java applet performs DNA walks and purine, AT and GC skews on small (<1 Mb) genomes. Requires registration and login. Alternative locations for cumulative GC skew are GC Skewing (Davidson College, U.S.A.) and GenSkew: Genomic nucleotide skew application (developed by TU Munich; maintained by the Department of Computational Systems Biology of the University of Vienna, Austria).