Exam Questions about Computational Biology

1) GFF (General Feature Format) files are widely used in genome annotation to describe genes and
other sequence features. Find the example GFF file (Q1.gff) in your CMS.
a) Using the find command in cmd, search for and print every line that contains the word
“overlap”. (10 pts)
b) Pipe the output of part (a) into a second find command to find all the lines that also
contain the word “chrXII”. (10 pts)
c) Redirect the output of part (b) to a new file named “Answer1.txt”. (5 pts)
d) Put all the command lines that you used in this question into a file named
“Answer1_cmdline.txt”. Send both files.
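As a cross-check for parts (a) to (c), the two-stage filter can be sketched in Python. This is only an illustration of the expected logic, not a substitute for the required find commands. The function names filter_lines and write_answer are hypothetical helpers introduced here, and the matching is assumed to be case-sensitive.

```python
def filter_lines(lines, *keywords):
    """Keep only the lines that contain every keyword (case-sensitive)."""
    kept = list(lines)
    # Apply one filter per keyword, each on the output of the previous one,
    # which mirrors piping one search into the next.
    for keyword in keywords:
        kept = [line for line in kept if keyword in line]
    return kept


def write_answer(gff_path, out_path, *keywords):
    """Filter the file at gff_path and write the surviving lines to out_path."""
    with open(gff_path, encoding="utf-8") as handle:
        matches = filter_lines(handle, *keywords)
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(matches)
    return matches
```

Calling write_answer("Q1.gff", "Answer1.txt", "overlap", "chrXII") from the folder that holds Q1.gff should yield the same lines that part (c) is expected to produce.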
2) Find the “blast_exam” folder in your CMS.
a) Index the database file (7 pts)
b) BLAST the query against the database using “blastp”, with a switch that filters out all
alignments whose identity is below 95.3 percent (the value 95.3 is a float). (13 pts)
(Hint: Help is always given on command line prompt to those who ask for it)
c) Put all the command lines that you used in this question into a file named
“Answer2_cmdline.txt”. Send this file and the screenshots.
3) Find the “xtandem.zip” file in your CMS. Inside the “src” folder, you have the
“exam_spectra.mgf” and “exam_protein.fasta” files. Do not move any files.
a) Index the database file (from “src” folder). (5 pts)
b) Fix the “taxonomy.xml” and “input.xml” files so that “exam_spectra.mgf” can be searched
against “exam_protein.fasta” (the mgf and indexed fasta files must be used from the “src”
folder). (18 pts)
c) Set the “fragment monoisotopic mass error” option to 0.7 and run the xtandem program. (7 pts)
d) Put all the command lines that you used in this question into a file named
“Answer3_cmdline.txt”. Send “Answer3_cmdline.txt”, “default_input.xml”,
“taxonomy.xml”, “input.xml” and the screenshot files.
Please put the files and answers for each question in a separate folder (e.g. folder “a1”, “a2”…),
then zip them into a single file named “yourname_yourstudentid.zip”.

Computational Biology Exam Folder