Bioinformatics

1

Which of the following are sequence elements that algorithms can exploit to search for genes in a prokaryotic genome?

1

____ is an example of a first-generation sequencing technology

1

Sanger sequencing has been automated by fluorescent labelling

1

Which of the following are advantages of Sanger sequencing?

1

Select the technologies that are second-generation sequencing methods

1

Which of the following are limitations of 454 pyrosequencing?

1

A homopolymer error is a base-calling problem that arises when the same base occurs several times in a row, because the signal does not increase linearly with the number of bases

1

454 pyrosequencing and Ion Torrent use solid-phase bridge PCR

1

Ion Torrent detects the incorporation of a base based on ____, whereas 454 pyrosequencing detects the incorporation of a base based on ____

1

What are the advantages of third generation sequencing technologies?

1

____ and ____ are all examples of large-scale genome sequencing projects

1

____ is the most common sequencing approach for whole genomes

1

a ( contig, scaffold, read, coverage ) is a set of overlapping DNA fragments that together represent a consensus region of DNA

1

The de Bruijn graph method is a greedy method of assembly

1

____ is the parameter used in the de Bruijn graph assembly algorithm

1

sequence assembly can be...

1

Which of the following are de Bruijn graph sequence assemblers?

1

Genomes always need to be finished

1

Hybrid sequencing is an effective way of closing gaps in a genome assembly because different technologies have different sequencing biases

1

In the equation N = (a × g) / L:
N is the ( reads, coverage, genome length, read length ), a is the ( coverage, reads, genome length, read length ), g is the genome length and L is the read length

1

Which of the following are examples of challenges faced during sequence assembly?

1

Why can't BLAST be used for short read mapping to assemble our reads using a reference genome?

1

when might short-read mapping be beneficial to use?

1

____-____ is the name of the algorithm used by read-mapping packages such as Bowtie to convert the genome into a different format so that matches can be found easily

1

We always need to assemble the genome in metagenomics experiments

1

Raw sequencing data from sequencing experiments are saved in the Sequence Read Archive

1

annotated sequence data from sequencing experiments are saved in GenBank and EMBL

1

Which of the following are legitimate methods of assessing a sequence assembly?

1

the N50 statistic is the length of the smallest contig in the set that contains the fewest contigs whose combined length represents 50% of the assembly
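
As an illustration of the definition in the statement above, here is a minimal Python sketch of one common way to compute N50. The function name `n50` and the example contig lengths are hypothetical, and the sketch assumes the threshold is "at least 50%" of the total assembly length.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, given a list of contig lengths."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating length, and stop at
    # the first contig that takes the running total to half the assembly.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths (total 290): 80 + 70 = 150 already covers half.
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```

Sorting from longest to shortest guarantees that the contigs counted form the smallest possible set reaching the 50% mark, so the last one added is the N50 contig.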

1

sequence annotation involves identifying...

1

gene prediction involves finding UTRs and alternative splice isoforms

1

what are the 2 major approaches for gene finding?

1

ab initio gene finding approaches are more accurate for eukaryotes than prokaryotes

1

The gene-finding tools Glimmer and GENSCAN use ____ models

1

which of the following make eukaryotic gene finding more difficult than prokaryotic gene finding?

1

What measures can be used to assess gene prediction?

1

There is a trade-off when it comes to the specificity and sensitivity of gene prediction tools
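
A small Python sketch of the two measures named above, assuming the standard confusion-matrix definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)). Some gene-prediction papers use TP / (TP + FP) under the name "specificity" instead, so treat the definitions here as an assumption. The function names and counts are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of real features (e.g. coding bases) that were predicted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-features that were correctly left unpredicted."""
    return tn / (tn + fp)


# Hypothetical counts for one gene predictor on an annotated test region.
tp, fn, tn, fp = 90, 10, 80, 20
print(sensitivity(tp, fn), specificity(tn, fp))
```

A predictor that calls more genes will tend to raise TP (higher sensitivity) while also raising FP (lower specificity), which is the trade-off the statement refers to.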

1

( Prokka, GENSCAN, Glimmer, Genie ) is a genome annotation pipeline good for prokaryotes and small eukaryotes

1

order the types of mutation in terms of relative frequency:
1. ( point, deletion, inversion, insertion, translocation, duplication )
2. ( deletion, point, insertion, inversion, duplication, translocation )
3. ( duplication, point, deletion, inversion, insertion, translocation )
4. ( inversion, insertion, point, deletion, translocation, duplication )
5. ( insertion, inversion, translocation, duplication, point, deletion )
6. ( translocation, inversion, insertion, duplication, point, deletion )

1

silent, missense and nonsense are all types of mutation

1

nonsense mutations can be conservative or non-conservative (similar AA or not)

1

introns, intergenic regions and pseudogenes are highly conserved and intolerant to change

1

Gene duplicates experience relaxed evolutionary constraints

1

when does gene duplication occur in bacteria?

1

( duplication, point mutation, inversion, insertion, deletion ) is an essential mutation for evolutionary change to occur in eukaryotes

1

Gene duplication can lead to ____ or ____

1

which of the following are sources of variation in prokaryotes?

1

genes that share a common ancestor are said to be what?

1

genes that have diverged as a result of speciation are said to be what?

1

genes within the same genome created as a result of gene duplication are said to be what?

1

homology is a measure of similarity

1

Which of the following are simplistic measures of sequence similarity?

1

what kind of mutations are more common?

1

PAM and BLOSUM are examples of ____

1

1 PAM is 1% similarity

1

PAM is better for ____ alignments whilst BLOSUM is better for ____ alignments

1

BLOSUM matrices are derived from the ____ database

1

A higher-numbered PAM matrix will find weaker, longer alignments, and a higher-numbered BLOSUM matrix is better for similar sequences

1

A local alignment tries to align all the residues in a sequence

1

Dynamic programming is used for alignment methods

1

Needleman-Wunsch is a ____ alignment algorithm

1

Smith-Waterman is a local alignment algorithm
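
For reference, a minimal sketch of the Smith-Waterman scoring recurrence in Python. It returns only the best local score (no traceback) and uses a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) and the function name are illustrative assumptions, not values taken from the quiz.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            # The floor at 0 lets an alignment restart anywhere, which is
            # what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))  # prints 6 (the shared "ACG")
```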

1

The trajectory refers to the traceback arrows in a trajectory table

1

BLAST and FASTA are examples of alignment methods

1

Exact alignment methods are not guaranteed to find an optimal solution

1

K-tuple alignment methods are a family of approximate alignment methods, and BLAST is part of the family

1

A ____ approach is taken with multiple sequence alignment because an exact approach has complexity O(L^N)
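
To give a feel for the O(L^N) figure, the sketch below counts the cells of an N-dimensional dynamic programming table, assuming N sequences that all have length L. The helper name `dp_cells` is hypothetical, and the count ignores the extra work needed per cell.

```python
def dp_cells(length, n_sequences):
    """Cells in an N-dimensional DP table for N sequences of equal length."""
    return (length + 1) ** n_sequences


# Pairwise alignment of two 100-residue sequences is easy to fill in...
print(dp_cells(100, 2))  # prints 10201
# ...but the table grows exponentially with the number of sequences.
print(dp_cells(100, 5))  # prints 10510100501
```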

1

Progressive, iterative and statistical are all approaches used for ____

1

Which of the following are examples of progressive alignment algorithms?

1

Which of the following algorithms takes a hybrid approach for multiple sequence alignment?

1

A ____ is part of a protein sequence associated with a particular biological function

1

A ( pattern, profile ) is a qualitative description of a motif
A ( profile, pattern ) is a quantitative description of a motif

1

Which of the following databases describe motifs in terms of pattern and profile?

1

PSI-BLAST is more powerful than BLAST for picking up distant relationships between sequences

1

In phylogenetics, masking an alignment involves looking for regions of conservation and removing data that do not appear homologous

1

Which of the following are examples of distance-based tree building methods?

1

____ can be added to branches in phylogenetic trees to summarise the degree of certainty for a given branching

1

____ uses a flat average whilst UPGMA uses a weighted average that takes into account the number of taxa in a group

1

microarrays and RNA-sequencing are examples of what kind of experiments?

1

____ aims to remove technical variation existing in microarray experiments

1

Which of the following are methods for quality control to remove outliers from microarray experiments?

1

Following a microarray experiment, probeset QC removes noise and uninformative data points (i.e. those close to the background level of detection)

1

____-____ is the most common multiple testing correction used in microarray, RNA-seq and proteomics experiments

1

Benjamini-Hochberg FDR modifies ____-values

1

Which of the following are not advantages for RNA-seq experiments over microarrays?

1

____-____ gets rid of uninteresting, abundant RNA such as rRNA and haemoglobin RNA in blood samples in preparation for an RNA-seq experiment

1

RNA-sequencing relies on reverse transcription

1

RNA-sequencing experiments are quantifiable - the sequencing reads in the library are proportional to the abundance of RNA

1

RPKM and FPKM are examples of tools used following an RNA-sequencing experiment
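
For context, a short Python sketch of how RPKM (reads per kilobase of transcript per million mapped reads) is usually calculated. The function name and the example numbers are hypothetical.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    # Scale by gene length in kilobases and library size in millions.
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp transcript in a library
# of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # prints 25.0
```

FPKM follows the same arithmetic but counts fragments (read pairs) instead of individual reads.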

1

T-tests can be used to analyse microarray and RNA-seq data as both are continuous

1

microarrays can be used to discover novel transcripts

1

transcriptomics is used instead of proteomics as the transcript level always correlates to the protein abundance

1

the two main approaches in expression proteomics experiments are up and down experiments

1

Which of the following are experimental strategies used in proteomics?

1

Which of the following are disadvantages of 2DGE?

1

____ is a variation of 2DGE whereby multiple samples are run on one gel but are differentially labelled to eliminate running differences between gels

1

Technical variation is higher in microarrays and RNA-seq than 2DGE and liquid chromatography tandem MS

1

In 2DGE, proteins are separated first on ____, then on ____

1

Progenesis is software used in ( 2DGE, microarray, RNA-seq, HPLC ) experiments

1

____-____ is used to identify which proteins are contained within spots on a gel from a 2DGE experiment

1

2DGE can be used to identify membrane proteins

1

2DGE cannot be used to show post-translational modifications

1

In a proteomics experiment, proteins are first isolated then digested using an enzyme such as ____, as it cuts in a predictable way

1

in a peptide-mass fingerprinting experiment, resulting peak-lists can be the same for very similar proteins

1

in tandem MS, when fragments are introduced they are broken up by argon gas, which preferentially breaks peptide bonds

1

Which of the following databases of hypothetical spectra is used to identify peptides from an MS experiment?

1

the intensity of peaks in MS can be used to quantify proteins

1

____ is the main driving force of the protein folding process

1

secondary structure refers to global interactions within a protein

1

Helix, sheet and ____ are the 3 secondary structure states

1

Protein ____ are subunits within a protein with quasi-independent folding stability

1

The ____ structure refers to proteins formed from several subunits or monomers

1

Protein structures solved by NMR or crystallography are saved as ____ files

1

A ____ visualises and clusters residues of an amino acid sequence based on the psi and phi angles of the residue backbone

1

CATH, SCOP and FSSP/DDD are all examples of what?

1

the levels of hierarchy in the CATH system to catalogue proteins are ordered from bottom to top as follows:
1. ( class, domain, architecture, superfamily, fold )
2. ( architecture, class, domain, fold, superfamily )
3. ( fold, domain, class, architecture, superfamily )
4. ( superfamily, architecture, domain, class, fold )
5. ( domain, class, architecture, fold, superfamily )

1

mainly alpha and mainly beta are examples of CATH folds

1

3D protein structure prediction is treated as a machine learning problem

1

machine learning in the context of protein structure prediction aims to minimise the energy function

1

Dynamic programming is an optimisation method

1

Which of the following are types of machine learning?

1

A ____ is similar to a substitution matrix but specifically tailored to the sequence being aligned

1

____ is the most popular secondary structure prediction software

1

PSIPRED uses hidden Markov models

1

____ is the number of connections a residue in a protein has

1

____ is the amount of exposed surface of each residue

1

which of the following are the broad approaches for 3D PSP?

1

In which 3 ways can a template be identified for 3D PSP?

1

Fold recognition is used to identify a template with high structural similarity but low sequence identity with the target protein, when homology modelling is not an option

1

in 3D PSP, profile-based methods make profiles for residues in a sequence based on...

1

In 3D PSP, fragment assembly combines ____ with ____ methods

1

In fragment assembly, ____ are candidate structures generated from all the possible combinations of fragments. The energy minimisation process is applied to them and they are clustered. The final models are selected from the centre of this cluster.

1

I-TASSER is a ____ used for protein structure prediction

1

A network is a graph consisting of a series of ____ connected by ____

1

In a biological network, genes, proteins and cell types can be depicted as ____

1

in a network, sink nodes have high in degree and sources have a high out degree
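
The in-degree and out-degree mentioned above can be counted directly from a directed edge list. This is a minimal Python sketch with a hypothetical four-node network; the function name `degrees` is an assumption for illustration.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1  # edge leaves src
        in_deg[dst] += 1   # edge arrives at dst
    return in_deg, out_deg


# Hypothetical regulatory network: A and B regulate C, and C regulates D.
edges = [("A", "C"), ("B", "C"), ("C", "D")]
in_deg, out_deg = degrees(edges)
for node in "ABCD":
    print(node, "in:", in_deg[node], "out:", out_deg[node])
```

A `Counter` returns 0 for nodes it has never seen, so nodes with no incoming or no outgoing edges need no special handling.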

1

Which of the following is not a type of degree distribution in a network?

1

In a network, the distance can be defined by Pajek or Watts

1

The longest shortest path between all pairs of nodes is...

1

The ____ is defined by the number of edges as a fraction of the number of possible edges

1

Which of the following are measures of centrality of a network?

1

The betweenness centrality is the fraction of the shortest paths in the network of which a certain node is a member
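
A rough Python sketch of betweenness centrality for a small undirected, unweighted network. It assumes the common unnormalised form: for every pair of other nodes, add the fraction of their shortest paths that pass through the node. The function names and the example graph are hypothetical, and the approach favours readability over speed.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Distances from source and the number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:
                sigma[nbr] += sigma[node]
    return dist, sigma


def betweenness(graph):
    """Unnormalised betweenness centrality for an undirected graph."""
    info = {v: bfs_counts(graph, v) for v in graph}
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in graph:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path when the two legs add up exactly.
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical path network A - B - C - D, stored as adjacency lists.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(path))  # B and C score 2.0, the end nodes score 0.0
```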

1

____ rewards nodes from which any node can be accessed within a few edges

1

a random Boolean network is undirected

1

A random Boolean network can be used to study dynamic processes such as gene expression

1

An ____ network uses data from high-quality databases such as BioGRID as well as our own experimental data

1

Gene co-expression networks are built using ____ by ____

1

In gene co-expression networks, similarity in expression across samples is usually computed by ____

1

A gene co-prediction network relies on a set of rules and an edge connects genes that co-predict with high frequency

1

PathExpand and TopoGSA are examples of network packages

1

Force, arc, circular and hive are all examples of network ____

1

An Arc network is more scalable than a Hive network

1

Community detection is also known as ____

1

____ identifies sub-parts of a network with many connections, which often reflect meaningful modules within the network organisation, i.e. cellular machinery or biological processes

1

____ represent relationships in a computationally amenable way by providing a controlled vocabulary of terms

1

Which of the following are ontologies used by GO to describe the associations of gene products?

1

There are ____ amino acids used in biological systems

1

Which of the following is not commonly used to assess sequencing methods?

1

Which of the following is not a database combined in the INSDC major collection point for sequencing data?

1

Sanger, 454, Ion Torrent and Illumina sequencing all sequence by ____

1

Third generation sequencing involves a PCR step

1

The current gold standard for shotgun sequencing assembly is ____-fold coverage

1

Which of the following is not a reason why sequence assembly is difficult?

1

coverage assumes that DNA is randomly fragmented and all DNA is able to be sequenced.

1

the coverage equation often underestimates the number of reads necessary

1

Silent mutations usually occur in the ____ base of a ____

1

____ genes are those which are homologous and have been gained via horizontal gene transfer

1

In sequence alignments, a ____ represents a perfect match, a ____ represents a similar AA and a blank space represents a larger AA change

1

Heuristic alignment methods are better when computational power is not a problem or for a small number of sequences

1

In a BLAST search, the number of hits one can expect to see by chance when searching a database of a particular size is defined by the ____-____

1

In an MSA, the alignment table can be summarised in a single line, a pseudo-sequence called the ____

1

An MSA algorithm which starts with a complete MSA, makes changes, computes the score, keeps the MSA if the score is better, then repeats, is known as an ____ method

1

In a progressive MSA, the original mapping can be changed

1

progressive multiple sequence alignment strategies use pairwise alignments

1

The MUSCLE MSA method uses the ____ matrix to make a global alignment during the improved progressive alignment

1

MUSCLE uses WPGMA to make alignments

1

A ____ can be incorporated into MSA and PSP algorithms to give better results

1

PSI-BLAST uses a position-specific scoring matrix

1

UPGMA can be fitted with an evolutionary model

1

microarrays assay gene expression by quantification of mRNA using hybridisation

1

____ normalisation is a method of normalisation which ranks data, then takes the median value for each rank and replaces the original values with the ranked averages

1

Principal component analysis reduces multi-dimensional data down to ____ dimensions

1

when analysing microarray data, multiple testing correction controls for the error rate due to false positives being produced by multiple T-tests
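
As one concrete example of a multiple testing correction, the sketch below implements the Benjamini-Hochberg step-up adjustment in plain Python. The function name and the example p-values are hypothetical, and this is a sketch of the standard procedure, not the implementation used by any particular package.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # keeping the adjusted values monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from four t-tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted value falls below the chosen false discovery rate (for example 0.05) would then be reported as differentially expressed.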

1

which of the following does not encompass the same methods between microarrays and RNA-seq?

1

When analysing data from an RNA-seq experiment, DESeq assumes a ____ distribution

1

organisms have 1 genome and 1 proteome

1

In 2DGE, there is a pH gradient running left to right. Where a protein is positioned depends on its ____

1

in 2DGE, it is valid to compare spots between gels if the spot is absent on one of the gels

1

Sensitivity is good in 2DGE as the dye is linearly incorporated

1

LC-MS can be multidimensional, separating proteins based on more than 2 physiochemical properties

1

iTRAQ is used to label samples in order to quantify them. Tags are made up of an ____ group to tag to the protein, a ____ of varying sizes and a ____ to balance the mass

1

When using iTRAQ to quantify proteins during LC-MS, the balancer moiety is measured: when there is more balancer moiety, there is a higher peak and therefore more peptide.

1

iTRAQ is a relative quantification method in LC-MS

1

Data from LC-MS experiments have been locked in ____ up until recently, meaning that specialist software was required to view and analyse data depending on the technology used.

1

Spot profiles for LC-MS data can be clustered based on how similar their expression profiles are or based on how similar their functions are

1

The function of a protein depends on its ____

1

a beta hairpin is an example of a ( supersecondary, secondary, tertiary, CATH, primary, domain ) structure

1

Which of the following is not an example of a structural property of an individual residue that can be predicted?

1

Quiz on Bioinformatics, created by lauren beck on 19/01/2020.
