bioperl-i-handouts
my $annotation = $seq->annotation();
my @keys = $annotation->get_all_annotation_keys();
for my $ref ($annotation->get_Annotations("reference")) {
    # a Bio::Annotation::Reference object
    printf("title: %s\nauthors: %s\njournal: %s\n\n",
           $ref->title(), $ref->authors(), $ref->location());
}
Bio::Seq methods
• "core" get/set methods:
– display_id()
– description()
– seq() [returns the sequence as a scalar string; the underlying Bio::PrimarySeq object is available via primary_seq()]
– alphabet() ["dna", "rna", or "protein"]
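The core accessors can be tried on a sequence built in memory. This is a minimal sketch, assuming BioPerl is installed; the sequence, ID and description are illustrative values, and desc() is used here as the description accessor.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;

# Build a sequence object in memory (all values are illustrative).
my $seq = Bio::Seq->new(
    -seq        => 'CGATTACAACATCCGAGTTCAC',
    -display_id => 'someID',
    -desc       => 'a description here',
    -alphabet   => 'dna',
);

# Called without arguments, each accessor acts as a getter.
my $id       = $seq->display_id;
my $desc     = $seq->desc;
my $residues = $seq->seq;          # plain scalar string
my $alphabet = $seq->alphabet;
printf "%s (%s, %s): %s\n", $id, $desc, $alphabet, $residues;

# Called with an argument, the same method acts as a setter.
$seq->display_id('renamedID');
my $new_id = $seq->display_id;

# The wrapped Bio::PrimarySeq object is reachable via primary_seq().
my $primary = $seq->primary_seq;
print "primary_seq is a ", ref($primary), "\n";
```

Using one method as both getter and setter is the convention throughout BioPerl, so the same pattern applies to most other attributes.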
• CPAN:
% cpan [or, perl -MCPAN -e shell]
cpan> install [Bundle::BioPerl] Bio::Perl
cpan> quit
• CVS:
bioperl-live, bioperl-run, bioperl-ext, bioperl-db
% mkdir ~/cvs; cd ~/cvs
% cvs -d:pserver:cvs@cvs.openbio.org:/home/repository/bioperl login
% cvs -d:pserver:cvs@cvs.openbio.org:/home/repository/bioperl checkout bioperl-live
• http://bugzilla.open-bio.org - bug reports and tracking
BioPerl - Where is it?
• http://www.bioperl.org/Core/Latest/
% tar -xzvf current-core-stable.tar.gz
% cd bioperl-1.4; more README INSTALL
% perl Makefile.PL [LIB=~user/lib]
% make; make test; make install
#!/usr/bin/perl -Tw
use strict;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-format => "fasta",
                            -file   => "myseq.fa");
while (my $seq = $seqio->next_seq()) {
    printf("Sequence %s has length: %d\n",
           $seq->display_id, $seq->length);
}
• calculations:
– length()
– subseq($start, $end) [1-based coords]
• transformations (return new Bio::Seq):
– revcom() [reverse complement, DNA/RNA only]
– translate() [DNA/RNA only]
– trunc($start, $end) [1-based coords]
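The calculation and transformation methods can be compared side by side on a short coding sequence. This is a minimal sketch, assuming BioPerl is installed; the 12-base sequence and its ID are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;

# Illustrative sequence: ATG GCC AAG TAA (Met, Ala, Lys, stop).
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCCAAGTAA',
    -display_id => 'demo',
    -alphabet   => 'dna',
);

# Calculations return plain scalars.
my $length = $seq->length;            # number of residues
my $codon1 = $seq->subseq(1, 3);      # string; coordinates are 1-based, inclusive

# Transformations return new sequence objects; call seq() for the string.
my $revcom  = $seq->revcom;
my $protein = $seq->translate;
my $middle  = $seq->trunc(4, 9);

printf "length:    %d\n", $length;
printf "subseq:    %s\n", $codon1;
printf "revcom:    %s\n", $revcom->seq;
printf "translate: %s\n", $protein->seq;
printf "trunc:     %s\n", $middle->seq;
```

The practical difference between subseq() and trunc() is the return type: subseq() gives a string, while trunc() gives an object that can be passed on to further method calls or written out.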
Format conversion
The *IO modules can be used to easily convert between formats:
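#!/usr/bin/perl -Tw
use strict;
use Bio::SeqIO;

# read GenBank records from myseq.gbk ...
my $in  = Bio::SeqIO->new(-format => "genbank",
                          -file   => "myseq.gbk");
# ... and write them as FASTA (the leading ">" opens the file for writing)
my $out = Bio::SeqIO->new(-format => "fasta",
                          -file   => ">myseq.fa");
while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}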
– Bio::Annotation::Reference
– Bio::Annotation::Comment
– Bio::Annotation::DBLink
– Bio::Annotation::SimpleValue
• individual members of the collection are obtained by "key" (like a hash):
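Key-based access can also be tried without a GenBank file by filling a collection by hand. This is a minimal sketch, assuming BioPerl is installed; the keys "comment" and "keyword" and all annotation values are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Annotation::Collection;
use Bio::Annotation::Comment;
use Bio::Annotation::SimpleValue;

my $coll = Bio::Annotation::Collection->new();

# Several annotation objects may be stored under the same key.
$coll->add_Annotation('comment',
    Bio::Annotation::Comment->new(-text => 'hypothetical note'));
$coll->add_Annotation('keyword',
    Bio::Annotation::SimpleValue->new(-value => 'kinase'));
$coll->add_Annotation('keyword',
    Bio::Annotation::SimpleValue->new(-value => 'membrane'));

# Walk the whole collection, key by key.
for my $key (sort $coll->get_all_annotation_keys) {
    for my $ann ($coll->get_Annotations($key)) {
        printf "%s => %s\n", $key, $ann->as_text;
    }
}

# Or fetch only the annotations stored under one key.
my @keywords = map { $_->value } $coll->get_Annotations('keyword');
print "keywords: @keywords\n";
```

Unlike a plain hash, a key maps to a list of annotation objects, which is why get_Annotations() is called in list context.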
• CPAN/download installs the current stable versions of both the bioperl-live and bioperl-run modules.
– Developer releases: odd numbered (e.g. 1.5 vs. 1.4)
– CVS “head” - always the most current, potentially buggy
BioPerl functionality: IO
• data input/output (*IO modules)
– SeqIO: FASTA, GenBank, EMBL, …
– AlignIO: ClustalW, MSF, Phylip, …
– TreeIO: Newick, Nexus, NHX, …
– SearchIO: BLAST, FASTA, HMMER, …
– MapIO: MapMaker
– Matrix::IO: Scoring, Phylip
– Assembly::IO: Ace, Phrap
– Ontology::IO: InterPro, GO, SO
– and others …
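The *IO modules share one usage pattern: construct a reader with a format, then pull objects from it one at a time. As a minimal sketch of that pattern outside SeqIO, assuming BioPerl is installed, the following reads a made-up Newick tree with Bio::TreeIO from an in-memory filehandle rather than a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

# Illustrative Newick tree with three leaves.
my $newick = "(A:0.1,(B:0.2,C:0.3):0.05);\n";

# Open the string as a filehandle so no file on disk is needed.
open my $fh, '<', \$newick or die "cannot open in-memory handle: $!";

my $treeio = Bio::TreeIO->new(-format => 'newick', -fh => $fh);

# next_tree() plays the role that next_seq() plays in Bio::SeqIO.
my @leaf_ids;
while (my $tree = $treeio->next_tree) {
    for my $leaf ($tree->get_leaf_nodes) {
        push @leaf_ids, $leaf->id;
    }
}
@leaf_ids = sort @leaf_ids;
print "leaves: @leaf_ids\n";
```

The same reader could be pointed at a file by passing -file instead of -fh.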
BioPerl datatype: Bio::Seq
# A FASTA-formatted sequence:
# >someID a description here
# CGATTACAACA
# TCCGAGTTCAC

# the same sequence, generated de novo in BioPerl
my $seq = Bio::Seq->new(
    -seq         => "CGATTACAACATCCGAGTTCAC",
    -display_id  => "someID",
    -description => "a description here",
);
– format conversion
– report processing
– data manipulation
– sequence analyses
– batch processing
• and much, much more …
BioPerl - Who is it?
• a troupe of ~100 diverse volunteer developers
• a “core” group of ~10 volunteer developers who devote far too much of their time to BioPerl
• thousands of users in academia, government and industry
BioPerl - Introduction
Bioinformatics: Writing Software for Genome Research CSHL, Fall 2004
Aaron Mackey amackey@pcbi.upenn.edu
BioPerl - What is it?
• open source, object oriented Perl modules
• a bioinformatics toolkit for:
More Bio::Seq methods
• when reading from GenBank/EMBL records, additional data are available:
– primary_id() e.g. GI number
– species() a Bio::Species object
– molecule() e.g. "mRNA"
– division() e.g. "PRI"
– accession() e.g. "U21892.3"
– version() e.g. "3"
– get_dates() e.g. "28-DEC-1995"
– annotation() a Bio::Annotation::Collection object
– get_SeqFeatures() an array of Bio::SeqFeature::Generic-like objects
BioPerl documentation
• User documentation:
– http://bioperl.org/Core/Latest/modules.html
– at the shell:
% perldoc bioperl
% perldoc Bio::Seq
% perldoc bptutorial.pl
Bio::Annotation objects
• an annotation is something about the sequence, not local to any one region of the sequence
• Bio::Annotation::Collection is a container for:
BioPerl functionality: data
*IO modules read formatted data, generating in-memory representations (i.e. objects):
BioPerl - Where is it?
• http://bioperl.org - main website
• mailing lists
– bioperl-announce : announcements
– bioperl-l : discussion, help
– bioperl-guts : code changes
BioPerl versions
• The entirety of BioPerl is split in CVS:
– bioperl-live: “core” modules
– bioperl-ext: compiled C extensions for sequence alignment, binary trace file IO
– bioperl-run: most Bio::Tools::Run modules for batch processing
– bioperl-db: OO interface to a bioSQL relational database schema
• Developer documentation:
– modules that end in “I” are developer interfaces
% perldoc Bio::SeqI
% perldoc Bio::Align::AlignI
– ignore *I.pm files for now
Bio::SeqFeature objects
• "features" describe specific locations in the sequence; e.g. the GenBank/EMBL Feature Table.
• Bio::SeqFeature objects have get/set methods:
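A few of the feature accessors can be exercised on a hand-built feature. This is a minimal sketch, assuming BioPerl is installed; the sequence, the coordinates and the gene name "demoA" are invented, and only a handful of the available accessors are shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Illustrative 15-base sequence whose first 12 bases form a tiny "CDS".
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCCAAGTAACCC',
    -display_id => 'demo',
    -alphabet   => 'dna',
);

# A feature is a location plus a type and optional tag/value pairs.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 12,
    -strand      => 1,
    -primary_tag => 'CDS',
    -tag         => { gene => 'demoA' },
);

# Attaching the feature lets it look up its own subsequence later.
$seq->add_SeqFeature($feat);

my @report;
for my $f ($seq->get_SeqFeatures) {
    my ($gene) = $f->get_tag_values('gene');
    push @report, sprintf("%s %d..%d strand %d gene=%s seq=%s",
        $f->primary_tag, $f->start, $f->end, $f->strand,
        $gene, $f->seq->seq);
}
print "$_\n" for @report;
```

Features read from a GenBank or EMBL record are returned by the same get_SeqFeatures() call, so the loop body applies unchanged to parsed records.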