Converting IDs with the KEGG Database
clusterProfiler can convert biological IDs using an OrgDb object via the bitr function. I have now implemented another function, bitr_kegg, for converting IDs through the KEGG API.
library(clusterProfiler)
data(gcSample)
hg <- gcSample[[1]]
head(hg)
## [1] '4597' '7111' '5266' '2175' '755' '23046'
eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')
## Warning in bitr_kegg(hg, fromType = 'kegg', toType = 'ncbi-proteinid',
## organism = 'hsa'): 3.7% of input gene IDs are fail to map...
head(eg2np)
## kegg ncbi-proteinid
## 1 8326 NP_003499
## 2 58487 NP_001034707
## 3 139081 NP_619647
## 4 59272 NP_068576
## 5 993 NP_001780
## 6 2676 NP_001487
np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')
head(np2up)
## ncbi-proteinid uniprot
## 1 NP_005457 O75586
## 2 NP_005792 P41567
## 3 NP_005792 Q6IAV3
## 4 NP_037536 Q13421
## 5 NP_006054 O60662
## 6 NP_001092002 O95398
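The warning above reports that some input IDs failed to map. One way to see which ones, sketched here with base R only, is to compare the input vector against the first column of the returned data frame. Note that unmapped_ids is a hypothetical helper name, not part of clusterProfiler, and it assumes the source IDs sit in the first column, as in the outputs shown above.

```r
# Hypothetical helper (not part of clusterProfiler): list the input IDs
# that are absent from a mapping table such as the one bitr_kegg() returns.
# Assumption: the source IDs are in the first column of `mapping`.
unmapped_ids <- function(input, mapping) {
  setdiff(unique(as.character(input)), as.character(mapping[[1]]))
}

# Usage with the objects created above (needs the live KEGG results):
# missing <- unmapped_ids(hg, eg2np)
# length(missing) / length(unique(hg))  # fraction of unique input IDs without a match
```

Because the comparison uses only the returned table, it works the same way for any fromType/toType pair.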
The ID type (both fromType and toType) should be one of 'kegg', 'ncbi-geneid', 'ncbi-proteinid' or 'uniprot'. 'kegg' is the primary ID used in the KEGG database, whose data source is NCBI. As a rule of thumb, the 'kegg' ID is the Entrez Gene ID for eukaryotic species and the locus ID for prokaryotes.
Many prokaryotic species have no Entrez Gene ID available. For example, we can check the gene information of ece:Z5100 at http://www.genome.jp/dbget-bin/www_bget?ece:Z5100, which has NCBI-ProteinID and UniProt links in the Other DBs entry, but no NCBI-GeneID.
If we try to convert Z5100 to ncbi-geneid, bitr_kegg throws an error saying that ncbi-geneid is not supported:
bitr_kegg('Z5100', fromType='kegg', toType='ncbi-geneid', organism='ece')
## Error in KEGG_convert(fromType, toType, organism) :
## ncbi-geneid is not supported for ece ...
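When it is not known in advance which ID types KEGG provides for an organism, the error can be caught instead of stopping a script. The following is a minimal sketch under stated assumptions: convert_available is a hypothetical wrapper name, and its converter argument (also an assumption, added so the wrapper can be tried with a stand-in function) defaults to bitr_kegg. It tries each target type in turn and keeps only the conversions that succeed.

```r
# Hypothetical wrapper (not part of clusterProfiler): try several target
# ID types and keep only those that the organism supports.
# `converter` defaults to bitr_kegg; R evaluates the default lazily, so the
# package is only needed when no other converter is supplied.
convert_available <- function(ids, organism,
                              to_types = c("ncbi-geneid", "ncbi-proteinid", "uniprot"),
                              converter = clusterProfiler::bitr_kegg) {
  results <- lapply(to_types, function(to) {
    tryCatch(
      converter(ids, fromType = "kegg", toType = to, organism = organism),
      error = function(e) {
        # Report the failure and move on to the next target type
        message("skipping ", to, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- to_types
  # Drop the target types whose conversion raised an error
  Filter(Negate(is.null), results)
}

# Usage (needs clusterProfiler and network access to KEGG):
# convert_available('Z5100', organism = 'ece')
```

The result is a named list with one data frame per supported target type, so a missing name shows at a glance which conversions are unavailable.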
We can of course convert it to ncbi-proteinid and uniprot:
bitr_kegg('Z5100', fromType='kegg', toType='ncbi-proteinid', organism='ece')
## kegg ncbi-proteinid
## 1 Z5100 AAG58814
bitr_kegg('Z5100', fromType='kegg', toType='uniprot', organism='ece')
## kegg uniprot
## 1 Z5100 Q7DB85
search_kegg_organism
clusterProfiler supports more than 4k species listed in http://www.genome.jp/kegg/catalog/org_list.html for the hypergeometric test (enrichKEGG and enrichMKEGG) and GSEA (gseKEGG and gseMKEGG). We can use bitr_kegg to convert IDs for all of these species. To make it easier to find the abbreviated scientific name used in the organism parameter of these functions, I implemented the search_kegg_organism function. We can search by kegg_code, scientific_name or common_name (the last of which is not available for prokaryotes).
search_kegg_organism('ece', by='kegg_code')
## kegg_code