BiQ Analyzer Demonstration Walkthrough


Description: When the BiQ Analyzer program is properly installed you can start BiQ Analyzer simply by clicking on the "start BiQ_Analyzer.bat" batch file (Windows) or by running the shell script "unix_start_BiQ_Analyzer" (Unix/Linux/MacOS). If you have any problems, please consult the Installation section and the FAQ / Troubleshooting section.
Description: This is the main screen of BiQ Analyzer. Its main elements are the status indicator on the top left, the text box for the genomic sequence at the top, some parameter settings on the top right, an empty space for the input sequences on the left and an empty space for the multiple sequence alignment on the right. In addition - and most importantly - there is a message box at the bottom right, which will give you hints on how to proceed in each step. Please read these messages carefully - they will help you to understand how the program works. We will now proceed as suggested: first paste the genomic sequence into the text box at the top and then press the "Next" button. Importantly, the genomic sequence must not contain primers and must be unconverted!
Description: The program now asks us to select the raw sequence files that were bisulphite converted and sequenced. These files must be in FASTA format. They should have the same orientation as the genomic sequence and it is helpful if they don't contain any primers (although both can be adjusted for later on). Each file must contain exactly one sequence; it is not possible to import multiple sequences from one file. Furthermore, all sequences must be located in the same directory (if all this sounds complicated, don't worry: just select the sequences as they come out of the sequencer's software and, most likely, it will work!). You can find the example sequences for this guided tour in the "Demo_data" subdirectory of your BiQ Analyzer installation directory.
If you have trouble selecting multiple files, here are some hints for Windows users: to select all sequences from one directory, select one and then press <CONTROL><A>. To select sequences individually, keep <SHIFT> or <CONTROL> pressed and click on the files with the mouse. Finally, press the "Open" button to start the analysis.
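The one-sequence-per-file requirement above can be checked before loading the files. The following is a minimal sketch that is not part of BiQ Analyzer: the function names and the "*.fasta" file pattern are assumptions for illustration, and the check simply counts FASTA header lines (lines starting with ">").

```python
from pathlib import Path


def count_fasta_records(text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def find_invalid_files(directory, pattern="*.fasta"):
    """Return names of files that do not contain exactly one FASTA record.

    `pattern` is a hypothetical default; adjust it to the extension
    your sequencer software actually writes.
    """
    invalid = []
    for path in sorted(Path(directory).glob(pattern)):
        if count_fasta_records(path.read_text()) != 1:
            invalid.append(path.name)
    return invalid
```

Calling `find_invalid_files("Demo_data")` would list any files in that directory that the import step is likely to reject, assuming they match the chosen pattern.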
Description: If everything works fine, you will now see two things happen. Firstly, the sequences that you selected will appear on the left. Secondly, a dialog box will briefly appear, indicating that the program is currently busy with an alignment calculation. Please be patient until this is finished. For this example analysis it should not take more than 30 seconds, but if you use a high number of sequences (>20) the waiting time may be significantly longer.
Description: Now the multiple sequence alignment between the genomic sequence and all selected bisulphite converted sequences will appear, all CpGs and unconverted Cs neatly highlighted (too many colors? See "BiQ Analyzer Color Codes Explained.pdf" in the program directory if you are confused...). That was easy, wasn't it!?
Now the real work starts, namely the quality control process. In the first step, you should look closely at the alignment and check whether any of the sequences are reversed compared to the genomic sequence. The program will help you with an automatic analysis, highlighting sequences or groups of sequences that don't agree well with the genomic sequence. However, it cannot assess whether the reverse complement fits better without calculating a new alignment (which would take too long), so be sure to check the program's suggestion.
In this case, the suggestion is good, so we can now press the "Recalculate" button to see whether the alignment improves (the difference between the "Next" button and the "Recalculate" button is that the former proceeds to the next step whereas the latter repeats the current step with the new settings).
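For readers who want to check by hand what reversing a sequence means in this step, the reverse complement can be computed with a few lines of Python. This is a generic sketch, not BiQ Analyzer's own code; the function name is an assumption.

```python
# Translation table for complementing DNA bases (N stays N; case is preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ACGTT"))  # prints AACGT
```

If a read aligns poorly as given but its reverse complement resembles the genomic sequence, the read was most likely sequenced from the opposite strand.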
Description: Now the multiple sequence alignment looks better (fewer errors and gaps). Now is the right time to remove any remaining primers from the sequences. You can do so by directly editing the text boxes on the left (your modifications will not be written back to the original text files).
In this example, the primers are already removed, so we can directly proceed to the next step by pressing the "Next" button.
Description: In the next step, the aim is to remove from the alignment all sequences with an unacceptably low conversion rate or a high number of sequencing errors. This step is necessary to ensure high data quality. Here we see that sequences [5] and [7] have a conversion rate below 90% (this is the default cutoff, which can be changed in the program's configuration file), and the program suggests excluding these two sequences. One aspect may cause confusion here: why does sequence [5] fall below the threshold even though its conversion rate is displayed as exactly 90%? The reason is that its actual conversion rate is slightly below 90% and is merely rounded to 90% for display, whereas the cutoff is applied to the exact value.
Furthermore, the program suggests excluding sequence [11] because of a high error rate.
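The rounding effect described above can be reproduced in a short sketch. It assumes a common definition of the conversion rate, namely the fraction of non-CpG cytosines in the genomic sequence that read as T in the aligned bisulphite sequence; BiQ Analyzer's exact formula may differ. The function names, the gap-free genomic row, and the value 0.896 are illustrative assumptions.

```python
def conversion_rate(genomic, read):
    """Fraction of non-CpG Cs in `genomic` that appear as T in `read`.

    Assumes both strings are aligned, equally long, and that `genomic`
    contains no gap characters. Positions where the read shows neither
    C nor T (gaps, sequencing errors) are ignored.
    """
    g, r = genomic.upper(), read.upper()
    converted = total = 0
    for i, base in enumerate(g):
        if base != "C":
            continue
        if i + 1 < len(g) and g[i + 1] == "G":
            continue  # skip CpG positions: these may be truly methylated
        if r[i] == "T":
            converted += 1
            total += 1
        elif r[i] == "C":
            total += 1
    if total == 0:
        raise ValueError("no informative non-CpG cytosines")
    return converted / total


def passes_cutoff(rate, cutoff=0.90):
    """Apply the cutoff to the exact value, not to the rounded display."""
    return rate >= cutoff


rate = 0.896  # hypothetical exact conversion rate
print(f"{rate:.0%}")        # prints 90%
print(passes_cutoff(rate))  # prints False
```

A sequence can therefore be displayed with "90%" and still be flagged for exclusion.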
Description: In this example, we will accept the program's suggestion to exclude sequences [7] and [11], but for sequence [5] we decide to relax our conditions a little and include it even if it has a conversion rate slightly below 90%. This is done by changing the choice box below the corresponding sequence on the left side from the suggestion "Exclude" to the previous selection "Include as is". After that, we press the "Recalculate" button to see the results of our decisions.
Description: Here are the results: the program removed the two sequences that we decided to exclude but retained sequence [5]. However, it still insists on excluding it, so we have to change the choice box to "Include as is" again before we can press the "Next" button.
Description: Now the program asks us to inspect the multiple sequence alignment for clones, i.e. sequences that are likely to come from the same chromosome of the same cell. Such sequences are a potential threat to all statistical analyses that are based on their methylation data. The program interprets as clones all sequences that agree in all C positions of the genomic sequence. It then suggests excluding all but one sequence from each group of clone sequences.
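The clone criterion stated above (agreement in all C positions of the genomic sequence) can be sketched as a grouping step. This illustrates the rule as described in the text and is not the program's actual implementation; it assumes aligned, equal-length strings and uses hypothetical names.

```python
from collections import defaultdict


def group_clones(genomic, reads):
    """Group reads that carry identical bases at every genomic C position.

    `reads` maps a sequence name to its aligned sequence. Returns a list of
    groups (lists of names) that contain more than one read.
    """
    c_positions = [i for i, base in enumerate(genomic.upper()) if base == "C"]
    groups = defaultdict(list)
    for name, seq in reads.items():
        key = tuple(seq[i].upper() for i in c_positions)
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


reads = {"r1": "ACGTTA", "r2": "GCGTTA", "r3": "ATGTTA"}
print(group_clones("ACGTCA", reads))  # prints [['r1', 'r2']]
```

In this example r1 and r2 differ only at a non-C position, so they fall into the same group, and all but one of them would be suggested for exclusion.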
Description: Now, after all program-supported quality control steps are completed, the program asks the user to manually validate the alignment. This is a critical step that should be taken seriously to ensure high data quality. In many cases, the best idea is to print out the current alignment by pressing the "Print (via browser)" button: then the program will load the current alignment including all highlighting into a web browser, from where you can print it. If you find any additional errors, you can again directly edit the sequence text boxes on the left or change the choice boxes to exclude doubtful sequences.
Description: Here's the final result of the analysis. We checked it before, hence we can directly press the "Next" button and proceed to the data documentation and export step.
Description: The program now displays an experiment documentation questionnaire, which should help the user to properly document his/her results. The upper part with the small text boxes is standardized and you should fill it out in all cases. The "Details" box below is based on the "ExpDetails_Template.txt" file, which can be found in the program directory. It can either be used as provided or it can be adapted to local requirements using any text editor. However, one aspect is critical: you should give very specific details on the genome position of the analyzed sequences, in order to make it easy to find the location later on. Finally, the text box "Free comment" allows you to add any additional information that you might find useful when coming back to this analysis in 6 months' or 10 years' time.
In order to facilitate filling out this questionnaire, you can use the "Cycle through history" button, which provides a history function for all that you have entered into the text boxes since you installed the program. Finally, after filling out the questionnaire, we continue by pressing the "Save data" button.
Description: The program is now ready to export the results of the analysis into a single documentation HTML file that contains: the questionnaire, some basic data about the experiment, the genomic sequence unconverted and converted, the final multiple sequence alignment, and a lollipop-style diagram of the methylation patterns. We enter a name for that file and press the "Open" button to save the file (be careful: when you select an existing file here, it will be overwritten without further warning).
Description: In addition to the HTML file, the program also suggests saving the raw methylation data into a plain text file (as tab-separated values). This is only important if you want to further analyze your methylation data in a statistics package that does not properly support copy & paste. More often than not, you can skip this, so we press the "Cancel" button.
Description: The program successfully saved the results. From here, there are several ways to continue. First, we want to look at the results file that BiQ Analyzer produced, so we press the "Next" button to open it in a web browser.
Description: The web browser shows the full documentation of the bisulphite sequencing experiment. We close the browser window again to explore another way to look at the results.
Description: When we finished the analysis, BiQ Analyzer copied the generated methylation data into the system clipboard. Hence, in order to carry out a statistical analysis of that data, we can go directly to a spreadsheet program and paste our methylation data.
Description: Here are the results: each row corresponds to one bisulphite-treated sequence and each column to a CpG dinucleotide in the genomic sequence. A '1' represents a methylated C, a '0' represents an unmethylated C, and an 'x' represents a non-CpG or ambiguous position. From that data, you can easily calculate average methylation, medians, variances, etc.
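As an alternative to a spreadsheet, the average methylation per CpG can be computed directly from the tab-separated values. The sketch below assumes the pasted text contains only the '1' / '0' / 'x' cells described above, with no header row or label column; the real export may include such extras, in which case they would need to be stripped first. The function names are hypothetical.

```python
def parse_methylation_table(text):
    """Split tab-separated text into rows of '1', '0' or 'x' cells."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def mean_methylation_per_cpg(rows):
    """Average methylation for each column (CpG), ignoring 'x' cells.

    Returns None for a column without any '0' or '1' calls.
    """
    means = []
    for column in zip(*rows):
        calls = [int(value) for value in column if value in ("0", "1")]
        means.append(sum(calls) / len(calls) if calls else None)
    return means


table = "1\t0\tx\n1\t1\tx\n"
print(mean_methylation_per_cpg(parse_methylation_table(table)))
# prints [1.0, 0.5, None]
```

Excluding 'x' cells from the denominator keeps ambiguous positions from pulling the average towards zero.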
We close the spreadsheet program again and return to BiQ Analyzer.
