Exploring uncertainty in remotely sensed data with parallel coordinate plots
Yong Ge, Sanping Li, V. Chris Lakhan, Arko Lucieer
Abstract
The existence of uncertainty in classified remotely sensed data necessitates the application of enhanced techniques for identifying and visualizing the various degrees of uncertainty. This paper, therefore, applies the multidimensional graphical data analysis technique of parallel coordinate plots (PCP) to visualize the uncertainty in Landsat Thematic Mapper (TM) data classified by the Maximum Likelihood Classifier (MLC) and Fuzzy C-Means (FCM). The Landsat TM data are from the Yellow River Delta, Shandong Province, China. Image classification with MLC and FCM provides the probability vector and fuzzy membership vector of each pixel. Based on these vectors, Shannon’s entropy (S.E.) of each pixel is calculated. PCPs are then produced for each classification output. The PCP axes denote the posterior probability vector and fuzzy membership vector, and two additional axes represent S.E. and the associated degree of uncertainty. The PCPs highlight the distribution of probability values of different land cover types for each pixel, and also reflect the status of pixels with different degrees of uncertainty. Brushing functionality is then added to PCP visualization in order to highlight selected pixels of interest. This not only reduces the visualization uncertainty, but also provides invaluable information on the positional and spectral characteristics of targeted pixels.
1. Introduction
A major problem that needs to be addressed in remote sensing is the analysis, identification and visualization of the uncertainties arising from the classification of remotely sensed data with classifiers such as the Maximum Likelihood Classifier (MLC) and Fuzzy C-Means (FCM). While the estimation and mapping of uncertainty has been discussed by several authors (for example, Shi and Ehlers, 1996; van der Wel et al., 1998; Dungan et al., 2002; Foody and Atkinson, 2002; Lucieer and Kraak, 2004; Ibrahim et al., 2005; Ge and Li, 2008a), very little research has been done on identifying, targeting and visualizing pixels with different degrees of uncertainty. This paper, therefore, applies parallel coordinate plots (PCP) (Inselberg, 1985, 2009; Inselberg and Dimsdale, 1990) to visualize the uncertainty in sample data and classified data with MLC and Fuzzy C-Means. A PCP is a multivariate visualization tool that plots multiple attributes on the X-axis against their values on the Y-axis and has been widely applied to data mining and visualization (Inselberg and Dimsdale, 1990; Guo, 2003; Edsall, 2003; Gahegan et al., 2002; Andrienko and Andrienko, 2004; Guo et al., 2005; Inselberg, 2009). The PCP is useful for providing a representation of high dimensional objects for visualizing uncertainty in remotely sensed data compared with two-dimensional, three-dimensional, animation and other visualization techniques (Ge and Li, 2008b). Several advantages of the PCP technique for visualizing multidimensional data have been outlined by Siirtola and Räihä (2006).
Data for the PCPs are from a 1999 Landsat Thematic Mapper (TM) image acquired over the Yellow River Delta, Shandong Province, China. After classifying the data the paper emphasizes the uncertainties arising from the classification process. The probability vector and fuzzy membership vector of each pixel, obtained with classifiers MLC and FCM, are then used in the calculation of Shannon’s entropy (S.E.), a measure of the degree of uncertainty. Axes on the PCP illustrate S.E. and the degrees of uncertainty. Brushing is then added to PCP visualization in order to highlight selected pixels. As demonstrated by Siirtola and Räihä (2006), the brushing operation enhances PCP visualization by allowing interaction with PCPs whereby polylines can be selected and highlighted.
2. Remarks on parallel coordinate plots and brushing
Parallel coordinate plots, a data analysis tool, can be applied to a diverse set of multidimensional problems (Inselberg and Dimsdale, 1990). The basis of PCP is to place coordinate axes in parallel, and to present a data point as a connection line between coordinate values. A given row of data can be represented by drawing a line that connects the value of that row on each corresponding axis (Kreuseler, 2000; Shneiderman, 2004). A single set of connected line segments representing one multidimensional data item is called a polyline (Siirtola and Räihä, 2006). An entire multidimensional dataset can be viewed whereby all observations are plotted on the same graph. All attributes in two-dimensional feature space can, therefore, be distinguished, thereby allowing for the recognition of uncertainties and outliers in the dataset.
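The construction of a polyline described above can be sketched in a few lines of Python. This is only an illustrative sketch: the per-axis min-max scaling to [0, 1] is an assumption made here so that axes with different ranges share one vertical scale, the function name `polyline` is hypothetical, and the three-band pixel values are invented. It is not a description of how the Parbat software draws its plots.

```python
def polyline(row, mins, maxs):
    """Return the (axis_index, scaled_value) vertices for one data item."""
    vertices = []
    for axis, (value, lo, hi) in enumerate(zip(row, mins, maxs)):
        # Assumed scaling: map each attribute to [0, 1] on its own axis.
        scaled = 0.0 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((axis, scaled))
    return vertices


# Hypothetical three-band pixels (values invented for illustration).
pixels = [[60, 20, 15], [80, 40, 90], [100, 30, 45]]
mins = [min(column) for column in zip(*pixels)]
maxs = [max(column) for column in zip(*pixels)]

# One polyline per pixel; each vertex sits on one of the parallel axes.
lines = [polyline(p, mins, maxs) for p in pixels]
for line in lines:
    print(line)
```

Each pixel becomes one list of vertices, so drawing the plot amounts to connecting consecutive vertices with line segments.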
The PCP can be used to visualize not only multidimensional data but also non-numerical multidimensional data. Jang and Yang (1996) discussed applications of PCP, especially its usefulness as a dynamic graphics tool for multivariate data analysis. In this paper the PCP is applied to Landsat TM multidimensional data. This paper follows the procedures of Lucieer (2004) and Lucieer and Kraak (2004) who adopted the PCP from Hauser et al. (2002) and Ledermann (2003) to represent the uncertainties in the spectral characteristics of pixels and their corresponding fuzzy membership. To enhance the visualization of the PCP, interactive brushing functionality is employed. In the brushing operation, pixels of interest are selected and then highlighted. Brushing permits not only highlighting but also masking and deletion of specific pixels and spatial areas.
3. Use of PCP to explore uncertainty in sample data
In the supervised classification process of remotely sensed imagery, the quantity of samples is a major factor affecting the accuracy of the image classification. In this section, the uncertainty in the sample data is, therefore, first explored with PCP.
3.1. Data acquired
In this paper, the Landsat TM image representing the study area was acquired on August 28, 1999, and covers an area of the Yellow River delta. The image area is at the intersection of the terrain between Dongying and Binzhou, Shandong Province. The upper left longitude and latitude coordinates of the image are 118°0′34.07″E and 37°22′24.00″N, respectively. The lower right longitude and latitude coordinates are 118°10′52.83″E and 37°13′58.13″N, respectively. The image size is 515 by 515 pixels, with each pixel having a resolution of 30 m. Fig. 1 is a pseudo-color composite image of bands 5, 4 and 3.
3.2. Exploring the uncertainty in sample data
The image includes six land cover types, namely Water, Agriculture_1, Agriculture_2 (Agriculture_1 and Agriculture_2 are different crops), Urban, Bottomland (the channel of the Yellow River), and Bareground. Sample data are selected from the region of interest (see Fig. 2), and represent a total of 26,639 pixels. The Parbat software developed by Lucieer (2004) and Lucieer and Kraak (2004) is used to produce the PCP. The PCP depicts the multidimensional characteristics of pixels in the remote sensing image through a set of parallel axes and polylines in a two-dimensional plane (Fig. 3). It is noticeable that there is a clear representation of sample data from different land cover types, as shown by clustering of spectral signatures, and the dispersion and overlapping of spectral signatures from different land cover types. The digital numbers (DNs) of all pixels in the land cover type, Bottomland, are very concentrated within a narrow band. The range of DNs of pixels in the Water class is also narrow, except for band 3. The land cover types of Water and Bottomland in Fig. 3 can be easily distinguished from the other land covers in bands 5 and 7. Further differentiation is provided in band 3. The radiation responses for Agriculture_1 and Agriculture_2 have close similarity in bands 1, 2, 3, 6 and 7, with a degree of overlap in bands 4 and 5. There is an almost perfect positive correlation between bands 1 and 2 for all categories. This occurrence presents difficulties in clearly differentiating pixels for Agriculture_1 and Agriculture_2. Hence, it is evident that there is uncertainty in differentiating pixels for the land cover types Agriculture_1 and Agriculture_2.
Fig. 1. Pseudo-color composition image of the study area.
Fig. 2. Sample data in the region of interest.
4. Classification and measurement of uncertainty in classified remote sensing images
4.1. Uncertainties arising from the classification process
The supervised classifiers, MLC and FCM, are applied to the image, with the condition that no pixel is assigned to a null class. The classified images are shown in Fig. 4a and b.
Comparison of Fig. 4a and b reveals that the classified results from MLC and FCM are not identical for the data pertaining to the same region of interest (ROI).
Fig. 3. PCP of sample data.
The difference images between these two classified results are presented in Fig. 5a and b, with Fig. 5a showing the classified results for MLC. The classified results of FCM for the difference pixels are illustrated in Fig. 5b. The difference pixels total 16,416. These difference pixels are distributed mainly on the banks of the river, and in mixed areas of Bareground, Agriculture_1, and Agriculture_2. For the MLC classified result, 57.1% of the difference pixels are classified as Agriculture_1, and 36.7% are classified as Agriculture_2. The FCM classified results, however, demonstrate that 90.5% of difference pixels are Bareground while 9.3% are Agriculture_2.
The numbers of pixels in the ROI and in each of the classification categories from MLC and FCM are illustrated in Fig. 6. Evidently, there is a significant difference in the number of pixels for Agriculture_1, Agriculture_2, and Bareground, while the numbers of pixels for Water and Bottomland are very similar. This is also demonstrated in Fig. 5a and b. Based on the classified results from MLC and FCM it is possible to claim that there are relatively high uncertainties in identifying Agriculture_1, Agriculture_2 and Bareground. There are, however, lower uncertainties in the identification of Water and Bottomland.
Fig. 4. (a) Classification result from MLC; (b) classification result from Fuzzy C-Means.
Fig. 5. Spatial distribution of difference pixels between MLC and FCM: (a) difference pixels in the classified result from MLC; (b) difference pixels in the classified result from FCM.
Fig. 6. Comparison of the numbers of pixels in the ROI for each category from MLC and FCM.
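The comparison of two classified images can be illustrated with a small sketch. It assumes that each classification is available as a flat list of class labels in the same pixel order; the function name `difference_summary` and the five example pixels are hypothetical and are not taken from the study data.

```python
from collections import Counter


def difference_summary(labels_a, labels_b):
    """Count pixels labelled differently and the class shares among them."""
    diff = [(a, b) for a, b in zip(labels_a, labels_b) if a != b]
    total = len(diff)
    if total == 0:
        return 0, {}, {}
    # Share of each class among the difference pixels, per classifier.
    share_a = {c: n / total for c, n in Counter(a for a, _ in diff).items()}
    share_b = {c: n / total for c, n in Counter(b for _, b in diff).items()}
    return total, share_a, share_b


# Hypothetical labels for five pixels (not the study data).
mlc = ["Water", "Agriculture_1", "Agriculture_1", "Agriculture_2", "Bareground"]
fcm = ["Water", "Bareground", "Bareground", "Bareground", "Bareground"]

total, mlc_share, fcm_share = difference_summary(mlc, fcm)
print(total, mlc_share, fcm_share)
```

Applied to full label images, the two dictionaries correspond to the kind of per-class percentages reported for the difference pixels.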
4.2. Measurement
4.2.1. Probability/fuzzy membership
FCM produces a fuzzy membership vector for each pixel. This fuzzy membership can be taken as the area proportion within a pixel (Bastin et al., 2002). It is possible for pixels to have the same class type, but their posterior probabilities or fuzzy memberships could be different. Hence, the probability vector or fuzzy membership is normally used as a measure for uncertainty on a pixel scale. The posterior probability and fuzzy membership of Water and Agriculture_2 are illustrated in Figs. 7 and 8, respectively. The uncertainty in spatial distributions can be clearly observed in Figs. 7 and 8. In Fig. 7, pixels belonging to the Water class have larger posterior probability or fuzzy membership, thereby indicating that these pixels have smaller uncertainties. While there are insignificant variations between Fig. 7a and b there are, however, noticeable differences between Fig. 8a and b, especially for the class, Agriculture_2.
Fig. 7. Water class: (a) posterior probability from MLC; (b) fuzzy membership from FCM.
4.2.2. Shannon entropy [S.E.]
Entropy is a measure of uncertainty and information formulated in terms of probability theory, which expresses the relative support associated with mutually exclusive alternative classes (Foody, 1996; Brown et al., 2009). Shannon’s entropy (Shannon, 1948), applied to the measurement of the quality of a remote sensing image, is defined as the amount of information required to determine that a pixel belongs completely to a category. It expresses the overall uncertainty information of the probability or membership vector, and all elements in the probability vector are used in the calculation (Maselli et al., 1994). Therefore, it is an appropriate method to measure the uncertainty for each pixel in classified remotely sensed images (van der Wel et al., 1998). The use of S.E. in this paper is represented by considering the following. Given U as the universe of discourse in the remotely sensed imagery, U contains all pixels in this image and is partitioned by {X1, X2, ..., Xn} where n is the number of classes. The probability of each partition is denoted as $p_i = P(X_i)$, giving the S.E. as

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$

where H(X) is the information entropy of the information source. When $p_i = 0$, the corresponding term is evaluated with the convention 0 log 0 = 0 (Zhang et al., 2001; Liang and Li, 2005). It is accepted that $0 \le H(X) \le \log_2 n$.
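As a minimal sketch, Eq. (1) can be evaluated for a single pixel as follows, reading it as the standard Shannon entropy with base-2 logarithms and the convention 0 log 0 = 0. The function name `shannon_entropy` and the example vectors are illustrative assumptions, not values from the study.

```python
import math


def shannon_entropy(vector):
    """Shannon entropy in bits; zero entries contribute nothing (0 log 0 = 0)."""
    return sum(-p * math.log2(p) for p in vector if p > 0)


# Hypothetical probability vectors for a six-class problem.
certain = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # pixel fully assigned to one class
two_way = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]   # evenly split between two classes
uniform = [1 / 6] * 6                       # no class preferred

print(shannon_entropy(certain))  # lower bound of the range
print(shannon_entropy(two_way))
print(shannon_entropy(uniform))  # upper bound, log2(6)
```

The first and last vectors reach the two bounds stated above, 0 and log2 n respectively.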
On the basis of Bayesian decision rules, MLC determines the class type of every pixel according to its maximum posterior probability in a probability vector. The classification process could, therefore, be associated with uncertainty. S.E. derived from a probability vector or fuzzy membership vector can represent the variation in the posterior probability vector and can be taken as one of the measures on a pixel scale (van der Wel et al., 1998). Similarly, applying FCM to remotely sensed data can produce fuzzy membership values in all land cover classes. Hence, fuzzy membership can also be considered as a measure of uncertainty on a pixel scale. On the basis of the fuzzy membership of each pixel the corresponding S.E. can be calculated. To compare the MLC and FCM methods it is necessary to normalize the computed S.E. values.
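The text does not state how the S.E. values are normalized. One common choice, assumed here purely for illustration, is to divide by the maximum attainable entropy, log2 n, so that values from both classifiers fall in [0, 1]. The sketch also shows the maximum-posterior decision used to obtain a hard class label. The names `normalized_entropy` and `hard_label` and the example vector are hypothetical.

```python
import math

CLASSES = ["Water", "Agriculture_1", "Agriculture_2",
           "Urban", "Bottomland", "Bareground"]


def normalized_entropy(vector):
    """Entropy divided by log2(n); the divisor is an assumed normalization."""
    n = len(vector)
    if n < 2:
        raise ValueError("at least two classes are required")
    h = sum(-p * math.log2(p) for p in vector if p > 0)
    return h / math.log2(n)


def hard_label(vector, class_names):
    """Pick the class with the largest posterior probability or membership."""
    best = max(range(len(vector)), key=lambda i: vector[i])
    return class_names[best]


# Hypothetical posterior probability vector for one pixel.
posterior = [0.70, 0.00, 0.00, 0.00, 0.10, 0.20]
label = hard_label(posterior, CLASSES)
uncertainty = normalized_entropy(posterior)
print(label, round(uncertainty, 3))
```

The same two functions apply unchanged to a fuzzy membership vector, which is what makes the MLC and FCM outputs comparable on a common scale.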
Fig. 8. Agriculture_2 class: (a) posterior probability from MLC; (b) fuzzy membership from FCM.
Fig. 9.
From Eq. (1) S.E. can be calculated through posterior probability and fuzzy memberships. Fig. 9a and b display normalized S.E. values of classified pixels from MLC and FCM, respectively. When the grey value of a pixel is zero then the uncertainty is zero, and when the grey value is 255 then the uncertainty is at the maximum value of 1. From Fig. 9a and b it can be observed that the classes of Water and Bottomland have lower uncertainties while the classes of Bareground, Agriculture_1 and Agriculture_2 have higher uncertainties. These results emphasize that S.E. values calculated from MLC are comparatively higher than those obtained from FCM. Obviously, FCM produces more information about the end members within a pixel than MLC.
4.2.3. Degree of uncertainty
While S.E. provides information on the uncertainties of pixels it is, however, known that when there is a large range of greyscale values representing brightness values [0, 255] the subtle differences in greyscales are not easily discernible to humans. Hence, there are difficulties in differentiating the degree of uncertainty. To overcome this problem the S.E. is, therefore, discretized equidistantly. For instance, the S.E. is discretized into the following intervals: 0.00, (0.00, 0.20], (0.20, 0.40], (0.40, 0.60], (0.60, 0.80], (0.80, 1.00]. Measurements falling into the same interval have the same degree of uncertainty. By assigning a color to each degree of uncertainty, a pixel-based uncertainty visualization is produced (see Fig. 10a and b). This discretization clearly highlights the degrees of uncertainty in the classified remotely sensed image. In Fig. 10a, representing MLC, the degrees of uncertainty for most of the pixels are 0, 1 and 2 while in Fig. 10b, associated with FCM, the degrees of uncertainty for most of the pixels are 1, 2 and 3 and occasionally 4. It is worthwhile to note that the interval map of S.E. permits a comparison of the degrees of uncertainty in classified results from different classifiers.
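The equidistant discretization can be sketched as a lookup against the upper bounds of the intervals listed above. The numbering of the degrees from 0 to 5 follows the range given for the degree of uncertainty axis; the function name `degree_of_uncertainty` is a hypothetical choice.

```python
from bisect import bisect_left

# Upper bounds of the intervals 0.00, (0.00, 0.20], ..., (0.80, 1.00].
UPPER_BOUNDS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def degree_of_uncertainty(normalized_se):
    """Map a normalized S.E. value in [0, 1] to a degree from 0 to 5."""
    if not 0.0 <= normalized_se <= 1.0:
        raise ValueError("normalized S.E. must lie in [0, 1]")
    # bisect_left returns the index of the first bound >= the value,
    # which matches intervals that are open on the left, closed on the right.
    return bisect_left(UPPER_BOUNDS, normalized_se)


for value in (0.0, 0.1, 0.2, 0.21, 0.5, 1.0):
    print(value, degree_of_uncertainty(value))
```

A value of exactly zero keeps its own degree, so pixels without any uncertainty remain distinguishable from pixels in the lowest non-zero interval.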
5. PCP and brushing
The PCP is useful for visually exploring the degree of dispersion or aggregation of the DN values of pixels in each band, and can be conveniently used to investigate the reasons contributing to uncertainty. With the brushing operation, the pixels of interest could be selected and highlighted.
5.1. PCP
From the MLC results, two sets of pixels have been randomly selected from the classes of Water and Agriculture_2, respectively, to be represented on a PCP. The Parbat software (Lucieer, 2004; Lucieer and Kraak, 2004) is used to produce a PCP of Water (see Fig. 11) and Agriculture_2 (see Fig. 12) classified by MLC. The first six axes are posterior probabilities of the six land cover types and the last two axes are the S.E. and degree of uncertainty. For instance, Fig. 12 illustrates the uncertainties of the pixels selected from the class type of Agriculture_2 and their distributions. The posterior probability of Agriculture_2 for each pixel is higher than that of the other categories. Significantly, there is a negative correlation between Agriculture_1 and Agriculture_2. In this example, the S.E. is equally divided into five intervals to obtain the degree of uncertainty. The uncertainties of the pixels selected from Water and Agriculture_2, respectively, and their distributions are illustrated by Figs. 11 and 12.
Fig. 10. Degree of uncertainty: (a) derived from MLC; (b) derived from FCM.
Figs. 13 and 14 are obtained by placing all the DNs of these pixels, fuzzy memberships, S.E., and associated degree of uncertainty as attribute dimensions on the horizontal axis of the PCP. Similarly, pixels are randomly selected to be represented on the PCP, and the S.E. is divided into five equal intervals to obtain the degree of uncertainty. Fig. 13 highlights the uncertainty characteristics associated with the spectral signatures of pixels from the Water class while Fig. 14 shows the spectral signatures of pixels from the Agriculture_2 class and their related uncertainty characteristics. B.1–B.7 and C.1–C.6 denote the seven bands of the TM image and their fuzzy memberships in the six land cover types, which are Water, Agriculture_1, Agriculture_2, Urban, Bottomland, and Bareground, respectively. The S.E. from the FCM classifier and the degree of uncertainty within the range 0–5 are denoted by ShE and UnL. The red line within bands 1–7 emphasizes pixels with a degree of uncertainty of zero. Different colors denote different degrees of uncertainty in the PCP. From the PCP it is possible to discern the distribution of fuzzy memberships and Shannon entropies of pixels with different degrees of uncertainty.
A comparison of Figs. 11 and 13 reveals that, for MLC, the non-zero posterior probabilities of pixels classified as Water, other than those for Water itself, are concentrated in the Bareground class. For the FCM (see Fig. 13) most of the classified pixels in the Water class have a fuzzy membership close to 1. The fuzzy memberships of some pixels in the Bareground, Agriculture_1 and Agriculture_2 classes are clearly above zero. It is, therefore, apparent that pixels with a high degree of uncertainty are mixed pixels in the Water and Bareground classes, namely the boundary pixels between Water and Bareground. As expected, the spectral response for these pixels contains characteristics of both land cover types.
Fig. 13 provides additional information on the spectral characteristics of Water and its uncertainty distribution. For instance, the degrees of dispersion in bands 2, 3, 4 and 5 are high, and the distance to the red line in these four bands is also relatively high. As such, their uncertainties are high. Pixels with a degree of uncertainty of 3, represented by the blue line, are relatively far The fuzzy memberships of pixels for Bareground are in the range 0.2–0.4 thereby giving rise to a high degree of uncertainty.
Fig. 13. Spectral features of the Water class and their uncertainty characteristics.
Pixels with a degree of uncertainty of 4, represented by the pink line, are relatively far from those pixels with a degree of uncertainty of zero, as shown by the red line in band 4. There are two independent distribution ranges in band 3. For Water class pixels with a high degree of uncertainty, if their fuzzy membership for Bottomland is high, then their DNs in band 3 are greater than the DNs of pixels with a degree of uncertainty of zero.
Fig. 14. Spectral features of pixels from the Agriculture_2 class and their uncertainty characteristics.
Fig. 15. (a) Class of Water: pixels with a low degree of uncertainty in the PCP; (b) class of Water: pixels with a high degree of uncertainty in the PCP.
A comparison of Figs. 12 and 14 reveals that the classes of Agriculture_1 and Bareground contribute the most uncertainty to Agriculture_2. The difference between the PCPs from MLC and FCM arises because the influence of Agriculture_1 on Agriculture_2 in MLC is greater than the influence of Bareground on Agriculture_2. For FCM the two influences on Agriculture_2 are almost the same. When the fuzzy memberships of pixels for Agriculture_1 are greater, their DNs in bands 4 and 5 are lower than those of pixels with a degree of uncertainty of zero. The DNs in all bands are dispersed for pixels with large fuzzy membership for Bareground.
5.2. Brushing
From Figs. 11 and 14 it is demonstrated that when different colors are used to represent different degrees of uncertainty, a certain amount of overlap develops between color lines, especially on the axes where the distribution of polylines is concentrated. The superposition of polylines increases the difficulty of visual uncertainty analysis. A new "visual" uncertainty is thereby introduced. To improve on this visualization the brushing operation is introduced to the PCP (Hauser et al., 2002; Ledermann, 2003). The difference from a conventional approach is that the user selects pixels of interest, and then highlights them with a brush, instead of using colored polylines. This is a suitable and convenient method for conducting targeted analysis on the spectral characteristics of pixels and their associated uncertainty.
Brushing is applied to a set of pixels with a low degree of uncertainty (see Fig. 15a) and a set of pixels with a high degree of uncertainty (see Fig. 15b). The pixels targeted for investigation belong to the class type, Water. A green polyline denotes the pixels being brushed, while a grey polyline represents the distribution characteristics of all pixels belonging to the class type, Water. The red line for bands 1–7 represents pixels with a zero degree of uncertainty. In Fig. 15a, there is a strong negative correlation between C.5 (Bottomland) and C.6 (Bareground). This means that the larger the memberships of pixels in the class of Bottomland, the smaller the memberships of pixels in the class of Bareground. To further investigate the class type, Agriculture_2, PCP and brushing are used to visualize pixels with low uncertainty (see Fig. 16a) and pixels with high uncertainty (see Fig. 16b). From the figures it becomes noticeable that when PCP is combined with brushing the user can focus on the spectral characteristics and the uncertainty distribution of pixels of interest. This effectively reduces the uncertainty introduced by the visualization of the remotely sensed data.
Fig. 16. (a) Class of Agriculture_2: pixels with a low degree of uncertainty in the PCP; (b) class of Agriculture_2: pixels with a high degree of uncertainty in the PCP.
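In data terms, the brushing operation amounts to selecting records whose value on a chosen axis falls within a range, and drawing the selected polylines in a highlight color over the remaining ones in grey. The sketch below illustrates only the selection step. The function name `brush`, the record layout and all values are hypothetical; the axis names C.5, C.6 and UnL follow the labels used in the figures.

```python
def brush(records, axis, low, high):
    """Split records into (highlighted, background) by a range on one axis."""
    highlighted = [r for r in records if low <= r[axis] <= high]
    background = [r for r in records if not (low <= r[axis] <= high)]
    return highlighted, background


# Hypothetical Water-class pixels: memberships in Bottomland (C.5) and
# Bareground (C.6), plus the degree of uncertainty (UnL).
records = [
    {"id": 1, "C.5": 0.05, "C.6": 0.02, "UnL": 0},
    {"id": 2, "C.5": 0.30, "C.6": 0.10, "UnL": 3},
    {"id": 3, "C.5": 0.10, "C.6": 0.35, "UnL": 4},
    {"id": 4, "C.5": 0.02, "C.6": 0.08, "UnL": 1},
]

# Brush the pixels with a high degree of uncertainty (3 to 5).
selected, rest = brush(records, "UnL", 3, 5)
print([r["id"] for r in selected])  # would be drawn in green
print([r["id"] for r in rest])      # would be drawn in grey
```

Because the selection is independent of color coding, overlapping polylines on crowded axes no longer hide the pixels under investigation.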
6. Discussion and conclusion
A major unresolved problem in image processing is how to identify and visualize the uncertainty arising out of the classification of remotely sensed data. Without doubt, targeting uncertainties will not only permit better visualization of features in geographical space but also enhance the capabilities of policymakers who have to make reliable decisions on a broad range of geospatial issues. This paper has demonstrated the effectiveness of the combined PCP and brushing operation to explore and visualize the uncertainties in remotely sensed data classified with MLC and FCM.
The MLC and FCM results demonstrate that Water class pixels with a high degree of uncertainty have high posterior probability or fuzzy membership for the type Bottomland. Furthermore, some Agriculture_2 class pixels with a high degree of uncertainty have high fuzzy membership for the Bareground class type. Therefore, it is possible to compare the pixel distribution for Water and Bottomland or Agriculture_2 and Bareground, and further analyze the possible reasons causing the uncertainty by investigating the spectral characteristics for all bands. Essentially, it is necessary to use the probability vector and fuzzy membership vector of each pixel to compute the S.E. The degree of uncertainty of each pixel can then be represented on a PCP. As illustrated in this paper, two axes on the PCP represent Shannon’s entropy and the degree of uncertainty.
The PCP technique is also advantageous for highlighting the distribution of probability values of different land covers of each pixel, and also reflects the status of pixels with different degrees of uncertainty. Moreover, a PCP can be produced for the spectral characteristics of sample data and uncertainty attributes of classified data. The class type of the sample data can be included in the PCP to evaluate the quality of the data. The sample data can then be compared to the classified data to evaluate whether the sample data are a reasonable reflection of the spectral characteristics of all bands. The identification of any dissimilarities or uncertainties is a definite indication of improvement in the visualization process. This paper demonstrates that PCP visualization can be enhanced with the addition of the brushing operation. Instead of relying on colored polylines, as in previous approaches, brushing permits the user to select pixels of interest and highlight them. Evidently, brushing facilitates targeted analysis of the spectral characteristics of pixels and any associated uncertainty. It can, therefore, be concluded that the integration of PCP with the brushing operation is beneficial for not only visualizing uncertainty but also gaining insights on the spectral characteristics and attribute information of pixels of interest. By interacting with the PCP through the brushing operation it is possible to conduct an exploration of uncertainty, even at the sub-pixel level.
Acknowledgements
This research received partial support from the National Natural Science Foundation of China (Grant No. 40671136) and the National High Technology Research and Development Program of China (Grant No. 2006AA120106).
References
Andrienko, G., Andrienko, N., 2004. Parallel coordinates for exploring properties of subsets CMV. In: Roberts, J. (Ed.), Proceedings International Conference on Coordinated & Multiple Views in Exploratory Visualization, London, England, July 13, 2004, pp. 93–104.
Bastin, L., Fisher, P.F., Wood, J., 2002. Visualizing uncertainty in multi-spectral remotely sensed imagery. Computers & Geosciences 28, 337–350.
Brown, K.M., Foody, G.M., Atkinson, P.M., 2009. Estimating per-pixel thematic uncertainty in remote sensing classifications. International Journal of Remote Sensing 30, 209–229.
Dungan, J.L., Kao, D., Pang, A., 2002. The uncertainty visualization problem in remote sensing analysis. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium, Toronto, Canada, June 2, 2002, pp. 729–731.
Edsall, R.M., 2003. The parallel coordinate plot in action: design and use for geographic visualization. Computational Statistics and Data Analysis 43, 605–619.
Foody, G.M., Atkinson, P.M., 2002. Uncertainty in Remote Sensing and GIS. Wiley Blackwell, London.
Foody, G.M., 1996. Approaches for the production and evaluation of fuzzy land cover classifications from remotely-sensed data. International Journal of Remote Sensing 17, 1317–1340.
Gahegan, M., Takatsuka, M., Wheeler, M., Hardisty, F., 2002. Introducing GeoVISTA Studio: an integrated suite of visualization and computational methods for