Summary of suggested tools for comparison and evaluation
Diversity between classifications
The following indices of similarity between two partitions (classifications) of the same set of observations have been calculated: Rand Index (RI), Adjusted Rand Index (ARI), Jaccard Index (JI), Mutual Information (MI) and Normalized Mutual Information (NMI).
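A minimal sketch of how these indices can be computed from two lists of type labels, one label per day, using only the Python standard library (3.8+). The function names are hypothetical. RI, ARI and JI are computed from counts of observation pairs. MI uses natural logarithms. NMI uses the geometric-mean normalisation of Strehl and Ghosh (2002), which is one of several normalisations in use.

```python
from collections import Counter
from math import comb, log, sqrt


def _pair_counts(a, b):
    """Count observation pairs that share a type in both, in a, in b; plus all pairs."""
    if len(a) != len(b):
        raise ValueError("both classifications must cover the same observations")
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    return same_both, same_a, same_b, comb(len(a), 2)


def rand_index(a, b):
    both, sa, sb, total = _pair_counts(a, b)
    separated_both = total - sa - sb + both  # pairs split in both classifications
    return (both + separated_both) / total


def adjusted_rand_index(a, b):
    # Hubert and Arabie (1985): (index - expected) / (max - expected);
    # undefined if both classifications put all days into a single type
    both, sa, sb, total = _pair_counts(a, b)
    expected = sa * sb / total
    maximum = (sa + sb) / 2
    return (both - expected) / (maximum - expected)


def jaccard_index(a, b):
    # pairs together in both / pairs together in at least one classification
    both, sa, sb, _ = _pair_counts(a, b)
    return both / (sa + sb - both)


def entropy(a):
    n = len(a)
    return -sum(c / n * log(c / n) for c in Counter(a).values())


def mutual_information(a, b):
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    return sum(c / n * log(c * n / (pa[i] * pb[j]))
               for (i, j), c in Counter(zip(a, b)).items())


def normalized_mutual_information(a, b):
    # geometric-mean normalisation (Strehl and Ghosh, 2002)
    return mutual_information(a, b) / sqrt(entropy(a) * entropy(b))
```

Two classifications that differ only in the naming of their types give RI = ARI = JI = NMI = 1.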
Separability and within-type variability of classifications
Evaluation criteria are: the Pattern Correlation Ratio (PCR, expressed as percentages), the Within-Type Standard Deviation (WSD), the Explained Variation (EV, expressed as percentages), the Pseudo-F Statistic (PF) and the Silhouette Index (SIL).
These criteria are briefly described in Table 1.
The criteria have been computed for the following variables: mean sea level pressure (MSLP), 2 m temperature (2mT), large-scale precipitation (LSP), convective precipitation (CP) and total precipitation (PRCP = LSP + CP).
Table 1: Selected criteria for evaluating circulation classifications.
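The following standard-library sketch covers three of the criteria. It assumes that each day is a vector of values of one variable (e.g. MSLP at all grid points) and that distances are Euclidean. EV is taken as 1 - WSS/TSS in percent. PF is taken as the Calinski and Harabasz ratio of between-type to within-type sums of squares with K - 1 and N - K degrees of freedom. SIL is the mean silhouette width of Rousseeuw (1987). PCR and WSD are left out because their exact definitions, including any normalisation of the variables, are not specified in this section. All function names are hypothetical.

```python
from math import dist


def _centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def _sq_dist(x, c):
    return sum((xk - ck) ** 2 for xk, ck in zip(x, c))


def sums_of_squares(data, labels):
    """Total (TSS) and within-type (WSS) sums of squares."""
    grand = _centroid(data)
    tss = sum(_sq_dist(x, grand) for x in data)
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    wss = 0.0
    for members in groups.values():
        c = _centroid(members)
        wss += sum(_sq_dist(x, c) for x in members)
    return tss, wss


def explained_variation(data, labels):
    tss, wss = sums_of_squares(data, labels)
    return 100.0 * (1.0 - wss / tss)  # in percent


def pseudo_f(data, labels):
    n, k = len(data), len(set(labels))
    tss, wss = sums_of_squares(data, labels)
    return ((tss - wss) / (k - 1)) / (wss / (n - k))


def silhouette(data, labels):
    """Mean silhouette width; requires at least two types."""
    scores = []
    for i, xi in enumerate(data):
        by_type = {}
        for j, xj in enumerate(data):
            if j != i:
                by_type.setdefault(labels[j], []).append(dist(xi, xj))
        own = by_type.get(labels[i])
        if not own:  # a type with a single member gets s = 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the own type
        b = min(sum(d) / len(d) for lab, d in by_type.items() if lab != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Higher values of EV, PF and SIL indicate better separated, more homogeneous types. The silhouette loop is quadratic in the number of days, which matters for long daily series.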
Indicators used to assess links between the occurrence of a phenomenon (e.g. floods) and circulation patterns.
- Indicator 1: frequency anomaly
Measure of the contribution of pattern type i to the occurrence of floods: the relative frequency of days with pattern i in the N = N* days preceding a flood, compared with the purely random frequency of occurrence of pattern i during the season considered. I have modified this measure so that negative values indicate that pattern i occurs less often than usual before floods, and positive values that it occurs more often than usual. Significance is assessed using the χ² test.
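A sketch of the frequency anomaly under stated assumptions. The exact formula of the modified measure is not given above, so the relative anomaly (f_flood - f_clim) / f_clim is assumed. It is negative when pattern i is rarer than usual before floods and positive when it is more frequent. The χ² test is assumed to be a goodness-of-fit test of pattern i versus all other patterns (one degree of freedom). Its p-value then follows from the standard library via erfc, because a χ² variable with one degree of freedom is the square of a standard normal variable. The function name and arguments are hypothetical.

```python
from math import erfc, sqrt


def frequency_anomaly(window_days, season_days, pattern):
    """
    window_days: pattern labels of all days in the N* day windows preceding floods
    season_days: pattern labels of all days of the season (reference frequencies)
    Returns (relative anomaly, chi-square statistic, p-value).
    """
    n_w = len(window_days)
    observed = window_days.count(pattern)
    f_clim = season_days.count(pattern) / len(season_days)
    f_flood = observed / n_w
    # assumed form: < 0 if rarer than usual before floods, > 0 if more frequent
    anomaly = (f_flood - f_clim) / f_clim
    # chi-square goodness of fit, pattern i versus all other patterns (1 df)
    expected = n_w * f_clim
    chi2 = ((observed - expected) ** 2 / expected
            + ((n_w - observed) - (n_w - expected)) ** 2 / (n_w - expected))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return anomaly, chi2, p_value
```

The χ² approximation is only reliable when the expected counts are not too small (a common rule of thumb is at least 5 per cell). Days falling into the windows of two nearby floods are counted twice in this sketch.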
- Indicator 2: persistence measure
Conditional probability of finding at least k days out of N* with pattern or pattern group i, given that a flood occurred on day zero.
Significance is assessed by comparing this conditional probability with the binomial probability of at least k days out of N* with pattern i, computed from the historical frequencies of occurrence.
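A minimal sketch of the persistence measure. The conditional probability is estimated as the fraction of floods whose N*-day window contains at least k days of pattern (group) i. The binomial reference treats the N* days as independent draws with the historical frequency of pattern i, which is an assumption since circulation types are persistent from day to day. The way the two probabilities are compared is also an assumption: here, the chance of observing at least as many qualifying floods if each flood qualified with the binomial reference probability. Names are hypothetical.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def persistence(windows, group, k, clim_freq):
    """
    windows:   one label sequence per flood, each covering the N* days before it
    group:     set of pattern labels forming pattern (group) i
    clim_freq: historical relative frequency of pattern (group) i
    Returns (conditional probability, binomial reference probability, p-value).
    """
    n_star = len(windows[0])
    hits = sum(1 for w in windows if sum(lab in group for lab in w) >= k)
    cond_prob = hits / len(windows)
    # reference: at least k of N* days if days were independent draws (assumption)
    ref_prob = binom_tail(n_star, clim_freq, k)
    # assumed test: chance of >= hits qualifying floods under ref_prob
    p_value = binom_tail(len(windows), ref_prob, hits)
    return cond_prob, ref_prob, p_value
```

A conditional probability well above the binomial reference, with a small p-value, points to pattern i persisting before floods.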
- Indicator 3: Brier Skill Score (BSS)
The Brier skill score is widely used to evaluate probability forecasts, but it can also be adapted to the evaluation of classifications, where it takes a particularly simple form (Schiemann and Frei, 2009):

$$\mathrm{BSS} = \frac{1}{\bar{o}\,(1-\bar{o})}\,\frac{1}{N}\sum_{i=1}^{I} N_i\,\left(y_i - \bar{o}\right)^2$$

Here, $N$ is the total number of observations (e.g. days), $N_i$ is the number of observations (days) with circulation type $i$, $y_i$ is the relative frequency of an event (e.g. the exceedance of a threshold by some variable) during circulation type $i$, $\bar{o}$ is the climatological (unconditional) event frequency, and $I$ is the total number of types.
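A minimal sketch, assuming one type label and one binary event flag (e.g. threshold exceeded or not) per day. It computes the resolution term, the day-weighted mean of (y_i - ō)² over the types, and divides it by the climatological uncertainty ō(1 - ō). The function name is hypothetical.

```python
def brier_skill_score(types, events):
    """types: circulation-type label per day; events: 0/1 (or bool) per day."""
    n = len(types)
    o_bar = sum(events) / n  # climatological event frequency
    counts, hits = {}, {}
    for t, e in zip(types, events):
        counts[t] = counts.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + e
    # sum over types of N_i * (y_i - o_bar)^2, divided by N
    resolution = sum(counts[t] * (hits[t] / counts[t] - o_bar) ** 2 for t in counts) / n
    # undefined if the event never or always occurs (o_bar = 0 or 1)
    return resolution / (o_bar * (1 - o_bar))
```

The score lies between 0 (the event frequency is the same in every type) and 1 (every type either always or never comes with the event).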
Dispersion between classifications
The Gini coefficient [Gini 1921], based on the Lorenz curve [Lorenz 1905], can be applied to compare circulation type classifications (CTCs). To calculate the Gini coefficient $G$ of a classification, the probability $p_i = m_i/n_i$ of days with a given characteristic (e.g. high pollution concentration, large precipitation amounts, fog) is first calculated for each class, and the classes are then sorted by increasing $p_i$. $G$ equals one minus twice the area under the resulting Lorenz curve:

$$G = 1 - \sum_{i=1}^{L} \frac{n_i}{N}\left(Y_i + Y_{i-1}\right), \qquad Y_i = \frac{1}{M}\sum_{j=1}^{i} m_j, \qquad Y_0 = 0$$

where $n_i$ is the total number of days in class $i$ (after sorting), $m_i$ is the number of days in class $i$ meeting the criterion, $N$ is the total number of days over all classes, $M$ is the total number of days meeting the criterion over all classes, and $L$ is the number of classes.
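A minimal sketch of this procedure, assuming the per-class counts (n_i, m_i) are already available. The classes are sorted by rising p_i = m_i / n_i. The cumulative shares of all days (x-axis) and of days meeting the criterion (y-axis) trace the Lorenz curve, whose area is summed with the trapezoid rule. The function name and input layout are hypothetical.

```python
def gini(per_class):
    """per_class: list of (n_i, m_i) = (days in class i, days meeting the criterion)."""
    # sort by rising p_i = m_i / n_i (classes with n_i = 0 must be dropped first)
    rows = sorted(per_class, key=lambda nm: nm[1] / nm[0])
    n_total = sum(n for n, _ in rows)
    m_total = sum(m for _, m in rows)
    area = 0.0
    x_prev = y_prev = 0.0
    for n_i, m_i in rows:
        x = x_prev + n_i / n_total  # cumulative share of all days
        y = y_prev + m_i / m_total  # cumulative share of days meeting the criterion
        area += (x - x_prev) * (y + y_prev) / 2  # trapezoid under the Lorenz curve
        x_prev, y_prev = x, y
    return 1 - 2 * area
```

G is 0 when every class has the same p_i (the Lorenz curve is the diagonal) and grows as the days with the characteristic concentrate in fewer classes. A classification with a higher G therefore separates the characteristic more sharply.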
Hubert, L. and P. Arabie, 1985: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/hubert_arabie.pdf Comparing Partitions]. Journal of Classification, 2, 193–218. (Adjusted Rand index)
Kuncheva, L. I. and S. T. Hadjitodorov, 2004: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/kuncheva_diversity.pdf Using diversity in cluster ensembles]. 2004 IEEE International Conference on Systems, Man and Cybernetics, Vol. 2, 1214–1219. (Brief discussion of diversity indices)
Rand, W. M., 1971: Objective criteria for the evaluation of clustering methods. J. Amer. Stat. Assoc., 66, 846–850. (Rand index)
Southwood, T. R. E., 1978: Ecological Methods, 2nd edn. London: Chapman & Hall. (Jaccard index)
Strehl, A. and J. Ghosh, 2002: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Strehl2002.pdf Cluster ensembles – A knowledge reuse framework for combining multiple partitions]. Journal of Machine Learning Research, 3, 583–617. (Mutual information)
Calinski, T. and J. Harabasz, 1974: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Calinski1974.pdf A dendrite method for cluster analysis]. Commun. Stat., 3, 1–27. (Pseudo-F)
Huth, R., 1996: An intercomparison of computer-assisted circulation classification methods. Int. J. Climatol., 16, 893–922. (Pattern correlation ratio)
Kalkstein, L. S., G. Tan, and J. A. Skindlov, 1987: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Kalkstein1987.pdf An evaluation of three clustering procedures for use in synoptic climatological classification]. J. Appl. Meteor., 26, 717–730. (Within-type standard deviation)
Milligan, G. and M. Cooper, 1985: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Milligan1985.pdf An examination of procedures for determining the number of clusters in a data set]. Psychometrika, 50, 159–179. (Comparison of evaluation criteria)
Rousseeuw, P., 1987: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. (Silhouette index)
Duckstein, L., A. Bardossy, and I. Bogardi, 1993: Linkage between the occurrence of daily atmospheric circulation patterns and floods: an Arizona case study. Journal of Hydrology, 143, 413–428.
Lorenz, M. O., 1905: Methods of measuring the concentration of wealth. Publications of the American Statistical Association, 9, 209–219.
Gini, C., 1921: Measurement of inequality of incomes. The Economic Journal, 31, 124–126.