Evaluation tools

Summary of suggested tools for comparison and evaluation

Diversity between classifications

The following indices, which measure the similarity between two partitions (classifications) of the same set of observations, have been calculated: Rand Index [RI], Adjusted Rand Index [ARI], Jaccard Index [JI], Mutual Information [MI], Normalized Mutual Information [NMI].
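As an illustration, the five indices can be computed from the contingency table of two label sequences. The following is a minimal sketch, not the code used in the project: the function name `partition_similarity` is hypothetical, the Jaccard index is taken in its pair-counting form, and NMI is assumed to be normalised by the geometric mean of the two entropies, as in Strehl and Ghosh (2002).

```python
from collections import Counter
from math import comb, log, sqrt


def partition_similarity(a, b):
    """Return RI, ARI, JI, MI and NMI for two label sequences of equal length."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two label sequences of equal length >= 2")
    n = len(a)
    pairs = comb(n, 2)                      # number of pairs of observations
    joint = Counter(zip(a, b))              # contingency table
    rows, cols = Counter(a), Counter(b)     # class sizes in each partition

    same_both = sum(comb(c, 2) for c in joint.values())  # together in both
    same_a = sum(comb(c, 2) for c in rows.values())      # together in a
    same_b = sum(comb(c, 2) for c in cols.values())      # together in b
    diff_both = pairs - same_a - same_b + same_both      # separated in both

    ri = (same_both + diff_both) / pairs
    expected = same_a * same_b / pairs      # chance agreement (Hubert and Arabie)
    max_index = 0.5 * (same_a + same_b)
    ari = 1.0 if max_index == expected else (same_both - expected) / (max_index - expected)
    union = same_a + same_b - same_both
    ji = 1.0 if union == 0 else same_both / union

    mi = sum(c / n * log(c * n / (rows[x] * cols[y])) for (x, y), c in joint.items())
    h_a = -sum(c / n * log(c / n) for c in rows.values())
    h_b = -sum(c / n * log(c / n) for c in cols.values())
    # Assumption: a partition with a single class carries no information,
    # so NMI is set to 0 in that degenerate case.
    nmi = mi / sqrt(h_a * h_b) if h_a > 0 and h_b > 0 else 0.0
    return {"RI": ri, "ARI": ari, "JI": ji, "MI": mi, "NMI": nmi}
```

All indices except MI depend only on which observations are grouped together, so relabelling the types of one classification does not change the result.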

Separability and within-type variability of classifications

Evaluation criteria are: the Pattern Correlation Ratio (PCR, expressed as percentages), the Within-Type Standard Deviation (WSD), the Explained Variation (EV, expressed as percentages), the Pseudo-F Statistic (PF) and the Silhouette Index (SIL).

These criteria are briefly described in Table 1.

Evaluation criteria have been estimated based on the following variables: mean sea level pressure (MSLP), 2 m temperature (2mT), large-scale precipitation (LSP), convective precipitation (CP) and precipitation sum (PRCP = LSP + CP).
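To illustrate two of the criteria named above, the sketch below computes the Explained Variation and the Pseudo-F statistic for one variable and one classification. It assumes the usual sum-of-squares definitions, EV = 1 - WSS/TSS and PF = [(TSS - WSS)/(k - 1)] / [WSS/(n - k)], where TSS and WSS are the total and within-type sums of squares, n is the number of days and k the number of types. The function name and the data layout are hypothetical, and the definitions used in the project may differ in detail.

```python
def explained_variation_and_pseudo_f(data, labels):
    """Return (EV in percent, PF).

    data   -- list of equal-length numeric vectors, one per day
              (e.g. gridded MSLP flattened to a list)
    labels -- circulation type of each day
    Assumes at least two types and a non-zero within-type sum of squares.
    """
    n = len(data)
    dim = len(data[0])

    # Group the days by circulation type.
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    k = len(groups)

    # Total sum of squares around the overall mean.
    overall = [sum(x[j] for x in data) / n for j in range(dim)]
    tss = sum((x[j] - overall[j]) ** 2 for x in data for j in range(dim))

    # Within-type sum of squares around each type centroid.
    wss = 0.0
    for members in groups.values():
        centroid = [sum(x[j] for x in members) / len(members) for j in range(dim)]
        wss += sum((x[j] - centroid[j]) ** 2 for x in members for j in range(dim))

    ev = 100.0 * (1.0 - wss / tss)
    pf = ((tss - wss) / (k - 1)) / (wss / (n - k))
    return ev, pf
```

For example, four days with values 1, 2, 9 and 10 split into the types {1, 2} and {9, 10} give TSS = 65 and WSS = 1, hence EV of about 98.5 % and PF = 128.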

Table 1: Selected criteria for evaluating circulation classifications.


Indicators used to assess links between the occurrence of a phenomenon (e.g. flood) and circulation patterns.

- Indicator 1: frequency anomaly

Measure of the contribution of pattern type i to the occurrence of floods: the relative number of days with pattern i in the N=N* days preceding the flood, compared with the frequency of pattern i expected by pure chance during the season considered. This measure has been modified so that negative values indicate that pattern i occurs less often during a flood than usual, and positive values that it occurs more often than usual. Significance is assessed using the chi-squared test.
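The exact form of the modified measure is not given here. The sketch below therefore assumes one plausible choice, the relative anomaly f_flood / f_season - 1, which is negative when pattern i is less frequent than usual before floods and positive when it is more frequent. The function name, the argument layout (one type label per day of the season, flood days given as indices into that list) and the anomaly formula are all assumptions, and the chi-squared test is not included.

```python
from collections import Counter


def frequency_anomaly(types, flood_days, n_star):
    """Relative frequency anomaly of each pattern before floods.

    types      -- circulation type of each day of the season, in time order
    flood_days -- indices into `types` of the days on which a flood occurred
    n_star     -- number of days preceding each flood that are examined
    Windows of different floods may overlap; such days are counted once
    per flood in this sketch.
    """
    seasonal = Counter(types)
    total_days = len(types)

    # Count pattern occurrences in the n_star days before each flood.
    before_flood = Counter()
    window_days = 0
    for day in flood_days:
        window = types[max(0, day - n_star):day]
        before_flood.update(window)
        window_days += len(window)
    if window_days == 0:
        raise ValueError("no days available before the given floods")

    return {
        t: (before_flood[t] / window_days) / (seasonal[t] / total_days) - 1.0
        for t in seasonal
    }
```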


- Indicator 2: persistence measure

Conditional probability of finding at least k days out of N* with pattern or pattern group i, given that a flood occurred on day zero.


Significance is assessed by comparing this conditional probability with the binomial probability of at least k days out of N* with pattern i, using historical frequencies of occurrence.
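The two probabilities being compared can be sketched as follows. The binomial reference is P(X >= k) = sum over j = k..N* of C(N*, j) p^j (1 - p)^(N* - j), where p is the historical frequency of pattern i. The function names are hypothetical, and the code assumes that the N* days are the days immediately preceding the flood day and that floods with fewer than N* preceding days are skipped.

```python
from math import comb


def binomial_tail(k, n_star, p):
    """Probability of at least k days with the pattern out of n_star,
    if days were independent with historical frequency p."""
    return sum(
        comb(n_star, j) * p ** j * (1.0 - p) ** (n_star - j)
        for j in range(k, n_star + 1)
    )


def persistence_probability(types, flood_days, pattern_group, k, n_star):
    """Fraction of floods preceded by at least k days (out of n_star)
    whose circulation type belongs to pattern_group."""
    hits = 0
    used = 0
    for day in flood_days:
        if day < n_star:
            continue  # incomplete window at the start of the record
        window = types[day - n_star:day]
        used += 1
        if sum(t in pattern_group for t in window) >= k:
            hits += 1
    if used == 0:
        raise ValueError("no flood has a complete window of n_star days")
    return hits / used
```

A conditional probability well above the binomial value suggests that the pattern persists before floods more often than its historical frequency alone would explain.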

- Indicator 3: Brier Skill Score (BSS)

The Brier skill score is widely used to evaluate probability forecasts, but can also be adopted for the evaluation of classifications, where it takes a particularly simple form (Schiemann and Frei, 2009):

http://perswww.kuleuven.be/~u0044657/COST733/Table5.png .

Here, N is the total number of observations (e.g., days), N_i is the number of observations (days) with circulation type i, y_i is the relative frequency of an event (e.g., the exceedance of a threshold by some variable) during circulation type i, o bar is the climatological (unconditional) event frequency, and I is the total number of types.
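With the symbols defined above, the score can be sketched in a few lines. The code assumes that the simple form referred to is BSS = [(1/N) * sum over i of N_i (y_i - o_bar)^2] / [o_bar (1 - o_bar)], that is, the resolution term of the Brier score divided by its uncertainty term. The function name and the input layout (one type label and one event flag per day) are hypothetical.

```python
from collections import defaultdict


def brier_skill_score(types, events):
    """BSS of a classification with respect to a binary event.

    types  -- circulation type of each observation (day)
    events -- True/False (or 1/0) per observation: did the event occur?
    Assumes the event occurs on some but not all days (0 < o_bar < 1).
    """
    n = len(types)
    n_i = defaultdict(int)      # observations per type
    hits_i = defaultdict(int)   # events per type
    for t, e in zip(types, events):
        n_i[t] += 1
        if e:
            hits_i[t] += 1

    o_bar = sum(hits_i.values()) / n            # climatological event frequency
    resolution = sum(
        n_i[t] * (hits_i[t] / n_i[t] - o_bar) ** 2 for t in n_i
    ) / n
    return resolution / (o_bar * (1.0 - o_bar))
```

Under this form the score is 0 when every type has the climatological event frequency and 1 when each type contains either only event days or only non-event days.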

Dispersion between classifications

Gini coefficient

The Gini coefficient method [Gini 1921], based on the Lorenz curve [Lorenz 1905], can be applied to compare CTCs. To calculate the Gini coefficient G for a classification, the probability pi=mi/ni of the occurrence of days with some characteristic (e.g. high pollution concentration, large precipitation, fog) is calculated for each class, and the classes are then sorted in order of increasing pi.



where ni is the total number of days in class i (after sorting), mi is the number of days meeting the criterion in class i, N is the total number of days in all classes, M is the total number of days meeting the criterion in all classes, and L is the number of classes.
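A minimal sketch of the calculation follows. It assumes that G is obtained from the Lorenz curve by the usual trapezoidal rule, G = 1 - sum over i = 1..L of (X_i - X_{i-1}) (Y_i + Y_{i-1}), where X_i and Y_i are the cumulative sums of ni/N and mi/M over the sorted classes and X_0 = Y_0 = 0. The function name is hypothetical, and the formula intended in the text may be written differently.

```python
def gini_coefficient(n_days, m_days):
    """Gini coefficient of a classification for one event criterion.

    n_days -- total number of days in each class (ni, all > 0)
    m_days -- number of days meeting the criterion in each class (mi)
    Assumes at least one day meets the criterion (M > 0).
    """
    # Sort the classes according to rising pi = mi / ni.
    classes = sorted(zip(n_days, m_days), key=lambda nm: nm[1] / nm[0])
    total_n = sum(n_days)   # N
    total_m = sum(m_days)   # M

    x_prev = y_prev = 0.0
    twice_area = 0.0        # twice the area under the Lorenz curve
    for n_i, m_i in classes:
        x = x_prev + n_i / total_n
        y = y_prev + m_i / total_m
        twice_area += (x - x_prev) * (y + y_prev)
        x_prev, y_prev = x, y
    return 1.0 - twice_area
```

With this form G is 0 when the event is equally likely in every class. Larger values mean that the event days are concentrated in fewer classes, so the classification discriminates the event more strongly.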


Diversity indices

Hubert, L. and P. Arabie, 1985: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/hubert_arabie.pdf Comparing Partitions]. Journal of Classification, 2, 193-218. (Adjusted Rand index)

Kuncheva, L.I. and S. T. Hadjitodorov, 2004: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/kuncheva_diversity.pdf Using diversity in cluster ensembles]. 2004 IEEE International Conference on Systems, Man and Cybernetics, Vol. 2, 1214-1219. (Brief discussion of diversity indices)

Rand, W. M., 1971: Objective criteria for the evaluation of clustering methods. J. Amer. Stat. Assoc., 66, 846–850. (Rand index)

Southwood, T. R. E., 1978: Ecological Methods, 2nd edn. London: Chapman & Hall. (Jaccard index)

Strehl, A. and J. Ghosh, 2002: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Strehl2002.pdf Cluster ensembles – A knowledge reuse framework for combining partitions]. Journal of Machine Learning Research, 3, 583-617. (Mutual information)

Evaluation criteria

Calinski, T., and J. Harabasz, 1974: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Calinski1974.pdf A dendrite method for cluster analysis]. Commun. Stat., 3, 1–27. (Pseudo-F)

Huth, R., 1996: An intercomparison of computer-assisted circulation classification methods. Int. J. Climatol. 16, 893-922. (Pattern correlation ratio)

Kalkstein, L. S., G. Tan, and J. A. Skindlov, 1987: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Kalkstein1987.pdf An evaluation of three clustering procedures for use in synoptic climatological classification]. J. Appl. Meteor., 26, 717–730. (Within-type standard deviation)

Milligan, G., and M. Cooper, 1985: [http://geo21.geo.uni-augsburg.de/cost733_WG3/Literature/Milligan1985.pdf An examination of procedures for determining the number of clusters in a data set]. Psychometrika, 50, 159–179. (Comparison of evaluation criteria)

Rousseeuw, P., 1987: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. (Silhouette index)

Occurrence/Frequency criteria

Duckstein, L., Bardossy, A. and Bogardi, I., 1993: Linkage between the occurrence of daily atmospheric circulation patterns and floods: an Arizona case study. Journal of Hydrology, 143(3-4): 413-428.

Lorenz, M. O., 1905: Methods of measuring the concentration of wealth. Publications of the American Statistical Association 9, 209–219.

Gini, C., 1921: Measurement of Inequality of Incomes. The Economic Journal 31, 124–126.

cost733wiki: Cost733EvaluationTools (last edited 2012-02-09 08:31:03 by p54A4583B)