Overall Agreement Statistics
It is important to note that in each of the three situations in Table 1, the passport percentages are the same for both examiners, and if the two examiners are compared to a typical 2-×-2 test for mated data (McNemar test), there would be no difference between their performance; On the other hand, the agreement between the observers is very different in these three situations. The basic idea that must be understood here is that «agreement» quantifies the agreement between the two examiners for each of the «couples» of the scores, not the similarity of the total pass percentage between the examiners. There are a number of statistics that can be used to determine the reliability of interramas. Different statistics are adapted to different types of measurement. Some options are the common probability of an agreement, Cohens Kappa, Scott`s pi and the Fleiss`Kappa associated with it, inter-rate correlation, correlation coefficient, intra-class correlation and Krippendorff alpha. Nevertheless, important guidelines have appeared in the literature. Perhaps the first Landis and Koch stated that the values < 0 were unseable and 0-0.20 as light, 0.21-0.40 as just, 0.41-0.60 as moderate, 0.61-0.80 as a substantial agreement and 0.81-1 almost perfect. However, these guidelines are not universally accepted; Landis and Koch did not provide evidence, but relied on personal opinion. It was found that these guidelines could be more harmful than useful.  Fleiss`:218 Equally arbitrary guidelines characterize Kappas beyond 0.75 as excellent, 0.40 to 0.75 as just to good and less than 0.40 bad. If the number of categories used is small (z.B. 2 or 3), the probability of 2 advisors agreeing by pure coincidence increases considerably. This is because the two advisors must limit themselves to the limited number of options available, which affects the overall agreement rate, not necessarily their propensity to enter into an "intrinsic" agreement (an agreement is considered "intrinsic" if not due to chance).
Here, the coverage of quantity and opinion is instructive, while Kappa hides the information. In addition, Kappa poses some challenges in calculating and interpreting, because Kappa is a report. It is possible that the Kappa report returns an indefinite value due to zero in the denominator. In addition, a report does not reveal its meter or denominator. For researchers, it is more informative to report disagreements in two components, quantity and allocation. These two components more clearly describe the relationship between categories than a single synthetic statistic. If prediction accuracy is the goal, researchers may more easily begin to think about opportunities to improve a forecast using two components of quantity and assignment rather than a Kappa report.  where in is the relative correspondence observed between advisors (identical to accuracy) and pe being the hypothetical probability of a random agreement, the observed data being used to calculate the probabilities of each observer who sees each category at random.