A Meta-Analysis Of High Resolution Audio Perceptual Evaluation Article By Joshua Reiss


July 2016


1.3.2 Format discrimination studies
Studies in this category are in some sense focused on our ability to discriminate the rendering of high resolution content or formats. Many of these studies may be considered indirect discrimination, since they don't ask participants to select a stimulus or to identify whether a difference exists. Notable among these are studies that measure brain response. [44] showed that high frequency sounds are processed by the brain and observed an increase in activity when listeners were presented with broad-spectrum signals compared with those containing either the low frequency signal (below 22 kHz) or high frequency signal (above 22 kHz) alone. But this does not necessarily imply that high resolution audio is consciously, or even subconsciously, distinguished.

Other forms of indirect discrimination include studies that ask participants to identify or rate semantic descriptors [44, 52], or to perform a task with or without high resolution audio, e.g., localize a sound source [23], set listening level [46], or discriminate timing [54]. Such studies may show, at a high level, what perceptual attributes are most affected. However, the difficulty with subjecting such studies to meta-analysis is that a well-designed experiment may (correctly) give a null result on the indirect discrimination task even if participants can discriminate high resolution audio by other means.

Several studies have focused on tasks involving direct discrimination between competing high resolution audio formats. In [56], test subjects generally did not perceive a difference between DSD (64x 44.1 kHz, 1 bit) and DVD-A (176.4 kHz/16-bit) in an ABX test, whereas [57] showed a statistically significant discrimination between PCM (192 kHz/24-bit) and DSD. However, in both cases, high resolution audio formats are compared against each other. Certainly in the first case, the null result does not suggest that there would be a null result when discriminating between CD quality and a higher resolution format. The second case is intriguing, but closer inspection of the experimental set-up revealed that the two formats were subject to different processing, most notably, different filtering of the low frequency content.

2. Secondary Analysis
Table 2A lists the studies that were included in the secondary analysis and meta-analysis. For the remainder of the paper, they are referred to by 'AuthorYear' notation, to distinguish the studies from related publications (many studies were described in multiple publications, and some papers described multiple studies). In this section, we revisit data from these studies, where available, in order to perform additional analysis of the results and to present the results in a form suitable for later meta-analysis.

2.1 Transformation of study data
Yoshikawa 1995 involved discrimination of 96 kHz and 48 kHz in an AXY test. Although only t values are reported for each stimulus/participant combination, these are derived from trials with a discrete set of results. By computing all possible sets of results and comparing the resultant t values with the reported t values, we were able to estimate the number of correct answers for each participant.
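The idea of recovering counts from reported t values can be sketched as follows. This is only an illustration, not the procedure actually applied to Yoshikawa 1995: it assumes a one-sample t statistic against 50% chance on 0/1 scores, and the function names, the number of trials n and the matching tolerance are all hypothetical.

```python
import math

def t_statistic(k, n, chance=0.5):
    """One-sample t statistic for k correct answers out of n binary trials.

    Assumed form: (mean - chance) / (s / sqrt(n)), with s the unbiased
    sample standard deviation of the 0/1 scores.
    """
    mean = k / n
    var = n / (n - 1) * mean * (1 - mean)  # unbiased variance of 0/1 scores
    if var == 0:
        return None  # undefined when every answer is identical
    return (mean - chance) / math.sqrt(var / n)

def estimate_correct(reported_t, n, tol=0.01):
    """Return every count k whose t statistic matches the reported value."""
    matches = []
    for k in range(n + 1):
        t = t_statistic(k, n)
        if t is not None and abs(t - reported_t) <= tol:
            matches.append(k)
    return matches

# Hypothetical example: a reported t of 2.25 over 10 trials
print(estimate_correct(2.25, 10))
```

Because the set of possible outcomes is discrete, each reported t value should match at most a few candidate counts, which is what makes the inversion feasible.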

In King 2012, participants were asked to rate 44.1 kHz, 96 kHz and 192 kHz, all at 24 bit, and 'live' stimuli in terms of audio quality. This methodology is problematic in that the ranking may be inconclusive, yet people might still hear a difference, i.e. some may judge low sample rate as higher quality due to a personal preference, regardless of their ability to discriminate.

We were provided with the full data from the experiment. A priori, the decision was made to treat the 'live' stimuli as a reference, allowing the ranking data to be transformed into a form of A/B/X experiment. For each trial, it was treated as a correct discrimination if the highest sample rate, 192 kHz, was ranked closer to 'live' than the lowest sample rate, 44.1 kHz, and an incorrect discrimination if 44.1 kHz was ranked closer to 'live' than 192 kHz. Other rankings were excluded from analysis since they may have multiple interpretations. Thus if there is an inability to discriminate high resolution content, the probability of a correct answer is 50%.
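The transformation described above might be sketched as below. This is a minimal illustration under stated assumptions: each trial is represented as a mapping from stimulus label to rank position, "closer to 'live'" is interpreted as a smaller absolute difference in rank, and ties are treated as one of the excluded rankings. The labels, data layout and function names are hypothetical and are not taken from the King 2012 data.

```python
def classify_trial(ranks):
    """Map one ranking to True (correct), False (incorrect) or None (excluded).

    ranks: dict of stimulus label -> rank position (assumed layout).
    """
    d_high = abs(ranks["192"] - ranks["live"])   # distance of 192 kHz from 'live'
    d_low = abs(ranks["44.1"] - ranks["live"])   # distance of 44.1 kHz from 'live'
    if d_high < d_low:
        return True
    if d_low < d_high:
        return False
    return None  # equidistant: ambiguous, so excluded

def summarize(trials):
    """Count correct and total trials after dropping excluded rankings."""
    outcomes = [classify_trial(t) for t in trials]
    kept = [o for o in outcomes if o is not None]
    return sum(kept), len(kept)

# Hypothetical rankings, for illustration only
trials = [
    {"live": 1, "192": 2, "96": 3, "44.1": 4},   # 192 kHz closer: correct
    {"live": 1, "44.1": 2, "96": 3, "192": 4},   # 44.1 kHz closer: incorrect
    {"192": 1, "live": 2, "44.1": 3, "96": 4},   # equidistant: excluded
]
print(summarize(trials))
```

The resulting (correct, total) pair can then be treated like the outcome of a forced-choice test with a 50% chance rate.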

In Repp 2006, participants also provided quality ratings, in this case between 24-bit/192 kHz, 16-bit/44.1 kHz, and lower quality formats. This can be transformed into an XY test by assuming that correct discrimination is made when 24-bit/192 kHz was rated higher than 16-bit/44.1 kHz, and incorrect discrimination if 24-bit/192 kHz was rated lower than 16-bit/44.1 kHz. Results where they are rated equal are ignored, since there is no way of knowing if participants perceived a difference but simply considered it too small compared to differences between other formats, and hence cannot be categorized. Note also that here, unlike King 2012, there is no reference with which to compare the high resolution and CD formats. Thus, without training, there may be no consistent definition of quality and it may not be possible to identify correct discrimination of formats.

2.2 Meyer 2007 revisited
Meyer 2007 deserves special attention, since it is well-known and has the most participants of any study, but could only be included in some of the meta-analysis in Section 3 due to lack of data availability. This study reported that listeners could not detect a difference between an SACD or DVD-A recording, and that same recording when converted to CD quality. However, their results have been disputed, both in online forums (www.avsforum.com, www.sa-cd.net, www.hydrogenaud.io and secure.aes.org/forum/pubs/journal/) and in research publications [11, 76].

First, much of the high-resolution stimuli may not have actually contained high-resolution content, for three reasons: the encoding scheme on SACD obscures frequency components above 20 kHz and SACD players typically filter above 30 or 50 kHz; the mastering on both the DVD-A and SACD content may have applied additional low pass filters; and the source material may not all have been originally recorded in high resolution. Second, their experimental set-up was not well-described, so it is possible that high resolution content was not presented to the listener even when it was available. However, their experiment was intended to be close to a typical listening experience on a home entertainment system, and one could argue that these same issues may be present in such conditions. Third, their experiment was not controlled. Test subjects performed variable numbers of trials, with varying equipment, and usually (but not always) without training. Trials were not randomized, in the sense that A was always the DVD-A/SACD and B was always CD. In addition, A was on the left and B on the right, which introduces a further issue: if the content was panned slightly off-center, it might bias the choice of A and B.

Meyer and Moran responded to such issues by stating [76], "...there are issues with their statistical independence, as well as other problems with the data. We did not set out to do a rigorous statistical study, nor did we claim to have done so..." But all of these conditions may contribute towards Type II errors, i.e. an inability to demonstrate discrimination of high resolution audio.

Although full details of their experiment, methodology and data are not available, some interesting secondary analysis is possible. [76] noted that "the percentage of subjects who correctly identified SACD at least 70% of the time appears to be implausibly low." In trials with at least 55 subjects, only one subject had 8 out of 10 correct and 2 subjects achieved 7 out of 10 correct. The probability of no more than 3 people getting at least 7 out of 10 correct by chance is 0.97%. This suggests that the results were far from the binomial distribution that one would expect if the results were truly random.
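The 0.97% figure can be reproduced with two nested binomial calculations. The sketch below assumes exactly 55 subjects, each completing 10 independent trials with a 50% chance of a correct answer; the text only states "at least 55", so the subject count is an assumption, and the helper names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of at most k successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Chance that one guessing subject scores at least 7 out of 10
p_at_least_7 = 1 - binom_cdf(6, 10, 0.5)      # 176/1024, about 17.2%

# Chance that no more than 3 of 55 guessing subjects do so (55 is assumed)
p_few_high_scorers = binom_cdf(3, 55, p_at_least_7)

print(f"P(at least 7 of 10) = {p_at_least_7:.4f}")
print(f"P(at most 3 of 55 subjects) = {p_few_high_scorers:.4%}")
```

Under pure guessing roughly 9 of 55 subjects would be expected to reach 7 out of 10, which is why observing only 3 is so unlikely.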

If no one was able to distinguish between formats and there were no issues in the experimental design, then all trial results would be independent, regardless of whether the trials were by the same participant, and regardless of how participants are categorized. But [63] also gave a breakdown of correct answers by gender, age, audio experience and hearing ability, depicted in Table 3. Non-audiophiles, in particular, have very low success rates, 30 out of 87, a result which would occur by chance with a probability of only 0.25% (p(X≤30)=0.25%). Chi-squared analysis comparing audiophiles with non-audiophiles gives a p value of 0.18%, suggesting that it is extremely unlikely that the data for these two groups are independent. Similarly, analysis suggests that the results for those with and without strong high frequency hearing also do not appear independent, p=4.92%. Note, however, that if there was a measurable effect, one would expect some dependency between answers from the same participant. The analysis in Table 3 is based only on total correct answers, not correct answers per participant, since this data was not available.

2.3 Multiple comparisons
Some p value analysis was misleading. The discrimination tests all have a finite number of trials, each with dichotomous outcomes. Thus, they each give results with discrete probabilities, which may not align well with a given level of significance. For instance, if a discrimination trial is repeated ten times with a participant, and α=0.05, then only 9 or 10 correct could give p≤α, even though this occurs by chance with probability p=1.07%, which is much less than the significance level. This low statistical power implies that a lack of participants with p≤α may be less of an indicator of an inability to discriminate than it first appears. This should also be taken into consideration when accounting for multiple comparisons.
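The ten-trial example can be checked directly. The sketch below assumes a one-sided binomial test against a 50% chance rate; the function names are illustrative.

```python
from math import comb

def tail_prob(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_count(n, alpha):
    """Smallest number of correct answers whose tail probability is <= alpha."""
    for k in range(n + 1):
        if tail_prob(k, n) <= alpha:
            return k
    return None  # no outcome can reach significance with this few trials

k_crit = critical_count(10, 0.05)
actual_level = tail_prob(k_crit, 10)

print(f"critical count: {k_crit} of 10")
print(f"actual significance level: {actual_level:.4%}")
print(f"P(at least 8 of 10): {tail_prob(8, 10):.4%}")
```

With 10 trials, 8 correct has a tail probability of 56/1024 (about 5.47%), just above 0.05, so the threshold jumps to 9 correct and the effective significance level drops to 11/1024 (about 1.07%).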

In several studies, a small number of participants had some form of evaluation with a p value less than 0.05. This is not necessarily evidence of high resolution audio discrimination, since the more times an experiment is run, the higher the likelihood that any result may appear significant by chance. Several experiments also involved testing several distinct hypotheses, e.g., does high resolution audio sound sharper, does it sound more tense, etc. Given enough hypotheses, some are bound to have statistical significance.

This well-known multiple comparisons problem was accounted for using the Holm, Holm-Bonferroni and Sidak corrections (see Appendix), which all gave similar results, and we also looked at the likelihood of finding a lack of statistically significant results where no or very few low p values were found. This is summarized in Table 4, which also gives the actual significance levels given that each participant has a limited number of trials with dichotomous outcomes. Interestingly, the results in Table 4 agree with the results of retesting statistically significant individuals in Nishiguchi 2003 and Hamasaki 2004, confirm the statistical significance of several results in Yoshikawa 1995, and highlight the implausible lack of seemingly significant results amongst the test subjects in Meyer 2007, previously noted by [76]. For Pras 2010, they refute the significance of the specific individuals who 'anti-discriminate' (consistently misidentify the high resolution content in an ABX test), but confirm the significance of there being 3 such individuals out of 16, and similarly for the 3 significant results out of 15 stimuli.
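As a rough illustration of how such corrections operate, the sketch below implements the standard Holm-Bonferroni step-down procedure and the Sidak per-comparison threshold. It is not the analysis code behind Table 4; the function names and the example p values are hypothetical.

```python
def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha
    across m independent comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

def sidak(p_values, alpha=0.05):
    """Flag each p value that survives the Sidak-corrected threshold."""
    threshold = sidak_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]

def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: test p values from smallest to largest
    against alpha/m, alpha/(m-1), ... and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p values are also retained
    return reject

# Hypothetical per-participant p values, for illustration only
p_vals = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(p_vals))
print(sidak(p_vals))
```

In this example both procedures retain the same two results, consistent with the observation that the corrections tend to give similar outcomes when the number of comparisons is small.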


All contents copyright © 1995 - 2024 Enjoy the Music.com®
May not be copied or reproduced without permission. All rights reserved.