CalibratedIQ

how IQ scores are calculated — the math behind the number

An IQ score is not simply the number of questions you answered correctly. It is a derived score that places your raw performance on a standardized scale, allowing direct comparison against a reference population. The process involves several steps: collecting raw scores, transforming them using a statistical formula, and mapping the result onto a normal distribution.

Understanding this process helps clarify why IQ scores behave the way they do: why 100 is always average, why the same raw score can produce different IQ values on different tests, and why scores at the extremes (very high or very low) are inherently less precise. For background on what IQ is and how the concept developed, see our introductory article.

raw scores vs standard scores

A raw score is the most basic measurement: how many items you answered correctly. On a 30-item test, a raw score might be 22 out of 30. On a 60-item test, it might be 47 out of 60. These numbers are not directly comparable, because the tests have different numbers of items, different difficulty levels, and different populations taking them.

A standard score solves this comparability problem. It expresses your performance relative to a specific reference group (the norming sample), using a consistent scale. The most common standard score in IQ testing uses a mean of 100 and a standard deviation of 15. This is sometimes called the Wechsler scale, after David Wechsler, who introduced it.

the normal distribution

IQ score calculation relies on the normal distribution (also called the Gaussian distribution or bell curve). When a large, representative sample of people takes a well-designed cognitive test, their scores tend to form this characteristic symmetrical shape: most scores cluster near the center, and scores become progressively rarer toward the extremes.

The normal distribution is fully defined by two parameters: the mean (the center point) and the standard deviation (a measure of how spread out the scores are). For IQ, the mean is set at 100 and the standard deviation at 15.

The standard deviation determines the width of the curve. One standard deviation above the mean is 115; one below is 85. Two standard deviations above is 130; two below is 70. These boundaries define the familiar IQ classification ranges described in the IQ scale and score chart.

the formula: from raw score to IQ

The conversion from a raw score to an IQ score proceeds in two steps. First, the raw score is converted to a z-score, which expresses how many standard deviations the score falls above or below the mean of the norming sample.

z = (X - M) / S

where X = raw score, M = mean of norming sample, S = standard deviation of norming sample

Then, the z-score is transformed onto the IQ scale:

IQ = 100 + 15z

where 100 is the IQ mean and 15 is the IQ standard deviation

For example, suppose a norming sample of 1,000 people takes a 30-item test. The mean raw score is 18 and the standard deviation is 5. A person who scores 23 out of 30 has a z-score of (23 - 18) / 5 = 1.0. Their IQ would be 100 + 15(1.0) = 115, placing them one standard deviation above the mean, at approximately the 84th percentile.
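The two-step conversion can be sketched directly in Python (the function name is illustrative, using the worked example's norms):

```python
def iq_from_raw(raw, norm_mean, norm_sd):
    """Convert a raw score to an IQ score via a z-score."""
    z = (raw - norm_mean) / norm_sd   # standard deviations from the sample mean
    return 100 + 15 * z               # map onto the IQ scale (mean 100, SD 15)

print(iq_from_raw(23, norm_mean=18, norm_sd=5))  # 115.0
```

Note that the norming sample's mean and standard deviation are baked into the conversion: the same raw score yields a different IQ under different norms.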

percentile ranks from z-scores

Once an IQ score is calculated, its corresponding percentile rank can be determined using the cumulative distribution function (CDF) of the normal distribution. The percentile indicates the percentage of the reference population that scores at or below that level.
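Both the percentile and a "1 in N" rarity figure can be computed from the normal CDF; a minimal sketch using Python's standard library (function names are illustrative):

```python
from statistics import NormalDist

def percentile_from_iq(iq, mean=100, sd=15):
    """Percentile rank: share of the population scoring at or below this IQ."""
    z = (iq - mean) / sd
    return NormalDist().cdf(z) * 100

def rarity(iq, mean=100, sd=15):
    """Approximate '1 in N' rarity for a score this far from the mean."""
    tail = 1 - NormalDist().cdf(abs(iq - mean) / sd)
    return round(1 / tail)

print(round(percentile_from_iq(115), 2))  # 84.13
print(rarity(145))                        # 741
```

The rarity calculation uses the distance from the mean, so symmetric scores (e.g. 85 and 115) share the same "1 in N" value.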

The following table shows the relationship between IQ scores, z-scores, percentile ranks, and rarity.

IQ     z-score   Percentile   Rarity
145    +3.0      99.87        1 in 741
140    +2.67     99.62        1 in 261
135    +2.33     99.01        1 in 101
130    +2.0      97.72        1 in 44
125    +1.67     95.25        1 in 21
120    +1.33     90.88        1 in 11
115    +1.0      84.13        1 in 6
110    +0.67     74.86        1 in 4
105    +0.33     63.06        1 in 3
100     0.0      50.00        1 in 2
95     -0.33     36.94        1 in 3
90     -0.67     25.14        1 in 4
85     -1.0      15.87        1 in 6
80     -1.33      9.12        1 in 11
75     -1.67      4.75        1 in 21
70     -2.0       2.28        1 in 44

ceiling and floor effects

Every test has a maximum possible raw score (the ceiling) and a minimum (the floor, typically zero). These boundaries create measurement limitations at the extremes.

A ceiling effect occurs when the test is not difficult enough to differentiate among very high-ability individuals. If several test-takers achieve a perfect or near-perfect raw score, the test cannot distinguish between them, even though their true abilities may differ considerably. This is why the Standard Progressive Matrices is less effective for assessing giftedness than the Advanced version. For the differences between these versions, see Raven's Progressive Matrices.

A floor effect is the inverse: when the test is too difficult for very low-ability individuals, many achieve zero or near-zero scores, and the test cannot discriminate within this group. Ceiling and floor effects mean that IQ estimates at the extremes (below 70 or above 145) are inherently less reliable than those near the center of the distribution.
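A toy model illustrates the ceiling effect, reusing the earlier example's norms (mean 18, SD 5, 30 items — the function and numbers are illustrative):

```python
def observed_raw(true_z, max_items=30, norm_mean=18, norm_sd=5):
    """Raw score implied by a true ability level, clipped at the test ceiling."""
    expected = norm_mean + true_z * norm_sd
    return min(round(expected), max_items)

# Two test-takers of clearly different ability produce the same raw score:
print(observed_raw(2.5))  # 30 (clipped at the ceiling)
print(observed_raw(4.0))  # 30 (also clipped; the difference is invisible)
```

Once scores pile up at the maximum, the test carries no information about differences above that point, which is exactly why harder item sets are needed for high-ability assessment.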

why different tests give different scores

It is common for a person to receive different IQ scores from different tests. This is not an error; it reflects several legitimate sources of variation.

Different norming samples. Each test is normed on a specific population sample. If the norming sample is more or less cognitively able than the general population, all derived scores will be systematically shifted. A score of 120 on one test may correspond to 115 or 125 on another, depending on who was in the norming sample.
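This shift is easy to see numerically. Using hypothetical norming statistics (the means and SD below are invented for illustration), the same raw score lands at different IQ values:

```python
def iq_score(raw, norm_mean, norm_sd):
    """Raw score to IQ, given a norming sample's mean and SD."""
    return 100 + 15 * (raw - norm_mean) / norm_sd

raw = 47  # same raw performance, two different norming samples
print(iq_score(raw, norm_mean=40, norm_sd=7))            # 115.0
print(round(iq_score(raw, norm_mean=44, norm_sd=7), 1))  # 106.4
```

A norming sample that is itself more able (higher mean) pulls every derived score downward, and vice versa.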

Different cognitive domains. Tests that emphasize fluid intelligence (like matrix reasoning) may produce different scores for the same person than tests that emphasize crystallized intelligence (like vocabulary or general knowledge). A person with strong reasoning but limited formal education might score higher on RPM than on the WAIS Verbal Comprehension Index.

Measurement error. Every psychological test has a standard error of measurement (SEM). For most IQ tests, the SEM is approximately 3 to 5 points. This means a "true" IQ of 115 might produce observed scores ranging from 110 to 120 across multiple administrations. A single test administration is always an estimate, not an exact value.
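The SEM translates directly into a confidence band around an observed score; a sketch assuming an SEM of 4 points (a value within the 3-to-5 range cited above, chosen for illustration):

```python
def confidence_interval(observed_iq, sem=4.0, z=1.96):
    """95% confidence interval around an observed IQ score."""
    margin = z * sem          # 1.96 SEMs cover ~95% of the normal distribution
    return observed_iq - margin, observed_iq + margin

lo, hi = confidence_interval(115)
print(round(lo, 1), round(hi, 1))  # 107.2 122.8
```

Professional score reports often present exactly this kind of band rather than a single point value.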

the Flynn effect

The Flynn effect, named after researcher James Flynn, is the well-documented observation that average IQ scores have risen substantially over the 20th century, at a rate of approximately 3 points per decade. This effect is most pronounced on tests of fluid intelligence, including Raven's Progressive Matrices.

The Flynn effect has practical consequences for IQ calculation. Because norming samples become outdated over time, a person taking a test normed 20 years ago will appear to score higher than they would on a recently normed test, because the older norms reflect a less-skilled reference population. This is why IQ tests are periodically re-normed, and why the specific edition of a test matters when interpreting scores.
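A back-of-envelope correction for outdated norms follows from the ~3-points-per-decade rate (this is a rough sketch, not a substitute for proper re-norming):

```python
def flynn_adjusted(score, years_since_norming, rate_per_decade=3.0):
    """Deflate a score obtained on old norms by the approximate Flynn drift."""
    return score - rate_per_decade * years_since_norming / 10

# A 120 on a test normed 20 years ago corresponds to roughly 114 on fresh norms:
print(flynn_adjusted(120, years_since_norming=20))  # 114.0
```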

The causes of the Flynn effect remain debated. Proposed explanations include improved nutrition, greater access to education, increased exposure to abstract reasoning through technology, and reduced environmental toxins. More recently, some researchers have reported a reversal of the Flynn effect in certain developed countries, with scores plateauing or declining since the 1990s.

Our test applies the same statistical methodology described above to convert your matrix reasoning performance into a calibrated IQ estimate.

Take the free IQ test
