The Art of Statistics

@tags:: #lit✍/📚book/highlights
@links::
@ref:: The Art of Statistics
@author:: David Spiegelhalter

2021-12-29 David Spiegelhalter - The Art of Statistics


Reference

Notes

Quote

Figure 0.3 The PPDAC problem-solving cycle, going from Problem, Plan, Data, Analysis to Conclusion and communication, and starting again on another cycle.
- Location 418
-

Quote

Turning experiences into data is not straightforward, and data is inevitably limited in its capacity to describe the world.
- Location 458
-

Quote

Data that records whether individual events have happened or not is known as binary data, as it can only take on two values,
- Location 501
-

Quote

table can be considered as a type of graphic, and requires careful design choices of colour, font and language to ensure engagement and readability. The audience’s emotional response to the table may also be influenced by the choice of which columns to display.
- Location 512
-

Quote

negative or positive framing, and its overall effect on how we feel is intuitive and well-documented: ‘5% mortality’ sounds worse than ‘95% survival’.
- Location 516
-

Quote

both positive and negative frames should be presented if we want to provide impartial information, although the order of columns might still influence how the table is interpreted. The order of the rows of a table also needs to be considered carefully. Table 1.1 shows the hospitals in order of the number of operations in each, but if they had been presented, say, in order of mortality rates with the highest at the top of the table, this might give the impression that this was a valid and important way of comparing hospitals.
- Location 530
-

Quote

Alberto Cairo, author of influential books on data visualization, suggests you should always begin with a ‘logical and meaningful baseline’, which in this situation appears difficult to identify: my rather arbitrary choice of 86% roughly represents the unacceptably low survival in Bristol twenty years previously.
- Location 542
-

Quote

Horizontal bar-chart of 30-day survival rates for thirteen hospitals. The choice of the start of the horizontal axis, here 86%, can have a crucial effect on the impression given by the graphic. If the axis starts at 0%, all the hospitals will look indistinguishable, whereas if we started at 95% the differences would look misleadingly dramatic.
- Location 548
-
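The effect of the axis start can be checked with simple arithmetic: a bar's drawn length is its value minus the axis start, so the ratio between two bars changes with the baseline. A minimal sketch, assuming two hypothetical survival rates (98% and 96%, not taken from the book) and a made-up helper name `bar_length_ratio`:

```python
def bar_length_ratio(higher: float, lower: float, axis_start: float) -> float:
    """Ratio of the two drawn bar lengths when the axis starts at axis_start."""
    if lower <= axis_start:
        raise ValueError("axis_start must be below both values")
    return (higher - axis_start) / (lower - axis_start)


# Hypothetical survival rates (percent) for two hospitals; not from the book.
hospital_a, hospital_b = 98.0, 96.0

# The three axis starts discussed in the quote: 0%, 86% and 95%.
for axis_start in (0.0, 86.0, 95.0):
    ratio = bar_length_ratio(hospital_a, hospital_b, axis_start)
    print(f"axis starts at {axis_start:.0f}%: bar A is {ratio:.2f}x the length of bar B")
```

With these assumed values, the same two-point difference looks like a ratio of about 1.02 at a 0% start, 1.2 at 86%, and 3 at 95%.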

Quote

Categorical variables are measures that can take on two or more categories, which may be
• Unordered categories: such as a person’s country of origin, the colour of a car, or the hospital in which an operation takes place.
• Ordered categories: such as the rank of military personnel.
• Numbers that have been grouped: such as levels of obesity, which is often defined in terms of thresholds for the body mass index (BMI).
- Location 559
-
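The "numbers that have been grouped" case can be sketched as a threshold lookup. The quote does not give the thresholds; the cut-offs below are the commonly used WHO adult BMI bands and are an assumption here, as are the names `THRESHOLDS`, `LABELS` and `bmi_category`:

```python
from bisect import bisect_right

# Assumed cut-offs (WHO adult BMI bands); the quote itself lists no numbers.
THRESHOLDS = [18.5, 25.0, 30.0]
LABELS = ["underweight", "normal weight", "overweight", "obese"]


def bmi_category(bmi: float) -> str:
    """Map a numeric BMI onto an ordered category via its thresholds."""
    # bisect_right counts how many thresholds are <= bmi,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(THRESHOLDS, bmi)]


for value in (17.0, 22.0, 27.5, 31.0):
    print(f"BMI {value}: {bmi_category(value)}")
```

The resulting categories are ordered, so they behave like the "rank" example in the quote rather than like an unordered variable such as country of origin.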

Quote

pie charts allow an impression of the size of each category relative to the whole pie, but are often visually confusing, especially if they attempt to show too many categories in the same chart, or use a three-dimensional representation that distorts areas.
- Location 565
-

Quote

Comparisons are better based on height or length alone in a bar chart.
- Location 569
-

Quote

The figure of 18% is known as a relative risk since it represents the increase in risk of getting bowel cancer between a group of people who eat 50g of processed meat a day, which could, for example, represent a daily two-rasher bacon sandwich, and a group who don’t.
- Location 593
-

Quote

absolute risk, which means the change in the actual proportion in each group who would be expected to suffer the adverse event.
- Location 595
-
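To see how a relative risk translates into an absolute risk, the relative increase has to be applied to a baseline. A minimal sketch: the 18% figure comes from the quote above, while the 6% baseline risk is an illustrative assumption that is not stated in these highlights, and `absolute_risk` is a made-up helper name.

```python
def absolute_risk(baseline_risk: float, relative_increase: float) -> float:
    """Risk in the exposed group, given a baseline risk and a relative increase."""
    return baseline_risk * (1 + relative_increase)


baseline = 0.06           # ASSUMPTION: illustrative baseline risk, not from the highlights
relative_increase = 0.18  # the 18% relative increase quoted above

exposed = absolute_risk(baseline, relative_increase)
absolute_difference = exposed - baseline

print(f"Baseline risk:          {baseline:.2%}")
print(f"Exposed-group risk:     {exposed:.2%}")
print(f"Absolute risk increase: {absolute_difference * 100:.2f} percentage points")
```

Under this assumed baseline, an "18% increase" amounts to roughly one extra percentage point, which is why the absolute figure reads so differently from the relative one.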

Quote

expected frequencies: instead of discussing percentages or probabilities, we just ask, ‘What does this mean for 100 (or 1,000) people?’ Psychological studies have shown that this technique improves understanding: in fact communicating only that this additional meat-eating led to an ‘18% increased risk’ could be considered manipulative, since we know this phrasing gives an exaggerated impression of the importance of the hazard.
- Location 603
-
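The "what does this mean for 100 (or 1,000) people?" question can be sketched as a small conversion from a probability to a whole-number count. The risks used below (6% and 7.08%) are illustrative assumptions, and `expected_frequency` and `describe` are hypothetical helper names:

```python
def expected_frequency(risk: float, group_size: int = 100) -> int:
    """Expected number of people affected in a group, rounded to a whole person."""
    return round(risk * group_size)


def describe(risk: float, group_size: int = 100, event: str = "the event") -> str:
    """Phrase a risk as an expected frequency instead of a percentage."""
    count = expected_frequency(risk, group_size)
    return f"About {count} out of {group_size:,} people would be expected to experience {event}."


# ASSUMPTION: illustrative risks without and with the exposure; not from the highlights.
print(describe(0.06, 100, "bowel cancer"))
print(describe(0.0708, 100, "bowel cancer"))
print(describe(0.0708, 1000, "bowel cancer"))
```

Rounding to whole people is a presentation choice: it loses precision but matches the way the question is asked.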

Quote

icon arrays to directly represent the expected frequencies of bowel cancer in 100 people.
- Location 606
-
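An icon array can be approximated in plain text to get a feel for the idea. A rough sketch, assuming a 10 x 10 grid with the affected icons grouped together at the start (so no counting is needed); the function name `icon_array`, the symbols and the count of 6 are illustrative choices, not from the book:

```python
def icon_array(affected: int, total: int = 100, per_row: int = 10,
               filled: str = "X", empty: str = ".") -> str:
    """Text icon array: `affected` filled icons first, the rest empty."""
    if not 0 <= affected <= total:
        raise ValueError("affected must be between 0 and total")
    icons = filled * affected + empty * (total - affected)
    # Break the single run of icons into rows of equal width.
    rows = [icons[i:i + per_row] for i in range(0, total, per_row)]
    return "\n".join(rows)


# ASSUMPTION: 6 affected people out of 100, chosen only for illustration.
print(icon_array(6))
```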

Quote

such scatter has been shown to increase the impression of unpredictability, it should only be used when there is a single additional highlighted icon. There should be no need to count icons in order to make a quick visual comparison.
- Location 608
-

Quote

using multiple ‘1 in…’ statements is not recommended, as many people find them difficult to compare. For example, when asked the question, ‘Which is the bigger risk, 1 in 100, 1 in 10 or 1 in 1,000?’, around a quarter of people answered incorrectly: the problem is that the bigger number is associated with the smaller risk, and so some mental dexterity is required to keep things clear.
- Location 618
-
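One way around the "1 in…" problem is to put every risk on a common denominator, so that a bigger number always means a bigger risk. A minimal sketch using the three risks from the quoted question; `per_thousand` is a hypothetical helper name:

```python
def per_thousand(one_in_n: int) -> float:
    """Convert a '1 in N' statement to 'how many in 1,000'."""
    return 1000 / one_in_n


# The three risks from the quoted question: 1 in 100, 1 in 10, 1 in 1,000.
risks = [100, 10, 1000]

# Smallest N first, because the smallest N is the biggest risk.
for n in sorted(risks):
    print(f"1 in {n:,} = {per_thousand(n):g} in 1,000")

biggest = max(risks, key=per_thousand)
print(f"Biggest risk: 1 in {biggest}")
```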

Quote

Although extremely common in the research literature, odds ratios are a rather unintuitive way to summarize differences in risk. If the events are fairly rare then the odds ratios will be numerically close to the relative risks, as in the case of bacon sandwiches, but for common events the odds ratio can be very different from the relative risk, and the following example shows this can be very confusing for journalists (and others).
- Location 625
-
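The claim that odds ratios track relative risks for rare events but diverge for common ones can be checked numerically. A minimal sketch: the odds of an event with probability p are p / (1 - p), and all four probabilities below are made-up illustrative values, not figures from the book.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p."""
    return p / (1 - p)


def relative_risk(p_exposed: float, p_baseline: float) -> float:
    return p_exposed / p_baseline


def odds_ratio(p_exposed: float, p_baseline: float) -> float:
    return odds(p_exposed) / odds(p_baseline)


# ASSUMPTION: illustrative probabilities for a rare and a common event.
scenarios = {
    "rare event (6% -> 7%)": (0.07, 0.06),
    "common event (50% -> 60%)": (0.60, 0.50),
}

for name, (p_exposed, p_baseline) in scenarios.items():
    rr = relative_risk(p_exposed, p_baseline)
    orr = odds_ratio(p_exposed, p_baseline)
    print(f"{name}: relative risk = {rr:.2f}, odds ratio = {orr:.2f}")
```

In the rare case the two measures nearly agree (about 1.17 versus 1.18); in the common case the relative risk is 1.2 while the odds ratio is 1.5, which is the kind of gap that makes odds ratios easy to misreport as relative risks.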

Quote

Relative risks tend to convey an exaggerated importance, and absolute risks should be provided for clarity.
- Location 653
-

Quote

Odds ratios arise from scientific studies but should not be used for general communication.
- Location 656
-

Quote

Graphics need to be chosen with care and awareness of their impact.
- Location 657
-