The Art of Statistics

@tags:: #lit✍/📚book/highlights
@links:: data science, statistics,
@ref:: The Art of Statistics
@author:: David Spiegelhalter

=this.file.name

Book cover of "The Art of Statistics"

Reference

Notes

Quote

Figure 0.3 The PPDAC problem-solving cycle, going from Problem, Plan, Data, Analysis to Conclusion and communication, and starting again on another cycle.
- Location 418
-

Quote

Turning experiences into data is not straightforward, and data is inevitably limited in its capacity to describe the world.
- Location 458
-

Quote

Data that records whether individual events have happened or not is known as binary data, as it can only take on two values,
- Location 501
-

Quote

table can be considered as a type of graphic, and requires careful design choices of colour, font and language to ensure engagement and readability. The audience’s emotional response to the table may also be influenced by the choice of which columns to display.
- Location 512
-

Quote

negative or positive framing, and its overall effect on how we feel is intuitive and well-documented: ‘5% mortality’ sounds worse than ‘95% survival’.
- Location 516
-

Quote

both positive and negative frames should be presented if we want to provide impartial information, although the order of columns might still influence how the table is interpreted. The order of the rows of a table also needs to be considered carefully. Table 1.1 shows the hospitals in order of the number of operations in each, but if they had been presented, say, in order of mortality rates with the highest at the top of the table, this might give the impression that this was a valid and important way of comparing hospitals.
- Location 530
-

Quote

Alberto Cairo, author of influential books on data visualization, suggests you should always begin with a ‘logical and meaningful baseline’, which in this situation appears difficult to identify – my rather arbitrary choice of 86% roughly represents the unacceptably low survival in Bristol twenty years previously.
- Location 542
-

Quote

Horizontal bar-chart of 30-day survival rates for thirteen hospitals. The choice of the start of the horizontal axis, here 86%, can have a crucial effect on the impression given by the graphic. If the axis starts at 0%, all the hospitals will look indistinguishable, whereas if we started at 95% the differences would look misleadingly dramatic.
- Location 548
-
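
To make the axis-start effect concrete, here is a minimal sketch of how the apparent ratio between two bars changes with the start of the axis. The two survival rates are hypothetical values chosen for illustration, not figures from the book, and `bar_length` and `visual_ratio` are helper names invented for this note.

```python
def bar_length(value_pct: float, axis_start_pct: float) -> float:
    """Length of a bar drawn from the axis start to the value, in percentage points."""
    if value_pct < axis_start_pct:
        raise ValueError("value lies below the start of the axis")
    return value_pct - axis_start_pct


def visual_ratio(high_pct: float, low_pct: float, axis_start_pct: float) -> float:
    """How many times longer the higher bar looks than the lower one."""
    return bar_length(high_pct, axis_start_pct) / bar_length(low_pct, axis_start_pct)


# ASSUMPTION: hypothetical survival rates for two hospitals, not taken from the book.
hospital_a, hospital_b = 98.0, 96.0

for axis_start in (0.0, 86.0, 95.0):
    ratio = visual_ratio(hospital_a, hospital_b, axis_start)
    print(f"axis starts at {axis_start:.0f}%: bar A looks {ratio:.2f}x as long as bar B")
```

With these made-up rates, the same two-point gap looks like almost no difference from 0%, a modest one from 86%, and a threefold one from 95%.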

Quote

Categorical variables are measures that can take on two or more categories, which may be • Unordered categories: such as a person’s country of origin, the colour of a car, or the hospital in which an operation takes place. • Ordered categories: such as the rank of military personnel. • Numbers that have been grouped: such as levels of obesity, which is often defined in terms of thresholds for the body mass index (BMI).
- Location 559
- categorization,

Quote

pie charts allow an impression of the size of each category relative to the whole pie, but are often visually confusing, especially if they attempt to show too many categories in the same chart, or use a three-dimensional representation that distorts areas.
- Location 565
- data visualization, pie charts,

Quote

Comparisons are better based on height or length alone in a bar chart.
- Location 569
- data visualization,

Quote

The figure of 18% is known as a relative risk since it represents the increase in risk of getting bowel cancer between a group of people who eat 50g of processed meat a day, which could, for example, represent a daily two-rasher bacon sandwich, and a group who don’t.
- Location 593
-

Quote

absolute risk, which means the change in the actual proportion in each group who would be expected to suffer the adverse event.
- Location 595
-
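
A small worked sketch of how the two measures relate. The 18% relative increase is the figure from the highlight above; the 6% baseline risk is an assumption added for illustration (the highlights do not state a baseline), and the function name is hypothetical.

```python
def absolute_risk_change(baseline_risk: float, relative_increase: float) -> float:
    """Change in absolute risk implied by a relative increase on a baseline risk."""
    return baseline_risk * relative_increase


baseline = 0.06           # ASSUMPTION: illustrative baseline risk, not from the highlights
relative_increase = 0.18  # the 18% relative risk quoted above

exposed = baseline * (1 + relative_increase)
change = absolute_risk_change(baseline, relative_increase)

print(f"baseline risk:   {baseline:.1%}")
print(f"exposed risk:    {exposed:.1%}")
print(f"absolute change: {change * 100:.2f} percentage points")
```

Under that assumed baseline, an 18% relative increase corresponds to an absolute change of roughly one percentage point.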

Quote

expected frequencies: instead of discussing percentages or probabilities, we just ask, ‘What does this mean for 100 (or 1,000) people?’ Psychological studies have shown that this technique improves understanding: in fact communicating only that this additional meat-eating led to an ‘18% increased risk’ could be considered manipulative, since we know this phrasing gives an exaggerated impression of the importance of the hazard.
- Location 603
- animal advocacy, communication, persuasion, percentages, vegan advocacy,
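
The "what does this mean for 100 people?" question can be sketched as a tiny helper that turns a risk into an expected-frequency sentence. The helper name and the 6% baseline are assumptions for illustration; only the 18% figure comes from the highlights.

```python
def expected_frequency_statement(risk: float, event: str, group_size: int = 100) -> str:
    """Phrase a probability as an expected count in a group of `group_size` people."""
    count = round(risk * group_size)
    return f"Out of {group_size} people, about {count} would be expected to {event}."


baseline_risk = 0.06                  # ASSUMPTION: illustrative baseline, not from the highlights
exposed_risk = baseline_risk * 1.18   # applying the quoted 18% relative increase

without_meat = expected_frequency_statement(baseline_risk, "get bowel cancer")
with_meat = expected_frequency_statement(exposed_risk, "get bowel cancer")

print("No daily processed meat:", without_meat)
print("Daily processed meat:   ", with_meat)
```

Framed this way, the comparison is between two small whole numbers per 100 people rather than a bare percentage increase.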

Quote

icon arrays to directly represent the expected frequencies of bowel cancer in 100 people.
- Location 606
-
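
A rough text-only sketch of an icon array, assuming a 10 x 10 grid with the affected icons grouped together at the start rather than scattered. The function name, the `#` and `.` symbols, and the example count are all choices made for this note, not the book's design.

```python
def icon_array(affected: int, total: int = 100, per_row: int = 10) -> str:
    """Render `total` icons with the first `affected` ones highlighted and grouped."""
    if not 0 <= affected <= total:
        raise ValueError("affected must be between 0 and total")
    icons = ["#"] * affected + ["."] * (total - affected)
    rows = [" ".join(icons[i:i + per_row]) for i in range(0, total, per_row)]
    return "\n".join(rows)


# ASSUMPTION: 6 affected people out of 100 is an illustrative count.
print(icon_array(6))
```

Grouping the highlighted icons keeps the comparison visual, so the reader does not have to count.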

Quote

such scatter has been shown to increase the impression of unpredictability, it should only be used when there is a single additional highlighted icon. There should be no need to count icons in order to make a quick visual comparison.
- Location 608
-

Quote

using multiple ‘1 in…’ statements is not recommended, as many people find them difficult to compare. For example, when asked the question, ‘Which is the bigger risk, 1 in 100, 1 in 10 or 1 in 1,000?’, around a quarter of people answered incorrectly: the problem is that the bigger number is associated with the smaller risk, and so some mental dexterity is required to keep things clear.
- Location 618
-
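
One way to sidestep the inversion is to restate every "1 in N" risk against a common denominator, so that a bigger number means a bigger risk. A minimal sketch, with a hypothetical helper name and a base of 1,000 chosen for convenience:

```python
from fractions import Fraction


def per_common_base(one_in: int, base: int = 1000) -> Fraction:
    """Express a '1 in N' risk as an expected count out of `base` people."""
    if one_in <= 0:
        raise ValueError("one_in must be a positive integer")
    return Fraction(base, one_in)


# The three risks from the question quoted above.
risks = [100, 10, 1000]

for n in risks:
    print(f"1 in {n:>4} = {per_common_base(n)} in 1000")

# Smallest N is the biggest risk; on the common base it is also the biggest number.
biggest = max(risks, key=per_common_base)
print(f"Biggest risk: 1 in {biggest}")
```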

Quote

Although extremely common in the research literature, odds ratios are a rather unintuitive way to summarize differences in risk. If the events are fairly rare then the odds ratios will be numerically close to the relative risks, as in the case of bacon sandwiches, but for common events the odds ratio can be very different from the relative risk, and the following example shows this can be very confusing for journalists (and others).
- Location 625
-
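
The rare-versus-common contrast can be checked numerically. This sketch uses the standard definitions (odds = p / (1 - p)); both pairs of risks below are illustrative assumptions rather than figures quoted in these highlights.

```python
def odds(p: float) -> float:
    """Convert a probability into odds."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def relative_risk(p_exposed: float, p_baseline: float) -> float:
    return p_exposed / p_baseline


def odds_ratio(p_exposed: float, p_baseline: float) -> float:
    return odds(p_exposed) / odds(p_baseline)


# ASSUMPTION: illustrative (baseline, exposed) risks for a rare and a common event.
scenarios = {"rare event": (0.06, 0.07), "common event": (0.85, 0.87)}

for name, (p0, p1) in scenarios.items():
    rr = relative_risk(p1, p0)
    oratio = odds_ratio(p1, p0)
    print(f"{name}: relative risk {rr:.2f}, odds ratio {oratio:.2f}")
```

For the rare event the two measures nearly agree (about 1.17 and 1.18), while for the common event the odds ratio stays near 1.18 even though the relative risk is only about 1.02. Reading that odds ratio as an "18% increased risk" would badly overstate the change.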

Quote

Relative risks tend to convey an exaggerated importance, and absolute risks should be provided for clarity.
- Location 653
-

Quote

Odds ratios arise from scientific studies but should not be used for general communication.
- Location 656
-

Quote

Graphics need to be chosen with care and awareness of their impact.
- Location 657
-


dg-publish: true
created: 2024-07-01
modified: 2024-07-01
title: The Art of Statistics
source: clippings
