The Art of Statistics
@tags:: #litā/šbook/highlights
@links:: data science, statistics,
@ref:: The Art of Statistics
@author:: David Spiegelhalter
=this.file.name
Reference
=this.ref
Notes
Figure 0.3 The PPDAC problem-solving cycle, going from Problem, Plan, Data, Analysis to Conclusion and communication, and starting again on another cycle.
- Location 418
-
Turning experiences into data is not straightforward, and data is inevitably limited in its capacity to describe the world.
- Location 458
-
Data that records whether individual events have happened or not is known as binary data, as it can only take on two values,
- Location 501
-
table can be considered as a type of graphic, and requires careful design choices of colour, font and language to ensure engagement and readability. The audience's emotional response to the table may also be influenced by the choice of which columns to display.
- Location 512
-
negative or positive framing, and its overall effect on how we feel is intuitive and well-documented: "5% mortality" sounds worse than "95% survival".
- Location 516
-
both positive and negative frames should be presented if we want to provide impartial information, although the order of columns might still influence how the table is interpreted. The order of the rows of a table also needs to be considered carefully. Table 1.1 shows the hospitals in order of the number of operations in each, but if they had been presented, say, in order of mortality rates with the highest at the top of the table, this might give the impression that this was a valid and important way of comparing hospitals.
- Location 530
-
Alberto Cairo, author of influential books on data visualization, suggests you should always begin with a "logical and meaningful baseline", which in this situation appears difficult to identify: my rather arbitrary choice of 86% roughly represents the unacceptably low survival in Bristol twenty years previously.
- Location 542
-
Horizontal bar-chart of 30-day survival rates for thirteen hospitals. The choice of the start of the horizontal axis, here 86%, can have a crucial effect on the impression given by the graphic. If the axis starts at 0%, all the hospitals will look indistinguishable, whereas if we started at 95% the differences would look misleadingly dramatic.
- Location 548
-
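A small sketch of why the axis start matters: it compares how long two bars are drawn under the three baselines mentioned in the caption. The two survival rates (96% and 98%) are hypothetical values chosen for illustration, not figures from the book's table, and the function name is an assumption.

```python
def drawn_length_ratio(rate_a: float, rate_b: float, axis_start: float) -> float:
    """Ratio of drawn bar lengths (b over a) when the axis starts at axis_start."""
    if axis_start >= min(rate_a, rate_b):
        raise ValueError("axis_start must be below both rates")
    return (rate_b - axis_start) / (rate_a - axis_start)


# Hypothetical 30-day survival rates (percent) for two hospitals.
low, high = 96.0, 98.0

for start in (0.0, 86.0, 95.0):
    ratio = drawn_length_ratio(low, high, start)
    print(f"axis starts at {start:.0f}%: longer bar is {ratio:.2f}x the shorter one")
```

With these made-up rates the bars are nearly equal from 0%, differ by a fifth from 86%, and differ threefold from 95%, even though the underlying gap is two percentage points in every case.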
Categorical variables are measures that can take on two or more categories, which may be • Unordered categories: such as a person's country of origin, the colour of a car, or the hospital in which an operation takes place. • Ordered categories: such as the rank of military personnel. • Numbers that have been grouped: such as levels of obesity, which is often defined in terms of thresholds for the body mass index (BMI).
- Location 559
- categorization,
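The "numbers that have been grouped" case can be sketched as a threshold lookup. The cut-offs below (18.5, 25, 30 kg/m²) are the commonly used adult BMI bands and are an assumption here, since the highlight does not quote them; the labels and function name are also illustrative.

```python
from bisect import bisect_right

# Assumed adult BMI thresholds in kg/m^2 (not quoted in these notes).
THRESHOLDS = [18.5, 25.0, 30.0]
LABELS = ["underweight", "healthy weight", "overweight", "obese"]


def bmi_category(weight_kg: float, height_m: float) -> str:
    """Turn a continuous BMI value into one of four ordered categories."""
    bmi = weight_kg / height_m ** 2
    # bisect_right counts how many thresholds are <= bmi, which indexes the label.
    return LABELS[bisect_right(THRESHOLDS, bmi)]


print(bmi_category(70, 1.75))
print(bmi_category(95, 1.75))
```

The grouping discards information: two people on either side of a threshold land in different categories even if their BMI values are almost identical.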
pie charts allow an impression of the size of each category relative to the whole pie, but are often visually confusing, especially if they attempt to show too many categories in the same chart, or use a three-dimensional representation that distorts areas.
- Location 565
- data visualization, pie charts,
Comparisons are better based on height or length alone in a bar chart.
- Location 569
- data visualization,
The figure of 18% is known as a relative risk since it represents the increase in risk of getting bowel cancer between a group of people who eat 50g of processed meat a day, which could, for example, represent a daily two-rasher bacon sandwich, and a group who don't.
- Location 593
-
absolute risk, which means the change in the actual proportion in each group who would be expected to suffer the adverse event.
- Location 595
-
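To make the relative/absolute distinction concrete, here is a minimal sketch that converts the 18% relative increase into absolute terms. The baseline lifetime risk of 6 in 100 is an assumption for this example (it is not stated in these highlights), and the function name is hypothetical.

```python
def absolute_risks(baseline_risk: float, relative_increase: float) -> tuple[float, float]:
    """Return (risk in the exposed group, absolute increase), both as proportions."""
    exposed = baseline_risk * (1 + relative_increase)
    return exposed, exposed - baseline_risk


baseline = 0.06  # assumed risk for people who do not eat processed meat daily
exposed, increase = absolute_risks(baseline, relative_increase=0.18)

print(f"baseline risk:     {baseline * 100:.1f} in 100")
print(f"exposed risk:      {exposed * 100:.1f} in 100")
print(f"absolute increase: {increase * 100:.1f} in 100")
```

Under that assumed baseline, an "18% increased risk" amounts to roughly one extra case per 100 people, which is the figure an absolute-risk statement would report.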
expected frequencies: instead of discussing percentages or probabilities, we just ask, "What does this mean for 100 (or 1,000) people?" Psychological studies have shown that this technique improves understanding: in fact communicating only that this additional meat-eating led to an "18% increased risk" could be considered manipulative, since we know this phrasing gives an exaggerated impression of the importance of the hazard.
- Location 603
- animal advocacy, communication, persuasion, percentages, vegan advocacy,
icon arrays to directly represent the expected frequencies of bowel cancer in 100 people.
- Location 606
-
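A text-only sketch of an icon array, assuming expected frequencies of 6 and 7 cases per 100 people (illustrative numbers, not taken from these highlights). The affected icons are grouped together so the two arrays can be compared at a glance without counting; the function name and the `#`/`.` symbols are arbitrary choices.

```python
def icon_array(cases: int, total: int = 100, per_row: int = 10) -> str:
    """Render `cases` filled icons followed by empty ones, in rows of `per_row`."""
    if not 0 <= cases <= total:
        raise ValueError("cases must be between 0 and total")
    icons = "#" * cases + "." * (total - cases)
    rows = [icons[i:i + per_row] for i in range(0, total, per_row)]
    return "\n".join(rows)


print(icon_array(6))  # assumed expected cases without the exposure
print()
print(icon_array(7))  # assumed expected cases with the exposure
```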
such scatter has been shown to increase the impression of unpredictability, it should only be used when there is a single additional highlighted icon. There should be no need to count icons in order to make a quick visual comparison.
- Location 608
-
using multiple "1 in …" statements is not recommended, as many people find them difficult to compare. For example, when asked the question, "Which is the bigger risk, 1 in 100, 1 in 10 or 1 in 1,000?", around a quarter of people answered incorrectly: the problem is that the bigger number is associated with the smaller risk, and so some mental dexterity is required to keep things clear.
- Location 618
-
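One way to avoid the inverted comparison is to put every risk on a common denominator. The sketch below rewrites the three "1 in N" risks from the question as expected cases per 1,000 people; the function name is an assumption.

```python
def per_thousand(one_in: int) -> float:
    """Convert a '1 in N' risk to expected cases per 1,000 people."""
    if one_in <= 0:
        raise ValueError("one_in must be positive")
    return 1000 / one_in


for n in (100, 10, 1000):
    print(f"1 in {n:>5,} = {per_thousand(n):g} in 1,000")

# The biggest risk is the one with the most cases per 1,000.
print("biggest risk: 1 in", max((100, 10, 1000), key=per_thousand))
```

Once all three are expressed per 1,000, the larger number is the larger risk, so no mental inversion is needed.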
Although extremely common in the research literature, odds ratios are a rather unintuitive way to summarize differences in risk. If the events are fairly rare then the odds ratios will be numerically close to the relative risks, as in the case of bacon sandwiches, but for common events the odds ratio can be very different from the relative risk, and the following example shows this can be very confusing for journalists (and others).
- Location 625
-
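The rare-versus-common behaviour can be checked numerically. In this sketch both pairs of risks are hypothetical (6% vs 7% for a rare event, 40% vs 60% for a common one), and the function names are assumptions.

```python
def odds(p: float) -> float:
    """Odds corresponding to a probability p (0 <= p < 1)."""
    return p / (1 - p)


def relative_risk(p_exposed: float, p_baseline: float) -> float:
    return p_exposed / p_baseline


def odds_ratio(p_exposed: float, p_baseline: float) -> float:
    return odds(p_exposed) / odds(p_baseline)


# Hypothetical (exposed, baseline) risks for a rare and a common event.
for label, p1, p0 in [("rare", 0.07, 0.06), ("common", 0.60, 0.40)]:
    rr = relative_risk(p1, p0)
    orr = odds_ratio(p1, p0)
    print(f"{label:>6}: relative risk {rr:.2f}, odds ratio {orr:.2f}")
```

For the rare event the two measures nearly coincide, while for the common event the odds ratio is noticeably larger than the relative risk, which is how an odds ratio can be misread as a much bigger change in risk than actually occurred.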
Relative risks tend to convey an exaggerated importance, and absolute risks should be provided for clarity.
- Location 653
-
Odds ratios arise from scientific studies but should not be used for general communication.
- Location 656
-
Graphics need to be chosen with care and awareness of their impact.
- Location 657
-
dg-publish: true
created: 2024-07-01
modified: 2024-07-01
title: The Art of Statistics
source: clippings