April 2024

Screen Time and Happiness

The effect of screen time on mental and social well-being is a subject of great concern in child-development studies. The prevailing view in the field is the “displacement hypothesis”, which holds that the harm is directly proportional to the exposure: every hour spent on a screen displaces an hour of healthier activity.

Przybylski and Weinstein published a study on this topic in Psychological Science in 2017. The research analysed data collected from 120,115 English adolescents. Mental well-being (the dependent variable) was estimated using the Warwick-Edinburgh Mental Well-Being Scale (WEMWBS). The WEMWBS has 14 items, each scored from 1 (“none of the time”) to 5 (“all the time”). The fourteen items are:

1. I’ve been feeling optimistic about the future
2. I’ve been feeling useful
3. I’ve been feeling relaxed
4. I’ve been feeling interested in other people
5. I’ve had energy to spare
6. I’ve been dealing with problems
7. I’ve been thinking clearly
8. I’ve been feeling good about myself
9. I’ve been feeling close to other people
10. I’ve been feeling confident
11. I’ve been able to make up my own mind about things
12. I’ve been feeling love
13. I’ve been interested in new things
14. I’ve been feeling cheerful

The study results

To their credit, the authors were not alarmist in their conclusions. The study found a non-linear relationship between screen time and mental well-being: well-being rose slightly with moderate screen time and then declined. The plots, however, took the following form (see the original paper in the references for the exact graph).

A casual look at the graph suggests a steady decline in mental well-being as screen time increases from 2 hours onwards. That is, until you notice the scale of the Y-axis!

In a 14-item survey scored 1 to 5 per item, the overall score can range from 14 (minimum) to 70 (maximum). In the published plot, however, the Y-axis ran only from 40 to 50, visually exaggerating the impact. Had it been plotted following the (unwritten) rules of visualisation, it would have looked like this:
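As a sketch of the axis effect, the following base R snippet draws the same curve on a truncated and a full scale. The numbers are invented purely for illustration; they are not the study's data.

```r
screen_hours <- 0:7
# Invented well-being scores -- NOT the study's values -- drifting a few
# points around the mid-40s, rising slightly before declining.
wellbeing <- c(46, 47, 47.5, 47, 46.5, 46, 45, 44.5)

# Truncated Y-axis (roughly the published framing): the decline looks steep.
plot(screen_hours, wellbeing, type = "l", ylim = c(40, 50))

# Full WEMWBS range (14 to 70): the same curve looks almost flat.
plot(screen_hours, wellbeing, type = "l", ylim = c(14, 70))
```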

To conclude

Screen time does affect the mental well-being of adolescents: well-being increases slightly at low exposure and then declines. But the magnitude of the decrease (from zero screen time to 7 hours) is about 3 points on a scale that runs from 14 to 70.

References

Andrew K. Przybylski and Netta Weinstein, A Large-Scale Test of the Goldilocks Hypothesis: Quantifying the Relations between Digital-Screen Use and the Mental Well-Being of Adolescents, Psychological Science, 2017, Vol. 28(2) 204–215.
Joshua Marmara, Daniel Zarate, Jeremy Vassallo, Rhiannon Patten, and Vasileios Stavropoulos, Warwick Edinburgh Mental Well-Being Scale (WEMWBS): measurement invariance across genders and item response theory examination, BMC Psychol. 2022; 10: 31.


Generalised Linear Models

In an ordinary linear model, such as a linear regression exercise, we express the dependent variable (y) as a function of the independent variable (x) as:

y_i = \beta_0 + \beta_1 x_i + \epsilon_i

The equation is divided into two parts. The first part is the equation of a line.

\mu_i = \beta_0 + \beta_1 x_i

where \beta_0 is the intercept and \beta_1 the slope of the line. This describes only the idealised line (the approximation); the error term must be added to account for the points scattered around it. The second part is that error term.

\epsilon_i \sim N(0, \sigma^2)

In linear regression, the points around the line are assumed to be normally distributed, so the error term is normal.
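As a quick sketch of the ordinary linear model in R, using the built-in mtcars data (chosen only for illustration):

```r
# Ordinary linear regression: model fuel efficiency (mpg) as a linear
# function of car weight (wt), i.e. mpg_i = beta_0 + beta_1 * wt_i + e_i
fit <- lm(mpg ~ wt, data = mtcars)

coef(fit)           # beta_0 (intercept) and beta_1 (slope)
summary(fit)$sigma  # estimated standard deviation of the error term
```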

LM to GLM

Imagine the dependent variable is binary: it takes only the values one and zero. The random component (error term) is no longer normally distributed in such cases. That is where generalised linear models (GLMs) come in. The first part remains the same, while the second part can take other distributions. In the binary case, as in logistic regression, the GLM uses a binomial distribution, connected to the linear predictor through a link function.

The systematic component:

\eta = \beta_0 + \beta_1 x

The link function, which connects the linear predictor to the mean of the distribution:

\eta = \mathrm{logit}(\mu)

The random component is the binomial error distribution family:

y_i \sim \mathrm{Binomial}(1, \mu_i)

In Poisson regression, the random component takes a Poisson distribution, and the link is the logarithm.

\eta = \beta_0 + \beta_1 x

\eta = \log(\mu)

y_i \sim \mathrm{Poisson}(\mu_i)

In R, you fit a GLM with glm(), passing the distribution through the family argument, e.g. glm(formula, family = binomial). Calling a family function shows its default link:

binomial()
Family: binomial
Link function: logit

poisson()
Family: poisson
Link function: log
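A minimal working sketch, again with the built-in mtcars data (am is a 0/1 variable, so it is purely illustrative here):

```r
# Logistic regression: the binomial family with its default logit link
fit <- glm(am ~ mpg, data = mtcars, family = binomial)

family(fit)$family  # "binomial"
family(fit)$link    # "logit"
```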


Poisson Regression 

We have seen the use of linear regression models for continuous variables and logistic regression for binary variables. Another class of variables is called count variables, such as:

The number of claims received per month by an insurance company
Weekly accidents happening in a particular region

The dependent variables (claims or accidents) in these examples have a few standard features. The data represent numbers (counts) or rates (counts per time). They also can only take values of zero or positive discrete numbers. As the Poisson random variable is used to model counts, the relevant regression in the above examples could be Poisson regression.
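As a sketch, the built-in warpbreaks data (counts of yarn breaks per loom) can stand in for the claims or accidents above:

```r
# Poisson regression: 'breaks' is a count variable, so the log of its
# expected value is modelled as linear in the predictors.
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

coef(fit)       # estimates on the log scale
exp(coef(fit))  # multiplicative effects on the expected count
```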


Flight Accidents

Year  Accidents
1976  24
1977  25
1978  31
1979  31
1980  22
1981  21
1982  26
1983  20
1984  16
1985  22

We assume that flight accidents are random and independent, which suggests a Poisson likelihood for the data-generating process. Let Y be the number of accidents occurring within a time interval.

Y|\theta \sim \mathrm{Poisson}(\theta)

Theta is the (unknown) parameter of interest, and y is the data (a total of 10 yearly observations). We will use Bayes’ theorem to estimate the posterior distribution p(theta|y) from a prior, p(theta). As established earlier, we select the gamma distribution for the prior (the conjugate pair of the Poisson).
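With a conjugate Gamma(a, b) prior, the posterior is available in closed form: Gamma(a + sum(y), b + n). A sketch, where the prior parameters a = b = 0.01 are an arbitrary weakly-informative choice, not values from the original analysis:

```r
y <- c(24, 25, 31, 31, 22, 21, 26, 20, 16, 22)  # accidents, 1976 to 1985

# Weakly-informative Gamma(a, b) prior; a = b = 0.01 is an illustrative
# assumption, not a value taken from the source.
a <- 0.01
b <- 0.01

# Conjugate update: theta | y ~ Gamma(a + sum(y), b + n)
a_post <- a + sum(y)     # shape: 0.01 + 238
b_post <- b + length(y)  # rate:  0.01 + 10

a_post / b_post                          # posterior mean, near the sample mean 23.8
qgamma(c(0.025, 0.975), a_post, b_post)  # 95% credible interval for theta
```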


Bland-Altman Plot

Bland-Altman analysis is used to study the agreement between two measurements. Here is how it is created.

Step 1: Collect the two measurements

Sample_Data <- data.frame(
  A = c(6, 5, 3, 5, 6, 6, 5, 4, 7, 8, 9, 10, 11, 13, 10, 4, 15, 8, 22, 5),
  B = c(5, 4, 3, 5, 5, 6, 8, 6, 4, 7, 7, 11, 13, 5, 10, 11, 14, 8, 9, 4))

Step 2: Calculate the pairwise mean of measurement 1 and measurement 2

Sample_Data$average <- rowMeans(Sample_Data[, c("A", "B")])

Step 3: Calculate the difference between measurement 1 and measurement 2

Sample_Data$difference <- Sample_Data$A - Sample_Data$B

Step 4: Calculate the limits of agreement (mean difference ± 1.96 standard deviations, for 95% limits)

mean_difference <- mean(Sample_Data$difference)
lower_limit <- mean_difference - 1.96 * sd(Sample_Data$difference)
upper_limit <- mean_difference + 1.96 * sd(Sample_Data$difference)

Step 5: Create a scatter plot with the mean on the X-axis and the difference on the Y-axis. Mark the limits and the mean of difference.

library(ggplot2)

ggplot(Sample_Data, aes(x = average, y = difference)) +
  geom_point(size = 3) +
  geom_hline(yintercept = mean_difference, color = "red", lwd = 1.5) +
  geom_hline(yintercept = lower_limit, color = "green", lwd = 1.5) +
  geom_hline(yintercept = upper_limit, color = "green", lwd = 1.5) +
  ylab("Difference") +
  xlab("Average")


Larry Bird and Binomial Distribution

Following are the paired free-throw statistics from two seasons of basketball great Larry Bird.

Total pairs of throws: 338
Pairs where both throws missed: 5
Pairs where one missed: 82
Pairs where both made: 251

Test the hypothesis that Mr Bird’s paired free throws follow a binomial distribution with p = 0.8.
H0: The number of successes in a pair of throws follows a binomial distribution with p = 0.8
HA: The distribution does not follow a binomial distribution with p = 0.8

We will use the chi-square goodness-of-fit test. The probabilities of making 0, 1 or 2 free throws in a pair, for a shooter with a success probability of 0.8, are:

bino_prob <- dbinom(0:2, 2, 0.8)
bino_prob
[1] 0.04 0.32 0.64

The chi-square test is:

observed <- c(5, 82, 251)  # both missed, one made, both made
chisq.test(observed, p = bino_prob)

	Chi-squared test for given probabilities

data:  observed
X-squared = 17.256, df = 2, p-value = 0.000179

Since the p-value is well below 0.05, we reject H0: the outcomes of Bird’s throw pairs do not follow a Binomial(2, 0.8) distribution. The data show fewer double misses and more double makes than the model predicts.


This Sentence is False!

‘This sentence is false’ is an example of what is known as the Liar Paradox.

This sentence is false.

Consider the first option: the sentence is true. If it is true, then what it says about itself holds, namely that it is false. It would then be both true and false, a contradiction, so the answer ‘true’ is not acceptable.

Now the second option: the sentence is false. If it is false, then its claim that it is false must be wrong, which makes the sentence true. Again, a contradiction.


The Z-score and Percentile

Suppose the scores obtained by students follow a normal distribution with a mean of 75 and a standard deviation of 10, and the top 10% gain admission to the university. What is the minimum mark a student needs for admission?

The first step is to convert the percentile to a Z-score. It can be done in either of two ways.

qnorm(0.1, lower.tail = FALSE)
qnorm(0.9, lower.tail = TRUE)
[1] 1.281552

Note that if you do not specify, the default for qnorm will be lower.tail = TRUE.

Z = (X - mean) / standard deviation
X = Z × standard deviation + mean
X = 1.28 × 10 + 75 = 87.8
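The two steps can also be collapsed into one call by giving qnorm the distribution’s mean and standard deviation directly:

```r
# The 90th percentile of N(75, 10) is the admission cutoff
qnorm(0.9, mean = 75, sd = 10)
# [1] 87.81552
```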


Test for Independence – Illustration

We have seen how R calculates the chi-squared test for independence. This time, we will estimate it manually while developing an intuition of the calculations. Here are the observed values.

        High School  Bachelors  Masters  Ph.D.  Total
Female           60         54       46     41    201
Male             40         44       53     57    194
Total           100         98       99     98    395

Now, the expected values are estimated by assuming independence, which allows us to multiply the marginal probabilities to obtain the joint probabilities.

First cell

The observed frequency for female and high school is 60. If gender and education are independent, the expected frequency is the product of the marginal probabilities (being female and being in high school) times the total: (201/395) × (100/395) × 395 = 50.88. The final multiplication by 395 converts the probability into a frequency. The other cells are estimated the same way.

        High School  Bachelors  Masters  Ph.D.  Total
Female        50.88      49.87    50.38  49.87    201
Male          49.11      48.13    48.62  48.13    194
Total           100         98       99     98    395
chi-squared = sum of (observed - expected)² / expected
= (60 - 50.88)²/50.88 + (54 - 49.87)²/49.87 + (46 - 50.38)²/50.38 + (41 - 49.87)²/49.87 + (40 - 49.11)²/49.11 + (44 - 48.13)²/48.13 + (53 - 48.62)²/48.62 + (57 - 48.13)²/48.13
= 8.008746

You can look up 8.008746 in the chi-squared table with degrees of freedom (2 - 1) × (4 - 1) = 3 for the p-value, which is about 0.046.
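The manual calculation can be checked against R’s chisq.test by passing the observed counts as a matrix (no continuity correction is applied for tables larger than 2×2). Because R uses unrounded expected frequencies, the statistic differs slightly from the hand calculation with two-decimal expecteds:

```r
# Observed counts: rows are gender, columns are education level
observed <- matrix(c(60, 54, 46, 41,
                     40, 44, 53, 57),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Female", "Male"),
                                   c("High School", "Bachelors", "Masters", "Ph.D.")))

chisq.test(observed)  # X-squared about 8.006 with df = 3, p-value about 0.046
```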
