---
title: "Confidence interval for a mean"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Confidence interval for a mean}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,comment=NA,fig.width=7,fig.height=4)
library(interpretCI)
library(glue)
```

```{r,echo=FALSE,message=FALSE}
x=meanCI(mtcars,mpg)
# Flags recording which alternative hypothesis was requested
two.sided<-greater<-less<-FALSE
if(x$result$alternative=="two.sided") two.sided=TRUE
if(x$result$alternative=="less") less=TRUE
if(x$result$alternative=="greater") greater=TRUE
twoS="The null hypothesis will be rejected if the sample mean is too big or if it is too small."
lessS="The null hypothesis will be rejected if the sample mean is too small."
greaterS="The null hypothesis will be rejected if the sample mean is too big."
```

This document is prepared automatically using the following R command.

```{r,echo=FALSE}
call=paste0(deparse(x$call),collapse="")
x1=paste0("library(interpretCI)\nx=",call,"\ninterpret(x)")
textBox(x1,italic=TRUE,bg="grey95",lcolor="grey50",width=6)
```

## Problem

```{r,echo=FALSE}
string=glue("An inventor has developed a new, energy-efficient lawn mower engine. From his stock of {x$result$n*100} engines, the inventor selects a simple random sample of {x$result$n} engines for testing. The engines run for an average of {round(x$result$m,2)} minutes on a single gallon of regular gasoline, with a standard deviation of {round(x$result$s,2)} minutes. What is the {(1-x$result$alpha)*100}% confidence interval for the average run time in minutes? (Assume that run times for the population of engines are normally distributed.)")
textBox(string)
```

## Confidence interval of mean

The approach used to solve this problem is valid when the following conditions are met.

- The sampling method must be simple random sampling.
- The sampling distribution of the mean should be approximately normal.


Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval for the mean.

### Raw data

`r ifelse(is.na(x$data[1,1]),"Raw data is not provided.","The first 10 rows of the provided data are as follows.")`

```{r,echo=FALSE}
if(!is.na(x$data[1,1])) {
    head(x$data,10)
}
```

### Sample statistics

The sample size is `r x$result$n`, the sample mean is `r round(x$result$m,2)` and the sample standard deviation is `r round(x$result$s,2)`. The confidence level is `r (1-x$result$alpha)*100`%.

### Find the margin of error

Since we do not know the standard deviation of the population, we cannot compute the standard deviation of the sample mean; instead, we compute the standard error (SE). Because the sample size is much smaller than the population size, we can use the "approximate" formula for the standard error.

$$ SE= \frac{s}{\sqrt{n}}$$

where **s** is the standard deviation of the sample and **n** is the sample size.

$$SE=\frac{`r round(x$result$s,2)`}{\sqrt{`r x$result$n`}}=`r round(x$result$se,2)`$$

Find the critical probability (p*):

```{r,echo=FALSE}
if(x$result$alternative=="two.sided"){
    string=glue("$$p*=1-\\alpha/2=1-{x$result$alpha}/2={1- x$result$alpha/2}$$")
} else{
    string=glue("$$p*=1-\\alpha=1-{x$result$alpha}={1- x$result$alpha}$$")
}
```

`r string`

The **degrees of freedom** (df) are:

$$df=n-1=`r x$result$n`-1=`r x$result$DF`$$

The **critical value** is the t statistic having `r x$result$DF` degrees of freedom and a cumulative probability equal to `r ifelse(x$result$alternative=="two.sided",1- x$result$alpha/2,1- x$result$alpha)`. From the t distribution table, we find that the critical value is `r round(x$result$critical,3)`.

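
The standard error, degrees of freedom and critical value described above can be cross-checked in base R. The following is a minimal sketch rather than the internals of `meanCI()`: it assumes the `mtcars$mpg` data used to generate this document, a two-sided interval and $\alpha = 0.05$, and all variable names are illustrative.

```r
# Illustrative base-R check; assumes a two-sided interval with alpha = 0.05
mpg      <- mtcars$mpg        # data set shipped with base R
alpha    <- 0.05              # assumed significance level
n        <- length(mpg)       # sample size
s        <- sd(mpg)           # sample standard deviation
se       <- s / sqrt(n)       # standard error of the mean
deg_free <- n - 1             # degrees of freedom
p_star   <- 1 - alpha / 2     # critical probability (two-sided case)
t_crit   <- qt(p_star, deg_free)  # critical value from the t distribution

# Collect the intermediate quantities in one named vector
round(c(n = n, s = s, SE = se, df = deg_free, critical = t_crit), 3)
```

For a one-sided interval, `p_star` would be `1 - alpha` instead of `1 - alpha / 2`.
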

```{r,echo=FALSE}
show_t_table(DF=x$result$DF,p=x$result$alpha,alternative=x$result$alternative)
```

```{r,results='asis',echo=FALSE}
if(x$result$alternative=="two.sided"){
    string=glue("$$qt(p,df)=qt({1- x$result$alpha/2},{x$result$DF})={round(x$result$critical,3)}$$")
} else {
    string=glue("$$qt(p,df)=qt({1- x$result$alpha},{x$result$DF})={round(x$result$critical,3)}$$")
}
```

`r string`

The graph below shows $\alpha$ as the tail area of the distribution.

```{r,echo=FALSE}
draw_t(DF=x$result$DF,p=x$result$alpha,alternative=x$result$alternative)
```

Compute the **margin of error** (ME):

$$ME=critical\ value \times SE$$

$$ME=`r round(x$result$critical,3)` \times `r round(x$result$se,3)`=`r round(x$result$ME,3)`$$

```{r,results='asis',echo=FALSE}
if(two.sided) {
    string="The range of the confidence interval is defined by the sample statistic $\\pm$ margin of error."
} else if(less){
    string="The range of the confidence interval is defined by $-\\infty$ (negative infinity) and the sample statistic + margin of error."
} else{
    string="The range of the confidence interval is defined by the sample statistic - margin of error and $\\infty$ (infinity)."
}
```

Specify the confidence interval. `r string` The uncertainty is expressed by the confidence level.

### Confidence interval of the mean

```{glue,results='asis',echo=FALSE}
Therefore, the {(1-x$result$alpha)*100}% confidence interval is {round(x$result$lower,2)} to {round(x$result$upper,2)}. That is, we are {(1-x$result$alpha)*100}% confident that the true population mean is in the range {round(x$result$lower,2)} to {round(x$result$upper,2)}.
```

### Plot

You can visualize the confidence interval:

```{r,echo=FALSE}
plot(x)
```

### Result of meanCI()

```{r,echo=FALSE}
print(x)
```

### Reference

The contents of this document are modified from StatTrek.com. Berman H.B., "AP Statistics Tutorial", [online] Available at: https://stattrek.com/estimation/confidence-interval-mean.aspx?tutorial=AP [Accessed: 1/23/2022].