Many measurements encountered in practice are approximately normally distributed, and when that assumption holds, the probability of an outcome in the population can be obtained from the standard normal distribution. For instance, we can calculate probabilities for the download time of a commercial tax preparation site that has a mean of 2.0 seconds and a standard deviation of 0.5 seconds. The first step is to standardize the observed value into a z-score, z = (x - mean) / standard deviation, after which the probability is read from the Z-probability statistical table. The questions below are answered using the mean and standard deviation given above. What is the probability that the download time is:
(a) above 1.8 seconds
Solution
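$$z = \frac{1.8 - 2.0}{0.5} = -0.40$$
$$P(X > 1.8) = P(Z > -0.40) = 1 - 0.3446 = 0.6554$$
On the standard normal curve, this probability is the shaded region to the right of z = -0.40, so the probability that the download time is greater than 1.8 seconds is about 0.6554.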
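The table lookup for part (a) can be cross-checked in software. The following is a minimal sketch that assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original solution.

```python
from statistics import NormalDist

# Download-time model stated in the text: mean 2.0 s, standard deviation 0.5 s.
download = NormalDist(mu=2.0, sigma=0.5)

# Standardize the observed value, then take the upper-tail probability.
z = (1.8 - download.mean) / download.stdev
p_above = 1 - download.cdf(1.8)  # P(X > 1.8)

print(f"z = {z:.2f}, P(X > 1.8) = {p_above:.4f}")
```

The computed probability agrees with the four-decimal value read from a Z-table.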
(b) Between 1.5 and 2.5 seconds?
Solution
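The z-scores for 1.5 and 2.5 seconds are (1.5 - 2.0) / 0.5 = -1 and (2.5 - 2.0) / 0.5 = 1, so the probability is given by
$$P(1.5 < X < 2.5) = P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826$$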
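As a hedged cross-check of part (b), the interval probability can be computed as a difference of two cumulative probabilities. The sketch again assumes `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

download = NormalDist(mu=2.0, sigma=0.5)

# P(1.5 < X < 2.5) = F(2.5) - F(1.5), where F is the cumulative distribution.
p_between = download.cdf(2.5) - download.cdf(1.5)

print(f"P(1.5 < X < 2.5) = {p_between:.4f}")
```

Software gives about 0.6827; a printed table rounded to four decimals typically gives 0.6826, and the small gap is rounding only.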
(c) 99% of the download times are slower (a higher number of seconds taken to download) than how many seconds?
Solution
We need the download time x that 99% of downloads exceed, that is, P(X > x) = 0.99, so x is the 1st percentile of the distribution. From the z-statistical table, the z-score with an area of 0.01 to its left is
$$z = -2.33$$
The download time is obtained by substituting the z-score, mean, and standard deviation into the z-score formula and solving for x:
$$x = \mu + z\sigma = 2.0 + (-2.33)(0.5) = 0.835$$
The result implies that 99% of all download times are slower than about 0.84 seconds. By contrast, z = +2.33 gives 2.0 + 2.33(0.5), or about 3.2 seconds, which is the time that 99% of downloads are faster than.
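Because the wording of part (c) is easy to misread, it can help to compute both candidate percentiles. This sketch assumes `statistics.NormalDist` and reads "99% of download times are slower than x" as P(X > x) = 0.99, which makes x the 1st percentile; the 99th percentile is shown only for comparison.

```python
from statistics import NormalDist

download = NormalDist(mu=2.0, sigma=0.5)

# Value exceeded by 99% of download times (1st percentile).
x_01 = download.inv_cdf(0.01)
# Value that 99% of download times fall below (99th percentile).
x_99 = download.inv_cdf(0.99)

print(f"1st percentile:  {x_01:.2f} s")
print(f"99th percentile: {x_99:.2f} s")
```

Under that reading, 99% of download times are slower than roughly 0.84 seconds, while roughly 3.16 seconds is the time that 99% of downloads beat.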
Part B
The analysis of raw data matters not only to researchers but also to the public, since inferences can be drawn from the results. Decision makers can also use the results to formulate policies and forecast future events. In this case, data on the cost of electricity in a large city are examined to determine whether they are normally distributed. A sample of 50 observations from the year 2005 is used in this study. The results will, in turn, help the researchers judge whether the electricity company is fair or biased in what it charges households.
Box Plot
The box plot is often used to check the distribution of the observations in a sample. It is a visual aid in data analysis that helps to determine whether the data are skewed or roughly symmetric by comparing the position of the median with the first and third quartiles and the lengths of the two whiskers.[1]
In this regard, we first arrange the data in ascending order to determine the values of the quartiles.
Quartiles
From the box plot, the utility data set appears consistent with a normal distribution, as the whiskers are of nearly equal length. Moreover, the median lies almost at the center between the first and third quartile values, which means that the data are spread evenly on both sides of the median.
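The quantities behind a box plot can be sketched directly from the 50 ordered observations listed in the normal probability plot table of this report. The quartile convention used here (`method="exclusive"` in Python's `statistics.quantiles`) is an assumption; a spreadsheet may interpolate differently and give slightly different quartiles.

```python
from statistics import quantiles

# The 50 ordered utility charges from the report's "Ordered X" column.
bills = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

# Quartiles under the assumed "exclusive" interpolation rule.
q1, q2, q3 = quantiles(bills, n=4, method="exclusive")

# With no points beyond the usual 1.5 * IQR fences, whiskers reach min and max.
lower_whisker = q1 - min(bills)
upper_whisker = max(bills) - q3

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"lower whisker = {lower_whisker}, upper whisker = {upper_whisker}")
```

Under this convention the two whiskers differ by only a fraction of a dollar, which is the symmetry the box plot is meant to reveal.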
Histogram
Similarly, an Excel spreadsheet was used to build a histogram of the cost of electricity in the sample.
Bin (upper limit)   Frequency   Cumulative %
82                  1           2.00%
101                 3           8.00%
119                 7           22.00%
138                 8           38.00%
157                 12          62.00%
176                 10          82.00%
194                 5           92.00%
213                 3           98.00%
232                 1           100.00%
More                0           100.00%
The data in the table above were used to draw the histogram of electricity cost.
The histogram shows that the electricity cost in the sample is approximately normally distributed. The number of observations to the left and to the right of the mean is roughly equal, and most of the electricity bills in the sample are clustered around the mean. There are no extreme observations in the data set, as no individual value lies far from the rest.
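The Cumulative % column of the histogram table follows from the frequencies alone. The short sketch below, with illustrative variable names, takes the bin limits and frequencies as reported and rebuilds the running percentages.

```python
from itertools import accumulate

# Bin upper limits and frequencies as reported in the histogram table.
bins = [82, 101, 119, 138, 157, 176, 194, 213, 232]
freq = [1, 3, 7, 8, 12, 10, 5, 3, 1]

total = sum(freq)
# Running total of frequencies expressed as a percentage of the sample.
cum_pct = [100 * c / total for c in accumulate(freq)]

# Bin with the highest frequency (the peak of the histogram).
modal_bin = bins[freq.index(max(freq))]

for b, f, c in zip(bins, freq, cum_pct):
    print(f"{b:>4} {f:>3} {c:6.2f}%")
```

The peak falls in the bin ending at 157, close to the sample mean, with frequencies tapering on both sides.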
Theoretical Properties
Descriptive Statistics
Statistic                   Utility Charge
Mean                        147.06
Standard Error              4.4818
Median                      148.5
Mode                        130
Standard Deviation          31.6914
Sample Variance             1004.3433
Kurtosis                    -0.5442
Skewness                    0.0158
Range                       131
Minimum                     82
Maximum                     213
Sum                         7353
Count                       50
Confidence Level (95.0%)    9.0066
From the table above, the mean is slightly less than the median, by $1.44. This implies that a majority of the households in the city where the sample was collected pay electricity bills close to the average value of $147.06. The low skewness value of 0.0158 implies that there is no significant difference between the left and the right tails in the sample.
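Several entries of the descriptive statistics table can be recomputed from the 50 ordered observations listed in the normal probability plot table. This is a sketch using Python's `statistics` module; the skewness line assumes the adjusted Fisher-Pearson formula, which is the sample skewness commonly reported by spreadsheets.

```python
from math import sqrt
from statistics import mean, median, stdev, variance

# The 50 ordered utility charges from the report's "Ordered X" column.
bills = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

n = len(bills)
m = mean(bills)
med = median(bills)
var = variance(bills)          # sample variance (n - 1 denominator)
s = stdev(bills)               # sample standard deviation
std_err = s / sqrt(n)          # standard error of the mean
value_range = max(bills) - min(bills)

# Adjusted Fisher-Pearson sample skewness (assumed to match the table's method).
skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in bills)

print(f"mean={m:.2f} median={med} sd={s:.4f} var={var:.4f}")
print(f"se={std_err:.4f} range={value_range} skew={skew:.4f}")
```

The gap between median and mean, 148.5 - 147.06 = 1.44, is small relative to a standard deviation of about 31.69, which is what a near-zero skewness suggests.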
Inter-quartile Range (IQR)
According to An Introduction to Statistical Methods and Data Analysis by Ott and Longnecker (2015), the difference between the third and the first quartile values is referred to as the IQR.[2]
The IQR is 1.33 times the standard deviation in the sample, which is close to the ratio of about 1.35 expected for a normal distribution.
Range
The range is 131, from the descriptive statistics table above. To compare the range with the standard deviation, we divide the former by the latter:
$$\frac{131}{31.6914} \approx 4.13$$
Thus, the range is about 4.13 times the standard deviation.
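The two ratios discussed in this section can be placed next to a theoretical benchmark. The sketch below uses the range and standard deviation from the descriptive statistics table and, as an assumption for comparison, derives the IQR-to-standard-deviation ratio of an ideal normal distribution from `statistics.NormalDist`.

```python
from statistics import NormalDist

# Values taken from the descriptive statistics table.
sample_range = 131
sample_sd = 31.6914

range_ratio = sample_range / sample_sd

# For a normal distribution, Q3 - Q1 = 2 * z(0.75) standard deviations.
normal_iqr_ratio = 2 * NormalDist().inv_cdf(0.75)

print(f"range / sd           = {range_ratio:.2f}")
print(f"normal IQR / sd      = {normal_iqr_ratio:.2f}")
```

A sample IQR of about 1.33 standard deviations sits close to the theoretical ratio computed here, which is the kind of agreement expected from approximately normal data.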
Empirical Rule
Interval                              Percentage
Mean ± 1 SD      (115.37, 178.75)     66%
Mean ± 1.28 SD   (106.50, 187.62)     80%
Mean ± 2 SD      (83.68, 210.44)      96%
Therefore, 66% of the data points lie in the interval (115.37, 178.75), which covers the observations within one standard deviation of the mean on both sides; this is close to the 68% expected for a normal distribution. Likewise, 80% of the data points lie within 1.28 standard deviations of the mean, in the interval (106.50, 187.62), which matches the theoretical 80%. Two observations (82 and 213), or 4% of the sample, fall outside 2 standard deviations from the mean, compared with about 5% in theory.
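The empirical-rule percentages can be verified by counting observations inside each interval. This sketch uses the 50 ordered values from the normal probability plot table together with the reported mean and standard deviation; `share_within` is a hypothetical helper name.

```python
# The 50 ordered utility charges from the report's "Ordered X" column.
bills = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

MEAN, SD = 147.06, 31.6914  # from the descriptive statistics table


def share_within(k: float) -> float:
    """Fraction of observations strictly inside mean +/- k standard deviations."""
    lo, hi = MEAN - k * SD, MEAN + k * SD
    return sum(lo < x < hi for x in bills) / len(bills)


for k in (1, 1.28, 2):
    print(f"within {k} SD: {share_within(k):.0%}")
```

By this count, 33, 40, and 48 of the 50 observations fall within 1, 1.28, and 2 standard deviations respectively, so two observations (82 and 213) lie beyond two standard deviations.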
Normal Probability Plot
Number   Ordered X   Ordered probability   Ordered Z
1        82          0.01                  -2.32635
2        90          0.03                  -1.88079
3        95          0.05                  -1.64485
4        96          0.07                  -1.47579
5        102         0.09                  -1.34076
6        108         0.11                  -1.22653
7        109         0.13                  -1.12639
8        111         0.15                  -1.03643
9        114         0.17                  -0.95417
10       116         0.19                  -0.8779
11       119         0.21                  -0.80642
12       123         0.23                  -0.73885
13       127         0.25                  -0.67449
14       128         0.27                  -0.61281
15       129         0.29                  -0.55338
16       130         0.31                  -0.49585
17       130         0.33                  -0.43991
18       135         0.35                  -0.38532
19       137         0.37                  -0.33185
20       139         0.39                  -0.27932
21       141         0.41                  -0.22754
22       143         0.43                  -0.17637
23       144         0.45                  -0.12566
24       147         0.47                  -0.07527
25       148         0.49                  -0.02507
26       149         0.51                  0.025069
27       149         0.53                  0.07527
28       150         0.55                  0.125661
29       151         0.57                  0.176374
30       153         0.59                  0.227545
31       154         0.61                  0.279319
32       157         0.63                  0.331853
33       158         0.65                  0.38532
34       163         0.67                  0.439913
35       165         0.69                  0.49585
36       166         0.71                  0.553385
37       167         0.73                  0.612813
38       168         0.75                  0.67449
39       171         0.77                  0.738847
40       172         0.79                  0.806421
41       175         0.81                  0.877896
42       178         0.83                  0.954165
43       183         0.85                  1.036433
44       185         0.87                  1.126391
45       187         0.89                  1.226528
46       191         0.91                  1.340755
47       197         0.93                  1.475791
48       202         0.95                  1.644854
49       206         0.97                  1.880794
50       213         0.99                  2.326348
In the normal probability plot, the points fall along the trend line, which supports the conclusion that the data are approximately normally distributed. By contrast, skewed or heavy-tailed data typically produce a curved or S-shaped pattern on a normal probability plot.
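The Ordered probability and Ordered Z columns of the table can be reproduced as follows. The plotting position (i - 0.5) / n is inferred from the tabulated probabilities (0.01, 0.03, ..., 0.99); the use of `statistics.NormalDist` for the inverse normal is an assumption about tooling, not the report's own method.

```python
from statistics import NormalDist

# The 50 ordered utility charges from the report's "Ordered X" column.
bills = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

n = len(bills)
std_normal = NormalDist()  # mean 0, standard deviation 1

rows = []
for i, x in enumerate(sorted(bills), start=1):
    p = (i - 0.5) / n           # plotting position for the i-th ordered value
    z = std_normal.inv_cdf(p)   # theoretical normal quantile for that position
    rows.append((i, x, p, z))

for i, x, p, z in rows[:3]:
    print(f"{i:>2} {x:>4} {p:.2f} {z:+.5f}")
```

Plotting the ordered values against these z-scores gives the normal probability plot; points lying near a straight line indicate approximate normality.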
Conclusion
The analysis in the sections above indicates that the cost of electricity in the sample is approximately normal. This suggests that the company offering the electricity charges its customers fairly. Therefore, it can be concluded that the cost of electricity for individuals living in a one-bedroom house in the city under study also follows an approximately normal distribution. However, this result cannot be used to judge normality in other cities in the country, because rates vary as one moves from one city to another.
Bibliography
Ott, R. Lyman, and Michael T. Longnecker. An Introduction to Statistical Methods and Data Analysis. Nelson Education, 2015.
Triola, Mario F. Elementary Statistics. Reading, MA: Pearson/Addison-Wesley, 2006.
[1] Mario F. Triola, Elementary Statistics. Reading, MA: Pearson/Addison-Wesley, 2006.
[2] R. Lyman Ott and Michael T. Longnecker, An Introduction to Statistical Methods and Data Analysis. Nelson Education, 2015.