Demystifying Data Analysis


Statisticians and researchers widely agree that many real-world phenomena are approximately normally distributed. Where that assumption holds, the normal model can be used to determine the probability of an outcome in the population. For instance, we can calculate probabilities for the download time of a commercial tax preparation site whose download times have a mean of 2.0 seconds and a standard deviation of 0.5 seconds. The first step is to standardize the observed value into a z-score; the probability is then read from the standard normal (Z) table. The questions below are answered using the mean and standard deviation given above. What is the probability that the download time is:

(a) above 1.8 seconds

Solution

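Z-score = $\frac{X - \mu}{\sigma} = \frac{1.8 - 2.0}{0.5} = -0.40$

From the Z table, $P(Z < -0.40) = 0.3446$, so $P(X > 1.8) = 1 - 0.3446 = 0.6554$.

In the diagram above, the shaded region represents this probability, that is, the probability that the download time is greater than 1.8 seconds.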

(b) Between 1.5 and 2.5 seconds?

Solution

                                                           

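Standardizing both limits gives $z_1 = \frac{1.5 - 2.0}{0.5} = -1.00$ and $z_2 = \frac{2.5 - 2.0}{0.5} = 1.00$. The probability is therefore given by

$P(1.5 < X < 2.5) = P(-1.00 < Z < 1.00) = 0.8413 - 0.1587 = 0.6826$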
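Parts (a) and (b) can be cross-checked numerically rather than with a printed Z table. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original problem.

```python
from statistics import NormalDist

# Download time model from the problem statement: mean 2.0 s, sd 0.5 s.
download = NormalDist(mu=2.0, sigma=0.5)

# (a) Standardize 1.8 s, then take the upper-tail probability.
z_a = (1.8 - download.mean) / download.stdev
p_above_1_8 = 1 - download.cdf(1.8)

# (b) Probability between 1.5 s and 2.5 s is a difference of two CDF values.
p_between = download.cdf(2.5) - download.cdf(1.5)

print(f"z for 1.8 s: {z_a:.2f}")
print(f"P(X > 1.8) = {p_above_1_8:.4f}")
print(f"P(1.5 < X < 2.5) = {p_between:.4f}")
```

The results agree with table lookups to about three decimal places; small differences in the fourth decimal come from the rounding used in printed tables.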

(c) 99% of the download times are slower (a higher number of seconds taken to download) than how many seconds?

Solution

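We first determine the z-score that matches the question. If 99% of the download times are slower (higher) than a value x, then only 1% of the times fall below x, so x is the 1st percentile of the distribution. From the z-statistical table, the z-score with a cumulative probability of 0.01 is $z \approx -2.33$.

The observed value is then obtained by substituting the z-score, mean, and standard deviation into the z-score formula:

$x = \mu + z\sigma = 2.0 + (-2.33)(0.5) = 0.835$

The results imply that 99% of all the download times are slower than about 0.84 seconds. By symmetry, $z = +2.33$ gives $2.0 + (2.33)(0.5) = 3.165 \approx 3.2$ seconds, which is the time that 99% of the downloads are faster than.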
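The percentile lookup in part (c) can also be checked numerically. The sketch below again assumes the standard-library `statistics.NormalDist`, with illustrative variable names. It computes both tail percentiles, because the answer depends on whether 99% of the times lie above or below the cutoff.

```python
from statistics import NormalDist

download = NormalDist(mu=2.0, sigma=0.5)

# Time exceeded by 99% of downloads: only 1% are faster,
# so this is the 1st percentile.
t_exceeded_by_99 = download.inv_cdf(0.01)

# Time that 99% of downloads stay below: the 99th percentile.
t_below_for_99 = download.inv_cdf(0.99)

print(f"99% of downloads are slower than {t_exceeded_by_99:.3f} s")
print(f"99% of downloads are faster than {t_below_for_99:.3f} s")
```

The exact z-value is about 2.326 rather than the tabulated 2.33, so these figures differ slightly from hand calculations in the third decimal place.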

Part B

The analysis of raw data matters not only to researchers but also to the general public, since inferences can be drawn from the results. Decision makers can also use the results to formulate policies and forecast future events. In this case, data on the cost of electricity in a big city are used to determine whether the cost is normally distributed. A sample of 50 observations from the year 2005 is used in this study. This will, in turn, help the researchers judge whether the electricity company is fair or biased in what it charges households.

Box Plot

The box plot is often used to check the distribution of the observations in a sample. It is a visual aid that helps to determine whether the data are skewed or roughly symmetric, as normally distributed data would be, by showing where the median lies relative to the first and third quartiles.[1]

In this regard, we first arrange the data in ascending order to determine the values of the quartiles.

Quartiles

 

From the box plot above, the utility data set appears to be approximately normally distributed, as the whiskers are of roughly equal length. Moreover, the median lies almost at the center between the first and third quartiles, which means that the data are spread evenly on both sides of the median.

Histogram

Similarly, an Excel spreadsheet was used to build a histogram of the cost of electricity in the sample.

Bin | Frequency | Cumulative %
82 | 1 | 2.00%
101 | 3 | 8.00%
119 | 7 | 22.00%
138 | 8 | 38.00%
157 | 12 | 62.00%
176 | 10 | 82.00%
194 | 5 | 92.00%
213 | 3 | 98.00%
232 | 1 | 100.00%
More | 0 | 100.00%

The data in the table above were used to draw the histogram of electricity cost below.

The graph above shows that the electricity cost in the sample is approximately normally distributed. The number of observations to the left and to the right of the mean is roughly equal, and most of the electricity bills are clustered around the mean. There are also no extreme observations in the data set, as no data point lies far from the rest.
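The cumulative percentages in the frequency table follow directly from the bin frequencies. As a small sketch (the list names are illustrative; the bin labels and frequencies are copied from the table), a running total divided by the sample size reproduces the Cumulative % column:

```python
from itertools import accumulate

# Upper bin limits and frequencies as reported in the histogram table.
bins = ["82", "101", "119", "138", "157", "176", "194", "213", "232", "More"]
freq = [1, 3, 7, 8, 12, 10, 5, 3, 1, 0]

total = sum(freq)  # sample size
cum_pct = [100 * running / total for running in accumulate(freq)]

for label, f, pct in zip(bins, freq, cum_pct):
    print(f"{label:>4} | {f:>2} | {pct:.2f}%")
```

The frequencies sum to 50, which matches the sample size stated for the study.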

Theoretical Properties

Descriptive Statistics

Statistic | Utility Charge
Mean | 147.06
Standard Error | 4.4818
Median | 148.5
Mode | 130
Standard Deviation | 31.6914
Sample Variance | 1004.3433
Kurtosis | -0.5442
Skewness | 0.0158
Range | 131
Minimum | 82
Maximum | 213
Sum | 7353
Count | 50
Confidence Level (95.0%) | 9.0066

From the table above, the mean is slightly less than the median, by $1.44. This suggests that most households in the city where the sample was collected pay electricity bills close to the average value of $147.06. The low skewness value of 0.0158 implies that there is no significant difference between the left and right tails of the sample.
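Several entries in the descriptive statistics table can be recomputed from the raw observations. The sketch below assumes that the sample is the 50 values listed in the Ordered X column of the normal probability plot table, and it uses Python's standard-library `statistics` module in place of the spreadsheet; the dictionary keys simply mirror the table labels.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

# Assumed sample: the 50 ordered utility charges from the probability plot table.
charges = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

n = len(charges)
summary = {
    "Mean": mean(charges),
    "Standard Error": stdev(charges) / sqrt(n),  # s / sqrt(n)
    "Median": median(charges),
    "Mode": mode(charges),                       # first most-common value
    "Standard Deviation": stdev(charges),        # sample (n - 1) version
    "Sample Variance": variance(charges),
    "Range": max(charges) - min(charges),
    "Minimum": min(charges),
    "Maximum": max(charges),
    "Sum": sum(charges),
    "Count": n,
}

for label, value in summary.items():
    print(f"{label}: {value:.4f}" if isinstance(value, float) else f"{label}: {value}")

# Gap between median and mean discussed in the text.
print(f"Median - Mean: {summary['Median'] - summary['Mean']:.2f}")
```

Skewness, kurtosis, and the 95% confidence half-width are left out here because they depend on the spreadsheet's specific formulas and on a t-distribution quantile that the standard library does not provide.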

Inter-quartile Range (IQR)

According to An Introduction to Statistical Methods and Data Analysis by Ott and Longnecker (2015), the difference between the third and first quartile values is referred to as the inter-quartile range (IQR), that is, $\text{IQR} = Q_3 - Q_1$.[2]

In this sample, the IQR is about 1.33 times the standard deviation.
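The quartiles and the IQR-to-standard-deviation ratio can be sketched in code. The text does not say which quartile convention was used, so the example below, which assumes the 50 values from the Ordered X column of the normal probability plot table, shows the two conventions offered by Python's `statistics.quantiles`. Neither is claimed to be the one behind the figure quoted above.

```python
from statistics import quantiles, stdev

# Assumed sample: the 50 ordered utility charges from the probability plot table.
charges = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

# Two common interpolation conventions for quartiles.
q_exclusive = quantiles(charges, n=4, method="exclusive")
q_inclusive = quantiles(charges, n=4, method="inclusive")

s = stdev(charges)  # sample standard deviation
for name, (q1, q2, q3) in (("exclusive", q_exclusive), ("inclusive", q_inclusive)):
    iqr = q3 - q1
    print(f"{name}: Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}, IQR/s={iqr / s:.2f}")
```

Both conventions give an IQR of roughly 1.3 standard deviations; the exact ratio shifts slightly with the convention chosen.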

Range

The range is 131, from the descriptive statistics table above. To compare the range with the standard deviation, we divide the former by the latter:

$\frac{\text{Range}}{s} = \frac{131}{31.6914} \approx 4.13$

Thus, the range is about 4.13 times the standard deviation.

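Empirical Rule

Interval | Percentage
Mean ± 1 standard deviation, (115.37, 178.75) | 66%
Mean ± 1.28 standard deviations, (106.50, 187.62) | 80%
Mean ± 2 standard deviations, (83.68, 210.44) | 96%

Therefore, 66% of the data points lie in the interval (115.37, 178.75), which covers the observations within one standard deviation of the mean on both sides. Likewise, 80% of the data points lie within 1.28 standard deviations of the mean, in the interval (106.50, 187.62). Only two observations, 82 and 213, fall outside 2 standard deviations of the mean, which is 4% of the sample. These shares are close to the 68%, 80%, and 95% expected under a normal distribution.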
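The interval shares can be counted directly from the data. The sketch below assumes the 50 values from the Ordered X column of the normal probability plot table; `share_within` is a hypothetical helper introduced only for illustration.

```python
from statistics import mean, stdev

# Assumed sample: the 50 ordered utility charges from the probability plot table.
charges = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

m = mean(charges)
s = stdev(charges)


def share_within(k):
    """Percentage of observations within k standard deviations of the mean."""
    lo, hi = m - k * s, m + k * s
    inside = sum(lo <= x <= hi for x in charges)
    return 100 * inside / len(charges)


for k in (1, 1.28, 2):
    print(f"mean ± {k} sd: ({m - k * s:.2f}, {m + k * s:.2f}) -> {share_within(k):.0f}%")
```

For this sample the counts are 33, 40, and 48 of the 50 observations, that is 66%, 80%, and 96%.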

Normal Probability Plot

Number | Ordered X | Ordered probability | Ordered Z
1 | 82 | 0.01 | -2.32635
2 | 90 | 0.03 | -1.88079
3 | 95 | 0.05 | -1.64485
4 | 96 | 0.07 | -1.47579
5 | 102 | 0.09 | -1.34076
6 | 108 | 0.11 | -1.22653
7 | 109 | 0.13 | -1.12639
8 | 111 | 0.15 | -1.03643
9 | 114 | 0.17 | -0.95417
10 | 116 | 0.19 | -0.8779
11 | 119 | 0.21 | -0.80642
12 | 123 | 0.23 | -0.73885
13 | 127 | 0.25 | -0.67449
14 | 128 | 0.27 | -0.61281
15 | 129 | 0.29 | -0.55338
16 | 130 | 0.31 | -0.49585
17 | 130 | 0.33 | -0.43991
18 | 135 | 0.35 | -0.38532
19 | 137 | 0.37 | -0.33185
20 | 139 | 0.39 | -0.27932
21 | 141 | 0.41 | -0.22754
22 | 143 | 0.43 | -0.17637
23 | 144 | 0.45 | -0.12566
24 | 147 | 0.47 | -0.07527
25 | 148 | 0.49 | -0.02507
26 | 149 | 0.51 | 0.025069
27 | 149 | 0.53 | 0.07527
28 | 150 | 0.55 | 0.125661
29 | 151 | 0.57 | 0.176374
30 | 153 | 0.59 | 0.227545
31 | 154 | 0.61 | 0.279319
32 | 157 | 0.63 | 0.331853
33 | 158 | 0.65 | 0.38532
34 | 163 | 0.67 | 0.439913
35 | 165 | 0.69 | 0.49585
36 | 166 | 0.71 | 0.553385
37 | 167 | 0.73 | 0.612813
38 | 168 | 0.75 | 0.67449
39 | 171 | 0.77 | 0.738847
40 | 172 | 0.79 | 0.806421
41 | 175 | 0.81 | 0.877896
42 | 178 | 0.83 | 0.954165
43 | 183 | 0.85 | 1.036433
44 | 185 | 0.87 | 1.126391
45 | 187 | 0.89 | 1.226528
46 | 191 | 0.91 | 1.340755
47 | 197 | 0.93 | 1.475791
48 | 202 | 0.95 | 1.644854
49 | 206 | 0.97 | 1.880794
50 | 213 | 0.99 | 2.326348

From the graph above, the plotted points fall close to the trend line, which supports the conclusion that the data are normally distributed. By contrast, non-normal distributions typically produce a curved or S-shaped normal probability plot.
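The Ordered probability and Ordered Z columns of the table can be reproduced from the sorted data. The sketch below assumes the plotting position $(i - 0.5)/n$, which matches the tabulated probabilities 0.01, 0.03, and so on, and uses the standard-library `statistics.NormalDist` for the inverse normal; the name `rows` is illustrative.

```python
from statistics import NormalDist

# Utility charges as listed in the Ordered X column.
charges = [
    82, 90, 95, 96, 102, 108, 109, 111, 114, 116,
    119, 123, 127, 128, 129, 130, 130, 135, 137, 139,
    141, 143, 144, 147, 148, 149, 149, 150, 151, 153,
    154, 157, 158, 163, 165, 166, 167, 168, 171, 172,
    175, 178, 183, 185, 187, 191, 197, 202, 206, 213,
]

standard_normal = NormalDist()  # mean 0, standard deviation 1
n = len(charges)

rows = []
for i, x in enumerate(sorted(charges), start=1):
    p = (i - 0.5) / n               # assumed plotting position
    z = standard_normal.inv_cdf(p)  # matching standard normal quantile
    rows.append((i, x, p, z))

for i, x, p, z in rows[:3] + rows[-3:]:
    print(f"{i:>2} | {x:>3} | {p:.2f} | {z:.5f}")
```

Plotting Ordered X against Ordered Z from `rows` gives the normal probability plot; points lying close to a straight line indicate approximate normality.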

Conclusion

The analysis in the sections above indicates that the cost of electricity in the sample is approximately normal. This suggests that the company supplying the electricity charges its customers fairly. Therefore, it can be concluded that the cost of electricity for individuals living in one-bedroom houses in the city under study also follows an approximately normal distribution. However, this result cannot be used to infer normality in other cities in the country, because rates vary as one moves from one city to another.

Bibliography

Ott, R. Lyman, and Michael T. Longnecker. An Introduction to Statistical Methods and Data Analysis. Nelson Education, 2015.

Triola, Mario F. Elementary Statistics. Reading, MA: Pearson/Addison-Wesley, 2006.

[1] Mario F. Triola, Elementary Statistics. Reading, MA: Pearson/Addison-Wesley, 2006.

[2] R. Lyman Ott and Michael T. Longnecker, An Introduction to Statistical Methods and Data Analysis. Nelson Education, 2015.

September 25, 2023