Using Excel's Random Number Generator feature, 5000 samples of size 10 were generated. Each value is one of the 13 possible integers from 0 to 12, and the values are assumed to follow a binomial distribution with n = 12 trials and p = 0.25.
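The mean of a binomial distribution is the product of the number of trials and the probability of success p. In this case,
$$\mu = np = 12 \times 0.25 = 3$$
The variance of a binomial distribution is given by the following formula:
$$\sigma^2 = np(1 - p) = 12 \times 0.25 \times 0.75 = 2.25$$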
However, when we calculate the empirical probability of each individual value from the generated data and then compute the mean and variance, only the mean agrees with the value calculated above; the variance does not.
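As a check on the two formulas above, the sketch below (Python, standard library only) computes the mean and variance of the assumed Binomial(12, 0.25) distribution directly from its probability mass function. It describes the theoretical distribution only, not the Excel-generated data.

```python
from math import comb

# Binomial(n = 12, p = 0.25): the distribution assumed for each generated value
n, p = 12, 0.25

# Probability mass function P(X = k) for k = 0, ..., 12
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Theoretical mean and variance obtained directly from the pmf
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

# Compare with the closed forms np and np(1 - p)
print(round(mean, 6), n * p)                 # 3.0 3.0
print(round(variance, 6), n * p * (1 - p))   # 2.25 2.25
```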
Question 1 – (b)
The test statistics for each of the samples were calculated in Excel as follows:
| Sample | Mean (x̄) | Median (Md) | z | Unbiased s² | t-test | Chi-square |
|---|---|---|---|---|---|---|
| sample1 | 2.9844 | 3 | -0.73539105 | 1.481011586 | -0.90642186 | 3290.478631 |
| sample2 | 2.9642 | 3 | -0.07547 | 1.500056 | -2.06688 | 3332.791 |
| sample3 | 3.0004 | 3 | 0.000843 | 1.498015 | 0.023109 | 3328.256 |
| sample4 | 2.9834 | 3 | -0.035 | 1.481878 | -0.96424 | 3292.405 |
| sample5 | 3.0004 | 3 | 0.000843 | 1.502415 | 0.023075 | 3338.033 |
| sample6 | 2.9784 | 3 | -0.04554 | 1.518418 | -1.23949 | 3373.588 |
| sample7 | 3.0008 | 3 | 0.001687 | 1.487293 | 0.046385 | 3304.436 |
| sample8 | 3.0028 | 3 | 0.005903 | 1.531555 | 0.159984 | 3402.775 |
| sample9 | 3.0154 | 3 | 0.032466 | 1.46942 | 0.898324 | 3264.724 |
| sample10 | 2.9698 | 3 | -0.06367 | 1.479907 | -1.75539 | 3288.025 |
Histograms
[Figures: histograms of the means, medians, z, t and chi-square statistics]
The histogram of the means is approximately normal. The histograms of the medians, of z and of t are also approximately normal.
Descriptive statistics
The descriptive statistics of each set of statistics were calculated in Excel and are displayed below.
| Statistic | MEANS | MEDIAN | Z | T | CHI |
|---|---|---|---|---|---|
| Mean | 2.99 | 2.9111 | 2.243209 | -0.07672 | 8.972836 |
| Standard Error | 0.006577 | 0.008288 | 0.014639 | 0.015854 | 0.058556 |
| Median | 3 | 3 | 2.1 | 0 | 8.4 |
| Mode | 2.9 | 3 | 1.877778 | 0 | 7.511111 |
| Standard Deviation | 0.465076 | 0.586017 | 1.035133 | 1.121081 | 4.140531 |
| Sample Variance | 0.216295 | 0.343415 | 1.0715 | 1.256824 | 17.144 |
| Kurtosis | -0.0789 | -0.01959 | 1.772618 | 1.193707 | 1.772618 |
| Skewness | 0.08915 | 0.087995 | 0.993063 | -0.26572 | 0.993063 |
| Range | 3.1 | 4 | 8.4 | 10.71052 | 33.6 |
| Minimum | 1.5 | 1 | 0.222222 | -6.12795 | 0.888889 |
| Maximum | 4.6 | 5 | 8.622222 | 4.582576 | 34.48889 |
| Sum | 14950 | 14555.5 | 11216.04 | -383.615 | 44864.18 |
| Count | 5000 | 5000 | 5000 | 5000 | 5000 |
| z statistic | -0.4714 | -4.19079 | -35.6755 | -145.038 | 281.5622 |
| t statistic | -1.52041 | -10.727 | -51.697 | -194.06 | 102.0022 |
| chi-square statistic | 480.56 | 762.9929 | 2380.635 | 2792.383 | 38090.16 |
Shape of the histograms
The histograms are positively skewed, which means that they have a long right tail.
As discussed in class, when p = 0.5 the binomial distribution is symmetric and resembles the normal distribution, i.e. it is bell-shaped. When p < 0.5, as in this case, the distribution is positively skewed, and when p > 0.5 it is negatively skewed.
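The direction of skew can be checked with the standard skewness formula for a binomial distribution, (1 − 2p)/√(np(1 − p)). The short sketch below evaluates it for n = 12 at p = 0.25, 0.5 and 0.75.

```python
from math import sqrt

def binomial_skewness(n: int, p: float) -> float:
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

for p in (0.25, 0.5, 0.75):
    print(p, round(binomial_skewness(12, p), 4))
# 0.25 0.3333
# 0.5 0.0
# 0.75 -0.3333
```

A positive value (p < 0.5) corresponds to a long right tail, zero (p = 0.5) to a symmetric shape, and a negative value (p > 0.5) to a long left tail.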
The tabulated values of the test statistics are as follows
$z_{0.025} = 1.96$ (two-tailed); $z_{0.05} = 1.64$ (one-tailed)
$t_{4999} = 1.64$
$\chi^2_{0.05} = 124$
The calculated statistics differ from the tabulated (theoretical) values because of the sampling error associated with each sample.
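The normal critical values quoted above can be reproduced with the Python standard library (`statistics.NormalDist`, Python 3.8+). The standard library has no t or chi-square quantile function; `scipy.stats.t.ppf` and `scipy.stats.chi2.ppf` could be used for those, which is an assumption about tooling, not part of the original Excel workflow.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal N(0, 1)
alpha = 0.05

z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # z_0.025
z_one_tailed = std_normal.inv_cdf(1 - alpha)       # z_0.05
print(round(z_two_tailed, 3), round(z_one_tailed, 3))   # 1.96 1.645
```

With 4999 degrees of freedom the t distribution is practically identical to the standard normal, which is why $t_{4999}$ and $z_{0.05}$ agree.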
Question 2
(a) Using a larger sample (n = 50) should make the results more accurate than using a sample of n = 10. This follows from the law of large numbers, which states that as the sample size increases, sample statistics get closer to the corresponding population parameters. Therefore, I do not expect the results from question 1 to be the same as those in question 2.
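One way to quantify this is the theoretical standard deviation of the sample mean, σ/√n, with σ = √2.25 = 1.5 from the assumed binomial model. A minimal sketch:

```python
from math import sqrt

SIGMA = 1.5   # population standard deviation, sqrt(np(1 - p)) = sqrt(2.25)

# Theoretical standard deviation of the sample mean for each sample size
for n in (10, 50):
    print(n, round(SIGMA / sqrt(n), 4))
# 10 0.4743
# 50 0.2121
```

These values are close to the observed standard deviations of the 5000 sample means, 0.465076 for n = 10 and 0.212386907 for n = 50.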
(b) Histograms
Mean of means
The histogram of the means is approximately normal in shape.
The descriptive statistics of the means are as follows:
DESCRIPTIVE STATISTICS OF MEANS
| Statistic | Value |
|---|---|
| Mean | 2.993536 |
| Median | 3 |
| Mode | 2.98 |
| Standard Deviation | 0.212386907 |
| Sample Variance | 0.045108198 |
| Kurtosis | 0.010181873 |
| Skewness | 0.042006089 |
| T-TEST | -7.164994654 |
| Z | -97.006464 |
| CHI | 0.982356319 |
Clearly, the mean and median of the 5000 sample means (each with n = 50) are close to the population mean and median, both of which equal 3.
Mean of medians
The histogram of the medians of the 5000 samples (each with n = 50) shows that the most common sample median is 3.
The descriptive statistics of the medians are as follows:
DESCRIPTIVE STATISTICS OF MEDIAN
| Statistic | Value |
|---|---|
| Mean | 2.9486 |
| Median | 3 |
| Mode | 3 |
| Standard Deviation | 0.260905451 |
| Sample Variance | 0.068071654 |
| T-TEST | -13.93044431 |
| Z | -2.423019237 |
| CHI-SQUARE | 1.482449361 |
The mean and median of the medians are also approximately equal to the population mean and median.
Question 3
(a) The sample mean x̄ is an unbiased estimator of the population mean µ at every sample size, because E(x̄) = µ. The sample median Md estimates the population median, which for this binomial distribution equals µ = 3, so Md can also be used as an estimator of µ. For n = 50 the central limit theorem applies: regardless of the distribution of the population, the sampling distribution of the sample mean is approximately normal when the sample size is about 30 or more.
(b) Question 1 uses samples of size 10, whereas question 2 uses samples of size 50. In question 1 the mean of the 5000 sample means is 2.99, essentially equal to µ = 3, so x̄ is unbiased even though n < 30, while the mean of the sample medians is 2.9111, slightly below µ. In question 2 the corresponding values are 2.9935 and 2.9486: x̄ remains unbiased and the downward bias of Md is smaller.
(c) Consistency describes what happens as the sample size grows: a consistent estimator concentrates ever more closely around the true parameter. Moving from n = 10 (question 1) to n = 50 (question 2), the standard deviation of the sample means falls from 0.465 to 0.212 and that of the sample medians from 0.586 to 0.261. Both estimators therefore get closer to µ as n increases, which is the behaviour expected of consistent estimators.
(d) Among several estimators of the same parameter, the efficient estimator is the one with the lowest variance, meaning that it deviates least from the parameter it is estimating. Comparing the two estimators at the same sample size, the variance of the sample means (0.2163 for n = 10 and 0.0451 for n = 50) is lower than that of the sample medians (0.3434 and 0.0681), so x̄ is the more efficient estimator of µ. Increasing the sample size from 10 to 50 also reduces the variance of both estimators.
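The efficiency comparison can be summarised as a variance ratio. The sketch below copies the sample variances of the 5000 means and medians from the descriptive statistics for n = 10 and n = 50; a ratio below 1 means the median has the larger variance.

```python
# Sample variances of the 5000 sample means and medians (from the tables)
variances = {
    10: {"mean": 0.216295, "median": 0.343415},
    50: {"mean": 0.045108198, "median": 0.068071654},
}

for n, v in variances.items():
    # Relative efficiency of the median with respect to the mean:
    # values below 1 indicate that the mean is the more efficient estimator
    ratio = v["mean"] / v["median"]
    print(n, round(ratio, 3))
# 10 0.63
# 50 0.663
```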
Question 4
The dataset provided in the Excel file records the time customers take to pay their accounts for two groups of customers: country customers and city customers. The six-step hypothesis-testing procedure is used to solve each problem.
(a) Using a higher significance level (such as 10% instead of 0.05 or 0.01) increases the chance of committing a Type I error, because the rejection region is larger; in exchange, it reduces the chance of committing a Type II error.
(b) The descriptive statistics of the city customers and the country customers are as follows.
| Statistic | CITY | COUNTRY |
|---|---|---|
| Mean | 34.93913043 | 51.65882 |
| Standard Error | 0.713847661 | 1.467439 |
| Median | 35 | 53 |
| Mode | 28 | 55 |
| Standard Deviation | 7.655163326 | 13.52912 |
| Sample Variance | 58.60152555 | 183.037 |
| Kurtosis | 0.711691012 | 0.920904 |
| Skewness | 0.0320181 | -0.02187 |
| Range | 45 | 77 |
| Minimum | 14 | 16 |
| Maximum | 59 | 93 |
| Sum | 4018 | 4391 |
| Count | 115 | 85 |
[Figure: histogram of city customers' payment times]
(c) In the past, the payment times of city customers had population mean µ = 34 and standard deviation σ = 6. Because σ is known, the z statistic is the appropriate test statistic for this problem. Its assumption is as follows:
The payment times are normally distributed. The histogram of city customers above supports this assumption.
(i) The six-step procedure
Step 1: Has the mean payment time of city customers changed from 34?
The hypotheses to be tested are
H0: µ = 34
HA: µ ≠ 34
Step 2:
The population standard deviation is known (σ = 6, so σ² = 36) and, according to the histogram, the data are normally distributed. Therefore, the appropriate test statistic is the standardized normal
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}},$$
which follows the standard normal distribution N(0, 1).
Step 3: the level of significance is α = 0.05
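Step 4: decision rule
Because HA is two-sided, we reject H0 if |z| > $z_{0.025}$ = 1.96.
Step 5: calculating the statistic
$$z = \frac{34.939 - 34}{6/\sqrt{115}} = 1.68$$
Step 6: conclusion
Since |z| = 1.68 < $z_{0.025}$ = 1.96, we fail to reject the null hypothesis and conclude that there is no significant evidence that the mean has changed.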
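A minimal sketch of the one-sample z-test for the city customers, using the sample mean and size from the descriptive statistics and the historical µ = 34, σ = 6. It also reports the two-tailed p-value from the standard normal distribution (`statistics.NormalDist`, Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist

# City customers: sample summary and historical population values
xbar, n = 34.93913043, 115
mu0, sigma = 34, 6
alpha = 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value

print(f"z = {z:.2f}, critical = {z_crit:.2f}, p = {p_value:.3f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```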
(ii) The appropriate p-value
Our null hypothesis states that H0: µ = 34, whereas the alternative hypothesis is HA: µ ≠ 34, so we are using a two-tailed test. We use the 0.05 significance level: if the p-value < 0.05, we reject the null hypothesis; if the p-value > 0.05, we fail to reject it.
Step 1: Is the mean payment time of city customers different from 34?
H0: µ = 34
HA: µ ≠ 34
Step 2: test statistic
The t statistic will be used,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$
which follows a t distribution with n − 1 degrees of freedom.
Step 3: level of significance is α = 0.05
Step 4: decision rule:
We will reject H0 if |t| > $t_{0.025,\,114}$ ≈ 1.98.
Step 5: the statistic is
$$t = \frac{34.939 - 34}{7.655/\sqrt{115}} = 1.32$$
Step 6: Since |t| = 1.32 < 1.98 (two-tailed p-value ≈ 0.19 > 0.05), we fail to reject the null hypothesis and conclude that there is not sufficient evidence that the mean differs from 34.
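A sketch of the t statistic from the city customers' sample mean, standard deviation and size. The standard library has no t distribution, so the p-value here uses a normal approximation (an assumption; with 114 degrees of freedom it slightly understates the exact p-value). The critical value 1.98 is the tabulated $t_{0.025,\,114}$.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 34.93913043, 7.655163326, 115
mu0 = 34
t_crit = 1.98   # tabulated t_{0.025, 114}

t = (xbar - mu0) / (s / sqrt(n))
# Normal approximation to the two-tailed p-value (t with 114 df is close to N(0, 1))
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t = {t:.2f}, approximate p = {p_approx:.2f}")
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```

If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df=114)` gives the exact two-tailed p-value instead of the approximation.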
(d) The six-step procedure
Step 1: hypothesis
The question to be answered is:
Does the time taken to pay accounts differ between city customers and country customers?
The city customers are denoted X1 and the country customers X2. The hypotheses are as follows
H0: µ1 − µ2 = 0
H1: µ1 − µ2 ≠ 0
Step 2: Test statistic and sampling distribution
Since the population variances are taken as known and equal (σ = 6, so σ² = 36), the pooled variance is
n1 = 115, n2 = 85
$$S_p^2 = \frac{(n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2}{n_1 + n_2 - 2} = \frac{114(36) + 84(36)}{198} = 36$$
The degrees of freedom are n1 + n2 − 2 = 115 + 85 − 2 = 198.
Step 3: Level of significance
The level of significance which we will use in this problem will be α = 10%
Step 4: Decision rule
In our case, we are conducting a two-tailed test because the alternative hypothesis states that H1: µ1 ≠ µ2. The degrees of freedom we will use will be 198.
The null hypothesis will be rejected if our test statistic is below -1.645 or above 1.645.
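Step 5: calculating the t statistic
Using the pooled variance of 36, the test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{34.939 - 51.659}{\sqrt{36\left(\frac{1}{115} + \frac{1}{85}\right)}} \approx -19.48$$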
Step 6: Conclusion
Since t = −19.48 < −1.645, the test statistic falls in the rejection region, so we reject the null hypothesis at the 10% significance level. There is sufficient evidence at 10% to conclude that the mean time taken to pay differs between city customers and country customers.
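A minimal sketch of the two-sample statistic with no hypothesised difference, assuming (as in step 2) a common variance σ² = 36 for both groups and the sample means and sizes from the descriptive statistics.

```python
from math import sqrt

# Sample means and sizes: 1 = city, 2 = country
x1, n1 = 34.93913043, 115
x2, n2 = 51.65882, 85
sigma2 = 36.0   # common variance assumed in the text (sigma = 6)
crit = 1.645    # two-tailed critical value at alpha = 10%

# Pooled variance: both groups share sigma2, so this equals 36
sp2 = ((n1 - 1) * sigma2 + (n2 - 1) * sigma2) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t = (x1 - x2) / se

print(f"Sp^2 = {sp2:.1f}, SE = {se:.3f}, t = {t:.2f}")
print("reject H0" if abs(t) > crit else "fail to reject H0")
```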
(e) Testing the population variance
Step 1: Stating the hypothesis
H0: σ² = 36;
H1: σ² ≠ 36
Step 2: test statistic
The appropriate test statistic which will be used will be the χ2 statistic:
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
The chi-square test has 115 − 1 = 114 degrees of freedom. The test assumes that the data are normally distributed, and the histogram above suggests that they are.
Step 3: The level of significance chosen is α = 0.05
Step 4: decision rule: Reject H0 if $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$.
In our case, with α = 0.05 and df = 114, the tabulated values are $\chi^2_{0.975,\,114} \approx 86.3$ and $\chi^2_{0.025,\,114} \approx 145.4$.
Step 5: Calculating the test statistic
$$\chi^2 = \frac{114 \times 58.6015}{36} = 185.57$$
Step 6: decision
Since χ² (185.57) > $\chi^2_{0.025,\,114}$ (145.4), we reject the null hypothesis and conclude that there is enough evidence that the population variance of city customers' payment times differs from 36.
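A sketch of the chi-square variance test. Because the standard library has no chi-square quantile function, the critical values here come from the Wilson-Hilferty approximation, an assumption swapped in for a printed table; it agrees with tabulated values to about one decimal place at this many degrees of freedom.

```python
from statistics import NormalDist

s2, n = 58.60152555, 115   # city customers' sample variance and size
sigma0_2 = 36.0            # hypothesised variance (sigma = 6)
alpha = 0.05
df = n - 1

chi2 = df * s2 / sigma0_2

def chi2_quantile(q: float, k: int) -> float:
    """Wilson-Hilferty approximation to the q-quantile of chi-square(k)."""
    z = NormalDist().inv_cdf(q)
    a = 2 / (9 * k)
    return k * (1 - a + z * a ** 0.5) ** 3

lower = chi2_quantile(alpha / 2, df)       # lower-tail critical value
upper = chi2_quantile(1 - alpha / 2, df)   # upper-tail critical value
print(f"chi2 = {chi2:.2f}, critical region: < {lower:.1f} or > {upper:.1f}")
print("reject H0" if chi2 < lower or chi2 > upper else "fail to reject H0")
```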
(f) In the past, country customers took 10 days longer to pay than the average city customer. Since we assume equal variances for city and country customers, we use the pooled variance.
Step 1: Do country customers now take more than 10 days longer than city customers?
Let country customers be denoted by 1 and city customers by 2.
H0: µ1 − µ2 = 10
HA: µ1 − µ2 > 10
Step 2: The test statistics and sampling distribution
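The pooled variance will be calculated as follows
$$S_p^2 = \frac{(n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2}{n_1 + n_2 - 2} = 36$$
The test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 10}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(51.659 - 34.939) - 10}{\sqrt{36\left(\frac{1}{85} + \frac{1}{115}\right)}} \approx 7.83$$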
Step 6: The calculated t = 7.83, whereas the tabulated one-tailed value at α = 0.05 is t(0.05, 198) = 1.653.
Since t (calculated) > t (tabulated), we reject the null hypothesis and conclude that country customers now take more than 10 days longer than city customers to pay their accounts.
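A minimal sketch of the statistic for H0: µ1 − µ2 = 10, with country customers as group 1, the common variance 36 used in the text, and the one-tailed critical value t(0.05, 198) = 1.653.

```python
from math import sqrt

# 1 = country, 2 = city; common variance sigma^2 = 36
x1, n1 = 51.65882, 85
x2, n2 = 34.93913043, 115
sp2 = 36.0
delta0 = 10.0    # hypothesised difference under H0
t_crit = 1.653   # one-tailed t_{0.05, 198}

t = ((x1 - x2) - delta0) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(f"t = {t:.2f}")
print("reject H0" if t > t_crit else "fail to reject H0")
```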
(g) The finance director would like to test whether the variance of the time taken to pay accounts differs between city and country customers.
Step 1: Is the variance for city customers different from the variance for country customers?
Let the city customers be denoted by X1 and those of country be X2.
H0: σ1²/σ2² = 1; HA: σ1²/σ2² ≠ 1
Step 2: The test statistic
Since the populations are assumed to be normally distributed, we will use the F statistic
$$F = \frac{s_1^2}{s_2^2}$$
where s1 is for the city customers and s2 is for the country customers.
The degrees of freedom of the F distribution will be n1 – 1 for the numerator and n2 – 1 for the denominator
Step 3: level of significance
The significance level used will be 5%, i.e. α = 0.05
Step 4: Since it is a two-tailed test, the null hypothesis will be rejected if the calculated F is below the lower critical value $F_{0.975,\,114,\,84} \approx 0.67$ or above the upper critical value $F_{0.025,\,114,\,84} \approx 1.49$.
Step 5
$s_1^2 = 58.60152555$, $s_2^2 = 183.037$
The test statistic is F = 58.60/183.04 = 0.32
Step 6: decision
Since the calculated F (0.32) is below the lower critical value (0.67), we reject the null hypothesis and conclude that the variances of the payment times of city and country customers are different, i.e. their ratio is not equal to 1.
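A sketch of the two-tailed F-test for equal variances. The standard library has no F distribution, so the critical values here use Fisher's z approximation (½ ln F treated as normal with mean ½(1/d2 − 1/d1) and variance ½(1/d1 + 1/d2)), an assumption swapped in for a printed F table.

```python
from math import exp, sqrt
from statistics import NormalDist

s1_2, n1 = 58.60152555, 115   # city
s2_2, n2 = 183.037, 85        # country
alpha = 0.05
d1, d2 = n1 - 1, n2 - 1

F = s1_2 / s2_2

def f_quantile(q: float, df1: int, df2: int) -> float:
    """Approximate F(df1, df2) quantile via Fisher's z: 0.5*ln(F) ~ normal."""
    mean = 0.5 * (1 / df2 - 1 / df1)
    sd = sqrt(0.5 * (1 / df1 + 1 / df2))
    return exp(2 * (mean + sd * NormalDist().inv_cdf(q)))

lower = f_quantile(alpha / 2, d1, d2)
upper = f_quantile(1 - alpha / 2, d1, d2)
print(f"F = {F:.3f}, critical region: < {lower:.2f} or > {upper:.2f}")
print("reject H0" if F < lower or F > upper else "fail to reject H0")
```

SciPy users could replace `f_quantile` with `scipy.stats.f.ppf(q, d1, d2)` for exact critical values.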