This journal reflects on the statistical concepts covered between Week 3 and Week 7.
Firstly, it was realized during the course that data gathering begins only after an individual has identified a specific research problem which he or she intends to solve. The research problem should therefore be framed in such a manner that reasonable and measurable data types can be collected. When collecting quantitative data, it was also noted that an individual should avoid certain errors which may undermine the output of data analysis. The errors associated with data collection include systematic errors, random errors, errors made when transferring values from the raw data sheet to a new document, and errors introduced by rounding off numerical values (Zill, Wright, & Cullen 2011, p.45). Systematic errors were realized to arise during sampling when an individual collects data from only one ethnicity, one geographical region, a single gender, or a narrow group such as basketball players (Zill, Wright, & Cullen 2011, p.46). Such errors undermine the results because they introduce bias; for instance, measuring the heights of basketball players alone gives an inaccurate estimate of the overall height of students in a school. Random errors, by contrast, were realized to be unavoidable during data collection, although they can be minimized by using a larger sample size.
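The difference between the two error types can be illustrated with a short simulation. The sketch below is a hypothetical example (the population figures, group sizes, and heights are assumptions, not data from the course): sampling only from a tall subgroup stays biased no matter how large the sample is, whereas the random error of an unbiased sample shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical school population: 950 "regular" students and 50 basketball players.
regular = rng.normal(1.65, 0.08, size=950)   # assumed mean height 1.65 m
players = rng.normal(1.90, 0.07, size=50)    # basketball players assumed taller
population = np.concatenate([regular, players])

print("True population mean height:", round(population.mean(), 3))

# Systematic error: sampling only basketball players is biased regardless of sample size.
biased_sample = rng.choice(players, size=40)
print("Biased sample mean:", round(biased_sample.mean(), 3))

# Random error: an unbiased random sample gets closer to the truth as n grows.
for n in (10, 100, 500):
    sample = rng.choice(population, size=n, replace=False)
    print(f"Random sample (n={n}): mean = {sample.mean():.3f}")
```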
Secondly, the course contents showed that for collected data to be useful, it should be analyzed and presented in tables, pie charts, bar graphs, or line graphs. In most cases, readers want to understand a study topic quickly without having to read the whole document. Therefore, drawing graphs, such as a line graph of inflation over the years, is crucial in enabling them to see the overall findings of a research study at a glance. However, it was also noted that the graph chosen to present data should be appropriate for capturing all aspects of the data (Zill, Wright, & Cullen 2011, p.53). For instance, a pie chart cannot be used to show the correlation between two variables. Overall, the topic was insightful in relaying the key ideas of data collection and presentation.
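As a small illustration of matching the chart type to the data, the sketch below draws a line graph of a hypothetical inflation series; the figures are invented purely for illustration. A trend over time suits a line graph, whereas a pie chart could not convey it.

```python
import matplotlib.pyplot as plt

# Hypothetical annual inflation figures (percent), invented for illustration.
years = [2013, 2014, 2015, 2016, 2017, 2018]
inflation = [2.1, 1.8, 1.5, 2.3, 2.7, 2.4]

plt.plot(years, inflation, marker="o")
plt.xlabel("Year")
plt.ylabel("Inflation rate (%)")
plt.title("Trend of inflation over the years")
plt.show()
```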
Generally, the topic of probability was realized to be quite challenging compared to the others within the statistics course. At its core, probability involves using past events to predict the occurrence of future events. The likelihood of occurrence is expressed as a value ranging from 0 to 1, with 0 indicating no chance of occurrence and 1 indicating certainty (Simon 2007, p.34). Meanwhile, a probability distribution was noted to assign probabilities to the various possible outcomes which can be obtained when an event occurs. For instance, the probability of getting one defective item after producing 1,000 pieces may be very high, say 0.7, whereas the probability of getting fifty defective items is low, say 0.1. Giving a range of probabilities in this way is crucial in indicating the degree to which a particular system is efficient. Additionally, the concept of probability distribution was noted to include the key ideas of the "addition rule" when the term OR is used and the "multiplication rule" when the term AND is used (Simon 2007, p.12). This implies that the probability of obtaining either one or fifty defective items from the manufacturing process is calculated by adding the probabilities of the individual events, giving 0.8 in the case described above. The other key concepts noted in the topic included the calculation of expected values, the mean, the variance, and the standard deviation. The bottom line of the probability topic is that it helps individuals make better choices. For instance, by analyzing the average rate of deaths in a country due to road accidents, an insurance company can predict future trends and decide how to price its insurance premiums. Similarly, such analysis can help a manager detect the source of inefficiencies and put in place measures to overcome the challenges.
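A minimal numerical sketch of these ideas is given below, using the defect probabilities assumed above and an invented discrete distribution for the expected value, variance, and standard deviation calculations; every figure is an assumption for illustration.

```python
# Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B).
p_one_defective = 0.7     # assumed probability of exactly one defective item
p_fifty_defective = 0.1   # assumed probability of exactly fifty defective items
print("P(one OR fifty defective):", p_one_defective + p_fifty_defective)  # 0.8

# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
p_defect_per_item = 0.01  # assumed chance that any single item is defective
print("P(two independent items both defective):", p_defect_per_item ** 2)

# Expected value, variance, and standard deviation of a small discrete distribution
# (number of defects per batch; probabilities invented for illustration).
outcomes = [0, 1, 2, 3]
probs = [0.5, 0.3, 0.15, 0.05]
mean = sum(x * p for x, p in zip(outcomes, probs))
variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
print("Expected value:", mean)
print("Variance:", round(variance, 4), "Standard deviation:", round(variance ** 0.5, 4))
```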
The main idea of sampling is that a small section of a large group is taken to analyze the different characteristics of the whole population (Chatfield 2018, p.35). For instance, a researcher cannot interview all Americans about a study topic to get their views; however, he or she can interview, say, 300 Americans drawn from all the states in the US, and their views can then be used to infer how US citizens feel about an issue. Still, it is crucial to consider the sampling proportion to ensure the reliability of the final output. Meanwhile, a confidence interval provides a range, bounded by a minimum and a maximum value, within which the true value is expected to fall (Chatfield 2018, p.37). The concept of confidence interval estimation is particularly crucial where large quantities of an item are manufactured. In such cases, there are always variations in the manufacturing process which cause slight differences between the outputs and the set specification. For instance, a tin of margarine may be specified to weigh 100 g during manufacturing, yet the actual weights of the tins coming out of the process may fall below or above the set value. Therefore, it is crucial to come up with a confidence interval by calculating the standard deviation and the margin of error allowed during manufacturing. This avoids rejecting many tins that are only a few grams below or above 100 g and hence ensures efficiency. During the calculation of the standard deviation, a sample of the tins is taken; not all the tins manufactured are analyzed. From the sampled tins, the mean weight and the standard deviation (SD) are calculated, and the confidence interval is then estimated by subtracting the margin of error (based on the SD) from the mean to obtain the minimum weight and adding it to the mean to obtain the maximum weight. Any tin whose weight falls within this range is considered to have passed the evaluation test and can be sold to consumers, while any tin below or above the range is removed from the system and not sold. Confidence interval estimation is thus crucial in ensuring that the quality of items resonates with the set specifications (Chatfield 2018, p.42).
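The margarine example can be sketched numerically as below. The sample of tin weights is invented, and the margin of error is taken here as 1.96 standard errors, one common choice for an approximate 95% confidence interval for the mean; the specification of 100 g follows the example above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample of 50 margarine tins specified to weigh 100 g.
weights = rng.normal(loc=100.0, scale=1.5, size=50)

mean = weights.mean()
sd = weights.std(ddof=1)                     # sample standard deviation
margin = 1.96 * sd / np.sqrt(len(weights))   # margin of error for an approx. 95% CI

lower, upper = mean - margin, mean + margin
print(f"Sample mean: {mean:.2f} g, SD: {sd:.2f} g")
print(f"95% confidence interval for the mean weight: ({lower:.2f} g, {upper:.2f} g)")
```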
During the course, I realized that hypothesis testing is done to analyze an assumption made regarding a particular population parameter (Stat Trek 2018). As such, the first step in confirming or rejecting a claim is to state the hypotheses, which include the null and alternative hypotheses. The null hypothesis states that the claim about the population holds, while the alternative hypothesis states that it does not. For instance, if a claim is made that the mean height of a group of children is 1.1 meters, the null hypothesis would state that the mean equals 1.1 meters, while the alternative hypothesis would state that the mean differs from 1.1 meters. A test statistic can then be chosen to evaluate the hypotheses; popular choices include the t-test and the z-test. However, before the test statistic is calculated, an evaluation criterion is identified: if the calculated test statistic exceeds the critical value for the chosen significance level, the null hypothesis is rejected, whereas if it does not, the null hypothesis is not rejected. Equivalently, the decision can be made by comparing the p-value obtained with the significance level (Stat Trek 2018). The bottom line of carrying out hypothesis testing is to ensure quality assurance and to confirm that the claims made about the products being manufactured are true. If they are found to be untrue, necessary steps can be taken to correct the source of defectiveness.
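As a worked sketch of this procedure, the snippet below runs a one-sample t-test on an invented sample of children's heights against the claimed mean of 1.1 meters; the heights, sample size, and 0.05 significance level are all assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical sample of 30 children's heights (meters).
heights = rng.normal(loc=1.12, scale=0.06, size=30)

# H0: mean height = 1.1 m   vs   H1: mean height != 1.1 m
t_stat, p_value = stats.ttest_1samp(heights, popmean=1.1)

alpha = 0.05  # chosen significance level
print(f"t statistic = {t_stat:.3f}, p-value = {p_value:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis: the mean appears to differ from 1.1 m.")
else:
    print("Fail to reject the null hypothesis at the 5% significance level.")
```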
From Week 7, I realized that the unique aspect of two-sample hypothesis testing is that it analyzes a claim made while comparing two sets of data. For instance, machine A and machine B, which are used to produce plastic bottles, can be compared on whether there is a significant difference in their rates of production (Hinton 2014, p.39). As such, the concept helps in identifying whether there is a systematic error in a particular machine which needs to be addressed. The whole process follows the hypothesis testing procedure noted in Week 6; the only difference is that means are calculated for the two samples and then compared. Meanwhile, the correlation concept was noted to show the degree to which one variable can be used to predict a second variable. For instance, a correlation coefficient of 0.99 shows that two variables have a strong relationship and that variable 1 can almost accurately be used to predict variable 2 (Hinton 2014, p.76). On the other hand, a low correlation coefficient of, say, 0.1 shows that the two variables being considered have a weak relationship. The concept of regression was closely related to that of correlation. However, the unique aspect of regression is that it provides a linear equation of the form y = mx + c which can be used to predict the values of one variable from the corresponding values of a second variable (Hinton 2014, p.85). I realized that the Week 7 statistical concepts involved extensive mathematical calculations and required me to be careful to avoid calculation errors.
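A brief sketch of these Week 7 ideas is given below. The production rates for machines A and B and the paired data for the correlation and regression example (machine temperature against defect rate) are all invented for illustration; the fitted line corresponds to y = mx + c, with m the slope and c the intercept.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical hourly production rates (bottles/hour) for two machines.
machine_a = rng.normal(loc=520, scale=15, size=25)
machine_b = rng.normal(loc=510, scale=15, size=25)

# Two-sample t-test: is there a significant difference between the machine means?
t_stat, p_value = stats.ttest_ind(machine_a, machine_b)
print(f"Two-sample t-test: t = {t_stat:.3f}, p = {p_value:.3f}")

# Correlation and simple linear regression (y = mx + c) on invented paired data,
# e.g. machine temperature (x) against defect rate (y).
x = np.array([20, 22, 24, 26, 28, 30, 32])
y = np.array([1.0, 1.3, 1.5, 1.9, 2.2, 2.4, 2.8])

r, _ = stats.pearsonr(x, y)
result = stats.linregress(x, y)
print(f"Correlation coefficient r = {r:.3f}")
print(f"Regression line: y = {result.slope:.3f}x + {result.intercept:.3f}")
```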
Chatfield, C., 2018. Statistics for technology: a course in applied statistics. Routledge.
Hinton, P.R., 2014. Statistics explained. Routledge.
Simon, M.K., 2007. Probability distributions involving Gaussian random variables: A handbook for engineers and scientists. Springer Science & Business Media.
Stat Trek, 2018. What is hypothesis testing? Available at: https://stattrek.com/hypothesis-test/hypothesis-testing.aspx
Zill, D., Wright, W.S. and Cullen, M.R., 2011. Advanced engineering mathematics. Jones & Bartlett Learning.