Technological advancement has resulted in the accumulation of large volumes of content in business and other organizations. Business activities in the information technology age are characterized by the generation, analysis and use of data over transfer channels. Big data is a term used to refer to data that is voluminous, has a high transfer rate and comes in a vast variety (Rajaraman 696). Another way to frame big data is through technology: big data is not only voluminous and complex but also requires innovative technology to analyze it (Zakir et al. 82). Big data has become significant in various establishments, both public and private, that collect vast domain-specific data containing advantageous and valuable information, such as medical informatics, cyber security, national intelligence, geospatial engineering and marketing. The following paper examines the concept of big data analytics by exploring the characteristics of big data, the types and techniques of analytics, the frameworks used and emerging trends in big data analytics.
Big data analytics addresses the problems of managing data that is too enormous, too unstructured and too fast-moving for traditional data management methods (Zakir et al. 81). Huge volumes of transactions from many organizations result in a complex and unprecedented range of data. This wealth of useful data has become important to organizations because its careful application gives them a competitive advantage. To realize the economic value of big data, analytics becomes necessary for performance evaluation and data utilization. Data sources today exceed traditional database storage and retrieval: companies receive emails, sensor-generated information and mobile device data that are transmitted at high speed and are too unstructured to fit in traditional database systems. This calls for the development of new ways to handle such data.
Characteristics of Big Data
The term big data stresses the size or scope of data (Rajaraman 698). Scope is relative and can only be measured within its context of use: whatever data was considered big in 1960 may fit in a small storage device today. The characteristics listed below are commonly referred to as the V's of big data.
Volume
Big data implies the support of very large data volumes. In 2016, the digital data created, replicated and consumed amounted to about 12 zettabytes (zetta = 10^21) (Rajaraman 699). The amount of available and consumed data is expected to double roughly every two years, echoing Moore's law. The amount of data currently referred to as huge is on the order of petabytes (peta = 10^15).
Variety
In a dataset, structural heterogeneity is what defines variety (Gandomi and Haider 138). In the early information technology age, the dominant data was numeric. With time, text was incorporated into data. Currently, the data being produced and used in information computing ranges across text, numbers, video, images and audio. Previously, each of these types was produced in a structured way because each was used separately, but today they are used and shared together, making datasets unstructured.
Velocity
Traditional databases are slow and are becoming obsolete for big data management. The main challenges in using traditional relational database management systems are their inability to handle semi-structured data and their scalability problems (Hu et al. 653). Much big data is real-time, such as phone conversations, sensor readings, and video recording and streaming, and it requires a fast transfer medium and scalable storage. The sheer volume and the requirement for fast transfer have led to the development of new ways to manage big data.
Veracity
The accuracy of big data is a huge challenge. Most real-time data is noisy and difficult to verify, and some of the traffic received from the internet has no validity. The development of real-time software has made verification even more difficult, since some software provides algorithms for altering data. This undermines classical data mining, which relies on noise-free datasets and cannot handle huge data (García et al.).
Value
The data obtained is useless if it is not processed. Processing huge volumes of unstructured data is difficult, but computational hardware and software have improved drastically over time. New processors and distributed file systems are being used to enable the handling of big data.
Analyzing Big Data
Big data analytics is concerned with extracting actionable information from the huge amount of data available to support decision making and predict future trends (Najafabadi et al. 2). The analysis is based on the use of the data, hypotheses developed from experience with the data, and correlations between different variables. There are four different types of analysis:
Predictive Analysis
This method uses available data to predict what is expected in the future (Zakir et al. 82). Big businesses use the huge amounts of data available to analyze customers' future preferences and expectations. This data is also used to predict market dynamics and to strategize around possible outcomes. Methods such as collaborative filtering and clustering algorithms are developed to reveal customer preferences and to shape products and marketing strategies that cover a wide customer base.
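As a concrete illustration, collaborative filtering can be sketched in a few lines. The user names, products and ratings below are invented for illustration, and a production recommender would use a far larger rating matrix and an optimized library rather than this hand-rolled cosine similarity:

```python
import math

# Hypothetical user-product ratings (1-5); missing entries mean "not rated".
ratings = {
    "alice": {"laptop": 5, "phone": 3, "tablet": 4},
    "bob":   {"laptop": 4, "phone": 1, "tablet": 5},
    "carol": {"laptop": 1, "phone": 5},
}

def cosine_sim(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    na = math.sqrt(sum(a[i] ** 2 for i in shared))
    nb = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (na * nb)

def predict(user, item):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        sim = cosine_sim(ratings[user], prefs)
        num += sim * prefs[item]
        den += sim
    return num / den if den else None

print(predict("carol", "tablet"))
```

Because carol's taste is closer to alice's than to bob's, the predicted tablet rating leans toward alice's score of 4.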
Descriptive Analysis
This analysis provides a comprehensive representation of past and present data in the form of graphs and charts. The method uses visualization techniques to present available data for easier understanding; a good example is a demographic classification that represents gender, age and occupation after a census.
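A minimal sketch of the aggregation step behind descriptive analytics, using a few invented census-style records; the output counts are exactly what a chart or graph would then visualize:

```python
from collections import Counter

# Invented census-style records for illustration only.
records = [
    {"gender": "F", "age_group": "18-30", "occupation": "engineer"},
    {"gender": "M", "age_group": "18-30", "occupation": "teacher"},
    {"gender": "F", "age_group": "31-50", "occupation": "engineer"},
]

# Summarize the raw records into the counts a bar chart would display.
by_gender = Counter(r["gender"] for r in records)
by_occupation = Counter(r["occupation"] for r in records)

print(by_gender)      # Counter({'F': 2, 'M': 1})
print(by_occupation)  # Counter({'engineer': 2, 'teacher': 1})
```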
Discovery Analytics
This form of analysis is aimed at finding additional insights by analyzing data from different sources. It provides the relationship between different parameters in data analysis and provides an opportunity for developing new discoveries. Social media is a source of big data and analysis of customer comments can be used by companies to discover new trends in the market.
Prescriptive Analytics
This form of analysis develops a methodology for attaining the solution to a current problem. It is applicable in businesses that must study customer preferences in order to maximize profit.
Building on the modes of analysis above, different techniques are applied in big data analytics. These techniques include:
Text Analytics
This technique obtains textual information from structured or unstructured data sources and involves statistical computation, computational linguistics and machine learning (Gandomi and Haider 140). The methods used in text analytics include information extraction from unstructured data, text summarization, question answering and opinion mining.
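A toy sketch of the information-extraction side of text analytics, assuming a simple term-frequency notion of keywords; the stopword list and sample document are invented, and real systems use much richer linguistic models:

```python
import re
from collections import Counter

# Tiny invented stopword list; real analytics uses curated lists or models.
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in", "it", "that", "from"}

def top_terms(text, k=3):
    """Tokenize, drop stopwords, and return the k most frequent terms
    as candidate keywords for the document."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

doc = ("Big data is voluminous. Big data analytics extracts value "
       "from data that traditional databases cannot handle.")
print(top_terms(doc))  # ['data', 'big', ...]
```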
Multimedia Analytics
Multimedia content analytics refers to extracting noteworthy information and understanding the semantics captured in audiovisual data (Hu et al. 675). Multimedia combines audio and visual data, which may require combined or separate analysis. Audio analytics analyzes speech using a transcript-based approach or a phonetic-based approach (Gandomi and Haider 141): the former indexes and searches transcriptions of voice messages, while the latter transforms speech into a sequence of phonemes. Video analytics, although still in its development stage, involves monitoring, analyzing and obtaining useful information from streaming videos (Gandomi and Haider 141).
Web Analytics
Web analytics involves obtaining and analyzing information from web documents for knowledge discovery (Hu et al.). It is categorized into three parts: web content mining, which extracts useful information from a website; web structure mining, which analyzes the underlying structure of the website; and web usage mining, which analyzes secondary data from web sessions.
Network Analytics
Social media platforms have been on the rise in the current internet age. These online social networks have grown with social media, bringing a huge amount of mixed unstructured data. Linkage data analysis is used in network analytics to provide predictive links between social influence, detection and networking (Hu et al. 676). Social media analytics is a wide field that encompasses analysis of textual and multimedia data on social media platforms (Gandomi and Haider 142; Hu et al. 676). An extension of social media analytics uses both linked-data analysis and content-based analytics, which works on data posted by users on social platforms.
Big Data Analytics Frameworks
The first attempts to analyze large data were made with distributed systems, before the advent of the big data concept (García et al.). Most distributed systems have complex subsystems for data analysis and storage that overcome the speed and scalability problems of relational database systems. Unfortunately, the big data phenomenon has surpassed the abilities of distributed systems, and new platforms have been developed to work on big data, because it requires additional algorithms to support preprocessing and analytical activities. Distributed systems would require redesigning to handle these extra tasks.
The strength of a big data framework is measured by its iterative task support, real-time processing, scalability, fault tolerance, data input/output performance, and supported data size (Singh and Reddy). Singh and Reddy describe these strengths as follows. Scalability is the ability of a system to comfortably accommodate a growing amount of work, or to improve performance by adding more hardware. Data input/output performance represents the rate at which data moves in and out of a peripheral device. A fault-tolerant system continues to function normally even with one or more of its components not working properly. A system that can process data and provide results within a stipulated time constraint can be classified as a real-time system. Data size describes the size of dataset the system can handle efficiently. Iterative task support illustrates the capacity of a system to efficiently sustain iterative processes. When choosing a system for large-scale data analysis, it is advisable to check the strengths of the platform the system uses. The platforms used, most of which are open source, each have advantages and disadvantages.
MapReduce was the first revolutionary tool developed to analyze large-scale data (García et al. 2). It was developed to automatically process and create numerous datasets in a distributed way: the map function sources sufficient statistics, while the reduce function combines them to produce the required outcome (Zakir et al. 86). The system provides the user with a scalable, distributed tool that removes the worries of data partitioning, communication and failure recovery. The most popular implementation of MapReduce emerged as Apache Hadoop (García et al. 2). Hadoop also sees heavy use because of its advantages in scalability, flexibility, cost efficiency and fault tolerance (Hu et al. 677). Despite its wide application in large data systems, MapReduce could not support iterative and online processing. This led to the move from Hadoop to Spark, which was developed to process data using in-memory primitives (Karau et al.). Spark's ability to repeatedly reuse data loaded in memory allows it to overcome MapReduce's setbacks with iterative and online processing. Apache Spark is a general-purpose platform and allows easier implementation in a distributed system.
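The map/shuffle/reduce data flow described above can be sketched in a single process. A real framework such as Hadoop distributes the map and reduce tasks across nodes and shuffles intermediate pairs over the network, but the word-count pattern below shows the same three stages:

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    """Map: emit an intermediate (word, 1) pair for every word in one record."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all intermediate values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce: combine all values for one key into the final result."""
    return key, sum(values)

lines = ["big data big analytics", "big data"]
intermediate = chain.from_iterable(map_fn(line) for line in lines)
result = dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'big': 3, 'data': 2, 'analytics': 1}
```

Spark's improvement over this model is that the intermediate groups can be kept in memory and reused across iterations instead of being rewritten to disk between jobs.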
Different competitors to Apache Spark have recently been developed. Apache Storm, a fault-tolerant and speedy platform, has been developed for real-time processing (García et al.). An important feature introduced in Spark is Resilient Distributed Datasets (RDDs) (Marcu et al.). RDDs are in-memory data structures that facilitate intermediate data transfer through a group of nodes for the efficient support of iterative algorithms. Spark can read data from the Hadoop Distributed File System and run under the Hadoop YARN manager, making it exceptionally adaptable to diverse systems (Singh and Reddy 7). The latest development from Apache aimed at bridging the gaps of Spark is Apache Flink, a distributed platform for streaming and batch data processing (García et al.). A noticeable difference between the two is that Flink uses pipelined execution compared to Spark's staged execution; Flink also has automatic optimization and requires fewer memory-threshold configurations (Marcu et al.).
Trends in Big Data Analytics
Big data analytics is considered a market changer by information analysts because of the improvements in efficiency and effectiveness its predictions bring (Wamba et al. 4). In the computing world, the availability of big data has changed with the growth of mobile computing and social digital media, whose devices generate huge amounts of data. Scalable hardware and efficient software need to be developed to handle these challenges. Storage in information technology is shifting to cloud computing; although cloud computing emerged before the big data concept, the two are combining to provide solutions in geoscience, business and astronomy (Yang et al. 15). Big data analytical platforms also have to provide innovative ways of handling online learning resources (Huda et al. 27).
Conclusion
Big data analytics involves the analysis of unstructured data that comes in huge volumes, requires speedy transfer and spans a variety of types, from text to audio and video. The analysis developed from the failure of distributed systems to handle large-scale data. The types of analysis are shaped by the data, experience with the data, the expected outcome and how the analysis will be used. Different platforms have been developed to aid the analysis of big data, building on the strengths of the first MapReduce implementation. Apache has been active in developing such platforms, including Hadoop, Spark and Flink. New technologies such as online learning and cloud computing have seen great development by integrating big data analytics.
Works Cited
Gandomi, Amir, and Murtaza Haider. “Beyond the hype: Big data concepts, methods, and analytics.” International Journal of Information Management 35.2 (2015): 137-144.
García, Salvador, et al. “Big data preprocessing: methods and prospects.” Big Data Analytics 1.1 (2016): 9.
Hu, Han, et al. “Toward scalable systems for big data analytics: A technology tutorial.” IEEE access 2 (2014): 652-687.
Karau, Holden, et al. Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly Media, 2015.
Najafabadi, Maryam M., et al. “Deep learning applications and challenges in big data analytics.” Journal of Big Data 2.1 (2015): 1.
Rajaraman, V. “Big data analytics.” Resonance 21.8 (2016): 695-716.
Singh, Dilpreet, and Chandan K. Reddy. “A survey on platforms for big data analytics.” Journal of big data 2.1 (2015): 8.
Wamba, Samuel Fosso, et al. “Big data analytics and firm performance: Effects of dynamic capabilities.” Journal of Business Research 70 (2017): 356-365.
Yang, Chaowei, et al. “Big Data and cloud computing: innovation opportunities and challenges.” International Journal of Digital Earth 10.1 (2017): 13-53.
Zakir, Jasmine, Tom Seymour, and Kristi Berg. “Big Data Analytics.” Issues in Information Systems 16.2 (2015).