Kumar Singh and Swaroop (2013) define data mining as a process that uses knowledge discovery techniques to extract implicit, previously unknown, and potentially important information from large databases. Tavani (2016) proposes that examining the implicit patterns discovered in data results in the indirect collection of personal information. He goes on to say that data mining activities create new and, at times, non-obvious categories of data. As a result, individuals whose data has been mined become linked to newly constructed groups that no one may previously have imagined to exist.
As the field of data mining advances, it becomes necessary to understand the methods employed in data mining, the potential for misuse of the technology, and the privacy issues it raises. This paper examines the history of data mining, the current issues and privacy concerns facing it, and the impact the technology has from both a personal and a global perspective. It draws on a wide range of literature, including books and journal articles.
Background of the study
Data mining (also called data analysis or knowledge discovery) originates in KDD (Knowledge Discovery in Databases), in which patterns and new knowledge are identified from data stored in large repositories (Naidu Paidi, 2012). With advances in technology, however, the definition is no longer restricted to databases (Coenen, 2004); it now encompasses a wide variety of fields, with the central concern being the identification of hidden information within data. The field is evolving quickly, and the approaches now used to perform data mining range from statistical analysis to machine learning (Naidu Paidi, 2012).
Although data mining is indispensable for identifying patterns within the voluminous data produced in the current information age, it raises public policy concerns about security and the protection of privacy. Kumar Singh and Swaroop (2013) argue that it is increasingly important to recognize the kinds of data that can be mined, much of which contains personal information. They add that there is a need to analyze whether privacy will be violated, especially where personal, sensitive data is published. Tavani (2016) reinforces this point by asserting that some data mining technologies raise special privacy concerns.
Historical Perspective
The process of analyzing data to reveal useful hidden patterns has come a long way, from methods that applied Bayes' theorem in the 1700s to statistical analysis (regression) in the 1800s. These methods were suitable at the time because the data was not complicated (Li, 2010). However, with continued development in computing technology, data sets have grown in complexity, requiring more complex methods for their analysis. As a result, the information age has seen a significant increase in the use of indirect and automatic data analysis methods (Li, 2010).
To appreciate the use of data mining as an analysis method in the current age, it is useful to examine the history of data analysis from the 1960s onward. According to Naidu Paidi (2012), data analysis has developed through the following stages, with specific regard to developments in databases:
The 1960s
Data analysis in this period centered on data collection. The business question of interest at the time was “what were the total sales in the last x years?”. IBM and CDC were the providers of the technology products. Although data was collected, the method was retrospective and limited to static delivery of information.
The 1980s
During this period, the main focus was data access. Enabling technologies included ODBC, relational databases and Structured Query Language (SQL). The business question of interest was “what were the unit sales in period x in Liverpool?”. With more technology providers, such as Oracle, Microsoft and IBM, the method evolved: it remained retrospective, but data was now delivered dynamically at record level.
The 1990s
This period saw a significant rise in the use of decision support and data warehousing. Enabling technologies included data warehouses, multidimensional databases and OLAP (Online Analytical Processing). The business question to be answered was “what were the unit sales in period y in Liverpool? Drill down to New Orleans”. Providers of products included Cognos, Arbor, Pilot and MicroStrategy, and the method enabled dynamic delivery of data at multiple levels. It was also a retrospective method.
Current period
The current age has seen the emergence of data mining, which is not only a prospective method but also leads to proactive delivery of information. With data mining, businesses are able to answer the question “what is likely to happen to unit sales in Liverpool next month? Why would this be?”. With providers including Pilot, Lockheed, SGI and IBM, the method makes use of advanced algorithms, complex and voluminous databases, and multiprocessor computation.
This short historical perspective makes clear that data mining not only helps businesses understand their operations; the hidden patterns it uncovers in data also aid in forecasting future business performance.
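The shift from retrospective to prospective questions can be illustrated with a minimal sketch. It is not taken from the cited sources: the sales figures and the function names total_sales and forecast_next are hypothetical, and a straight-line least-squares fit stands in for the far more advanced algorithms that real data mining tools use.

```python
# Hypothetical monthly unit sales for one city (illustrative values only).
monthly_units = [100, 110, 120, 130, 140, 150]


def total_sales(units):
    """Retrospective question: what were the total sales in the period?"""
    return sum(units)


def forecast_next(units):
    """Prospective question: fit a least-squares line and extrapolate one month."""
    n = len(units)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(units) / n
    # Ordinary least-squares slope and intercept for y = intercept + slope * x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, units)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # The next month has index n, one step past the observed data.
    return intercept + slope * n


print("Total sales so far:", total_sales(monthly_units))
print("Forecast for next month:", forecast_next(monthly_units))
```

The first function only summarizes what has already happened, as the methods of the 1960s to 1990s did, while the second uses the pattern in past data to estimate a future value.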
Current Issues affecting data mining
Kumar Singh and Swaroop (2013) identify the following major issues affecting data mining:
Mining methodology and user interaction, where the concern is the capability to mine different kinds of knowledge in a database, mine knowledge at multiple levels of abstraction, incorporate background knowledge, express and visualize data mining results, handle noisy and incomplete data, and evaluate patterns.
Scalability and performance, where the emphasis is on the scalability and efficiency of data mining algorithms and on mining methods that are distributed, incremental and parallel.
Diversity of data types, where the focus is on handling complex and relational data types and on mining information from heterogeneous databases and global information systems such as web databases.
Social impact and application-related issues, where the emphasis is on the deployment of discovered knowledge, domain-specific data mining tools, and intelligent query answering.
Legislation Issues in data mining
Tavani (2016) highlights a central problem for data privacy: Public Personal Information (PPI), which is non-confidential and publicly available, is being mined. The author notes that Non-Public Personal Information (NPI), such as financial or medical records, has some legal protection and as such is not mined. What perhaps fuels the controversy over privacy protection when public data of this kind is mined is that lawmakers do not place as much importance on non-confidential information.
As such, Tavani (2016) suggests the use of privacy-enhancing tools (PETs), which help protect such personal data. PETs are useful for protecting the personal identity of individuals interacting on the web, as well as the privacy of communications sent over the web. However, Tavani (2016) adds that there are issues with the use of PETs:
Most users are unaware that PETs exist and as a result keep sending their personal information into the public domain, where it can be mined. Users should therefore be taught how to use such tools.
A further challenge arises when users agree to a website's request for their information: PETs may not guarantee how the information they provide will be used in future. This concern relates to the principle of informed consent.
Finally, users who are financially poor may be willing to sell their information in exchange for money.
Examples
According to Goopta, Wheeler, and Joshi (2014), popular data mining tools include:
RapidMiner
Weka
R
Orange
KNIME
NLTK
Global dynamics / impact (issues and trends)
According to Naidu Paidi (2012), the global impacts of data mining include:
The fight against terrorism, where advanced countries such as the U.S.A. and European countries are implementing laws that favor the war against terrorism. Intelligence agencies are thus granted permission to create large databases that centralize information about their populations, easing surveillance.
Bioinformatics and disease cures, where data mining is being applied to search for genes relevant to curing diseases such as AIDS and cancer.
The Semantic Web and web applications in general, where data mining is being used to make the web more organized. Underlying technologies in heavy use include RDF (Resource Description Framework) for describing resources, as well as FOAF (Friend of a Friend) and the tagging used on social networks such as Orkut and Facebook.
Business, where data mining is being used to forecast customer trends and analyze customer behavior, leading to the development of business intelligence.
Personal Impact from a global perspective
Based on the trends and applications discussed by Naidu Paidi (2012) in the previous section, data mining also has a personal impact, viewed from a global perspective, in the following ways:
Researching on the web is made easier: because data mining algorithms are used in performing web searches, e.g. via Google, one is able to find items of interest much more easily than when the web was unstructured.
Business intelligence resulting from data mining affects personal lives, especially when financial institutions use it to determine patterns in how people spend their income. Such analysis may lead to creditworthy people being denied access to loans.
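How a pattern-based decision can work against a creditworthy individual can be shown with a small sketch. Everything in it is an assumption made for illustration: the records, the spending groups, the 0.5 threshold and the function names do not come from the cited sources and do not describe how any real institution scores applicants.

```python
from collections import defaultdict

# Hypothetical records: (person, mined spending group, repaid a past loan?)
records = [
    ("A", "high_discretionary", False),
    ("B", "high_discretionary", False),
    ("C", "high_discretionary", True),
    ("D", "low_discretionary", True),
    ("E", "low_discretionary", True),
]


def group_repayment_rates(rows):
    """Compute the share of people in each mined group who repaid."""
    totals = defaultdict(lambda: [0, 0])  # group -> [repaid count, group size]
    for _, group, repaid in rows:
        totals[group][0] += int(repaid)
        totals[group][1] += 1
    return {group: repaid / size for group, (repaid, size) in totals.items()}


def approve(group, rates, threshold=0.5):
    """Group-level rule: approve only if the group's repayment rate is high enough."""
    return rates[group] >= threshold


rates = group_repayment_rates(records)
# Person C repaid a past loan but belongs to a group with a low repayment rate,
# so a rule based only on group membership turns C down.
c_decision = approve("high_discretionary", rates)
print("Group repayment rates:", rates)
print("Loan approved for C:", c_decision)
```

The decision here depends on the group the individual was placed in by the analysis, not on the individual's own record, which is the kind of outcome the point above describes.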
Summary
This paper has reviewed data mining by considering its historical perspective, the current issues affecting the technology and its applications. Legal issues have also been cited, with the emphasis on protecting the privacy of non-confidential information. Data mining is clearly a technology that is here to stay, given the large amount of data produced as technological advancement leads to more resource and information sharing. Although the technology is indispensable because of its applications, the privacy of mined data must be protected so that only what users have consented to is made available for data mining. It is also imperative for internet users to be aware of the privacy policies they agree to with various websites so that they do not find themselves in a predicament in future.
References
Coenen, F. (2004). Data Mining: Past, Present and Future. The Knowledge Engineering Review, 00(0), 1-24.
Goopta, C., Wheeler, B., & Joshi, M. (2014). Six of the Best Open Source Data Mining Tools - The New Stack. The New Stack. Retrieved 4 May 2017, from https://thenewstack.io/six-of-the-best-open-source-data-mining-tools/
Kumar Singh, D., & Swaroop, V. (2013). Data Security and Privacy in Data Mining: Research Issues & Preparation. International Journal of Computer Trends and Technology, 4(2).
Li, Y. (2010). Data mining: concepts, background and methods of integrating uncertainty in data mining. http://www.ccsc.org/. Retrieved 4 May 2017, from http://www.ccsc.org/southcentral/E-Journal/2010/Papers/Yihao%20final%20paper%20CCSC%20for%20submission.pdf
Naidu Paidi, A. (2012). Data Mining: Future Trends and Applications. International Journal of Modern Engineering Research, 2(6).
Tavani, H. (2016). Ethics and Technology: Controversies, Questions, and Strategies for Ethical (1st ed.). John Wiley & Sons.