Before the introduction, here is a brief history of data science to help you understand where the field started and how it evolved.
Historical background of Data Science
The word “data” comes from the Latin “datum,” which means “something given.” The term has been in use since the 1500s; however, the modern practice began during the 1940s and 1950s. Since then, the development of Data Science has been gradual.
1962 – John Wilder Tukey
In 1962, John Wilder Tukey wrote about a shift in the world of statistics. He was referring to the merging of statistics and computers, at a time when statistical results could be produced in hours, rather than the several days or even a month that manual work could take.
1974 – Peter Naur
In 1974, Peter Naur published the Concise Survey of Computer Methods, using the term “Data Science” repeatedly. Naur offered his own, somewhat convoluted, definition of the new concept:
“The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”
1977 – John Wilder Tukey
In January 1977, John Wilder Tukey published a second work, “Exploratory Data Analysis,” arguing for the importance of using data to choose “which” hypotheses to test, and that confirmatory data analysis and exploratory data analysis are closely linked.
1989 – Gregory Piatetsky-Shapiro
In 1989, Gregory Piatetsky-Shapiro organized a Knowledge Discovery in Databases workshop.
1994 – Business Week
In 1994, Business Week ran the cover story “Database Marketing.” The flood of data was, at best, confusing to company managers, who were trying to decide what to do with so much disconnected information.
1999 – Jacob Zahavi
In 1999, Jacob Zahavi pointed out the need for new tools to handle the enormous amounts of data available to businesses, in “Mining Data for Nuggets of Knowledge.” He wrote:
“Scalability is a huge issue in data mining… Conventional statistical methods work well with small data sets. Today’s databases, however, can involve millions of rows and scores of columns of data… Another technical challenge is developing models that can do a better job analyzing data, detecting non-linear relationships and interaction between elements… Special data mining tools may have to be developed to address web-site decisions.”
2001 – William S. Cleveland
In 2001, William S. Cleveland of Bell Labs published the paper “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” It described how to expand the technical expertise and range of data analysts and specified six areas of study for university departments. It promoted developing specific resources for research in each of the six areas. His plan also applies to government and corporate research.
The International Council for Science: Committee on Data for Science and Technology started publishing the Data Science Journal in 2001, focusing on issues such as the description of data systems, their publication on the web, applications, and legal issues.
2006 – Hadoop 0.1.0
Hadoop 0.1.0, an open-source framework for the distributed storage and processing of large data sets, was released in 2006. It grew out of Nutch, an open-source web search project.
2008 – The Title “Data Scientist”
In 2008, the title “Data Scientist” became a buzzword and, eventually, part of the language. Jeff Hammerbacher of Facebook and DJ Patil of LinkedIn are credited with popularizing it. In 2009, Johan Oskarsson reintroduced the term NoSQL when he organized a discussion on “open-source, non-relational databases.”
2011–2012 – Job Postings for Data Scientists Grow by 15,000%
Data Science had already proved beneficial and had become part of corporate culture, as job postings for Data Scientists increased by 15,000% between 2011 and 2012. There was also an increase in courses and conferences devoted specifically to Data Science and Big Data.
James Dixon promoted the concept of Data Lakes in 2011. In his description, a Data Warehouse pre-categorizes data at the point of entry, whereas a Data Lake accepts data using a non-relational (NoSQL) database and simply stores it without categorizing it, which saves time and energy.
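The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration only, not Dixon's design or any product's API: the plain lists standing in for a warehouse table and a lake, the function names, and the two-column schema are all assumptions made for the example.

```python
import json

# Hypothetical schema that the "warehouse" enforces at the point of entry.
WAREHOUSE_SCHEMA = {"user_id": int, "amount": float}


def load_into_warehouse(table, record):
    """Categorize and validate a record before storing it."""
    row = {}
    for column, column_type in WAREHOUSE_SCHEMA.items():
        if column not in record:
            raise ValueError(f"missing column: {column}")
        row[column] = column_type(record[column])
    table.append(row)


def load_into_lake(lake, raw):
    """Store the raw payload untouched, with no categorization at entry."""
    lake.append(raw)


def read_amounts_from_lake(lake):
    """Apply structure only when the data is read."""
    amounts = []
    for raw in lake:
        record = json.loads(raw)
        if "amount" in record:
            amounts.append(float(record["amount"]))
    return amounts


warehouse_table = []
data_lake = []

# Made-up events; the third one does not fit the warehouse schema.
events = [
    '{"user_id": 1, "amount": 9.5}',
    '{"user_id": 2, "amount": "3", "note": "extra field"}',
    '{"clicked": "banner"}',
]

rejected = 0
for raw_event in events:
    load_into_lake(data_lake, raw_event)
    try:
        load_into_warehouse(warehouse_table, json.loads(raw_event))
    except ValueError:
        rejected += 1
```

In this sketch the lake keeps all three events, while the warehouse keeps only the two that match its schema. The cost of interpreting the data moves from the moment of loading to the moment of reading.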
According to statistics shared by IBM in 2013, about 90% of the world's data had been produced in the previous two years. Furthermore, Bloomberg’s Jack Clark wrote that 2015 was a landmark year for Artificial Intelligence, with about 2,700 software projects at Google using it.
In the last ten years, Data Science has grown to reach businesses and organizations all over the world. It is widely used by geneticists, engineers, and astronomers, and in government.
Data Science has also become a vital part of academic research and business growth. Its applications include robotics, speech recognition, search engines, and machine translation. The field has also expanded into the biological sciences, medical informatics, health care, the social sciences, and the humanities.
Modern Data Science tools also reduce the workload on data scientists, who no longer need to spend their time working through complex algorithms by hand.