Data Analysis -- Why care about it?

Post author: Dr. Mario Schneider
Post published: 24 January 2020

First of all, what does data analysis actually mean? As is often the case, there is no black-and-white definition of this term, but from my point of view John W. Tukey, a famous statistician and data analyst, gave a good one:

Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.
John W. Tukey

It is not the only definition around, but it is the one I personally share. It makes clear that data analysis is not just analyzing data after it has been gathered; it also includes planning steps such as the design of experiments. But let us take one step at a time.

The main reason for scientists to analyze data is (or at least should be) to help them answer their scientific question. Data analysis is not an end in itself; rather, it provides tools that need to be applied with care. Once a scientific hypothesis has been properly formulated, scientists need to define their target quantities (the quantities that allow them to support or reject the hypothesis) and select the type of experiment and the measurement method. This includes carefully considering factors that could influence the target quantities. For instance, the fluorescence properties of a dye solution can change drastically with temperature; if measurements are taken at varying temperatures and this temperature dependency is not taken into account, the results can be false.

In practice, a data analysis specialist is often consulted only after the data has been gathered. This is unfortunate, because the specialist should be involved from the beginning to support the design of experiments. They can, for instance, estimate the number of trials required to reach a desired level of confidence.

Either way, once the data has been gathered, it is wise to start with exploratory data analysis, which in simple terms means playing around with the data to get to know it better. The outcome of this exploration is often a set of descriptive measures of the dataset, such as the mean or the median. These measures are frequently summarized in one or more suitable graphs, such as scatter plots or box plots. Sometimes the data is also pre-processed (e.g. smoothed to visualize trends). Next, the data is described by an appropriate model, which first has to be found and whose statistical significance has to be established.
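To make the exploratory step a little more concrete, here is a minimal Python sketch, assuming the measurements are available as a plain list of numbers. It computes the descriptive measures mentioned above (mean, median, standard deviation, quartiles) and applies a simple moving average as one possible smoothing method. The function names `describe` and `moving_average` are hypothetical helpers for illustration, the example values are made up, and only Python's standard `statistics` module is used.

```python
import statistics
from typing import Dict, List, Sequence


def describe(data: Sequence[float]) -> Dict[str, float]:
    """Return basic descriptive measures of a one-dimensional dataset (hypothetical helper)."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    # Quartiles via the statistics module's default 'exclusive' method
    # (tools differ slightly in how they compute quartiles)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,  # interquartile range: the length of the box in a standard box plot
    }


def moving_average(data: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a series with a simple moving average (hypothetical helper).

    No padding is applied, so the result has len(data) - window + 1 points.
    """
    if not 1 <= window <= len(data):
        raise ValueError("window must be between 1 and len(data)")
    return [statistics.mean(data[i:i + window]) for i in range(len(data) - window + 1)]


if __name__ == "__main__":
    # Made-up example values, e.g. repeated readings of one sample
    readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(describe(readings))
    print(moving_average(readings, window=3))
```

A moving average is only one of many smoothing choices; a wider window suppresses more noise but also flattens genuine features of the trend.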
ANOVA and regression are typical tools for data modelling. The results then need to be interpreted in light of the original hypothesis. Another iteration with different settings may be necessary to reach a final decision or conclusion. Finally, the findings should be reported comprehensively.
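As an illustration of the modelling step, the following minimal sketch computes the F statistic of a one-way ANOVA from scratch, assuming the observations are already split into groups (for example, measurements taken at different temperatures). The function name `one_way_anova` is a hypothetical helper, and the sketch deliberately stops at the F statistic and its degrees of freedom: turning them into a p-value requires the F distribution, which Python's standard library does not provide (SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value).

```python
import statistics
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA (hypothetical helper)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and more observations than groups")

    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean([x for g in groups for x in g])

    # Between-group sum of squares: how far the group means lie from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: scatter of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no variation within groups; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    # F = (between-group mean square) / (within-group mean square)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up example: three groups of readings, e.g. at three temperatures
    f, df1, df2 = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
    print(f"F({df1}, {df2}) = {f:.2f}")  # prints: F(2, 6) = 27.00
```

A large F means the group means differ more than the scatter within the groups would suggest; whether that difference is significant is judged against the F distribution with (df_between, df_within) degrees of freedom, and the test itself rests on the usual ANOVA assumptions (independent observations, roughly normal residuals, similar variances across groups).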

Dr. Mario Schneider

As an analytical chemist and certified AI Manager with a PhD in biophysics, Mario brings a unique blend of scientific rigor and strategic thinking to your business. His dedicated expertise in chemometrics and scientific data analysis allows him to turn your most complex data into clear, valuable insights. He is a published author on data analysis for scientists and a master of tools like MATLAB, R, Excel, and Python. Mario doesn't just process data; he leverages cutting-edge machine learning algorithms to uncover hidden opportunities, optimize processes, and drive innovation.
