Before we dive into the technical details and the skills of automating data analysis, it's important to take some time to think about why anybody should bother. Learning a new skill is challenging. It takes reading, and it involves frustration as you inevitably screw up. Why should you put up with that to learn how to automate data analysis? Is it worth the effort?
Motivation Through an Example
I'd like to begin to address this question through an example: my own experience. Hopefully, by sharing my experience, my motivations, and the results, you'll see some analogous benefits for you.
I started automating data analysis because I had a very common problem. I was collaborating with a test lab on a project studying drain water heat recovery devices*. As always happens when experimentation is involved, the laboratory testing was running behind schedule. This was creating a problem for me: as testing fell further and further behind, the gap between receiving the data I needed to do my job and my deadline kept shrinking. Eventually I was down to a period of about two months to check the data to ensure that all tests were performed correctly, analyze every individual test to obtain the result from each, combine the individual results to form regressions, write a report, and follow the report through the editing process, all while keeping on top of my other projects. That sounded stressful. I solved the problem by writing a computer program to perform all of the data analysis for me, leaving me with that same time period and only the report to take care of.
This was my first-ever attempt at creating an automated data analysis tool with Python, so obviously it didn't go as smoothly as my newer, more refined scripts, but I saw incredible power in that experience. I had created a tool that could perform months' worth of mind-numbing, expensive data analysis in seconds (assuming everything goes well), and all it cost me was time that I would have spent waiting for data anyway.
Since then, these tools have given me a huge advantage. I've now completed three different projects studying drain water heat recovery using those same scripts. A few hours of modification at the start of a project avoids weeks or months of manual data analysis. The cost savings on any given project have given me a tremendous competitive advantage: because my data analysis is automated, I can deliver more thorough, higher-quality results than my competitors at lower cost.
Beyond that, they've freed up a significant amount of time in the office. Instead of spending my days at work doing that data analysis, I have more time available for building relationships with clients, brainstorming new ideas, writing proposals, mentoring junior staff, and so on. These are all activities that will undoubtedly give me an advantage in my career going forward, but they're also things that I find deeply rewarding on a human level.
What Other Benefits Are There?
That story illustrated some benefits of automating data analysis, but there are certainly others. Here's a list of possible advantages that I've thought of:
As previously mentioned, competitive advantage through higher efficiency. Automated data analysis means that I can analyze more tests than my competitors, leading to higher-quality results with less time investment and therefore less cost. The proposal that delivers better results at lower cost is likely to win.
Also previously mentioned, higher efficiency opened up time to focus on other things. In my case, it meant that I could grow my role in maintaining client relationships, bringing in new projects, and mentoring junior staff. This leads to new skills and career advancement.
The time to focus on other things also gave me more control over my career, and how I spend my days. Instead of giving responsibility for business development to management, I had the opportunity to take charge and bring in projects that I wanted to do. It gave me a chance to bring in projects that I find interesting, working with people that I like. And that's invaluable.
It has also proven valuable for maintaining relationships with lab testers. Oftentimes the people who thrive in a lab and the people who thrive doing data analysis are completely different. Or maybe people like both, but are busy and would love to have some responsibility taken off their shoulders. Regardless of the reason, automating data analysis meant that I could take checking the quality of tests off the hands of my laboratory partners. They'd send the data to me in an e-mail, and within seconds I'd send back plots showing them how well or poorly the test ran. They no longer had to worry about it, and our relationships are stronger because of it.
Automation can also lead to higher profit margins in a routine service business. Maybe you have a business based around performing a routine service for a routine price. A client sends you data, you find the results, and you send them back for a set fee. Automating data analysis lets you perform that work faster, increasing your effective hourly rate and opening up time to either accept more clients or do things other than work.
Using a script to perform the analysis, generate plots, and generate tables is also a great way to improve the quality of the results. Every time a human interacts with something, there's potential for mistakes. Typos. Incorrect units. Inconsistent data set coloring between plots. When the data analysis is automated, all of these potential errors are contained in a single spot: the script. This means that you only have to ensure that there are no mistakes once, in that single spot, and the results will be correct in every instance. The days of digging through every plot and table searching for errors are gone.
Writing the methods section of a report, or remembering years later how a project was done, suddenly gets much easier. Spreadsheets are rarely highly organized, and understanding how a project was done can get complicated. Code, on the other hand, proceeds in a logical, linear fashion. The methods of a project can be read one line at a time, without ever having to scan around or push the "Trace Dependents" button.
Sharing work with others can also be much easier with computer code than with spreadsheets. Assuming that the other person understands Python (and this is an important assumption), sharing thoroughly commented code is typically a much easier way to share work than a cryptically organized spreadsheet.
And last, but not least, human beings generally find creativity satisfying. Since writing a computer program is much more creative than manually analyzing data, automating data analysis may be a more pleasant and rewarding way to work.
These are some of the benefits of automated data analysis that I've identified. How could it improve your life?
This blog is an honest attempt to teach you valuable skills that you can use in your career, ranging from computer programming and Python packages, to scientific data analysis and automation, and all the way to simulation model calibration and validation. Since I provide this to you free of charge, I can only do it with support from the readers. Please consider supporting the blog through my Patreon account. I know that asking is obnoxious, but it really helps me keep this blog in operation so I can continue helping you.
* Those who want more information on drain water heat recovery can find a relatively non-nerdy article I wrote about it here: https://www.linkedin.com/pulse/drain-water-heat-recovery-why-you-should-care-how-much-peter-grant/