Now that the main data file has been split into individual data files, the next step in our tutorial on automatically creating a performance map for heat pump water heaters (HPWHs) in Python is to analyze the individual files. This will include the following steps:
Filtering the data in each file to include only the data needed in each test,
Calculating regressions representing the coefficient of performance (COP) of the HPWH in each test,
Checking the accuracy of the regression,
And storing the results.
If any of the terms above are foreign to you, see Performance Map Tutorial: Creating a Performance Map of Heat Pump Water Heaters for an introduction.
This post will walk through each of the above steps. If you’re following along with the companion data set, then after completing these steps you will have calculated data and regressions showing the COP of the HPWH as a function of water temperature at each of the three different ambient temperatures in the data set.
Package Import Statements
As in the previous post, the first step is to import all of the required packages. The recommended packages were all described in An Introduction to Python Packages that are Useful for Automating Data Analysis. For this part of the project, the recommended packages are:
pandas, glob, and os: These packages were already described in Performance Map Tutorial: Splitting the Data Set into Individual Files, and will not be described in detail here.
NumPy: NumPy is a numerical package commonly used in scientific computing. It contains several functions that are useful for understanding large data sets. For this tutorial, the most important functions are polyfit and poly1d. polyfit fits a polynomial of a user-specified order to a data set and returns its coefficients. poly1d converts those coefficients into a polynomial function that can be evaluated at any specified value (a short demonstration follows the import statements below).
To import these packages and make use of their capabilities for this project, use the following lines of code:
import glob
import os
import pandas as pd
import numpy as np
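Before reading the data files, here is a quick toy demonstration of how polyfit and poly1d work together. It uses the np alias imported above and made-up values that are not part of the tutorial data set:
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0]) #Toy x data
y = 2.0 * x**2 + 1.0 #Toy y data lying exactly on y = 2x^2 + 1
Coefficients_Toy = np.polyfit(x, y, 2) #Fit a 2nd order polynomial; returns approximately [2., 0., 1.]
Regression_Toy = np.poly1d(Coefficients_Toy) #Convert the coefficients into a callable polynomial
print(Regression_Toy(5.0)) #Evaluate the fit at x = 5.0, printing approximately 51.0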
Reading the Data Files
The next step in the process is to read the data files. This will be done using a method that will be familiar to those who followed Performance Map Tutorial: Splitting the Data Set into Individual Files: we’ll use glob to create a list of all files in the folder, then use a for loop to run through each file in the list. It can be done using four steps:
First set the path. This path tells glob where the files are located, so it searches the correct folder,
Second use glob to create a list of all of the appropriate files in that folder,
Third create a for loop iterating through each of the files stored in the glob list,
Fourth add a line within the for loop reading the data files sequentially.
These steps can be accomplished with the following code. Note that this code assumes the files are located in a certain folder, and the code defining the path will need to be updated to match the location of your data files.
Path = r'C:\Users\JSmith\Desktop\AutomatedDataAnalysisWithPython\Tutorial-HPWHPerformanceMap\DataSet'
Filenames = glob.glob(Path + '/*.csv') #Creates a list of all filenames in the "path" folder
for Filename in Filenames:
    Data = pd.read_csv(Filename) #Reads the current file into a dataframe
Note that the rest of the steps in this process will be contained within the for loop, and will be indented accordingly.
Filtering the Data Set
The split data files include data from the conditioning period at the start of each test, as well as extra data at the end of the final test. This data is not useful for the analysis, and will actually lead to errors in the regressions. It is still in the files because the file-splitting script did not include code to remove it. To correctly analyze the data, we must remove the extraneous data now.
If you open PerformanceMap_HPWH_55.csv and look at the initial data, you can see the conditioning period quite clearly. At first, the flow rate of water is 5 gal/min, and the water temperatures in the tank are changing dramatically. This means that cold water is being added to the tank, pushing the hot water out, to prepare for the test. When all of the 125 deg F water has been pushed out of the tank and replaced with 72 deg F water, the tank is ready to begin the test. This state can be seen when the water flow rate drops to 0 gal/min. The second issue is the ambient air temperature. At the start of the data file, the ambient temperature is close to 72 deg F when the test is supposed to occur at 55 deg F. Continuing to peruse the data file shows that, after the tank reaches the desired temperature, the ambient temperature gradually decreases from roughly 72 deg F to roughly 55 deg F, as specified in the test.
Fortunately, in this data set the imagined tester creating the data did their job correctly. We can see from the P_Elec (W) column that the HPWH started drawing electricity only after the tank temperatures and ambient temperature reached the desired starting conditions. This both gives us some initial confidence in the data and provides a clear filtering point. Since the only relevant data for this test is collected while the HPWH is drawing electricity, we can filter our data set to include only that data.
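If you prefer to skim the file in pandas rather than in a spreadsheet program, a quick one-off check (run once, outside the for loop, using the Path variable defined earlier) looks like this:
Check = pd.read_csv(Path + '/PerformanceMap_HPWH_55.csv') #Reads the 55 deg F test file
print(Check.head(15)) #Prints the first 15 rows, which cover the conditioning period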
This can be done with a single line of code. pandas includes capabilities to reduce a data frame to only the rows that meet a boolean condition. The result must then be assigned back to the data frame, as pandas won’t overwrite the data unless instructed to do so. This can be achieved using the following line of code:
Data = Data[Data['P_Elec (W)'] > 0]
Note that this filter works very well for our example data set, but would work poorly with real experimental data. No measurement is ever 100% precise, meaning that some electricity readings will be greater than 0 W even when no electricity is being consumed. When using real data, the filter should be set to accept data when electricity flow is greater than some larger number, like 50 or 100 W.
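For instance, with real measurements the filter above might be replaced by something like the following sketch, where the 50 W threshold is an assumed value that should be tuned to match your instrumentation’s noise floor:
Data = Data[Data['P_Elec (W)'] > 50] #Assumed 50 W threshold to reject sensor noise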
Filtering the data like this will result in a data frame with an index that starts at a value greater than zero. This makes it hard to manipulate the data later, and can be corrected by resetting the index. The index can be reset with the following two lines of code.
Data = Data.reset_index()
del Data['index']
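As an aside, pandas can also drop the old index in a single step; the following line is equivalent to the two lines above:
Data = Data.reset_index(drop = True) #Resets the index without keeping the old one as a column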
Analyzing Each Data Set
Analyzing the data set requires several different calculations. As it stands, the data set provides the electricity consumption and the temperature of the tank at eight different locations. However, to calculate the COP of the heat pump we need the change in average temperature of the tank and the amount of electricity consumed during each timestep, and the two must be expressed in the same units. These objectives can be achieved with the following steps:
First calculate the average temperature of the tank. Since each temperature in the data set represents 1/8th of the tank, the average can be calculated by summing the temperatures and dividing by 8,
Second create a new column representing the average temperature of the tank at the previous timestep. This makes it easy to calculate the change in average tank temperature between timesteps. Do this by shifting the average tank temperature data by one row and assigning it to a new column,
Third enter a value in the first row of the new column. This is necessary because pandas didn’t have a value to fill that cell with when performing the shift. Since we know the test starts with the tank at 72 deg F, we can use the .loc function to fill that cell with 72.0 deg F,
Fourth calculate the change in stored energy in the tank between two timesteps. This is done with the equation (Change_Energy) = (Mass_Water) * (SpecificHeat) * (Change_Temperature). Since this is an HPWH with an 80 gal storage tank, the mass is 80 gal * 8.3176 lb/gal. The specific heat of water in IP units is 0.998 Btu/(lb-F). The change in temperature can be calculated using the two average tank temperature columns.
Fifth calculate the electricity consumed during each timestep, which is equal to the rate of electricity consumption times the duration of the timestep. Our data points are 10 seconds apart in this data set. To calculate the COP we also must convert the electricity consumption from W to Btu, matching the units of the change in stored energy in the tank. This is done with the conversion 1 W = 3.412142 Btu/hr, followed by converting from hours to seconds,
Finally, the COP is equal to the change in stored energy divided by the electricity used to cause that change in energy.
All of this can be accomplished with the following code. Keep in mind that this all occurs within the for loop and must be indented accordingly.
Data['Average Tank Temperature (deg F)'] = (1./8.) * (Data['T1 (deg F)'] + Data['T2 (deg F)'] + Data['T3 (deg F)'] + Data['T4 (deg F)'] + Data['T5 (deg F)'] + Data['T6 (deg F)'] + Data['T7 (deg F)'] + Data['T8 (deg F)'])
Data['Previous Average Tank Temperature (deg F)'] = Data['Average Tank Temperature (deg F)'].shift(periods = 1)
Data.loc[0, 'Previous Average Tank Temperature (deg F)'] = 72.0
Data['Change in Stored Energy (Btu)'] = (80 * 8.3176) * (0.998) * (Data['Average Tank Temperature (deg F)'] - Data['Previous Average Tank Temperature (deg F)'])
Data['P_Elec (Btu/10s)'] = Data['P_Elec (W)'] * (3.412142/60/60) * 10
Data['COP (-)'] = Data['Change in Stored Energy (Btu)'] / Data['P_Elec (Btu/10s)']
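As a sanity check on the unit conversions, consider a made-up example: if the average tank temperature rises by 0.01 deg F over one 10 second timestep while the HPWH draws 450 W, the change in stored energy is 80 * 8.3176 * 0.998 * 0.01 = 6.64 Btu, the electricity consumed is 450 * (3.412142 / 3600) * 10 = 4.27 Btu, and the resulting COP is 6.64 / 4.27 = 1.56.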
Generating a Regression for the Individual Data Files
The entire point of this process is creating regressions of the data. We will use the data from each individual test to create regressions showing the COP of the heat pump as a function of water temperature at a specific ambient temperature. Since there are three tests with three different ambient temperatures, this gives us the ability to create a rough performance map showing the COP of the HPWH as a function of both temperatures. To do this, we need to generate a regression showing the COP of the heat pump as a function of temperature in each test. This can be done with the following line of code calling the NumPy function polyfit.
Coefficients = np.polyfit(Data['Average Tank Temperature (deg F)'], Data['COP (-)'], 2)
That code calls numpy.polyfit and tells it to store the regression coefficients in the variable Coefficients. It uses the average tank temperature calculated above as the x data, and the COP calculated above as the y data. Finally, the “2” at the end tells polyfit to fit a 2nd order polynomial.
It’s important to check and make sure that this process was performed correctly. This can be done in two ways.
First, use the terminal window to examine the contents of Coefficients. Depending on the terminal used you may see different formatting, but the results should be: array([ 1.39999998e-04, -1.34000000e-01, 1.65000000e+01]).
Second, use the numpy.poly1d function to test the results of the regression. This can be done by first converting the coefficients into a callable regression, and second evaluating that regression at a chosen temperature. Use values from the dataframe to ensure that the values from the regression match the values calculated in the data set. This can be done with the following code.
Regression = np.poly1d(Coefficients)
COP_72 = Regression(72.0)
COP_140 = Regression(140.0)
If you’re following along with the companion data set, COP_72 should be 9.3277 and COP_140 should be 2.2339.
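If you want a more systematic accuracy check than spot-checking two temperatures, one option (a quick sketch, not part of the original procedure) is to evaluate the regression at every measured tank temperature and look at the worst-case deviation from the calculated COP values:
Predicted = Regression(Data['Average Tank Temperature (deg F)']) #Evaluates the fit at every data point
Residuals = Data['COP (-)'] - Predicted #Difference between calculated and predicted COP
print(Residuals.abs().max()) #Worst-case error of the regression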
Saving Results
The final step in the process is storing the results. In this case, we’ll store both the dataframes with the newly calculated values and the coefficients from the regressions. To do this we’ll need to write code that 1) Creates a new folder to store the values, 2) Provides filenames stating what’s in each file, and 3) Saves the files.
Creating a new folder was covered in Performance Map Tutorial: Splitting the Data Set into Individual Files, so it should be familiar. In this case, we’ll create a new folder path by taking the current path and adding a subfolder called “Analyzed”. Since we already have the existing folder in a variable named Path, we can simply add “\Analyzed” to the end of it. Then we use the same code as before to check whether that folder exists, and create it if it does not. The following lines of code accomplish that objective.
Folder = Path + '\\Analyzed'
if not os.path.exists(Folder):
    os.makedirs(Folder)
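If you would rather not manage backslashes by hand, os.path.join builds the same path in a platform-independent way. This is a stylistic alternative, not a required change:
Folder = os.path.join(Path, 'Analyzed') #Joins path components with the correct separator
os.makedirs(Folder, exist_ok = True) #Creates the folder only if it does not already exist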
The next step is creating names for the output files. We want the dataframes to be stored in the new folder, with “_Analyzed” at the end of each filename to distinguish them from the raw data files. This can be done by combining the Folder variable, which puts the file in the new folder, with a section of the Filename variable from our for loop, and an ending of “_Analyzed.csv”. For the coefficients, we want to save a file in the same folder named “Coefficients_” plus the test temperature and the .csv extension. These filenames can be created using the following two lines of code.
Filename_Test = Folder + '\\' + Filename[-26:-4] + '_Analyzed.csv'
Filename_Coefficients = Folder + '\\Coefficients_' + Filename[-6:]
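Note that the slice Filename[-26:-4] assumes every filename is exactly as long as ‘PerformanceMap_HPWH_55.csv’. If your filenames vary in length, a more robust sketch uses os.path.basename to isolate the filename first:
Basename = os.path.basename(Filename) #e.g. 'PerformanceMap_HPWH_55.csv'
Filename_Test = Folder + '\\' + Basename[:-4] + '_Analyzed.csv' #Strips '.csv' and appends the suffix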
Once those filenames are created, the last step is to actually save the files. This can be done using two lines of code calling the pandas dataframe method to_csv and the NumPy array method tofile. Make sure to specify that you don’t want to save the index of the dataframe, and that you want the separator for the coefficients to be a comma. This makes it easier to read the data back later. The files can be saved with the following two lines of code.
Data.to_csv(Filename_Test, index = False)
Coefficients.tofile(Filename_Coefficients, sep = ',')
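When you need the coefficients again later, they can be read back and turned into a callable regression with NumPy. A minimal sketch, assuming the same filename variable:
Coefficients = np.fromfile(Filename_Coefficients, sep = ',') #Reads the comma-separated coefficients
Regression = np.poly1d(Coefficients) #Rebuilds the callable regression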
Next Steps
In this post we’ve learned how to automatically perform calculations on several data files, create regressions for each file, and save the results. We did it with an example of three data files, which saves some time and tedium. Imagine the potential of using these methods on projects that contain hundreds of data files instead of only three.
In the next module, we will discuss ways to visually ensure that the data and data analysis methods were performed correctly. This will include plotting the data to see what’s contained in each file and adding the regression to the plot so you can ensure that it closely fits the data set.