In this post, I jot down some useful things to keep in mind when testing a model with Python and Keras, written for newbies like me.
First, segmenting data with Python commands.
Second, organizing the data to fit a built-in model in Keras.
Third, my experience training a model on data preprocessed with MinMaxScaler.
1. Segment Data with Python
In Python, the easy and usual way to import data is the `read_csv` function from the pandas package.
Data files are often in .txt, .csv, or .xls format, but I prefer to save and load .csv files because I can explore, verify, and manipulate them confidently in Excel.
Let's start by reading a csv file named "InvertPhaseLong2.csv" as below
from pandas import read_csv
dataset = read_csv('drive/My Drive/InvertPhaseLong2.csv', header=0)
header=0 means the first row of the file contains the column names; if the file has no header row, set it to
header=None
The returned dataset is a pandas DataFrame, which gives us some useful
inspection methods such as .head() to read the first several rows of the table, or .describe() to get
basic statistics of the data, including the count, mean, std, min, max, etc.
To segment the data, we first get the values from the dataset with the command below:
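To make the effect of the header argument concrete, here is a minimal sketch. It assumes a tiny made-up CSV held in memory (the column names ch0 and ch1 are hypothetical, not the real file's columns) so that it runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical two-column CSV held in memory; it stands in for a real file.
csv_text = "ch0,ch1\n10,100\n20,200\n30,300\n"

# header=0: the first row supplies the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: every row is treated as data; columns are numbered 0, 1, ...
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.head(2))     # first two data rows
print(with_header.describe())  # count, mean, std, min, quartiles, max
print(without_header.shape)    # one extra row: the header line is now data
```

With header=None the names end up inside the data, so the columns are read as text rather than numbers. That is usually a sign the wrong option was chosen.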
data = dataset.values
data is now in ndarray format, and we can start segmenting it into sections with whatever window length we wish.
Below is an example of functions that split the data into many parts and store them in a list
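def load_data(filename):
    # Read the CSV (first row is the header) and return its values as an ndarray
    dataset = read_csv(filename, header=0)
    data = dataset.values
    return data
def segment_data(signal, distance, overlap):
    # Cut signal into windows of `distance` samples;
    # the start index advances by distance*overlap each time
    i = 0  # Python indexing starts at 0
    s = list()
    while i + distance <= len(signal):  # keep only full-length windows
        a = signal[int(i):int(i + distance)]
        s.append(a)
        i = i + distance * overlap
    return s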
data = load_data('drive/My Drive/InvertPhaseLong2.csv')
ppg = segment_data(data[:,0],70,0.5)
ppg1 = segment_data(data[:,2],70,0.5)
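To see how the window length and the step determine the number of segments, here is a small sketch on a synthetic signal. The helper name sliding_windows and the 350-sample signal are assumptions for illustration. The step of 35 corresponds to distance*overlap = 70*0.5 in the call above.

```python
import numpy as np

def sliding_windows(signal, length, step):
    """Return a 2-D array whose rows are consecutive full-length windows."""
    starts = range(0, len(signal) - length + 1, step)
    return np.array([signal[s:s + length] for s in starts])

# Synthetic stand-in for one data column: the samples are 0, 1, ..., 349.
signal = np.arange(350)

# Window of 70 samples, advancing 35 samples each time (50% overlap).
windows = sliding_windows(signal, length=70, step=35)

print(windows.shape)   # (9, 70): window starts are 0, 35, ..., 280
print(windows[1][:3])  # second window starts at sample 35
```

Because every row has the same length, the result is a regular 2-D array, which is what the later concatenation and reshaping steps expect.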
To display the values of some segments, use matplotlib as below
from matplotlib import pyplot
pyplot.subplot(121)
pyplot.plot(ppg[1])
pyplot.subplot(122)
pyplot.plot(ppg1[302])
pyplot.show()
2. Fit the data to the model
The raw values are very large, so feeding them directly to a machine learning model can cause numerical overflow.
sklearn provides several classes to rescale data, such as StandardScaler() or MinMaxScaler().
In this example, I use MinMaxScaler() to scale the data into the range (0, 1).
Because there are two datasets, In-Phase and Invert-Phase, we combine them into a single training dataset with its labels.
import numpy as np
# labels: 1 for the first group of data, 0 for the second
# (ppg and ppg1 are lists, so use len() rather than .shape)
y1 = [1]*len(ppg)
y2 = [0]*len(ppg1)
y1 = np.asarray(y1)
y2 = np.asarray(y2)
y1.shape, y2.shape
#ydf1 = DataFrame(y1)
#ydf2 = DataFrame(y2)
# Concatenate data
X = np.concatenate((ppg, ppg1))
y = np.concatenate((y1, y2))
Here, the concatenate() function lets us combine the data, and y1, y2 are the output labels, with 1 for in-phase and 0 for invert-phase.
np.asarray() converts data from a list to an ndarray.
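A toy version of the same labelling step may help to check the shapes. The segment values below are made up, and the names in_phase and invert_phase are placeholders for the two segment lists.

```python
import numpy as np

# Made-up stand-ins: 3 "in-phase" and 2 "invert-phase" segments of length 4.
in_phase = [np.ones(4) * k for k in range(3)]
invert_phase = [-np.ones(4) * k for k in range(2)]

# One label per segment: 1 for in-phase, 0 for invert-phase.
y_in = np.ones(len(in_phase), dtype=int)
y_inv = np.zeros(len(invert_phase), dtype=int)

# Stack the segments row-wise and join the labels in the same order.
X = np.concatenate((in_phase, invert_phase))
y = np.concatenate((y_in, y_inv))

print(X.shape, y.shape)  # (5, 4) (5,)
print(y)                 # [1 1 1 0 0]
```

The first dimension of X and the length of y should always match, since every segment needs exactly one label.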
The classifier is a 1D-CNN model built with Keras, with the summary below
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_122 (Conv1D)          (None, 68, 64)            256
_________________________________________________________________
conv1d_123 (Conv1D)          (None, 66, 64)            12352
_________________________________________________________________
dropout_61 (Dropout)         (None, 66, 64)            0
_________________________________________________________________
max_pooling1d_61 (MaxPooling (None, 33, 64)            0
_________________________________________________________________
flatten_61 (Flatten)         (None, 2112)              0
_________________________________________________________________
dense_122 (Dense)            (None, 100)               211300
_________________________________________________________________
dense_123 (Dense)            (None, 2)                 202
=================================================================
Total params: 224,110
Trainable params: 224,110
Non-trainable params: 0
_________________________________________________________________
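The numbers in the summary can be checked by hand. The sketch below assumes a kernel size of 3, 64 filters, no padding, and a pooling size of 2. These values are inferred from the output shapes and parameter counts in the summary, not taken from the model code.

```python
def conv1d_out_len(n_in, kernel_size):
    # Output length with no padding and stride 1
    return n_in - kernel_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    # One kernel per (input channel, filter) pair, plus one bias per filter
    return kernel_size * in_channels * filters + filters

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit
    return n_in * n_out + n_out

kernel, filters = 3, 64                 # assumed from the summary
len1 = conv1d_out_len(70, kernel)       # 70 -> 68
len2 = conv1d_out_len(len1, kernel)     # 68 -> 66
pooled = len2 // 2                      # 66 -> 33 with pool size 2
flat = pooled * filters                 # 33 * 64 = 2112

params = [
    conv1d_params(1, filters, kernel),        # 256
    conv1d_params(filters, filters, kernel),  # 12352
    dense_params(flat, 100),                  # 211300
    dense_params(100, 2),                     # 202
]
total = sum(params)
print(len1, len2, pooled, flat, total)  # 68 66 33 2112 224110
```

The dropout, pooling, and flatten layers have no trainable weights, which is why they show 0 parameters.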
The input data in this case must have the shape (number_of_segments, size_of_segment, number_of_features).
To find the number of segments, check the shape attribute of X and y
X.shape, y.shape
(1178, 70), (2278,)
This shows the size as (number_of_segments, size_of_segment), without the number_of_features Nf.
In our case, we use a single channel, so Nf = 1; for multi-channel data, replace it with the number of channels.
So we need to reshape the input data to match the model.
Two functions are used: reshape() from numpy and to_categorical() from Keras, as shown below:
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
# Xscaled is X after the scaling described in section 3
# split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(Xscaled, y, test_size=0.33, random_state=1)
# split the held-out part in half: test and validation sets
X_test, X_val, y_test, y_val = train_test_split(X_test, y_test, test_size=0.5, random_state=1)
# add the feature axis: (segments, samples) -> (segments, samples, 1)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
X_val = X_val.reshape((X_val.shape[0], X_val.shape[1], 1))
# one-hot encode the labels
y_train = to_categorical(y_train, num_classes=2)
y_test = to_categorical(y_test, num_classes=2)
y_val = to_categorical(y_val, num_classes=2)
Here, to train the model, we also split the data into train, validation, and test sets with train_test_split from sklearn.
to_categorical maps the outputs 0 and 1 to the vectors [1 0] and [0 1] respectively.
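To show what the reshape and the one-hot step do to the arrays, here is a NumPy-only sketch on toy data. It uses np.eye as a stand-in for to_categorical, which is my substitution so the example runs without Keras. The values are made up.

```python
import numpy as np

# Toy data: 3 segments of 4 samples each, with labels 0, 1, 1.
X = np.arange(12, dtype=float).reshape(3, 4)
y = np.array([0, 1, 1])

# Add a trailing feature axis so the shape is (segments, samples, features).
X3d = X.reshape((X.shape[0], X.shape[1], 1))

# One-hot encode: row k of the 2x2 identity matrix is the vector for class k.
one_hot = np.eye(2)[y]

print(X3d.shape)  # (3, 4, 1)
print(one_hot)    # label 0 -> [1. 0.], label 1 -> [0. 1.]
```

The reshape does not change any values; it only tells the model that each sample has a single channel.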
3. Scaling data the right way with MinMaxScaler()
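MinMaxScaler() scales each column (feature) independently. Suppose two segments are stored as rows, [a00 a01 a02] and [a10 a11 a12] respectively.
Scaling is then grouped by column: [(a00, a10), (a01, a11), (a02, a12)].
In our case, we want to scale the data within each segment rather than across segments, which we can do with
transpose-MinMaxScaler-transpose as in the code below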
from sklearn.preprocessing import MinMaxScaler
# define min max scaler
def scaling(signal):
    scaler = MinMaxScaler()
    # transpose so each segment becomes a column, which is scaled independently
    s_trans = signal.transpose()
    Xscaled = scaler.fit_transform(s_trans)
    # transpose back to (segments, samples)
    Xscaled = Xscaled.transpose()
    return Xscaled
Xscaled = scaling(X)
The full code is available on my GitHub:
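The difference between the two scaling directions is easy to see on a tiny example. The sketch below writes the min-max formula (x - min) / (max - min) directly in NumPy instead of calling MinMaxScaler. The helper name minmax_columns and the values are made up, and it assumes no column is constant.

```python
import numpy as np

def minmax_columns(a):
    """Scale every column of `a` to the range [0, 1] (columns must not be constant)."""
    a_min = a.min(axis=0)
    a_max = a.max(axis=0)
    return (a - a_min) / (a_max - a_min)

# Two made-up segments stored as rows.
X = np.array([[1.0, 2.0, 3.0],
              [10.0, 40.0, 20.0]])

# Column-wise: each sample position is scaled across the two segments.
across_segments = minmax_columns(X)

# Transpose, scale, transpose back: each segment is scaled on its own.
per_segment = minmax_columns(X.T).T

print(across_segments)  # [[0. 0. 0.], [1. 1. 1.]]: the waveform shape is lost
print(per_segment)      # each row now spans 0 to 1 and keeps its shape
```

Scaling across segments flattens both rows here, while scaling per segment preserves the shape of each waveform. That shape is the information the classifier needs.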
https://github.com/LongNguyen1984/DeepLearning/blob/master/DataProcessing.ipynb