nocode.py
The nocode.py file contains functionality for data processing and machine learning in the Flask application. It provides methods for handling file uploads, data cleaning, model training, and evaluation.
The file begins by importing the modules and libraries it needs: os.path, pickle, pandas, numpy, seaborn, and matplotlib.pyplot, which cover file operations, data manipulation, and visualization.
The file defines a few helper functions and variables to assist in data processing and model evaluation:
- allowed_ext function: returns a list of allowed file extensions for the supported dataset types (e.g., CSV, Excel).
- static_dir, plot_img_path, and model_file_path: variables that store the paths for saving plot images and model files.
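The helpers above could look roughly like the following. This is a minimal sketch, not the file's actual code: the extension list, the directory names, and the is_allowed helper are assumptions made for illustration.

```python
import os.path


def allowed_ext():
    # Assumed extension list; the real function may return different values.
    return ["csv", "xls", "xlsx"]


def is_allowed(filename):
    # Hypothetical helper: checks an uploaded filename against allowed_ext().
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in allowed_ext()


# Assumed directory layout for generated plots and saved models.
static_dir = "static"
plot_img_path = os.path.join(static_dir, "plots")
model_file_path = os.path.join(static_dir, "models")
```

Comparing the extension in lowercase means that uploads such as DATA.CSV are accepted as well.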
The available_models function returns a list of available machine learning models. It retrieves the model names from the model_dict dictionary defined in mllib.py.
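As a rough illustration, the lookup can be as small as the sketch below. The contents of model_dict here are placeholders; the real dictionary lives in mllib.py and its keys and values are not shown in this document.

```python
# Placeholder registry; in the project this dictionary is defined in mllib.py.
model_dict = {
    "linear_regression": None,
    "random_forest": None,
    "lstm": None,
}


def available_models():
    # The model names are simply the keys of the registry.
    return list(model_dict.keys())
```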
The get_plot_image function generates a plot of actual vs. predicted values and saves it as an image file. It takes a dataframe and a model name as input, and it uses seaborn and matplotlib.pyplot for plotting.
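A sketch of such a plotting helper is shown below. To keep the example light it uses matplotlib only, whereas the original also uses seaborn. The column names actual and predicted, the out_dir parameter, and the output filename are assumptions.

```python
import os

import matplotlib

matplotlib.use("Agg")  # headless backend, suitable for a web server
import matplotlib.pyplot as plt
import pandas as pd


def get_plot_image(df, model_name, out_dir):
    # Assumes df has "actual" and "predicted" columns (hypothetical names).
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(df["actual"], df["predicted"], alpha=0.7)

    # Diagonal reference line: points on it are perfect predictions.
    lo = min(df["actual"].min(), df["predicted"].min())
    hi = max(df["actual"].max(), df["predicted"].max())
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")

    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.set_title(f"{model_name}: actual vs. predicted")

    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{model_name}.png")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated requests do not leak memory
    return path
```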
The save_model_to_file function saves the trained model to a file. It takes a model instance and an optional file type as input. The file type can be either 'h5' for Keras models or 'pkl' for other models. The saved model file is stored in the appropriate directory based on the current date and time.
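One way to implement this behaviour is sketched below. The out_dir parameter and the timestamped filename pattern are assumptions, since the document does not spell out how the date and time enter the path. The 'h5' branch relies on the save method that Keras models provide.

```python
import os
import pickle
from datetime import datetime


def save_model_to_file(model, out_dir, file_type="pkl"):
    if file_type not in ("pkl", "h5"):
        raise ValueError(f"unsupported file type: {file_type}")

    os.makedirs(out_dir, exist_ok=True)
    # Assumed naming scheme: a timestamp keeps successive models apart.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(out_dir, f"model_{stamp}.{file_type}")

    if file_type == "h5":
        # Keras models serialize themselves through their own save() method.
        model.save(path)
    else:
        # Any other picklable model object is written with pickle.
        with open(path, "wb") as fh:
            pickle.dump(model, fh)
    return path
```

A model saved as 'pkl' can be restored later with pickle.load on the returned path.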
The Nocode class represents the uploaded dataset and provides methods for data processing and model training. Its constructor takes a filename and a filepath as parameters and stores them in instance variables for later use.
The read_file method reads the uploaded dataset file based on its file extension. It supports CSV and Excel formats and uses pandas to read the data. If the file has a header row, that row is used; otherwise, the data is read without a header. The data is stored in the class's dataframe variable and converted to lowercase for consistency.
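The reading logic described above can be sketched as a standalone function. The has_header parameter is an assumption about how the header choice is passed in. The document does not say exactly what is lowercased, so this sketch lowercases both the column labels and the string cells.

```python
import os

import pandas as pd


def read_file(filepath, has_header=True):
    ext = os.path.splitext(filepath)[1].lower()
    # header=0 uses the first row as column names; header=None reads raw rows.
    header = 0 if has_header else None

    if ext == ".csv":
        df = pd.read_csv(filepath, header=header)
    elif ext in (".xls", ".xlsx"):
        df = pd.read_excel(filepath, header=header)
    else:
        raise ValueError(f"unsupported file extension: {ext}")

    # Assumed interpretation of "converted to lowercase": labels and text cells.
    df.columns = [str(c).lower() for c in df.columns]
    for col in df.columns:
        if pd.api.types.is_string_dtype(df[col]):
            df[col] = df[col].str.lower()
    return df
```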
The drop_cols method drops specific columns from the dataset. It takes a list of column names as input and removes those columns from the dataframe.
The reset_index method resets the index of the dataframe. If a column name is provided, that column becomes the dataframe's index; otherwise, the index is reset to the default.
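Both column dropping and index resetting map directly onto pandas calls. The sketch below writes them as standalone functions rather than methods, and the parameter names are assumptions.

```python
import pandas as pd


def drop_cols(df, cols):
    # Remove the named columns and return the reduced dataframe.
    return df.drop(columns=cols)


def reset_index(df, col=None):
    if col is not None:
        # Promote an existing column to be the index.
        return df.set_index(col)
    # Fall back to the default 0..n-1 index, discarding the old one.
    return df.reset_index(drop=True)
```

Note that drop=True discards the previous index instead of turning it back into a column; whether the original does the same is not stated.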
The cleaning_data method handles missing values in the dataframe. It replaces any '?' values with NaN and then handles missing values column by column: columns with more than 15% missing values are dropped, and in the remaining columns the missing values are filled with the column's mode.
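The cleaning rule can be expressed compactly with pandas. The following is a sketch of the described behaviour as a standalone function; the threshold parameter is an assumption that makes the 15% cutoff explicit.

```python
import numpy as np
import pandas as pd


def cleaning_data(df, threshold=0.15):
    # Treat the '?' placeholder as a missing value.
    df = df.replace("?", np.nan)

    # Fraction of missing values per column.
    missing = df.isna().mean()

    # Drop columns whose missing fraction exceeds the threshold.
    df = df.drop(columns=missing[missing > threshold].index)

    # Fill what is left with each column's most frequent value.
    for col in df.columns:
        if df[col].isna().any():
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df
```

When a column has several equally frequent values, mode() returns them sorted and this sketch takes the first.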
The split_data method splits the data into training and testing sets based on the provided target variable and a randomization flag. It uses the ML_lib class from mllib.py to preprocess and split the data, and it returns the processed data.
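Since ML_lib is not shown in this document, the sketch below illustrates the general idea with pandas only. It is not the ML_lib implementation: the test_size and seed parameters, the return order, and the absence of any preprocessing are all assumptions.

```python
import pandas as pd


def split_data(df, target, randomize=True, test_size=0.2, seed=0):
    if randomize:
        # Shuffle all rows reproducibly before cutting the frame in two.
        df = df.sample(frac=1.0, random_state=seed)

    # Keep at least one row in the test set.
    n_test = max(1, int(round(len(df) * test_size)))
    train, test = df.iloc[:-n_test], df.iloc[-n_test:]

    # Separate the features from the target column.
    X_train, X_test = train.drop(columns=[target]), test.drop(columns=[target])
    y_train, y_test = train[target], test[target]
    return X_train, X_test, y_train, y_test
```

With randomize=False the last rows become the test set, which preserves row order for time-ordered data.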
The predict_by_model method trains a machine learning model and makes predictions on the testing set. The model is selected by the model name provided. If it is an LSTM model, a neural network architecture is constructed with keras and then trained. Other models are trained directly by calling their fit method. Predictions are made on the testing set, and the results are returned along with the trained model.
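The non-LSTM path amounts to a lookup followed by fit and predict. The sketch below shows only that path and leaves the keras branch out. MeanModel is a hypothetical stand-in that mimics the fit/predict interface, and the registry contents and function signature are assumptions.

```python
import numpy as np


class MeanModel:
    """Hypothetical stand-in model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Placeholder registry mapping model names to model classes.
model_dict = {"mean": MeanModel}


def predict_by_model(model_name, X_train, X_test, y_train):
    if model_name not in model_dict:
        raise KeyError(f"unknown model: {model_name}")

    # Instantiate the selected model, train it, and predict on the test set.
    model = model_dict[model_name]()
    model.fit(X_train, y_train)
    return model.predict(X_test), model
```

Returning the fitted model next to the predictions lets the caller save it afterwards.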
The error_metric method calculates three evaluation metrics from the predicted and actual values: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
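These three metrics follow directly from the prediction errors. A minimal numpy sketch is shown here; returning the values in a dictionary is an assumption about the output format.

```python
import numpy as np


def error_metric(y_true, y_pred):
    # Element-wise prediction error.
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)

    mae = float(np.mean(np.abs(err)))  # mean absolute error
    mse = float(np.mean(err ** 2))     # mean squared error
    rmse = float(np.sqrt(mse))         # root mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}
```

RMSE is in the same units as the target variable, which often makes it easier to interpret than MSE.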
Overall, the nocode.py file provides functionality for reading, processing, and cleaning datasets, as well as for training machine learning models and evaluating their performance. The Flask application uses this functionality for data analysis and prediction.