nocode.py

Explanation - nocode.py

The nocode.py file implements the data-processing and machine-learning functionality of the Flask application. It provides methods for handling file uploads, cleaning data, and training and evaluating models.

Importing Required Modules and Libraries

The file begins by importing the required modules and libraries: os.path and pickle for file handling and model serialization, pandas and numpy for data manipulation, and seaborn and matplotlib.pyplot for visualization.
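Based on that list, the import block looks roughly like the following (the aliases are the conventional ones and may differ from the actual file):

```python
import os.path
import pickle

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```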

Helper Functions and Variables

The file defines a few helper functions and variables to assist in data processing and model evaluation (a sketch follows the list):

  • allowed_ext function: returns the list of allowed file extensions for the supported dataset types (e.g., CSV, Excel).
  • static_dir, plot_img_path, and model_file_path: These variables store the paths for saving plot images and model files.
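A minimal sketch of these helpers, assuming a static-folder layout for the Flask app; the directory names and the dataset_type parameter are illustrative, not taken from the file:

```python
import os.path

# Illustrative paths; the real values depend on the app's static folder layout.
static_dir = os.path.join(os.path.dirname(__file__), "static")
plot_img_path = os.path.join(static_dir, "plots")
model_file_path = os.path.join(static_dir, "models")

def allowed_ext(dataset_type="tabular"):
    # Allowed upload extensions per dataset type (hypothetical mapping).
    extensions = {"tabular": ["csv", "xls", "xlsx"]}
    return extensions.get(dataset_type, ["csv"])
```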

Available Models and Plotting

The available_models function returns a list of available machine learning models. It retrieves the names of the models from the model_dict dictionary defined in mllib.py.
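Given that description, available_models is likely little more than a lookup into model_dict; a sketch:

```python
from mllib import model_dict  # assumed: maps model names to model objects

def available_models():
    # Expose the model names defined in mllib.py to the Flask views.
    return list(model_dict.keys())
```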

The get_plot_image function generates a plot of actual vs. predicted values and saves it as an image file. It takes a dataframe and a model name as input, and it uses seaborn and matplotlib.pyplot for plotting.
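A hedged sketch of get_plot_image, reusing plot_img_path from the helper sketch above; the 'actual'/'predicted' column names and the output filename are assumptions:

```python
import os.path
import matplotlib
matplotlib.use("Agg")  # render off-screen inside the web app
import matplotlib.pyplot as plt
import seaborn as sns

def get_plot_image(df, model_name):
    # Plot actual vs. predicted values and save the figure as an image file.
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.lineplot(data=df[["actual", "predicted"]], ax=ax)  # column names assumed
    ax.set_title(f"Actual vs. Predicted ({model_name})")
    img_path = os.path.join(plot_img_path, f"{model_name}_plot.png")
    fig.savefig(img_path, bbox_inches="tight")
    plt.close(fig)
    return img_path
```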

The save_model_to_file function saves the trained model to a file. It takes a model instance and an optional file type as input. The file type can be either 'h5' for Keras models or 'pkl' for other models. The saved model file is stored in the appropriate directory based on the current date and time.
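A possible shape for save_model_to_file, assuming Keras models are saved via their own save() method and everything else is pickled; the 'pkl' default and the timestamped filename are illustrative:

```python
import os.path
import pickle
from datetime import datetime

def save_model_to_file(model, file_type="pkl"):
    # Store the trained model under model_file_path, named by the current date and time.
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filepath = os.path.join(model_file_path, f"model_{timestamp}.{file_type}")
    if file_type == "h5":
        model.save(filepath)          # Keras models
    else:
        with open(filepath, "wb") as f:
            pickle.dump(model, f)     # scikit-learn and other pickleable models
    return filepath
```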

Nocode Class

The Nocode class represents the uploaded dataset and provides methods for data processing and model training.

Initialization

The Nocode class constructor takes a filename and a filepath as parameters. It stores these values in instance variables for later use.
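The constructor is straightforward; a sketch:

```python
class Nocode:
    def __init__(self, filename, filepath):
        # Keep the upload's name and location for later reads.
        self.filename = filename
        self.filepath = filepath
        self.dataframe = None  # populated by read_file
```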

Reading and Processing Data

The read_file method reads the uploaded dataset based on its file extension. It supports CSV and Excel formats and uses pandas to read the data. If the file has a header row, it is used; otherwise the data is read without one. The result is stored in the instance's dataframe attribute and converted to lowercase for consistency.
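A sketch of read_file as a Nocode method; the has_header parameter is an assumption, and the lowercase step is shown applied to column names and string columns (the description does not pin this down):

```python
import pandas as pd

def read_file(self, has_header=True):
    # Read the upload into a DataFrame based on its extension.
    ext = self.filename.rsplit(".", 1)[-1].lower()
    header = 0 if has_header else None
    if ext == "csv":
        df = pd.read_csv(self.filepath, header=header)
    elif ext in ("xls", "xlsx"):
        df = pd.read_excel(self.filepath, header=header)
    else:
        raise ValueError(f"Unsupported file type: {ext}")
    # Lowercase for consistency.
    df.columns = [str(c).lower() for c in df.columns]
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.lower()
    self.dataframe = df
    return df
```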

The drop_cols method allows dropping specific columns from the dataset. It takes a list of column names as input and removes the columns from the dataframe.

The reset_index method resets the index of the dataframe. If a column name is provided, the dataframe is set to use that column as the index; otherwise, the index is reset to the default.
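Both methods are thin wrappers around the corresponding pandas calls; a sketch (as Nocode methods):

```python
def drop_cols(self, columns):
    # Remove the listed columns from the working DataFrame.
    self.dataframe = self.dataframe.drop(columns=columns)
    return self.dataframe

def reset_index(self, column=None):
    # Use the given column as the index, or fall back to the default RangeIndex.
    if column is not None:
        self.dataframe = self.dataframe.set_index(column)
    else:
        self.dataframe = self.dataframe.reset_index(drop=True)
    return self.dataframe
```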

The cleaning_data method handles missing values in the dataframe. It replaces any '?' values with NaN and then handles missing values column by column: columns with more than 15% missing values are dropped, and in the remaining columns missing values are filled with the column's mode.
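A sketch of that logic, with the 15% threshold taken from the description above:

```python
import numpy as np

def cleaning_data(self):
    # Replace '?' placeholders with NaN, then handle missing values per column.
    df = self.dataframe.replace("?", np.nan)
    missing_ratio = df.isna().mean()  # fraction of missing values per column
    df = df.drop(columns=missing_ratio[missing_ratio > 0.15].index)
    for col in df.columns:
        if df[col].isna().any():
            df[col] = df[col].fillna(df[col].mode()[0])  # fill with the column mode
    self.dataframe = df
    return df
```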

Model Training and Evaluation

The split_data method splits the data into training and testing sets based on the provided target variable and a randomization flag. It uses the ML_lib class from mllib.py to perform data preprocessing and split the data. The processed data is then returned.
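The actual preprocessing and split are delegated to ML_lib, whose interface lives in mllib.py; the sketch below substitutes scikit-learn's train_test_split to show the shape of the result, and the randomize parameter name is an assumption:

```python
from sklearn.model_selection import train_test_split

def split_data(self, target, randomize=True):
    # Split the cleaned data into train/test sets around the chosen target column.
    X = self.dataframe.drop(columns=[target])
    y = self.dataframe[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=randomize, random_state=42
    )
    return X_train, X_test, y_train, y_test
```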

The predict_by_model method trains a machine learning model and makes predictions on the testing set. The model is selected by the provided model name. If an LSTM is requested, a neural network architecture is constructed with Keras and trained; other models are trained with their fit method. Predictions are made on the testing set and returned along with the trained model.
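A hedged sketch of that branching; the LSTM architecture shown is illustrative rather than the one defined in the file, and model_dict from mllib.py is assumed to map names to ready-to-fit estimators:

```python
import numpy as np
from mllib import model_dict  # assumed mapping of model names to estimators

def predict_by_model(self, model_name, X_train, X_test, y_train):
    if model_name.lower() == "lstm":
        # Illustrative Keras LSTM; the real architecture is defined in nocode.py.
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import LSTM, Dense

        X_train_3d = np.asarray(X_train, dtype="float32")[:, :, np.newaxis]
        X_test_3d = np.asarray(X_test, dtype="float32")[:, :, np.newaxis]

        model = Sequential([
            LSTM(64, input_shape=X_train_3d.shape[1:]),
            Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        model.fit(X_train_3d, y_train, epochs=10, verbose=0)
        predictions = model.predict(X_test_3d).ravel()
    else:
        model = model_dict[model_name]
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
    return predictions, model
```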

The error_metric method calculates evaluation metrics (MAE, MSE, RMSE) based on the predicted values and the actual values.
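The metrics are standard; a sketch using scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def error_metric(self, y_true, y_pred):
    # MAE, MSE, and RMSE between the actual and predicted values.
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}
```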

Overall, the nocode.py file provides functionalities for reading, processing, and cleaning datasets, as well as training machine learning models and evaluating their performance. These functionalities are utilized by the Flask application for data analysis and prediction.