The planned neural network (NN) for the price prediction project is a feedforward neural network with a regression-focused architecture, designed to map input features directly to a continuous target variable: the price. This type of NN was chosen for its flexibility in capturing non-linear relationships and interactions between features, which are often present in complex data sets such as those used in price forecasting.
An overview of the architecture and the reasoning behind its design:
Input Layer:
The first layer accepts the raw input features. The number of neurons in this layer corresponds to the number of input features after preprocessing steps such as one-hot encoding and scaling.
Hidden Layers:
The network includes multiple hidden layers with varying numbers of neurons. These layers enable the network to learn progressively more complex representations of the input data. The exact number of hidden layers and neurons will be determined through experimentation and validation, aiming for a balance between model complexity and overfitting risks.
Dropout Layers:
Included intermittently between hidden layers, dropout layers help in preventing overfitting by randomly setting a fraction of the input units to zero during training. This encourages the network to learn redundant representations and ensures that no single feature or interaction overly influences the output.
Output Layer:
A single neuron without an activation function constitutes the output layer, which provides the final price prediction. Since this is a regression task, no activation function is used here; we want the network to be able to predict a full range of continuous price values.
Activation Functions:
For hidden layers, rectified linear unit (ReLU) activation functions are typically used due to their efficiency and effectiveness. However, variations like Leaky ReLU or Exponential Linear Unit (ELU) might be considered to counteract the 'dying ReLU' problem if it arises.
Loss Function:
The mean squared error (MSE) serves as the loss function since it penalizes larger errors more heavily than smaller ones, which is often desirable in regression tasks.
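Concretely, for n samples with actual prices y_i and predictions \hat{y}_i, the loss is

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,

so, for example, an error of 2 contributes four times as much to the loss as an error of 1.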
Optimizer:
The Adam optimizer is chosen for its adaptive learning rate capabilities, helping the network to converge more quickly and effectively during training.
This neural network architecture is crafted to handle the intricacies of price prediction by learning from historical data and identifying the underlying patterns that influence prices.
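As a minimal sketch of how this architecture could be assembled in Keras (the layer widths of 128 and 64 neurons, the dropout rate of 0.2, and the placeholder n_features are illustrative assumptions rather than final design choices):

from tensorflow.keras import layers, models

n_features = 10  # assumed input dimensionality after preprocessing

model = models.Sequential([
    layers.Input(shape=(n_features,)),     # input layer: one neuron per feature
    layers.Dense(128, activation="relu"),  # hidden layer capturing non-linearities
    layers.Dropout(0.2),                   # dropout layer to reduce overfitting
    layers.Dense(64, activation="relu"),   # deeper hidden representation
    layers.Dropout(0.2),
    layers.Dense(1),                       # single output neuron, no activation (regression)
])

model.compile(optimizer="adam", loss="mse", metrics=["mae"])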
For supervised learning models, labeled data is essential. It includes inputs along with corresponding correct outputs, which allows the model to learn from examples. The data must be split into two separate sets:
1. Training Set:
This is the bulk of the data used to train the model. It teaches the model to recognize patterns and make predictions. Typically, about 70-80% of the labeled data is allocated to the training set.
2. Testing Set:
This smaller subset, usually 20-30% of the labeled data, is used to evaluate the performance of the model. It is crucial that this data is not used during training so that it serves as a reliable indicator of the model's ability to generalize to unseen data.
The training and testing sets must be disjoint to ensure an unbiased evaluation of the model's performance. If there were an overlap, the model might simply memorize specific examples rather than learning the underlying pattern, which could lead to overfitting. Overfitting occurs when a model performs well on the training data but poorly on new, unseen data because it has not learned generalizable patterns.
[Figure: the labeled data split into two distinct, non-overlapping subsets, the training set and the testing set.]
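As an illustrative sketch of such a split with scikit-learn (the 80/20 ratio, the random seed, and the placeholder arrays X and y are assumptions):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)  # placeholder feature matrix (stands in for the preprocessed features)
y = np.random.rand(1000)      # placeholder price targets

# Hold out 20% of the labeled data for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)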
When creating the training and testing sets, random sampling is often used to avoid any potential bias. It's important to maintain the distribution of classes (in classification tasks) or the range of values (in regression tasks) in both the training and testing sets, a process known as stratification.
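For a regression target, one common way to approximate stratification is to bin the target values and stratify on the bin labels; a sketch of that idea, reusing X and y from the split example above:

# Assign each sample to a quartile bin of the price distribution.
bins = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]))

# Stratifying on the bins keeps the full range of prices represented in both sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=bins, random_state=42)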
You can find a sample of the data that will be used for this project HERE, which reflects the preprocessing and structure suitable for feeding into the neural network model.
Discussing and Visualizing the Results:
The evaluation of the neural network through Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics revealed a moderate prediction error, indicating room for further model optimization. Specifically, an MAE of approximately 65.98 and an RMSE of around 85.61 suggest that the model, while consistent in its predictions within the central range of values, may not be adequately capturing the complex patterns influencing prices, especially at the extreme ends of the scale.
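A sketch of how these metrics are computed, assuming the model and test split from the earlier sketches:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_pred = model.predict(X_test).ravel()  # flatten the (n, 1) output to a vector

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")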
The scatter plot of actual versus predicted values shows the predictions clustered closely around the line of perfect agreement in the central range but deviating at higher and lower values. This pattern points to potential underfitting, where the model may be too simple to capture the nuances in the data.
Because the model tends to under-predict in many different situations, the line plot comparing actual and predicted values starkly illustrates the disparity between the two. This underscores the need to either increase the model's complexity or enrich the feature set in order to capture the target variable more accurately.
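Both visualizations can be reproduced with a sketch along these lines (restricting the line plot to the first 100 test samples is an assumption made for readability):

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Scatter of actual vs. predicted values with the line of perfect agreement.
ax1.scatter(y_test, y_pred, alpha=0.4)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax1.plot(lims, lims, linestyle="--")
ax1.set_xlabel("Actual price")
ax1.set_ylabel("Predicted price")

# Line plot comparing actual and predicted values across test samples.
ax2.plot(y_test[:100], label="Actual")
ax2.plot(y_pred[:100], label="Predicted")
ax2.set_xlabel("Test sample index")
ax2.set_ylabel("Price")
ax2.legend()

plt.tight_layout()
plt.show()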
During the training phase, the loss metrics did not indicate significant overfitting, as evidenced by the validation loss mirroring the training loss. However, the persistent plateau in loss values underscores the need for a strategic review of the model architecture or the feature set.
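That comparison comes from the training history; a sketch of the check (the epoch count, batch size, and validation split are assumed hyperparameters):

import matplotlib.pyplot as plt

history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=100, batch_size=32, verbose=0)

plt.plot(history.history["loss"], label="Training loss")
plt.plot(history.history["val_loss"], label="Validation loss")
plt.xlabel("Epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.show()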
Neural Network Architecture:
The neural network architecture employed for the regression task consists of several key components:
An Input Layer designed to match the dimensionality of the feature set post-preprocessing.
A series of Hidden Layers with ReLU activation functions to capture non-linearities, interspersed with dropout layers for regularization.
A single-unit Output Layer with a linear activation function suitable for continuous variable prediction.
The weight matrices connecting these layers are dimensioned according to the numbers of neurons they join; their values represent the strength of the connections and are learned during training. For instance, if a hidden layer of 100 neurons follows an input layer with 10 features, the connecting weight matrix has dimensions 10 × 100.
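This can be checked directly on a Keras layer (a small sketch; the sizes match the example above):

from tensorflow.keras import layers

dense = layers.Dense(100)            # hidden layer with 100 neurons
dense.build(input_shape=(None, 10))  # fed by 10 input features
kernel, bias = dense.get_weights()
print(kernel.shape)  # (10, 100): one weight per feature/neuron pair
print(bias.shape)    # (100,): one bias per neuron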
Disjoint Training and Testing Sets:
The principle of using disjoint datasets for training and testing is paramount to avoid overfitting and ensure that the model's performance is a true reflection of its predictive capabilities. By evaluating the model against a testing set that it has not been exposed to during the training phase, confidence in the model's ability to generalize to unseen data is established.
Below is a conceptual representation of the neural network architecture tailored to this regression project. In practice, the model may incorporate a more intricate structure with additional layers, varied activation functions, and advanced regularization techniques to enhance its predictive accuracy.
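Input(n_features) -> Dense(128, ReLU) -> Dropout(0.2) -> Dense(64, ReLU) -> Dropout(0.2) -> Dense(1, linear)

(The layer widths and dropout rate here are the same illustrative assumptions used in the earlier Keras sketch, not final design choices.)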
CONCLUSION:
Through this assignment, I gained a valuable understanding of the dynamics of price prediction by exploring neural network architectures for regression tasks. Striking a balance between model complexity and feature engineering proved crucial to success. Regularization techniques such as dropout layers played a vital role in preventing overfitting and ensuring that the model generalizes beyond the training data. Metrics such as mean absolute error (MAE) and root mean squared error (RMSE) were essential for assessing iterative improvements, revealing the close relationship between the model's structure and the data it receives. The assignment demonstrated how careful data preparation and model tuning can harness the potential of neural networks for predictive analysis, and showed that neural networks offer a reliable approach to regression problems in data-driven fields.