Skip to content

Commit

Permalink
Added Machine Learning Models section pt2
Browse files Browse the repository at this point in the history
  • Loading branch information
Raquel James committed Sep 20, 2024
1 parent 846f494 commit 4a07dc6
Showing 1 changed file with 18 additions and 6 deletions.
24 changes: 18 additions & 6 deletions Data Science/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,27 +26,39 @@ Data Scientists build predictive models using machine learning algorithms. This

### Machine Learning Models

There are two main types of models used for analyzing patterns and predicting behavior in data: Supervised learning and Unsupervised learning models. The supervised models contain data that come with associated labels for identification. Unsupervised models, on the other hand, do not have labels and instead rely on deep learning methods that read patterns from groups of data and distinguish different elements and commonalities accordingly. Through the methods of regression and classification, behaviors can be extracted from training datasets and tested on smaller datasets to predict a previously unknown value.
There are two main types of models used for analyzing patterns and predicting behavior in data: Supervised learning and Unsupervised learning models.

The supervised models contain data that come with associated labels for identification. Unsupervised models, on the other hand, do not have labels and instead rely on deep learning methods that read patterns from groups of data and distinguish different elements and commonalities accordingly.

Through the methods of regression and classification, behaviors can be extracted from training datasets and tested on smaller datasets to predict a previously unknown value.

#### KNN - K Nearest Neighbor

This model is one of the most straightforward, although relatively inefficient, supervised learning models. It uses proximity to classify a data point. After the training data is labeled with their class or numerical value, the distance is calculated and the algorithm identifies the k closest labeled examples to the queried data point. It then returns the *plurality* label of the k closest examples.
The **KNN** model is one of the most straightforward, although relatively inefficient, supervised learning models. It uses *proximity* to classify a data point. After the training data is labeled with their class or numerical value, the distance is calculated and the algorithm identifies the **k** closest labeled examples to the queried data point. It then returns the *plurality* label of the k closest examples.

#### Decision Trees

A decision tree is another supervised machine learning algorithm that relies on the relationship between nodes with dataset feature values. These work by splitting data into subsets based on feature values in a way that stems down and creates a tree-like visualization. Each node represents a decision point while each branch that stems from a node is that decision made. A node with no branches holds the final outcome and is referred to as a leaf node; the node with no branches leading to it is the root node and is the starting decision. While can handle both numerical and categorical data and are fairly easy to visualize, they are prone to overfitting when not regulated properly.
A **Decision tree** is another supervised machine learning algorithm that relies on the relationship between nodes with dataset *feature values*. These work by splitting data into subsets based on feature values in a way that stems down and creates a tree-like visualization. Each node represents a decision point while each branch that stems from a node is that decision made.

A node with no branches holds the final outcome and is referred to as a *leaf node*; the node with no branches leading to it is the *root node* and is the starting decision. While can handle both numerical and categorical data and are fairly easy to visualize, they are prone to overfitting when not regulated properly.

#### Linear Regression

Linear regression is one of the most fundamental methods used for modeling a relationship between two or more variables, attempting to express that relationship as a linear equation. This method determines the coefficients that best minimize the sum squared difference between the observed values and the values predicted by a model. It is very handy for interpreting trends within data and ultimately making predictions.
**Linear regression** is one of the most fundamental methods used for modeling a relationship between two or more variables, attempting to express that relationship as a linear equation.

This method determines the coefficients that best minimize the *sum squared difference* between the observed values and the values predicted by a model. It is very handy for interpreting trends within data and ultimately making predictions.

#### Clustering

Another relatively straightforward algorithm and close cousin of the KNN is the clustering method, particularly the k-means method. This unsupervised machine learning model finds commonalities among subsets in data based on their characteristics without any previously established labels or categories. There are other algorithms that work similarly, like the hierarchical clustering and DBSCAN methods, that approach the process of measuring similarity between data points differently.
Another relatively straightforward algorithm and close cousin of the KNN is **clustering**, particularly the *k-means method*. This unsupervised machine learning model finds commonalities among subsets in data based on their characteristics without any previously established labels or categories.

There are other algorithms that work similarly, like the hierarchical clustering and DBSCAN methods, that approach the process of measuring similarity between data points differently.

#### Neural Networks

One of the more commonly used methods today are deep learning neural network models. These machine learning models, inspired by the function and structure of the brain, consist of interconnected layers of nodes with their own associated weights. Data passes through these layers, transforming as they pass through mathematical functions, and eventually extracting complex patterns and predictions from datasets. These algorithms work particularly well for large and high-dimensional data, like image and speech recognition as well as language processing. A method called backpropagation is needed to adjust the weights based on the error of the networks predictions and with enough iterations and training, neural networks can produce amazing levels of accuracy in their predictions.
One of the more commonly used methods today are deep learning **neural network models**. These machine learning models, inspired by the function and structure of the brain, consist of interconnected layers of nodes with their own associated weights. Data passes through these layers, transforming as they pass through mathematical functions, and eventually extracting complex patterns and predictions from datasets.

These algorithms work* particularly well for *large and high-dimensional data*, like image and speech recognition as well as language processing. A method called *backpropagation* is needed to adjust the weights based on the error of the networks predictions and with enough iterations and training, neural networks can produce amazing levels of accuracy in their predictions.

## Data Visualization

Expand Down

0 comments on commit 4a07dc6

Please sign in to comment.