Artificial Intelligence (AI) and Machine Learning (ML) are deeply intertwined, with ML representing a crucial subset of AI. AI is a broad field focused on creating intelligent machines capable of performing tasks that would typically require human intelligence, encompassing a wide range of capabilities such as problem-solving, learning, planning, and language understanding. ML, on the other hand, is specifically concerned with the development of algorithms and statistical models that enable computers to learn and make decisions based on data, effectively "training" the machine to perform tasks without being explicitly programmed for each task.
In this series of articles, we'll go deeper into machine learning algorithms and concentrate on their types and methods. In general, machine learning algorithms fall into four broad categories: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
There are more varieties, such as "Learn to Learn" and "Deductive Reasoning," but we only cover the main four here. We'll go further into each one.
Supervised Learning :
Supervised learning is one of the primary categories of machine learning algorithms. In supervised learning, the algorithm learns from a labeled dataset: the labels act as an answer key that the algorithm can use to evaluate its accuracy on the training data. In short, the algorithm learns a rule from labeled training data and then applies this rule to make predictions on new, unseen data.
Types of Main Tasks: Regression and Classification.
** FOR KNOWLEDGE: SOME RESEARCH PAPERS CONSIDER TIME-SERIES FORECASTING A THIRD TYPE OF SUPERVISED LEARNING TASK.
Unsupervised Learning :
Unsupervised learning is a type of machine learning in which algorithms are trained on, and draw inferences from, data that is not labeled. Unlike supervised learning, where the training data comes with the correct answers, unsupervised learning works with data that has no explicit instructions about what to do with it. This makes it more challenging, but also very useful for discovering hidden patterns in data.
Unsupervised learning is a powerful tool in data science, used to draw inferences and find patterns in input data without the need for labeled outcomes. It's particularly useful for exploratory data analysis, complex problem-solving, and scenarios where manual labeling of data is impractical.
Types of Main Tasks: Clustering and Association.
** FOR KNOWLEDGE: SOME RESEARCH PAPERS CONSIDER DIMENSIONALITY REDUCTION A THIRD TYPE OF UNSUPERVISED LEARNING TASK.
Semi-supervised :
Semi-supervised learning is a type of machine learning that falls between supervised and unsupervised learning. It involves algorithms that learn from a dataset that includes both labeled and unlabeled data, typically with a much larger portion of unlabeled data.
The value of semi-supervised learning lies in its ability to leverage a large amount of unlabeled data, which is often easier and less costly to acquire than labeled data. This is particularly beneficial in situations where labeling data is expensive or requires expert knowledge.
Types of Tasks (the same as those of supervised and unsupervised learning):
Reinforcement Learning (RL)
Reinforcement learning is a type of machine learning that focuses on how agents should take actions in an environment to maximize some notion of cumulative reward. It differs from the supervised learning paradigm because it learns from the consequences of actions rather than from direct instruction.
Types of Tasks:
Reinforcement learning is well-suited for a variety of tasks, including:
- Regression
You decide to create a Regression model when:
If all of this applies to your case, your problem falls under the Regression field. Regression is divided into two domains, depending on the nature of the data.
1. Linear Data
The concept of linear data extends from the broader principle of linear relationships in mathematics and statistics. It typically means that there is a straight-line relationship between two variables. This relationship is predictable and can usually be described by a simple equation of the form y=mx+b, where changes in one variable are proportional and consistent with changes in another.
To narrow it down: you chose supervised learning because your data is labeled; based on the data and the task, you placed the problem under the Regression field; and because your data is linear, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
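To make the y = mx + b idea concrete, here is a minimal sketch of simple linear regression fitted with ordinary least squares. It assumes a single input feature and uses a small hypothetical dataset. The function name `fit_line` is only illustrative, and the algorithms covered in the dedicated articles may work differently.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares (closed form for one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b

# Hypothetical labeled data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, b = fit_line(xs, ys)
print(m, b)        # 2.0 1.0
print(m * 10 + b)  # prediction for an unseen x = 10: 21.0
```

The last line is the "apply the learned rule to unseen data" step from supervised learning.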
2. Non-Linear Data
Non-linear data refers to situations where there is not a straight-line relationship between variables. In non-linear relationships, the change in one variable does not result in a proportional and consistent change in another variable across the range of data. The relationship can be complex and is often modeled using more sophisticated functions that capture the variability in the relationship, such as exponential, logarithmic, or trigonometric functions.
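One common way to handle non-linear data is to keep the least-squares fit but apply it to a transformed feature. This is shown here only as an illustrative sketch, not as one of the specific algorithms this article points to. The data is hypothetical and generated from y = x^2 + 1. A straight line on x fits it poorly, while the same fit on z = x^2 recovers the curve.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

def sse(xs, ys, m, b):
    """Sum of squared errors of the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 + 1 for x in xs]  # hypothetical non-linear data

# 1) Straight line on the raw feature x
m1, b1 = fit_line(xs, ys)
# 2) Same linear fit on the transformed feature z = x^2
zs = [x ** 2 for x in xs]
m2, b2 = fit_line(zs, ys)

print(sse(xs, ys, m1, b1))  # 84.0 (the line cannot follow the curve)
print(sse(zs, ys, m2, b2))  # 0.0  (y = 1.0 * x^2 + 1.0 fits exactly)
```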
To narrow it down: you chose supervised learning because your data is labeled; based on the data and the task, you placed the problem under the Regression field; and because your data is non-linear, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
We can also use alternative algorithms for non-linear data, such as:
- Classification
You decide to create a Classification model when:
If all of this applies to your case, your problem falls under the Classification field. Like regression, classification is divided into two domains, depending on the nature of the data.
1. Linear Data
A linear classifier makes predictions based on a linear decision boundary. This means the classifier separates classes using a line (in two dimensions), a plane (in three dimensions), or a hyperplane (in higher dimensions). The decision boundary is determined by a linear combination of the features.
To narrow it down: you chose supervised learning because your data is labeled; based on the data and the task, you placed the problem under the Classification field; and because your data is linear, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
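As an illustration of a linear decision boundary, here is a minimal perceptron sketch. The perceptron is a classic linear classifier, but it is not necessarily one of the algorithms the dedicated articles cover. It learns weights `w` and a bias `b` so that the sign of w·x + b matches labels encoded as +1/-1. The two-feature dataset is hypothetical and linearly separable.

```python
def train_perceptron(X, y, epochs=20, lr=1.0):
    """Learn w, b so that sign(w.x + b) matches labels in {+1, -1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified (or exactly on the boundary)
                # Move the boundary toward classifying this point correctly
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                errors += 1
        if errors == 0:  # every training point is on the correct side
            break
    return w, b

def predict(w, b, x):
    """Which side of the line w.x + b = 0 the point x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# Hypothetical data: class +1 in the upper left, class -1 in the lower right
X = [[2, 3], [3, 3], [1, 4], [6, 1], [7, 2], [8, 0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b)                   # [-2.0, 5.0] 1.0
print(predict(w, b, [0, 5]))  # 1
print(predict(w, b, [9, 1]))  # -1
```

The learned boundary is the line -2*x1 + 5*x2 + 1 = 0. The perceptron is only guaranteed to converge when the classes really are linearly separable.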
2. Non-Linear Data
Non-linear classifiers, on the other hand, use decision boundaries that are not straight lines or hyperplanes. These classifiers can model more complex patterns by incorporating polynomial, radial basis function (RBF), or other non-linear kernels in SVMs, or by using methods like decision trees or neural networks, which inherently model non-linear relationships among features.
To narrow it down: you chose supervised learning because your data is labeled; based on the data and the task, you placed the problem under the Classification field; and because your data is non-linear, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
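A classic non-linear case is XOR-style data, where opposite corners share a class and no single straight line can separate the classes. The sketch below uses k-nearest neighbors (k-NN), a simple non-linear classifier that I'm swapping in for illustration. It is not one of the kernel SVM, tree, or neural-network methods mentioned above. The points and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Sort (distance, label) pairs by Euclidean distance to the query point
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical XOR-like data: (0,0)/(1,1) corners are class 0, (0,1)/(1,0) are class 1
train_X = [(0, 0), (0.1, 0.1), (0.9, 0.9), (1, 1),
           (0, 1), (0.1, 0.9), (0.9, 0.1), (1, 0)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.05, 0.0)))   # 0
print(knn_predict(train_X, train_y, (0.95, 0.05)))  # 1
print(knn_predict(train_X, train_y, (0.05, 0.95)))  # 1
print(knn_predict(train_X, train_y, (0.95, 1.0)))   # 0
```

Because each prediction depends only on nearby points, the implied decision boundary can bend around the four corners, which a linear classifier cannot do.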
- Clustering
You decide to create a Clustering model when:
If all of this applies to your case, your problem falls under the Clustering field. Clustering is divided into two domains, depending on the nature of the data.
1. High Dimensional
High dimensional data involves datasets with a large number of features (dozens, hundreds, or even thousands). This is common in areas like genomics, text processing, and image recognition, where each data point can have a vast array of attributes.
To narrow it down: you chose unsupervised learning because your data is unlabeled; based on the data and the task, you placed the problem under the Clustering field; and because your data is high dimensional, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
2. Low Dimensional
Low dimensional data refers to datasets with a relatively small number of features or dimensions (usually just a few). For example, a dataset with features like height and weight only would be considered low dimensional.
To narrow it down: you chose unsupervised learning because your data is unlabeled; based on the data and the task, you placed the problem under the Clustering field; and because your data is low dimensional, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
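For a feel of clustering on low-dimensional data, here is a minimal k-means sketch on hypothetical 2-D points. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The starting centroids are picked by hand, which is an assumption made to keep the result deterministic. Real k-means implementations usually use a smarter or randomized initialization.

```python
import math

def kmeans(points, init_centroids, iters=10):
    """Plain k-means: works for points of any dimension (tuples of numbers)."""
    centroids = [tuple(c) for c in init_centroids]
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D data (e.g. two features such as height and weight)
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, init_centroids=[points[0], points[3]])
print(centroids)                    # roughly [(1.167, 1.5), (8.5, 8.333)]
print([len(c) for c in clusters])  # [3, 3]
```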
- Association
You decide to create an Association model when:
If all of this applies to your case, your problem falls under the Association field. Association is divided into two domains, depending on the support threshold you use.
1. High Support Threshold
Setting a high support threshold means that only itemsets occurring very frequently in the dataset are considered for further analysis. This approach is typically used to ensure that the discovered rules are robust and representative of general trends in the data.
To narrow it down: you chose unsupervised learning because your data is unlabeled; based on the data and the task, you placed the problem under the Association field; and because you use a high support threshold, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
2. Low Support Threshold
A low support threshold allows itemsets with relatively lower frequencies to qualify as significant. This setting is useful in domains where even rare associations can provide valuable insights.
To narrow it down: you chose unsupervised learning because your data is unlabeled; based on the data and the task, you placed the problem under the Association field; and because you use a low support threshold, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
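To see how the support threshold changes what survives, here is a minimal sketch that counts the support of every itemset of size 1 and 2 in a few hypothetical shopping baskets and keeps those at or above `min_support`. This is brute-force enumeration, which is only the counting idea behind algorithms like Apriori. It does not include Apriori's candidate pruning or any rule generation.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sort so ('bread', 'milk') and ('milk', 'bread') match
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Support = fraction of transactions that contain the itemset
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

high = frequent_itemsets(transactions, min_support=0.6)
low = frequent_itemsets(transactions, min_support=0.25)
print(sorted(high))  # [('bread',), ('milk',)]
print(len(low))      # 8
```

With the high threshold only the two very common items survive. The low threshold also keeps rarer combinations such as ('eggs', 'milk').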
- Semi-Supervised Learning
You decide to create a Semi-Supervised Learning model when:
If all of this applies to your case, your problem falls under the Semi-Supervised Learning field. Here we can use the algorithms of the supervised and unsupervised types, depending on the data and the task.
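One common semi-supervised pattern is self-training. You fit a supervised model on the few labeled points, pseudo-label the unlabeled points it is confident about, add them to the training set, and repeat. Below is a minimal sketch of that idea on 1-D data. It uses a nearest-centroid classifier, and the confidence is measured as the distance margin between the two closest centroids. The classifier choice, the `min_margin` threshold, and all data values are hypothetical.

```python
def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean of each class (1-D features)."""
    return {label: sum(x for x, l in zip(X, y) if l == label) / y.count(label)
            for label in sorted(set(y))}

def predict(centroids, x):
    """Label of the closest class centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def self_train(labeled_X, labeled_y, unlabeled_X, min_margin=2.0, max_rounds=5):
    """Self-training loop; assumes at least two classes in labeled_y."""
    X, y = list(labeled_X), list(labeled_y)
    pool = list(unlabeled_X)
    for _ in range(max_rounds):
        centroids = fit_centroids(X, y)
        confident = []
        for x in pool:
            d = sorted(abs(x - c) for c in centroids.values())
            if d[1] - d[0] >= min_margin:  # clearly closer to one class
                confident.append(x)
        if not confident:
            break
        for x in confident:  # pseudo-label and move into the training set
            X.append(x)
            y.append(predict(centroids, x))
            pool.remove(x)
    return fit_centroids(X, y), pool

# Two labeled points, several unlabeled ones (hypothetical)
centroids, leftover = self_train([1.0, 9.0], ["low", "high"], [2.0, 3.0, 4.5, 7.0, 8.0])
print(centroids)  # {'high': 8.0, 'low': 2.0}
print(leftover)   # [4.5]  (too ambiguous to pseudo-label)
```

The ambiguous point stays unlabeled instead of being forced into a class. This reflects the main design choice in self-training, since confidently wrong pseudo-labels can reinforce their own mistakes.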
- Policy Based Methods
You decide to create a Policy Based Methods model when:
If all of this applies to your case, your problem falls under the Policy Based Methods field. Policy-based methods are divided into two domains, depending on whether the agent uses a model of the environment.
1. Model Based RL
Model-Based RL involves an explicit model of the environment. This model predicts the next state and the rewards given the current state and action. In the context of policy-based methods, this model is used to simulate the outcomes of various actions to inform the policy update process.
To narrow it down: you chose reinforcement learning because your data is obtained by interacting with an environment; based on the data and the task, you placed the problem under the Policy Based Methods field; and because your task calls for model-based RL, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
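To illustrate "using a model to simulate outcomes and update the policy," here is a minimal sketch of policy iteration on a hypothetical 4-state corridor. The agent moves left (-1) or right (+1) and gets a reward of 1 for reaching the goal state 3. Policy iteration is a classic dynamic-programming method; whether it counts as "policy-based" depends on the taxonomy, so treat it as an illustration of the model-based idea, not as one of the article's listed algorithms. The environment model is assumed to be known exactly.

```python
GOAL = 3           # hypothetical terminal state
ACTIONS = (-1, 1)  # move left or right

def model(s, a):
    """Known (assumed) model of the environment: returns (next_state, reward)."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

def policy_iteration(gamma=0.9, eval_sweeps=100):
    policy = {s: -1 for s in range(GOAL)}  # start from a poor policy: always go left
    while True:
        # Policy evaluation: use the model to compute V under the current policy
        V = [0.0] * (GOAL + 1)  # V[GOAL] stays 0 because the episode ends there
        for _ in range(eval_sweeps):
            for s in range(GOAL):
                s2, r = model(s, policy[s])
                V[s] = r + gamma * V[s2]
        # Policy improvement: simulate each action with the model, pick the best
        stable = True
        for s in range(GOAL):
            best = max(ACTIONS, key=lambda a: model(s, a)[1] + gamma * V[model(s, a)[0]])
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # no state changed its action, so the policy is optimal
            return policy, V

policy, V = policy_iteration()
print(policy)  # {0: 1, 1: 1, 2: 1}  (always move right)
```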
2. Model Free RL
Model-Free RL, in contrast, does not use a model of the environment's dynamics. Instead, it relies directly on actual interactions with the environment to learn and optimize its policy. In policy-based methods, this means learning the policy by directly evaluating the rewards received from the actions taken, without any prediction or simulation of future states.
To narrow it down: you chose reinforcement learning because your data is obtained by interacting with an environment; based on the data and the task, you placed the problem under the Policy Based Methods field; and because your task calls for model-free RL, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
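As a model-free, policy-based example, here is a minimal REINFORCE-style policy-gradient sketch on a hypothetical two-armed bandit. The policy is a softmax over one preference per action. After each pull, the preferences are nudged along the gradient of log π(action), scaled by how much better the reward was than a running-average baseline. No model of the environment is built; the agent only uses the rewards it actually receives. The deterministic rewards (1.0 and 0.0), learning rate, and step count are assumptions chosen to keep the demo simple.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(arm_rewards, steps=500, lr=0.5, baseline_rate=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)  # policy parameters, one per action
    baseline = 0.0                    # running average reward (reduces variance)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]  # sample from the policy
        reward = arm_rewards[action]  # observed from the environment, no model involved
        advantage = reward - baseline
        # d/d prefs[k] of log softmax(prefs)[action] = 1[k == action] - probs[k]
        for k in range(len(prefs)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * advantage * grad
        baseline += baseline_rate * (reward - baseline)
    return softmax(prefs)

probs = reinforce_bandit([1.0, 0.0])  # hypothetical: arm 0 pays 1, arm 1 pays 0
print(probs)  # the policy ends up strongly preferring arm 0
```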
- Value Based Methods
You decide to create a Value Based Methods model when:
If all of this applies to your case, your problem falls under the Value Based Methods field. Like policy-based methods, value-based methods are divided into two domains, depending on whether the agent uses a model of the environment.
1. Model Based RL
Model-Based RL in the context of value-based methods involves using a model of the environment to estimate future states and rewards. This model is utilized to compute the values of states or actions without needing extensive real-world interactions.
To narrow it down: you chose reinforcement learning because the data is obtained by interacting with the environment; based on the data and the task, you placed the problem under the Value Based Methods field; and because your task calls for model-based RL, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
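Here is a minimal value iteration sketch that computes state values purely from a model, without any real-world interaction. It uses a hypothetical 4-state corridor: actions move left (-1) or right (+1), and reaching the goal state 3 gives a reward of 1. The model is assumed to be known exactly. In practice, model-based RL often has to learn this model from experience first.

```python
GOAL = 3           # hypothetical terminal state
ACTIONS = (-1, 1)  # move left or right

def model(s, a):
    """Known (assumed) model: returns (next_state, reward) for taking a in s."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

def value_iteration(gamma=0.9, tol=1e-10):
    V = [0.0] * (GOAL + 1)  # V[GOAL] stays 0 because the episode ends there
    while True:
        delta = 0.0
        for s in range(GOAL):  # non-terminal states only
            # Bellman optimality update computed from the model's predictions
            new_v = max(r + gamma * V[s2] for s2, r in (model(s, a) for a in ACTIONS))
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    # Greedy policy with respect to the computed values
    policy = [max(ACTIONS, key=lambda a: model(s, a)[1] + gamma * V[model(s, a)[0]])
              for s in range(GOAL)]
    return V, policy

V, policy = value_iteration()
print(V)       # approximately [0.81, 0.9, 1.0, 0.0]
print(policy)  # [1, 1, 1]
```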
2. Model Free RL
Model-Free RL, when applied to value-based methods, focuses on learning from the actual interactions with the environment, using observed rewards and state transitions to update the value estimates directly without any model of the environment's dynamics.
To narrow it down: you chose reinforcement learning because the data is obtained by interacting with the environment; based on the data and the task, you placed the problem under the Value Based Methods field; and because your task calls for model-free RL, you'll use one of these algorithms, depending on factors I discuss in the dedicated article for each one.
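For the model-free counterpart, here is a minimal tabular Q-learning sketch on the same kind of hypothetical 4-state corridor. The agent can only call `env_step` to sample what actually happens, and it updates its action-value estimates Q(s, a) directly from the observed reward and next state. It never plans with a model. The exploration rate, learning rate, and episode count are illustrative assumptions.

```python
import random

GOAL = 3           # hypothetical terminal state
ACTIONS = (-1, 1)  # move left or right

def env_step(s, a):
    """The environment: the agent can sample it but does not use it for planning."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.5, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL)}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action choice (random tie-break among the best actions)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[s].values())
                a = rng.choice([act for act in ACTIONS if Q[s][act] == best])
            s2, r, done = env_step(s, a)
            # Q-learning target: observed reward plus discounted best next value
            target = r if done else r + gamma * max(Q[s2].values())
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
    return Q, greedy

Q, greedy = q_learning()
print(greedy)  # [1, 1, 1]  (learned from experience alone)
```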
Data is the foundation upon which all machine learning algorithms work and develop. Fundamentally, the purpose of machine learning algorithms is to use the data they process to find patterns, forecast outcomes, or produce insights. The quantity and quality of data directly affect the effectiveness and precision of machine learning models. Large and diverse datasets help these algorithms learn more efficiently, adapt to new situations, and produce more accurate answers. In contrast, inadequate or missing data can result in biased or erroneous models. The ongoing interaction between data and machine learning algorithms drives advances in fields like artificial intelligence, where data is not just fuel but the essential basis on which intelligent systems are built.
Types of data:-
-------------------------
1. Structured Data (Tabular Data)
Structured data refers to data that is organized and formatted in a specific way to make it easily readable and understandable by both humans and machines. This is typically achieved through the use of a well-defined schema or data model, which provides a structure for the data.
Tabular data is a type of structured data that is organized into rows and columns, much like a table. It's one of the most common and straightforward ways to represent data for easy understanding and analysis, and we can treat data as tabular as long as it follows this consistent row-and-column format. Tabular data is divided into:
A) Quantitative Data (Numerical): (e.g., speed of cars).
B) Qualitative Data (Categorical): (e.g., product categories).
A CSV file is a type of plain text file that uses specific structuring to arrange tabular data. Each line in a CSV file corresponds to a row in the table, and commas separate the individual cells in the row. For example, a simple CSV file might look like this:
In the context of machine learning or data analysis:
This is structured data and its shapes: typically found in databases and spreadsheets, it is organized and easily quantifiable, and it encompasses both numerical and categorical data.
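To connect the CSV format with the quantitative/qualitative split, here is a minimal sketch using Python's built-in `csv` module. It parses a small hypothetical CSV and guesses which columns are numerical by checking whether every value parses as a number. The column names and values are made up for illustration. The numeric-parse rule is only a rough heuristic; for example, numeric ID or ZIP-code columns are really categorical.

```python
import csv
import io

# Hypothetical CSV text: a header row, then one row per car
raw = """speed_kmh,category
120,sedan
95,truck
150,sports
"""

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)          # each row is a dict: column name -> string value
columns = reader.fieldnames  # ['speed_kmh', 'category']

def is_number(value):
    """True if the string can be parsed as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False

quantitative = [c for c in columns if all(is_number(r[c]) for r in rows)]
qualitative = [c for c in columns if c not in quantitative]
print(quantitative)  # ['speed_kmh']
print(qualitative)   # ['category']

mean_speed = sum(float(r["speed_kmh"]) for r in rows) / len(rows)
print(mean_speed)    # about 121.67 km/h
```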
2. Unstructured Data
Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. This type of data is typically text-heavy, but it may also contain data such as dates, numbers, and facts. Unstructured data can take a few shapes:
3. Semi - Structured Data
Semi-structured data is a type of data that is not purely structured, but also not completely unstructured. It contains some level of organization or structure, but does not conform to a rigid schema or data model, and may contain elements that are not easily categorized or classified.
NOTE: As you may notice, before machine learning can use unstructured data, it is usually transformed in one way or another into structured data. For example, text data is treated as qualitative data, image data is transformed into numerical or categorical features, and voice data is often transcribed into text.
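As a small illustration of turning unstructured text into structured data, here is a minimal bag-of-words sketch. Each document becomes a row of word counts over a shared vocabulary, giving a table of numerical features. The documents are hypothetical, and the tokenizer is only lowercasing plus whitespace splitting, an assumption that real pipelines usually replace with proper tokenization.

```python
from collections import Counter

def bag_of_words(documents):
    """Turn raw text into a vocabulary and one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]  # naive tokenizer
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])  # one column per word
    return vocabulary, vectors

docs = ["the cat sat", "the cat saw the dog"]  # hypothetical documents
vocab, X = bag_of_words(docs)
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(X)      # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row of `X` now looks like a row of tabular, quantitative data that the algorithms above can consume.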
Embarking on a journey through the realm of machine learning, you've now traversed the initial path, gaining a comprehensive overview of various machine learning algorithms and the diverse shapes of data they interact with. This foundational knowledge is akin to a map in hand, guiding you through the intricate landscape of data science. With this understanding, you are well-equipped to delve into the next critical phase: the pre-processing blog. Here, you'll unlock the true potential of your data, transforming raw information into a refined format, ready to be harnessed by the powerful algorithms you've just learned about. Your adventure into the depths of machine learning continues, and the pre-processing page is your gateway to turning theoretical knowledge into practical expertise.
HINTS: