diff --git a/docs/md/HEADER.md b/docs/md/HEADER.md
new file mode 100644
index 0000000..195b397
--- /dev/null
+++ b/docs/md/HEADER.md
@@ -0,0 +1,164 @@
+[PyGAD](https://github.com/ahmedfgad/GeneticAlgorithmPython) is an open-source Python library for building the genetic algorithm and optimizing machine learning algorithms. It works with [Keras](https://keras.io) and [PyTorch](https://pytorch.org).
+
+> Try the [Optimization Gadget](https://optimgadget.com), a free cloud-based tool powered by PyGAD. It simplifies optimization by reducing or eliminating the need for coding while providing insightful visualizations.
+
+[PyGAD](https://github.com/ahmedfgad/GeneticAlgorithmPython) supports different types of crossover, mutation, and parent selection operators. [PyGAD](https://github.com/ahmedfgad/GeneticAlgorithmPython) allows different types of problems to be optimized using the genetic algorithm by customizing the fitness function. It works with both single-objective and multi-objective optimization problems.
+
+![PYGAD-LOGO](https://user-images.githubusercontent.com/16560492/101267295-c74c0180-375f-11eb-9ad0-f8e37bd796ce.png)
+
+*Logo designed by [Asmaa Kabil](https://www.linkedin.com/in/asmaa-kabil-9901b7b6)*
+
+Besides building the genetic algorithm, the library builds and optimizes machine learning algorithms. Currently, [PyGAD](https://pypi.org/project/pygad) supports building and training (using the genetic algorithm) artificial neural networks for classification problems.
+
+The library is under active development and more features are added regularly. Please contact us if you want a feature to be supported.
+
+# Donation & Support
+
+You can donate to PyGAD via:
+
+- [Credit/Debit Card](https://donate.stripe.com/eVa5kO866elKgM0144): https://donate.stripe.com/eVa5kO866elKgM0144
+- [Open Collective](https://opencollective.com/pygad): [opencollective.com/pygad](https://opencollective.com/pygad)
+- PayPal: Use either this link: [paypal.me/ahmedfgad](https://paypal.me/ahmedfgad) or the e-mail address ahmed.f.gad@gmail.com
+- Interac e-Transfer: Use the e-mail address ahmed.f.gad@gmail.com
+- Buy a product at [Teespring](https://pygad.creator-spring.com/): [pygad.creator-spring.com](https://pygad.creator-spring.com)
+
+# Installation
+
+To install [PyGAD](https://pypi.org/project/pygad), simply use pip to download and install the library from [PyPI](https://pypi.org/project/pygad) (Python Package Index). The library lives on PyPI at this page: https://pypi.org/project/pygad.
+
+Install PyGAD with the following command:
+
+```
+pip3 install pygad
+```
+
+# Quick Start
+
+To get started with [PyGAD](https://pypi.org/project/pygad), simply import it.
+
+```python
+import pygad
+```
+
+Using [PyGAD](https://pypi.org/project/pygad), a wide range of problems can be optimized. A quick and simple problem to be optimized using [PyGAD](https://pypi.org/project/pygad) is finding the best set of weights that satisfies the following function:
+
+```
+y = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
+where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y=44
+```
+
+The first step is to prepare the inputs and the outputs of this equation.
+
+```python
+function_inputs = [4,-2,3.5,5,-11,-4.7]
+desired_output = 44
+```
+
+A very important step is to implement the fitness function that will be used to calculate the fitness value of each solution. Here is one.
+
+If the fitness function returns a number, then the problem is single-objective. If a `list`, `tuple`, or `numpy.ndarray` is returned, then it is a multi-objective problem (applicable even if a single element exists).
+
+```python
+import numpy
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution*function_inputs)
+    # The small constant guards against division by zero for a perfect solution.
+    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+    return fitness
+```
+
+Next is to prepare the parameters of [PyGAD](https://pypi.org/project/pygad). Here is an example of a set of parameters.
+
+```python
+fitness_function = fitness_func
+
+num_generations = 50
+num_parents_mating = 4
+
+sol_per_pop = 8
+num_genes = len(function_inputs)
+
+init_range_low = -2
+init_range_high = 5
+
+parent_selection_type = "sss"
+keep_parents = 1
+
+crossover_type = "single_point"
+
+mutation_type = "random"
+mutation_percent_genes = 10
+```
+
+After the parameters are prepared, an instance of the **pygad.GA** class is created.
+
+```python
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       fitness_func=fitness_function,
+                       sol_per_pop=sol_per_pop,
+                       num_genes=num_genes,
+                       init_range_low=init_range_low,
+                       init_range_high=init_range_high,
+                       parent_selection_type=parent_selection_type,
+                       keep_parents=keep_parents,
+                       crossover_type=crossover_type,
+                       mutation_type=mutation_type,
+                       mutation_percent_genes=mutation_percent_genes)
+```
+
+After creating the instance, the `run()` method is called to start the optimization.
+
+```python
+ga_instance.run()
+```
+
+After the `run()` method completes, information about the best solution found by PyGAD can be accessed.
+
+```python
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print("Parameters of the best solution : {solution}".format(solution=solution))
+print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
+
+prediction = numpy.sum(numpy.array(function_inputs)*solution)
+print("Predicted output based on the best solution : {prediction}".format(prediction=prediction))
+```
+
+```
+Parameters of the best solution : [3.92692328 -0.11554946 2.39873381 3.29579039 -0.74091476 1.05468517]
+Fitness value of the best solution = 157.37320042925006
+Predicted output based on the best solution : 44.00635432206546
+```
+
+There is more you can do with PyGAD. Read its documentation to explore its features.
+
+# PyGAD's Modules
+
+[PyGAD](https://pypi.org/project/pygad) has the following modules:
+
+1. The main module, named `pygad` like the library itself, is the main interface to build the genetic algorithm.
+2. The `nn` module builds artificial neural networks.
+3. The `gann` module optimizes neural networks (for classification and regression) using the genetic algorithm.
+4. The `cnn` module builds convolutional neural networks.
+5. The `gacnn` module optimizes convolutional neural networks using the genetic algorithm.
+6. The `kerasga` module trains [Keras](https://keras.io) models using the genetic algorithm.
+7. The `torchga` module trains [PyTorch](https://pytorch.org) models using the genetic algorithm.
+8. The `visualize` module visualizes the results.
+9. The `utils` module contains the operators (crossover, mutation, and parent selection) and the NSGA-II code.
+10. The `helper` module has some helper functions.
+
+The documentation discusses these modules.
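+
+As noted in the Quick Start, a fitness function that returns a `list`, `tuple`, or `numpy.ndarray` makes the problem multi-objective (the NSGA-II code lives in the `utils` module listed above). Here is a minimal sketch of such a fitness function for the same linear problem; the second objective (preferring small weights) is only an illustrative assumption, not part of the original example.
+
+```python
+import numpy
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution*function_inputs)
+    # Objective 1: closeness of the produced output to the desired output.
+    closeness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+    # Objective 2 (illustrative assumption): prefer solutions with small weights.
+    smallness = 1.0 / (numpy.sum(numpy.abs(solution)) + 0.000001)
+    # Returning a list makes PyGAD treat the problem as multi-objective.
+    return [closeness, smallness]
+```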
+
+# PyGAD Citation - Bibtex Formatted
+
+If you used PyGAD, please consider citing its paper with the following details:
+
+```
+@article{gad2023pygad,
+  title={PyGAD: an intuitive genetic algorithm Python library},
+  author={Gad, Ahmed Fawzy},
+  journal={Multimedia Tools and Applications},
+  pages={1--14},
+  year={2023},
+  publisher={Springer}
+}
+```
+
diff --git a/docs/md/cnn.md b/docs/md/cnn.md
new file mode 100644
index 0000000..c57ab00
--- /dev/null
+++ b/docs/md/cnn.md
@@ -0,0 +1,510 @@
+# `pygad.cnn` Module
+
+This section of the PyGAD library's documentation discusses the **pygad.cnn** module.
+
+The **pygad.cnn** module creates convolutional neural networks (CNNs). The purpose of this module is to implement only the **forward pass** of a convolutional neural network, without a real training algorithm. The **pygad.cnn** module builds the network layers, implements the activation functions, makes predictions, and more.
+
+Later, the **pygad.gacnn** module is used to train the **pygad.cnn** network using the genetic algorithm built in the **pygad** module.
+
+# Supported Layers
+
+Each layer supported by the **pygad.cnn** module has a corresponding class. The layers and their classes are:
+
+1. **Input**: Implemented using the `pygad.cnn.Input2D` class.
+2. **Convolution**: Implemented using the `pygad.cnn.Conv2D` class.
+3. **Max Pooling**: Implemented using the `pygad.cnn.MaxPooling2D` class.
+4. **Average Pooling**: Implemented using the `pygad.cnn.AveragePooling2D` class.
+5. **Flatten**: Implemented using the `pygad.cnn.Flatten` class.
+6. **ReLU**: Implemented using the `pygad.cnn.ReLU` class.
+7. **Sigmoid**: Implemented using the `pygad.cnn.Sigmoid` class.
+8. **Dense** (Fully Connected): Implemented using the `pygad.cnn.Dense` class.
+
+In the future, more layers will be added.
+
+Except for the input layer, all of the listed layers have 4 instance attributes that serve the same purpose:
+
+1. `previous_layer`: A reference to the previous layer in the CNN architecture.
+2. `layer_input_size`: The size of the input to the layer.
+3. `layer_output_size`: The size of the output from the layer.
+4. `layer_output`: The latest output generated from the layer. It defaults to `None`.
+
+In addition to these shared attributes, some layers have extra attributes. The next subsections discuss the layers.
+
+## `pygad.cnn.Input2D` Class
+
+The `pygad.cnn.Input2D` class creates the input layer for the convolutional neural network. For each network, there is only a single input layer. The network architecture must start with an input layer.
+
+This class has no methods or class attributes. All it has is a constructor that accepts a parameter named `input_shape` representing the shape of the input.
+
+Instances of the `Input2D` class have the following attributes:
+
+1. `input_shape`: The shape of the input to the network.
+2. `layer_output_size`: The size of the layer's output, which equals `input_shape` for the input layer.
+
+Here is an example of building an input layer with shape `(50, 50, 3)`.
+
+```python
+input_layer = pygad.cnn.Input2D(input_shape=(50, 50, 3))
+```
+
+Here is how to access the attributes within the instance of the `pygad.cnn.Input2D` class.
+
+```python
+input_shape = input_layer.input_shape
+layer_output_size = input_layer.layer_output_size
+
+print("Input2D Input shape =", input_shape)
+print("Input2D Output shape =", layer_output_size)
+```
+
+This is everything about the input layer.
+
+## `pygad.cnn.Conv2D` Class
+
+Using the `pygad.cnn.Conv2D` class, convolution (conv) layers can be created. To create a convolution layer, just create a new instance of the class. The constructor accepts the following parameters:
+
+- `num_filters`: Number of filters.
+- `kernel_size`: Filter kernel size.
+- `previous_layer`: A reference to the previous layer. Using the `previous_layer` attribute, a linked list is created that connects all network layers. For more information about this attribute, please check the **previous_layer** attribute section of the `pygad.nn` module documentation.
+- `activation_function=None`: A string representing the activation function to be used in this layer. Defaults to `None`, which means no activation function is applied to the convolution output. An activation layer can be added separately in this case. The supported activation functions in the conv layer are `relu` and `sigmoid`.
+
+Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, some new instance attributes are created which are:
+
+- `filter_bank_size`: Size of the filter bank in this layer.
+- `initial_weights`: The initial weights for the conv layer.
+- `trained_weights`: The trained weights of the conv layer. This attribute is initialized by the value in the `initial_weights` attribute.
+- `layer_input_size`
+- `layer_output_size`
+- `layer_output`
+
+Here is an example of creating a conv layer with 2 filters and a kernel size of 3. Note that the `previous_layer` parameter is assigned to the input layer `input_layer`.
+
+```python
+conv_layer = pygad.cnn.Conv2D(num_filters=2,
+                              kernel_size=3,
+                              previous_layer=input_layer,
+                              activation_function=None)
+```
+
+Here is how to access some attributes in the conv layer:
+
+```python
+filter_bank_size = conv_layer.filter_bank_size
+conv_initial_weights = conv_layer.initial_weights
+
+print("Filter bank size attributes =", filter_bank_size)
+print("Initial weights of the conv layer :", conv_initial_weights)
+```
+
+Because `conv_layer` holds a reference to the input layer, the input shape can be accessed.
+
+```python
+input_layer = conv_layer.previous_layer
+input_shape = input_layer.input_shape
+
+print("Input shape =", input_shape)
+```
+
+Here is another conv layer whose `previous_layer` attribute points to the previously created conv layer and which uses the `ReLU` activation function.
+
+```python
+conv_layer2 = pygad.cnn.Conv2D(num_filters=2,
+                               kernel_size=3,
+                               previous_layer=conv_layer,
+                               activation_function="relu")
+```
+
+Because `conv_layer2` holds a reference to `conv_layer` in its `previous_layer` attribute, the attributes in `conv_layer` can be accessed.
+
+```python
+conv_layer = conv_layer2.previous_layer
+filter_bank_size = conv_layer.filter_bank_size
+
+print("Filter bank size attributes =", filter_bank_size)
+```
+
+After getting the reference to `conv_layer`, we can use it to access the input shape.
+
+```python
+conv_layer = conv_layer2.previous_layer
+input_layer = conv_layer.previous_layer
+input_shape = input_layer.input_shape
+
+print("Input shape =", input_shape)
+```
+
+## `pygad.cnn.MaxPooling2D` Class
+
+The `pygad.cnn.MaxPooling2D` class builds a max pooling layer for the CNN architecture. The constructor of this class accepts the following parameters; a short example follows the list:
+
+- `pool_size`: Size of the window.
+- `previous_layer`: A reference to the previous layer in the CNN architecture.
+- `stride=2`: The stride, which defaults to 2.
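+
+Here is a short sketch, following the pattern of the conv layer examples above, that appends a max pooling layer after `conv_layer2` and inspects the shared size attributes (the printed values depend on the input shape):
+
+```python
+max_pooling_layer = pygad.cnn.MaxPooling2D(pool_size=2,
+                                           previous_layer=conv_layer2,
+                                           stride=2)
+
+print("MaxPooling2D input size =", max_pooling_layer.layer_input_size)
+print("MaxPooling2D output size =", max_pooling_layer.layer_output_size)
+```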
+
+Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, some new instance attributes are created which are:
+
+- `layer_input_size`
+- `layer_output_size`
+- `layer_output`
+
+## `pygad.cnn.AveragePooling2D` Class
+
+The `pygad.cnn.AveragePooling2D` class is similar to the `pygad.cnn.MaxPooling2D` class except that it applies the average pooling operation rather than max pooling.
+
+## `pygad.cnn.Flatten` Class
+
+The `pygad.cnn.Flatten` class implements the flatten layer which converts the output of the previous layer into a 1D vector. The constructor accepts only the `previous_layer` parameter.
+
+The following instance attributes exist:
+
+* `previous_layer`
+* `layer_input_size`
+* `layer_output_size`
+* `layer_output`
+
+## `pygad.cnn.ReLU` Class
+
+The `pygad.cnn.ReLU` class implements the ReLU layer which applies the ReLU activation function to the output of the previous layer.
+
+The constructor accepts only the `previous_layer` parameter.
+
+The following instance attributes exist:
+
+* `previous_layer`
+* `layer_input_size`
+* `layer_output_size`
+* `layer_output`
+
+## `pygad.cnn.Sigmoid` Class
+
+The `pygad.cnn.Sigmoid` class is similar to the `pygad.cnn.ReLU` class except that it applies the sigmoid function rather than the ReLU function.
+
+## `pygad.cnn.Dense` Class
+
+The `pygad.cnn.Dense` class implements the dense layer. Its constructor accepts the following parameters:
+
+- `num_neurons`: Number of neurons in the dense layer.
+- `previous_layer`: A reference to the previous layer.
+- `activation_function`: A string representing the activation function to be used in this layer. Defaults to `"sigmoid"`. Currently, the supported activation functions in the dense layer are `"sigmoid"`, `"relu"`, and `"softmax"`.
+
+Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, some new instance attributes are created which are:
+
+* `initial_weights`: The initial weights for the dense layer.
+* `trained_weights`: The trained weights of the dense layer. This attribute is initialized by the value in the `initial_weights` attribute.
+* `layer_input_size`
+* `layer_output_size`
+* `layer_output`
+
+# `pygad.cnn.Model` Class
+
+An instance of the `pygad.cnn.Model` class represents a CNN model. The constructor of this class accepts the following parameters:
+
+- `last_layer`: A reference to the last layer in the CNN architecture (i.e. the dense layer).
+- `epochs=10`: Number of epochs.
+- `learning_rate=0.01`: Learning rate.
+
+Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, a new instance attribute named `network_layers` is created which holds a list with references to the CNN layers. Such a list is returned using the `get_layers()` method in the `pygad.cnn.Model` class.
+
+There are a number of methods in the `pygad.cnn.Model` class which serve for training, testing, and retrieving information about the model. These methods are discussed in the next subsections.
+
+### `get_layers()`
+
+Creates a list of all layers in the CNN model. It accepts no parameters.
+
+### `train()`
+
+Trains the CNN model.
+
+Accepts the following parameters:
+
+* `train_inputs`: Training data inputs.
+* `train_outputs`: Training data outputs.
+
+This method trains the CNN model according to the number of epochs specified in the constructor of the `pygad.cnn.Model` class.
+
+It is important to note that no learning algorithm is used for training the network. Only the learning rate is used to make small changes to the weights, which is better than leaving them unchanged.
+
+### `feed_sample()`
+
+Feeds a sample through the CNN layers and returns the output of the last layer in the network.
+
+### `update_weights()`
+
+Updates the CNN weights using the learning rate. It is important to note that no learning algorithm is used for training the network. Only the learning rate is used to make small changes to the weights, which is better than leaving them unchanged.
+
+### `predict()`
+
+Uses the trained CNN for making predictions.
+
+Accepts the following parameter:
+
+* `data_inputs`: The input samples whose labels are to be predicted.
+
+It returns a list holding the samples' predictions.
+
+### `summary()`
+
+Prints a summary of the CNN architecture.
+
+# Supported Activation Functions
+
+The supported activation functions in the convolution layer are:
+
+1. Sigmoid: Implemented using the `pygad.cnn.sigmoid()` function.
+2. Rectified Linear Unit (ReLU): Implemented using the `pygad.cnn.relu()` function.
+
+The dense layer supports these functions besides the `softmax` function implemented in the `pygad.cnn.softmax()` function.
+
+# Steps to Build a Convolutional Neural Network
+
+This section discusses how to use the `pygad.cnn` module for building a convolutional neural network. The steps are summarized as follows:
+
+- Reading the Data
+- Building the CNN Architecture
+- Building Model
+- Model Summary
+- Training the CNN
+- Making Predictions
+- Calculating Some Statistics
+
+## Reading the Data
+
+Before building the network architecture, the first thing to do is to prepare the data that will be used for training the network.
+
+In this example, 4 classes of the **Fruits360** dataset are used for preparing the training data. The 4 classes are:
+
+1. [**Apple Braeburn**](https://github.com/ahmedfgad/NumPyANN/tree/master/apple): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/apple
+2. [**Lemon Meyer**](https://github.com/ahmedfgad/NumPyANN/tree/master/lemon): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/lemon
+3. [**Mango**](https://github.com/ahmedfgad/NumPyANN/tree/master/mango): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/mango
+4. [**Raspberry**](https://github.com/ahmedfgad/NumPyANN/tree/master/raspberry): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/raspberry
+
+Just 20 samples from each of the 4 classes are saved into a NumPy array available in the [**dataset_inputs.npy**](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy) file: https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy
+
+The shape of this array is `(80, 100, 100, 3)` where the shape of a single image is `(100, 100, 3)`.
+
+The [**dataset_outputs.npy**](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy) file (https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy) has the class labels for the 4 classes:
+
+1. [**Apple Braeburn**](https://github.com/ahmedfgad/NumPyANN/tree/master/apple): Class label is **0**
+2. [**Lemon Meyer**](https://github.com/ahmedfgad/NumPyANN/tree/master/lemon): Class label is **1**
+3. [**Mango**](https://github.com/ahmedfgad/NumPyANN/tree/master/mango): Class label is **2**
+4. [**Raspberry**](https://github.com/ahmedfgad/NumPyANN/tree/master/raspberry): Class label is **3**
+
+Simply download and read the 2 files to return the NumPy arrays according to the next code:
+
+```python
+import numpy
+
+train_inputs = numpy.load("dataset_inputs.npy")
+train_outputs = numpy.load("dataset_outputs.npy")
+```
+
+After the data is prepared, next is to create the network architecture.
+
+## Building the Network Architecture
+
+The input layer is created by instantiating the `pygad.cnn.Input2D` class according to the next code. A network can only have a single input layer.
+
+```python
+import pygad.cnn
+sample_shape = train_inputs.shape[1:]
+
+input_layer = pygad.cnn.Input2D(input_shape=sample_shape)
+```
+
+After the input layer is created, next is to create a number of layers according to the next code. Normally, the last dense layer is regarded as the output layer. Note that the output layer has a number of neurons equal to the number of classes in the dataset, which is 4.
+
+```python
+conv_layer1 = pygad.cnn.Conv2D(num_filters=2,
+                               kernel_size=3,
+                               previous_layer=input_layer,
+                               activation_function=None)
+relu_layer1 = pygad.cnn.Sigmoid(previous_layer=conv_layer1)
+average_pooling_layer = pygad.cnn.AveragePooling2D(pool_size=2,
+                                                   previous_layer=relu_layer1,
+                                                   stride=2)
+
+conv_layer2 = pygad.cnn.Conv2D(num_filters=3,
+                               kernel_size=3,
+                               previous_layer=average_pooling_layer,
+                               activation_function=None)
+relu_layer2 = pygad.cnn.ReLU(previous_layer=conv_layer2)
+max_pooling_layer = pygad.cnn.MaxPooling2D(pool_size=2,
+                                           previous_layer=relu_layer2,
+                                           stride=2)
+
+conv_layer3 = pygad.cnn.Conv2D(num_filters=1,
+                               kernel_size=3,
+                               previous_layer=max_pooling_layer,
+                               activation_function=None)
+relu_layer3 = pygad.cnn.ReLU(previous_layer=conv_layer3)
+pooling_layer = pygad.cnn.AveragePooling2D(pool_size=2,
+                                           previous_layer=relu_layer3,
+                                           stride=2)
+
+flatten_layer = pygad.cnn.Flatten(previous_layer=pooling_layer)
+dense_layer1 = pygad.cnn.Dense(num_neurons=100,
+                               previous_layer=flatten_layer,
+                               activation_function="relu")
+dense_layer2 = pygad.cnn.Dense(num_neurons=4,
+                               previous_layer=dense_layer1,
+                               activation_function="softmax")
+```
+
+After the network architecture is prepared, the next step is to create a CNN model.
+
+## Building Model
+
+The CNN model is created as an instance of the `pygad.cnn.Model` class. Here is an example.
+
+```python
+model = pygad.cnn.Model(last_layer=dense_layer2,
+                        epochs=5,
+                        learning_rate=0.01)
+```
+
+After the model is created, a summary of the model architecture can be printed.
+
+## Model Summary
+
+The `summary()` method in the `pygad.cnn.Model` class prints a summary of the CNN model.
+
+```python
+model.summary()
+```
+
+```
+----------Network Architecture----------
+<class 'pygad.cnn.Conv2D'>
+<class 'pygad.cnn.Sigmoid'>
+<class 'pygad.cnn.AveragePooling2D'>
+<class 'pygad.cnn.Conv2D'>
+<class 'pygad.cnn.ReLU'>
+<class 'pygad.cnn.MaxPooling2D'>
+<class 'pygad.cnn.Conv2D'>
+<class 'pygad.cnn.ReLU'>
+<class 'pygad.cnn.AveragePooling2D'>
+<class 'pygad.cnn.Flatten'>
+<class 'pygad.cnn.Dense'>
+<class 'pygad.cnn.Dense'>
+----------------------------------------
+```
+
+## Training the Network
+
+After the model and the data are prepared, the model can be trained using the `train()` method of the `pygad.cnn.Model` class.
+
+```python
+model.train(train_inputs=train_inputs,
+            train_outputs=train_outputs)
+```
+
+After training the network, the next step is to make predictions.
+
+## Making Predictions
+
+The `predict()` method of the `pygad.cnn.Model` class uses the trained network for making predictions. Here is an example.
+
+```python
+predictions = model.predict(data_inputs=train_inputs)
+```
+
+The predictions are not expected to be highly accurate because no training algorithm is used.
+
+## Calculating Some Statistics
+
+Based on the predictions the network made, some statistics can be calculated such as the number of correct and wrong predictions in addition to the classification accuracy.
+
+```python
+num_wrong = numpy.where(predictions != train_outputs)[0]
+num_correct = train_outputs.size - num_wrong.size
+accuracy = 100 * (num_correct/train_outputs.size)
+print(f"Number of correct classifications : {num_correct}.")
+print(f"Number of wrong classifications : {num_wrong.size}.")
+print(f"Classification accuracy : {accuracy}.")
+```
+
+Note that the classification accuracy is not expected to be high because no training algorithm is used. Please check the documentation of the `pygad.gacnn` module for training the CNN using the genetic algorithm.
+
+# Examples
+
+This section gives the complete code of some examples that build neural networks using `pygad.cnn`. Each subsection builds a different network.
+
+## Image Classification
+
+This example is discussed in the **Steps to Build a Convolutional Neural Network** section and its complete code is listed below.
+
+Remember to either download or create the [dataset_inputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy) and [dataset_outputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy) files before running this code.
+
+```python
+import numpy
+import pygad.cnn
+
+"""
+Convolutional neural network implementation using NumPy
+A tutorial that helps to get started (Building Convolutional Neural Network using NumPy from Scratch) is available in these links:
+    https://www.linkedin.com/pulse/building-convolutional-neural-network-using-numpy-from-ahmed-gad
+    https://towardsdatascience.com/building-convolutional-neural-network-using-numpy-from-scratch-b30aac50e50a
+    https://www.kdnuggets.com/2018/04/building-convolutional-neural-network-numpy-scratch.html
+It is also translated into Chinese: http://m.aliyun.com/yunqi/articles/585741
+"""
+
+train_inputs = numpy.load("dataset_inputs.npy")
+train_outputs = numpy.load("dataset_outputs.npy")
+
+sample_shape = train_inputs.shape[1:]
+num_classes = 4
+
+input_layer = pygad.cnn.Input2D(input_shape=sample_shape)
+conv_layer1 = pygad.cnn.Conv2D(num_filters=2,
+                               kernel_size=3,
+                               previous_layer=input_layer,
+                               activation_function=None)
+relu_layer1 = pygad.cnn.Sigmoid(previous_layer=conv_layer1)
+average_pooling_layer = pygad.cnn.AveragePooling2D(pool_size=2,
+                                                   previous_layer=relu_layer1,
+                                                   stride=2)
+
+conv_layer2 = pygad.cnn.Conv2D(num_filters=3,
+                               kernel_size=3,
+                               previous_layer=average_pooling_layer,
+                               activation_function=None)
+relu_layer2 = pygad.cnn.ReLU(previous_layer=conv_layer2)
+max_pooling_layer = pygad.cnn.MaxPooling2D(pool_size=2,
+                                           previous_layer=relu_layer2,
+                                           stride=2)
+
+conv_layer3 = pygad.cnn.Conv2D(num_filters=1,
+                               kernel_size=3,
+                               previous_layer=max_pooling_layer,
+                               activation_function=None)
+relu_layer3 = pygad.cnn.ReLU(previous_layer=conv_layer3)
+pooling_layer = pygad.cnn.AveragePooling2D(pool_size=2,
+                                           previous_layer=relu_layer3,
+                                           stride=2)
+
+flatten_layer = pygad.cnn.Flatten(previous_layer=pooling_layer)
+dense_layer1 = pygad.cnn.Dense(num_neurons=100,
+                               previous_layer=flatten_layer,
+                               activation_function="relu")
+dense_layer2 = pygad.cnn.Dense(num_neurons=num_classes,
+                               previous_layer=dense_layer1,
+                               activation_function="softmax")
+
+model = pygad.cnn.Model(last_layer=dense_layer2,
+                        epochs=1,
+                        learning_rate=0.01)
+
+model.summary()
+
+model.train(train_inputs=train_inputs,
+            train_outputs=train_outputs)
+
+predictions = model.predict(data_inputs=train_inputs)
+print(predictions)
+
+num_wrong = numpy.where(predictions != train_outputs)[0]
+num_correct = train_outputs.size - num_wrong.size
+accuracy = 100 * (num_correct/train_outputs.size)
+print(f"Number of correct classifications : {num_correct}.")
+print(f"Number of wrong classifications : {num_wrong.size}.")
+print(f"Classification accuracy : {accuracy}.")
+```
diff --git a/docs/md/gacnn.md b/docs/md/gacnn.md
new file mode 100644
index 0000000..132a625
--- /dev/null
+++ b/docs/md/gacnn.md
@@ -0,0 +1,478 @@
+# `pygad.gacnn` Module
+
+This section of the PyGAD library's documentation discusses the **pygad.gacnn** module.
+
+The `pygad.gacnn` module trains convolutional neural networks using the genetic algorithm. It makes use of the 2 modules `pygad` and `pygad.cnn`.
+
+# `pygad.gacnn.GACNN` Class
+
+The `pygad.gacnn` module has a class named `pygad.gacnn.GACNN` for training convolutional neural networks (CNNs) using the genetic algorithm. The constructor, methods, and attributes of the class are discussed in this section.
+
+## `__init__()`
+
+In order to train a CNN using the genetic algorithm, the first thing to do is to create an instance of the `pygad.gacnn.GACNN` class.
+
+The `pygad.gacnn.GACNN` class constructor accepts the following parameters:
+
+- `model`: An instance of the `pygad.cnn.Model` class representing the architecture of all solutions in the population.
+- `num_solutions`: Number of CNNs (i.e. solutions) in the population. Based on the value passed to this parameter, a number of identical CNNs are created where their parameters are optimized using the genetic algorithm.
+
+## Instance Attributes
+
+All the parameters in the `pygad.gacnn.GACNN` class constructor are used as instance attributes. Besides such attributes, there is an extra attribute added to the instances of the `pygad.gacnn.GACNN` class which is:
+
+- `population_networks`: A list holding references to all the solutions (i.e. CNNs) used in the population.
+
+## Methods in the GACNN Class
+
+This section discusses the methods available for instances of the `pygad.gacnn.GACNN` class.
+
+### `create_population()`
+
+The `create_population()` method creates the initial population of the genetic algorithm as a list of CNNs (i.e. solutions). All the networks are copied from the CNN model passed to the constructor of the `GACNN` class.
+
+The list of networks is assigned to the `population_networks` attribute of the instance.
+
+### `update_population_trained_weights()`
+
+The `update_population_trained_weights()` method updates the `trained_weights` attribute of the layers of each network (check the documentation of the `pygad.cnn` module for more information) according to the weights passed in the `population_trained_weights` parameter.
+
+Accepts the following parameters:
+
+- `population_trained_weights`: A list holding the trained weights of all networks as matrices. Such matrices are to be assigned to the `trained_weights` attribute of all layers of all networks.
+
+# Functions in the `pygad.gacnn` Module
+
+This section discusses the functions in the `pygad.gacnn` module.
+
+## `pygad.gacnn.population_as_vectors()`
+
+Accepts the population as a list of references to instances of the `pygad.cnn.Model` class and returns a list holding all weights of the layers of each solution (i.e. network) in the population as a vector.
+
+For example, if the population has 6 solutions (i.e. networks), this function accepts references to such networks and returns a list with 6 vectors, one for each network (i.e. solution). Each vector holds the weights of all layers of a single network.
+
+Accepts the following parameters:
+
+- `population_networks`: A list holding references to the `pygad.cnn.Model` instances of the networks used in the population.
+
+Returns a list holding the weights vectors for all solutions (i.e. networks).
+
+## `pygad.gacnn.population_as_matrices()`
+
+Accepts the population as both networks and weights vectors and returns the weights of all layers of each solution (i.e. network) in the population as a matrix.
+
+For example, if the population has 6 solutions (i.e. networks), this function returns a list with 6 matrices, one for each network holding its weights for all layers.
+
+Accepts the following parameters:
+
+- `population_networks`: A list holding references to the `pygad.cnn.Model` instances of the networks used in the population.
+- `population_vectors`: A list holding the weights of all networks as vectors. Such vectors are to be converted into matrices.
+
+Returns a list holding the weights matrices for all solutions (i.e. networks).
+
+# Steps to Build and Train CNN using Genetic Algorithm
+
+The steps to use this project for building and training a neural network using the genetic algorithm are as follows:
+
+- Prepare the training data.
+- Create an instance of the `pygad.gacnn.GACNN` class.
+- Fetch the population weights as vectors.
+- Prepare the fitness function.
+- Prepare the generation callback function.
+- Create an instance of the `pygad.GA` class.
+- Run the created instance of the `pygad.GA` class.
+- Plot the fitness values.
+- Information about the best solution.
+- Making predictions using the trained weights.
+- Calculating some statistics.
+
+Let's start covering all of these steps.
+
+## Prepare the Training Data
+
+Before building and training neural networks, the training data (input and output) is to be prepared. The inputs and the outputs of the training data are NumPy arrays.
+
+The data used in this example is available as 2 files:
+
+1. [dataset_inputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy): Data inputs. https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy
+2. [dataset_outputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy): Class labels. https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy
+
+The data consists of 4 classes of images. The image shape is `(100, 100, 3)` and there are 20 images per class. For more information about the dataset, check the **Reading the Data** section of the `pygad.cnn` module.
+
+Simply download these 2 files and read them according to the next code.
+
+```python
+import numpy
+
+train_inputs = numpy.load("dataset_inputs.npy")
+train_outputs = numpy.load("dataset_outputs.npy")
+```
+
+For the output array, each element must be a single number representing the class label of the sample. The class labels must start at `0`. So, if there are 80 samples, then the shape of the output array is `(80)`. If there are 5 classes in the data, then the values of all the 80 elements in the output array must range from 0 to 4 inclusive. Generally, the class labels start from `0` to `N-1` where `N` is the number of classes.
+
+Note that the project only supports classification problems where each sample is assigned to only one class.
+
+## Building the Network Architecture
+
+Here is an example of a CNN architecture.
+
+```python
+import pygad.cnn
+
+input_layer = pygad.cnn.Input2D(input_shape=(80, 80, 3))
+conv_layer = pygad.cnn.Conv2D(num_filters=2,
+                              kernel_size=3,
+                              previous_layer=input_layer,
+                              activation_function="relu")
+average_pooling_layer = pygad.cnn.AveragePooling2D(pool_size=5,
+                                                   previous_layer=conv_layer,
+                                                   stride=3)
+
+flatten_layer = pygad.cnn.Flatten(previous_layer=average_pooling_layer)
+dense_layer = pygad.cnn.Dense(num_neurons=4,
+                              previous_layer=flatten_layer,
+                              activation_function="softmax")
+```
+
+After the network architecture is prepared, the next step is to create a CNN model.
+
+## Building Model
+
+The CNN model is created as an instance of the `pygad.cnn.Model` class. Here is an example.
+
+```python
+model = pygad.cnn.Model(last_layer=dense_layer,
+                        epochs=5,
+                        learning_rate=0.01)
+```
+
+After the model is created, a summary of the model architecture can be printed.
+
+## Model Summary
+
+The `summary()` method in the `pygad.cnn.Model` class prints a summary of the CNN model.
+
+```python
+model.summary()
+```
+
+```
+----------Network Architecture----------
+<class 'pygad.cnn.Conv2D'>
+<class 'pygad.cnn.AveragePooling2D'>
+<class 'pygad.cnn.Flatten'>
+<class 'pygad.cnn.Dense'>
+----------------------------------------
+```
+
+The next step is to create an instance of the `pygad.gacnn.GACNN` class.
+
+## Create an Instance of the `pygad.gacnn.GACNN` Class
+
+After preparing the input data and building the CNN model, an instance of the `pygad.gacnn.GACNN` class is created by passing the appropriate parameters.
+
+Here is an example where the `num_solutions` parameter is set to 4, which means the genetic algorithm population will have 4 solutions (i.e. networks). All of these 4 CNNs will have the same architecture as specified by the `model` parameter.
+
+```python
+import pygad.gacnn
+
+GACNN_instance = pygad.gacnn.GACNN(model=model,
+                                   num_solutions=4)
+```
+
+After creating the instance of the `pygad.gacnn.GACNN` class, next is to fetch the weights of the population as a list of vectors.
+
+## Fetch the Population Weights as Vectors
+
+For the genetic algorithm, the parameters (i.e. genes) of each solution are represented as a single vector.
+
+For this task, the weights of each CNN must be available as a single vector. In other words, the weights of all layers of a CNN must be grouped into a vector.
+
+To create a list holding the population weights as vectors, one for each network, the `pygad.gacnn.population_as_vectors()` function is used.
+
+```python
+population_vectors = pygad.gacnn.population_as_vectors(population_networks=GACNN_instance.population_networks)
+```
+
+This population of vectors is used as the initial population.
+
+```python
+initial_population = population_vectors.copy()
+```
+
+After preparing the population weights as a set of vectors, next is to prepare 2 functions which are:
+
+1. Fitness function.
+2. Callback function after each generation.
+
+## Prepare the Fitness Function
+
+The PyGAD library works by allowing the users to customize the genetic algorithm for their own problems. Because the problems differ in how the fitness values are calculated, PyGAD allows the user to use a custom function as a maximization fitness function. This function must accept 3 positional parameters representing the following:
+
+- The instance of the `pygad.GA` class.
+- The solution.
+- The solution index in the population.
+
+The fitness function must return a single number representing the fitness. The higher the fitness value, the better the solution.
+
+Here is the implementation of the fitness function for training a CNN.
+
+It uses the `predict()` method of the `pygad.cnn.Model` class to predict the class labels based on the current solution's weights. The `predict()` method uses the trained weights available in the `trained_weights` attribute of each layer of the network for making predictions.
+
+Based on such predictions, the classification accuracy is calculated. This accuracy is used as the fitness value of the solution. Finally, the fitness value is returned.
+
+```python
+def fitness_func(ga_instance, solution, sol_idx):
+    global GACNN_instance, data_inputs, data_outputs
+
+    predictions = GACNN_instance.population_networks[sol_idx].predict(data_inputs=data_inputs)
+    correct_predictions = numpy.where(predictions == data_outputs)[0].size
+    solution_fitness = (correct_predictions/data_outputs.size)*100
+
+    return solution_fitness
+```
+
+## Prepare the Generation Callback Function
+
+After each generation of the genetic algorithm, the fitness function will be called to calculate the fitness value of each solution. Within the fitness function, the `predict()` method is used for predicting the outputs based on the current solution's `trained_weights` attribute. Thus, it is required that such an attribute is updated by the weights evolved by the genetic algorithm after each generation.
+
+PyGAD has a parameter accepted by the `pygad.GA` class constructor named `on_generation`. It could be assigned to a function that is called after each generation. The function must accept a single parameter representing the instance of the `pygad.GA` class.
+
+This callback function can be used to update the `trained_weights` attribute of the layers of each network in the population.
+
+Here is the implementation of a function that updates the `trained_weights` attribute of the layers of the population networks.
+
+It works by converting the current population from the vector form to the matrix form using the `pygad.gacnn.population_as_matrices()` function. It accepts the population as vectors and returns it as matrices.
+
+The population matrices are then passed to the `update_population_trained_weights()` method of the `pygad.gacnn.GACNN` class to update the `trained_weights` attribute of all layers of all solutions within the population.
+
+```python
+def callback_generation(ga_instance):
+    global GACNN_instance, last_fitness
+
+    population_matrices = pygad.gacnn.population_as_matrices(population_networks=GACNN_instance.population_networks, population_vectors=ga_instance.population)
+    GACNN_instance.update_population_trained_weights(population_trained_weights=population_matrices)
+
+    print(f"Generation = {ga_instance.generations_completed}")
+```
+
+After preparing the fitness and callback functions, next is to create an instance of the `pygad.GA` class.
+
+## Create an Instance of the `pygad.GA` Class
+
+Once the parameters of the genetic algorithm are prepared, an instance of the `pygad.GA` class can be created. Here is an example where the number of generations is 10.
+
+```python
+import pygad
+
+num_parents_mating = 4
+
+num_generations = 10
+
+mutation_percent_genes = 5
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       mutation_percent_genes=mutation_percent_genes,
+                       on_generation=callback_generation)
+```
+
+The last step for training the neural networks using the genetic algorithm is calling the `run()` method.
+
+## Run the Created Instance of the `pygad.GA` Class
+
+By calling the `run()` method of the `pygad.GA` instance, the genetic algorithm will iterate through the number of generations specified in its `num_generations` parameter.
+
+```python
+ga_instance.run()
+```
+
+## Plot the Fitness Values
+
+After the `run()` method completes, the `plot_fitness()` method can be called to show how the fitness values evolve by generation.
+
+```python
+ga_instance.plot_fitness()
+```
+
+![GACNN_Fitness](https://user-images.githubusercontent.com/16560492/83429675-ab744580-a434-11ea-8f21-9d3804b50d15.png)
+
+## Information about the Best Solution
+
+The following information about the best solution in the last population is returned using the `best_solution()` method in the `pygad.GA` class.
+
+- Solution
+- Fitness value of the solution
+- Index of the solution within the population
+
+Here is how such information is returned.
+
+```python
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Parameters of the best solution : {solution}")
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+```
+
+```
+...
+Fitness value of the best solution = 83.75
+Index of the best solution : 0
+Best fitness value reached after 4 generations.
+```
+
+## Making Predictions using the Trained Weights
+
+The `predict()` method of the best network can be used to make predictions using the trained weights. As the statistics in the next section show, the network predicts most of the labels correctly.
+
+```python
+predictions = GACNN_instance.population_networks[solution_idx].predict(data_inputs=data_inputs)
+print(f"Predictions of the trained network : {predictions}")
+```
+
+## Calculating Some Statistics
+
+Based on the predictions the network made, some statistics can be calculated such as the number of correct and wrong predictions in addition to the classification accuracy.
+
+```python
+num_wrong = numpy.where(predictions != data_outputs)[0]
+num_correct = data_outputs.size - num_wrong.size
+accuracy = 100 * (num_correct/data_outputs.size)
+print(f"Number of correct classifications : {num_correct}.")
+print(f"Number of wrong classifications : {num_wrong.size}.")
+print(f"Classification accuracy : {accuracy}.")
+```
+
+```
+Number of correct classifications : 67.
+Number of wrong classifications : 13.
+Classification accuracy : 83.75.
+```
+
+# Examples
+
+This section gives the complete code of some examples that build and train neural networks using the genetic algorithm. Each subsection builds a different network.
+
+## Image Classification
+
+This example is discussed in the **Steps to Build and Train CNN using Genetic Algorithm** section, which builds an image classifier. Its complete code is listed below.
+
+```python
+import numpy
+import pygad.cnn
+import pygad.gacnn
+import pygad
+
+"""
+Convolutional neural network implementation using NumPy
+A tutorial that helps to get started (Building Convolutional Neural Network using NumPy from Scratch) is available in these links:
+    https://www.linkedin.com/pulse/building-convolutional-neural-network-using-numpy-from-ahmed-gad
+    https://towardsdatascience.com/building-convolutional-neural-network-using-numpy-from-scratch-b30aac50e50a
+    https://www.kdnuggets.com/2018/04/building-convolutional-neural-network-numpy-scratch.html
+It is also translated into Chinese: http://m.aliyun.com/yunqi/articles/585741
+"""
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global GACNN_instance, data_inputs, data_outputs
+
+    predictions = GACNN_instance.population_networks[sol_idx].predict(data_inputs=data_inputs)
+    correct_predictions = numpy.where(predictions == data_outputs)[0].size
+    solution_fitness = (correct_predictions/data_outputs.size)*100
+
+    return solution_fitness
+
+def callback_generation(ga_instance):
+    global GACNN_instance, last_fitness
+
+    population_matrices = pygad.gacnn.population_as_matrices(population_networks=GACNN_instance.population_networks,
+                                                             population_vectors=ga_instance.population)
+
+    GACNN_instance.update_population_trained_weights(population_trained_weights=population_matrices)
+
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solutions_fitness}")
+
+data_inputs = numpy.load("dataset_inputs.npy")
+data_outputs = numpy.load("dataset_outputs.npy")
+
+sample_shape = data_inputs.shape[1:]
+num_classes = 4
+
+input_layer = pygad.cnn.Input2D(input_shape=sample_shape)
+conv_layer1 = pygad.cnn.Conv2D(num_filters=2,
+                               kernel_size=3,
+                               previous_layer=input_layer,
+                               activation_function="relu")
+average_pooling_layer = pygad.cnn.AveragePooling2D(pool_size=5,
+                                                   previous_layer=conv_layer1,
+                                                   stride=3)
+
+flatten_layer = pygad.cnn.Flatten(previous_layer=average_pooling_layer)
+dense_layer2 = pygad.cnn.Dense(num_neurons=num_classes,
+                               previous_layer=flatten_layer,
+                               activation_function="softmax")
+
+model = pygad.cnn.Model(last_layer=dense_layer2,
+                        epochs=1,
+                        learning_rate=0.01)
+
+model.summary()
+
+GACNN_instance = pygad.gacnn.GACNN(model=model,
+                                   num_solutions=4)
+
+# GACNN_instance.update_population_trained_weights(population_trained_weights=population_matrices)
+
+# The population does not hold the numerical weights of the networks. Instead, it holds a list of references to the last layer of each network (i.e. solution) in the population. The terms solution and network can be used interchangeably here.
+# If there is a population with 3 solutions (i.e. networks), then the population is a list with 3 elements. Each element is a reference to the last layer of a network. Using such a reference, all details of the network can be accessed.
+population_vectors = pygad.gacnn.population_as_vectors(population_networks=GACNN_instance.population_networks)
+
+# To prepare the initial population, there are 2 ways:
+# 1) Prepare it yourself and pass it to the initial_population parameter. This way is useful when the user wants to start the genetic algorithm with a custom initial population.
+# 2) Assign valid integer values to the sol_per_pop and num_genes parameters. If the initial_population parameter exists, then the sol_per_pop and num_genes parameters are useless.
+initial_population = population_vectors.copy()
+
+num_parents_mating = 2 # Number of solutions to be selected as parents in the mating pool.
+
+num_generations = 10 # Number of generations.
+
+mutation_percent_genes = 0.1 # Percentage of genes to mutate. This parameter has no action if the parameter mutation_num_genes exists.
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       mutation_percent_genes=mutation_percent_genes,
+                       on_generation=callback_generation)
+
+ga_instance.run()
+
+# After the generations complete, some plots are shown that summarize how the outputs/fitness values evolve over generations.
+ga_instance.plot_fitness()
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Parameters of the best solution : {solution}")
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+if ga_instance.best_solution_generation != -1:
+    print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.")
+
+# Predicting the outputs of the data using the best solution.
+predictions = GACNN_instance.population_networks[solution_idx].predict(data_inputs=data_inputs)
+print(f"Predictions of the trained network : {predictions}")
+
+# Calculating some statistics
+num_wrong = numpy.where(predictions != data_outputs)[0]
+num_correct = data_outputs.size - num_wrong.size
+accuracy = 100 * (num_correct/data_outputs.size)
+print(f"Number of correct classifications : {num_correct}.")
+print(f"Number of wrong classifications : {num_wrong.size}.")
+print(f"Classification accuracy : {accuracy}.")
+```
diff --git a/docs/md/gann.md b/docs/md/gann.md
new file mode 100644
index 0000000..19fc73d
--- /dev/null
+++ b/docs/md/gann.md
@@ -0,0 +1,970 @@
+# `pygad.gann` Module
+
+This section of the PyGAD library's documentation discusses the **pygad.gann** module.
+
+The `pygad.gann` module trains neural networks (for either classification or regression) using the genetic algorithm. It makes use of the 2 modules `pygad` and `pygad.nn`.
+
+# `pygad.gann.GANN` Class
+
+The `pygad.gann` module has a class named `pygad.gann.GANN` for training neural networks using the genetic algorithm. The constructor, methods, and attributes of the class are discussed in this section.
+
+## `__init__()`
+
+In order to train a neural network using the genetic algorithm, the first thing to do is to create an instance of the `pygad.gann.GANN` class.
+
+The `pygad.gann.GANN` class constructor accepts the following parameters:
+
+- `num_solutions`: Number of neural networks (i.e. solutions) in the population. Based on the value passed to this parameter, a number of identical neural networks are created where their parameters are optimized using the genetic algorithm.
+- `num_neurons_input`: Number of neurons in the input layer.
+- `num_neurons_output`: Number of neurons in the output layer.
+- `num_neurons_hidden_layers=[]`: A list holding the number of neurons in the hidden layer(s). If empty (`[]`), then no hidden layers are used. For each `int` value it holds, a hidden layer is created with that number of neurons. For example, `num_neurons_hidden_layers=[10]` creates a single hidden layer with **10** neurons. `num_neurons_hidden_layers=[10, 5]` creates 2 hidden layers with 10 neurons for the first and 5 neurons for the second hidden layer.
+- `output_activation="softmax"`: The name of the activation function of the output layer which defaults to `"softmax"`.
+- `hidden_activations="relu"`: The name(s) of the activation function(s) of the hidden layer(s). It defaults to `"relu"`. If passed as a string, the specified activation function is used across all the hidden layers. If passed as a list, then it must have the same length as the `num_neurons_hidden_layers` list. An exception is raised if their lengths are different. When `hidden_activations` is a list, a one-to-one mapping between the `num_neurons_hidden_layers` and `hidden_activations` lists occurs.
+
+In order to validate the parameters passed to the `pygad.gann.GANN` class constructor, the `pygad.gann.validate_network_parameters()` function is called.
+
+## Instance Attributes
+
+All the parameters in the `pygad.gann.GANN` class constructor are used as instance attributes. Besides such attributes, there are other attributes added to the instances of the `pygad.gann.GANN` class which are:
+
+- `parameters_validated`: If `True`, then the parameters passed to the GANN class constructor are valid. Its initial value is `False`.
+- `population_networks`: A list holding references to all the solutions (i.e. neural networks) used in the population.
+
+## Methods in the GANN Class
+
+This section discusses the methods available for instances of the `pygad.gann.GANN` class.
+
+### `create_population()`
+
+The `create_population()` method creates the initial population of the genetic algorithm as a list of neural networks (i.e. solutions). For each network to be created, the `pygad.gann.create_network()` function is called.
+
+Each element in the list holds a reference to the last (i.e. output) layer of the network. The method does not accept any parameter and it accesses all the required details from the `pygad.gann.GANN` instance.
+
+The method returns the list holding the references to the networks. This list is later assigned to the `population_networks` attribute of the instance.
+
+### `update_population_trained_weights()`
+
+The `update_population_trained_weights()` method updates the `trained_weights` attribute of the layers of each network (check the [documentation of the pygad.nn.DenseLayer class](https://github.com/ahmedfgad/NumPyANN#nndenselayer-class) for more information) according to the weights passed in the `population_trained_weights` parameter.
+
+Accepts the following parameters:
+
+- `population_trained_weights`: A list holding the trained weights of all networks as matrices. Such matrices are to be assigned to the `trained_weights` attribute of all layers of all networks.
+
+# Functions in the `pygad.gann` Module
+
+This section discusses the functions in the `pygad.gann` module.
+
+## `pygad.gann.validate_network_parameters()`
+
+Validates the parameters passed to the constructor of the `pygad.gann.GANN` class. If at least one invalid parameter exists, an exception is raised and the execution stops.
+
+The function accepts the same parameters passed to the constructor of the `pygad.gann.GANN` class. Please check the documentation of such parameters in the section discussing the class constructor.
+
+The reason why this function sets a default value for the `num_solutions` parameter is to differentiate whether a population of networks or a single network is to be created. If `None`, then a single network will be created. If not `None`, then a population of networks is to be created.
+
+If the value passed to the `hidden_activations` parameter is a string, not a list, then a list is created by replicating the passed name of the activation function a number of times equal to the number of hidden layers (i.e. the length of the `num_neurons_hidden_layers` parameter).
+
+Returns a list holding the name(s) of the activation function(s) of the hidden layer(s).
+
+## `pygad.gann.create_network()`
+
+Creates a neural network as a linked list between the input, hidden, and output layers where the layer at index N (which is the last/output layer) references the layer at index N-1 (which is a hidden layer) using its `previous_layer` attribute. The input layer does not reference any layer because it is the first layer of the network and thus the end of the linked list.
+
+In addition to the `parameters_validated` parameter, this function accepts the same parameters passed to the constructor of the `pygad.gann.GANN` class except for the `num_solutions` parameter because the `create_network()` function creates only a single network. A usage sketch appears after the steps list below.
+
+`parameters_validated`: If `False`, then the parameters have not been validated yet, so a call to the `validate_network_parameters()` function is made.
+
+Returns the reference to the last layer in the network architecture, which is the output layer. Based on such a reference, all network layers can be fetched.
+
+## `pygad.gann.population_as_vectors()`
+
+Accepts the population as networks and returns a list holding all weights of the layers of each solution (i.e. network) in the population as a vector.
+
+For example, if the population has 6 solutions (i.e. networks), this function accepts references to such networks and returns a list with 6 vectors, one for each network (i.e. solution). Each vector holds the weights of all layers of a single network.
+
+Accepts the following parameters:
+
+- `population_networks`: A list holding references to the output (last) layers of the neural networks used in the population.
+
+Returns a list holding the weights vectors for all solutions (i.e. networks).
+
+## `pygad.gann.population_as_matrices()`
+
+Accepts the population as both networks and weights vectors and returns the weights of all layers of each solution (i.e. network) in the population as a matrix.
+
+For example, if the population has 6 solutions (i.e. networks), this function returns a list with 6 matrices, one for each network holding its weights for all layers.
+
+Accepts the following parameters:
+
+- `population_networks`: A list holding references to the output (last) layers of the neural networks used in the population.
+- `population_vectors`: A list holding the weights of all networks as vectors. Such vectors are to be converted into matrices.
+
+Returns a list holding the weights matrices for all solutions (i.e. networks).
+
+# Steps to Build and Train Neural Networks using Genetic Algorithm
+
+The steps to use this project for building and training a neural network using the genetic algorithm are as follows:
+
+- Prepare the training data.
+- Create an instance of the `pygad.gann.GANN` class.
+- Fetch the population weights as vectors.
+- Prepare the fitness function.
+- Prepare the generation callback function.
+- Create an instance of the `pygad.GA` class.
+- Run the created instance of the `pygad.GA` class.
+- Plot the fitness values.
+- Information about the best solution.
+- Making predictions using the trained weights.
+- Calculating some statistics.
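+
+As promised above, here is a hedged sketch of the `pygad.gann.create_network()` function. The parameter names mirror the `pygad.gann.GANN` constructor (minus `num_solutions`), as described earlier; treat the exact call as an assumption rather than canonical usage.
+
+```python
+import pygad.gann
+
+# A sketch: build a single standalone network (not a population).
+output_layer = pygad.gann.create_network(num_neurons_input=2,
+                                         num_neurons_output=2,
+                                         num_neurons_hidden_layers=[2],
+                                         output_activation="softmax",
+                                         hidden_activations="relu")
+
+# Walk the linked list from the output layer back to the input layer
+# using the previous_layer attribute.
+layer = output_layer
+while layer is not None:
+    print(type(layer))
+    layer = getattr(layer, "previous_layer", None)
+```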
+
+Let's start covering all of these steps.
+
+## Prepare the Training Data
+
+Before building and training neural networks, the training data (input and output) must be prepared. The inputs and the outputs of the training data are NumPy arrays.
+
+Here is an example of preparing the training data for the XOR problem.
+
+For the input array, each element must be a list representing the inputs (i.e. features) of the sample. If there are 200 samples and each sample has 50 features, then the shape of the inputs array is `(200, 50)`. The variable `num_inputs` holds the length of each sample, which is 2 in this example.
+
+```python
+import numpy
+
+data_inputs = numpy.array([[1, 1],
+                           [1, 0],
+                           [0, 1],
+                           [0, 0]])
+
+data_outputs = numpy.array([0,
+                            1,
+                            1,
+                            0])
+
+num_inputs = data_inputs.shape[1]
+```
+
+For the output array, each element must be a single number representing the class label of the sample. The class labels must start at `0`. So, if there are 200 samples, then the shape of the output array is `(200,)`. If there are 5 classes in the data, then the values of all the 200 elements in the output array must range from 0 to 4 inclusive. Generally, the class labels go from `0` to `N-1` where `N` is the number of classes.
+
+For the XOR example, there are 2 classes and thus their labels are 0 and 1. The `num_classes` variable is assigned the value 2.
+
+Note that the project only supports classification problems where each sample is assigned to only one class.
+
+## Create an Instance of the `pygad.gann.GANN` Class
+
+After preparing the input data, an instance of the `pygad.gann.GANN` class is created by passing the appropriate parameters.
+
+Here is an example that creates a network for the XOR problem. The `num_solutions` parameter is set to 6, which means the genetic algorithm population will have 6 solutions (i.e. networks). All of these 6 neural networks have the same architecture as specified by the other parameters.
+
+The output layer has 2 neurons because there are only 2 classes (0 and 1).
+
+```python
+import pygad.gann
+import pygad.nn
+
+num_solutions = 6
+GANN_instance = pygad.gann.GANN(num_solutions=num_solutions,
+                                num_neurons_input=num_inputs,
+                                num_neurons_hidden_layers=[2],
+                                num_neurons_output=2,
+                                hidden_activations=["relu"],
+                                output_activation="softmax")
+```
+
+The architecture of the created network has the following layers:
+
+- An input layer with 2 neurons (i.e. inputs).
+- A single hidden layer with 2 neurons.
+- An output layer with 2 neurons (i.e. classes).
+
+The weights of the network are as follows:
+
+- Between the input and the hidden layer, there is a weights matrix of size `(number of inputs x number of hidden neurons) = (2x2)`.
+- Between the hidden and the output layer, there is a weights matrix of size `(number of hidden neurons x number of outputs) = (2x2)`.
+
+The activation function used for the output layer is `softmax`. The `relu` activation function is used for the hidden layer.
+
+After creating the instance of the `pygad.gann.GANN` class, the next step is to fetch the weights of the population as a list of vectors.
+
+## Fetch the Population Weights as Vectors
+
+For the genetic algorithm, the parameters (i.e. genes) of each solution are represented as a single vector.
+
+For the task of training the network for the XOR problem, the weights of each network in the population are not represented as a vector but as 2 matrices, each of size 2x2. 
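+
+To make the sizes concrete, here is a minimal sketch in plain NumPy (not the library's internal code) of how the 2 weights matrices of one network flatten into a single chromosome of 2x2 + 2x2 = 8 genes, which matches the 8 parameters reported for the best solution later in this section.
+
+```python
+import numpy
+
+# Hypothetical weights of one network: input->hidden and hidden->output.
+input_hidden_weights = numpy.random.uniform(low=-2, high=5, size=(2, 2))
+hidden_output_weights = numpy.random.uniform(low=-2, high=5, size=(2, 2))
+
+# Flattening and concatenating both matrices yields one 8-gene vector.
+solution_vector = numpy.concatenate([input_hidden_weights.flatten(),
+                                     hidden_output_weights.flatten()])
+print(solution_vector.shape)  # (8,)
+```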
+
+To create a list holding the population weights as vectors, one for each network, the `pygad.gann.population_as_vectors()` function is used.
+
+```python
+population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks)
+```
+
+After preparing the population weights as a set of vectors, the next step is to prepare 2 functions:
+
+1. The fitness function.
+2. The callback function called after each generation.
+
+## Prepare the Fitness Function
+
+The PyGAD library works by allowing the users to customize the genetic algorithm for their own problems. Because problems differ in how the fitness values are calculated, PyGAD allows the user to use a custom function as a maximization fitness function. This function must accept 3 positional parameters representing the following:
+
+- The instance of the `pygad.GA` class.
+- The solution.
+- The solution index in the population.
+
+The fitness function must return a single number representing the fitness. The higher the fitness value, the better the solution.
+
+Here is the implementation of the fitness function for training a neural network. It uses the `pygad.nn.predict()` function to predict the class labels based on the current solution's weights. The `pygad.nn.predict()` function uses the trained weights available in the `trained_weights` attribute of each layer of the network for making predictions.
+
+Based on such predictions, the classification accuracy is calculated. This accuracy is used as the fitness value of the solution. Finally, the fitness value is returned.
+
+```python
+def fitness_func(ga_instance, solution, sol_idx):
+    global GANN_instance, data_inputs, data_outputs
+
+    predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx],
+                                   data_inputs=data_inputs)
+    correct_predictions = numpy.where(predictions == data_outputs)[0].size
+    solution_fitness = (correct_predictions/data_outputs.size)*100
+
+    return solution_fitness
+```
+
+## Prepare the Generation Callback Function
+
+After each generation of the genetic algorithm, the fitness function is called to calculate the fitness value of each solution. Within the fitness function, the `pygad.nn.predict()` function is used for predicting the outputs based on the current solution's `trained_weights` attribute. Thus, this attribute must be updated with the weights evolved by the genetic algorithm after each generation.
+
+PyGAD 2.0.0 and higher supports a parameter of the `pygad.GA` class constructor named `on_generation`. It can be assigned a function that is called after each generation. The function must accept a single parameter representing the instance of the `pygad.GA` class.
+
+This callback function can be used to update the `trained_weights` attribute of the layers of each network in the population.
+
+Here is the implementation of a function that updates the `trained_weights` attribute of the layers of the population networks.
+
+It works by converting the current population from the vector form to the matrix form using the `pygad.gann.population_as_matrices()` function. It accepts the population as vectors and returns it as matrices.
+
+The population matrices are then passed to the `update_population_trained_weights()` method of the `pygad.gann.GANN` class to update the `trained_weights` attribute of all layers of all solutions within the population. 
+
+```python
+def callback_generation(ga_instance):
+    global GANN_instance
+
+    population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks, population_vectors=ga_instance.population)
+    GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices)
+
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+```
+
+After preparing the fitness and callback functions, the next step is to create an instance of the `pygad.GA` class.
+
+## Create an Instance of the `pygad.GA` Class
+
+Once the parameters of the genetic algorithm are prepared, an instance of the `pygad.GA` class can be created.
+
+Here is an example.
+
+```python
+initial_population = population_vectors.copy()
+
+num_parents_mating = 4
+
+num_generations = 500
+
+mutation_percent_genes = 5
+
+parent_selection_type = "sss"
+
+crossover_type = "single_point"
+
+mutation_type = "random"
+
+keep_parents = 1
+
+init_range_low = -2
+init_range_high = 5
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       mutation_percent_genes=mutation_percent_genes,
+                       init_range_low=init_range_low,
+                       init_range_high=init_range_high,
+                       parent_selection_type=parent_selection_type,
+                       crossover_type=crossover_type,
+                       mutation_type=mutation_type,
+                       keep_parents=keep_parents,
+                       on_generation=callback_generation)
+```
+
+The last step for training the neural networks using the genetic algorithm is calling the `run()` method.
+
+## Run the Created Instance of the `pygad.GA` Class
+
+By calling the `run()` method of the `pygad.GA` instance, the genetic algorithm iterates through the number of generations specified in its `num_generations` parameter.
+
+```python
+ga_instance.run()
+```
+
+## Plot the Fitness Values
+
+After the `run()` method completes, the `plot_fitness()` method can be called to show how the fitness values evolve by generation. A fitness value (i.e. accuracy) of 100 is reached after around 180 generations.
+
+```python
+ga_instance.plot_fitness()
+```
+
+![XOR_Fitness](https://user-images.githubusercontent.com/16560492/82078638-c11e0700-96e1-11ea-8aa9-c36761c5e9c7.png)
+
+By running the code again, a different initial population is created, so a classification accuracy of 100 may be reached using fewer generations. On the other hand, a different initial population might cause 100% accuracy to be reached using more generations, or not to be reached at all.
+
+## Information about the Best Solution
+
+The following information about the best solution in the last population is returned using the `best_solution()` method of the `pygad.GA` class:
+
+- Solution
+- Fitness value of the solution
+- Index of the solution within the population
+
+Here is how such information is returned. The fitness value (i.e. accuracy) is 100. 
+
+```python
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Parameters of the best solution : {solution}")
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+```
+
+```
+Parameters of the best solution : [3.55081391 -3.21562011 -14.2617784 0.68044231 -1.41258145 -3.2979315 1.58136006 -7.83726169]
+Fitness value of the best solution = 100.0
+Index of the best solution : 0
+```
+
+Using the `best_solution_generation` attribute of the instance of the `pygad.GA` class, the generation number at which the **best fitness** is reached can be fetched. According to the result, the best fitness value is reached after 182 generations.
+
+```python
+if ga_instance.best_solution_generation != -1:
+    print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.")
+```
+
+```
+Best fitness value reached after 182 generations.
+```
+
+## Making Predictions using the Trained Weights
+
+The `pygad.nn.predict()` function can be used to make predictions using the trained network. As printed, the network is able to predict the labels correctly.
+
+```python
+predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[solution_idx], data_inputs=data_inputs)
+print(f"Predictions of the trained network : {predictions}")
+```
+
+```
+Predictions of the trained network : [0. 1. 1. 0.]
+```
+
+## Calculating Some Statistics
+
+Based on the predictions the network made, some statistics can be calculated, such as the numbers of correct and wrong predictions in addition to the classification accuracy.
+
+```python
+# Indices of the wrongly classified samples.
+num_wrong = numpy.where(predictions != data_outputs)[0]
+num_correct = data_outputs.size - num_wrong.size
+accuracy = 100 * (num_correct/data_outputs.size)
+print(f"Number of correct classifications : {num_correct}.")
+print(f"Number of wrong classifications : {num_wrong.size}.")
+print(f"Classification accuracy : {accuracy}.")
+```
+
+```
+Number of correct classifications : 4.
+Number of wrong classifications : 0.
+Classification accuracy : 100.0.
+```
+
+# Examples
+
+This section gives the complete code of some examples that build and train neural networks using the genetic algorithm. Each subsection builds a different network.
+
+## XOR Classification
+
+This example, which builds the XOR gate, is discussed in the **Steps to Build and Train Neural Networks using Genetic Algorithm** section. Its complete code is listed below.
+
+```python
+import numpy
+import pygad
+import pygad.nn
+import pygad.gann
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global GANN_instance, data_inputs, data_outputs
+
+    # If adaptive mutation is used, sometimes sol_idx is None. 
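+    # (Likely cause: with mutation_type="adaptive", PyGAD may evaluate the
+    # fitness of a temporary solution without passing a valid index, so an
+    # arbitrary valid index is used as a fallback here.)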
+    if sol_idx is None:
+        sol_idx = 1
+
+    predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx],
+                                   data_inputs=data_inputs)
+    correct_predictions = numpy.where(predictions == data_outputs)[0].size
+    solution_fitness = (correct_predictions/data_outputs.size)*100
+
+    return solution_fitness
+
+def callback_generation(ga_instance):
+    global GANN_instance, last_fitness
+
+    population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks,
+                                                            population_vectors=ga_instance.population)
+
+    GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices)
+
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+    print(f"Change = {ga_instance.best_solution()[1] - last_fitness}")
+
+    last_fitness = ga_instance.best_solution()[1].copy()
+
+# Holds the fitness value of the previous generation.
+last_fitness = 0
+
+# Preparing the NumPy array of the inputs.
+data_inputs = numpy.array([[1, 1],
+                           [1, 0],
+                           [0, 1],
+                           [0, 0]])
+
+# Preparing the NumPy array of the outputs.
+data_outputs = numpy.array([0,
+                            1,
+                            1,
+                            0])
+
+# The length of the input vector for each sample (i.e. number of neurons in the input layer).
+num_inputs = data_inputs.shape[1]
+# The number of neurons in the output layer (i.e. number of classes).
+num_classes = 2
+
+# Creating an initial population of neural networks. The population_networks attribute holds references to the networks, not their weights. Using such references, the weights of all networks can be fetched.
+num_solutions = 6 # A solution or a network can be used interchangeably.
+GANN_instance = pygad.gann.GANN(num_solutions=num_solutions,
+                                num_neurons_input=num_inputs,
+                                num_neurons_hidden_layers=[2],
+                                num_neurons_output=num_classes,
+                                hidden_activations=["relu"],
+                                output_activation="softmax")
+
+# The population does not hold the numerical weights of the networks. Instead, it holds a list of references to the last layer of each network (i.e. solution) in the population.
+# If there is a population with 3 solutions (i.e. networks), then the population is a list with 3 elements. Each element is a reference to the last layer of a network. Using such a reference, all details of the network can be accessed.
+population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks)
+
+# To prepare the initial population, there are 2 ways:
+# 1) Prepare it yourself and pass it to the initial_population parameter. This way is useful when the user wants to start the genetic algorithm with a custom initial population.
+# 2) Assign valid integer values to the sol_per_pop and num_genes parameters. If the initial_population parameter exists, then the sol_per_pop and num_genes parameters are useless.
+initial_population = population_vectors.copy()
+
+num_parents_mating = 4 # Number of solutions to be selected as parents in the mating pool.
+
+num_generations = 500 # Number of generations.
+
+mutation_percent_genes = [5, 10] # Percentages of genes to mutate (with adaptive mutation, the first value is used for low-fitness solutions and the second for high-fitness solutions). This parameter has no action if the parameter mutation_num_genes exists.
+
+parent_selection_type = "sss" # Type of parent selection.
+
+crossover_type = "single_point" # Type of the crossover operator.
+
+mutation_type = "adaptive" # Type of the mutation operator.
+
+keep_parents = 1 # Number of parents to keep in the next population. 
-1 means keep all parents and 0 means keep nothing.
+
+init_range_low = -2
+init_range_high = 5
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       mutation_percent_genes=mutation_percent_genes,
+                       init_range_low=init_range_low,
+                       init_range_high=init_range_high,
+                       parent_selection_type=parent_selection_type,
+                       crossover_type=crossover_type,
+                       mutation_type=mutation_type,
+                       keep_parents=keep_parents,
+                       suppress_warnings=True,
+                       on_generation=callback_generation)
+
+ga_instance.run()
+
+# After the generations complete, a plot is shown that summarizes how the outputs/fitness values evolve over generations.
+ga_instance.plot_fitness()
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Parameters of the best solution : {solution}")
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+if ga_instance.best_solution_generation != -1:
+    print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.")
+
+# Predicting the outputs of the data using the best solution.
+predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[solution_idx],
+                               data_inputs=data_inputs)
+print(f"Predictions of the trained network : {predictions}")
+
+# Calculating some statistics.
+num_wrong = numpy.where(predictions != data_outputs)[0]
+num_correct = data_outputs.size - num_wrong.size
+accuracy = 100 * (num_correct/data_outputs.size)
+print(f"Number of correct classifications : {num_correct}.")
+print(f"Number of wrong classifications : {num_wrong.size}.")
+print(f"Classification accuracy : {accuracy}.")
+```
+
+## Image Classification
+
+In the documentation of the `pygad.nn` module, a neural network is created for classifying images from the Fruits360 dataset without being trained using an optimization algorithm. This section discusses how to train such a classifier using the genetic algorithm with the help of the `pygad.gann` module.
+
+Please make sure that the training data files [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy) and [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy) are available. For downloading them, use these links:
+
+1. [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy): The features.
+2. [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy): The class labels.
+
+After the data is available, here is the complete code that builds and trains a neural network using the genetic algorithm for classifying images from 4 classes of the Fruits360 dataset.
+
+Because there are 4 classes, the output layer has 4 neurons, as set by the `num_neurons_output` parameter of the `pygad.gann.GANN` class constructor. 
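+
+One step in the code below worth highlighting is the optional feature filtering: before training, the columns of the feature array with a low standard deviation are dropped, keeping only the features that vary enough to be informative. Here is a minimal sketch of just that step (the threshold of 50 is the value used in this example, not a general rule):
+
+```python
+import numpy
+
+# Assumed to be the (num_samples, num_features) array of extracted features.
+data_inputs = numpy.load("dataset_features.npy")
+
+# Keep only the features whose standard deviation across the samples exceeds 50.
+features_STDs = numpy.std(a=data_inputs, axis=0)
+data_inputs = data_inputs[:, features_STDs > 50]
+print(data_inputs.shape)  # Fewer columns than before if any feature was dropped.
+```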
+ +```python +import numpy +import pygad +import pygad.nn +import pygad.gann + +def fitness_func(ga_instance, solution, sol_idx): + global GANN_instance, data_inputs, data_outputs + + predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx], + data_inputs=data_inputs) + correct_predictions = numpy.where(predictions == data_outputs)[0].size + solution_fitness = (correct_predictions/data_outputs.size)*100 + + return solution_fitness + +def callback_generation(ga_instance): + global GANN_instance, last_fitness + + population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks, + population_vectors=ga_instance.population) + + GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices) + + print(f"Generation = {ga_instance.generations_completed}") + print(f"Fitness = {ga_instance.best_solution()[1]}") + print(f"Change = {ga_instance.best_solution()[1] - last_fitness}") + + last_fitness = ga_instance.best_solution()[1].copy() + +# Holds the fitness value of the previous generation. +last_fitness = 0 + +# Reading the input data. +data_inputs = numpy.load("dataset_features.npy") # Download from https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy + +# Optional step of filtering the input data using the standard deviation. +features_STDs = numpy.std(a=data_inputs, axis=0) +data_inputs = data_inputs[:, features_STDs>50] + +# Reading the output data. +data_outputs = numpy.load("outputs.npy") # Download from https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy + +# The length of the input vector for each sample (i.e. number of neurons in the input layer). +num_inputs = data_inputs.shape[1] +# The number of neurons in the output layer (i.e. number of classes). +num_classes = 4 + +# Creating an initial population of neural networks. The return of the initial_population() function holds references to the networks, not their weights. Using such references, the weights of all networks can be fetched. +num_solutions = 8 # A solution or a network can be used interchangeably. +GANN_instance = pygad.gann.GANN(num_solutions=num_solutions, + num_neurons_input=num_inputs, + num_neurons_hidden_layers=[150, 50], + num_neurons_output=num_classes, + hidden_activations=["relu", "relu"], + output_activation="softmax") + +# population does not hold the numerical weights of the network instead it holds a list of references to each last layer of each network (i.e. solution) in the population. A solution or a network can be used interchangeably. +# If there is a population with 3 solutions (i.e. networks), then the population is a list with 3 elements. Each element is a reference to the last layer of each network. Using such a reference, all details of the network can be accessed. +population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks) + +# To prepare the initial population, there are 2 ways: +# 1) Prepare it yourself and pass it to the initial_population parameter. This way is useful when the user wants to start the genetic algorithm with a custom initial population. +# 2) Assign valid integer values to the sol_per_pop and num_genes parameters. If the initial_population parameter exists, then the sol_per_pop and num_genes parameters are useless. +initial_population = population_vectors.copy() + +num_parents_mating = 4 # Number of solutions to be selected as parents in the mating pool. + +num_generations = 500 # Number of generations. 
+ +mutation_percent_genes = 10 # Percentage of genes to mutate. This parameter has no action if the parameter mutation_num_genes exists. + +parent_selection_type = "sss" # Type of parent selection. + +crossover_type = "single_point" # Type of the crossover operator. + +mutation_type = "random" # Type of the mutation operator. + +keep_parents = -1 # Number of parents to keep in the next population. -1 means keep all parents and 0 means keep nothing. + +ga_instance = pygad.GA(num_generations=num_generations, + num_parents_mating=num_parents_mating, + initial_population=initial_population, + fitness_func=fitness_func, + mutation_percent_genes=mutation_percent_genes, + parent_selection_type=parent_selection_type, + crossover_type=crossover_type, + mutation_type=mutation_type, + keep_parents=keep_parents, + on_generation=callback_generation) + +ga_instance.run() + +# After the generations complete, some plots are showed that summarize how the outputs/fitness values evolve over generations. +ga_instance.plot_fitness() + +# Returning the details of the best solution. +solution, solution_fitness, solution_idx = ga_instance.best_solution() +print(f"Parameters of the best solution : {solution}") +print(f"Fitness value of the best solution = {solution_fitness}") +print(f"Index of the best solution : {solution_idx}") + +if ga_instance.best_solution_generation != -1: + print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.") + +# Predicting the outputs of the data using the best solution. +predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[solution_idx], + data_inputs=data_inputs) +print(f"Predictions of the trained network : {predictions}") + +# Calculating some statistics +num_wrong = numpy.where(predictions != data_outputs)[0] +num_correct = data_outputs.size - num_wrong.size +accuracy = 100 * (num_correct/data_outputs.size) +print(f"Number of correct classifications : {num_correct}.") +print(f"Number of wrong classifications : {num_wrong.size}.") +print(f"Classification accuracy : {accuracy}.") +``` + +After training completes, here are the outputs of the print statements. The number of wrong classifications is only 1 and the accuracy is 99.949%. This accuracy is reached after 482 generations. + +``` +Fitness value of the best solution = 99.94903160040775 +Index of the best solution : 0 +Best fitness value reached after 482 generations. +Number of correct classifications : 1961. +Number of wrong classifications : 1. +Classification accuracy : 99.94903160040775. +``` + +The next figure shows how fitness value evolves by generation. + +![Training Neural Networks using Genetic Algorithm](https://user-images.githubusercontent.com/16560492/82152993-21898180-9865-11ea-8387-b995f88b83f7.png) + +## Regression Example 1 + +To train a neural network for regression, follow these instructions: + +1. Set the `output_activation` parameter in the constructor of the `pygad.gann.GANN` class to `"None"`. It is possible to use the ReLU function if all outputs are nonnegative. + +```python +GANN_instance = pygad.gann.GANN(... + output_activation="None") +``` + +2. Wherever the `pygad.nn.predict()` function is used, set the `problem_type` parameter to `"regression"`. + +```python +predictions = pygad.nn.predict(..., + problem_type="regression") +``` + +3. Design the fitness function to calculate the error (e.g. mean absolute error). + +```python +def fitness_func(ga_instance, solution, sol_idx): + ... 
+ + predictions = pygad.nn.predict(..., + problem_type="regression") + + solution_fitness = 1.0/numpy.mean(numpy.abs(predictions - data_outputs)) + + return solution_fitness +``` + +The next code builds a complete example for building a neural network for regression. + +```python +import numpy +import pygad +import pygad.nn +import pygad.gann + +def fitness_func(ga_instance, solution, sol_idx): + global GANN_instance, data_inputs, data_outputs + + predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx], + data_inputs=data_inputs, problem_type="regression") + solution_fitness = 1.0/numpy.mean(numpy.abs(predictions - data_outputs)) + + return solution_fitness + +def callback_generation(ga_instance): + global GANN_instance, last_fitness + + population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks, + population_vectors=ga_instance.population) + + GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices) + + print(f"Generation = {ga_instance.generations_completed}") + print(f"Fitness = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}") + print(f"Change = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1] - last_fitness}") + + last_fitness = ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1].copy() + +# Holds the fitness value of the previous generation. +last_fitness = 0 + +# Preparing the NumPy array of the inputs. +data_inputs = numpy.array([[2, 5, -3, 0.1], + [8, 15, 20, 13]]) + +# Preparing the NumPy array of the outputs. +data_outputs = numpy.array([[0.1, 0.2], + [1.8, 1.5]]) + +# The length of the input vector for each sample (i.e. number of neurons in the input layer). +num_inputs = data_inputs.shape[1] + +# Creating an initial population of neural networks. The return of the initial_population() function holds references to the networks, not their weights. Using such references, the weights of all networks can be fetched. +num_solutions = 6 # A solution or a network can be used interchangeably. +GANN_instance = pygad.gann.GANN(num_solutions=num_solutions, + num_neurons_input=num_inputs, + num_neurons_hidden_layers=[2], + num_neurons_output=2, + hidden_activations=["relu"], + output_activation="None") + +# population does not hold the numerical weights of the network instead it holds a list of references to each last layer of each network (i.e. solution) in the population. A solution or a network can be used interchangeably. +# If there is a population with 3 solutions (i.e. networks), then the population is a list with 3 elements. Each element is a reference to the last layer of each network. Using such a reference, all details of the network can be accessed. +population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks) + +# To prepare the initial population, there are 2 ways: +# 1) Prepare it yourself and pass it to the initial_population parameter. This way is useful when the user wants to start the genetic algorithm with a custom initial population. +# 2) Assign valid integer values to the sol_per_pop and num_genes parameters. If the initial_population parameter exists, then the sol_per_pop and num_genes parameters are useless. +initial_population = population_vectors.copy() + +num_parents_mating = 4 # Number of solutions to be selected as parents in the mating pool. + +num_generations = 500 # Number of generations. 
+ +mutation_percent_genes = 5 # Percentage of genes to mutate. This parameter has no action if the parameter mutation_num_genes exists. + +parent_selection_type = "sss" # Type of parent selection. + +crossover_type = "single_point" # Type of the crossover operator. + +mutation_type = "random" # Type of the mutation operator. + +keep_parents = 1 # Number of parents to keep in the next population. -1 means keep all parents and 0 means keep nothing. + +init_range_low = -1 +init_range_high = 1 + +ga_instance = pygad.GA(num_generations=num_generations, + num_parents_mating=num_parents_mating, + initial_population=initial_population, + fitness_func=fitness_func, + mutation_percent_genes=mutation_percent_genes, + init_range_low=init_range_low, + init_range_high=init_range_high, + parent_selection_type=parent_selection_type, + crossover_type=crossover_type, + mutation_type=mutation_type, + keep_parents=keep_parents, + on_generation=callback_generation) + +ga_instance.run() + +# After the generations complete, some plots are showed that summarize how the outputs/fitness values evolve over generations. +ga_instance.plot_fitness() + +# Returning the details of the best solution. +solution, solution_fitness, solution_idx = ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness) +print(f"Parameters of the best solution : {solution}") +print(f"Fitness value of the best solution = {solution_fitness}") +print(f"Index of the best solution : {solution_idx}") + +if ga_instance.best_solution_generation != -1: + print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.") + +# Predicting the outputs of the data using the best solution. +predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[solution_idx], + data_inputs=data_inputs, + problem_type="regression") +print(f"Predictions of the trained network : {predictions}") + +# Calculating some statistics +abs_error = numpy.mean(numpy.abs(predictions - data_outputs)) +print(f"Absolute error : {abs_error}.") +``` + +The next figure shows how the fitness value changes for the generations used. + +![example_regression](https://user-images.githubusercontent.com/16560492/92948154-3cf24b00-f459-11ea-94ea-952b66ab2145.png) + +## Regression Example 2 - Fish Weight Prediction + +This example uses the Fish Market Dataset available at Kaggle (https://www.kaggle.com/aungpyaeap/fish-market). Simply download the CSV dataset from [this link](https://www.kaggle.com/aungpyaeap/fish-market/download) (https://www.kaggle.com/aungpyaeap/fish-market/download). The dataset is also available at the [GitHub project of the pygad.gann module](https://github.com/ahmedfgad/NeuralGenetic): https://github.com/ahmedfgad/NeuralGenetic + +Using the Pandas library, the dataset is read using the `read_csv()` function. + +```python +data = numpy.array(pandas.read_csv("Fish.csv")) +``` + +The last 5 columns in the dataset are used as inputs and the **Weight** column is used as output. + +```python +# Preparing the NumPy array of the inputs. +data_inputs = numpy.asarray(data[:, 2:], dtype=numpy.float32) + +# Preparing the NumPy array of the outputs. +data_outputs = numpy.asarray(data[:, 1], dtype=numpy.float32) # Fish Weight +``` + +Note how the activation function at the last layer is set to `"None"`. Moreover, the `problem_type` parameter in the `pygad.nn.train()` and `pygad.nn.predict()` functions is set to `"regression"`. Remember to design an appropriate fitness function for the regression problem. 
In this example, the fitness value is calculated based on the mean absolute error. + +```python +solution_fitness = 1.0/numpy.mean(numpy.abs(predictions - data_outputs)) +``` + +Here is the complete code. + +```python +import numpy +import pygad +import pygad.nn +import pygad.gann +import pandas + +def fitness_func(ga_instance, solution, sol_idx): + global GANN_instance, data_inputs, data_outputs + + predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx], + data_inputs=data_inputs, problem_type="regression") + solution_fitness = 1.0/numpy.mean(numpy.abs(predictions - data_outputs)) + + return solution_fitness + +def callback_generation(ga_instance): + global GANN_instance, last_fitness + + population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks, + population_vectors=ga_instance.population) + + GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices) + + print(f"Generation = {ga_instance.generations_completed}") + print(f"Fitness = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}") + print(f"Change = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1] - last_fitness}") + + last_fitness = ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1].copy() + +# Holds the fitness value of the previous generation. +last_fitness = 0 + +data = numpy.array(pandas.read_csv("../data/Fish.csv")) + +# Preparing the NumPy array of the inputs. +data_inputs = numpy.asarray(data[:, 2:], dtype=numpy.float32) + +# Preparing the NumPy array of the outputs. +data_outputs = numpy.asarray(data[:, 1], dtype=numpy.float32) + +# The length of the input vector for each sample (i.e. number of neurons in the input layer). +num_inputs = data_inputs.shape[1] + +# Creating an initial population of neural networks. The return of the initial_population() function holds references to the networks, not their weights. Using such references, the weights of all networks can be fetched. +num_solutions = 6 # A solution or a network can be used interchangeably. +GANN_instance = pygad.gann.GANN(num_solutions=num_solutions, + num_neurons_input=num_inputs, + num_neurons_hidden_layers=[2], + num_neurons_output=1, + hidden_activations=["relu"], + output_activation="None") + +# population does not hold the numerical weights of the network instead it holds a list of references to each last layer of each network (i.e. solution) in the population. A solution or a network can be used interchangeably. +# If there is a population with 3 solutions (i.e. networks), then the population is a list with 3 elements. Each element is a reference to the last layer of each network. Using such a reference, all details of the network can be accessed. +population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks) + +# To prepare the initial population, there are 2 ways: +# 1) Prepare it yourself and pass it to the initial_population parameter. This way is useful when the user wants to start the genetic algorithm with a custom initial population. +# 2) Assign valid integer values to the sol_per_pop and num_genes parameters. If the initial_population parameter exists, then the sol_per_pop and num_genes parameters are useless. +initial_population = population_vectors.copy() + +num_parents_mating = 4 # Number of solutions to be selected as parents in the mating pool. + +num_generations = 500 # Number of generations. 
+ +mutation_percent_genes = 5 # Percentage of genes to mutate. This parameter has no action if the parameter mutation_num_genes exists. + +parent_selection_type = "sss" # Type of parent selection. + +crossover_type = "single_point" # Type of the crossover operator. + +mutation_type = "random" # Type of the mutation operator. + +keep_parents = 1 # Number of parents to keep in the next population. -1 means keep all parents and 0 means keep nothing. + +init_range_low = -1 +init_range_high = 1 + +ga_instance = pygad.GA(num_generations=num_generations, + num_parents_mating=num_parents_mating, + initial_population=initial_population, + fitness_func=fitness_func, + mutation_percent_genes=mutation_percent_genes, + init_range_low=init_range_low, + init_range_high=init_range_high, + parent_selection_type=parent_selection_type, + crossover_type=crossover_type, + mutation_type=mutation_type, + keep_parents=keep_parents, + on_generation=callback_generation) + +ga_instance.run() + +# After the generations complete, some plots are showed that summarize how the outputs/fitness values evolve over generations. +ga_instance.plot_fitness() + +# Returning the details of the best solution. +solution, solution_fitness, solution_idx = ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness) +print(f"Parameters of the best solution : {solution}") +print(f"Fitness value of the best solution = {solution_fitness}") +print(f"Index of the best solution : {solution_idx}") + +if ga_instance.best_solution_generation != -1: + print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.") + +# Predicting the outputs of the data using the best solution. +predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[solution_idx], + data_inputs=data_inputs, + problem_type="regression") +print(f"Predictions of the trained network : {predictions}") + +# Calculating some statistics +abs_error = numpy.mean(numpy.abs(predictions - data_outputs)) +print(f"Absolute error : {abs_error}.") +``` + +The next figure shows how the fitness value changes for the 500 generations used. + +![example_regression_fish](https://user-images.githubusercontent.com/16560492/92948486-bbe78380-f459-11ea-9e31-0d4c7269d606.png) \ No newline at end of file diff --git a/docs/md/helper.md b/docs/md/helper.md new file mode 100644 index 0000000..39e67fe --- /dev/null +++ b/docs/md/helper.md @@ -0,0 +1,41 @@ +# `pygad.helper` Module + +This section of the PyGAD's library documentation discusses the `pygad.helper` module. + +The `pygad.helper` module has 2 submodules: + +1. `pygad.helper.unique`: A module of methods for creating unique genes. +2. `pygad.helper.misc`: A module of miscellaneous helper methods. + +## `pygad.helper.unique` Module + +The `pygad.helper.unique` module has a class named `Unique` with the following helper methods. Such methods help to check and fix duplicate values in the genes of a solution. + +1. `solve_duplicate_genes_randomly()`: Solves the duplicates in a solution by randomly selecting new values for the duplicating genes. +2. `solve_duplicate_genes_by_space()`: Solves the duplicates in a solution by selecting values for the duplicating genes from the gene space +3. `unique_int_gene_from_range()`: Finds a unique integer value for the gene out of a range defined by start and end points. +4. `unique_float_gene_from_range()`: Finds a unique float value for the gene out of a range defined by start and end points. +5. 
`select_unique_value()`: Selects a unique value (if possible) from a list of gene values.
+6. `unique_genes_by_space()`: Loops through all the duplicating genes to find unique values from their gene spaces to solve the duplicates. For each duplicating gene, a call to `unique_gene_by_space()` is made.
+7. `unique_gene_by_space()`: Returns a unique gene value for a single gene based on its value space to solve the duplicates.
+8. `find_two_duplicates()`: Identifies the first occurrence of a duplicate gene in the solution.
+9. `unpack_gene_space()`: Unpacks the gene space for selecting a value to resolve duplicates by converting ranges into lists of values.
+10. `solve_duplicates_deeply()`: Sometimes it is impossible to solve the duplicate genes by simply randomly selecting another value for either gene. This function solves the duplicates between 2 genes by searching for a third gene that can assist in the solution.
+
+## `pygad.helper.misc` Module
+
+The `pygad.helper.misc` module has a class called `Helper` with some methods that help in different stages of the GA pipeline. It was introduced in [PyGAD 3.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-5-0).
+
+1. `change_population_dtype_and_round()`: For each gene in the population, rounds the gene value and changes the data type.
+2. `change_gene_dtype_and_round()`: Rounds the value and changes the data type of a single gene.
+3. `mutation_change_gene_dtype_and_round()`: Decides whether mutation is done by replacement or not. Then it rounds and changes the data type of the new gene value.
+4. `validate_gene_constraint_callable_output()`: Validates the output of the user-defined callable/function that checks whether the gene constraint defined in the `gene_constraint` parameter is satisfied or not.
+5. `get_gene_dtype()`: Returns the gene data type from the `gene_type` instance attribute.
+6. `get_random_mutation_range()`: Returns the random mutation range using the `random_mutation_min_val` and `random_mutation_max_val` instance attributes.
+7. `get_initial_population_range()`: Returns the initial population values range using the `init_range_low` and `init_range_high` instance attributes.
+8. `generate_gene_value_from_space()`: Generates/selects a value for a gene using the `gene_space` instance attribute.
+9. `generate_gene_value_randomly()`: Generates a random value for the gene. Only used if `gene_space` is `None`.
+10. `generate_gene_value()`: Generates a value for the gene. It checks whether `gene_space` is `None` and calls either `generate_gene_value_randomly()` or `generate_gene_value_from_space()`.
+11. `filter_gene_values_by_constraint()`: Receives a list of values for a gene. Then it filters such values using the gene constraint.
+12. `get_valid_gene_constraint_values()`: Selects one valid gene value that satisfies the gene constraint. It simply calls `generate_gene_value()` to generate some gene values and then filters them using `filter_gene_values_by_constraint()`.
+
diff --git a/docs/md/kerasga.md b/docs/md/kerasga.md
new file mode 100644
index 0000000..cdf143a
--- /dev/null
+++ b/docs/md/kerasga.md
@@ -0,0 +1,900 @@
+# `pygad.kerasga` Module
+
+This section of the PyGAD library's documentation discusses the [**pygad.kerasga**](https://pygad.readthedocs.io/en/latest/kerasga.html) module.
+
+The `pygad.kerasga` module has a helper class and 3 functions to train Keras models using the genetic algorithm (PyGAD). 
The Keras model can be built either using the [Sequential Model](https://keras.io/guides/sequential_model) or the [Functional API](https://keras.io/guides/functional_api).
+
+The contents of this module are:
+
+1. `KerasGA`: A class for creating an initial population of all parameters in the Keras model.
+2. `model_weights_as_vector()`: A function to reshape the Keras model weights to a single vector.
+3. `model_weights_as_matrix()`: A function to restore the Keras model weights from a vector.
+4. `predict()`: A function to make predictions based on the Keras model and a solution.
+
+More details are given in the next sections.
+
+# Steps Summary
+
+The summary of the steps used to train a Keras model using PyGAD is as follows:
+
+1. Create a Keras model.
+2. Create an instance of the `pygad.kerasga.KerasGA` class.
+3. Prepare the training data.
+4. Build the fitness function.
+5. Create an instance of the `pygad.GA` class.
+6. Run the genetic algorithm.
+
+# Create Keras Model
+
+Before discussing training a Keras model using PyGAD, the first thing to do is to create the Keras model.
+
+According to the [Keras library documentation](https://keras.io/api/models), there are 3 ways to build a Keras model:
+
+1. [Sequential Model](https://keras.io/guides/sequential_model)
+
+2. [Functional API](https://keras.io/guides/functional_api)
+
+3. [Model Subclassing](https://keras.io/guides/model_subclassing)
+
+PyGAD supports training the models created either using the Sequential Model or the Functional API.
+
+Here is an example of a model created using the Sequential Model.
+
+```python
+import tensorflow.keras
+
+input_layer = tensorflow.keras.layers.Input(3)
+dense_layer1 = tensorflow.keras.layers.Dense(5, activation="relu")
+output_layer = tensorflow.keras.layers.Dense(1, activation="linear")
+
+model = tensorflow.keras.Sequential()
+model.add(input_layer)
+model.add(dense_layer1)
+model.add(output_layer)
+```
+
+This is the same model created using the Functional API.
+
+```python
+input_layer = tensorflow.keras.layers.Input(3)
+dense_layer1 = tensorflow.keras.layers.Dense(5, activation="relu")(input_layer)
+output_layer = tensorflow.keras.layers.Dense(1, activation="linear")(dense_layer1)
+
+model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)
+```
+
+Feel free to add the layers of your choice.
+
+# `pygad.kerasga.KerasGA` Class
+
+The `pygad.kerasga` module has a class named `KerasGA` for creating an initial population for the genetic algorithm based on a Keras model. The constructor, methods, and attributes within the class are discussed in this section.
+
+## `__init__()`
+
+The `pygad.kerasga.KerasGA` class constructor accepts the following parameters:
+
+- `model`: An instance of the Keras model.
+- `num_solutions`: Number of solutions in the population. Each solution holds a different set of model parameters.
+
+## Instance Attributes
+
+All parameters in the `pygad.kerasga.KerasGA` class constructor are used as instance attributes, in addition to a new attribute called `population_weights`.
+
+Here is a list of all instance attributes:
+
+- `model`
+- `num_solutions`
+- `population_weights`: A nested list holding the weights of all solutions in the population.
+
+## Methods in the `KerasGA` Class
+
+This section discusses the methods available for instances of the `pygad.kerasga.KerasGA` class. 
+
+### `create_population()`
+
+The `create_population()` method creates the initial population of the genetic algorithm as a list of solutions where each solution represents a different set of model parameters. This list is assigned to the `population_weights` attribute of the instance.
+
+# Functions in the `pygad.kerasga` Module
+
+This section discusses the functions in the `pygad.kerasga` module.
+
+## `pygad.kerasga.model_weights_as_vector()`
+
+The `model_weights_as_vector()` function accepts a single parameter named `model` representing the Keras model. It returns a vector holding all model weights. The reason for representing the model weights as a vector is that the genetic algorithm expects all parameters of any solution to be in a 1D vector form.
+
+This function filters the layers based on the `trainable` attribute to see whether the layer weights are trained or not. For each layer, if `trainable` is `False`, then its weights are not evolved using the genetic algorithm. Otherwise, they are represented in the chromosome and evolved.
+
+The function accepts the following parameters:
+
+- `model`: The Keras model.
+
+It returns a 1D vector holding the model weights.
+
+## `pygad.kerasga.model_weights_as_matrix()`
+
+The `model_weights_as_matrix()` function accepts the following parameters:
+
+1. `model`: The Keras model.
+2. `weights_vector`: The model parameters as a vector.
+
+It returns the restored model weights after reshaping the vector.
+
+## `pygad.kerasga.predict()`
+
+The `predict()` function makes a prediction based on a solution. It accepts the following parameters:
+
+1. `model`: The Keras model.
+2. `solution`: The evolved solution.
+3. `data`: The test data inputs.
+4. `batch_size=None`: The batch size (i.e. number of samples per step or batch).
+5. `verbose=None`: Verbosity mode.
+6. `steps=None`: The total number of steps (batches of samples).
+
+Check the documentation of the [Keras Model.predict()](https://keras.io/api/models/model_training_apis) method for more information about the `batch_size`, `verbose`, and `steps` parameters.
+
+It returns the predictions of the data samples.
+
+# Examples
+
+This section gives the complete code of some examples that build and train a Keras model using PyGAD. Each subsection builds a different network.
+
+## Example 1: Regression
+
+The next code builds a simple Keras model for regression. The next subsections discuss each part in the code. 
+ +```python +import tensorflow.keras +import pygad.kerasga +import numpy +import pygad + +def fitness_func(ga_instance, solution, sol_idx): + global data_inputs, data_outputs, keras_ga, model + + predictions = pygad.kerasga.predict(model=model, + solution=solution, + data=data_inputs) + + mae = tensorflow.keras.losses.MeanAbsoluteError() + abs_error = mae(data_outputs, predictions).numpy() + 0.00000001 + solution_fitness = 1.0/abs_error + + return solution_fitness + +def on_generation(ga_instance): + print(f"Generation = {ga_instance.generations_completed}") + print(f"Fitness = {ga_instance.best_solution()[1]}") + +input_layer = tensorflow.keras.layers.Input(3) +dense_layer1 = tensorflow.keras.layers.Dense(5, activation="relu")(input_layer) +output_layer = tensorflow.keras.layers.Dense(1, activation="linear")(dense_layer1) + +model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer) + +keras_ga = pygad.kerasga.KerasGA(model=model, + num_solutions=10) + +# Data inputs +data_inputs = numpy.array([[0.02, 0.1, 0.15], + [0.7, 0.6, 0.8], + [1.5, 1.2, 1.7], + [3.2, 2.9, 3.1]]) + +# Data outputs +data_outputs = numpy.array([[0.1], + [0.6], + [1.3], + [2.5]]) + +# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class +num_generations = 250 # Number of generations. +num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool. +initial_population = keras_ga.population_weights # Initial population of network weights + +ga_instance = pygad.GA(num_generations=num_generations, + num_parents_mating=num_parents_mating, + initial_population=initial_population, + fitness_func=fitness_func, + on_generation=on_generation) + +ga_instance.run() + +# After the generations complete, some plots are showed that summarize how the outputs/fitness values evolve over generations. +ga_instance.plot_fitness(title="PyGAD & Keras - Iteration vs. Fitness", linewidth=4) + +# Returning the details of the best solution. +solution, solution_fitness, solution_idx = ga_instance.best_solution() +print(f"Fitness value of the best solution = {solution_fitness}") +print(f"Index of the best solution : {solution_idx}") + +# Make prediction based on the best solution. +predictions = pygad.kerasga.predict(model=model, + solution=solution, + data=data_inputs) +print(f"Predictions : \n{predictions}") + +mae = tensorflow.keras.losses.MeanAbsoluteError() +abs_error = mae(data_outputs, predictions).numpy() +print(f"Absolute Error : {abs_error}") +``` + +### Create a Keras Model + +According to the steps mentioned previously, the first step is to create a Keras model. Here is the code that builds the model using the Functional API. + +```python +import tensorflow.keras + +input_layer = tensorflow.keras.layers.Input(3) +dense_layer1 = tensorflow.keras.layers.Dense(5, activation="relu")(input_layer) +output_layer = tensorflow.keras.layers.Dense(1, activation="linear")(dense_layer1) + +model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer) +``` + +The model can also be build using the Keras Sequential Model API. 
+
+```python
+input_layer = tensorflow.keras.layers.Input(3)
+dense_layer1 = tensorflow.keras.layers.Dense(5, activation="relu")
+output_layer = tensorflow.keras.layers.Dense(1, activation="linear")
+
+model = tensorflow.keras.Sequential()
+model.add(input_layer)
+model.add(dense_layer1)
+model.add(output_layer)
+```
+
+### Create an Instance of the `pygad.kerasga.KerasGA` Class
+
+The second step is to create an instance of the `pygad.kerasga.KerasGA` class. There are 10 solutions per population. Change this number according to your needs.
+
+```python
+import pygad.kerasga
+
+keras_ga = pygad.kerasga.KerasGA(model=model,
+                                 num_solutions=10)
+```
+
+### Prepare the Training Data
+
+The third step is to prepare the training data inputs and outputs. Here is an example where there are 4 samples. Each sample has 3 inputs and 1 output.
+
+```python
+import numpy
+
+# Data inputs
+data_inputs = numpy.array([[0.02, 0.1, 0.15],
+                           [0.7, 0.6, 0.8],
+                           [1.5, 1.2, 1.7],
+                           [3.2, 2.9, 3.1]])
+
+# Data outputs
+data_outputs = numpy.array([[0.1],
+                            [0.6],
+                            [1.3],
+                            [2.5]])
+```
+
+### Build the Fitness Function
+
+The fourth step is to build the fitness function. This function must accept 3 parameters representing the `pygad.GA` instance, the solution, and the solution's index within the population.
+
+The next fitness function returns the model predictions based on the current solution using the `predict()` function. Then, it calculates the mean absolute error (MAE) of the Keras model based on the parameters in the solution. The reciprocal of the MAE is used as the fitness value. Feel free to use any other loss function to calculate the fitness value.
+
+```python
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, keras_ga, model
+
+    predictions = pygad.kerasga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    mae = tensorflow.keras.losses.MeanAbsoluteError()
+    abs_error = mae(data_outputs, predictions).numpy() + 0.00000001
+    solution_fitness = 1.0/abs_error
+
+    return solution_fitness
+```
+
+### Create an Instance of the `pygad.GA` Class
+
+The fifth step is to instantiate the `pygad.GA` class. Note how the `initial_population` parameter is assigned to the initial weights of the Keras models.
+
+For more information, please check the [parameters this class accepts](https://pygad.readthedocs.io/en/latest/pygad.html#init).
+
+```python
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 250 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = keras_ga.population_weights # Initial population of network weights.
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+```
+
+### Run the Genetic Algorithm
+
+The sixth and last step is to run the genetic algorithm by calling the `run()` method.
+
+```python
+ga_instance.run()
+```
+
+After PyGAD completes its execution, a figure that shows how the fitness value changes by generation can be displayed. Call the `plot_fitness()` method to show it.
+
+```python
+ga_instance.plot_fitness(title="PyGAD & Keras - Iteration vs. Fitness", linewidth=4)
+```
+
+Here is the figure. 
+ +![pygad_keras_image_regression](https://user-images.githubusercontent.com/16560492/93722638-ac261880-fb98-11ea-95d3-e773deb034f4.png) + +To get information about the best solution found by PyGAD, use the `best_solution()` method. + +```python +# Returning the details of the best solution. +solution, solution_fitness, solution_idx = ga_instance.best_solution() +print(f"Fitness value of the best solution = {solution_fitness}") +print(f"Index of the best solution : {solution_idx}") +``` + +```python +Fitness value of the best solution = 72.77768757825352 +Index of the best solution : 0 +``` + +The next code makes prediction using the `predict()` function to return the model predictions based on the best solution. + +```python +# Fetch the parameters of the best solution. +predictions = pygad.kerasga.predict(model=model, + solution=solution, + data=data_inputs) +print(f"Predictions : \n{predictions}") +``` + +```python +Predictions : +[[0.09935353] + [0.63082725] + [1.2765523 ] + [2.4999595 ]] +``` + +The next code measures the trained model error. + +```python +mae = tensorflow.keras.losses.MeanAbsoluteError() +abs_error = mae(data_outputs, predictions).numpy() +print(f"Absolute Error : {abs_error}") +``` + +``` +Absolute Error : 0.013740465 +``` + +## Example 2: XOR Binary Classification + +The next code creates a Keras model to build the XOR binary classification problem. Let's highlight the changes compared to the previous example. + +```python +import tensorflow.keras +import pygad.kerasga +import numpy +import pygad + +def fitness_func(ga_instance, solution, sol_idx): + global data_inputs, data_outputs, keras_ga, model + + predictions = pygad.kerasga.predict(model=model, + solution=solution, + data=data_inputs) + + bce = tensorflow.keras.losses.BinaryCrossentropy() + solution_fitness = 1.0 / (bce(data_outputs, predictions).numpy() + 0.00000001) + + return solution_fitness + +def on_generation(ga_instance): + print(f"Generation = {ga_instance.generations_completed}") + print(f"Fitness = {ga_instance.best_solution()[1]}") + +# Build the keras model using the functional API. +input_layer = tensorflow.keras.layers.Input(2) +dense_layer = tensorflow.keras.layers.Dense(4, activation="relu")(input_layer) +output_layer = tensorflow.keras.layers.Dense(2, activation="softmax")(dense_layer) + +model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer) + +# Create an instance of the pygad.kerasga.KerasGA class to build the initial population. +keras_ga = pygad.kerasga.KerasGA(model=model, + num_solutions=10) + +# XOR problem inputs +data_inputs = numpy.array([[0, 0], + [0, 1], + [1, 0], + [1, 1]]) + +# XOR problem outputs +data_outputs = numpy.array([[1, 0], + [0, 1], + [0, 1], + [1, 0]]) + +# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class +num_generations = 250 # Number of generations. +num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool. +initial_population = keras_ga.population_weights # Initial population of network weights. + +# Create an instance of the pygad.GA class +ga_instance = pygad.GA(num_generations=num_generations, + num_parents_mating=num_parents_mating, + initial_population=initial_population, + fitness_func=fitness_func, + on_generation=on_generation) + +# Start the genetic algorithm evolution. 
+ga_instance.run()
+
+# After the generations complete, a plot is shown that summarizes how the fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & Keras - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+# Make predictions based on the best solution.
+predictions = pygad.kerasga.predict(model=model,
+                                    solution=solution,
+                                    data=data_inputs)
+print(f"Predictions : \n{predictions}")
+
+# Calculate the binary crossentropy for the trained model.
+bce = tensorflow.keras.losses.BinaryCrossentropy()
+print("Binary Crossentropy : ", bce(data_outputs, predictions).numpy())
+
+# Calculate the classification accuracy for the trained model.
+ba = tensorflow.keras.metrics.BinaryAccuracy()
+ba.update_state(data_outputs, predictions)
+accuracy = ba.result().numpy()
+print(f"Accuracy : {accuracy}")
+```
+
+Compared to the previous regression example, here are the changes:
+
+* The Keras model is changed according to the nature of the problem. Now, it has 2 inputs and 2 outputs with an in-between hidden layer of 4 neurons.
+
+```python
+# Build the keras model using the functional API.
+input_layer = tensorflow.keras.layers.Input(2)
+dense_layer = tensorflow.keras.layers.Dense(4, activation="relu")(input_layer)
+output_layer = tensorflow.keras.layers.Dense(2, activation="softmax")(dense_layer)
+
+model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)
+```
+
+* The training data is changed. Note that the output of each sample is a 1D vector of 2 values, 1 for each class.
+
+```python
+# XOR problem inputs
+data_inputs = numpy.array([[0, 0],
+                           [0, 1],
+                           [1, 0],
+                           [1, 1]])
+
+# XOR problem outputs
+data_outputs = numpy.array([[1, 0],
+                            [0, 1],
+                            [0, 1],
+                            [1, 0]])
+```
+
+* The fitness value is calculated based on the binary cross entropy.
+
+```python
+bce = tensorflow.keras.losses.BinaryCrossentropy()
+solution_fitness = 1.0 / (bce(data_outputs, predictions).numpy() + 0.00000001)
+```
+
+After the previous code completes, the next figure shows how the fitness value changes by generation.
+
+![pygad_keras_image_classification_XOR](https://user-images.githubusercontent.com/16560492/93722639-b811da80-fb98-11ea-8951-f13a7a266c04.png)
+
+Here is some information about the trained model. Its fitness value is `739.24`, its loss is `0.0013527311`, and its accuracy is 100%.
+
+```
+Fitness value of the best solution = 739.2397344644013
+Index of the best solution : 7
+
+Predictions : 
+[[9.9694413e-01 3.0558957e-03]
+ [5.0176249e-04 9.9949825e-01]
+ [1.8470541e-03 9.9815291e-01]
+ [9.9999976e-01 2.0538971e-07]]
+
+Binary Crossentropy : 0.0013527311
+
+Accuracy : 1.0
+```
+
+## Example 3: Image Multi-Class Classification (Dense Layers)
+
+Here is the code.
+
+```python
+import tensorflow.keras
+import pygad.kerasga
+import numpy
+import pygad
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, keras_ga, model
+
+    predictions = pygad.kerasga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    cce = tensorflow.keras.losses.CategoricalCrossentropy()
+    solution_fitness = 1.0 / (cce(data_outputs, predictions).numpy() + 0.00000001)
+
+    return solution_fitness
+
+def on_generation(ga_instance):
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+
+# Build the keras model using the functional API.
+input_layer = tensorflow.keras.layers.Input(360)
+dense_layer = tensorflow.keras.layers.Dense(50, activation="relu")(input_layer)
+output_layer = tensorflow.keras.layers.Dense(4, activation="softmax")(dense_layer)
+
+model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)
+
+# Create an instance of the pygad.kerasga.KerasGA class to build the initial population.
+keras_ga = pygad.kerasga.KerasGA(model=model,
+                                 num_solutions=10)
+
+# Data inputs
+data_inputs = numpy.load("../data/dataset_features.npy")
+
+# Data outputs
+data_outputs = numpy.load("../data/outputs.npy")
+data_outputs = tensorflow.keras.utils.to_categorical(data_outputs)
+
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 100 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = keras_ga.population_weights # Initial population of network weights.
+
+# Create an instance of the pygad.GA class
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+# Start the genetic algorithm evolution.
+ga_instance.run()
+
+# After the generations complete, a plot is shown that summarizes how the fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & Keras - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+# Make predictions based on the best solution.
+predictions = pygad.kerasga.predict(model=model,
+                                    solution=solution,
+                                    data=data_inputs)
+# print(f"Predictions : \n{predictions}")
+
+# Calculate the categorical crossentropy for the trained model.
+cce = tensorflow.keras.losses.CategoricalCrossentropy()
+print(f"Categorical Crossentropy : {cce(data_outputs, predictions).numpy()}")
+
+# Calculate the classification accuracy for the trained model.
+ca = tensorflow.keras.metrics.CategoricalAccuracy()
+ca.update_state(data_outputs, predictions)
+accuracy = ca.result().numpy()
+print(f"Accuracy : {accuracy}")
+```
+
+Compared to the previous binary classification example, this example has multiple classes (4) and thus the loss is measured using categorical cross entropy.
+
+```python
+cce = tensorflow.keras.losses.CategoricalCrossentropy()
+solution_fitness = 1.0 / (cce(data_outputs, predictions).numpy() + 0.00000001)
+```
+
+### Prepare the Training Data
+
+Before building and training neural networks, the training data (input and output) needs to be prepared. The inputs and the outputs of the training data are NumPy arrays.
+
+The data used in this example is available as 2 files:
+
+1. [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy): Data inputs. https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy
+2. [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy): Class labels. https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy
+
+The data consists of 4 classes of images. The image shape is `(100, 100, 3)`. The number of training samples is 1962. The feature vector extracted from each image has a length of 360.
+
+Simply download these 2 files and read them according to the next code. Note that the class labels are one-hot encoded using the `tensorflow.keras.utils.to_categorical()` function.
+
+```python
+import numpy
+
+data_inputs = numpy.load("../data/dataset_features.npy")
+
+data_outputs = numpy.load("../data/outputs.npy")
+data_outputs = tensorflow.keras.utils.to_categorical(data_outputs)
+```
+
+The next figure shows how the fitness value changes.
+
+![pygad_keras_image_classification](https://user-images.githubusercontent.com/16560492/93722649-c2cc6f80-fb98-11ea-96e7-3f6ce3cfe1cf.png)
+
+Here are some statistics about the trained model.
+
+```
+Fitness value of the best solution = 4.197464252185969
+Index of the best solution : 0
+Categorical Crossentropy : 0.23823906
+Accuracy : 0.9852192
+```
+
+## Example 4: Image Multi-Class Classification (Conv Layers)
+
+Compared to the previous example that uses only dense layers, this example uses convolutional layers to classify the same dataset.
+
+Here is the complete code.
+
+```python
+import tensorflow.keras
+import pygad.kerasga
+import numpy
+import pygad
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, keras_ga, model
+
+    predictions = pygad.kerasga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    cce = tensorflow.keras.losses.CategoricalCrossentropy()
+    solution_fitness = 1.0 / (cce(data_outputs, predictions).numpy() + 0.00000001)
+
+    return solution_fitness
+
+def on_generation(ga_instance):
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+
+# Build the keras model using the functional API.
+input_layer = tensorflow.keras.layers.Input(shape=(100, 100, 3))
+conv_layer1 = tensorflow.keras.layers.Conv2D(filters=5,
+                                             kernel_size=7,
+                                             activation="relu")(input_layer)
+max_pool1 = tensorflow.keras.layers.MaxPooling2D(pool_size=(5,5),
+                                                 strides=5)(conv_layer1)
+conv_layer2 = tensorflow.keras.layers.Conv2D(filters=3,
+                                             kernel_size=3,
+                                             activation="relu")(max_pool1)
+flatten_layer = tensorflow.keras.layers.Flatten()(conv_layer2)
+dense_layer = tensorflow.keras.layers.Dense(15, activation="relu")(flatten_layer)
+output_layer = tensorflow.keras.layers.Dense(4, activation="softmax")(dense_layer)
+
+model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)
+
+# Create an instance of the pygad.kerasga.KerasGA class to build the initial population.
+keras_ga = pygad.kerasga.KerasGA(model=model,
+                                 num_solutions=10)
+
+# Data inputs
+data_inputs = numpy.load("../data/dataset_inputs.npy")
+
+# Data outputs
+data_outputs = numpy.load("../data/dataset_outputs.npy")
+data_outputs = tensorflow.keras.utils.to_categorical(data_outputs)
+
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 200 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = keras_ga.population_weights # Initial population of network weights.
+
+# Create an instance of the pygad.GA class
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+# Start the genetic algorithm evolution.
+ga_instance.run()
+
+# After the generations complete, a plot is shown that summarizes how the fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & Keras - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+# Make predictions based on the best solution.
+predictions = pygad.kerasga.predict(model=model,
+                                    solution=solution,
+                                    data=data_inputs)
+# print(f"Predictions : \n{predictions}")
+
+# Calculate the categorical crossentropy for the trained model.
+cce = tensorflow.keras.losses.CategoricalCrossentropy()
+print(f"Categorical Crossentropy : {cce(data_outputs, predictions).numpy()}")
+
+# Calculate the classification accuracy for the trained model.
+ca = tensorflow.keras.metrics.CategoricalAccuracy()
+ca.update_state(data_outputs, predictions)
+accuracy = ca.result().numpy()
+print(f"Accuracy : {accuracy}")
+```
+
+Compared to the previous example, the only change is that the architecture uses convolutional and max-pooling layers. The shape of each input sample is `(100, 100, 3)`.
+
+```python
+# Build the keras model using the functional API.
+input_layer = tensorflow.keras.layers.Input(shape=(100, 100, 3))
+conv_layer1 = tensorflow.keras.layers.Conv2D(filters=5,
+                                             kernel_size=7,
+                                             activation="relu")(input_layer)
+max_pool1 = tensorflow.keras.layers.MaxPooling2D(pool_size=(5,5),
+                                                 strides=5)(conv_layer1)
+conv_layer2 = tensorflow.keras.layers.Conv2D(filters=3,
+                                             kernel_size=3,
+                                             activation="relu")(max_pool1)
+flatten_layer = tensorflow.keras.layers.Flatten()(conv_layer2)
+dense_layer = tensorflow.keras.layers.Dense(15, activation="relu")(flatten_layer)
+output_layer = tensorflow.keras.layers.Dense(4, activation="softmax")(dense_layer)
+
+model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)
+```
+
+### Prepare the Training Data
+
+The data used in this example is available as 2 files:
+
+1. [dataset_inputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy): Data inputs. https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy
+2. [dataset_outputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy): Class labels. https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy
+
+The data consists of 4 classes of images. The image shape is `(100, 100, 3)` and there are 20 images per class for a total of 80 training samples. For more information about the dataset, check the [Reading the Data](https://pygad.readthedocs.io/en/latest/cnn.html#reading-the-data) section of the `pygad.cnn` module.
+
+Simply download these 2 files and read them according to the next code. Note that the class labels are one-hot encoded using the `tensorflow.keras.utils.to_categorical()` function.
+
+```python
+import numpy
+
+data_inputs = numpy.load("../data/dataset_inputs.npy")
+
+data_outputs = numpy.load("../data/dataset_outputs.npy")
+data_outputs = tensorflow.keras.utils.to_categorical(data_outputs)
+```
+
+The next figure shows how the fitness value changes.
+
+![pygad_keras_image_classification_Conv](https://user-images.githubusercontent.com/16560492/93722654-cc55d780-fb98-11ea-8f95-7b65dc67f5c8.png)
+
+Here are some statistics about the trained model. The model accuracy is 75% after the 200 generations. Note that just running the code again may give different results.
+
+```
+Fitness value of the best solution = 2.7462310258668805
+Index of the best solution : 0
+Categorical Crossentropy : 0.3641354
+Accuracy : 0.75
+```
+
+To improve the model performance, you can do the following:
+
+- Add more layers.
+- Modify the existing layers.
+- Use different parameters for the layers.
+- Use different parameters for the genetic algorithm (e.g. number of solutions, number of generations, etc.).
+
+## Example 5: Image Classification using Data Generator
+
+This example uses the image data generator `tensorflow.keras.preprocessing.image.ImageDataGenerator` to feed data to the model. Instead of reading all the data into memory at once, the data generator loads into memory only the data the model currently needs. This frees memory but adds more computational time.
+
+```python
+import tensorflow as tf
+import tensorflow.keras
+import pygad.kerasga
+import pygad
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global train_generator, data_outputs, keras_ga, model
+
+    predictions = pygad.kerasga.predict(model=model,
+                                        solution=solution,
+                                        data=train_generator)
+
+    cce = tensorflow.keras.losses.CategoricalCrossentropy()
+    solution_fitness = 1.0 / (cce(data_outputs, predictions).numpy() + 0.00000001)
+
+    return solution_fitness
+
+def on_generation(ga_instance):
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution(ga_instance.last_generation_fitness)[1]}")
+
+# The dataset path.
+dataset_path = r'../data/Skin_Cancer_Dataset'
+
+num_classes = 2
+img_size = 224
+
+# Create a simple CNN. This does not guarantee high classification accuracy.
+model = tf.keras.models.Sequential()
+model.add(tf.keras.layers.Input(shape=(img_size, img_size, 3)))
+model.add(tf.keras.layers.Conv2D(32, (3,3), activation="relu", padding="same"))
+model.add(tf.keras.layers.MaxPooling2D((2, 2)))
+model.add(tf.keras.layers.Flatten())
+model.add(tf.keras.layers.Dropout(rate=0.2))
+model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
+
+# Create an instance of the pygad.kerasga.KerasGA class to build the initial population.
+keras_ga = pygad.kerasga.KerasGA(model=model,
+                                 num_solutions=10)
+
+data_generator = tf.keras.preprocessing.image.ImageDataGenerator()
+train_generator = data_generator.flow_from_directory(dataset_path,
+                                                     class_mode='categorical',
+                                                     target_size=(224, 224),
+                                                     batch_size=32,
+                                                     shuffle=False)
+# train_generator.class_indices
+data_outputs = tf.keras.utils.to_categorical(train_generator.labels)
+
+# Check the documentation for more information about the parameters: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+initial_population = keras_ga.population_weights # Initial population of network weights.
+
+# Create an instance of the pygad.GA class
+ga_instance = pygad.GA(num_generations=10,
+                       num_parents_mating=5,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+# Start the genetic algorithm evolution.
+ga_instance.run()
+
+# After the generations complete, a plot is shown that summarizes how the fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & Keras - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness)
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+predictions = pygad.kerasga.predict(model=model,
+                                    solution=solution,
+                                    data=train_generator)
+# print(f"Predictions : \n{predictions}")
+
+# Calculate the categorical crossentropy for the trained model.
+cce = tensorflow.keras.losses.CategoricalCrossentropy()
+print(f"Categorical Crossentropy : {cce(data_outputs, predictions).numpy()}")
+
+# Calculate the classification accuracy for the trained model.
+ca = tensorflow.keras.metrics.CategoricalAccuracy()
+ca.update_state(data_outputs, predictions)
+accuracy = ca.result().numpy()
+print(f"Accuracy : {accuracy}")
+```
+
+
+
diff --git a/docs/md/nn.md b/docs/md/nn.md
new file mode 100644
index 0000000..8fcd8b5
--- /dev/null
+++ b/docs/md/nn.md
@@ -0,0 +1,680 @@
+# `pygad.nn` Module
+
+This section of PyGAD's library documentation discusses the **pygad.nn** module.
+
+Using the **pygad.nn** module, artificial neural networks are created. The purpose of this module is to only implement the **forward pass** of a neural network without using a training algorithm. The **pygad.nn** module builds the network layers, implements the activation functions, trains the network, makes predictions, and more.
+
+Later, the **pygad.gann** module is used to train the **pygad.nn** network using the genetic algorithm built in the **pygad** module.
+
+Starting from [PyGAD 2.7.1](https://pygad.readthedocs.io/en/latest/Footer.html#pygad-2-7-1), the **pygad.nn** module supports both classification and regression problems. For more information, check the `problem_type` parameter in the `pygad.nn.train()` and `pygad.nn.predict()` functions.
+
+# Supported Layers
+
+Each layer supported by the **pygad.nn** module has a corresponding class. The layers and their classes are:
+
+1. **Input**: Implemented using the `pygad.nn.InputLayer` class.
+
+2. **Dense** (Fully Connected): Implemented using the `pygad.nn.DenseLayer` class.
+
+In the future, more layers will be added. The next subsections discuss such layers.
+
+## `pygad.nn.InputLayer` Class
+
+The `pygad.nn.InputLayer` class creates the input layer for the neural network. For each network, there is only a single input layer. The network architecture must start with an input layer.
+
+This class has no methods or class attributes. All it has is a constructor that accepts a parameter named `num_neurons` representing the number of neurons in the input layer.
+
+An instance attribute named `num_neurons` is created within the constructor to hold this number. Here is an example of building an input layer with 20 neurons.
+
+```python
+input_layer = pygad.nn.InputLayer(num_neurons=20)
+```
+
+Here is how the single attribute `num_neurons` within the instance of the `pygad.nn.InputLayer` class can be accessed.
+
+```python
+num_input_neurons = input_layer.num_neurons
+
+print("Number of input neurons =", num_input_neurons)
+```
+
+This is everything about the input layer.
+
+## `pygad.nn.DenseLayer` Class
+
+Using the `pygad.nn.DenseLayer` class, dense (fully-connected) layers can be created. To create a dense layer, just create a new instance of the class. The constructor accepts the following parameters:
+
+- `num_neurons`: Number of neurons in the dense layer.
+- `previous_layer`: A reference to the previous layer. Using the `previous_layer` attribute, a linked list is created that connects all network layers.
+- `activation_function`: A string representing the activation function to be used in this layer. Defaults to `"sigmoid"`. Currently, the supported values for the activation functions are `"sigmoid"`, `"relu"`, `"softmax"` (supported in PyGAD 2.3.0 and higher), and `"None"` (supported in PyGAD 2.7.0 and higher). When a layer has its activation function set to `"None"`, then it means no activation function is applied. For a **regression problem**, set the activation function of the output (last) layer to `"None"`. If all outputs in the regression problem are nonnegative, then it is possible to use the ReLU function in the output layer.
+
+Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, some new instance attributes are created which are:
+
+- `initial_weights`: The initial weights for the dense layer.
+- `trained_weights`: The trained weights of the dense layer. This attribute is initialized by the value in the `initial_weights` attribute.
+
+Here is an example of creating a dense layer with 12 neurons. Note that the `previous_layer` parameter is assigned to the input layer `input_layer`.
+
+```python
+dense_layer = pygad.nn.DenseLayer(num_neurons=12,
+                                  previous_layer=input_layer,
+                                  activation_function="relu")
+```
+
+Here is how to access some attributes in the dense layer:
+
+```python
+num_dense_neurons = dense_layer.num_neurons
+dense_initial_weights = dense_layer.initial_weights
+
+print("Number of dense layer neurons =", num_dense_neurons)
+print("Initial weights of the dense layer :", dense_initial_weights)
+```
+
+Because `dense_layer` holds a reference to the input layer, the number of input neurons can be accessed.
+
+```python
+input_layer = dense_layer.previous_layer
+num_input_neurons = input_layer.num_neurons
+
+print("Number of input neurons =", num_input_neurons)
+```
+
+Here is another dense layer. This dense layer's `previous_layer` attribute points to the previously created dense layer.
+
+```python
+dense_layer2 = pygad.nn.DenseLayer(num_neurons=5,
+                                   previous_layer=dense_layer,
+                                   activation_function="relu")
+```
+
+Because `dense_layer2` holds a reference to `dense_layer` in its `previous_layer` attribute, the number of neurons in `dense_layer` can be accessed.
+
+```python
+dense_layer = dense_layer2.previous_layer
+dense_layer_neurons = dense_layer.num_neurons
+
+print("Number of dense neurons =", dense_layer_neurons)
+```
+
+After getting the reference to `dense_layer`, we can use it to access the number of input neurons.
+
+```python
+dense_layer = dense_layer2.previous_layer
+input_layer = dense_layer.previous_layer
+num_input_neurons = input_layer.num_neurons
+
+print("Number of input neurons =", num_input_neurons)
+```
+
+Assuming that `dense_layer2` is the last dense layer, it is regarded as the output layer.
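+
+Putting the previous snippets together, here is a minimal sketch of the complete architecture built so far: an input layer of 20 neurons followed by 2 dense layers of 12 and 5 neurons, where the last dense layer acts as the output layer. The layer sizes and activation functions are the ones used in the snippets above.
+
+```python
+import pygad.nn
+
+# Input layer with 20 neurons.
+input_layer = pygad.nn.InputLayer(num_neurons=20)
+
+# Hidden dense layer with 12 neurons pointing back to the input layer.
+dense_layer = pygad.nn.DenseLayer(num_neurons=12,
+                                  previous_layer=input_layer,
+                                  activation_function="relu")
+
+# Last dense layer with 5 neurons. It is regarded as the output layer.
+dense_layer2 = pygad.nn.DenseLayer(num_neurons=5,
+                                   previous_layer=dense_layer,
+                                   activation_function="relu")
+```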
+
+### `previous_layer` Attribute
+
+The `previous_layer` attribute in the `pygad.nn.DenseLayer` class creates a one-way linked list between all the layers in the network architecture as described by the next figure.
+
+The last (output) layer, indexed **N**, points to layer **N-1**, layer **N-1** points to layer **N-2**, layer **N-2** points to layer **N-3**, and so on until reaching the end of the linked list, which is layer 1 (the input layer).
+
+![Layers Linked List](https://user-images.githubusercontent.com/16560492/81918975-816af880-95d7-11ea-83e3-34d14c3316db.jpg)
+
+The one-way linked list allows returning all properties of all layers in the network architecture by just passing the last layer in the network. The linked list moves from the output layer towards the input layer.
+
+Using the `previous_layer` attribute of layer **N**, layer **N-1** can be accessed. Using the `previous_layer` attribute of layer **N-1**, layer **N-2** can be accessed. The process continues until reaching a layer that does not have a `previous_layer` attribute (which is the input layer).
+
+The properties of the layers include the weights (initial or trained), activation functions, and more. Here is how a `while` loop is used to iterate through all the layers. The `while` loop stops only when the current layer does not have a `previous_layer` attribute. This layer is the input layer.
+
+```python
+layer = dense_layer2
+
+while "previous_layer" in layer.__init__.__code__.co_varnames:
+    print("Number of neurons =", layer.num_neurons)
+
+    # Go to the previous layer.
+    layer = layer.previous_layer
+```
+
+# Functions to Manipulate Neural Networks
+
+The `pygad.nn` module has a number of functions that help manipulate the neural network.
+
+## `pygad.nn.layers_weights()`
+
+Creates and returns a list holding the weights matrices of all layers in the neural network.
+
+Accepts the following parameters:
+
+- `last_layer`: A reference to the last (output) layer in the network architecture.
+- `initial`: When `True` (default), the function returns the **initial** weights of the layers using the layers' `initial_weights` attribute. When `False`, it returns the **trained** weights of the layers using the layers' `trained_weights` attribute. The initial weights are only needed before network training starts. The trained weights are needed to predict the network outputs.
+
+The function uses a `while` loop to iterate through the layers using their `previous_layer` attribute. For each layer, either the initial weights or the trained weights are returned based on whether the `initial` parameter is `True` or `False`.
+
+## `pygad.nn.layers_weights_as_vector()`
+
+Creates and returns a list holding the weights **vectors** of all layers in the neural network. The weights array of each layer is reshaped to get a vector.
+
+This function is similar to the `layers_weights()` function except that it returns the weights of each layer as a vector, not as an array.
+
+Accepts the following parameters:
+
+- `last_layer`: A reference to the last (output) layer in the network architecture.
+- `initial`: When `True` (default), the function returns the **initial** weights of the layers using the layers' `initial_weights` attribute. When `False`, it returns the **trained** weights of the layers using the layers' `trained_weights` attribute. The initial weights are only needed before network training starts. The trained weights are needed to predict the network outputs.
+
+The function uses a `while` loop to iterate through the layers using their `previous_layer` attribute. For each layer, either the initial weights or the trained weights are returned based on whether the `initial` parameter is `True` or `False`.
+
+## `pygad.nn.layers_weights_as_matrix()`
+
+Converts the network weights from vectors to matrices.
+
+Compared to the `layers_weights_as_vector()` function that only accepts a reference to the last layer and returns the network weights as vectors, this function accepts a reference to the last layer in addition to a list holding the weights as vectors. Such vectors are converted into matrices.
+
+Accepts the following parameters:
+
+- `last_layer`: A reference to the last (output) layer in the network architecture.
+- `vector_weights`: The network weights as vectors where the weights of each layer form a single vector.
+
+The function uses a `while` loop to iterate through the layers using their `previous_layer` attribute. For each layer, the shape of its weights array is returned. This shape is used to reshape the weights vector of the layer into a matrix.
+
+## `pygad.nn.layers_activations()`
+
+Creates and returns a list holding the names of the activation functions of all layers in the neural network.
+
+Accepts the following parameter:
+
+- `last_layer`: A reference to the last (output) layer in the network architecture.
+
+The function uses a `while` loop to iterate through the layers using their `previous_layer` attribute. For each layer, the name of the activation function used is returned using the layer's `activation_function` attribute.
+
+## `pygad.nn.sigmoid()`
+
+Applies the sigmoid function and returns its result.
+
+Accepts the following parameters:
+
+* `sop`: The input to which the sigmoid function is applied.
+
+## `pygad.nn.relu()`
+
+Applies the rectified linear unit (ReLU) function and returns its result.
+
+Accepts the following parameters:
+
+* `sop`: The input to which the ReLU function is applied.
+
+## `pygad.nn.softmax()`
+
+Applies the softmax function and returns its result.
+
+Accepts the following parameters:
+
+* `sop`: The input to which the softmax function is applied.
+
+## `pygad.nn.train()`
+
+Trains the neural network.
+
+Accepts the following parameters:
+
+- `num_epochs`: Number of epochs.
+- `last_layer`: Reference to the last (output) layer in the network architecture.
+- `data_inputs`: Data features.
+- `data_outputs`: Data outputs.
+- `problem_type`: The type of the problem which can be either `"classification"` or `"regression"`. Added in PyGAD 2.7.0 and higher.
+- `learning_rate`: Learning rate.
+
+For each epoch, all the data samples are fed to the network to return their predictions. After each epoch, the weights are updated using only the learning rate. No learning algorithm is used because the purpose of this project is to only build the forward pass of a neural network.
+
+## `pygad.nn.update_weights()`
+
+Calculates and returns the updated weights. Even though no training algorithm is used in this project, the weights are still updated using the learning rate. This is not the best way to update the weights, but it is better than leaving them unchanged because it applies some small changes to the weights. A sketch of this idea is shown after the parameter list below.
+
+Accepts the following parameters:
+
+- `weights`: The current weights of the network.
+- `network_error`: The network error.
+- `learning_rate`: The learning rate.
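+
+To make the idea concrete, here is a minimal sketch of what a learning-rate-only update could look like. This is only an illustration, not the exact `pygad.nn` implementation; the way the error scales the weights here is an assumption.
+
+```python
+def update_weights_sketch(weights, network_error, learning_rate):
+    # Hypothetical illustration: nudge each layer's weights matrix by a
+    # small amount proportional to the network error and the learning rate.
+    # No gradients are computed, so this is not gradient descent.
+    new_weights = []
+    for layer_weights in weights:
+        new_weights.append(layer_weights - learning_rate * network_error * layer_weights)
+    return new_weights
+```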
+
+## `pygad.nn.update_layers_trained_weights()`
+
+After the network weights are trained, this function updates the `trained_weights` attribute of each layer by the weights calculated after passing all the epochs (such weights are passed in the `final_weights` parameter).
+
+By just passing a reference to the last layer in the network (i.e. the output layer) in addition to the final weights, this function updates the `trained_weights` attribute of all layers.
+
+Accepts the following parameters:
+
+- `last_layer`: A reference to the last (output) layer in the network architecture.
+- `final_weights`: An array of weights of all layers in the network after passing through all the epochs.
+
+The function uses a `while` loop to iterate through the layers using their `previous_layer` attribute. For each layer, its `trained_weights` attribute is assigned the weights of the layer from the `final_weights` parameter.
+
+## `pygad.nn.predict()`
+
+Uses the trained weights for predicting the samples' outputs. It returns a list of the predicted outputs for all samples.
+
+Accepts the following parameters:
+
+* `last_layer`: A reference to the last (output) layer in the network architecture.
+* `data_inputs`: Data features.
+* `problem_type`: The type of the problem which can be either `"classification"` or `"regression"`. Added in PyGAD 2.7.0 and higher.
+
+All the data samples are fed to the network to return their predictions.
+
+# Helper Functions
+
+There are functions in the `pygad.nn` module that do not directly manipulate the neural networks.
+
+## `pygad.nn.to_vector()`
+
+Converts the NumPy array (of any dimensionality) passed to its `array` parameter into a 1D vector and returns the vector.
+
+Accepts the following parameters:
+
+* `array`: The NumPy array to be converted into a 1D vector.
+
+## `pygad.nn.to_array()`
+
+Converts the 1D vector passed to its `vector` parameter into a NumPy array of the shape specified by the `shape` parameter and returns the array.
+
+Accepts the following parameters:
+
+- `vector`: The 1D vector to be converted into an array.
+- `shape`: The target shape of the array.
+
+# Supported Activation Functions
+
+The supported activation functions are:
+
+1. Sigmoid: Implemented using the `pygad.nn.sigmoid()` function.
+2. Rectified Linear Unit (ReLU): Implemented using the `pygad.nn.relu()` function.
+3. Softmax: Implemented using the `pygad.nn.softmax()` function.
+
+# Steps to Build a Neural Network
+
+This section discusses how to use the `pygad.nn` module for building a neural network. The steps are summarized as follows:
+
+- Reading the Data
+- Building the Network Architecture
+- Training the Network
+- Making Predictions
+- Calculating Some Statistics
+
+## Reading the Data
+
+Before building the network architecture, the first thing to do is to prepare the data that will be used for training the network.
+
+In this example, 4 classes of the **Fruits360** dataset are used for preparing the training data. The 4 classes are:
+
+1. [**Apple Braeburn**](https://github.com/ahmedfgad/NumPyANN/tree/master/apple): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/apple
+2. [**Lemon Meyer**](https://github.com/ahmedfgad/NumPyANN/tree/master/lemon): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/lemon
+3. [**Mango**](https://github.com/ahmedfgad/NumPyANN/tree/master/mango): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/mango
+4. [**Raspberry**](https://github.com/ahmedfgad/NumPyANN/tree/master/raspberry): This class's data is available at https://github.com/ahmedfgad/NumPyANN/tree/master/raspberry
+
+The features from these 4 classes are extracted according to the next code. This code reads the raw images of the 4 classes of the dataset, prepares the features and the outputs as NumPy arrays, and saves the arrays in 2 files.
+
+This code extracts a feature vector from each image representing the color histogram of the HSV space's hue channel.
+
+```python
+import numpy
+import skimage.io, skimage.color, skimage.feature
+import os
+
+fruits = ["apple", "raspberry", "mango", "lemon"]
+# Number of samples in the dataset used = 492+490+490+490=1,962
+# 360 is the length of the feature vector.
+dataset_features = numpy.zeros(shape=(1962, 360))
+outputs = numpy.zeros(shape=(1962))
+
+idx = 0
+class_label = 0
+for fruit_dir in fruits:
+    curr_dir = os.path.join(os.path.sep, fruit_dir)
+    all_imgs = os.listdir(os.getcwd()+curr_dir)
+    for img_file in all_imgs:
+        if img_file.endswith(".jpg"): # Ensures reading only JPG files.
+            fruit_data = skimage.io.imread(fname=os.path.sep.join([os.getcwd(), curr_dir, img_file]), as_gray=False)
+            fruit_data_hsv = skimage.color.rgb2hsv(rgb=fruit_data)
+            hist = numpy.histogram(a=fruit_data_hsv[:, :, 0], bins=360)
+            dataset_features[idx, :] = hist[0]
+            outputs[idx] = class_label
+            idx = idx + 1
+    class_label = class_label + 1
+
+# Saving the extracted features and the outputs as NumPy files.
+numpy.save("dataset_features.npy", dataset_features)
+numpy.save("outputs.npy", outputs)
+```
+
+To save your time, the training data is already prepared. The 2 files created by the previous code are available for download at these links:
+
+1. [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy): The features https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy
+2. [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy): The class labels https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy
+
+The [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy) file gives the following labels for the 4 classes:
+
+1. [**Apple Braeburn**](https://github.com/ahmedfgad/NumPyANN/tree/master/apple): Class label is **0**
+2. [**Lemon Meyer**](https://github.com/ahmedfgad/NumPyANN/tree/master/lemon): Class label is **1**
+3. [**Mango**](https://github.com/ahmedfgad/NumPyANN/tree/master/mango): Class label is **2**
+4. [**Raspberry**](https://github.com/ahmedfgad/NumPyANN/tree/master/raspberry): Class label is **3**
+
+The project has 4 folders holding the images for the 4 classes.
+
+After the 2 files are created, just read them back to return the NumPy arrays according to the next 2 lines:
+
+```python
+data_inputs = numpy.load("dataset_features.npy")
+data_outputs = numpy.load("outputs.npy")
+```
+
+After the data is prepared, next is to create the network architecture.
+
+## Building the Network Architecture
+
+The input layer is created by instantiating the `pygad.nn.InputLayer` class according to the next code. A network can only have a single input layer.
+
+```python
+import pygad.nn
+num_inputs = data_inputs.shape[1]
+
+input_layer = pygad.nn.InputLayer(num_inputs)
+```
+
+After the input layer is created, next is to create a number of dense layers according to the next code. Normally, the last dense layer is regarded as the output layer. Note that the output layer has a number of neurons equal to the number of classes in the dataset, which is 4.
+
+```python
+HL1_neurons = 150
+HL2_neurons = 60
+
+hidden_layer1 = pygad.nn.DenseLayer(num_neurons=HL1_neurons, previous_layer=input_layer, activation_function="relu")
+hidden_layer2 = pygad.nn.DenseLayer(num_neurons=HL2_neurons, previous_layer=hidden_layer1, activation_function="relu")
+output_layer = pygad.nn.DenseLayer(num_neurons=4, previous_layer=hidden_layer2, activation_function="softmax")
+```
+
+After both the data and the network architecture are prepared, the next step is to train the network.
+
+## Training the Network
+
+Here is an example of using the `pygad.nn.train()` function.
+
+```python
+pygad.nn.train(num_epochs=10,
+               last_layer=output_layer,
+               data_inputs=data_inputs,
+               data_outputs=data_outputs,
+               learning_rate=0.01)
+```
+
+After training the network, the next step is to make predictions.
+
+## Making Predictions
+
+The `pygad.nn.predict()` function uses the trained network for making predictions. Here is an example.
+
+```python
+predictions = pygad.nn.predict(last_layer=output_layer, data_inputs=data_inputs)
+```
+
+It is not expected to have high accuracy in the predictions because no training algorithm is used.
+
+## Calculating Some Statistics
+
+Based on the predictions the network made, some statistics can be calculated such as the number of correct and wrong predictions in addition to the classification accuracy.
+
+```python
+num_wrong = numpy.where(predictions != data_outputs)[0]
+num_correct = data_outputs.size - num_wrong.size
+accuracy = 100 * (num_correct/data_outputs.size)
+print(f"Number of correct classifications : {num_correct}.")
+print(f"Number of wrong classifications : {num_wrong.size}.")
+print(f"Classification accuracy : {accuracy}.")
+```
+
+It is very important to note that the classification accuracy is not expected to be high because no training algorithm is used. Please check the documentation of the `pygad.gann` module for training the network using the genetic algorithm.
+
+# Examples
+
+This section gives the complete code of some examples that build neural networks using `pygad.nn`. Each subsection builds a different network.
+
+## XOR Classification
+
+This example builds a network with 1 hidden layer of 2 neurons to simulate the XOR logic gate. Because the XOR problem has 2 classes (0 and 1), the output layer has 2 neurons, one for each class.
+
+```python
+import numpy
+import pygad.nn
+
+# Preparing the NumPy array of the inputs.
+data_inputs = numpy.array([[1, 1],
+                           [1, 0],
+                           [0, 1],
+                           [0, 0]])
+
+# Preparing the NumPy array of the outputs.
+data_outputs = numpy.array([0,
+                            1,
+                            1,
+                            0])
+
+# The number of inputs (i.e. feature vector length) per sample
+num_inputs = data_inputs.shape[1]
+# Number of outputs per sample
+num_outputs = 2
+
+HL1_neurons = 2
+
+# Building the network architecture.
+input_layer = pygad.nn.InputLayer(num_inputs)
+hidden_layer1 = pygad.nn.DenseLayer(num_neurons=HL1_neurons, previous_layer=input_layer, activation_function="relu")
+output_layer = pygad.nn.DenseLayer(num_neurons=num_outputs, previous_layer=hidden_layer1, activation_function="softmax")
+
+# Training the network.
+pygad.nn.train(num_epochs=10,
+               last_layer=output_layer,
+               data_inputs=data_inputs,
+               data_outputs=data_outputs,
+               learning_rate=0.01)
+
+# Using the trained network for predictions.
+predictions = pygad.nn.predict(last_layer=output_layer, data_inputs=data_inputs) + +# Calculating some statistics +num_wrong = numpy.where(predictions != data_outputs)[0] +num_correct = data_outputs.size - num_wrong.size +accuracy = 100 * (num_correct/data_outputs.size) +print(f"Number of correct classifications : {num_correct}.") +print(f"Number of wrong classifications : {num_wrong.size}.") +print(f"Classification accuracy : {accuracy}.") +``` + +## Image Classification + +This example is discussed in the **Steps to Build a Neural Network** section and its complete code is listed below. + +Remember to either download or create the [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy) and [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy) files before running this code. + +```python +import numpy +import pygad.nn + +# Reading the data features. Check the 'extract_features.py' script for extracting the features & preparing the outputs of the dataset. +data_inputs = numpy.load("dataset_features.npy") # Download from https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy + +# Optional step for filtering the features using the standard deviation. +features_STDs = numpy.std(a=data_inputs, axis=0) +data_inputs = data_inputs[:, features_STDs > 50] + +# Reading the data outputs. Check the 'extract_features.py' script for extracting the features & preparing the outputs of the dataset. +data_outputs = numpy.load("outputs.npy") # Download from https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy + +# The number of inputs (i.e. feature vector length) per sample +num_inputs = data_inputs.shape[1] +# Number of outputs per sample +num_outputs = 4 + +HL1_neurons = 150 +HL2_neurons = 60 + +# Building the network architecture. +input_layer = pygad.nn.InputLayer(num_inputs) +hidden_layer1 = pygad.nn.DenseLayer(num_neurons=HL1_neurons, previous_layer=input_layer, activation_function="relu") +hidden_layer2 = pygad.nn.DenseLayer(num_neurons=HL2_neurons, previous_layer=hidden_layer1, activation_function="relu") +output_layer = pygad.nn.DenseLayer(num_neurons=num_outputs, previous_layer=hidden_layer2, activation_function="softmax") + +# Training the network. +pygad.nn.train(num_epochs=10, + last_layer=output_layer, + data_inputs=data_inputs, + data_outputs=data_outputs, + learning_rate=0.01) + +# Using the trained network for predictions. +predictions = pygad.nn.predict(last_layer=output_layer, data_inputs=data_inputs) + +# Calculating some statistics +num_wrong = numpy.where(predictions != data_outputs)[0] +num_correct = data_outputs.size - num_wrong.size +accuracy = 100 * (num_correct/data_outputs.size) +print(f"Number of correct classifications : {num_correct}.") +print(f"Number of wrong classifications : {num_wrong.size}.") +print(f"Classification accuracy : {accuracy}.") +``` + +## Regression Example 1 + +The next code listing builds a neural network for regression. Here is what to do to make the code works for regression: + +1. Set the `problem_type` parameter in the `pygad.nn.train()` and `pygad.nn.predict()` functions to the string `"regression"`. + +```python +pygad.nn.train(..., + problem_type="regression") + +predictions = pygad.nn.predict(..., + problem_type="regression") +``` + +2. Set the activation function for the output layer to the string `"None"`. + +```python +output_layer = pygad.nn.DenseLayer(num_neurons=num_outputs, previous_layer=hidden_layer1, activation_function="None") +``` + +3. 
Calculate the prediction error according to your preferred error function. Here is how the mean absolute error is calculated. + +```python +abs_error = numpy.mean(numpy.abs(predictions - data_outputs)) +print(f"Absolute error : {abs_error}.") +``` + +Here is the complete code. Yet, there is no algorithm used to train the network and thus the network is expected to give bad results. Later, the `pygad.gann` module is used to train either a regression or classification networks. + +```python +import numpy +import pygad.nn + +# Preparing the NumPy array of the inputs. +data_inputs = numpy.array([[2, 5, -3, 0.1], + [8, 15, 20, 13]]) + +# Preparing the NumPy array of the outputs. +data_outputs = numpy.array([0.1, + 1.5]) + +# The number of inputs (i.e. feature vector length) per sample +num_inputs = data_inputs.shape[1] +# Number of outputs per sample +num_outputs = 1 + +HL1_neurons = 2 + +# Building the network architecture. +input_layer = pygad.nn.InputLayer(num_inputs) +hidden_layer1 = pygad.nn.DenseLayer(num_neurons=HL1_neurons, previous_layer=input_layer, activation_function="relu") +output_layer = pygad.nn.DenseLayer(num_neurons=num_outputs, previous_layer=hidden_layer1, activation_function="None") + +# Training the network. +pygad.nn.train(num_epochs=100, + last_layer=output_layer, + data_inputs=data_inputs, + data_outputs=data_outputs, + learning_rate=0.01, + problem_type="regression") + +# Using the trained network for predictions. +predictions = pygad.nn.predict(last_layer=output_layer, + data_inputs=data_inputs, + problem_type="regression") + +# Calculating some statistics +abs_error = numpy.mean(numpy.abs(predictions - data_outputs)) +print(f"Absolute error : {abs_error}.") +``` + +## Regression Example 2 - Fish Weight Prediction + +This example uses the Fish Market Dataset available at Kaggle (https://www.kaggle.com/aungpyaeap/fish-market). Simply download the CSV dataset from [this link](https://www.kaggle.com/aungpyaeap/fish-market/download) (https://www.kaggle.com/aungpyaeap/fish-market/download). The dataset is also available at the [GitHub project of the pygad.nn module](https://github.com/ahmedfgad/NumPyANN): https://github.com/ahmedfgad/NumPyANN + +Using the Pandas library, the dataset is read using the `read_csv()` function. + +```python +data = numpy.array(pandas.read_csv("Fish.csv")) +``` + +The last 5 columns in the dataset are used as inputs and the **Weight** column is used as output. + +```python +# Preparing the NumPy array of the inputs. +data_inputs = numpy.asarray(data[:, 2:], dtype=numpy.float32) + +# Preparing the NumPy array of the outputs. +data_outputs = numpy.asarray(data[:, 1], dtype=numpy.float32) # Fish Weight +``` + +Note how the activation function at the last layer is set to `"None"`. Moreover, the `problem_type` parameter in the `pygad.nn.train()` and `pygad.nn.predict()` functions is set to `"regression"`. + +After the `pygad.nn.train()` function completes, the mean absolute error is calculated. + +```python +abs_error = numpy.mean(numpy.abs(predictions - data_outputs)) +print(f"Absolute error : {abs_error}.") +``` + +Here is the complete code. + +```python +import numpy +import pygad.nn +import pandas + +data = numpy.array(pandas.read_csv("Fish.csv")) + +# Preparing the NumPy array of the inputs. +data_inputs = numpy.asarray(data[:, 2:], dtype=numpy.float32) + +# Preparing the NumPy array of the outputs. +data_outputs = numpy.asarray(data[:, 1], dtype=numpy.float32) # Fish Weight + +# The number of inputs (i.e. 
feature vector length) per sample +num_inputs = data_inputs.shape[1] +# Number of outputs per sample +num_outputs = 1 + +HL1_neurons = 2 + +# Building the network architecture. +input_layer = pygad.nn.InputLayer(num_inputs) +hidden_layer1 = pygad.nn.DenseLayer(num_neurons=HL1_neurons, previous_layer=input_layer, activation_function="relu") +output_layer = pygad.nn.DenseLayer(num_neurons=num_outputs, previous_layer=hidden_layer1, activation_function="None") + +# Training the network. +pygad.nn.train(num_epochs=100, + last_layer=output_layer, + data_inputs=data_inputs, + data_outputs=data_outputs, + learning_rate=0.01, + problem_type="regression") + +# Using the trained network for predictions. +predictions = pygad.nn.predict(last_layer=output_layer, + data_inputs=data_inputs, + problem_type="regression") + +# Calculating some statistics +abs_error = numpy.mean(numpy.abs(predictions - data_outputs)) +print(f"Absolute error : {abs_error}.") +``` + diff --git a/docs/md/pygad.md b/docs/md/pygad.md new file mode 100644 index 0000000..501916c --- /dev/null +++ b/docs/md/pygad.md @@ -0,0 +1,992 @@ +# `pygad` Module + +This section of the PyGAD's library documentation discusses the `pygad` module. + +Using the `pygad` module, instances of the genetic algorithm can be created, run, saved, and loaded. Single-objective and multi-objective optimization problems can be solved. + +# `pygad.GA` Class + +The first module available in PyGAD is named `pygad` and contains a class named `GA` for building the genetic algorithm. The constructor, methods, function, and attributes within the class are discussed in this section. + +## `__init__()` + +For creating an instance of the `pygad.GA` class, the constructor accepts several parameters that allow the user to customize the genetic algorithm to different types of applications. + +The `pygad.GA` class constructor supports the following parameters: + +- `num_generations`: Number of generations. +- `num_parents_mating `: Number of solutions to be selected as parents. +- `fitness_func`: Accepts a function/method and returns the fitness value(s) of the solution. If a function is passed, then it must accept 3 parameters (1. the instance of the `pygad.GA` class, 2. a single solution, and 3. its index in the population). If method, then it accepts a fourth parameter representing the method's class instance. Check the [Preparing the fitness_func Parameter](https://pygad.readthedocs.io/en/latest/pygad.html#preparing-the-fitness-func-parameter) section for information about creating such a function. In [PyGAD 3.2.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-2-0), multi-objective optimization is supported. To consider the problem as multi-objective, just return a `list`, `tuple`, or `numpy.ndarray` from the fitness function. +- `fitness_batch_size=None`: A new optional parameter called `fitness_batch_size` is supported to calculate the fitness function in batches. If it is assigned the value `1` or `None` (default), then the normal flow is used where the fitness function is called for each individual solution. If the `fitness_batch_size` parameter is assigned a value satisfying this condition `1 < fitness_batch_size <= sol_per_pop`, then the solutions are grouped into batches of size `fitness_batch_size` and the fitness function is called once for each batch. Check the [Batch Fitness Calculation](https://pygad.readthedocs.io/en/latest/pygad_more.html#batch-fitness-calculation) section for more details and examples. 
Added in from [PyGAD 2.19.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-19-0). +- `initial_population`: A user-defined initial population. It is useful when the user wants to start the generations with a custom initial population. It defaults to `None` which means no initial population is specified by the user. In this case, [PyGAD](https://pypi.org/project/pygad) creates an initial population using the `sol_per_pop` and `num_genes` parameters. An exception is raised if the `initial_population` is `None` while any of the 2 parameters (`sol_per_pop` or `num_genes`) is also `None`. Introduced in [PyGAD 2.0.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-0-0) and higher. +- `sol_per_pop`: Number of solutions (i.e. chromosomes) within the population. This parameter has no action if `initial_population` parameter exists. +- `num_genes`: Number of genes in the solution/chromosome. This parameter is not needed if the user feeds the initial population to the `initial_population` parameter. +- `gene_type=float`: Controls the gene type. It can be assigned to a single data type that is applied to all genes or can specify the data type of each individual gene. It defaults to `float` which means all genes are of `float` data type. Starting from [PyGAD 2.9.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-9-0), the `gene_type` parameter can be assigned to a numeric value of any of these types: `int`, `float`, and `numpy.int/uint/float(8-64)`. Starting from [PyGAD 2.14.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-14-0), it can be assigned to a `list`, `tuple`, or a `numpy.ndarray` which hold a data type for each gene (e.g. `gene_type=[int, float, numpy.int8]`). This helps to control the data type of each individual gene. In [PyGAD 2.15.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-15-0), a precision for the `float` data types can be specified (e.g. `gene_type=[float, 2]`. +- `init_range_low=-4`: The lower value of the random range from which the gene values in the initial population are selected. `init_range_low` defaults to `-4`. Available in [PyGAD 1.0.20](https://pygad.readthedocs.io/en/latest/releases.html#pygad-1-0-20) and higher. This parameter has no action if the `initial_population` parameter exists. +- `init_range_high=4`: The upper value of the random range from which the gene values in the initial population are selected. `init_range_high` defaults to `+4`. Available in [PyGAD 1.0.20](https://pygad.readthedocs.io/en/latest/releases.html#pygad-1-0-20) and higher. This parameter has no action if the `initial_population` parameter exists. +- `parent_selection_type="sss"`: The parent selection type. Supported types are `sss` (for steady-state selection), `rws` (for roulette wheel selection), `sus` (for stochastic universal selection), `rank` (for rank selection), `random` (for random selection), and `tournament` (for tournament selection). A custom parent selection function can be passed starting from [PyGAD 2.16.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-16-0). Check the [User-Defined Crossover, Mutation, and Parent Selection Operators](https://pygad.readthedocs.io/en/latest/utils.html#user-defined-crossover-mutation-and-parent-selection-operators) section for more details about building a user-defined parent selection function. +- `keep_parents=-1`: Number of parents to keep in the current population. `-1` (default) means to keep all parents in the next population. 
`0` means keep no parents in the next population. A value `greater than 0` means keeps the specified number of parents in the next population. Note that the value assigned to `keep_parents` cannot be `< - 1` or greater than the number of solutions within the population `sol_per_pop`. Starting from [PyGAD 2.18.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-18-0), this parameter have an effect only when the `keep_elitism` parameter is `0`. Starting from [PyGAD 2.20.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-20-0), the parents' fitness from the last generation will not be re-used if `keep_parents=0`. +- `keep_elitism=1`: Added in [PyGAD 2.18.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-18-0). It can take the value `0` or a positive integer that satisfies (`0 <= keep_elitism <= sol_per_pop`). It defaults to `1` which means only the best solution in the current generation is kept in the next generation. If assigned `0`, this means it has no effect. If assigned a positive integer `K`, then the best `K` solutions are kept in the next generation. It cannot be assigned a value greater than the value assigned to the `sol_per_pop` parameter. If this parameter has a value different than `0`, then the `keep_parents` parameter will have no effect. +- `K_tournament=3`: In case that the parent selection type is `tournament`, the `K_tournament` specifies the number of parents participating in the tournament selection. It defaults to `3`. +- `crossover_type="single_point"`: Type of the crossover operation. Supported types are `single_point` (for single-point crossover), `two_points` (for two points crossover), `uniform` (for uniform crossover), and `scattered` (for scattered crossover). Scattered crossover is supported from PyGAD [2.9.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-9-0) and higher. It defaults to `single_point`. A custom crossover function can be passed starting from [PyGAD 2.16.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-16-0). Check the [User-Defined Crossover, Mutation, and Parent Selection Operators](https://pygad.readthedocs.io/en/latest/pygad_more.html#user-defined-crossover-mutation-and-parent-selection-operators) section for more details about creating a user-defined crossover function. Starting from [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) and higher, if `crossover_type=None`, then the crossover step is bypassed which means no crossover is applied and thus no offspring will be created in the next generations. The next generation will use the solutions in the current population. +- `crossover_probability=None`: The probability of selecting a parent for applying the crossover operation. Its value must be between 0.0 and 1.0 inclusive. For each parent, a random value between 0.0 and 1.0 is generated. If this random value is less than or equal to the value assigned to the `crossover_probability` parameter, then the parent is selected. Added in [PyGAD 2.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-5-0) and higher. +- `mutation_type="random"`: Type of the mutation operation. Supported types are `random` (for random mutation), `swap` (for swap mutation), `inversion` (for inversion mutation), `scramble` (for scramble mutation), and `adaptive` (for adaptive mutation). It defaults to `random`. A custom mutation function can be passed starting from [PyGAD 2.16.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-16-0). 
Check the [User-Defined Crossover, Mutation, and Parent Selection Operators](https://pygad.readthedocs.io/en/latest/pygad_more.html#user-defined-crossover-mutation-and-parent-selection-operators) section for more details about creating a user-defined mutation function. Starting from [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) and higher, if `mutation_type=None`, then the mutation step is bypassed, which means no mutation is applied and thus no changes are applied to the offspring created using the crossover operation. The offspring will be used unchanged in the next generation. Adaptive mutation is supported starting from [PyGAD 2.10.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-10-0). For more information about adaptive mutation, go to the [Adaptive Mutation](https://pygad.readthedocs.io/en/latest/pygad_more.html#adaptive-mutation) section. For an example of using adaptive mutation, check the [Use Adaptive Mutation in PyGAD](https://pygad.readthedocs.io/en/latest/pygad_more.html#use-adaptive-mutation-in-pygad) section.
+- `mutation_probability=None`: The probability of selecting a gene for applying the mutation operation. Its value must be between 0.0 and 1.0 inclusive. For each gene in a solution, a random value between 0.0 and 1.0 is generated. If this random value is less than or equal to the value assigned to the `mutation_probability` parameter, then the gene is selected. If this parameter exists, then there is no need for the 2 parameters `mutation_percent_genes` and `mutation_num_genes`. Added in [PyGAD 2.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-5-0).
+- `mutation_by_replacement=False`: An optional bool parameter. It works only when the selected type of mutation is random (`mutation_type="random"`). In this case, `mutation_by_replacement=True` means replace the gene by the randomly generated value. If `False`, then it has no effect and random mutation works by adding the random value to the gene. Supported in [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) and higher. Check the changes in [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) under the Release History section for an example.
+- `mutation_percent_genes="default"`: Percentage of genes to mutate. It defaults to the string `"default"`, which is later translated into the integer `10`, meaning 10% of the genes will be mutated. It must be `>0` and `<=100`. Out of this percentage, the number of genes to mutate is deduced and assigned to the `mutation_num_genes` parameter. The `mutation_percent_genes` parameter has no action if `mutation_probability` or `mutation_num_genes` exists. Starting from [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) and higher, this parameter has no action if `mutation_type` is `None`.
+- `mutation_num_genes=None`: Number of genes to mutate. It defaults to `None`, meaning no number is specified. The `mutation_num_genes` parameter has no action if the parameter `mutation_probability` exists. Starting from [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) and higher, this parameter has no action if `mutation_type` is `None`.
+- `random_mutation_min_val=-1.0`: For `random` mutation, the `random_mutation_min_val` parameter specifies the start value of the range from which a random value is selected to be added to the gene. It defaults to `-1`.
Starting from [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) and higher, this parameter has no action if `mutation_type` is `None`.
+- `random_mutation_max_val=1.0`: For `random` mutation, the `random_mutation_max_val` parameter specifies the end value of the range from which a random value is selected to be added to the gene. It defaults to `+1`. Starting from [PyGAD 2.2.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-2-2) and higher, this parameter has no action if `mutation_type` is `None`.
+- `gene_space=None`: It is used to specify the possible values for each gene in case the user wants to restrict the gene values. It is useful if the gene space is restricted to a certain range or to discrete values. It accepts a `list`, `range`, or `numpy.ndarray`. When all genes have the same global space, specify their values as a `list`/`tuple`/`range`/`numpy.ndarray`. For example, `gene_space = [0.3, 5.2, -4, 8]` restricts the gene values to the 4 specified values. If each gene has its own space, then the `gene_space` parameter can be nested like `[[0.4, -5], [0.5, -3.2, 8.2, -9], ...]` where the first sublist determines the values for the first gene, the second sublist for the second gene, and so on. If the nested list/tuple has a `None` value, then the gene's initial value is selected randomly from the range specified by the 2 parameters `init_range_low` and `init_range_high`, and its mutation value is selected randomly from the range specified by the 2 parameters `random_mutation_min_val` and `random_mutation_max_val`. `gene_space` was added in [PyGAD 2.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-5-0). Check the [Release History of PyGAD 2.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-5-0) section of the documentation for more details. In [PyGAD 2.9.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-9-0), NumPy arrays can be assigned to the `gene_space` parameter. In [PyGAD 2.11.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-11-0), the `gene_space` parameter itself or any of its elements can be assigned a dictionary to specify the lower and upper limits of the genes. For example, `{'low': 2, 'high': 4}` means the minimum and maximum values are 2 and 4, respectively. In [PyGAD 2.15.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-15-0), a new key called `"step"` is supported to specify the step of moving from the start to the end of the range specified by the 2 existing keys `"low"` and `"high"`.
+- `gene_constraint=None`: A list of callables (i.e. functions) acting as constraints for the gene values. Before selecting a value for a gene, the callable is called to ensure the candidate value is valid. Added in [PyGAD 3.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-5-0). Check the [Gene Constraint](https://pygad.readthedocs.io/en/latest/pygad_more.html#gene-constraint) section for more information.
+- `sample_size=100`: In some cases where a gene value is to be selected, this variable defines the size of the sample from which a value is selected randomly. It is useful if either `allow_duplicate_genes` or `gene_constraint` is used. If PyGAD fails to find a unique value or a value that meets a gene constraint, it is recommended to increase this parameter's value. Added in [PyGAD 3.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-5-0).
Check the [sample_size Parameter](https://pygad.readthedocs.io/en/latest/pygad_more.html#sample-size-parameter) section for more information.
+- `on_start=None`: Accepts a function/method to be called only once before the genetic algorithm starts its evolution. If a function, then it must accept a single parameter representing the instance of the genetic algorithm. If a method, then it must accept 2 parameters where the second one refers to the method's object. Added in [PyGAD 2.6.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-6-0).
+- `on_fitness=None`: Accepts a function/method to be called after calculating the fitness values of all solutions in the population. If a function, then it must accept 2 parameters: 1) the instance of the genetic algorithm 2) a list of all solutions' fitness values. If a method, then it must accept 3 parameters where the third one refers to the method's object. Added in [PyGAD 2.6.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-6-0).
+- `on_parents=None`: Accepts a function/method to be called after selecting the parents that mate. If a function, then it must accept 2 parameters: 1) the instance of the genetic algorithm 2) the selected parents. If a method, then it must accept 3 parameters where the third one refers to the method's object. Added in [PyGAD 2.6.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-6-0).
+- `on_crossover=None`: Accepts a function to be called each time the crossover operation is applied. This function must accept 2 parameters: the first one represents the instance of the genetic algorithm and the second one represents the offspring generated using crossover. Added in [PyGAD 2.6.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-6-0).
+- `on_mutation=None`: Accepts a function to be called each time the mutation operation is applied. This function must accept 2 parameters: the first one represents the instance of the genetic algorithm and the second one represents the offspring after applying the mutation. Added in [PyGAD 2.6.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-6-0).
+- `on_generation=None`: Accepts a function to be called after each generation. This function must accept a single parameter representing the instance of the genetic algorithm. If the function returns the string `stop`, then the `run()` method stops without completing the remaining generations. Added in [PyGAD 2.6.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-6-0).
+- `on_stop=None`: Accepts a function to be called only once exactly before the genetic algorithm stops or when it completes all the generations. This function must accept 2 parameters: the first one represents the instance of the genetic algorithm and the second one is a list of fitness values of the last population's solutions. Added in [PyGAD 2.6.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-6-0).
+- `save_best_solutions=False`: When `True`, the best solution after each generation is saved into an attribute named `best_solutions`. If `False` (default), then no solutions are saved and the `best_solutions` attribute will be empty. Supported in [PyGAD 2.9.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-9-0).
+- `save_solutions=False`: If `True`, then all solutions in each generation are appended into an attribute called `solutions`, which is a NumPy array. Supported in [PyGAD 2.15.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-15-0).
+- `suppress_warnings=False`: A bool parameter to control whether the warning messages are printed or not. It defaults to `False`.
+- `allow_duplicate_genes=True`: Added in [PyGAD 2.13.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-13-0). If `True`, then a solution/chromosome may have duplicate gene values. If `False`, then each gene will have a unique value in its solution.
+- `stop_criteria=None`: Some criteria to stop the evolution. Added in [PyGAD 2.15.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-15-0). Each criterion is passed as a `str` that starts with a stop word. The current 2 supported words are `reach` and `saturate`. `reach` stops the `run()` method if the fitness value is equal to or greater than a given fitness value. An example for `reach` is `"reach_40"`, which stops the evolution if the fitness is >= 40. `saturate` stops the evolution if the fitness saturates for a given number of consecutive generations. An example for `saturate` is `"saturate_7"`, which stops the `run()` method if the fitness does not change for 7 consecutive generations.
+- `parallel_processing=None`: Added in [PyGAD 2.17.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-17-0). If `None` (default), then no parallel processing is applied. It can accept a list/tuple of 2 elements: 1) either `'process'` or `'thread'` to indicate whether processes or threads are used, respectively; 2) the number of processes or threads to use. For example, `parallel_processing=['process', 10]` applies parallel processing with 10 processes. If a positive integer is assigned, then it is used as the number of threads. For example, `parallel_processing=5` uses 5 threads, which is equivalent to `parallel_processing=["thread", 5]`. For more information, check the [Parallel Processing in PyGAD](https://pygad.readthedocs.io/en/latest/pygad_more.html#parallel-processing-in-pygad) section.
+- `random_seed=None`: Added in [PyGAD 2.18.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-18-0). It defines the random seed to be used by the random function generators (random functions in the NumPy and random modules are used). This helps to reproduce the same results by setting the same random seed (e.g. `random_seed=2`). If given the value `None`, then it has no effect.
+- `logger=None`: Accepts an instance of the `logging.Logger` class to log the outputs. Any message is no longer printed using `print()` but logged. If `logger=None`, then a logger is created that uses `StreamHandler` to log the messages to the console. Added in [PyGAD 3.0.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-0-0). Check the [Logging Outputs](https://pygad.readthedocs.io/en/latest/pygad_more.html#logging-outputs) section for more information.
+
+The user doesn't have to specify all of these parameters while creating an instance of the GA class. A very important parameter you must care about is `fitness_func`, which defines the fitness function.
+
+It is OK to set the value of any of the 2 parameters `init_range_low` and `init_range_high` to be equal, higher, or lower than the other parameter (i.e. `init_range_low` does not need to be lower than `init_range_high`). The same holds for the `random_mutation_min_val` and `random_mutation_max_val` parameters.
+
+If the 2 parameters `mutation_type` and `crossover_type` are `None`, this disables any type of evolution the genetic algorithm can make.
As a result, the genetic algorithm cannot find a better solution than the best solution in the initial population.
+
+The parameters are validated within the constructor. If at least one parameter is incorrect, an exception is thrown.
+
+## Plotting Methods in `pygad.GA` Class
+
+- `plot_fitness()`: Shows how the fitness evolves by generation.
+- `plot_genes()`: Shows how the gene value changes for each generation.
+- `plot_new_solution_rate()`: Shows the number of new solutions explored in each generation.
+
+## Class Attributes
+
+* `supported_int_types`: A list of the supported types for the integer numbers.
+* `supported_float_types`: A list of the supported types for the floating-point numbers.
+* `supported_int_float_types`: A list of the supported types for all numbers. It just concatenates the previous 2 lists.
+
+## Other Instance Attributes & Methods
+
+All the parameters and functions passed to the `pygad.GA` class constructor are used as class attributes and methods in the instances of the `pygad.GA` class. In addition to such attributes, there are other attributes and methods added to the instances of the `pygad.GA` class.
+
+The next 2 subsections list such attributes and methods.
+
+### Other Attributes
+
+- `generations_completed`: Holds the number of the last completed generation.
+- `population`: A NumPy array holding the current population. Initially, it holds the initial population; it is updated after each generation.
+- `valid_parameters`: Set to `True` when all the parameters passed in the `GA` class constructor are valid.
+- `run_completed`: Set to `True` only after the `run()` method completes gracefully.
+- `pop_size`: The population size.
+- `best_solutions_fitness`: A list holding the fitness values of the best solutions for all generations.
+- `best_solution_generation`: The generation number at which the best fitness value is reached. It is only assigned the generation number after the `run()` method completes. Otherwise, its value is -1.
+- `best_solutions`: A NumPy array holding the best solution in each generation. It only exists when the `save_best_solutions` parameter in the `pygad.GA` class constructor is set to `True`.
+- `last_generation_fitness`: The fitness values of the solutions in the last generation. [Added in PyGAD 2.12.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-12-0).
+- `previous_generation_fitness`: At the end of each generation, the fitness of the most recent population is saved in the `last_generation_fitness` attribute, while the fitness of the population exactly preceding it is saved in the `previous_generation_fitness` attribute. This `previous_generation_fitness` attribute is used to fetch the pre-calculated fitness instead of calling the fitness function for already explored solutions. [Added in PyGAD 2.16.2](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-16-2).
+- `last_generation_parents`: The parents selected from the last generation. [Added in PyGAD 2.12.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-12-0).
+- `last_generation_offspring_crossover`: The offspring generated after applying the crossover in the last generation. [Added in PyGAD 2.12.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-12-0).
+- `last_generation_offspring_mutation`: The offspring generated after applying the mutation in the last generation. [Added in PyGAD 2.12.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-12-0).
+- `gene_type_single`: A flag that is set to `True` if the `gene_type` parameter is assigned a single data type that is applied to all genes. If `gene_type` is assigned a `list`, `tuple`, or `numpy.ndarray`, then the value of `gene_type_single` will be `False`. [Added in PyGAD 2.14.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-14-0).
+- `last_generation_parents_indices`: This attribute holds the indices of the selected parents in the last generation. Supported in [PyGAD 2.15.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-15-0).
+- `last_generation_elitism`: This attribute holds the elitism of the last generation. It is effective only if the `keep_elitism` parameter has a non-zero value. Supported in [PyGAD 2.18.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-18-0).
+- `last_generation_elitism_indices`: This attribute holds the indices of the elitism of the last generation. It is effective only if the `keep_elitism` parameter has a non-zero value. Supported in [PyGAD 2.19.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-19-0).
+- `logger`: This attribute holds the logger from the `logging` module. Supported in [PyGAD 3.0.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-0-0).
+- `gene_space_unpacked`: This is the unpacked version of the `gene_space` parameter. For example, `range(1, 5)` is unpacked to `[1, 2, 3, 4]`. A continuous range like `{'low': 2, 'high': 4}` is unpacked to a limited number of values (e.g. 100). Supported in [PyGAD 3.1.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-1-0).
+- `pareto_fronts`: An instance attribute that holds the Pareto fronts when solving a multi-objective problem. Supported in [PyGAD 3.2.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-2-0).
+
+Note that the attributes with names starting with `last_generation_` are updated after each generation.
+
+### Other Methods
+
+- `cal_pop_fitness()`: A method that calculates the fitness values for all solutions within the population by calling the function passed to the `fitness_func` parameter for each solution.
+- `crossover()`: Refers to the method that applies the crossover operator based on the selected type of crossover in the `crossover_type` property.
+- `mutation()`: Refers to the method that applies the mutation operator based on the selected type of mutation in the `mutation_type` property.
+- `select_parents()`: Refers to a method that selects the parents based on the parent selection type specified in the `parent_selection_type` attribute.
+- `adaptive_mutation_population_fitness()`: Returns the average fitness value used in the adaptive mutation to filter the solutions.
+- `summary()`: Prints a Keras-like summary of the PyGAD lifecycle. This helps to have an overview of the architecture. Supported in [PyGAD 2.19.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-19-0). Check the [Print Lifecycle Summary](https://pygad.readthedocs.io/en/latest/pygad_more.html#print-lifecycle-summary) section for more details and examples.
+- 4 methods with names starting with `run_`. Their purpose is to keep the main loop inside the `run()` method clean. The details inside the loop are moved to 4 individual methods. Generally, any method with a name starting with `run_` is meant to be called by PyGAD from inside the `run()` method.
Supported in [PyGAD 3.3.1](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-3-1).
+  1. `run_select_parents(call_on_parents=True)`: Selects the parents and calls the callable `on_parents()` if defined. If `call_on_parents` is `True`, then the callable `on_parents()` is called. It must be `False` when the `run_select_parents()` method is called to update the parents at the end of the `run()` method.
+  2. `run_crossover()`: Applies crossover and calls the callable `on_crossover()` if defined.
+  3. `run_mutation()`: Applies mutation and calls the callable `on_mutation()` if defined.
+  4. `run_update_population()`: Updates the `population` attribute after completing the processes of crossover and mutation.
+
+There are many methods that are not designed for direct user usage. Some of them are listed above, but this is not a comprehensive list. The [release history](https://pygad.readthedocs.io/en/latest/releases.html) section usually covers them. Moreover, you can check the [PyGAD GitHub repository](https://github.com/ahmedfgad/GeneticAlgorithmPython) to find more.
+
+The next sections discuss the methods available in the `pygad.GA` class.
+
+## `initialize_population()`
+
+It creates an initial population randomly as a NumPy array. The array is saved in the instance attribute named `population`.
+
+Accepts the following parameters:
+
+- `low`: The lower value of the random range from which the gene values in the initial population are selected. It defaults to -4. Available in PyGAD 1.0.20 and higher.
+- `high`: The upper value of the random range from which the gene values in the initial population are selected. It defaults to +4. Available in PyGAD 1.0.20 and higher.
+
+This method assigns the values of the following 3 instance attributes:
+
+1. `pop_size`: Size of the population.
+2. `population`: Initially, it holds the initial population and is later updated after each generation.
+3. `initial_population`: Keeps a copy of the initial population.
+
+## `cal_pop_fitness()`
+
+The `cal_pop_fitness()` method calculates and returns the fitness values of the solutions in the current population.
+
+This function is optimized to save time by making fewer calls to the fitness function. It follows this process:
+
+1. If the `save_solutions` parameter is set to `True`, then it checks if the solution is already explored and saved in the `solutions` instance attribute. If so, then it just retrieves its fitness from the `solutions_fitness` instance attribute without calling the fitness function.
+2. If `save_solutions` is set to `False`, or if it is `True` but the solution was not explored yet, then the `cal_pop_fitness()` method checks if the `keep_elitism` parameter is set to a positive integer. If so, then it checks if the solution is saved into the `last_generation_elitism` instance attribute. If so, then it retrieves its fitness from the `previous_generation_fitness` instance attribute.
+3. If the fitness is still not retrieved (because `save_solutions` is `False` or the solution was not explored yet, and `keep_elitism` is set to zero or did not hold the solution), then the `cal_pop_fitness()` method checks if the `keep_parents` parameter is set to `-1` or a positive integer. If so, then it checks if the solution is saved into the `last_generation_parents` instance attribute. If so, then it retrieves its fitness from the `previous_generation_fitness` instance attribute.
+4. If none of the above conditions apply, then the fitness function has to be called to calculate the fitness for the solution.
This is done by calling the function assigned to the `fitness_func` parameter.
+
+This function takes into consideration:
+
+1. The `parallel_processing` parameter to check whether parallel processing is in effect.
+2. The `fitness_batch_size` parameter to check if the fitness should be calculated in batches of solutions.
+
+It returns a vector of the solutions' fitness values.
+
+## `run()`
+
+Runs the genetic algorithm. This is the main method in which the genetic algorithm is evolved through some generations. It accepts no parameters as it uses the instance to access all of its requirements.
+
+For each generation, the fitness values of all solutions within the population are calculated according to the `cal_pop_fitness()` method, which internally just calls the function assigned to the `fitness_func` parameter in the `pygad.GA` class constructor for each solution.
+
+According to the fitness values of all solutions, the parents are selected using the `select_parents()` method. This method's behaviour is determined according to the parent selection type in the `parent_selection_type` parameter in the `pygad.GA` class constructor.
+
+Based on the selected parents, offspring are generated by applying the crossover and mutation operations using the `crossover()` and `mutation()` methods. The behaviour of these 2 methods is defined according to the `crossover_type` and `mutation_type` parameters in the `pygad.GA` class constructor.
+
+After the generation completes, the following takes place:
+
+- The `population` attribute is updated by the new population.
+- The `generations_completed` attribute is assigned the number of the last completed generation.
+- If there is a callback function assigned to the `on_generation` attribute, then it will be called.
+
+After the `run()` method completes, the following takes place:
+
+- The `best_solution_generation` is assigned the generation number at which the best fitness value is reached.
+- The `run_completed` attribute is set to `True`.
+
+## Parent Selection Methods
+
+The `ParentSelection` class in the `pygad.utils.parent_selection` module has several methods for selecting the parents that will mate to produce the offspring. All of these methods accept the same parameters, which are:
+
+* `fitness`: The fitness values of the solutions in the current population.
+* `num_parents`: The number of parents to be selected.
+
+All of these methods return an array of the selected parents.
+
+The next subsections list the supported methods for parent selection.
+
+### `steady_state_selection()`
+
+Selects the parents using the steady-state selection technique.
+
+### `rank_selection()`
+
+Selects the parents using the rank selection technique.
+
+### `random_selection()`
+
+Selects the parents randomly.
+
+### `tournament_selection()`
+
+Selects the parents using the tournament selection technique.
+
+### `roulette_wheel_selection()`
+
+Selects the parents using the roulette wheel selection technique.
+
+### `stochastic_universal_selection()`
+
+Selects the parents using the stochastic universal selection technique.
+
+### `nsga2_selection()`
+
+Selects the parents for the NSGA-II algorithm to solve multi-objective optimization problems. It selects the parents by ranking them based on non-dominated sorting and crowding distance.
+
+### `tournament_selection_nsga2()`
+
+Selects the parents for the NSGA-II algorithm to solve multi-objective optimization problems.
It selects the parents using the tournament selection technique, applied based on non-dominated sorting and crowding distance.
+
+## Crossover Methods
+
+The `Crossover` class in the `pygad.utils.crossover` module supports several methods for applying crossover between the selected parents. All of these methods accept the same parameters, which are:
+
+* `parents`: The parents to mate for producing the offspring.
+* `offspring_size`: The size of the offspring to produce.
+
+All of these methods return an array of the produced offspring.
+
+The next subsections list the supported methods for crossover.
+
+### `single_point_crossover()`
+
+Applies the single-point crossover. It selects a point randomly at which crossover takes place between the pairs of parents.
+
+### `two_points_crossover()`
+
+Applies the 2 points crossover. It selects the 2 points randomly at which crossover takes place between the pairs of parents.
+
+### `uniform_crossover()`
+
+Applies the uniform crossover. For each gene, a parent out of the 2 mating parents is selected randomly and the gene is copied from it.
+
+### `scattered_crossover()`
+
+Applies the scattered crossover. It randomly selects each gene from one of the 2 parents.
+
+## Mutation Methods
+
+The `Mutation` class in the `pygad.utils.mutation` module supports several methods for applying mutation. All of these methods accept the same parameter, which is:
+
+* `offspring`: The offspring to mutate.
+
+All of these methods return an array of the mutated offspring.
+
+The next subsections list the supported methods for mutation.
+
+### `random_mutation()`
+
+Applies the random mutation which changes the values of some genes randomly. The number of genes is specified according to either the `mutation_num_genes` or the `mutation_percent_genes` attributes.
+
+For each gene, a random value is selected according to the range specified by the 2 attributes `random_mutation_min_val` and `random_mutation_max_val`. The random value is added to the selected gene.
+
+### `swap_mutation()`
+
+Applies the swap mutation which interchanges the values of 2 randomly selected genes.
+
+### `inversion_mutation()`
+
+Applies the inversion mutation which selects a subset of genes and inverts them.
+
+### `scramble_mutation()`
+
+Applies the scramble mutation which selects a subset of genes and shuffles their order randomly.
+
+### `adaptive_mutation()`
+
+Applies the adaptive mutation which selects the number/percentage of genes to mutate based on the solution's fitness. If the fitness is high (i.e. the solution quality is high), then a small number/percentage of genes is mutated compared to a solution with a low fitness.
+
+## `best_solution()`
+
+Returns information about the best solution found by the genetic algorithm.
+
+It accepts the following parameters:
+
+* `pop_fitness=None`: An optional parameter that accepts a list of the fitness values of the solutions in the population. If `None`, then the `cal_pop_fitness()` method is called to calculate the fitness values of the population.
+
+It returns the following:
+
+* `best_solution`: Best solution in the current population.
+
+* `best_solution_fitness`: Fitness value of the best solution.
+
+* `best_match_idx`: Index of the best solution in the current population.
+
+## `plot_fitness()`
+
+Previously named `plot_result()`, this method creates, shows, and returns a figure that summarizes how the fitness value evolves by generation.
+
+It works only after completing at least 1 generation.
If no generation is completed, an exception is raised.
+
+## `plot_new_solution_rate()`
+
+The `plot_new_solution_rate()` method creates, shows, and returns a figure that shows the number of new solutions explored in each generation. This method works only when `save_solutions=True` in the constructor of the `pygad.GA` class.
+
+It works only after completing at least 1 generation. If no generation is completed, an exception is raised.
+
+## `plot_genes()`
+
+The `plot_genes()` method creates, shows, and returns a figure that describes each gene. It has different options for creating the figures, which helps to:
+
+1. Explore the gene value for each generation by creating a normal plot.
+2. Create a histogram for each gene.
+3. Create a boxplot.
+
+This is controlled by the `graph_type` parameter.
+
+It works only after completing at least 1 generation. If no generation is completed, an exception is raised.
+
+## `save()`
+
+Saves the genetic algorithm instance.
+
+Accepts the following parameter:
+
+* `filename`: Name of the file to save the instance. No extension is needed.
+
+# Functions in `pygad`
+
+Besides the methods available in the `pygad.GA` class, this section discusses the functions available in `pygad`. Up to this time, there is only a single function named `load()`.
+
+## `pygad.load()`
+
+Reads a saved instance of the genetic algorithm. This is not a method but a function defined directly under the `pygad` module. So, it can be called from the `pygad` module as follows: `pygad.load(filename)`.
+
+Accepts the following parameter:
+
+* `filename`: Name of the file holding the saved instance of the genetic algorithm. No extension is needed.
+
+Returns the genetic algorithm instance.
+
+# Steps to Use `pygad`
+
+To use the `pygad` module, here is a summary of the required steps:
+
+1. Preparing the `fitness_func` parameter.
+2. Preparing Other Parameters.
+3. Import `pygad`.
+4. Create an Instance of the `pygad.GA` Class.
+5. Run the Genetic Algorithm.
+6. Plotting Results.
+7. Information about the Best Solution.
+8. Saving & Loading the Results.
+
+Let's discuss how to do each of these steps.
+
+## Preparing the `fitness_func` Parameter
+
+Even though some steps in the genetic algorithm pipeline can work the same regardless of the problem being solved, one critical step is the calculation of the fitness value. There is no unique way of calculating the fitness value and it changes from one problem to another.
+
+PyGAD has a parameter called `fitness_func` that allows the user to specify a custom function/method to use when calculating the fitness. This function/method must be a maximization function/method so that a solution with a higher fitness value is favored over a solution with a lower value.
+
+The fitness function is where the user can decide whether the optimization problem is single-objective or multi-objective.
+
+* If the fitness function returns a numeric value, then the problem is single-objective. The numeric data types supported by PyGAD are listed in the `supported_int_float_types` variable of the `pygad.GA` class.
+* If the fitness function returns a `list`, `tuple`, or `numpy.ndarray`, then the problem is multi-objective. Even if there is only one element, the problem is still considered multi-objective. Each element represents the fitness value of its corresponding objective.
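+
+As a quick, minimal sketch of the two return styles (the two toy objectives below are invented purely for illustration):
+
+```python
+import numpy
+
+def fitness_single(ga_instance, solution, solution_idx):
+    # Returning one number makes the problem single-objective.
+    return 1.0 / (numpy.abs(numpy.sum(solution) - 44) + 0.000001)
+
+def fitness_multi(ga_instance, solution, solution_idx):
+    # Returning a list makes the problem multi-objective (even with one element).
+    objective1 = 1.0 / (numpy.abs(numpy.sum(solution) - 44) + 0.000001)
+    objective2 = 1.0 / (numpy.abs(numpy.prod(solution) - 10) + 0.000001)
+    return [objective1, objective2]
+```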
+
+Using a user-defined fitness function allows the user to freely use PyGAD to solve any problem by passing the appropriate fitness function/method. It is very important to understand the problem well before creating the fitness function.
+
+Let's discuss an example:
+
+> Given the following function:
+> y = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
+> where (x1,x2,x3,x4,x5,x6)=(4, -2, 3.5, 5, -11, -4.7) and y=44
+> What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize this function.
+
+So, the task is about using the genetic algorithm to find the best values for the 6 weights `w1` to `w6`. Thinking of the problem, it is clear that the best solution is the one returning an output that is close to the desired output `y=44`. So, the fitness function/method should return a value that gets higher when the solution's output is closer to `y=44`. Here is a function that does that:
+
+```python
+import numpy
+
+function_inputs = [4, -2, 3.5, 5, -11, -4.7] # Function inputs.
+desired_output = 44 # Function output.
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution*function_inputs)
+    fitness = 1.0 / numpy.abs(output - desired_output)
+    return fitness
+```
+
+Because the fitness function returns a numeric value, the problem is single-objective.
+
+Such a user-defined function must accept 3 parameters:
+
+1. The instance of the `pygad.GA` class. This helps the user to fetch any property that helps when calculating the fitness.
+2. The solution(s) to calculate the fitness value(s). Note that the fitness function can accept multiple solutions only if the `fitness_batch_size` parameter is given a value greater than 1.
+3. The indices of the solutions in the population. The number of indices also depends on the `fitness_batch_size` parameter.
+
+If a method is passed to the `fitness_func` parameter, then it accepts a fourth parameter representing the method's instance.
+
+The `__code__` object is used to check if this function accepts the required number of parameters. If more or fewer parameters are passed, an exception is thrown.
+
+By creating this function, you have completed a very important step towards using PyGAD.
+
+## Preparing Other Parameters
+
+Here is an example for preparing the other parameters:
+
+```python
+num_generations = 50
+num_parents_mating = 4
+
+fitness_function = fitness_func
+
+sol_per_pop = 8
+num_genes = len(function_inputs)
+
+init_range_low = -2
+init_range_high = 5
+
+parent_selection_type = "sss"
+keep_parents = 1
+
+crossover_type = "single_point"
+
+mutation_type = "random"
+mutation_percent_genes = 10
+```
+
+### The `on_generation` Parameter
+
+An optional parameter named `on_generation` is supported, which allows the user to call a function (with a single parameter) after each generation. Here is a simple function that just prints the current generation number and the fitness value of the best solution in the current generation. The `generations_completed` attribute of the GA class returns the number of the last completed generation.
+
+```python
+def on_gen(ga_instance):
+    print("Generation : ", ga_instance.generations_completed)
+    print("Fitness of the best solution :", ga_instance.best_solution()[1])
+```
+
+After being defined, the function is assigned to the `on_generation` parameter of the GA class constructor. By doing that, the `on_gen()` function will be called after each generation.
+
+```python
+ga_instance = pygad.GA(...,
+                       on_generation=on_gen,
+                       ...)
+```
+
+After the parameters are prepared, we can import PyGAD and build an instance of the `pygad.GA` class.
+
+## Import `pygad`
+
+The next step is to import PyGAD as follows:
+
+```python
+import pygad
+```
+
+The `pygad.GA` class holds the implementation of all methods for running the genetic algorithm.
+
+## Create an Instance of the `pygad.GA` Class
+
+The `pygad.GA` class is instantiated where the previously prepared parameters are fed to its constructor. The constructor is responsible for creating the initial population.
+
+```python
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       fitness_func=fitness_function,
+                       sol_per_pop=sol_per_pop,
+                       num_genes=num_genes,
+                       init_range_low=init_range_low,
+                       init_range_high=init_range_high,
+                       parent_selection_type=parent_selection_type,
+                       keep_parents=keep_parents,
+                       crossover_type=crossover_type,
+                       mutation_type=mutation_type,
+                       mutation_percent_genes=mutation_percent_genes)
+```
+
+## Run the Genetic Algorithm
+
+After an instance of the `pygad.GA` class is created, the next step is to call the `run()` method as follows:
+
+```python
+ga_instance.run()
+```
+
+Inside this method, the genetic algorithm evolves over some generations by doing the following tasks:
+
+1. Calculating the fitness values of the solutions within the current population.
+2. Selecting the best solutions as parents in the mating pool.
+3. Applying the crossover & mutation operations.
+4. Repeating the process for the specified number of generations.
+
+## Plotting Results
+
+There is a method named `plot_fitness()`, which creates a figure summarizing how the fitness values of the solutions change with the generations.
+
+```python
+ga_instance.plot_fitness()
+```
+
+![Fig02](https://user-images.githubusercontent.com/16560492/78830005-93111d00-79e7-11ea-9d8e-a8d8325a6101.png)
+
+## Information about the Best Solution
+
+The following information about the best solution in the last population is returned using the `best_solution()` method.
+
+- Solution
+- Fitness value of the solution
+- Index of the solution within the population
+
+```python
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Parameters of the best solution : {solution}")
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+```
+
+Using the `best_solution_generation` attribute of the instance from the `pygad.GA` class, the generation number at which the best fitness is reached can be fetched.
+
+```python
+if ga_instance.best_solution_generation != -1:
+    print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.")
+```
+
+## Saving & Loading the Results
+
+After the `run()` method completes, it is possible to save the current instance of the genetic algorithm to avoid losing the progress made. The `save()` method is available for that purpose. Just pass the file name to it without an extension. According to the next code, a file named `genetic.pkl` will be created and saved in the current directory.
+
+```python
+filename = 'genetic'
+ga_instance.save(filename=filename)
+```
+
+You can also load the saved model using the `load()` function and continue using it. For example, you might run the genetic algorithm for some generations, save its current state using the `save()` method, load the model using the `load()` function, and then call the `run()` method again.
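+
+Here is a minimal sketch of that resume workflow (reusing the `genetic` filename from above):
+
+```python
+ga_instance.run()                     # Evolve for the configured number of generations.
+ga_instance.save(filename='genetic')  # Persist the current state into genetic.pkl.
+
+resumed_ga = pygad.load(filename='genetic')  # Restore the saved state.
+resumed_ga.run()                             # Continue evolving from where it stopped.
+```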
+
+The basic load call is as follows:
+
+```python
+loaded_ga_instance = pygad.load(filename=filename)
+```
+
+After the instance is loaded, you can use it to run any method or access any property.
+
+```python
+print(loaded_ga_instance.best_solution())
+```
+
+# Life Cycle of PyGAD
+
+The next figure lists the different stages in the lifecycle of an instance of the `pygad.GA` class. Note that PyGAD stops when either all generations are completed or when the function passed to the `on_generation` parameter returns the string `stop`.
+
+![PyGAD Lifecycle](https://user-images.githubusercontent.com/16560492/220486073-c5b6089d-81e4-44d9-a53c-385f479a7273.jpg)
+
+The next code implements all the callback functions to trace the execution of the genetic algorithm. Each callback function prints its name.
+
+```python
+import pygad
+import numpy
+
+function_inputs = [4,-2,3.5,5,-11,-4.7]
+desired_output = 44
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution*function_inputs)
+    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+    return fitness
+
+fitness_function = fitness_func
+
+def on_start(ga_instance):
+    print("on_start()")
+
+def on_fitness(ga_instance, population_fitness):
+    print("on_fitness()")
+
+def on_parents(ga_instance, selected_parents):
+    print("on_parents()")
+
+def on_crossover(ga_instance, offspring_crossover):
+    print("on_crossover()")
+
+def on_mutation(ga_instance, offspring_mutation):
+    print("on_mutation()")
+
+def on_generation(ga_instance):
+    print("on_generation()")
+
+def on_stop(ga_instance, last_population_fitness):
+    print("on_stop()")
+
+ga_instance = pygad.GA(num_generations=3,
+                       num_parents_mating=5,
+                       fitness_func=fitness_function,
+                       sol_per_pop=10,
+                       num_genes=len(function_inputs),
+                       on_start=on_start,
+                       on_fitness=on_fitness,
+                       on_parents=on_parents,
+                       on_crossover=on_crossover,
+                       on_mutation=on_mutation,
+                       on_generation=on_generation,
+                       on_stop=on_stop)
+
+ga_instance.run()
+```
+
+Because 3 generations are used (as assigned to the `num_generations` argument), here is the output.
+
+```
+on_start()
+
+on_fitness()
+on_parents()
+on_crossover()
+on_mutation()
+on_generation()
+
+on_fitness()
+on_parents()
+on_crossover()
+on_mutation()
+on_generation()
+
+on_fitness()
+on_parents()
+on_crossover()
+on_mutation()
+on_generation()
+
+on_stop()
+```
+
+# Examples
+
+This section gives the complete code of some examples that use `pygad`. Each subsection builds a different example.
+
+## Linear Model Optimization - Single Objective
+
+This example, which optimizes a linear model, is discussed in the [Steps to Use PyGAD](https://pygad.readthedocs.io/en/latest/pygad.html#steps-to-use-pygad) section. Its complete code is listed below.
+
+```python
+import pygad
+import numpy
+
+"""
+Given the following function:
+    y = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
+    where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y=44
+What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize this function.
+"""
+
+function_inputs = [4,-2,3.5,5,-11,-4.7] # Function inputs.
+desired_output = 44 # Function output.
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution*function_inputs)
+    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+    return fitness
+
+num_generations = 100 # Number of generations.
+num_parents_mating = 10 # Number of solutions to be selected as parents in the mating pool.
+
+sol_per_pop = 20 # Number of solutions in the population.
+num_genes = len(function_inputs)
+
+last_fitness = 0
+def on_generation(ga_instance):
+    global last_fitness
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}")
+    print(f"Change = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1] - last_fitness}")
+    last_fitness = ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       sol_per_pop=sol_per_pop,
+                       num_genes=num_genes,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+# Running the GA to optimize the parameters of the function.
+ga_instance.run()
+
+ga_instance.plot_fitness()
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness)
+print(f"Parameters of the best solution : {solution}")
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+prediction = numpy.sum(numpy.array(function_inputs)*solution)
+print(f"Predicted output based on the best solution : {prediction}")
+
+if ga_instance.best_solution_generation != -1:
+    print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.")
+
+# Saving the GA instance.
+filename = 'genetic' # The filename to which the instance is saved. The name is without extension.
+ga_instance.save(filename=filename)
+
+# Loading the saved GA instance.
+loaded_ga_instance = pygad.load(filename=filename)
+loaded_ga_instance.plot_fitness()
+```
+
+## Linear Model Optimization - Multi-Objective
+
+This is a multi-objective optimization example that optimizes these 2 functions:
+
+1. `y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6`
+2. `y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12`
+
+Where:
+
+1. `(x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7)` and `y1=50`
+2. `(x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5)` and `y2=30`
+
+The 2 functions use the same parameters (weights) `w1` to `w6`.
+
+The goal is to use PyGAD to find the optimal values for these weights that satisfy the 2 functions `y1` and `y2`.
+
+To use PyGAD to solve multi-objective problems, the only adjustment is to return a `list`, `tuple`, or `numpy.ndarray` from the fitness function. Each element represents the fitness of an objective in order. That is, the first element is the fitness of the first objective, the second element is the fitness of the second objective, and so on.
+
+```python
+import pygad
+import numpy
+
+"""
+Given these 2 functions:
+    y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
+    y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12
+    where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y1=50
+    and (x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5) and y2=30
+What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize these 2 functions.
+This is a multi-objective optimization problem.
+
+PyGAD considers the problem as multi-objective if the fitness function returns:
+    1) List.
+    2) Or tuple.
+    3) Or numpy.ndarray.
+"""
+
+function_inputs1 = [4,-2,3.5,5,-11,-4.7] # Function 1 inputs.
+function_inputs2 = [-2,0.7,-9,1.4,3,5] # Function 2 inputs.
+desired_output1 = 50 # Function 1 output.
+desired_output2 = 30 # Function 2 output.
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output1 = numpy.sum(solution*function_inputs1)
+    output2 = numpy.sum(solution*function_inputs2)
+    fitness1 = 1.0 / (numpy.abs(output1 - desired_output1) + 0.000001)
+    fitness2 = 1.0 / (numpy.abs(output2 - desired_output2) + 0.000001)
+    return [fitness1, fitness2]
+
+num_generations = 100
+num_parents_mating = 10
+
+sol_per_pop = 20
+num_genes = len(function_inputs1)
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       sol_per_pop=sol_per_pop,
+                       num_genes=num_genes,
+                       fitness_func=fitness_func,
+                       parent_selection_type='nsga2')
+
+ga_instance.run()
+
+ga_instance.plot_fitness(label=['Obj 1', 'Obj 2'])
+
+solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness)
+print(f"Parameters of the best solution : {solution}")
+print(f"Fitness value of the best solution = {solution_fitness}")
+
+prediction = numpy.sum(numpy.array(function_inputs1)*solution)
+print(f"Predicted output 1 based on the best solution : {prediction}")
+prediction = numpy.sum(numpy.array(function_inputs2)*solution)
+print(f"Predicted output 2 based on the best solution : {prediction}")
+```
+
+This is the result of the print statements. The predicted outputs are close to the desired outputs.
+
+```
+Parameters of the best solution : [ 0.79676439 -2.98823386 -4.12677662 5.70539445 -2.02797016 -1.07243922]
+Fitness value of the best solution = [ 1.68090829 349.8591915 ]
+Predicted output 1 based on the best solution : 50.59491545442283
+Predicted output 2 based on the best solution : 29.99714270722312
+```
+
+This is the figure created by the `plot_fitness()` method. The fitness of the first objective is shown in green and that of the second objective in blue.
+
+![multi-objective-pygad](https://github.com/ahmedfgad/GeneticAlgorithmPython/assets/16560492/7896f8d8-01c5-4ff9-8d15-52191c309b63)
+
+## Reproducing Images
+
+This project reproduces a single image using PyGAD by evolving pixel values. This project works with both color and grayscale images. Check this project at [GitHub](https://github.com/ahmedfgad/GARI): https://github.com/ahmedfgad/GARI.
+
+For more information about this project, read this tutorial titled [Reproducing Images using a Genetic Algorithm with Python](https://www.linkedin.com/pulse/reproducing-images-using-genetic-algorithm-python-ahmed-gad) available at these links:
+
+- [Heartbeat](https://heartbeat.fritz.ai/reproducing-images-using-a-genetic-algorithm-with-python-91fc701ff84): https://heartbeat.fritz.ai/reproducing-images-using-a-genetic-algorithm-with-python-91fc701ff84
+- [LinkedIn](https://www.linkedin.com/pulse/reproducing-images-using-genetic-algorithm-python-ahmed-gad): https://www.linkedin.com/pulse/reproducing-images-using-genetic-algorithm-python-ahmed-gad
+
+### Project Steps
+
+The steps to follow in order to reproduce an image are as follows:
+
+- Read an image
+- Prepare the fitness function
+- Create an instance of the pygad.GA class with the appropriate parameters
+- Run PyGAD
+- Plot results
+- Calculate some statistics
+
+The next sections discuss the code of each of these steps.
+
+### Read an Image
+
+There is an image named `fruit.jpg` in the [GARI project](https://github.com/ahmedfgad/GARI) which is read according to the next code.
+
+```python
+import imageio
+import numpy
+
+target_im = imageio.imread('fruit.jpg')
+target_im = numpy.asarray(target_im/255, dtype=float)
+```
+
+Here is the read image.
+
+![fruit](https://user-images.githubusercontent.com/16560492/36948808-f0ac882e-1fe8-11e8-8d07-1307e3477fd0.jpg)
+
+Based on the chromosome representation used in the example, the pixel values can be in the 0-255 range, the 0-1 range, or any other range.
+
+Note that the range of pixel values affects other parameters like the range from which the random values are selected during mutation and also the range of the values used in the initial population. So, be consistent.
+
+### Prepare the Fitness Function
+
+The next code creates a function that will be used as a fitness function for calculating the fitness value for each solution in the population. This function must be a maximization function that accepts 3 parameters representing the instance of the `pygad.GA` class, a solution, and its index. It returns a value representing the fitness value.
+
+```python
+import numpy
+import gari
+
+target_chromosome = gari.img2chromosome(target_im)
+
+def fitness_fun(ga_instance, solution, solution_idx):
+    fitness = numpy.sum(numpy.abs(target_chromosome-solution))
+
+    # Negating the fitness value to make it increasing rather than decreasing.
+    fitness = numpy.sum(target_chromosome) - fitness
+    return fitness
+```
+
+The fitness value is calculated using the sum of absolute differences between gene values in the original and reproduced chromosomes. The `gari.img2chromosome()` function is called before the fitness function to represent the image as a vector because the genetic algorithm works with 1D chromosomes.
+
+The implementation of the `gari` module is available at the [GARI GitHub project](https://github.com/ahmedfgad/GARI/blob/master/gari.py) and its code is listed below.
+
+```python
+import numpy
+import functools
+import operator
+
+def img2chromosome(img_arr):
+    return numpy.reshape(a=img_arr, newshape=(functools.reduce(operator.mul, img_arr.shape)))
+
+def chromosome2img(vector, shape):
+    if len(vector) != functools.reduce(operator.mul, shape):
+        raise ValueError(f"A vector of length {len(vector)} cannot be reshaped into an array of shape {shape}.")
+
+    return numpy.reshape(a=vector, newshape=shape)
+```
+
+### Create an Instance of the `pygad.GA` Class
+
+It is very important to use random mutation and set the `mutation_by_replacement` parameter to `True`. Based on the range of pixel values, the values assigned to the `init_range_low`, `init_range_high`, `random_mutation_min_val`, and `random_mutation_max_val` parameters should be changed.
+
+If the image pixel values range from 0 to 255, then keep `init_range_low` and `random_mutation_min_val` at 0 but change `init_range_high` and `random_mutation_max_val` to 255.
+
+Feel free to change the other parameters or add other parameters. Please check [PyGAD's documentation](https://pygad.readthedocs.io) for the full list of parameters.
+
+```python
+import pygad
+
+ga_instance = pygad.GA(num_generations=20000,
+                       num_parents_mating=10,
+                       fitness_func=fitness_fun,
+                       sol_per_pop=20,
+                       num_genes=target_im.size,
+                       init_range_low=0.0,
+                       init_range_high=1.0,
+                       mutation_percent_genes=0.01,
+                       mutation_type="random",
+                       mutation_by_replacement=True,
+                       random_mutation_min_val=0.0,
+                       random_mutation_max_val=1.0)
+```
+
+### Run PyGAD
+
+Simply call the `run()` method to run PyGAD.
+
+```python
+ga_instance.run()
+```
+
+### Plot Results
+
+After the `run()` method completes, the fitness values of all generations can be viewed in a plot using the `plot_fitness()` method.
+
+```python
+ga_instance.plot_fitness()
+```
+
+Here is the plot after 20,000 generations.
+
+![Fitness Values](https://user-images.githubusercontent.com/16560492/82232124-77762c00-992e-11ea-9fc6-14a1cd7a04ff.png)
+
+### Calculate Some Statistics
+
+Here is some information about the best solution.
+
+```python
+import matplotlib.pyplot
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+if ga_instance.best_solution_generation != -1:
+    print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.")
+
+result = gari.chromosome2img(solution, target_im.shape)
+matplotlib.pyplot.imshow(result)
+matplotlib.pyplot.title("PyGAD & GARI for Reproducing Images")
+matplotlib.pyplot.show()
+```
+
+### Evolution by Generation
+
+The solution reached after 20,000 generations is shown below.
+
+![solution](https://user-images.githubusercontent.com/16560492/82232405-e0f63a80-992e-11ea-984f-b6ed76465bd1.png)
+
+After more generations, the result can be enhanced as shown below.
+
+![solution](https://user-images.githubusercontent.com/16560492/82232345-cf149780-992e-11ea-8390-bf1a57a19de7.png)
+
+The results can also be enhanced by changing the parameters passed to the constructor of the `pygad.GA` class.
+
+Here is how the image evolves from generation 0 to generation 20,000.
+
+Generation 0
+
+![solution_0](https://user-images.githubusercontent.com/16560492/36948589-b47276f0-1fe5-11e8-8efe-0cd1a225ea3a.png)
+
+Generation 1,000
+
+![solution_1000](https://user-images.githubusercontent.com/16560492/36948823-16f490ee-1fe9-11e8-97db-3e8905ad5440.png)
+
+Generation 2,500
+
+![solution_2500](https://user-images.githubusercontent.com/16560492/36948832-3f314b60-1fe9-11e8-8f4a-4d9a53b99f3d.png)
+
+Generation 4,500
+
+![solution_4500](https://user-images.githubusercontent.com/16560492/36948837-53d1849a-1fe9-11e8-9b36-e9e9291e347b.png)
+
+Generation 7,000
+
+![solution_7000](https://user-images.githubusercontent.com/16560492/36948852-66f1b176-1fe9-11e8-9f9b-460804e94004.png)
+
+Generation 8,500
+
+![solution_8500](https://user-images.githubusercontent.com/16560492/36948865-7fbb5158-1fe9-11e8-8c04-8ac3c1f7b1b1.png)
+
+Generation 20,000
+
+![solution](https://user-images.githubusercontent.com/16560492/82232405-e0f63a80-992e-11ea-984f-b6ed76465bd1.png)
+
+## Clustering
+
+For a 2-cluster problem, the code is available [here](https://github.com/ahmedfgad/GeneticAlgorithmPython/blob/master/example_clustering_2.py). For a 3-cluster problem, the code is [here](https://github.com/ahmedfgad/GeneticAlgorithmPython/blob/master/example_clustering_3.py). The 2 examples use artificial samples.
+
+Soon a tutorial will be published at [Paperspace](https://blog.paperspace.com/author/ahmed) to explain how clustering works using the genetic algorithm with examples in PyGAD.
+
+## CoinTex Game Playing using PyGAD
+
+The code is available at the [CoinTex GitHub project](https://github.com/ahmedfgad/CoinTex/tree/master/PlayerGA). CoinTex is an Android game written in Python using the Kivy framework. Find CoinTex at [Google Play](https://play.google.com/store/apps/details?id=coin.tex.cointexreactfast): https://play.google.com/store/apps/details?id=coin.tex.cointexreactfast
+
+Check this [Paperspace tutorial](https://blog.paperspace.com/building-agent-for-cointex-using-genetic-algorithm) for how the genetic algorithm plays CoinTex: https://blog.paperspace.com/building-agent-for-cointex-using-genetic-algorithm.
Check also this [YouTube video](https://youtu.be/Sp_0RGjaL-0) showing the genetic algorithm while playing CoinTex.

diff --git a/docs/md/pygad_more.md b/docs/md/pygad_more.md
new file mode 100644
index 0000000..def0fc4
--- /dev/null
+++ b/docs/md/pygad_more.md
@@ -0,0 +1,1910 @@

# More About PyGAD

# Multi-Objective Optimization

In [PyGAD 3.2.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-2-0), the library supports multi-objective optimization using the non-dominated sorting genetic algorithm II (NSGA-II). The code is exactly the same as the regular code used for single-objective optimization except for one difference: the return value of the fitness function.

In single-objective optimization, the fitness function returns a single numeric value. In this example, the variable `fitness` is expected to be a numeric value.

```python
def fitness_func(ga_instance, solution, solution_idx):
    ...
    return fitness
```

But in multi-objective optimization, the fitness function returns any of these data types:

1. `list`
2. `tuple`
3. `numpy.ndarray`

```python
def fitness_func(ga_instance, solution, solution_idx):
    ...
    return [fitness1, fitness2, ..., fitnessN]
```

Whenever the fitness function returns an iterable of these data types, then the problem is considered multi-objective. This holds even if there is a single element in the returned iterable.

Other than the fitness function, everything else could be the same in both single and multi-objective problems.

But it is recommended to use one of these 2 parent selection operators to solve multi-objective problems:

1. `nsga2`: This selects the parents based on non-dominated sorting and crowding distance.
2. `tournament_nsga2`: This selects the parents using tournament selection, which uses non-dominated sorting and crowding distance to rank the solutions.

This is a multi-objective optimization example that optimizes these 2 linear functions:

1. `y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6`
2. `y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12`

Where:

1. `(x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7)` and `y1=50`
2. `(x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5)` and `y2=30`

The 2 functions use the same parameters (weights) `w1` to `w6`.

The goal is to use PyGAD to find the optimal values for such weights that satisfy the 2 functions `y1` and `y2`.

```python
import pygad
import numpy

"""
Given these 2 functions:
    y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
    y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12
    where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y1=50
    and (x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5) and y2=30
What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize these 2 functions.
This is a multi-objective optimization problem.

PyGAD considers the problem as multi-objective if the fitness function returns:
    1) List.
    2) Or tuple.
    3) Or numpy.ndarray.
"""

function_inputs1 = [4,-2,3.5,5,-11,-4.7] # Function 1 inputs.
function_inputs2 = [-2,0.7,-9,1.4,3,5] # Function 2 inputs.
desired_output1 = 50 # Function 1 output.
desired_output2 = 30 # Function 2 output.
def fitness_func(ga_instance, solution, solution_idx):
    output1 = numpy.sum(solution*function_inputs1)
    output2 = numpy.sum(solution*function_inputs2)
    fitness1 = 1.0 / (numpy.abs(output1 - desired_output1) + 0.000001)
    fitness2 = 1.0 / (numpy.abs(output2 - desired_output2) + 0.000001)
    return [fitness1, fitness2]

num_generations = 100
num_parents_mating = 10

sol_per_pop = 20
num_genes = len(function_inputs1)

ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       sol_per_pop=sol_per_pop,
                       num_genes=num_genes,
                       fitness_func=fitness_func,
                       parent_selection_type='nsga2')

ga_instance.run()

ga_instance.plot_fitness(label=['Obj 1', 'Obj 2'])

solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness)
print(f"Parameters of the best solution : {solution}")
print(f"Fitness value of the best solution = {solution_fitness}")

prediction = numpy.sum(numpy.array(function_inputs1)*solution)
print(f"Predicted output 1 based on the best solution : {prediction}")
prediction = numpy.sum(numpy.array(function_inputs2)*solution)
print(f"Predicted output 2 based on the best solution : {prediction}")
```

This is the result of the print statements. The predicted outputs are close to the desired outputs.

```
Parameters of the best solution : [ 0.79676439 -2.98823386 -4.12677662  5.70539445 -2.02797016 -1.07243922]
Fitness value of the best solution = [  1.68090829 349.8591915 ]
Predicted output 1 based on the best solution : 50.59491545442283
Predicted output 2 based on the best solution : 29.99714270722312
```

This is the figure created by the `plot_fitness()` method. The fitness of the first objective is shown in green and the fitness of the second objective in blue.

![multi-objective-pygad](https://github.com/ahmedfgad/GeneticAlgorithmPython/assets/16560492/7896f8d8-01c5-4ff9-8d15-52191c309b63)

# Limit the Gene Value Range using the `gene_space` Parameter

In [PyGAD 2.11.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-11-0), the `gene_space` parameter gained a new feature that allows customizing the range of accepted values for each gene. Let's take a quick review of the `gene_space` parameter to build over it.

The `gene_space` parameter allows the user to feed the space of values of each gene. This way, the accepted values for each gene are restricted to the user-defined values. Assume there is a problem that has 3 genes where each gene has a different set of values as follows:

1. Gene 1: `[0.4, 12, -5, 21.2]`
2. Gene 2: `[-2, 0.3]`
3. Gene 3: `[1.2, 63.2, 7.4]`

Then, the `gene_space` for this problem is as given below. Note that the order is very important.

```python
gene_space = [[0.4, 12, -5, 21.2],
              [-2, 0.3],
              [1.2, 63.2, 7.4]]
```

In case all genes share the same set of values, then simply feed a single list to the `gene_space` parameter as follows. In this case, all genes can only take values from this list of 6 values.

```python
gene_space = [33, 7, 0.5, 95, 6.3, 0.74]
```

The previous example restricts the gene values to a fixed set of discrete values. In case you want to use a range of discrete values for the gene, then you can use the `range()` function. For example, `range(1, 7)` means the set of allowed values for the gene is `1, 2, 3, 4, 5, and 6`. You can also use the `numpy.arange()` or `numpy.linspace()` functions for the same purpose.
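To make these options concrete, here is a minimal runnable sketch (the fitness function is just a placeholder) that combines an explicit list, a `range`, and a NumPy-generated space in a single `gene_space`:

```python
import numpy
import pygad

def fitness_func(ga_instance, solution, solution_idx):
    # Placeholder fitness: maximize the sum of the genes.
    return numpy.sum(solution)

# Each gene draws its values from its own discrete space.
gene_space = [[0.4, 12, -5, 21.2],       # Gene 1: explicit list of values.
              range(1, 7),               # Gene 2: integers 1 to 6.
              numpy.linspace(0, 1, 11)]  # Gene 3: 0.0, 0.1, ..., 1.0.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=5,
                       num_genes=len(gene_space),
                       gene_space=gene_space,
                       fitness_func=fitness_func)
ga_instance.run()
```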
The previous discussion only works with discrete values, not continuous values. In [PyGAD 2.11.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-11-0), the `gene_space` parameter can be assigned a dictionary that allows the gene to have values from a continuous range.

Assume you want to restrict the gene to this half-open range [1, 5) where 1 is included and 5 is not. Then simply create a dictionary with 2 items where the keys of the 2 items are:

1. `'low'`: The minimum value in the range, which is 1 in the example.
2. `'high'`: The maximum value in the range, which is 5 in the example.

The dictionary will look like this:

```python
{'low': 1,
 'high': 5}
```

Apart from the optional `'step'` key discussed later, it is not acceptable to add more items to the dictionary or use keys other than `'low'` and `'high'`.

For a 3-gene problem, the next code creates a dictionary for each gene to restrict its values to a continuous range. For the first gene, it can take any floating-point value from the range that starts from 1 (inclusive) and ends at 5 (exclusive).

```python
gene_space = [{'low': 1, 'high': 5}, {'low': 0.3, 'high': 1.4}, {'low': -0.2, 'high': 4.5}]
```

# More about the `gene_space` Parameter

The `gene_space` parameter customizes the space of values of each gene.

Assuming that all genes have the same global space which includes the values 0.3, 5.2, -4, and 8, then those values can be assigned to the `gene_space` parameter as a list, tuple, or range. Here is a list assigned to this parameter. By doing that, the gene values are restricted to those assigned to the `gene_space` parameter.

```python
gene_space = [0.3, 5.2, -4, 8]
```

If some genes have different spaces, then `gene_space` should accept a nested list or tuple. In this case, the elements could be:

1. Number (of `int`, `float`, or `NumPy` data types): A single value to be assigned to the gene. This means this gene will have the same value across all generations.
2. `list`, `tuple`, `numpy.ndarray`, or any range like `range`, `numpy.arange()`, or `numpy.linspace()`: It holds the space for each individual gene. But this space is usually discrete. That is, there is a finite set of values to select from.
3. `dict`: To sample a value for a gene from a continuous range. The dictionary must have 2 mandatory keys, which are `"low"` and `"high"`, in addition to an optional key, which is `"step"`. A random value is returned between the values assigned to the items with `"low"` and `"high"` keys. If the `"step"` key exists, then this works like the previous option (i.e. a discrete set of values).
4. `None`: A gene with its space set to `None` is initialized randomly from the range specified by the 2 parameters `init_range_low` and `init_range_high`. For mutation, its value is mutated based on a random value from the range specified by the 2 parameters `random_mutation_min_val` and `random_mutation_max_val`. If all elements in the `gene_space` parameter are `None`, the parameter will not have any effect.

Assuming that a chromosome has 2 genes where each gene has a different value space, the `gene_space` could be assigned a nested list/tuple where each element determines the space of a gene.

According to the next code, the space of the first gene is `[0.4, -5]`, which has 2 values, and the space for the second gene is `[0.5, -3.2, 8.2, -9]`, which has 4 values.
```python
gene_space = [[0.4, -5], [0.5, -3.2, 8.2, -9]]
```

For a 2-gene chromosome, if the first gene space is restricted to the discrete values from 0 to 4 and the second gene is restricted to the values from 10 to 19, then it could be specified according to the next code.

```python
gene_space = [range(5), range(10, 20)]
```

The `gene_space` can also be assigned a single range, as given below, where the values of all genes are sampled from the same range.

```python
gene_space = numpy.arange(15)
```

The `gene_space` can be assigned a dictionary to sample a value from a continuous range.

```python
gene_space = {"low": 4, "high": 30}
```

A step can also be assigned to the dictionary. This works as if a range is used.

```python
gene_space = {"low": 4, "high": 30, "step": 2.5}
```

> Setting a `dict` like `{"low": 0, "high": 10}` in the `gene_space` means that random values from the continuous range [0, 10) are sampled. Note that `0` is included but `10` is not included while sampling. Thus, the maximum value that could be returned is less than `10`, like `9.9999`. But if the user decides to round the genes using, for example, `gene_type=[float, 2]`, then this value will become 10. So, the user should be careful with the inputs.

If `None` is assigned to only a single gene, then its value will be randomly generated initially using the `init_range_low` and `init_range_high` parameters in the `pygad.GA` class's constructor. During mutation, the value is sampled from the range defined by the 2 parameters `random_mutation_min_val` and `random_mutation_max_val`. This is an example where the second gene is given a `None` value.

```python
gene_space = [range(5), None, numpy.linspace(10, 20, 300)]
```

If the user did not assign the initial population to the `initial_population` parameter, the initial population is created randomly based on the `gene_space` parameter. Moreover, the mutation is applied based on this parameter.

## How Mutation Works with the `gene_space` Parameter

Mutation behaves differently based on whether the `gene_space` has a continuous range or a discrete set of values.

If a gene has its **static/discrete space** defined in the `gene_space` parameter, then mutation works by replacing the gene value with a value randomly selected from the gene space. This happens for both `int` and `float` data types.

For example, the following `gene_space` has the static space `[1, 2, 3]` defined for the first gene. So, this gene can only have a value out of these 3 values.

```python
Gene space: [[1, 2, 3],
             None]
Solution: [1, 5]
```

For a solution like `[1, 5]`, mutation happens for the first gene by simply replacing its current value with a randomly selected value (other than its current value if possible). So, the value 1 will be replaced by either 2 or 3.

For the second gene, its space is set to `None`. So, traditional mutation happens for this gene by:

1. Generating a random value from the range defined by the `random_mutation_min_val` and `random_mutation_max_val` parameters.
2. Adding this random value to the current gene's value.

If its current value is 5 and the random value is `-0.5`, then the new value is 4.5. If the gene type is integer, then the value will be rounded.

On the other hand, if a gene has a **continuous space** defined in the `gene_space` parameter, then mutation occurs by adding a random value to the current gene value.
For example, the following `gene_space` has the continuous space defined by the dictionary `{'low': 1, 'high': 5}`. This applies to all genes. So, mutation is applied to one or more selected genes by adding a random value to the current gene value.

```python
Gene space: {'low': 1, 'high': 5}
Solution: [1.5, 3.4]
```

Assuming `random_mutation_min_val=-1` and `random_mutation_max_val=1`, then a random value such as `0.3` can be added to the gene(s) participating in mutation. If only the first gene is mutated, then its new value changes from `1.5` to `1.5+0.3=1.8`. Note that PyGAD verifies that the new value is within the range. In the worst case, the value will be set to either boundary of the continuous range. For example, if the gene value is 1.5 and the random value is -0.55, then the new value is 0.95, which is smaller than the lower boundary 1. Thus, the gene value will be clipped to the boundary value 1.

If the dictionary has a step like the example below, then it is considered a discrete range and mutation occurs by randomly selecting a value from the set of values. In other words, no random value is added to the gene value.

```python
Gene space: {'low': 1, 'high': 5, 'step': 0.5}
```

# Gene Constraint

In [PyGAD 3.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-5-0), a new parameter called `gene_constraint` is added to the constructor of the `pygad.GA` class. An instance attribute of the same name is created for any instance of the `pygad.GA` class.

The `gene_constraint` parameter allows the users to define constraints to be enforced (as much as possible) when selecting a value for a gene. For example, this constraint is enforced when applying mutation to make sure the new gene value after mutation meets the gene constraint.

The default value of this parameter is `None`, which means no genes have constraints. It can be assigned a list, but the length of this list must be equal to the number of genes as specified by the `num_genes` parameter.

When assigned a list, the allowed values for each element are:

1. `None`: No constraint for the gene.
2. `callable`: A callable/function that accepts 2 parameters:
   1. The solution where the gene exists.
   2. A list or NumPy array of candidate values for the gene.

It is the user's responsibility to build such callables to filter the passed list of values and return a new list with the values that meet the gene constraint. If no value meets the constraint, return an empty list or NumPy array.

For example, if the gene must be smaller than 5, then use this callable:

```python
lambda solution,values: [val for val in values if val<5]
```

The first parameter is the solution where the target gene exists. It is passed just in case you would like to compare the gene value with other genes. The second parameter is the list of candidate values for the gene. The objective of the lambda function is to filter the values and return only the valid values that are less than 5.

A lambda function is used in this case, but we can use a regular function:

```python
def constraint_func(solution,values):
    return [val for val in values if val<5]
```

Assuming `num_genes` is 2, then here is a valid value for the `gene_constraint` parameter.

```python
import pygad

def fitness_func(...):
    ...
    return fitness

ga_instance = pygad.GA(
    num_genes=2,
    sample_size=200,
    ...
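    # Constraint callables, one per gene: gene 0 must be less than 5;
    # gene 1 must be larger than gene 0 in the same solution.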
    gene_constraint=
    [
        lambda solution,values: [val for val in values if val<5],
        lambda solution,values: [val for val in values if val>solution[0]]
    ]
)
```

The first lambda function filters the values for the first gene by only considering the gene values that are less than 5. If the passed values are `[-5, 2, 6, 13, 3, 4, 0]`, then the returned filtered values will be `[-5, 2, 3, 4, 0]`.

The constraint for the second gene makes sure the selected value is larger than the value of the first gene. Assuming the values for the 2 parameters are:

1. `solution=[1, 4]`
2. `values=[17, 2, -1, 0.5, -2.1, 1.4]`

Then the value of the first gene in the passed solution is `1`. By filtering the passed values using the callable corresponding to the second gene, the returned values will be `[17, 2, 1.4]` because these are the only values that are larger than the first gene value of `1`.

Sometimes it is normal for PyGAD to fail to find a gene value that satisfies the constraint. For example, if the possible gene values are only `[20,30,40]` and the gene constraint restricts the values to be greater than 50, then it is impossible to meet the constraint.

For some other cases, the constraint can be met but with some changes. For example, increasing the range from which a value is sampled. If the `gene_space` is used and assigned `range(10)`, then the gene constraint can be met by using a wider range like `range(100)` so that values greater than 50 can be found.

Even if the gene space is already assigned `range(1000)`, it might still not find values meeting the constraint. This is because PyGAD samples a number of values equal to the `sample_size` parameter, which defaults to *100*.

Out of the range of *1000* numbers, none of the 100 sampled values might satisfy the constraint. This issue could be solved by simply assigning a larger value to the `sample_size` parameter.

> PyGAD does not yet handle the **dependencies** among the genes in the `gene_constraint` parameter.
>
> For example, gene 0 might depend on gene 1. To efficiently enforce the constraints, the constraint for gene 1 must be enforced first (if not `None`) then the constraint for gene 0.
>
> PyGAD applies constraints sequentially, starting from the first gene to the last. To ensure correct behavior when genes depend on each other, structure your GA problem so that if gene X depends on gene Y, then gene Y appears earlier in the chromosome (solution) than gene X.

## Full Example

For a full example, please check the [`examples/example_gene_constraint.py` script](https://github.com/ahmedfgad/GeneticAlgorithmPython/blob/master/examples/example_gene_constraint.py).

# `sample_size` Parameter

In [PyGAD 3.5.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-5-0), a new parameter called `sample_size` is added. It is used in some situations where PyGAD seeks a single value for a gene out of a range. Two of the important use cases are:

1. Find a unique value for the gene. This is when the `allow_duplicate_genes` parameter is set to `False` to reject duplicate gene values within the same solution.
2. Find a value that satisfies the `gene_constraint` parameter.

Given that we are sampling values from a continuous range as defined by the 2 attributes:

1. `random_mutation_min_val=0`
2. `random_mutation_max_val=100`

PyGAD samples a fixed number of values out of this continuous range. The number of values in the sample is defined by the `sample_size` parameter, which defaults to `100`.
If the objective is to find a unique value or enforce the gene constraint, then the 100 values are filtered to keep only the values that keep the gene unique or meet the constraint.

Sometimes 100 values are not enough and PyGAD fails to find a good value. In this case, it is highly recommended to increase the `sample_size` parameter. This creates a larger sample and increases the chance of finding a value that meets our objectives.

# Stop at Any Generation

In [PyGAD 2.4.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-4-0), it is possible to stop the genetic algorithm after any generation. All you need to do is to return the string `"stop"` in the callback function `on_generation`. When this callback function is implemented and assigned to the `on_generation` parameter in the constructor of the `pygad.GA` class, the algorithm immediately stops after completing its current generation. Let's discuss an example.

Assume that the user wants to stop the algorithm either after 100 generations or if a condition is met. The user may assign a value of 100 to the `num_generations` parameter of the `pygad.GA` class constructor.

The condition that stops the algorithm is written in a callback function like the one in the next code. If the fitness value of the best solution exceeds 70, then the string `"stop"` is returned.

```python
def func_generation(ga_instance):
    if ga_instance.best_solution()[1] >= 70:
        return "stop"
```

# Stop Criteria

In [PyGAD 2.15.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-15-0), a new parameter named `stop_criteria` is added to the constructor of the `pygad.GA` class. It helps to stop the evolution based on some criteria. It can be assigned one or more criteria.

Each criterion is passed as a `str` that consists of 2 parts:

1. Stop word.
2. Number.

It takes this form:

```python
"word_num"
```

The 2 currently supported words are `reach` and `saturate`.

The `reach` word stops the `run()` method if the fitness value is equal to or greater than a given fitness value. An example for `reach` is `"reach_40"`, which stops the evolution if the fitness is >= 40.

`saturate` stops the evolution if the fitness saturates for a given number of consecutive generations. An example for `saturate` is `"saturate_7"`, which means stop the `run()` method if the fitness does not change for 7 consecutive generations.

Here is an example that stops the evolution if either the fitness value reaches `127.4` or the fitness saturates for `15` generations.

```python
import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, 9, 4]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)

    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)

    return fitness

ga_instance = pygad.GA(num_generations=200,
                       sol_per_pop=10,
                       num_parents_mating=4,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       stop_criteria=["reach_127.4", "saturate_15"])

ga_instance.run()
print(f"Number of generations passed is {ga_instance.generations_completed}")
```

## Multi-Objective Stop Criteria

When multi-objective optimization is used, there are 2 options to use the `stop_criteria` parameter with the `reach` keyword:

1. Pass a single value along the `reach` keyword to use across all the objectives.
2. Pass multiple values along the `reach` keyword.
But the number of values must equal the number of objectives.

For the `saturate` keyword, it is independent of the number of objectives.

Suppose there are 3 objectives; this is a working example. It stops when the fitness values of the 3 objectives reach or exceed 10, 20, and 30, respectively.

```python
stop_criteria='reach_10_20_30'
```

More than one criterion can be used together. In this case, pass the `stop_criteria` parameter as an iterable. This is an example. It stops when either of these 2 conditions holds:

1. The fitness values of the 3 objectives reach or exceed 10, 20, and 30, respectively.
2. The fitness values of the 3 objectives reach or exceed 90, -5.7, and 10, respectively.

```python
stop_criteria=['reach_10_20_30', 'reach_90_-5.7_10']
```

# Elitism Selection

In [PyGAD 2.18.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-18-0), a new parameter called `keep_elitism` is supported. It accepts an integer to define the number of elite (i.e. best) solutions to keep in the next generation. This parameter defaults to `1`, which means only the best solution is kept in the next generation.

In the next example, the `keep_elitism` parameter in the constructor of the `pygad.GA` class is set to 2. Thus, the best 2 solutions in each generation are kept in the next generation.

```python
import numpy
import pygad

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution*function_inputs)
    fitness = 1.0 / numpy.abs(output - desired_output)
    return fitness

ga_instance = pygad.GA(num_generations=2,
                       num_parents_mating=3,
                       fitness_func=fitness_func,
                       num_genes=6,
                       sol_per_pop=5,
                       keep_elitism=2)

ga_instance.run()
```

The value passed to the `keep_elitism` parameter must satisfy 2 conditions:

1. It must be `>= 0`.
2. It must be `<= sol_per_pop`. That is, its value cannot exceed the number of solutions in the current population.

In the previous example, if the `keep_elitism` parameter is set equal to the value passed to the `sol_per_pop` parameter, which is 5, then there will be no evolution at all, as in the next figure. This is because all the 5 solutions are used as elitism in the next generation and no offspring will be created.

```python
...

ga_instance = pygad.GA(...,
                       sol_per_pop=5,
                       keep_elitism=5)

ga_instance.run()
```

![elitism_kills_evolution](https://user-images.githubusercontent.com/16560492/189273225-67ffad41-97ab-45e1-9324-429705e17b20.png)

Note that if the `keep_elitism` parameter is effective (i.e. is assigned a positive integer, not zero), then the `keep_parents` parameter will have no effect. Because the default value of the `keep_elitism` parameter is 1, the `keep_parents` parameter has no effect by default. The `keep_parents` parameter is only effective when `keep_elitism=0`.

# Random Seed

In [PyGAD 2.18.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-18-0), a new parameter called `random_seed` is supported. Its value is used as a seed for the random number generators.

PyGAD uses random functions in these 2 libraries:

1. NumPy
2. random

The `random_seed` parameter defaults to `None`, which means no seed is used. As a result, different random numbers are generated for each run of PyGAD.

If this parameter is assigned a proper seed, then the results will be reproducible. In the next example, the integer 2 is used as a random seed.
```python
import numpy
import pygad

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution*function_inputs)
    fitness = 1.0 / numpy.abs(output - desired_output)
    return fitness

ga_instance = pygad.GA(num_generations=2,
                       num_parents_mating=3,
                       fitness_func=fitness_func,
                       sol_per_pop=5,
                       num_genes=6,
                       random_seed=2)

ga_instance.run()
best_solution, best_solution_fitness, best_match_idx = ga_instance.best_solution()
print(best_solution)
print(best_solution_fitness)
```

This is the best solution found and its fitness value.

```
[ 2.77249188 -4.06570662  0.04196872 -3.47770796 -0.57502138 -3.22775267]
0.04872203136549972
```

After running the code again, it will find the same result.

```
[ 2.77249188 -4.06570662  0.04196872 -3.47770796 -0.57502138 -3.22775267]
0.04872203136549972
```

# Continue without Losing Progress

In [PyGAD 2.18.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-18-0), and thanks to [Felix Bernhard](https://github.com/FeBe95) for opening [this GitHub issue](https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/123#issuecomment-1203035106), the values of these 4 instance attributes are no longer reset after each call to the `run()` method.

1. `self.best_solutions`
2. `self.best_solutions_fitness`
3. `self.solutions`
4. `self.solutions_fitness`

This helps the user continue where the last run stopped without losing the values of these 4 attributes.

Now, the user can save the model by calling the `save()` method.

```python
import pygad

def fitness_func(ga_instance, solution, solution_idx):
    ...
    return fitness

ga_instance = pygad.GA(...)

ga_instance.run()

ga_instance.plot_fitness()

ga_instance.save("pygad_GA")
```

Then the saved model is loaded by calling the `load()` function. After calling the `run()` method over the loaded instance, the data in the previous 4 attributes are not reset but extended with the new data.

```python
import pygad

def fitness_func(ga_instance, solution, solution_idx):
    ...
    return fitness

loaded_ga_instance = pygad.load("pygad_GA")

loaded_ga_instance.run()

loaded_ga_instance.plot_fitness()
```

The plot created by the `plot_fitness()` method will show the data collected from both runs.

Note that the 2 attributes (`self.best_solutions` and `self.best_solutions_fitness`) only work if the `save_best_solutions` parameter is set to `True`. Also, the 2 attributes (`self.solutions` and `self.solutions_fitness`) only work if the `save_solutions` parameter is `True`.

# Change Population Size during Runtime

Starting from [PyGAD 3.3.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-3-0), the population size can be changed during runtime. In other words, the number of solutions/chromosomes and the number of genes can be changed.

The user has to carefully arrange the list of *parameters* and *instance attributes* that have to be changed to keep the GA consistent before and after changing the population size. Generally, change everything that would be used during the GA evolution.

> CAUTION: If the user fails to change a parameter or an instance attribute necessary to keep the GA running after the population size is changed, errors will arise.

These are examples of the parameters that the user should decide whether to change.
The user should check the [list of parameters](https://pygad.readthedocs.io/en/latest/pygad.html#init) and decide what to change.

1. `population`: The population. It *must* be changed.
2. `num_offspring`: The number of offspring to produce out of the crossover and mutation operations. Change this parameter if the number of offspring has to be changed to be consistent with the new population size.
3. `num_parents_mating`: The number of solutions to select as parents. Change this parameter if the number of parents has to be changed to be consistent with the new population size.
4. `fitness_func`: If the way of calculating the fitness changes after the new population size, then the fitness function has to be changed.
5. `sol_per_pop`: The number of solutions per population. It is not critical to change it, but it is recommended to keep this number consistent with the number of solutions in the `population` parameter.

These are examples of the instance attributes that might be changed. The user should check the [list of instance attributes](https://pygad.readthedocs.io/en/latest/pygad.html#other-instance-attributes-methods) and decide what to change.

1. All the `last_generation_*` attributes
   1. `last_generation_fitness`: A 1D NumPy array of fitness values of the population.
   2. `last_generation_parents` and `last_generation_parents_indices`: Two NumPy arrays: a 2D array representing the parents and a 1D array of the parents' indices.
   3. `last_generation_elitism` and `last_generation_elitism_indices`: Must be changed if `keep_elitism != 0`. The default value of `keep_elitism` is 1. Two NumPy arrays: a 2D array representing the elitism and a 1D array of the elitism indices.
2. `pop_size`: The population size.

# Prevent Duplicates in Gene Values

In [PyGAD 2.13.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-13-0), a new bool parameter called `allow_duplicate_genes` is added to control whether duplicate values are allowed in the chromosome or not. In other words, it controls whether 2 or more genes might have the same exact value.

If `allow_duplicate_genes=True` (which is the default case), genes may have the same value. If `allow_duplicate_genes=False`, then no 2 genes will have the same value, given that there are enough unique values for the genes.

The next code gives an example of using the `allow_duplicate_genes` parameter. A callback generation function is implemented to print the population after each generation.

```python
import pygad

def fitness_func(ga_instance, solution, solution_idx):
    return 0

def on_generation(ga):
    print("Generation", ga.generations_completed)
    print(ga.population)

ga_instance = pygad.GA(num_generations=5,
                       sol_per_pop=5,
                       num_genes=4,
                       mutation_num_genes=3,
                       random_mutation_min_val=-5,
                       random_mutation_max_val=5,
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       gene_type=int,
                       on_generation=on_generation,
                       sample_size=200,
                       allow_duplicate_genes=False)
ga_instance.run()
```

Here are the populations after the 5 generations. Note how there are no duplicate values.
```python
Generation 1
[[ 2 -2 -3  3]
 [ 0  1  2  3]
 [ 5 -3  6  3]
 [-3  1 -2  4]
 [-1  0 -2  3]]
Generation 2
[[-1  0 -2  3]
 [-3  1 -2  4]
 [ 0 -3 -2  6]
 [-3  0 -2  3]
 [ 1 -4  2  4]]
Generation 3
[[ 1 -4  2  4]
 [-3  0 -2  3]
 [ 4  0 -2  1]
 [-4  0 -2 -3]
 [-4  2  0  3]]
Generation 4
[[-4  2  0  3]
 [-4  0 -2 -3]
 [-2  5  4 -3]
 [-1  2 -4  4]
 [-4  2  0 -3]]
Generation 5
[[-4  2  0 -3]
 [-1  2 -4  4]
 [ 3  4 -4  0]
 [-1  0  2 -2]
 [-4  2 -1  1]]
```

The `allow_duplicate_genes` parameter can be used together with the `gene_space` parameter. Here is an example where each of the 4 genes has the same space of values that consists of 4 values (1, 2, 3, and 4).

```python
import pygad

def fitness_func(ga_instance, solution, solution_idx):
    return 0

def on_generation(ga):
    print("Generation", ga.generations_completed)
    print(ga.population)

ga_instance = pygad.GA(num_generations=1,
                       sol_per_pop=5,
                       num_genes=4,
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       gene_type=int,
                       gene_space=[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
                       on_generation=on_generation,
                       sample_size=200,
                       allow_duplicate_genes=False)
ga_instance.run()
```

Even though all the genes share the same space of values, no 2 genes duplicate their values, as shown in the next output.

```python
Generation 1
[[2 3 1 4]
 [2 3 1 4]
 [2 4 1 3]
 [2 3 1 4]
 [1 3 2 4]]
Generation 2
[[1 3 2 4]
 [2 3 1 4]
 [1 3 2 4]
 [2 3 4 1]
 [1 3 4 2]]
Generation 3
[[1 3 4 2]
 [2 3 4 1]
 [1 3 4 2]
 [3 1 4 2]
 [3 2 4 1]]
Generation 4
[[3 2 4 1]
 [3 1 4 2]
 [3 2 4 1]
 [1 2 4 3]
 [1 3 4 2]]
Generation 5
[[1 3 4 2]
 [1 2 4 3]
 [2 1 4 3]
 [1 2 4 3]
 [1 2 4 3]]
```

Take care to give enough values for the genes so that PyGAD is able to find alternatives for the gene value in case it duplicates another gene.

If PyGAD fails to find a unique gene value while there is still room to find one, one possible option is to set the `sample_size` parameter to a larger value. Check the [sample_size Parameter](https://pygad.readthedocs.io/en/latest/pygad_more.html#sample-size-parameter) section for more information.

## Limitation

There might be 2 duplicate genes where changing either of the 2 duplicating genes will not solve the problem. For example, if `gene_space=[[3, 0, 1], [4, 1, 2], [0, 2], [3, 2, 0]]` and the solution is `[3 2 0 0]`, then the values of the last 2 genes duplicate. There are no possible changes in the last 2 genes to solve the problem.

This problem can be solved by randomly changing one of the non-duplicating genes, which may make room for a unique value in one of the 2 duplicating genes. For example, by changing the second gene from 2 to 4, any of the last 2 genes can take the value 2 and solve the duplicates. The resultant solution is then `[3 4 2 0]`. But this option is not yet supported in PyGAD.

## Solve Duplicates using a Third Gene

When `allow_duplicate_genes=False` and a user-defined `gene_space` is used, it sometimes happens that there is no room to solve the duplicates between 2 genes by simply replacing the value of one gene with another. In [PyGAD 3.1.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-1-0), the duplicates are solved by looking for a third gene that will help in solving the duplicates. The following examples explain how it works.

Example 1:

Let's assume that this gene space is used and there is a solution with 2 duplicate genes with the same value 4.
```python
Gene space: [[2, 3],
             [3, 4],
             [4, 5],
             [5, 6]]
Solution: [3, 4, 4, 5]
```

By checking the gene space, the second gene can have the values `[3, 4]` and the third gene can have the values `[4, 5]`. To solve the duplicates, we have to change the value of one of these 2 genes.

If the value of the second gene changes from 4 to 3, then it will duplicate with the first gene. If we change the value of the third gene from 4 to 5, then it will duplicate with the fourth gene. In conclusion, just selecting a different gene value for either the second or third gene will introduce new duplicating genes.

When there are 2 duplicate genes but there is no way to solve their duplicates, then the solution is to change a third gene that makes room to solve the duplicates between the 2 genes.

In our example, the duplicates between the second and third genes can be solved by, for example:

* Changing the first gene from 3 to 2, then changing the second gene from 4 to 3.
* Or changing the fourth gene from 5 to 6, then changing the third gene from 4 to 5.

Generally, this is how to solve such duplicates:

1. For any duplicate gene **GENE1**, select another value.
2. Check which other gene **GENEX** duplicates with this new value.
3. Find if **GENEX** can have another value that will not cause any more duplicates. If so, go to step 7.
4. If all the other values of **GENEX** will cause duplicates, then try another gene **GENEY**.
5. Repeat steps 3 and 4 until exploring all the genes.
6. If there is no possibility to solve the duplicates, then there is no way to solve them and we have to keep the duplicate value.
7. If a value for a gene **GENEM** is found that will not cause more duplicates, then use this value for the gene **GENEM**.
8. Replace the value of the gene **GENE1** by the old value of the gene **GENEM**. This solves the duplicates.

This is an example of solving the duplicate for the solution `[3, 4, 4, 5]`:

1. Let's use the second gene with value 4. Because the space of this gene is `[3, 4]`, the only other value we can select is 3.
2. The first gene also has the value 3.
3. The first gene has another value 2 that will not cause more duplicates in the solution. Then go to step 7.
4. Skip.
5. Skip.
6. Skip.
7. The value of the first gene 3 will be replaced by the new value 2. The new solution is [2, 4, 4, 5].
8. Replace the value of the second gene 4 by the old value of the first gene, which is 3. The new solution is [2, 3, 4, 5]. The duplicate is solved.

Example 2:

```python
Gene space: [[0, 1],
             [1, 2],
             [2, 3],
             [3, 4]]
Solution: [1, 2, 2, 3]
```

The quick summary is:

* Change the value of the first gene from 1 to 0. The solution becomes [0, 2, 2, 3].
* Change the value of the second gene from 2 to 1. The solution becomes [0, 1, 2, 3]. The duplicate is solved.

# More about the `gene_type` Parameter

The `gene_type` parameter allows the user to control the data type for all genes at once or for each individual gene. In [PyGAD 2.15.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-15-0), the `gene_type` parameter also supports customizing the precision for `float` data types. As a result, the `gene_type` parameter helps to:

1. Select a data type for all genes with or without precision.
2. Select a data type for each individual gene with or without precision.

Let's discuss things by examples.
## Data Type for All Genes without Precision

The data type for all genes can be specified by assigning the numeric data type directly to the `gene_type` parameter. This is an example to make all genes of the `int` data type.

```python
gene_type=int
```

Given that the supported numeric data types of PyGAD include Python's `int` and `float` in addition to all numeric types of `NumPy`, any of these types can be assigned to the `gene_type` parameter.

If no precision is specified for a `float` data type, then the complete floating-point number is kept.

The next code uses an `int` data type for all genes, where the genes in the initial and final population are only integers.

```python
import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=int)

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)
```

```python
Initial Population
[[ 1 -1  2  0 -3]
 [ 0 -2  0 -3 -1]
 [ 0 -1 -1  2  0]
 [-2  3 -2  3  3]
 [ 0  0  2 -2 -2]]

Final Population
[[ 1 -1  2  2  0]
 [ 1 -1  2  2  0]
 [ 1 -1  2  2  0]
 [ 1 -1  2  2  0]
 [ 1 -1  2  2  0]]
```

## Data Type for All Genes with Precision

A precision can only be specified for a `float` data type and cannot be specified for integers. Here is an example of using a precision of 3 for the `float` data type. In this case, all genes are of type `float` and their maximum precision is 3.

```python
gene_type=[float, 3]
```

The next code prints the initial and final population where the genes are of type `float` with precision 3.

```python
import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)

    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=[float, 3])

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)
```

```python
Initial Population
[[-2.417 -0.487  3.623  2.457 -2.362]
 [-1.231  0.079 -1.63   1.629 -2.637]
 [ 0.692 -2.098  0.705  0.914 -3.633]
 [ 2.637 -1.339 -1.107 -0.781 -3.896]
 [-1.495  1.378 -1.026  3.522  2.379]]

Final Population
[[ 1.714 -1.024  3.623  3.185 -2.362]
 [ 0.692 -1.024  3.623  3.185 -2.362]
 [ 0.692 -1.024  3.623  3.375 -2.362]
 [ 0.692 -1.024  4.041  3.185 -2.362]
 [ 1.714 -0.644  3.623  3.185 -2.362]]
```

## Data Type for each Individual Gene without Precision

In [PyGAD 2.14.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-14-0), the `gene_type` parameter allows customizing the gene type for each individual gene. This is done by using a `list`/`tuple`/`numpy.ndarray` with a number of elements equal to the number of genes. For each element, a type is specified for the corresponding gene.

This is an example for a 5-gene problem where different types are assigned to the genes.
```python
gene_type=[int, float, numpy.float16, numpy.int8, float]
```

This is a complete code that prints the initial and final population for a custom gene data type.

```python
import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=[int, float, numpy.float16, numpy.int8, float])

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)
```

```python
Initial Population
[[0 0.8615522360026828 0.7021484375 -2 3.5301821368185866]
 [-3 2.648189378595294 -3.830078125 1 -0.9586271572917742]
 [3 3.7729827570110714 1.2529296875 -3 1.395741994211889]
 [0 1.0490687178053282 1.51953125 -2 0.7243617940450235]
 [0 -0.6550158436937226 -2.861328125 -2 1.8212734549263097]]

Final Population
[[3 3.7729827570110714 2.055 0 0.7243617940450235]
 [3 3.7729827570110714 1.458 0 -0.14638754050305036]
 [3 3.7729827570110714 1.458 0 0.0869406120516778]
 [3 3.7729827570110714 1.458 0 0.7243617940450235]
 [3 3.7729827570110714 1.458 0 -0.14638754050305036]]
```

## Data Type for each Individual Gene with Precision

The precision can also be specified for the `float` data types, as in the next line where the second gene's precision is 2 and the last gene's precision is 1.

```python
gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]]
```

This is a complete example where the initial and final populations are printed and the genes comply with the data types and precisions specified.

```python
import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]])

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)
```

```python
Initial Population
[[-2 -1.22 1.716796875 -1 0.2]
 [-1 -1.58 -3.091796875 0 -1.3]
 [3 3.35 -0.107421875 1 -3.3]
 [-2 -3.58 -1.779296875 0 0.6]
 [2 -3.73 2.65234375 3 -0.5]]

Final Population
[[2 -4.22 3.47 3 -1.3]
 [2 -3.73 3.47 3 -1.3]
 [2 -4.22 3.47 2 -1.3]
 [2 -4.58 3.47 3 -1.3]
 [2 -3.73 3.47 3 -1.3]]
```

# Parallel Processing in PyGAD

Starting from [PyGAD 2.17.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-17-0), parallel processing is supported. This section explains how to use parallel processing in PyGAD.

According to the [PyGAD lifecycle](https://pygad.readthedocs.io/en/latest/pygad.html#life-cycle-of-pygad), only 2 operations can be parallelized:

1. Population fitness calculation.
2. Mutation.

The reason is that the calculations in these 2 operations are independent (i.e.
each solution/chromosome is handled independently from the others) and can be distributed across different processes or threads.

The mutation operation does not do intensive calculations on the CPU. Its calculations are simple, like flipping the values of some genes from 0 to 1 or adding a random value to some genes. So, it does not take much CPU processing time. Experiments proved that parallelizing the mutation operation across the solutions increases the time instead of reducing it. This is because running multiple processes or threads adds overhead to manage them. Thus, parallel processing is not applied to the mutation operation.

For the population fitness calculation, parallel processing can help make a difference and reduce the processing time. But this is conditional on the type of calculations done in the fitness function. If the fitness function makes intensive calculations and takes much processing time from the CPU, then it is probable that parallel processing will help to cut down the overall time.

This section explains how parallel processing works in PyGAD and how to use it.

### How to Use Parallel Processing in PyGAD

Starting from [PyGAD 2.17.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-17-0), a new parameter called `parallel_processing` is added to the constructor of the `pygad.GA` class.

```python
import pygad
...
ga_instance = pygad.GA(...,
                       parallel_processing=...)
...
```

This parameter allows the user to do the following:

1. Enable parallel processing.
2. Select whether processes or threads are used.
3. Specify the number of processes or threads to be used.

These are the 3 possible values for the `parallel_processing` parameter:

1) `None`: (Default) It means no parallel processing is used.
2) A positive integer referring to the number of threads to be used (i.e. threads, not processes, are used).
3) `list`/`tuple`: If a list or a tuple of exactly 2 elements is assigned, then:
   1) The first element can be either `'process'` or `'thread'` to specify whether processes or threads are used, respectively.
   2) The second element can be:
      1) A positive integer to select the maximum number of processes or threads to be used.
      2) `0` to indicate that 0 processes or threads are used. It means no parallel processing. This is identical to setting `parallel_processing=None`.
      3) `None` to use the default value as calculated by the `concurrent.futures` module.

These are examples of the values assigned to the `parallel_processing` parameter:

* `parallel_processing=4`: Because the parameter is assigned a positive integer, this means parallel processing is activated where 4 threads are used.
* `parallel_processing=["thread", 5]`: Use parallel processing with 5 threads. This is identical to `parallel_processing=5`.
* `parallel_processing=["process", 8]`: Use parallel processing with 8 processes.
* `parallel_processing=["process", 0]`: As the second element is given the value 0, this means do not use parallel processing. This is identical to `parallel_processing=None`.

### Examples

The examples will help you know the difference between using processes and threads. Moreover, they will give an idea of when parallel processing would make a difference and reduce the time. These are dummy examples where the fitness function is made to always return 0.

The first example uses 10 genes, 5 solutions in the population where only 3 solutions mate, and 9999 generations.
The fitness function uses a short `for` loop just to have some calculations. In the constructor of the `pygad.GA` class, `parallel_processing=None` means no parallel processing is used.

```python
import pygad
import time

def fitness_func(ga_instance, solution, solution_idx):
    for _ in range(99):
        pass
    return 0

ga_instance = pygad.GA(num_generations=9999,
                       num_parents_mating=3,
                       sol_per_pop=5,
                       num_genes=10,
                       fitness_func=fitness_func,
                       suppress_warnings=True,
                       parallel_processing=None)

if __name__ == '__main__':
    t1 = time.time()

    ga_instance.run()

    t2 = time.time()
    print("Time is", t2-t1)
```

When parallel processing is not used, the time it takes to run the genetic algorithm is `1.5` seconds.

For comparison, let's do a second experiment where parallel processing is used with 5 threads. In this case, it takes `5` seconds.

```python
...
ga_instance = pygad.GA(...,
                       parallel_processing=5)
...
```

For the third experiment, processes instead of threads are used. Also, only 99 generations are used instead of 9999. The time it takes is `99` seconds.

```python
...
ga_instance = pygad.GA(num_generations=99,
                       ...,
                       parallel_processing=["process", 5])
...
```

This is the summary of the 3 experiments:

1. No parallel processing & 9999 generations: 1.5 seconds.
2. Parallel processing with 5 threads & 9999 generations: 5 seconds.
3. Parallel processing with 5 processes & 99 generations: 99 seconds.

Because the fitness function does not need much CPU time, normal processing takes the least time. Running processes for this simple problem takes 99 seconds compared to only 5 seconds for threads because managing processes is much heavier than managing threads. Thus, most of the CPU time is spent swapping the processes instead of executing the code.

In the second example, the loop makes 99999999 iterations and only 5 generations are used. With no parallelization, it takes 22 seconds.

```python
import pygad
import time

def fitness_func(ga_instance, solution, solution_idx):
    for _ in range(99999999):
        pass
    return 0

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=3,
                       sol_per_pop=5,
                       num_genes=10,
                       fitness_func=fitness_func,
                       suppress_warnings=True,
                       parallel_processing=None)

if __name__ == '__main__':
    t1 = time.time()
    ga_instance.run()
    t2 = time.time()
    print("Time is", t2-t1)
```

It takes 15 seconds when 10 processes are used.

```python
...
ga_instance = pygad.GA(...,
                       parallel_processing=["process", 10])
...
```

This is compared to 20 seconds when 10 threads are used.

```python
...
ga_instance = pygad.GA(...,
                       parallel_processing=["thread", 10])
...
```

Based on the second example, using parallel processing with 10 processes takes the least time because there is much CPU work done. Generally, processes are preferred over threads when most of the work is on the CPU. Threads are preferred over processes in some situations, like doing input/output operations.

*Before releasing [PyGAD 2.17.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-17-0), [László Fazekas](https://www.linkedin.com/in/l%C3%A1szl%C3%B3-fazekas-2429a912) wrote an article to parallelize the fitness function with PyGAD. Check it: [How Genetic Algorithms Can Compete with Gradient Descent and Backprop](https://hackernoon.com/how-genetic-algorithms-can-compete-with-gradient-descent-and-backprop-9m9t33bq)*.
# Print Lifecycle Summary

In [PyGAD 2.19.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-19-0), a new method called `summary()` is supported. It prints a Keras-like summary of the PyGAD lifecycle showing the steps, callback functions, parameters, etc.

This method accepts the following parameters:

- `line_length=70`: An integer representing the length of a single line in characters.
- `fill_character=" "`: A character to fill the lines.
- `line_character="-"`: A character for creating a line separator.
- `line_character2="="`: A secondary character to create a line separator.
- `columns_equal_len=False`: Whether the table rows are split into equal-sized columns or split according to the width needed.
- `print_step_parameters=True`: Whether to print extra parameters about each step inside the step. If `print_step_parameters=False` and `print_parameters_summary=True`, then the parameters of each step are printed at the end of the table.
- `print_parameters_summary=True`: Whether to print a parameters summary at the end of the table. If `print_step_parameters=False`, then the parameters of each step are printed at the end of the table too.

This is a quick example that creates a `pygad.GA` instance.

```python
import pygad
import numpy

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

def genetic_fitness(ga_instance, solution, solution_idx):
    output = numpy.sum(solution*function_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

def on_gen(ga):
    pass

def on_crossover_callback(a, b):
    pass

ga_instance = pygad.GA(num_generations=100,
                       num_parents_mating=10,
                       sol_per_pop=20,
                       num_genes=len(function_inputs),
                       on_crossover=on_crossover_callback,
                       on_generation=on_gen,
                       parallel_processing=2,
                       stop_criteria="reach_10",
                       fitness_batch_size=4,
                       crossover_probability=0.4,
                       fitness_func=genetic_fitness)
```

Then call the `summary()` method to print the summary with the default parameters. Note that entries for the crossover and generation callbacks are created because their callback functions are implemented through `on_crossover_callback()` and `on_gen()`, respectively.
```python
ga_instance.summary()
```

```bash
----------------------------------------------------------------------
                           PyGAD Lifecycle
======================================================================
Step                   Handler                            Output Shape
======================================================================
Fitness Function       genetic_fitness()                  (1)
Fitness batch size: 4
----------------------------------------------------------------------
Parent Selection       steady_state_selection()           (10, 6)
Number of Parents: 10
----------------------------------------------------------------------
Crossover              single_point_crossover()           (10, 6)
Crossover probability: 0.4
----------------------------------------------------------------------
On Crossover           on_crossover_callback()            None
----------------------------------------------------------------------
Mutation               random_mutation()                  (10, 6)
Mutation Genes: 1
Random Mutation Range: (-1.0, 1.0)
Mutation by Replacement: False
Allow Duplicated Genes: True
----------------------------------------------------------------------
On Generation          on_gen()                           None
Stop Criteria: [['reach', 10.0]]
----------------------------------------------------------------------
======================================================================
Population Size: (20, 6)
Number of Generations: 100
Initial Population Range: (-4, 4)
Keep Elitism: 1
Gene DType: [<class 'float'>, None]
Parallel Processing: ['thread', 2]
Save Best Solutions: False
Save Solutions: False
======================================================================
```

We can set the `print_step_parameters` and `print_parameters_summary` parameters to `False` to skip printing the parameters.

```python
ga_instance.summary(print_step_parameters=False,
                    print_parameters_summary=False)
```

```bash
----------------------------------------------------------------------
                           PyGAD Lifecycle
======================================================================
Step                   Handler                            Output Shape
======================================================================
Fitness Function       genetic_fitness()                  (1)
----------------------------------------------------------------------
Parent Selection       steady_state_selection()           (10, 6)
----------------------------------------------------------------------
Crossover              single_point_crossover()           (10, 6)
----------------------------------------------------------------------
On Crossover           on_crossover_callback()            None
----------------------------------------------------------------------
Mutation               random_mutation()                  (10, 6)
----------------------------------------------------------------------
On Generation          on_gen()                           None
----------------------------------------------------------------------
======================================================================
```

# Logging Outputs

In [PyGAD 3.0.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-3-0-0), the `print()` statement is no longer used; the outputs are printed using the [logging](https://docs.python.org/3/library/logging.html) module. A new parameter called `logger` is supported to accept a user-defined logger.

```python
import logging

logger = ...

ga_instance = pygad.GA(...,
                       logger=logger,
                       ...)
```

The default value for this parameter is `None`. If no logger is passed (i.e. `logger=None`), then a default logger is created to log the messages to the console, exactly like how the `print()` statement works.
Some advantages of using the [logging](https://docs.python.org/3/library/logging.html) module instead of the `print()` statement are:

1. The user has more control over the printed messages, especially in a project that uses multiple modules where each module prints its messages. A logger can organize the outputs.
2. Using the proper `Handler`, the user can log the output messages to files, not just print them to the console. This makes it much easier to record the outputs.
3. The format of the printed messages can be changed by customizing the `Formatter` assigned to the Logger.

This section gives some quick examples of using the `logging` module and then gives an example of using the logger with PyGAD.

## Logging to the Console

This is an example to create a logger that logs the messages to the console.

```python
import logging

# Create a logger
logger = logging.getLogger(__name__)

# Set the logger level to debug so that all the messages are printed.
logger.setLevel(logging.DEBUG)

# Create a stream handler to log the messages to the console.
stream_handler = logging.StreamHandler()

# Set the handler level to debug.
stream_handler.setLevel(logging.DEBUG)

# Create a formatter
formatter = logging.Formatter('%(message)s')

# Add the formatter to the handler.
stream_handler.setFormatter(formatter)

# Add the stream handler to the logger
logger.addHandler(stream_handler)
```

Now, we can log messages to the console with the format specified in the `Formatter`.

```python
logger.debug('Debug message.')
logger.info('Info message.')
logger.warning('Warn message.')
logger.error('Error message.')
logger.critical('Critical message.')
```

The outputs are identical to those printed using the `print()` statement.

```
Debug message.
Info message.
Warn message.
Error message.
Critical message.
```

By changing the format of the output messages, we can have more information about each message.

```python
formatter = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
```

This is a sample output.

```
2023-04-03 18:46:27 DEBUG: Debug message.
2023-04-03 18:46:27 INFO: Info message.
2023-04-03 18:46:27 WARNING: Warn message.
2023-04-03 18:46:27 ERROR: Error message.
2023-04-03 18:46:27 CRITICAL: Critical message.
```

Note that you may need to clear the handlers after finishing the execution. This makes sure no cached handlers are used in the next run. If the cached handlers are not cleared, then a single output message may be repeated.

```python
logger.handlers.clear()
```

## Logging to a File

This is another example to log the messages to a file named `logfile.txt`. The formatter prints the following about each message:

1. The date and time at which the message is logged.
2. The log level.
3. The message.
4. The path of the file.
5. The line number of the log message.

```python
import logging

level = logging.DEBUG
name = 'logfile.txt'

logger = logging.getLogger(name)
logger.setLevel(level)

file_handler = logging.FileHandler(name, 'a+', 'utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)
```

This is what the outputs look like.

```
2023-04-03 18:54:03 DEBUG: Debug message. - c:\users\agad069\desktop\logger\example2.py:46
2023-04-03 18:54:03 INFO: Info message. - c:\users\agad069\desktop\logger\example2.py:47
2023-04-03 18:54:03 WARNING: Warn message. - c:\users\agad069\desktop\logger\example2.py:48
2023-04-03 18:54:03 ERROR: Error message. - c:\users\agad069\desktop\logger\example2.py:49
2023-04-03 18:54:03 CRITICAL: Critical message. - c:\users\agad069\desktop\logger\example2.py:50
```

Consider clearing the handlers if necessary.

```python
logger.handlers.clear()
```

## Log to Both the Console and a File

This is an example to create a single logger associated with 2 handlers:

1. A file handler.
2. A stream handler.

```python
import logging

level = logging.DEBUG
name = 'logfile.txt'

logger = logging.getLogger(name)
logger.setLevel(level)

file_handler = logging.FileHandler(name, 'a+', 'utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_format = logging.Formatter('%(message)s')
console_handler.setFormatter(console_format)
logger.addHandler(console_handler)
```

When a message is logged, it is both printed to the console and saved in `logfile.txt`.

Consider clearing the handlers if necessary.

```python
logger.handlers.clear()
```

## PyGAD Example

To use the logger in PyGAD, just create your custom logger and pass it to the `logger` parameter.

```python
import logging
import pygad
import numpy

level = logging.DEBUG
name = 'logfile.txt'

logger = logging.getLogger(name)
logger.setLevel(level)

file_handler = logging.FileHandler(name, 'a+', 'utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_format = logging.Formatter('%(message)s')
console_handler.setFormatter(console_format)
logger.addHandler(console_handler)

equation_inputs = [4, -2, 8]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

def on_generation(ga_instance):
    ga_instance.logger.info(f"Generation = {ga_instance.generations_completed}")
    ga_instance.logger.info(f"Fitness = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}")

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=40,
                       num_parents_mating=2,
                       keep_parents=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       on_generation=on_generation,
                       logger=logger)
ga_instance.run()

logger.handlers.clear()
```

By executing this code, the logged messages are printed to the console and also saved in the text file.
```
2023-04-03 19:04:27 INFO: Generation = 1
2023-04-03 19:04:27 INFO: Fitness = 0.00038086960368076276
2023-04-03 19:04:27 INFO: Generation = 2
2023-04-03 19:04:27 INFO: Fitness = 0.00038214871408010853
2023-04-03 19:04:27 INFO: Generation = 3
2023-04-03 19:04:27 INFO: Fitness = 0.0003832795907974678
2023-04-03 19:04:27 INFO: Generation = 4
2023-04-03 19:04:27 INFO: Fitness = 0.00038398612055017196
2023-04-03 19:04:27 INFO: Generation = 5
2023-04-03 19:04:27 INFO: Fitness = 0.00038442348890867516
2023-04-03 19:04:27 INFO: Generation = 6
2023-04-03 19:04:27 INFO: Fitness = 0.0003854406039137763
2023-04-03 19:04:27 INFO: Generation = 7
2023-04-03 19:04:27 INFO: Fitness = 0.00038646083174063284
2023-04-03 19:04:27 INFO: Generation = 8
2023-04-03 19:04:27 INFO: Fitness = 0.0003875169193024936
2023-04-03 19:04:27 INFO: Generation = 9
2023-04-03 19:04:27 INFO: Fitness = 0.0003888816727311021
2023-04-03 19:04:27 INFO: Generation = 10
2023-04-03 19:04:27 INFO: Fitness = 0.000389832593101348
```

# Solve Non-Deterministic Problems

PyGAD can be used to solve both deterministic and non-deterministic problems. Deterministic problems are those that return the same fitness for the same solution. For non-deterministic problems, a different fitness value may be returned for the same solution.

By default, PyGAD's settings are set to solve deterministic problems. PyGAD can save the explored solutions and their fitness to reuse in the future. These instance attributes can save the solutions:

1. `solutions`: Exists if `save_solutions=True`.
2. `best_solutions`: Exists if `save_best_solutions=True`.
3. `last_generation_elitism`: Exists if `keep_elitism` > 0.
4. `last_generation_parents`: Exists if `keep_parents` > 0 or `keep_parents=-1`.

To configure PyGAD for non-deterministic problems, we have to disable saving the previous solutions. This is done by setting these parameters:

1. `keep_elitism=0`
2. `keep_parents=0`
3. `save_solutions=False`
4. `save_best_solutions=False`

```python
import pygad
...
ga_instance = pygad.GA(...,
                       keep_elitism=0,
                       keep_parents=0,
                       save_solutions=False,
                       save_best_solutions=False,
                       ...)
```

This way, PyGAD will not save any explored solution and thus the fitness function has to be called for each individual solution.

# Reuse the Fitness instead of Calling the Fitness Function

It may happen that a previously explored solution in generation X is explored again in another generation Y (where Y > X). For some problems, calling the fitness function takes much time.

For deterministic problems, it is better not to call the fitness function for an already explored solution. Instead, reuse the fitness of the old solution. PyGAD supports some options to help you save the time of calling the fitness function for a previously explored solution.

The parameters explored in this section can be set in the constructor of the `pygad.GA` class.

The `cal_pop_fitness()` method of the `pygad.GA` class checks these parameters to see if there is a possibility of reusing the fitness instead of calling the fitness function.

## 1. `save_solutions`

It defaults to `False`. If set to `True`, then the population of each generation is saved into the `solutions` attribute of the `pygad.GA` instance. In other words, every single solution is saved in the `solutions` attribute.

## 2. `save_best_solutions`

It defaults to `False`. If `True`, then it only saves the best solution in every generation.

## 3.
`keep_elitism`

It accepts an integer and defaults to 1. If set to a positive integer, then it keeps the elitism of one generation available in the next generation.

## 4. `keep_parents`

It accepts an integer and defaults to -1. If set to `-1` or a positive integer, then it keeps the parents of one generation available in the next generation.

# Why the Fitness Function is not Called for Solution at Index 0?

PyGAD has a parameter called `keep_elitism` which defaults to 1. This parameter defines the number of best solutions in generation **X** to keep in the next generation **X+1**. The best solutions are just copied from generation **X** to generation **X+1** without making any change.

```python
ga_instance = pygad.GA(...,
                       keep_elitism=1,
                       ...)
```

The best solutions are copied at the beginning of the population. If `keep_elitism=1`, this means the best solution in generation X is kept in the next generation X+1 at index 0 of the population. If `keep_elitism=2`, this means the 2 best solutions in generation X are kept in the next generation X+1 at indices 0 and 1 of the population of generation X+1.

Because the fitness of these best solutions is already calculated in generation X, their fitness values will not be recalculated at generation X+1 (i.e. the fitness function will not be called for these solutions again). Instead, their fitness values are just reused. This is why you see that no solution with index 0 is passed to the fitness function.

To force calling the fitness function for each solution in every generation, consider setting `keep_elitism` and `keep_parents` to 0. Moreover, keep the 2 parameters `save_solutions` and `save_best_solutions` at their default value `False`.

```python
ga_instance = pygad.GA(...,
                       keep_elitism=0,
                       keep_parents=0,
                       save_solutions=False,
                       save_best_solutions=False,
                       ...)
```

# Batch Fitness Calculation

In [PyGAD 2.19.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-19-0), a new optional parameter called `fitness_batch_size` is supported to calculate the fitness function in batches. Thanks to [Linan Qiu](https://github.com/linanqiu) for opening the [GitHub issue #136](https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/136).

Its values can be:

* `1` or `None`: If the `fitness_batch_size` parameter is assigned the value `1` or `None` (default), then the normal flow is used where the fitness function is called for each individual solution. That is, if there are 15 solutions, then the fitness function is called 15 times.
* `1 < fitness_batch_size <= sol_per_pop`: If the `fitness_batch_size` parameter is assigned a value satisfying this condition `1 < fitness_batch_size <= sol_per_pop`, then the solutions are grouped into batches of size `fitness_batch_size` and the fitness function is called once for each batch. In this case, the fitness function must return a list/tuple/numpy.ndarray with a length equal to the number of solutions passed.

## Example without `fitness_batch_size` Parameter

This is an example where the `fitness_batch_size` parameter is given the value `None` (which is the default value). This is equivalent to using the value `1`. In this case, the fitness function will be called for each solution. This means the fitness function `fitness_func` will receive only a single solution.
This is an example of the arguments passed to the fitness function:

```
solution: [ 2.52860734, -0.94178795, 2.97545704, 0.84131987, -3.78447118, 2.41008358]
solution_idx: 3
```

The fitness function must also return a single numeric value as the fitness for the passed solution.

As we have a population of `20` solutions, the fitness function is called 20 times per generation. For 5 generations, the fitness function is called `20*5 = 100` times. In PyGAD, the fitness function is called after the last generation too, and this adds an additional 20 calls. So, the total number of calls to the fitness function is `20*5 + 20 = 120`.

Note that the `keep_elitism` and `keep_parents` parameters are set to `0` to make sure no fitness values are reused and to force calling the fitness function for each individual solution.

```python
import pygad
import numpy

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

number_of_calls = 0

def fitness_func(ga_instance, solution, solution_idx):
    global number_of_calls
    number_of_calls = number_of_calls + 1
    output = numpy.sum(solution*function_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=10,
                       sol_per_pop=20,
                       fitness_func=fitness_func,
                       fitness_batch_size=None,
                       # fitness_batch_size=1,
                       num_genes=len(function_inputs),
                       keep_elitism=0,
                       keep_parents=0)

ga_instance.run()
print(number_of_calls)
```

```
120
```

## Example with `fitness_batch_size` Parameter

This is an example where the `fitness_batch_size` parameter is used and assigned the value `4`. This means the solutions will be grouped into batches of `4` solutions. The fitness function will be called once for each batch (i.e. called once for each 4 solutions).

This is an example of the arguments passed to it:

```python
solutions:
    [[ 3.1129432  -0.69123589  1.93792414  2.23772968 -1.54616001 -0.53930799]
     [ 3.38508121  0.19890812  1.93792414  2.23095014 -3.08955597  3.10194128]
     [ 2.37079504 -0.88819803  2.97545704  1.41742256 -3.95594055  2.45028256]
     [ 2.52860734 -0.94178795  2.97545704  0.84131987 -3.78447118  2.41008358]]
solutions_indices:
    [16, 17, 18, 19]
```

As we have 20 solutions, there are `20/4 = 5` batches. As a result, the fitness function is called only 5 times per generation instead of 20. For each call to the fitness function, it receives a batch of 4 solutions.

As we have 5 generations, the function will be called `5*5 = 25` times. Given the call to the fitness function after the last generation, the total number of calls is `5*5 + 5 = 30`.

```python
import pygad
import numpy

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

number_of_calls = 0

def fitness_func_batch(ga_instance, solutions, solutions_indices):
    global number_of_calls
    number_of_calls = number_of_calls + 1
    batch_fitness = []
    for solution in solutions:
        output = numpy.sum(solution*function_inputs)
        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
        batch_fitness.append(fitness)
    return batch_fitness

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=10,
                       sol_per_pop=20,
                       fitness_func=fitness_func_batch,
                       fitness_batch_size=4,
                       num_genes=len(function_inputs),
                       keep_elitism=0,
                       keep_parents=0)

ga_instance.run()
print(number_of_calls)
```

```
30
```

When batch fitness calculation is used, we save `120 - 30 = 90` calls to the fitness function.
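A note on the batch example above: the per-solution loop inside `fitness_func_batch()` gains nothing from batching by itself. Batching pays off most when the whole batch can be evaluated at once. As a sketch (reusing the same equation and constants as above), the loop could be replaced with a vectorized NumPy computation:

```python
import numpy

function_inputs = numpy.array([4, -2, 3.5, 5, -11, -4.7])
desired_output = 44

def fitness_func_batch(ga_instance, solutions, solutions_indices):
    # solutions has shape (batch_size, num_genes).
    # Compute the outputs of all solutions in the batch in one shot.
    outputs = numpy.sum(solutions * function_inputs, axis=1)
    # Return one fitness value per solution in the batch.
    return (1.0 / (numpy.abs(outputs - desired_output) + 0.000001)).tolist()
```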
# Use Functions and Methods to Build Fitness and Callbacks

In PyGAD 2.19.0, it is possible to pass user-defined functions or methods to the following parameters:

1. `fitness_func`
2. `on_start`
3. `on_fitness`
4. `on_parents`
5. `on_crossover`
6. `on_mutation`
7. `on_generation`
8. `on_stop`

This section gives 2 examples of assigning user-defined:

1. Functions.
2. Methods.

## Assign Functions

This is a dummy example where the fitness function returns a random value. Note that the instance of the `pygad.GA` class is passed as the first parameter of all functions.

```python
import pygad
import numpy

def fitness_func(ga_instance, solution, solution_idx):
    return numpy.random.rand()

def on_start(ga_instance):
    print("on_start")

def on_fitness(ga_instance, last_gen_fitness):
    print("on_fitness")

def on_parents(ga_instance, last_gen_parents):
    print("on_parents")

def on_crossover(ga_instance, last_gen_offspring):
    print("on_crossover")

def on_mutation(ga_instance, last_gen_offspring):
    print("on_mutation")

def on_generation(ga_instance):
    print("on_generation\n")

def on_stop(ga_instance, last_gen_fitness):
    print("on_stop")

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=2,
                       on_start=on_start,
                       on_fitness=on_fitness,
                       on_parents=on_parents,
                       on_crossover=on_crossover,
                       on_mutation=on_mutation,
                       on_generation=on_generation,
                       on_stop=on_stop,
                       fitness_func=fitness_func)

ga_instance.run()
```

## Assign Methods

The next example has all the methods defined inside the class `Test`. All of the methods accept `self` as the first parameter and the instance of the `pygad.GA` class as the second parameter.

```python
import pygad
import numpy

class Test:
    def fitness_func(self, ga_instance, solution, solution_idx):
        return numpy.random.rand()

    def on_start(self, ga_instance):
        print("on_start")

    def on_fitness(self, ga_instance, last_gen_fitness):
        print("on_fitness")

    def on_parents(self, ga_instance, last_gen_parents):
        print("on_parents")

    def on_crossover(self, ga_instance, last_gen_offspring):
        print("on_crossover")

    def on_mutation(self, ga_instance, last_gen_offspring):
        print("on_mutation")

    def on_generation(self, ga_instance):
        print("on_generation\n")

    def on_stop(self, ga_instance, last_gen_fitness):
        print("on_stop")

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=2,
                       on_start=Test().on_start,
                       on_fitness=Test().on_fitness,
                       on_parents=Test().on_parents,
                       on_crossover=Test().on_crossover,
                       on_mutation=Test().on_mutation,
                       on_generation=Test().on_generation,
                       on_stop=Test().on_stop,
                       fitness_func=Test().fitness_func)

ga_instance.run()
```

diff --git a/docs/md/releases.md b/docs/md/releases.md
new file mode 100644
index 0000000..291d69e
--- /dev/null
+++ b/docs/md/releases.md
@@ -0,0 +1,984 @@

# Release History

![PYGAD-LOGO](https://user-images.githubusercontent.com/16560492/101267295-c74c0180-375f-11eb-9ad0-f8e37bd796ce.png)

## PyGAD 1.0.17

Release Date: 15 April 2020

1. The **pygad.GA** class accepts a new argument named `fitness_func` which accepts a function to be used for calculating the fitness values for the solutions. This allows the project to be customized to any problem by building the right fitness function.
## PyGAD 1.0.20

Release Date: 4 May 2020

1. The **pygad.GA** attributes are moved from the class scope to the instance scope.
2. Raising an exception for incorrect values of the passed parameters.
3. Two new parameters are added to the **pygad.GA** class constructor (`init_range_low` and `init_range_high`) allowing the user to customize the range from which the gene values in the initial population are selected.
4. The code object `__code__` of the passed fitness function is checked to ensure it has the right number of parameters.

## PyGAD 2.0.0

Release Date: 13 May 2020

1. The fitness function accepts a new argument named `sol_idx` representing the index of the solution within the population.
2. A new parameter to the **pygad.GA** class constructor named `initial_population` is supported to allow the user to use a custom initial population with the genetic algorithm. If not `None`, then the passed population will be used. If `None`, then the genetic algorithm will create the initial population using the `sol_per_pop` and `num_genes` parameters.
3. The parameters `sol_per_pop` and `num_genes` are optional and set to `None` by default.
4. A new parameter named `callback_generation` is introduced in the **pygad.GA** class constructor. It accepts a function with a single parameter representing the **pygad.GA** class instance. This function is called after each generation. This helps the user to do post-processing or debugging operations after each generation.

## PyGAD 2.1.0

Release Date: 14 May 2020

1. The `best_solution()` method in the **pygad.GA** class returns a new output representing the index of the best solution within the population. Now, it returns a total of 3 outputs and their order is: best solution, best solution fitness, and best solution index. Here is an example:

```python
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution :", solution)
print("Fitness value of the best solution :", solution_fitness, "\n")
print("Index of the best solution :", solution_idx, "\n")
```

2. A new attribute named `best_solution_generation` is added to the instances of the **pygad.GA** class. It holds the generation number at which the best solution is reached. It is only assigned the generation number after the `run()` method completes. Otherwise, its value is -1.
Example:

```python
print("Best solution reached after {best_solution_generation} generations.".format(best_solution_generation=ga_instance.best_solution_generation))
```

3. The `best_solution_fitness` attribute is renamed to `best_solutions_fitness` (plural *solutions*).
4. Mutation is applied independently for each gene.

## PyGAD 2.2.1

Release Date: 17 May 2020

1. Adding 2 extra modules (pygad.nn and pygad.gann) for building and training neural networks with the genetic algorithm.

## PyGAD 2.2.2

Release Date: 18 May 2020

1. The initial value of the `generations_completed` attribute of instances from the pygad.GA class is `0` rather than `None`.
2. An optional bool parameter named `mutation_by_replacement` is added to the constructor of the pygad.GA class. It works only when the selected type of mutation is random (`mutation_type="random"`). In this case, setting `mutation_by_replacement=True` means replacing the gene by the randomly generated value. If `False`, then it has no effect and random mutation works by adding the random value to the gene.
This parameter should be used when the gene falls within a fixed range and its value must not go out of this range. Here are some examples:

   Assume there is a gene with the value 0.5.

   If `mutation_type="random"` and `mutation_by_replacement=False`, then the generated random value (e.g. 0.1) will be added to the gene value. The new gene value is **0.5+0.1=0.6**.

   If `mutation_type="random"` and `mutation_by_replacement=True`, then the generated random value (e.g. 0.1) will replace the gene value. The new gene value is **0.1**.

3. A `None` value can be assigned to the `mutation_type` and `crossover_type` parameters of the pygad.GA class constructor. When `None`, this means the step is bypassed and has no action.

## PyGAD 2.3.0

Release date: 1 June 2020

1. A new module named `pygad.cnn` is supported for building convolutional neural networks.
2. A new module named `pygad.gacnn` is supported for training convolutional neural networks using the genetic algorithm.
3. The `pygad.plot_result()` method has 3 optional parameters named `title`, `xlabel`, and `ylabel` to customize the plot title, x-axis label, and y-axis label, respectively.
4. The `pygad.nn` module supports the softmax activation function.
5. The name of the `pygad.nn.predict_outputs()` function is changed to `pygad.nn.predict()`.
6. The name of the `pygad.nn.train_network()` function is changed to `pygad.nn.train()`.

## PyGAD 2.4.0

Release date: 5 July 2020

1. A new parameter named `delay_after_gen` is added which accepts a non-negative number specifying the time in seconds to wait after a generation completes and before going to the next generation. It defaults to `0.0` which means no delay after the generation.

2. The function passed to the `callback_generation` parameter of the pygad.GA class constructor can terminate the execution of the genetic algorithm if it returns the string `stop`. This causes the `run()` method to stop.

One important use case for this feature is stopping the genetic algorithm when a condition is met before passing through all the generations. The user may assign a value of 100 to the `num_generations` parameter of the pygad.GA class constructor. Assume that at generation 50, for example, a condition is met and the user wants to stop the execution before waiting for the remaining 50 generations. To do that, just make the function passed to the `callback_generation` parameter return the string `stop`.

Here is an example of a function to be passed to the `callback_generation` parameter which stops the execution if the fitness value 70 is reached. The value 70 might be the best possible fitness value. After it is reached, there is no need to pass through more generations because no further improvement is possible.

    ```python
    def func_generation(ga_instance):
        if ga_instance.best_solution()[1] >= 70:
            return "stop"
    ```

## PyGAD 2.5.0

Release date: 19 July 2020

1. 2 new optional parameters added to the constructor of the `pygad.GA` class which are `crossover_probability` and `mutation_probability`.
   While applying the crossover operation, each parent has a random value generated between 0.0 and 1.0. If this random value is less than or equal to the value assigned to the `crossover_probability` parameter, then the parent is selected for the crossover operation.
   For the mutation operation, a random value between 0.0 and 1.0 is generated for each gene in the solution.
If this value is less than or equal to the value assigned to the `mutation_probability`, then this gene is selected for mutation.
2. A new optional parameter named `linewidth` is added to the `plot_result()` method to specify the width of the curve in the plot. It defaults to 3.0.
3. Previously, the indices of the genes selected for mutation were randomly generated once for all solutions within the generation. Currently, the genes' indices are randomly generated for each solution in the population. If the population has 4 solutions, the indices are randomly generated 4 times inside the single generation, 1 time for each solution.
4. Previously, the position of the point(s) for the single-point and two-points crossover was(were) randomly selected once for all solutions within the generation. Currently, the position(s) is(are) randomly selected for each solution in the population. If the population has 4 solutions, the position(s) is(are) randomly generated 4 times inside the single generation, 1 time for each solution.
5. A new optional parameter named `gene_space` is added to the `pygad.GA` class constructor. It is used to specify the possible values for each gene in case the user wants to restrict the gene values. It is useful if the gene space is restricted to a certain range or to discrete values. For more information, check the [More about the `gene_space` Parameter](https://pygad.readthedocs.io/en/latest/pygad_more.html#more-about-the-gene-space-parameter) section. Thanks to [Prof. Tamer A. Farrag](https://github.com/tfarrag2000) for requesting this useful feature.

## PyGAD 2.6.0

Release Date: 6 August 2020

1. A bug fix in assigning the value to the `initial_population` parameter.
2. A new parameter named `gene_type` is added to control the gene type. It can be either `int` or `float`. It has an effect only when the parameter `gene_space` is `None`.
3. 7 new parameters that accept callback functions: `on_start`, `on_fitness`, `on_parents`, `on_crossover`, `on_mutation`, `on_generation`, and `on_stop`.

## PyGAD 2.7.0

Release Date: 11 September 2020

1. The `learning_rate` parameter in the `pygad.nn.train()` function defaults to **0.01**.
2. Added support for building neural networks for regression using the new parameter named `problem_type`. It is added as a parameter to both `pygad.nn.train()` and `pygad.nn.predict()` functions. The value of this parameter can be either **classification** or **regression** to define the problem type. It defaults to **classification**.
3. The activation function for a layer can be set to the string `"None"` to indicate that there is no activation function at this layer. As a result, the supported values for the activation function are `"sigmoid"`, `"relu"`, `"softmax"`, and `"None"`.

To build a regression network using the `pygad.nn` module, just do the following:

1. Set the `problem_type` parameter in the `pygad.nn.train()` and `pygad.nn.predict()` functions to the string `"regression"`.
2. Set the activation function for the output layer to the string `"None"`. This sets no limits on the range of the outputs as it will be from `-infinity` to `+infinity`. If you are sure that all outputs will be nonnegative values, then use the ReLU function.

Check the documentation of the `pygad.nn` module for an example that builds a neural network for regression.
The regression example is also available at [this GitHub project](https://github.com/ahmedfgad/NumPyANN): https://github.com/ahmedfgad/NumPyANN

To build and train a regression network using the `pygad.gann` module, do the following:

1. Set the `problem_type` parameter in the `pygad.nn.train()` and `pygad.nn.predict()` functions to the string `"regression"`.
2. Set the `output_activation` parameter in the constructor of the `pygad.gann.GANN` class to `"None"`.

Check the documentation of the `pygad.gann` module for an example that builds and trains a neural network for regression. The regression example is also available at [this GitHub project](https://github.com/ahmedfgad/NeuralGenetic): https://github.com/ahmedfgad/NeuralGenetic

To build a classification network, either ignore the `problem_type` parameter or set it to `"classification"` (default value). In this case, the activation function of the last layer can be set to any type (e.g. softmax).

## PyGAD 2.7.1

Release Date: 11 September 2020

1. A bug fix when the `problem_type` argument is set to `regression`.

## PyGAD 2.7.2

Release Date: 14 September 2020

1. Bug fix to support building and training regression neural networks with multiple outputs.

## PyGAD 2.8.0

Release Date: 20 September 2020

1. Support of a new module named `kerasga` so that Keras models can be trained by the genetic algorithm using PyGAD.

## PyGAD 2.8.1

Release Date: 3 October 2020

1. Bug fix in applying the crossover operation when the `crossover_probability` parameter is used. Thanks to [Eng. Hamada Kassem, Research and Teaching Assistant, Construction Engineering and Management, Faculty of Engineering, Alexandria University, Egypt](https://www.linkedin.com/in/hamadakassem).

## PyGAD 2.9.0

Release Date: 06 December 2020

1. The fitness values of the initial population are considered in the `best_solutions_fitness` attribute.
2. An optional parameter named `save_best_solutions` is added. It defaults to `False`. When it is `True`, then the best solution after each generation is saved into an attribute named `best_solutions`. If `False`, then no solutions are saved and the `best_solutions` attribute will be empty.
3. Scattered crossover is supported. To use it, assign the `crossover_type` parameter the value `"scattered"`.
4. NumPy arrays are now supported by the `gene_space` parameter.
5. The following parameters (`gene_type`, `crossover_probability`, `mutation_probability`, `delay_after_gen`) can be assigned a numeric value of any of these data types: `int`, `float`, `numpy.int`, `numpy.int8`, `numpy.int16`, `numpy.int32`, `numpy.int64`, `numpy.float`, `numpy.float16`, `numpy.float32`, or `numpy.float64`.

## PyGAD 2.10.0

Release Date: 03 January 2021

1. Support of a new module `pygad.torchga` to train PyTorch models using PyGAD. Check [its documentation](https://pygad.readthedocs.io/en/latest/torchga.html).
2. Support of adaptive mutation where the mutation rate is determined by the fitness value of each solution. Read the [Adaptive Mutation](https://pygad.readthedocs.io/en/latest/pygad_more.html#adaptive-mutation) section for more details. Also, read this paper: [Libelli, S. Marsili, and P. Alba. "Adaptive mutation in genetic algorithms." Soft computing 4.2 (2000): 76-80.](https://www.researchgate.net/publication/225642916_Adaptive_mutation_in_genetic_algorithms)
3.
Before the `run()` method completes or exits, the fitness value of the best solution in the current population is appended to the `best_solutions_fitness` list attribute. Note that the fitness value of the best solution in the initial population is already saved at the beginning of the list. So, the fitness value of the best solution is saved before the genetic algorithm starts and after it ends.
4. When the parameter `parent_selection_type` is set to `sss` (steady-state selection), then a warning message is printed if the value of the `keep_parents` parameter is set to 0.
5. More validations of the user input parameters.
6. The default value of the `mutation_percent_genes` parameter is set to the string `"default"` rather than the integer 10. This change helps to know whether the user explicitly passed a value to the `mutation_percent_genes` parameter or left it at its default one. The `"default"` value is later translated into the integer 10.
7. The `mutation_percent_genes` parameter no longer accepts the value 0. It must be `>0` and `<=100`.
8. The built-in `warnings` module is used to show warning messages rather than just using the `print()` function.
9. A new `bool` parameter called `suppress_warnings` is added to the constructor of the `pygad.GA` class. It allows the user to control whether the warning messages are printed or not. It defaults to `False` which means the messages are printed.
10. A helper method called `adaptive_mutation_population_fitness()` is created to calculate the average fitness value used in adaptive mutation to filter the solutions.
11. The `best_solution()` method accepts a new optional parameter called `pop_fitness`. It accepts a list of the fitness values of the solutions in the population. If `None`, then the `cal_pop_fitness()` method is called to calculate the fitness values of the population.

## PyGAD 2.10.1

Release Date: 10 January 2021

1. In the `gene_space` parameter, any `None` value (regardless of its index or axis) is replaced by a randomly generated number based on the 3 parameters `init_range_low`, `init_range_high`, and `gene_type`. So, the `None` values in `[..., None, ...]` or `[..., [..., None, ...], ...]` are replaced with random values. This gives more freedom in building the space of values for the genes.
2. All the numbers passed to the `gene_space` parameter are cast to the type specified in the `gene_type` parameter.
3. The `numpy.uint` data type is supported for the parameters that accept integer values.
4. In the `pygad.kerasga` module, the `model_weights_as_vector()` function uses the `trainable` attribute of the model's layers to only return the trainable weights in the network. So, only the trainable layers with their `trainable` attribute set to `True` (`trainable=True`), which is the default value, have their weights evolved. All non-trainable layers with the `trainable` attribute set to `False` (`trainable=False`) will not be evolved. Thanks to [Prof. Tamer A. Farrag](https://github.com/tfarrag2000) for pointing this out at [GitHub](https://github.com/ahmedfgad/KerasGA/issues/1).

## PyGAD 2.10.2

Release Date: 15 January 2021

1. A bug fix when `save_best_solutions=True`. Refer to this issue for more information: https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/25

## PyGAD 2.11.0

Release Date: 16 February 2021

1. In the `gene_space` argument, the user can use a dictionary to specify the lower and upper limits of the gene.
This dictionary must have only 2 items with keys `low` and `high` to specify the low and high limits of the gene, respectively. This way, PyGAD takes care of not exceeding the value limits of the gene. For a problem with only 2 genes, using `gene_space=[{'low': 1, 'high': 5}, {'low': 0.2, 'high': 0.81}]` means the accepted values of the first gene start from 1 (inclusive) to 5 (exclusive) while the second one has values between 0.2 (inclusive) and 0.81 (exclusive). For more information, please check the [Limit the Gene Value Range](https://pygad.readthedocs.io/en/latest/pygad_more.html#limit-the-gene-value-range) section of the documentation.
2. The `plot_result()` method returns the figure so that the user can save it.
3. Bug fixes in copying elements from the gene space.
4. For a gene with a set of discrete values (more than 1 value) in the `gene_space` parameter like `[0, 1]`, it was possible that the gene value might not change after mutation. That is, if the current value is 0, then the randomly selected value could also be 0. Now, it is verified that the new value is changed. So, if the current value is 0, then the new value after mutation will not be 0 but 1.

## PyGAD 2.12.0

Release Date: 20 February 2021

1. 4 new instance attributes are added to hold temporary results after each generation: `last_generation_fitness` holds the fitness values of the solutions in the last generation, `last_generation_parents` holds the parents selected from the last generation, `last_generation_offspring_crossover` holds the offspring generated after applying the crossover in the last generation, and `last_generation_offspring_mutation` holds the offspring generated after applying the mutation in the last generation. You can access these attributes inside the `on_generation()` method, for example.
2. A bug was fixed when the `initial_population` parameter is used. The bug occurred due to a mismatch between the data type of the array assigned to `initial_population` and the gene type in the `gene_type` attribute. Assume that the array assigned to the `initial_population` parameter is `((1, 1), (3, 3), (5, 5), (7, 7))` which has type `int`. When `gene_type` is set to `float`, then the genes will not be float but cast to `int` because the defined array has `int` type. The bug is fixed by forcing the array assigned to `initial_population` to have the data type in the `gene_type` attribute. Check the [issue at GitHub](https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/27): https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/27

Thanks to Andrei Rozanski [PhD Bioinformatics Specialist, Department of Tissue Dynamics and Regeneration, Max Planck Institute for Biophysical Chemistry, Germany] for opening my eyes to the first change.

Thanks to [Marios Giouvanakis](https://www.researchgate.net/profile/Marios-Giouvanakis), a PhD candidate in Electrical & Computer Engineering, [Aristotle University of Thessaloniki (Αριστοτέλειο Πανεπιστήμιο Θεσσαλονίκης), Greece](https://www.auth.gr/en), for emailing me about the second issue.

## PyGAD 2.13.0

Release Date: 12 March 2021

1. A new `bool` parameter called `allow_duplicate_genes` is supported. If `True`, which is the default, then a solution/chromosome may have duplicate gene values. If `False`, then each gene will have a unique value in its solution. Check the [Prevent Duplicates in Gene Values](https://pygad.readthedocs.io/en/latest/pygad_more.html#prevent-duplicates-in-gene-values) section for more details.
2.
The `last_generation_fitness` attribute is updated at the end of each generation, not at the beginning. This keeps the fitness values of the most up-to-date population assigned to the `last_generation_fitness` attribute.

## PyGAD 2.14.0

PyGAD 2.14.0 has an issue that is solved in PyGAD 2.14.1. Please consider using 2.14.1, not 2.14.0.

Release Date: 19 May 2021

1. [Issue #40](https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/40) is solved. Now, the `None` value works with the `crossover_type` and `mutation_type` parameters: https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/40
2. The `gene_type` parameter supports accepting a `list/tuple/numpy.ndarray` of numeric data types for the genes. This helps to control the data type of each individual gene. Previously, the `gene_type` could be assigned only a single data type that is applied to all genes. For more information, check the [More about the `gene_type` Parameter](https://pygad.readthedocs.io/en/latest/pygad_more.html#more-about-the-gene-type-parameter) section. Thanks to [Rainer Engel](https://www.linkedin.com/in/rainer-matthias-engel-5ba47a9) for asking about this feature in [this discussion](https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/43): https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/43
3. A new `bool` attribute named `gene_type_single` is added to the `pygad.GA` class. It is `True` when there is a single data type assigned to the `gene_type` parameter. When the `gene_type` parameter is assigned a `list/tuple/numpy.ndarray`, then `gene_type_single` is set to `False`.
4. The `mutation_by_replacement` flag now has no effect if `gene_space` exists, except for the genes with `None` values. For example, for `gene_space=[None, [5, 6]]` the `mutation_by_replacement` flag affects only the first gene, which has `None` for its value space.
5. When an element has a value of `None` in the `gene_space` parameter (e.g. `gene_space=[None, [5, 6]]`), then its value will be randomly generated for each solution rather than being generated once for all solutions. Previously, the gene with a `None` value in `gene_space` was the same across all solutions.
6. Some changes in the documentation according to [issue #32](https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/32): https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/32

## PyGAD 2.14.2

Release Date: 27 May 2021

1. Some bug fixes when the `gene_type` parameter is nested. Thanks to [Rainer Engel](https://www.linkedin.com/in/rainer-matthias-engel-5ba47a9) for opening [a discussion](https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/43#discussioncomment-763342) to report this bug: https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/43#discussioncomment-763342

[Rainer Engel](https://www.linkedin.com/in/rainer-matthias-engel-5ba47a9) helped a lot in suggesting new features and enhancements in the 2.14.0 to 2.14.2 releases.

## PyGAD 2.14.3

Release Date: 6 June 2021

1. Some bug fixes when setting the `save_best_solutions` parameter to `True`. Previously, the best solution for generation `i` was added into the `best_solutions` attribute at generation `i+1`. Now, the `best_solutions` attribute is updated by each best solution at its exact generation.

## PyGAD 2.15.0

Release Date: 17 June 2021

1. Control the precision of all genes/individual genes.
Thanks to [Rainer](https://github.com/rengel8) for asking about this feature: https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/43#discussioncomment-763452
2. A new attribute named `last_generation_parents_indices` holds the indices of the selected parents in the last generation.
3. In adaptive mutation, there is no need to recalculate the fitness values of the parents selected in the last generation as these values can be returned based on the `last_generation_fitness` and `last_generation_parents_indices` attributes. This speeds up adaptive mutation.
4. When a sublist has a value of `None` in the `gene_space` parameter (e.g. `gene_space=[[1, 2, 3], [5, 6, None]]`), then its value will be randomly generated for each solution rather than being generated once for all solutions. Previously, a value of `None` in a sublist of the `gene_space` parameter was identical across all solutions.
5. The dictionary assigned to the `gene_space` parameter itself or one of its elements has a new key called `"step"` to specify the step of moving from the start to the end of the range specified by the 2 existing keys `"low"` and `"high"`. An example is `{"low": 0, "high": 30, "step": 2}` to have only even values for the gene(s) starting from 0 to 30. For more information, check the [More about the `gene_space` Parameter](https://pygad.readthedocs.io/en/latest/pygad_more.html#more-about-the-gene-space-parameter) section. https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/48
6. A new function called `predict()` is added in both the `pygad.kerasga` and `pygad.torchga` modules to make predictions. This makes it easier than using custom code each time a prediction is to be made.
7. A new parameter called `stop_criteria` allows the user to specify one or more stop criteria to stop the evolution based on some conditions. Each criterion is passed as a `str` which has a stop word. The 2 currently supported words are `reach` and `saturate`. `reach` stops the `run()` method if the fitness value is equal to or greater than a given fitness value. An example for `reach` is `"reach_40"` which stops the evolution if the fitness is >= 40. `saturate` stops the evolution if the fitness saturates for a given number of consecutive generations. An example for `saturate` is `"saturate_7"` which means stop the `run()` method if the fitness does not change for 7 consecutive generations. Thanks to [Rainer](https://github.com/rengel8) for asking about this feature: https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/44
8. A new bool parameter, defaulting to `False`, named `save_solutions` is added to the constructor of the `pygad.GA` class. If `True`, then all solutions in each generation are appended into an attribute called `solutions` which is a NumPy array.
9. The `plot_result()` method is renamed to `plot_fitness()`. The users should migrate to the new name as the old name will be removed in the future.
10. Four new optional parameters are added to the `plot_fitness()` function in the `pygad.GA` class which are `font_size=14`, `save_dir=None`, `color="#3870FF"`, and `plot_type="plot"`. Use `font_size` to change the font of the plot title and labels. `save_dir` accepts the directory to which the figure is saved. It defaults to `None` which means do not save the figure. `color` changes the color of the plot. `plot_type` changes the plot type which can be either `"plot"` (default), `"scatter"`, or `"bar"`. https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/47
11.
The default value of the `title` parameter in the `plot_fitness()` method is `"PyGAD - Generation vs. Fitness"` rather than `"PyGAD - Iteration vs. Fitness"`.
12. A new method named `plot_new_solution_rate()` creates, shows, and returns a figure showing the rate of new/unique solutions explored in each generation. It accepts the same parameters as the `plot_fitness()` method. This method only works when `save_solutions=True` in the `pygad.GA` class's constructor.
13. A new method named `plot_genes()` creates, shows, and returns a figure to show how each gene changes per generation. It accepts similar parameters to the `plot_fitness()` method in addition to the `graph_type`, `fill_color`, and `solutions` parameters. The `graph_type` parameter can be either `"plot"` (default), `"boxplot"`, or `"histogram"`. `fill_color` accepts the fill color, which works when `graph_type` is either `"boxplot"` or `"histogram"`. `solutions` can be either `"all"` or `"best"` to decide whether all solutions or only the best solutions are used.
14. The `gene_type` parameter now supports controlling the precision of `float` data types. For a gene, rather than assigning just the data type like `float`, assign a `list`/`tuple`/`numpy.ndarray` with 2 elements where the first one is the type and the second one is the precision. For example, `[float, 2]` forces a gene with a value like `0.1234` to be `0.12`. For more information, check the [More about the `gene_type` Parameter](https://pygad.readthedocs.io/en/latest/pygad_more.html#more-about-the-gene-type-parameter) section.

## PyGAD 2.15.1

Release Date: 18 June 2021

1. Fix a bug when `keep_parents` is set to a positive integer. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/49

## PyGAD 2.15.2

Release Date: 18 June 2021

1. Fix a bug when using the `kerasga` or `torchga` modules. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/51

## PyGAD 2.16.0

Release Date: 19 June 2021

1. A user-defined function can be passed to the `mutation_type`, `crossover_type`, and `parent_selection_type` parameters in the `pygad.GA` class to create custom mutation, crossover, and parent selection operators. Check the [User-Defined Crossover, Mutation, and Parent Selection Operators](https://pygad.readthedocs.io/en/latest/pygad_more.html#user-defined-crossover-mutation-and-parent-selection-operators) section for more details. https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/50

## PyGAD 2.16.1

Release Date: 28 September 2021

1. The user can use the `tqdm` library to show a progress bar. https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/50

```python
import pygad
import numpy
import tqdm

equation_inputs = [4,-2,3.5]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

num_generations = 10000
with tqdm.tqdm(total=num_generations) as pbar:
    ga_instance = pygad.GA(num_generations=num_generations,
                           sol_per_pop=5,
                           num_parents_mating=2,
                           num_genes=len(equation_inputs),
                           fitness_func=fitness_func,
                           on_generation=lambda _: pbar.update(1))

    ga_instance.run()

ga_instance.plot_result()
```

But this approach does not work if the `ga_instance` will be pickled (i.e. the `save()` method will be called).

```python
ga_instance.save("test")
```

To solve this issue, define a function and pass it to the `on_generation` parameter.
In the next code, the `on_generation_progress()` function is defined to update the progress bar.

```python
import pygad
import numpy
import tqdm

equation_inputs = [4,-2,3.5]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

def on_generation_progress(ga):
    pbar.update(1)

num_generations = 100
with tqdm.tqdm(total=num_generations) as pbar:
    ga_instance = pygad.GA(num_generations=num_generations,
                           sol_per_pop=5,
                           num_parents_mating=2,
                           num_genes=len(equation_inputs),
                           fitness_func=fitness_func,
                           on_generation=on_generation_progress)

    ga_instance.run()

ga_instance.plot_result()

ga_instance.save("test")
```

2. Solved the issue of unequal length between the `solutions` and `solutions_fitness` attributes when the `save_solutions` parameter is set to `True`. Now, the fitness of the last population is appended to the `solutions_fitness` array. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/64
3. There was an issue where the lengths of these 4 variables (`solutions`, `solutions_fitness`, `best_solutions`, and `best_solutions_fitness`) were doubled after each call of the `run()` method. This is solved by resetting these variables at the beginning of the `run()` method. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/62
4. Bug fixes when adaptive mutation is used (`mutation_type="adaptive"`). https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/65

## PyGAD 2.16.2

Release Date: 2 February 2022

1. A new instance attribute called `previous_generation_fitness` is added to the `pygad.GA` class. It holds the fitness values of one generation before the fitness values saved in `last_generation_fitness`.
2. Fixed an issue in the `cal_pop_fitness()` method in getting the correct indices of the previous parents. This is solved by using the previous generation's fitness saved in the new attribute `previous_generation_fitness` to return the parents' fitness values. Thanks to Tobias Tischhauser (M.Sc. - [Mitarbeiter Institut EMS, Departement Technik, OST – Ostschweizer Fachhochschule, Switzerland](https://www.ost.ch/de/forschung-und-dienstleistungen/technik/systemtechnik/ems/team)) for detecting this bug.

## PyGAD 2.16.3

Release Date: 2 February 2022

1. Validate the fitness value returned from the fitness function. An exception is raised if something is wrong. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/67

## PyGAD 2.17.0

Release Date: 8 July 2022

1. An issue is solved when the `gene_space` parameter is given a fixed value, e.g. `gene_space=[range(5), 4]`. The second gene's value is static (4), which caused an exception.
2. Fixed the issue where the `allow_duplicate_genes` parameter did not work when mutation is disabled (i.e. `mutation_type=None`). This is by checking for duplicates after crossover directly. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/39
3. Solved an issue in the `tournament_selection()` method as the indices of the selected parents were incorrect. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/89
4. Reuse the fitness values of the previously explored solutions rather than recalculating them. This feature only works if `save_solutions=True`.
5. Parallel processing is supported. This is by the introduction of a new parameter named `parallel_processing` in the constructor of the `pygad.GA` class.
+ +## PyGAD 2.18.0 + +Release Date: 9 September 2022 + +1. Raise an exception if the sum of fitness values is zero while either roulette wheel or stochastic universal parent selection is used. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/129 +2. Initialize the value of the `run_completed` property to `False`. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/122 +3. The values of these properties are no longer reset with each call to the `run()` method: `self.best_solutions, self.best_solutions_fitness, self.solutions, self.solutions_fitness`. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/123. Now, the user has the flexibility of calling the `run()` method more than once while extending the data collected after each generation. Another advantage happens when the instance is loaded and the `run()` method is called, as the old fitness values are shown on the graph alongside the new fitness values. Read more in this section: [Continue without Losing Progress](https://pygad.readthedocs.io/en/latest/pygad_more.html#continue-without-losing-progress) +4. Thanks to [Prof. Fernando Jiménez Barrionuevo](http://webs.um.es/fernan) (Dept. of Information and Communications Engineering, University of Murcia, Murcia, Spain) for editing this [comment](https://github.com/ahmedfgad/GeneticAlgorithmPython/blob/5315bbec02777df96ce1ec665c94dece81c440f4/pygad.py#L73) in the code. https://github.com/ahmedfgad/GeneticAlgorithmPython/commit/5315bbec02777df96ce1ec665c94dece81c440f4 +5. Fixed a bug when `crossover_type=None`. +6. Support of elitism selection through a new parameter named `keep_elitism`. It defaults to 1, which means that for each generation only the best solution is kept in the next generation. If assigned 0, then it has no effect. See the short sketch after the PyGAD 2.18.1 notes below. Read more in this section: [Elitism Selection](https://pygad.readthedocs.io/en/latest/pygad_more.html#elitism-selection). https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/74 +7. A new instance attribute named `last_generation_elitism` added to hold the elitism in the last generation. +8. A new parameter called `random_seed` added to accept a seed for the random function generators. Credit to this issue https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/70 and [Prof. Fernando Jiménez Barrionuevo](http://webs.um.es/fernan). Read more in this section: [Random Seed](https://pygad.readthedocs.io/en/latest/pygad_more.html#random-seed). +9. Editing the `pygad.TorchGA` module to make sure the tensor data is moved from GPU to CPU. Thanks to Rasmus Johansson for opening this pull request: https://github.com/ahmedfgad/TorchGA/pull/2 + +## PyGAD 2.18.1 + +Release Date: 19 September 2022 + +1. A bug fix when `keep_elitism` is used. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/132
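+ +As an illustration of the `keep_elitism` and `random_seed` parameters introduced in PyGAD 2.18.0, here is a minimal sketch added for clarity (it is not part of the original release notes): + +```python +import numpy +import pygad + +function_inputs = [4, -2, 3.5, 5, -11, -4.7] +desired_output = 44 + +def fitness_func(ga_instance, solution, solution_idx): +    output = numpy.sum(solution * function_inputs) +    return 1.0 / (numpy.abs(output - desired_output) + 0.000001) + +ga_instance = pygad.GA(num_generations=50, +                       num_parents_mating=4, +                       sol_per_pop=8, +                       num_genes=len(function_inputs), +                       fitness_func=fitness_func, +                       # Keep the best 2 solutions of each generation in the next one. +                       keep_elitism=2, +                       # Seed the random number generators to make runs reproducible. +                       random_seed=2) +ga_instance.run() +```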
+ +## PyGAD 2.18.2 + +Release Date: 14 February 2023 + +1. Remove `numpy.int` and `numpy.float` from the list of supported data types. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/151 https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/152 +2. Call the `on_crossover()` callback function even if `crossover_type` is `None`. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/138 +3. Call the `on_mutation()` callback function even if `mutation_type` is `None`. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/138 + +## PyGAD 2.18.3 + +Release Date: 14 February 2023 + +1. Bug fixes. + +## PyGAD 2.19.0 + +Release Date: 22 February 2023 + +1. A new `summary()` method is supported to return a Keras-like summary of the PyGAD lifecycle. +2. A new optional parameter called `fitness_batch_size` is supported to calculate the fitness in batches. If it is assigned the value `1` or `None` (default), then the normal flow is used where the fitness function is called for each individual solution. If the `fitness_batch_size` parameter is assigned a value satisfying this condition `1 < fitness_batch_size <= sol_per_pop`, then the solutions are grouped into batches of size `fitness_batch_size` and the fitness function is called once for each batch. In this case, the fitness function must return a list/tuple/numpy.ndarray with a length equal to the number of solutions passed. A short sketch is given after this list. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/136 +3. The `cloudpickle` library (https://github.com/cloudpipe/cloudpickle) is used instead of the `pickle` library to pickle the `pygad.GA` objects. This solves the issue of having to redefine the functions (e.g. fitness function). The `cloudpickle` library is added as a dependency in the `requirements.txt` file. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/159 +4. Support of assigning methods to these parameters: `fitness_func`, `crossover_type`, `mutation_type`, `parent_selection_type`, `on_start`, `on_fitness`, `on_parents`, `on_crossover`, `on_mutation`, `on_generation`, and `on_stop`. https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/92 https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/138 +5. Validating the output of the parent selection, crossover, and mutation functions. +6. The built-in parent selection operators return the parents' indices as a NumPy array. +7. The outputs of the parent selection, crossover, and mutation operators must be NumPy arrays. +8. Fix an issue when `allow_duplicate_genes=True`. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/39 +9. Fix an issue creating scatter plots of the solutions' fitness. +10. Sampling from a `set()` is no longer supported in Python 3.11. Instead, sampling happens from a `list()`. Thanks to Marco Brenna for pointing out this issue. +11. The lifecycle is updated to reflect that the new population's fitness is calculated at the end of the lifecycle, not at the beginning. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/154#issuecomment-1438739483 +12. Fixed an issue when `save_solutions=True` that caused the fitness function to be called for solutions that were already explored and had their fitness pre-calculated. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/160 +13. A new instance attribute named `last_generation_elitism_indices` added to hold the indices of the elitism solutions in the last generation. This attribute helps to re-use the fitness of the elitism instead of calling the fitness function. +14. Fewer calls to the `best_solution()` method, which in turn saves some calls to the fitness function. +15. Some updates in the documentation to give more details about the `cal_pop_fitness()` method. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/79#issuecomment-1439605442
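+ +The following minimal sketch (added for illustration; it is not part of the original release notes) shows what a batch fitness function could look like for the quick-start problem, with `fitness_batch_size=4` making PyGAD pass 4 solutions per call: + +```python +import numpy +import pygad + +function_inputs = [4, -2, 3.5, 5, -11, -4.7] +desired_output = 44 + +def fitness_func_batch(ga_instance, solutions, solutions_indices): +    # solutions holds a batch of solutions; return one fitness value per solution. +    batch_fitness = [] +    for solution in solutions: +        output = numpy.sum(solution * function_inputs) +        batch_fitness.append(1.0 / (numpy.abs(output - desired_output) + 0.000001)) +    return batch_fitness + +ga_instance = pygad.GA(num_generations=50, +                       num_parents_mating=4, +                       sol_per_pop=8, +                       num_genes=len(function_inputs), +                       fitness_func=fitness_func_batch, +                       # Call the fitness function once for every 4 solutions. +                       fitness_batch_size=4) +ga_instance.run() +```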
+ +## PyGAD 2.19.1 + +Release Date: 22 February 2023 + +1. Add the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library as a dependency. + +## PyGAD 2.19.2 + +Release Date: 23 February 2023 + +1. Fix an issue when parallel processing was used where the elitism solutions' fitness values were not re-used. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/160#issuecomment-1441718184 + +## PyGAD 3.0.0 + +Release Date: 8 April 2023 + +1. The structure of the library is changed and some methods defined in the `pygad.py` module are moved to the `pygad.utils`, `pygad.helper`, and `pygad.visualize` submodules. + 2. The `pygad.utils.parent_selection` module has a class named `ParentSelection` where all the parent selection operators exist. The `pygad.GA` class extends this class. + 3. The `pygad.utils.crossover` module has a class named `Crossover` where all the crossover operators exist. The `pygad.GA` class extends this class. + 4. The `pygad.utils.mutation` module has a class named `Mutation` where all the mutation operators exist. The `pygad.GA` class extends this class. + 5. The `pygad.helper.unique` module has a class named `Unique` with some helper methods to solve duplicate genes and make sure every gene is unique. The `pygad.GA` class extends this class. + 6. The `pygad.visualize.plot` module has a class named `Plot` where all the methods that create plots exist. The `pygad.GA` class extends this class. + 7. Support of using the `logging` module to log the outputs to both the console and a text file instead of using the `print()` function. This is done by assigning a `logging.Logger` to the new `logger` parameter. Check the [Logging Outputs](https://pygad.readthedocs.io/en/latest/pygad_more.html#logging-outputs) section for more information. + 8. A new instance attribute called `logger` to save the logger. + 9. The function/method passed to the `fitness_func` parameter accepts a new parameter that refers to the instance of the `pygad.GA` class. Check this for an example: [Use Functions and Methods to Build Fitness Function and Callbacks](https://pygad.readthedocs.io/en/latest/pygad_more.html#use-functions-and-methods-to-build-fitness-and-callbacks). https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/163 + 10. Update the documentation to include an example of using functions and methods to calculate the fitness and build callbacks. Check this for more details: [Use Functions and Methods to Build Fitness Function and Callbacks](https://pygad.readthedocs.io/en/latest/pygad_more.html#use-functions-and-methods-to-build-fitness-and-callbacks). https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/92#issuecomment-1443635003 + 11. Validate the value passed to the `initial_population` parameter. + 12. Validate the type and length of the `pop_fitness` parameter of the `best_solution()` method. + 13. Some edits in the documentation. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/106 + 14. Fix an issue when building the initial population where (some) genes had their values taken from the mutation range (defined by the parameters `random_mutation_min_val` and `random_mutation_max_val`) instead of using the parameters `init_range_low` and `init_range_high`. + 15. The `summary()` method returns the summary as a single-line string. Just log/print the returned string to see it properly. + 16. The `callback_generation` parameter is removed. Use the `on_generation` parameter instead. + 17. There was an issue when using the `parallel_processing` parameter with Keras and PyTorch. As Keras/PyTorch are not thread-safe, the `predict()` method gives incorrect and weird results when more than 1 thread is used.
https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/145 https://github.com/ahmedfgad/TorchGA/issues/5 https://github.com/ahmedfgad/KerasGA/issues/6. Thanks to this [StackOverflow answer](https://stackoverflow.com/a/75606666/5426539). + 18. Replace `numpy.float` by `float` in the 2 parent selection operators: roulette wheel and stochastic universal selection. https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/168 + +## PyGAD 3.0.1 + +Release Date: 20 April 2023 + +1. Fix an issue with passing a user-defined function/method for parent selection. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/179 + +## PyGAD 3.1.0 + +Release Date: 20 June 2023 + +1. Fix a bug when the initial population has duplicate genes if a nested gene space is used. +2. The `gene_space` parameter can no longer be assigned a tuple. +3. Fix a bug when the `gene_space` parameter has a member of type `tuple`. +4. A new instance attribute called `gene_space_unpacked` which has the unpacked `gene_space`. It is used to solve duplicates. Infinite ranges in the `gene_space` are unpacked to a limited number of values (e.g. 100). +5. Bug fixes when creating the initial population using the `gene_space` attribute. +6. When a `dict` is used with the `gene_space` attribute, the new gene value was calculated by summing 2 values: 1) the value sampled from the `dict` and 2) a random value returned from the random mutation range defined by the 2 parameters `random_mutation_min_val` and `random_mutation_max_val`. This might cause the gene value to exceed the range limit defined in the `gene_space`. To respect the `gene_space` range, this release only returns the value from the `dict` without summing it with a random value. +7. Formatting the strings using f-strings instead of the `format()` method. https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/189 +8. In the `__init__()` of the `pygad.GA` class, the logged error messages are handled using a `try-except` block instead of repeating the `logger.error()` command. https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/189 +9. A new class named `CustomLogger` is created in the `pygad.cnn` module to create a default logger, using the `logging` module, assigned to the `logger` attribute. This class is extended by all other classes in the module. The constructors of these classes have a new parameter named `logger` which defaults to `None`. If no logger is passed, then the default logger in the `CustomLogger` class is used. +10. Except for the `pygad.nn` module, the `print()` function in all other modules is replaced by the `logging` module to log messages. +11. The callback functions/methods `on_fitness()`, `on_parents()`, `on_crossover()`, and `on_mutation()` can return values. These returned values override the corresponding properties. The output of `on_fitness()` overrides the population fitness. The `on_parents()` function/method must return 2 values representing the parents and their indices. The output of `on_crossover()` overrides the crossover offspring. The output of `on_mutation()` overrides the mutation offspring. +12. Fix a bug when adaptive mutation is used while `fitness_batch_size` > 1. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/195 +13. When `allow_duplicate_genes=False` and a user-defined `gene_space` is used, it sometimes happens that there is no room to solve the duplicates between 2 genes by simply replacing the value of one gene by another gene.
This release tries to solve such duplicates by looking for a third gene that will help in solving the duplicates. Check [this section](https://pygad.readthedocs.io/en/latest/pygad_more.html#prevent-duplicates-in-gene-values) for more information. +14. Use probabilities to select parents using the rank parent selection method. https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/205 +15. The 2 parameters `random_mutation_min_val` and `random_mutation_max_val` can accept iterables (list/tuple/numpy.ndarray) with length equal to the number of genes. This enables customizing the mutation range for each individual gene. https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/198 +16. The 2 parameters `init_range_low` and `init_range_high` can accept iterables (list/tuple/numpy.ndarray) with length equal to the number of genes. This enables customizing the initial range for each individual gene when creating the initial population. +17. The `data` parameter in the `predict()` function of the `pygad.kerasga` module can be assigned a data generator. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/115 https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/207 +18. The `predict()` function of the `pygad.kerasga` module accepts 3 optional parameters: 1) `batch_size=None`, 2) `verbose=0`, and 3) `steps=None`. Check the documentation of the [Keras Model.predict()](https://keras.io/api/models/model_training_apis) method for more information. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/207 +19. The documentation is updated to explain how mutation works when `gene_space` is used with `int` or `float` data types. Check [this section](https://pygad.readthedocs.io/en/latest/pygad_more.html#limit-the-gene-value-range-using-the-gene-space-parameter). https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/198 + +## PyGAD 3.2.0 + +Release Date: 7 September 2023 + +1. A new module `pygad.utils.nsga2` is created with the `NSGA2` class that includes the functionalities of NSGA-II. The class has these methods: 1) `get_non_dominated_set()` 2) `non_dominated_sorting()` 3) `crowding_distance()` 4) `sort_solutions_nsga2()`. Check [this section](https://pygad.readthedocs.io/en/latest/pygad_more.html#multi-objective-optimization) for an example. +2. Support of multi-objective optimization using Non-Dominated Sorting Genetic Algorithm II (NSGA-II) using the `NSGA2` class in the `pygad.utils.nsga2` module. Just return a `list`, `tuple`, or `numpy.ndarray` from the fitness function and the library will consider the problem as multi-objective optimization. All the objectives are expected to be maximization. Check [this section](https://pygad.readthedocs.io/en/latest/pygad_more.html#multi-objective-optimization) for an example. +3. The parent selection methods and adaptive mutation are edited to support multi-objective optimization. +4. Two new NSGA-II parent selection methods are supported in the `pygad.utils.parent_selection` module: 1) tournament selection for NSGA-II 2) NSGA-II selection. +5. The `plot_fitness()` method in the `pygad.plot` module has a new optional parameter named `label` to accept the labels of the plots. This is only used for multi-objective problems. Otherwise, it is ignored. It defaults to `None` and accepts a `list`, `tuple`, or `numpy.ndarray`. The labels are used in a legend inside the plot. +6. The default color in the methods of the `pygad.plot` module is changed to the greenish `#64f20c` color. +7. 
A new instance attribute named `pareto_fronts` added to the `pygad.GA` instances that holds the Pareto fronts when solving a multi-objective problem. +8. The `gene_type` accepts a `list`, `tuple`, or `numpy.ndarray` for integer data types given that the precision is set to `None` (e.g. `gene_type=[float, [int, None]]`). +9. In the `cal_pop_fitness()` method, the fitness value is re-used if `save_best_solutions=True` and the solution is found in the `best_solutions` attribute. These parameters can also help re-use the fitness of a solution instead of calling the fitness function: `keep_elitism`, `keep_parents`, and `save_solutions`. +10. The value `99999999999` is replaced by `float('inf')` in the 2 methods `wheel_cumulative_probs()` and `stochastic_universal_selection()` inside the `pygad.utils.parent_selection.ParentSelection` class. +11. The `plot_result()` method in the `pygad.visualize.plot.Plot` class is removed. Please use the `plot_fitness()` method instead. + +## PyGAD 3.3.0 + +Release Date: 29 January 2024 + +1. Solve bugs when multi-objective optimization is used. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/238 +2. When the `stop_criteria` parameter is used with the `reach` keyword, then multiple numeric values can be passed when solving a multi-objective problem. For example, if a problem has 3 objective functions, then `stop_criteria="reach_10_20_30"` means the GA stops if the fitness of the 3 objectives are at least 10, 20, and 30, respectively. The number of values must match the number of objective functions. If a single value is found (e.g. `stop_criteria="reach_5"`) when solving a multi-objective problem, then it is used across all the objectives. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/238 +3. The `delay_after_gen` parameter is now deprecated and will be removed in a future release. If it is necessary to have a time delay after each generation, then assign a callback function/method to the `on_generation` parameter to pause the evolution. +4. Parallel processing now supports calculating the fitness during adaptive mutation. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/201 +5. The population size can be changed during runtime by changing all the parameters that would affect the size of anything used by the GA. For more information, check the [Change Population Size during Runtime](https://pygad.readthedocs.io/en/latest/pygad_more.html#change-population-size-during-runtime) section. https://github.com/ahmedfgad/GeneticAlgorithmPython/discussions/234 +6. When a dictionary exists in the `gene_space` parameter without a step, then mutation occurs by adding a random value to the gene value. The random value is generated based on the 2 parameters `random_mutation_min_val` and `random_mutation_max_val`. For more information, check the [How Mutation Works with the gene_space Parameter?](https://pygad.readthedocs.io/en/latest/pygad_more.html#how-mutation-works-with-the-gene-space-parameter) section. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/229 +7. Add `object` as a supported data type for int (GA.supported_int_types) and float (GA.supported_float_types). https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/174 +8. Use the `raise` clause instead of `sys.exit(-1)` to terminate the execution. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/213 +9. Fix a bug when multi-objective optimization is used with batch fitness calculation (e.g. `fitness_batch_size` set to a value larger than 1).
+10. Fix a bug in the `pygad.py` script when finding the index of the best solution. It did not work properly with multi-objective optimization where `self.best_solutions_fitness` has multiple columns. + +   ```python +   self.best_solution_generation = numpy.where(numpy.array( +   self.best_solutions_fitness) == numpy.max(numpy.array(self.best_solutions_fitness)))[0][0] +   ``` + +## PyGAD 3.3.1 + +Release Date: 17 February 2024 + +1. After the last generation and before the `run()` method completes, update the 2 instance attributes: 1) `last_generation_parents` 2) `last_generation_parents_indices`. This is to keep the list of parents up-to-date with the latest population fitness `last_generation_fitness`. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/275 +2. Added 4 methods with names starting with `run_`. Their purpose is to keep the main loop inside the `run()` method clean. Check the [Other Methods](https://pygad.readthedocs.io/en/latest/pygad.html#other-methods) section for more information. + +## PyGAD 3.4.0 + +Release Date: 07 January 2025 + +1. The `delay_after_gen` parameter is removed from the `pygad.GA` class constructor. As a result, it is no longer an attribute of the `pygad.GA` class instances. To add a delay after each generation, apply it inside the `on_generation` callback. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/283 +2. In the `single_point_crossover()` method of the `pygad.utils.crossover.Crossover` class, all the random crossover points are returned before the `for` loop. This is done by calling the `numpy.random.randint()` function only once before the loop to generate all the K points (where K is the offspring size). This is compared to calling the `numpy.random.randint()` function inside the `for` loop K times, once for each individual offspring. +3. Bug fix in the `examples/example_custom_operators.py` script. https://github.com/ahmedfgad/GeneticAlgorithmPython/pull/285 +4. While making predictions using the `pygad.torchga.predict()` function, no gradients are calculated. +5. The `gene_type` parameter of the `pygad.helper.unique.Unique.unique_int_gene_from_range()` method accepts the type of the current gene only instead of the full gene_type list. +6. Created a new method called `unique_float_gene_from_range()` inside the `pygad.helper.unique.Unique` class to find a unique floating-point number from a range. +7. Fix a bug in the `pygad.helper.unique.Unique.unique_gene_by_space()` method to return the numeric value only instead of a NumPy array. +8. Refactoring the `pygad/helper/unique.py` script to remove duplicate code and reformatting the docstrings. +9. The `plot_pareto_front_curve()` method added to the `pygad.visualize.plot.Plot` class to visualize the Pareto front for multi-objective problems. It only supports 2 objectives. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/279 +10. Fix a bug converting a nested NumPy array to a nested list. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/300 +11. The `Matplotlib` library is only imported when a method inside the `pygad/visualize/plot.py` script is used. This is more efficient than using `import matplotlib.pyplot` at the module level, as this causes it to be imported when `pygad` is imported even when it is not needed. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/292 +12. Fix a bug when the minus sign (-) is used inside the `stop_criteria` parameter (e.g. `stop_criteria=["saturate_10", "reach_-0.5"]`). https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/296 +13. 
Make sure `self.best_solutions` is a list of lists inside the `cal_pop_fitness` method. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/293 +14. Fix a bug where the `cal_pop_fitness()` method was using the `previous_generation_fitness` attribute to return the parents' fitness. This instance attribute did not hold the fitness of the latest population, but the fitness of the population before the last one. The issue is solved by updating the `previous_generation_fitness` attribute to the latest population fitness before the GA completes. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/291 + +## PyGAD 3.5.0 + +Release Date: 07 July 2025 + +1. Fix a bug when the minus sign (-) is used inside the `stop_criteria` parameter for multi-objective problems. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/314 +2. Fix a bug when the `stop_criteria` parameter is passed as an iterable (e.g. list) for multi-objective problems (e.g. `['reach_50_60', 'reach_20, 40']`). https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/314 +3. Call the `get_matplotlib()` function from the `plot_genes()` method inside the `pygad.visualize.plot.Plot` class to import the matplotlib library. https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/315 +4. Create a new helper method called `select_unique_value()` inside the `pygad/helper/unique.py` script to select a unique gene from an array of values. +5. Create a new helper method called `get_random_mutation_range()` inside the `pygad/utils/mutation.py` script that returns the random mutation range (min and max) for a single gene by its index. +6. Create a new helper method called `change_random_mutation_value_dtype()` inside the `pygad/utils/mutation.py` script that changes the data type of the value used to apply random mutation. +7. Create a new helper method called `round_random_mutation_value()` inside the `pygad/utils/mutation.py` script that rounds the value used to apply random mutation. +8. Create the `pygad/helper/misc.py` script with a class called `Helper` that has the following helper methods: + 1. `change_population_dtype_and_round()`: For each gene in the population, rounds the gene value and changes the data type. + 2. `change_gene_dtype_and_round()`: Rounds and changes the data type of a single gene. + 3. `mutation_change_gene_dtype_and_round()`: Decides whether mutation is done by replacement or not. Then it rounds and changes the data type of the new gene value. + 4. `validate_gene_constraint_callable_output()`: Validates the output of the user-defined callable/function that checks whether the gene constraint defined in the `gene_constraint` parameter is satisfied or not. + 5. `get_gene_dtype()`: Returns the gene data type from the `gene_type` instance attribute. + 6. `get_random_mutation_range()`: Returns the random mutation range using the `random_mutation_min_val` and `random_mutation_max_val` instance attributes. + 7. `get_initial_population_range()`: Returns the initial population values range using the `init_range_low` and `init_range_high` instance attributes. + 8. `generate_gene_value_from_space()`: Generates/selects a value for a gene using the `gene_space` instance attribute. + 9. `generate_gene_value_randomly()`: Generates a random value for the gene. Only used if `gene_space` is `None`. + 10. `generate_gene_value()`: Generates a value for the gene. It checks whether `gene_space` is `None` and calls either `generate_gene_value_randomly()` or `generate_gene_value_from_space()`. + 11. 
`filter_gene_values_by_constraint()`: Receives a list of values for a gene. Then it filters such values using the gene constraint. + 12. `get_valid_gene_constraint_values()`: Selects one valid gene value that satisfies the gene constraint. It simply calls `generate_gene_value()` to generate some gene values and then filters them using `filter_gene_values_by_constraint()`. +9. Create a new helper method called `mutation_process_random_value()` inside the `pygad/utils/mutation.py` script that generates constrained random values for mutation. It calls either `generate_gene_value()` or `get_valid_gene_constraint_values()` based on whether the `gene_constraint` parameter is used or not. +10. A new parameter called `gene_constraint` is added. It accepts a list of callables (i.e. functions) acting as constraints for the gene values. Before selecting a value for a gene, the callable is called to ensure the candidate value is valid. Check the [Gene Constraint](https://pygad.readthedocs.io/en/latest/pygad_more.html#gene-constraint) section for more information. +11. A new parameter called `sample_size` is added. To select a gene value that respects a constraint, this variable defines the size of the sample from which a value is selected randomly. It is useful if either `allow_duplicate_genes` or `gene_constraint` is used. An instance attribute of the same name is created in the instances of the `pygad.GA` class. Check the [sample_size Parameter](https://pygad.readthedocs.io/en/latest/pygad_more.html#sample-size-parameter) section for more information. +12. Use the `sample_size` parameter instead of `num_trials` in the methods `solve_duplicate_genes_randomly()` and `unique_float_gene_from_range()` inside the `pygad/helper/unique.py` script. It is the maximum number of values to generate as the search space when looking for a unique float value out of a range. +13. Fixed a bug in population initialization when `allow_duplicate_genes=False`. Previously, gene values were checked for duplicates before rounding, which could allow near-duplicates like 7.61 and 7.62 to pass. After rounding (e.g., both becoming 7.6), this resulted in unintended duplicates. The fix ensures gene values are now rounded before duplicate checks, preventing such cases. +14. More tests are created. +15. More examples are created. + +# PyGAD Projects at GitHub + +The PyGAD library is available on PyPI at this page: https://pypi.org/project/pygad. PyGAD is built out of a number of open-source GitHub projects. A brief note about these projects is given in the next subsections. + +## [GeneticAlgorithmPython](https://github.com/ahmedfgad/GeneticAlgorithmPython) + +GitHub Link: https://github.com/ahmedfgad/GeneticAlgorithmPython + +[**GeneticAlgorithmPython**](https://github.com/ahmedfgad/GeneticAlgorithmPython) is the first of these projects: an open-source Python 3 project implementing the genetic algorithm based on NumPy. + +## [NumPyANN](https://github.com/ahmedfgad/NumPyANN) + +GitHub Link: https://github.com/ahmedfgad/NumPyANN + +[**NumPyANN**](https://github.com/ahmedfgad/NumPyANN) builds artificial neural networks in **Python 3** using **NumPy** from scratch. The purpose of this project is to only implement the **forward pass** of a neural network without using a training algorithm. Currently, it only supports classification; regression will also be supported later. Moreover, only one class is supported per sample.
+ +## [NeuralGenetic](https://github.com/ahmedfgad/NeuralGenetic) + +GitHub Link: https://github.com/ahmedfgad/NeuralGenetic + +[NeuralGenetic](https://github.com/ahmedfgad/NeuralGenetic) trains neural networks using the genetic algorithm based on the previous 2 projects [GeneticAlgorithmPython](https://github.com/ahmedfgad/GeneticAlgorithmPython) and [NumPyANN](https://github.com/ahmedfgad/NumPyANN). + +## [NumPyCNN](https://github.com/ahmedfgad/NumPyCNN) + +GitHub Link: https://github.com/ahmedfgad/NumPyCNN + +[NumPyCNN](https://github.com/ahmedfgad/NumPyCNN) builds convolutional neural networks using NumPy. The purpose of this project is to only implement the **forward pass** of a convolutional neural network without using a training algorithm. + +## [CNNGenetic](https://github.com/ahmedfgad/CNNGenetic) + +GitHub Link: https://github.com/ahmedfgad/CNNGenetic + +[CNNGenetic](https://github.com/ahmedfgad/CNNGenetic) trains convolutional neural networks using the genetic algorithm. It uses the [GeneticAlgorithmPython](https://github.com/ahmedfgad/GeneticAlgorithmPython) project for building the genetic algorithm. + +## [KerasGA](https://github.com/ahmedfgad/KerasGA) + +GitHub Link: https://github.com/ahmedfgad/KerasGA + +[KerasGA](https://github.com/ahmedfgad/KerasGA) trains [Keras](https://keras.io) models using the genetic algorithm. It uses the [GeneticAlgorithmPython](https://github.com/ahmedfgad/GeneticAlgorithmPython) project for building the genetic algorithm. + +## [TorchGA](https://github.com/ahmedfgad/TorchGA) + +GitHub Link: https://github.com/ahmedfgad/TorchGA + +[TorchGA](https://github.com/ahmedfgad/TorchGA) trains [PyTorch](https://pytorch.org) models using the genetic algorithm. It uses the [GeneticAlgorithmPython](https://github.com/ahmedfgad/GeneticAlgorithmPython) project for building the genetic algorithm. 
+ +# Stack Overflow Questions about PyGAD + +## [How do I proceed to load a ga_instance as “.pkl” format in PyGad?](https://stackoverflow.com/questions/67424181/how-do-i-proceed-to-load-a-ga-instance-as-pkl-format-in-pygad) + +## [Binary Classification NN Model Weights not being Trained in PyGAD](https://stackoverflow.com/questions/67276696/binary-classification-nn-model-weights-not-being-trained-in-pygad) + +## [How to solve TSP problem using pyGAD package?](https://stackoverflow.com/questions/66298595/how-to-solve-tsp-problem-using-pygad-package) + +## [How can I save a matplotlib plot that is the output of a function in jupyter?](https://stackoverflow.com/questions/66055330/how-can-i-save-a-matplotlib-plot-that-is-the-output-of-a-function-in-jupyter) + +## [How do I query the best solution of a pyGAD GA instance?](https://stackoverflow.com/questions/65757722/how-do-i-query-the-best-solution-of-a-pygad-ga-instance) + +## [Multi-Input Multi-Output in Genetic algorithm (python)](https://stackoverflow.com/questions/64943711/multi-input-multi-output-in-genetic-algorithm-python) + +https://www.linkedin.com/pulse/validation-short-term-parametric-trading-model-genetic-landolfi + +https://itchef.ru/articles/397758 + +https://audhiaprilliant.medium.com/genetic-algorithm-based-clustering-algorithm-in-searching-robust-initial-centroids-for-k-means-e3b4d892a4be + +https://python.plainenglish.io/validation-of-a-short-term-parametric-trading-model-with-genetic-optimization-and-walk-forward-89708b789af6 + +https://ichi.pro/ko/pygadwa-hamkke-yujeon-algolijeum-eul-sayonghayeo-keras-model-eul-hunlyeonsikineun-bangbeob-173299286377169 + +https://ichi.pro/tr/pygad-ile-genetik-algoritmayi-kullanarak-keras-modelleri-nasil-egitilir-173299286377169 + +https://ichi.pro/ru/kak-obucit-modeli-keras-s-pomos-u-geneticeskogo-algoritma-s-pygad-173299286377169 + +https://blog.csdn.net/sinat_38079265/article/details/108449614 + +# Submitting Issues + +If there is an issue using PyGAD, then use any of your preferred options to discuss it. + +One way is [submitting an issue](https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/new) into this GitHub project ([github.com/ahmedfgad/GeneticAlgorithmPython](https://github.com/ahmedfgad/GeneticAlgorithmPython)) in case something is not working properly or to ask questions. + +If this is not a proper option for you, then check the [**Contact Us**](https://pygad.readthedocs.io/en/latest/Footer.html#contact-us) section for more contact details. + +# Ask for Feature + +PyGAD is actively developed with the goal of building a dynamic library for supporting a wide range of problems to be optimized using the genetic algorithm. + +To ask for a new feature, either [submit an issue](https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/new) into this GitHub project ([github.com/ahmedfgad/GeneticAlgorithmPython](https://github.com/ahmedfgad/GeneticAlgorithmPython)) or send an e-mail to ahmed.f.gad@gmail.com. + +Also check the [**Contact Us**](https://pygad.readthedocs.io/en/latest/Footer.html#contact-us) section for more contact details. + +# Projects Built using PyGAD + +If you created a project that uses PyGAD, then we can support you by mentioning this project here in PyGAD's documentation.
+ +To do that, please send a message to ahmed.f.gad@gmail.com or check the [**Contact Us**](https://pygad.readthedocs.io/en/latest/Footer.html#contact-us) section for more contact details. + +Within your message, please send the following details: + +- Project title +- Brief description +- Preferably, a link that directs the readers to your project + +# Tutorials about PyGAD + +## [Adaptive Mutation in Genetic Algorithm with Python Examples](https://neptune.ai/blog/adaptive-mutation-in-genetic-algorithm-with-python-examples) + +In this tutorial, we’ll see why mutation with a fixed number of genes is bad, and how to replace it with adaptive mutation. Using the [PyGAD Python 3 library](https://pygad.readthedocs.io/), we’ll discuss a few examples that use both random and adaptive mutation. + +## [Clustering Using the Genetic Algorithm in Python](https://blog.paperspace.com/clustering-using-the-genetic-algorithm) + +This tutorial discusses how the genetic algorithm is used to cluster data, starting from random clusters and running until the optimal clusters are found. We'll start by briefly revising the K-means clustering algorithm to point out its weak points, which are later solved by the genetic algorithm. The code examples in this tutorial are implemented in Python using the [PyGAD library](https://pygad.readthedocs.io/). + +## [Working with Different Genetic Algorithm Representations in Python](https://blog.paperspace.com/working-with-different-genetic-algorithm-representations-python) + +Depending on the nature of the problem being optimized, the genetic algorithm (GA) supports two different gene representations: binary, and decimal. The binary GA has only two values for its genes, which are 0 and 1. This is easier to manage as its gene values are limited compared to the decimal GA, for which we can use different formats like float or integer, and limited or unlimited ranges. + +This tutorial discusses how the [PyGAD](https://pygad.readthedocs.io/) library supports the two GA representations, binary and decimal. + +## [5 Genetic Algorithm Applications Using PyGAD](https://blog.paperspace.com/genetic-algorithm-applications-using-pygad) + +This tutorial introduces PyGAD, an open-source Python library for implementing the genetic algorithm and training machine learning algorithms. PyGAD supports 19 parameters for customizing the genetic algorithm for various applications. + +Within this tutorial we'll discuss 5 different applications of the genetic algorithm and build them using PyGAD. + +## [Train Neural Networks Using a Genetic Algorithm in Python with PyGAD](https://heartbeat.fritz.ai/train-neural-networks-using-a-genetic-algorithm-in-python-with-pygad-862905048429?gi=ba58ee6b4bbd) + +The genetic algorithm (GA) is a biologically-inspired optimization algorithm. It has in recent years gained importance, as it’s simple while also solving complex problems like travel route optimization, training machine learning algorithms, working with single and multi-objective problems, game playing, and more. + +Deep neural networks are inspired by the idea of how the biological brain works. It’s a universal function approximator, which is capable of simulating any function, and is now used to solve the most complex problems in machine learning. What’s more, they’re able to work with all types of data (images, audio, video, and text). + +Both genetic algorithms (GAs) and neural networks (NNs) are similar, as both are biologically-inspired techniques.
This similarity motivates us to create a hybrid of both to see whether a GA can train NNs with high accuracy. + +This tutorial uses [PyGAD](https://pygad.readthedocs.io/), a Python library that supports building and training NNs using a GA. [PyGAD](https://pygad.readthedocs.io/) offers both classification and regression NNs. + +## [Building a Game-Playing Agent for CoinTex Using the Genetic Algorithm](https://blog.paperspace.com/building-agent-for-cointex-using-genetic-algorithm) + +In this tutorial we'll see how to build a game-playing agent using only the genetic algorithm to play a game called [CoinTex](https://play.google.com/store/apps/details?id=coin.tex.cointexreactfast&hl=en), which is developed in the Kivy Python framework. The objective of CoinTex is to collect the randomly distributed coins while avoiding collision with fire and monsters (that move randomly). The source code of CoinTex can be found [on GitHub](https://github.com/ahmedfgad/CoinTex). + +The genetic algorithm is the only AI used here; there is no other machine/deep learning model used with it. We'll implement the genetic algorithm using [PyGAD](https://blog.paperspace.com/genetic-algorithm-applications-using-pygad/). This tutorial starts with a quick overview of CoinTex followed by a brief explanation of the genetic algorithm, and how it can be used to create the playing agent. Finally, we'll see how to implement these ideas in Python. + +The source code of the genetic algorithm agent is available [here](https://github.com/ahmedfgad/CoinTex/tree/master/PlayerGA), and you can download the code used in this tutorial from [here](https://github.com/ahmedfgad/CoinTex/tree/master/PlayerGA/TutorialProject). + +## [How To Train Keras Models Using the Genetic Algorithm with PyGAD](https://blog.paperspace.com/train-keras-models-using-genetic-algorithm-with-pygad) + +PyGAD is an open-source Python library for building the genetic algorithm and training machine learning algorithms. It offers a wide range of parameters to customize the genetic algorithm to work with different types of problems. + +PyGAD has its own modules that support building and training neural networks (NNs) and convolutional neural networks (CNNs). Despite these modules working well, they are implemented in Python without any additional optimization measures. This leads to comparatively high computational times for even simple problems. + +The latest PyGAD version, 2.8.0 (released on 20 September 2020), supports a new module to train Keras models. Even though Keras is built in Python, it's fast. The reason is that Keras uses TensorFlow as a backend, and TensorFlow is highly optimized. + +This tutorial discusses how to train Keras models using PyGAD. The discussion includes building Keras models using either the Sequential Model or the Functional API, building an initial population of Keras model parameters, creating an appropriate fitness function, and more. + +[![PyGAD+Keras](https://user-images.githubusercontent.com/16560492/111009628-2b372500-8362-11eb-90cf-01b47d831624.png)](https://blog.paperspace.com/train-keras-models-using-genetic-algorithm-with-pygad) + +## [Train PyTorch Models Using Genetic Algorithm with PyGAD](https://neptune.ai/blog/train-pytorch-models-using-genetic-algorithm-with-pygad) + +[PyGAD](https://pygad.readthedocs.io/) is a genetic algorithm Python 3 library for solving optimization problems. One of these problems is training machine learning algorithms.
+ +PyGAD has a module called [pygad.kerasga](https://github.com/ahmedfgad/KerasGA). It trains Keras models using the genetic algorithm. On January 3rd, 2021, a new release of [PyGAD 2.10.0](https://pygad.readthedocs.io/) brought a new module called [pygad.torchga](https://github.com/ahmedfgad/TorchGA) to train PyTorch models. It’s very easy to use, but there are a few tricky steps. + +So, in this tutorial, we’ll explore how to use PyGAD to train PyTorch models. + +[![PyGAD+PyTorch](https://user-images.githubusercontent.com/16560492/111009678-5457b580-8362-11eb-899a-39e2f96984df.png)](https://neptune.ai/blog/train-pytorch-models-using-genetic-algorithm-with-pygad) + +## [A Guide to Genetic ‘Learning’ Algorithms for Optimization](https://towardsdatascience.com/a-guide-to-genetic-learning-algorithms-for-optimization-e1067cdc77e7) + +# PyGAD in Other Languages + +## French + +[How genetic algorithms can compete with gradient descent and backprop](https://www.hebergementwebs.com/nouvelles/comment-les-algorithmes-genetiques-peuvent-rivaliser-avec-la-descente-de-gradient-et-le-backprop) + +Although the standard way to train neural networks is gradient descent and backpropagation, there are other players in the game. One of them is evolutionary algorithms, such as genetic algorithms. + +Use a genetic algorithm to train a simple neural network to solve the OpenAI CartPole game. In this article, we will train a simple neural network to solve the OpenAI CartPole game. I will use PyTorch and PyGAD. + +[![How genetic algorithms can compete with gradient descent and backprop](https://user-images.githubusercontent.com/16560492/111009275-3178d180-8361-11eb-9e86-7fb1519acde7.png)](https://www.hebergementwebs.com/nouvelles/comment-les-algorithmes-genetiques-peuvent-rivaliser-avec-la-descente-de-gradient-et-le-backprop) + +## Spanish + +[How genetic algorithms can compete with gradient descent and backprop](https://www.hebergementwebs.com/noticias/como-los-algoritmos-geneticos-pueden-competir-con-el-descenso-de-gradiente-y-el-backprop) + +Although the standard way to train neural networks is gradient descent and backpropagation, there are other players in the game. One of them is evolutionary algorithms, such as genetic algorithms. + +Use a genetic algorithm to train a simple neural network to solve the OpenAI CartPole game. In this article, we will train a simple neural network to solve the OpenAI CartPole game. I will use PyTorch and PyGAD. + +[![How genetic algorithms can compete with gradient descent and backprop](https://user-images.githubusercontent.com/16560492/111009257-232ab580-8361-11eb-99a5-7226efbc3065.png)](https://www.hebergementwebs.com/noticias/como-los-algoritmos-geneticos-pueden-competir-con-el-descenso-de-gradiente-y-el-backprop) + +## Korean + +### [[PyGAD] Trying out the genetic algorithm in Python](https://data-newbie.tistory.com/m/685) + +[![Korean-1](https://user-images.githubusercontent.com/16560492/108586306-85bd0280-731b-11eb-874c-7ac4ce1326cd.jpg)](https://data-newbie.tistory.com/m/685) + +I have not tried every Python package for genetic algorithms, but this one looked extensible and I had a reason to try it, so I took a look. + +The most impressive thing about this package is that hyperparameter search for a neural network can be done with a GA instead of gradient descent. + +Personally, I think this can also serve as a way to find reasonably good initial values, and as an alternative in structures where the loss is hard to optimize with gradient descent. + +The overall flow is as follows.
+ +To be honest, I do not yet fully understand the flow or each parameter. + +## Turkish + +### [How to Train Keras Models Using the Genetic Algorithm with PyGAD](https://erencan34.medium.com/pygad-ile-genetik-algoritmay%C4%B1-kullanarak-keras-modelleri-nas%C4%B1l-e%C4%9Fitilir-cf92639a478c) + +This is a translation of an original English tutorial published at Paperspace: [How To Train Keras Models Using the Genetic Algorithm with PyGAD](https://blog.paperspace.com/train-keras-models-using-genetic-algorithm-with-pygad) + +PyGAD is an open-source Python library used for building the genetic algorithm and training machine learning algorithms. It offers a wide range of parameters to customize the genetic algorithm to work with different problem types. + +PyGAD has its own modules that support building and training neural networks (NNs) and convolutional neural networks (CNNs). Although these modules work well, they are implemented in Python without any additional optimization measures. This leads to relatively high computation times even for simple problems. + +The latest PyGAD version, 2.8.0 (released on 20 September 2020), supports a new module for training Keras models. Even though Keras is built in Python, it is fast. The reason is that Keras uses TensorFlow as a backend, and TensorFlow is highly optimized. + +This tutorial explains how to train Keras models using PyGAD. The discussion includes building Keras models using either the Sequential Model or the Functional API, building an initial population of Keras model parameters, creating an appropriate fitness function, and more. + +[![national-cancer-institute-zz_3tCcrk7o-unsplash](https://user-images.githubusercontent.com/16560492/108586601-85be0200-731d-11eb-98a4-161c75a1f099.jpg)](https://erencan34.medium.com/pygad-ile-genetik-algoritmay%C4%B1-kullanarak-keras-modelleri-nas%C4%B1l-e%C4%9Fitilir-cf92639a478c) + +## Hungarian + +### [TensorFlow basics 10: Breeding neural networks with a genetic algorithm using PyGAD and OpenAI Gym](https://thebojda.medium.com/tensorflow-alapoz%C3%B3-10-24f7767d4a2c) + +To put genetic algorithms in context, let's briefly review how gradient descent and backpropagation work, which are the standard methods for training neural networks. You can read my article about this here. + +To breed the networks we use the library called [PyGAD](https://pygad.readthedocs.io/en/latest/), so first of all we need to install it, along with TensorFlow and Gym, which come pre-installed in Colab. + +PyGAD itself is a completely general system capable of running genetic algorithms. Its extension is KerasGA, which helps run the general engine on TensorFlow (Keras) neural networks. The KerasGA object created on line 47 is part of this extension and serves to create a population of the size given in the second parameter from the model passed as the first parameter. Since our network has 386 adjustable parameters, our DNA here will consist of 386 elements. The population size is 10 individuals, so our initial population will be a 10x386 matrix. We pass this in the initial_population parameter on line 51.
+ +[![](https://user-images.githubusercontent.com/16560492/101267295-c74c0180-375f-11eb-9ad0-f8e37bd796ce.png)](https://thebojda.medium.com/tensorflow-alapoz%C3%B3-10-24f7767d4a2c) + +## Russian + +### [PyGAD: a library for implementing the genetic algorithm](https://neurohive.io/ru/frameworki/pygad-biblioteka-dlya-implementacii-geneticheskogo-algoritma) + +PyGAD is a library for implementing the genetic algorithm. In addition, the library provides access to optimized implementations of machine learning algorithms. PyGAD is developed in Python 3. + +The PyGAD library supports different types of crossover, mutation, and parent selection. PyGAD makes it possible to optimize problems with the genetic algorithm by customizing the fitness function. + +Besides the genetic algorithm, the library contains optimized implementations of machine learning algorithms. At the moment, PyGAD supports building and training neural networks for classification tasks. + +The library is under active development. The creators plan to add functionality for solving binary problems and to implement new algorithms. + +PyGAD was developed with Python 3.7.3. The dependencies include NumPy for creating and manipulating arrays and Matplotlib for visualization. One use case of the tool is the optimization of weights that satisfy a given function. + +[![](https://user-images.githubusercontent.com/16560492/101267295-c74c0180-375f-11eb-9ad0-f8e37bd796ce.png)](https://neurohive.io/ru/frameworki/pygad-biblioteka-dlya-implementacii-geneticheskogo-algoritma) + +# Research Papers using PyGAD + +A number of research papers have used PyGAD; here are some of them: + +* Alberto Meola, Manuel Winkler, Sören Weinrich, Metaheuristic optimization of data preparation and machine learning hyperparameters for prediction of dynamic methane production, Bioresource Technology, Volume 372, 2023, 128604, ISSN 0960-8524. +* Jaros, Marta, and Jiri Jaros. "Performance-Cost Optimization of Moldable Scientific Workflows." +* Thorat, Divya. "Enhanced genetic algorithm to reduce makespan of multiple jobs in map-reduce application on serverless platform". Diss. Dublin, National College of Ireland, 2020. +* Koch, Chris, and Edgar Dobriban. "AttenGen: Generating Live Attenuated Vaccine Candidates using Machine Learning." (2021). +* Bhardwaj, Bhavya, et al. "Windfarm optimization using Nelder-Mead and Particle Swarm optimization." *2021 7th International Conference on Electrical Energy Systems (ICEES)*. IEEE, 2021. +* Bernardo, Reginald Christian S. and J. Said. “Towards a model-independent reconstruction approach for late-time Hubble data.” (2021). +* Duong, Tri Dung, Qian Li, and Guandong Xu. "Prototype-based Counterfactual Explanation for Causal Classification." *arXiv preprint arXiv:2105.00703* (2021). +* Farrag, Tamer Ahmed, and Ehab E. Elattar. "Optimized Deep Stacked Long Short-Term Memory Network for Long-Term Load Forecasting." *IEEE Access* 9 (2021): 68511-68522. +* Antunes, E. D. O., Caetano, M. F., Marotta, M. A., Araujo, A., Bondan, L., Meneguette, R. I., & Rocha Filho, G. P. (2021, August). Soluções Otimizadas para o Problema de Localização de Máxima Cobertura em Redes Militarizadas 4G/LTE. In *Anais do XXVI Workshop de Gerência e Operação de Redes e Serviços* (pp. 152-165). SBC. +* M. Yani, F. Ardilla, A. A. Saputra and N. Kubota, "Gradient-Free Deep Q-Networks Reinforcement learning: Benchmark and Evaluation," *2021 IEEE Symposium Series on Computational Intelligence (SSCI)*, 2021, pp.
1-5, doi: 10.1109/SSCI50451.2021.9659941. +* Yani, Mohamad, and Naoyuki Kubota. "Deep Convolutional Networks with Genetic Algorithm for Reinforcement Learning Problem." +* Mahendra, Muhammad Ihza, and Isman Kurniawan. "Optimizing Convolutional Neural Network by Using Genetic Algorithm for COVID-19 Detection in Chest X-Ray Image." *2021 International Conference on Data Science and Its Applications (ICoDSA)*. IEEE, 2021. +* Glibota, Vjeko. *Umjeravanje mikroskopskog prometnog modela primjenom genetskog algoritma*. Diss. University of Zagreb. Faculty of Transport and Traffic Sciences. Division of Intelligent Transport Systems and Logistics. Department of Intelligent Transport Systems, 2021. +* Zhu, Mingda. *Genetic Algorithm-based Parameter Identification for Ship Manoeuvring Model under Wind Disturbance*. MS thesis. NTNU, 2021. +* Abdalrahman, Ahmed, and Weihua Zhuang. "Dynamic pricing for differentiated pev charging services using deep reinforcement learning." *IEEE Transactions on Intelligent Transportation Systems* (2020). + +# More Links + +https://rodriguezanton.com/identifying-contact-states-for-2d-objects-using-pygad-and/ + +https://torvaney.github.io/projects/t9-optimised + +# For More Information + +There are different resources that can be used to get started with the genetic algorithm and building it in Python. + +## Tutorial: Implementing Genetic Algorithm in Python + +To start with coding the genetic algorithm, you can check the tutorial titled [**Genetic Algorithm Implementation in Python**](https://www.linkedin.com/pulse/genetic-algorithm-implementation-python-ahmed-gad) available at these links: + +- [LinkedIn](https://www.linkedin.com/pulse/genetic-algorithm-implementation-python-ahmed-gad) +- [Towards Data Science](https://towardsdatascience.com/genetic-algorithm-implementation-in-python-5ab67bb124a6) +- [KDnuggets](https://www.kdnuggets.com/2018/07/genetic-algorithm-implementation-python.html) + +[This tutorial](https://www.linkedin.com/pulse/genetic-algorithm-implementation-python-ahmed-gad) is prepared based on a previous version of the project, but it is still a good resource to start with coding the genetic algorithm.
+ +[![Genetic Algorithm Implementation in Python](https://user-images.githubusercontent.com/16560492/78830052-a3c19300-79e7-11ea-8b9b-4b343ea4049c.png)](https://www.linkedin.com/pulse/genetic-algorithm-implementation-python-ahmed-gad) + +## Tutorial: Introduction to Genetic Algorithm + +Get started with the genetic algorithm by reading the tutorial titled [**Introduction to Optimization with Genetic Algorithm**](https://www.linkedin.com/pulse/introduction-optimization-genetic-algorithm-ahmed-gad) which is available at these links: + +* [LinkedIn](https://www.linkedin.com/pulse/introduction-optimization-genetic-algorithm-ahmed-gad) +* [Towards Data Science](https://towardsdatascience.com/introduction-to-optimization-with-genetic-algorithm-2f5001d9964b) +* [KDnuggets](https://www.kdnuggets.com/2018/03/introduction-optimization-with-genetic-algorithm.html) + +[![Introduction to Genetic Algorithm](https://user-images.githubusercontent.com/16560492/82078259-26252d00-96e1-11ea-9a02-52a99e1054b9.jpg)](https://www.linkedin.com/pulse/introduction-optimization-genetic-algorithm-ahmed-gad) + +## Tutorial: Build Neural Networks in Python + +Read about building neural networks in Python through the tutorial titled [**Artificial Neural Network Implementation using NumPy and Classification of the Fruits360 Image Dataset**](https://www.linkedin.com/pulse/artificial-neural-network-implementation-using-numpy-fruits360-gad) available at these links: + +* [LinkedIn](https://www.linkedin.com/pulse/artificial-neural-network-implementation-using-numpy-fruits360-gad) +* [Towards Data Science](https://towardsdatascience.com/artificial-neural-network-implementation-using-numpy-and-classification-of-the-fruits360-image-3c56affa4491) +* [KDnuggets](https://www.kdnuggets.com/2019/02/artificial-neural-network-implementation-using-numpy-and-image-classification.html) + +[![Building Neural Networks Python](https://user-images.githubusercontent.com/16560492/82078281-30472b80-96e1-11ea-8017-6a1f4383d602.jpg)](https://www.linkedin.com/pulse/artificial-neural-network-implementation-using-numpy-fruits360-gad) + +## Tutorial: Optimize Neural Networks with Genetic Algorithm + +Read about training neural networks using the genetic algorithm through the tutorial titled [**Artificial Neural Networks Optimization using Genetic Algorithm with Python**](https://www.linkedin.com/pulse/artificial-neural-networks-optimization-using-genetic-ahmed-gad) available at these links: + +- [LinkedIn](https://www.linkedin.com/pulse/artificial-neural-networks-optimization-using-genetic-ahmed-gad) +- [Towards Data Science](https://towardsdatascience.com/artificial-neural-networks-optimization-using-genetic-algorithm-with-python-1fe8ed17733e) +- [KDnuggets](https://www.kdnuggets.com/2019/03/artificial-neural-networks-optimization-genetic-algorithm-python.html) + +[![Training Neural Networks using Genetic Algorithm Python](https://user-images.githubusercontent.com/16560492/82078300-376e3980-96e1-11ea-821c-aa6b8ceb44d4.jpg)](https://www.linkedin.com/pulse/artificial-neural-networks-optimization-using-genetic-ahmed-gad) + +## Tutorial: Building CNN in Python + +To start with coding convolutional neural networks, you can check the tutorial titled [**Building Convolutional Neural Network using NumPy from Scratch**](https://www.linkedin.com/pulse/building-convolutional-neural-network-using-numpy-from-ahmed-gad) available at these links: + +- [LinkedIn](https://www.linkedin.com/pulse/building-convolutional-neural-network-using-numpy-from-ahmed-gad) +- [Towards 
Data Science](https://towardsdatascience.com/building-convolutional-neural-network-using-numpy-from-scratch-b30aac50e50a) +- [KDnuggets](https://www.kdnuggets.com/2018/04/building-convolutional-neural-network-numpy-scratch.html) +- [Chinese Translation](http://m.aliyun.com/yunqi/articles/585741) + +[This tutorial](https://www.linkedin.com/pulse/building-convolutional-neural-network-using-numpy-from-ahmed-gad)) is prepared based on a previous version of the project but it still a good resource to start with coding CNNs. + +[![Building CNN in Python](https://user-images.githubusercontent.com/16560492/82431022-6c3a1200-9a8e-11ea-8f1b-b055196d76e3.png)](https://www.linkedin.com/pulse/building-convolutional-neural-network-using-numpy-from-ahmed-gad) + +## Tutorial: Derivation of CNN from FCNN + +Get started with the genetic algorithm by reading the tutorial titled [**Derivation of Convolutional Neural Network from Fully Connected Network Step-By-Step**](https://www.linkedin.com/pulse/derivation-convolutional-neural-network-from-fully-connected-gad) which is available at these links: + +* [LinkedIn](https://www.linkedin.com/pulse/derivation-convolutional-neural-network-from-fully-connected-gad) +* [Towards Data Science](https://towardsdatascience.com/derivation-of-convolutional-neural-network-from-fully-connected-network-step-by-step-b42ebafa5275) +* [KDnuggets](https://www.kdnuggets.com/2018/04/derivation-convolutional-neural-network-fully-connected-step-by-step.html) + +[![Derivation of CNN from FCNN](https://user-images.githubusercontent.com/16560492/82431369-db176b00-9a8e-11ea-99bd-e845192873fc.png)](https://www.linkedin.com/pulse/derivation-convolutional-neural-network-from-fully-connected-gad) + +## Book: Practical Computer Vision Applications Using Deep Learning with CNNs + +You can also check my book cited as [**Ahmed Fawzy Gad 'Practical Computer Vision Applications Using Deep Learning with CNNs'. Dec. 2018, Apress, 978-1-4842-4167-7**](https://www.amazon.com/Practical-Computer-Vision-Applications-Learning/dp/1484241665) which discusses neural networks, convolutional neural networks, deep learning, genetic algorithm, and more. 
+
+Find the book at these links:
+
+- [Amazon](https://www.amazon.com/Practical-Computer-Vision-Applications-Learning/dp/1484241665)
+- [Springer](https://link.springer.com/book/10.1007/978-1-4842-4167-7)
+- [Apress](https://www.apress.com/gp/book/9781484241660)
+- [O'Reilly](https://www.oreilly.com/library/view/practical-computer-vision/9781484241677)
+- [Google Books](https://books.google.com.eg/books?id=xLd9DwAAQBAJ)
+
+![Fig04](https://user-images.githubusercontent.com/16560492/78830077-ae7c2800-79e7-11ea-980b-53b6bd879eeb.jpg)
+
+# Contact Us
+
+* E-mail: ahmed.f.gad@gmail.com
+* [LinkedIn](https://www.linkedin.com/in/ahmedfgad)
+* [Amazon Author Page](https://amazon.com/author/ahmedgad)
+* [Heartbeat](https://heartbeat.fritz.ai/@ahmedfgad)
+* [Paperspace](https://blog.paperspace.com/author/ahmed)
+* [KDnuggets](https://kdnuggets.com/author/ahmed-gad)
+* [TowardsDataScience](https://towardsdatascience.com/@ahmedfgad)
+* [GitHub](https://github.com/ahmedfgad)
+
+![PYGAD-LOGO](https://user-images.githubusercontent.com/16560492/101267295-c74c0180-375f-11eb-9ad0-f8e37bd796ce.png)
+
+Thank you for using [PyGAD](https://github.com/ahmedfgad/GeneticAlgorithmPython) :)
\ No newline at end of file
diff --git a/docs/md/torchga.md b/docs/md/torchga.md
new file mode 100644
index 0000000..251b409
--- /dev/null
+++ b/docs/md/torchga.md
@@ -0,0 +1,792 @@
+# `pygad.torchga` Module
+
+This section of the PyGAD's library documentation discusses the **pygad.torchga** module.
+
+The `pygad.torchga` module has a helper class and 3 functions to train PyTorch models using the genetic algorithm (PyGAD).
+
+The contents of this module are:
+
+1. `TorchGA`: A class for creating an initial population of all parameters in the PyTorch model.
+2. `model_weights_as_vector()`: A function to reshape the PyTorch model weights to a single vector.
+3. `model_weights_as_dict()`: A function to restore the PyTorch model weights from a vector.
+4. `predict()`: A function to make predictions based on the PyTorch model and a solution.
+
+More details are given in the next sections.
+
+# Steps Summary
+
+The summary of the steps used to train a PyTorch model using PyGAD is as follows:
+
+1. Create a PyTorch model.
+2. Create an instance of the `pygad.torchga.TorchGA` class.
+3. Prepare the training data.
+4. Build the fitness function.
+5. Create an instance of the `pygad.GA` class.
+6. Run the genetic algorithm.
+
+# Create PyTorch Model
+
+Before discussing training a PyTorch model using PyGAD, the first thing to do is to create the PyTorch model. To get started, please check the [PyTorch library documentation](https://pytorch.org/docs/stable/index.html).
+
+Here is an example of a PyTorch model.
+
+```python
+import torch
+
+input_layer = torch.nn.Linear(3, 5)
+relu_layer = torch.nn.ReLU()
+output_layer = torch.nn.Linear(5, 1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer,
+                            output_layer)
+```
+
+Feel free to add the layers of your choice.
+
+# `pygad.torchga.TorchGA` Class
+
+The `pygad.torchga` module has a class named `TorchGA` for creating an initial population for the genetic algorithm based on a PyTorch model. The constructor, methods, and attributes within the class are discussed in this section.
+
+## `__init__()`
+
+The `pygad.torchga.TorchGA` class constructor accepts the following parameters:
+
+- `model`: An instance of the PyTorch model.
+- `num_solutions`: Number of solutions in the population. Each solution has different parameters of the model.
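+
+For quick reference, here is a minimal sketch of using the constructor. It assumes the same 3-input regression model used throughout this section and that PyGAD is installed from PyPI, so the class is imported via `from pygad import torchga`:
+
+```python
+import torch
+from pygad import torchga
+
+# Build a small model (same architecture as the example above).
+model = torch.nn.Sequential(torch.nn.Linear(3, 5),
+                            torch.nn.ReLU(),
+                            torch.nn.Linear(5, 1))
+
+# Create an initial population of 10 solutions, each holding a
+# different set of the model's parameters flattened into a vector.
+torch_ga = torchga.TorchGA(model=model,
+                           num_solutions=10)
+
+# The nested list of solutions is ready to be passed to pygad.GA
+# through its initial_population parameter.
+print(len(torch_ga.population_weights))
+```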
+
+## Instance Attributes
+
+All parameters in the `pygad.torchga.TorchGA` class constructor are used as instance attributes in addition to adding a new attribute called `population_weights`.
+
+Here is a list of all instance attributes:
+
+- `model`
+- `num_solutions`
+- `population_weights`: A nested list holding the weights of all solutions in the population.
+
+## Methods in the `TorchGA` Class
+
+This section discusses the methods available for instances of the `pygad.torchga.TorchGA` class.
+
+### `create_population()`
+
+The `create_population()` method creates the initial population of the genetic algorithm as a list of solutions where each solution represents different model parameters. This list is assigned to the `population_weights` attribute of the instance.
+
+# Functions in the `pygad.torchga` Module
+
+This section discusses the functions in the `pygad.torchga` module.
+
+## `pygad.torchga.model_weights_as_vector()`
+
+The `model_weights_as_vector()` function accepts a single parameter named `model` representing the PyTorch model. It returns a vector holding all model weights. The reason for representing the model weights as a vector is that the genetic algorithm expects all parameters of any solution to be in a 1D vector form.
+
+The function accepts the following parameters:
+
+- `model`: The PyTorch model.
+
+It returns a 1D vector holding the model weights.
+
+## `pygad.torchga.model_weights_as_dict()`
+
+The `model_weights_as_dict()` function accepts the following parameters:
+
+1. `model`: The PyTorch model.
+2. `weights_vector`: The model parameters as a vector.
+
+It returns the restored model weights in the same form used by the `state_dict()` method. The returned dictionary is ready to be passed to the `load_state_dict()` method for setting the PyTorch model's parameters.
+
+## `pygad.torchga.predict()`
+
+The `predict()` function makes a prediction based on a solution. It accepts the following parameters:
+
+1. `model`: The PyTorch model.
+2. `solution`: The evolved solution.
+3. `data`: The test data inputs.
+
+It returns the predictions for the data samples.
+
+# Examples
+
+This section gives the complete code of some examples that build and train a PyTorch model using PyGAD. Each subsection builds a different network.
+
+## Example 1: Regression Example
+
+The next code builds a simple PyTorch model for regression. The next subsections discuss each part in the code.
+
+```python
+import torch
+from pygad import torchga
+import pygad
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, torch_ga, model, loss_function
+
+    predictions = pygad.torchga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    abs_error = loss_function(predictions, data_outputs).detach().numpy() + 0.00000001
+
+    solution_fitness = 1.0 / abs_error
+
+    return solution_fitness
+
+def on_generation(ga_instance):
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+
+# Create the PyTorch model.
+input_layer = torch.nn.Linear(3, 5)
+relu_layer = torch.nn.ReLU()
+output_layer = torch.nn.Linear(5, 1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer,
+                            output_layer)
+# print(model)
+
+# Create an instance of the pygad.torchga.TorchGA class to build the initial population.
+torch_ga = torchga.TorchGA(model=model,
+                           num_solutions=10)
+
+loss_function = torch.nn.L1Loss()
+
+# Data inputs
+data_inputs = torch.tensor([[0.02, 0.1, 0.15],
+                            [0.7, 0.6, 0.8],
+                            [1.5, 1.2, 1.7],
+                            [3.2, 2.9, 3.1]])
+
+# Data outputs
+data_outputs = torch.tensor([[0.1],
+                             [0.6],
+                             [1.3],
+                             [2.5]])
+
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 250 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = torch_ga.population_weights # Initial population of network weights
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+ga_instance.run()
+
+# After the generations complete, some plots are shown that summarize how the outputs/fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & PyTorch - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+# Make predictions based on the best solution.
+predictions = pygad.torchga.predict(model=model,
+                                    solution=solution,
+                                    data=data_inputs)
+print("Predictions : \n", predictions.detach().numpy())
+
+abs_error = loss_function(predictions, data_outputs)
+print("Absolute Error : ", abs_error.detach().numpy())
+```
+
+### Create a PyTorch model
+
+According to the steps mentioned previously, the first step is to create a PyTorch model. Here is the code that builds the model using `torch.nn.Sequential`.
+
+```python
+import torch
+
+input_layer = torch.nn.Linear(3, 5)
+relu_layer = torch.nn.ReLU()
+output_layer = torch.nn.Linear(5, 1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer,
+                            output_layer)
+```
+
+### Create an Instance of the `pygad.torchga.TorchGA` Class
+
+The second step is to create an instance of the `pygad.torchga.TorchGA` class. There are 10 solutions per population. Change this number according to your needs.
+
+```python
+from pygad import torchga
+
+torch_ga = torchga.TorchGA(model=model,
+                           num_solutions=10)
+```
+
+### Prepare the Training Data
+
+The third step is to prepare the training data inputs and outputs. Here is an example where there are 4 samples. Each sample has 3 inputs and 1 output. The data is created as `torch.tensor` instances, matching the complete listing above, so it can be passed directly to the PyTorch loss function.
+
+```python
+import torch
+
+# Data inputs
+data_inputs = torch.tensor([[0.02, 0.1, 0.15],
+                            [0.7, 0.6, 0.8],
+                            [1.5, 1.2, 1.7],
+                            [3.2, 2.9, 3.1]])
+
+# Data outputs
+data_outputs = torch.tensor([[0.1],
+                             [0.6],
+                             [1.3],
+                             [2.5]])
+```
+
+### Build the Fitness Function
+
+The fourth step is to build the fitness function. This function must accept 3 parameters: the `pygad.GA` instance, the solution, and the solution's index within the population.
+
+The next fitness function calculates the mean absolute error (MAE) of the PyTorch model based on the parameters in the solution. The reciprocal of the MAE is used as the fitness value. Feel free to use any other loss function to calculate the fitness value.
+
+```python
+loss_function = torch.nn.L1Loss()
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, torch_ga, model, loss_function
+
+    predictions = pygad.torchga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    abs_error = loss_function(predictions, data_outputs).detach().numpy() + 0.00000001
+
+    solution_fitness = 1.0 / abs_error
+
+    return solution_fitness
+```
+
+### Create an Instance of the `pygad.GA` Class
+
+The fifth step is to instantiate the `pygad.GA` class. Note how the `initial_population` parameter is assigned the initial weights of the PyTorch models (taken from `torch_ga.population_weights`).
+
+For more information, please check the [parameters this class accepts](https://pygad.readthedocs.io/en/latest/pygad.html#init).
+
+```python
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 250 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = torch_ga.population_weights # Initial population of network weights
+
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+```
+
+### Run the Genetic Algorithm
+
+The sixth and last step is to run the genetic algorithm by calling the `run()` method.
+
+```python
+ga_instance.run()
+```
+
+After PyGAD completes its execution, a figure showing how the fitness value changes by generation can be created. Call the `plot_fitness()` method to show the figure.
+
+```python
+ga_instance.plot_fitness(title="PyGAD & PyTorch - Iteration vs. Fitness", linewidth=4)
+```
+
+Here is the figure.
+
+![PyTorch PyGAD XOR Regression 250 Generations](https://user-images.githubusercontent.com/16560492/103469779-22f5b480-4d37-11eb-80dc-95503065ebb1.png)
+
+To get information about the best solution found by PyGAD, use the `best_solution()` method.
+
+```python
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+```
+
+```python
+Fitness value of the best solution = 145.42425295191546
+Index of the best solution : 0
+```
+
+The next code uses the `predict()` function to calculate the predicted values. Internally, it restores the trained model weights from the solution using the `model_weights_as_dict()` function.
+
+```python
+predictions = pygad.torchga.predict(model=model,
+                                    solution=solution,
+                                    data=data_inputs)
+print("Predictions : \n", predictions.detach().numpy())
+```
+
+```python
+Predictions : 
+[[0.08401088]
+ [0.60939324]
+ [1.3010881 ]
+ [2.5010352 ]]
+```
+
+The next code measures the trained model error.
+
+```python
+abs_error = loss_function(predictions, data_outputs)
+print("Absolute Error : ", abs_error.detach().numpy())
+```
+
+```
+Absolute Error :  0.006876422
+```
+
+## Example 2: XOR Binary Classification
+
+The next code creates a PyTorch model to solve the XOR binary classification problem. Let's highlight the changes compared to the previous example.
+
+```python
+import torch
+from pygad import torchga
+import pygad
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, torch_ga, model, loss_function
+
+    predictions = pygad.torchga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    solution_fitness = 1.0 / (loss_function(predictions, data_outputs).detach().numpy() + 0.00000001)
+
+    return solution_fitness
+
+def on_generation(ga_instance):
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+
+# Create the PyTorch model.
+input_layer = torch.nn.Linear(2, 4)
+relu_layer = torch.nn.ReLU()
+dense_layer = torch.nn.Linear(4, 2)
+output_layer = torch.nn.Softmax(1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer,
+                            dense_layer,
+                            output_layer)
+# print(model)
+
+# Create an instance of the pygad.torchga.TorchGA class to build the initial population.
+torch_ga = torchga.TorchGA(model=model,
+                           num_solutions=10)
+
+loss_function = torch.nn.BCELoss()
+
+# XOR problem inputs
+data_inputs = torch.tensor([[0.0, 0.0],
+                            [0.0, 1.0],
+                            [1.0, 0.0],
+                            [1.0, 1.0]])
+
+# XOR problem outputs
+data_outputs = torch.tensor([[1.0, 0.0],
+                             [0.0, 1.0],
+                             [0.0, 1.0],
+                             [1.0, 0.0]])
+
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 250 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = torch_ga.population_weights # Initial population of network weights.
+
+# Create an instance of the pygad.GA class
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+# Start the genetic algorithm evolution.
+ga_instance.run()
+
+# After the generations complete, some plots are shown that summarize how the outputs/fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & PyTorch - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+# Make predictions based on the best solution.
+predictions = pygad.torchga.predict(model=model,
+                                    solution=solution,
+                                    data=data_inputs)
+print("Predictions : \n", predictions.detach().numpy())
+
+# Calculate the binary crossentropy for the trained model.
+print("Binary Crossentropy : ", loss_function(predictions, data_outputs).detach().numpy())
+
+# Calculate the classification accuracy of the trained model.
+a = torch.max(predictions, axis=1)
+b = torch.max(data_outputs, axis=1)
+accuracy = torch.sum(a.indices == b.indices) / len(data_outputs)
+print("Accuracy : ", accuracy.detach().numpy())
+```
+
+Compared to the previous regression example, here are the changes:
+
+* The PyTorch model is changed according to the nature of the problem. Now, it has 2 inputs and 2 outputs with an in-between hidden layer of 4 neurons.
+
+```python
+input_layer = torch.nn.Linear(2, 4)
+relu_layer = torch.nn.ReLU()
+dense_layer = torch.nn.Linear(4, 2)
+output_layer = torch.nn.Softmax(1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer,
+                            dense_layer,
+                            output_layer)
+```
+
+* The training data is changed. Note that the output of each sample is a 1D vector of 2 values, 1 for each class.
+
+```python
+# XOR problem inputs
+data_inputs = torch.tensor([[0.0, 0.0],
+                            [0.0, 1.0],
+                            [1.0, 0.0],
+                            [1.0, 1.0]])
+
+# XOR problem outputs
+data_outputs = torch.tensor([[1.0, 0.0],
+                             [0.0, 1.0],
+                             [0.0, 1.0],
+                             [1.0, 0.0]])
+```
+
+* The fitness value is calculated based on the binary cross entropy.
+
+```python
+loss_function = torch.nn.BCELoss()
+```
+
+After the previous code completes, the next figure shows how the fitness value changes by generation.
+
+![PyTorch PyGAD XOR Classification 250 Generations](https://user-images.githubusercontent.com/16560492/103469818-c646c980-4d37-11eb-98c3-d9d591acd5e2.png)
+
+Here is some information about the trained model. Its fitness value is `100000000.0`, its loss is `0.0`, and its accuracy is 100%.
+
+```python
+Fitness value of the best solution = 100000000.0
+
+Index of the best solution : 0
+
+Predictions : 
+[[1.0000000e+00 1.3627675e-10]
+ [3.8521746e-09 1.0000000e+00]
+ [4.2789325e-10 1.0000000e+00]
+ [1.0000000e+00 3.3668417e-09]]
+
+Binary Crossentropy : 0.0
+
+Accuracy : 1.0
+```
+
+## Example 3: Image Multi-Class Classification (Dense Layers)
+
+Here is the code.
+
+```python
+import torch
+from pygad import torchga
+import pygad
+import numpy
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, torch_ga, model, loss_function
+
+    predictions = pygad.torchga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    solution_fitness = 1.0 / (loss_function(predictions, data_outputs).detach().numpy() + 0.00000001)
+
+    return solution_fitness
+
+def on_generation(ga_instance):
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+
+# Build the PyTorch model using torch.nn.Sequential.
+input_layer = torch.nn.Linear(360, 50)
+relu_layer = torch.nn.ReLU()
+dense_layer = torch.nn.Linear(50, 4)
+output_layer = torch.nn.Softmax(1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer,
+                            dense_layer,
+                            output_layer)
+
+# Create an instance of the pygad.torchga.TorchGA class to build the initial population.
+torch_ga = torchga.TorchGA(model=model,
+                           num_solutions=10)
+
+loss_function = torch.nn.CrossEntropyLoss()
+
+# Data inputs
+data_inputs = torch.from_numpy(numpy.load("dataset_features.npy")).float()
+
+# Data outputs
+data_outputs = torch.from_numpy(numpy.load("outputs.npy")).long()
+# The next 2 lines are equivalent to this Keras function to perform 1-hot encoding: tensorflow.keras.utils.to_categorical(data_outputs)
+# temp_outs = numpy.zeros((data_outputs.shape[0], numpy.unique(data_outputs).size), dtype=numpy.uint8)
+# temp_outs[numpy.arange(data_outputs.shape[0]), numpy.uint8(data_outputs)] = 1
+
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 200 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = torch_ga.population_weights # Initial population of network weights.
+
+# Create an instance of the pygad.GA class
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+# Start the genetic algorithm evolution.
+ga_instance.run()
+
+# After the generations complete, some plots are shown that summarize how the outputs/fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & PyTorch - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+# Fetch the parameters of the best solution.
+best_solution_weights = torchga.model_weights_as_dict(model=model,
+                                                      weights_vector=solution)
+model.load_state_dict(best_solution_weights)
+predictions = model(data_inputs)
+# print("Predictions : \n", predictions)
+
+# Calculate the crossentropy loss of the trained model.
+print("Crossentropy : ", loss_function(predictions, data_outputs).detach().numpy())
+
+# Calculate the classification accuracy for the trained model.
+accuracy = torch.sum(torch.max(predictions, axis=1).indices == data_outputs) / len(data_outputs)
+print("Accuracy : ", accuracy.detach().numpy())
+```
+
+Compared to the previous binary classification example, this example has multiple classes (4) and thus the loss is measured using cross entropy.
+
+```python
+loss_function = torch.nn.CrossEntropyLoss()
+```
+
+### Prepare the Training Data
+
+Before building and training neural networks, the training data (input and output) needs to be prepared. The inputs and the outputs of the training data are NumPy arrays.
+
+The data used in this example is available as 2 files:
+
+1. [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy): Data inputs. https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy
+2. [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy): Class labels. https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy
+
+The data consists of 4 classes of images. The image shape is `(100, 100, 3)`. The number of training samples is 1962. The feature vector extracted from each image has a length of 360.
+
+```python
+import numpy
+
+data_inputs = numpy.load("dataset_features.npy")
+
+data_outputs = numpy.load("outputs.npy")
+```
+
+The next figure shows how the fitness value changes.
+
+![PyTorch PyGAD Dense Image Classification 200 Generations](https://user-images.githubusercontent.com/16560492/103469855-5d138600-4d38-11eb-84b1-b5eff8faa7bc.png)
+
+Here are some statistics about the trained model.
+
+```
+Fitness value of the best solution = 1.3446997034434534
+Index of the best solution : 0
+Crossentropy : 0.74366045
+Accuracy : 1.0
+```
+
+## Example 4: Image Multi-Class Classification (Conv Layers)
+
+Compared to the previous example that uses only dense layers, this example uses convolutional layers to classify the same dataset.
+
+Here is the complete code.
+
+```python
+import torch
+from pygad import torchga
+import pygad
+import numpy
+
+def fitness_func(ga_instance, solution, sol_idx):
+    global data_inputs, data_outputs, torch_ga, model, loss_function
+
+    predictions = pygad.torchga.predict(model=model,
+                                        solution=solution,
+                                        data=data_inputs)
+
+    solution_fitness = 1.0 / (loss_function(predictions, data_outputs).detach().numpy() + 0.00000001)
+
+    return solution_fitness
+
+def on_generation(ga_instance):
+    print(f"Generation = {ga_instance.generations_completed}")
+    print(f"Fitness = {ga_instance.best_solution()[1]}")
+
+# Build the PyTorch model.
+input_layer = torch.nn.Conv2d(in_channels=3, out_channels=5, kernel_size=7)
+relu_layer1 = torch.nn.ReLU()
+max_pool1 = torch.nn.MaxPool2d(kernel_size=5, stride=5)
+
+conv_layer2 = torch.nn.Conv2d(in_channels=5, out_channels=3, kernel_size=3)
+relu_layer2 = torch.nn.ReLU()
+
+flatten_layer1 = torch.nn.Flatten()
+# The value 768 is pre-computed by tracing the sizes of the layers' outputs.
+dense_layer1 = torch.nn.Linear(in_features=768, out_features=15)
+relu_layer3 = torch.nn.ReLU()
+
+dense_layer2 = torch.nn.Linear(in_features=15, out_features=4)
+output_layer = torch.nn.Softmax(1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer1,
+                            max_pool1,
+                            conv_layer2,
+                            relu_layer2,
+                            flatten_layer1,
+                            dense_layer1,
+                            relu_layer3,
+                            dense_layer2,
+                            output_layer)
+
+# Create an instance of the pygad.torchga.TorchGA class to build the initial population.
+torch_ga = torchga.TorchGA(model=model,
+                           num_solutions=10)
+
+loss_function = torch.nn.CrossEntropyLoss()
+
+# Data inputs
+data_inputs = torch.from_numpy(numpy.load("dataset_inputs.npy")).float()
+# Reorder the axes from NHWC to NCHW. Note that permute() is used, not reshape(), because reshape() keeps the memory order and would scramble the pixels.
+data_inputs = data_inputs.permute(0, 3, 1, 2)
+
+# Data outputs
+data_outputs = torch.from_numpy(numpy.load("dataset_outputs.npy")).long()
+
+# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/pygad.html#pygad-ga-class
+num_generations = 200 # Number of generations.
+num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
+initial_population = torch_ga.population_weights # Initial population of network weights.
+
+# Create an instance of the pygad.GA class
+ga_instance = pygad.GA(num_generations=num_generations,
+                       num_parents_mating=num_parents_mating,
+                       initial_population=initial_population,
+                       fitness_func=fitness_func,
+                       on_generation=on_generation)
+
+# Start the genetic algorithm evolution.
+ga_instance.run()
+
+# After the generations complete, some plots are shown that summarize how the outputs/fitness values evolve over generations.
+ga_instance.plot_fitness(title="PyGAD & PyTorch - Iteration vs. Fitness", linewidth=4)
+
+# Returning the details of the best solution.
+solution, solution_fitness, solution_idx = ga_instance.best_solution()
+print(f"Fitness value of the best solution = {solution_fitness}")
+print(f"Index of the best solution : {solution_idx}")
+
+# Make predictions based on the best solution.
+predictions = pygad.torchga.predict(model=model,
+                                    solution=solution,
+                                    data=data_inputs)
+# print("Predictions : \n", predictions)
+
+# Calculate the crossentropy for the trained model.
+print("Crossentropy : ", loss_function(predictions, data_outputs).detach().numpy())
+
+# Calculate the classification accuracy for the trained model.
+accuracy = torch.sum(torch.max(predictions, axis=1).indices == data_outputs) / len(data_outputs)
+print("Accuracy : ", accuracy.detach().numpy())
+```
+
+Compared to the previous example, the only change is that the architecture uses convolutional and max-pooling layers. The shape of each input sample is `(100, 100, 3)`.
+
+```python
+input_layer = torch.nn.Conv2d(in_channels=3, out_channels=5, kernel_size=7)
+relu_layer1 = torch.nn.ReLU()
+max_pool1 = torch.nn.MaxPool2d(kernel_size=5, stride=5)
+
+conv_layer2 = torch.nn.Conv2d(in_channels=5, out_channels=3, kernel_size=3)
+relu_layer2 = torch.nn.ReLU()
+
+flatten_layer1 = torch.nn.Flatten()
+# The value 768 is pre-computed by tracing the sizes of the layers' outputs.
+dense_layer1 = torch.nn.Linear(in_features=768, out_features=15)
+relu_layer3 = torch.nn.ReLU()
+
+dense_layer2 = torch.nn.Linear(in_features=15, out_features=4)
+output_layer = torch.nn.Softmax(1)
+
+model = torch.nn.Sequential(input_layer,
+                            relu_layer1,
+                            max_pool1,
+                            conv_layer2,
+                            relu_layer2,
+                            flatten_layer1,
+                            dense_layer1,
+                            relu_layer3,
+                            dense_layer2,
+                            output_layer)
+```
+
+### Prepare the Training Data
+
+The data used in this example is available as 2 files:
+
+1. [dataset_inputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy): Data inputs. https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_inputs.npy
+2. [dataset_outputs.npy](https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy): Class labels. https://github.com/ahmedfgad/NumPyCNN/blob/master/dataset_outputs.npy
+
+The data consists of 4 classes of images. The image shape is `(100, 100, 3)` and there are 20 images per class for a total of 80 training samples. For more information about the dataset, check the [Reading the Data](https://pygad.readthedocs.io/en/latest/cnn.html#reading-the-data) section of the `pygad.cnn` module.
+
+Simply download these 2 files and read them according to the next code.
+
+```python
+import numpy
+
+data_inputs = numpy.load("dataset_inputs.npy")
+
+data_outputs = numpy.load("dataset_outputs.npy")
+```
+
+The next figure shows how the fitness value changes.
+
+![PyTorch PyGAD CNN Image Classification 200 Generations](https://user-images.githubusercontent.com/16560492/103469887-c7c4c180-4d38-11eb-98a7-1c5e73e918d0.png)
+
+Here are some statistics about the trained model. The model accuracy is 97.5% after the 200 generations. Note that simply running the code again may give different results.
+
+```
+Fitness value of the best solution = 1.3009520689219258
+Index of the best solution : 0
+Crossentropy : 0.7686678
+Accuracy : 0.975
+```
+
diff --git a/docs/md/utils.md b/docs/md/utils.md
new file mode 100644
index 0000000..6859d97
--- /dev/null
+++ b/docs/md/utils.md
@@ -0,0 +1,529 @@
+# `pygad.utils` Module
+
+This section of the PyGAD's library documentation discusses the **pygad.utils** module.
+
+PyGAD supports different types of operators for selecting the parents, applying the crossover, and mutation. More features will be added in the future. To ask for a new feature, please check the [Ask for Feature](https://pygad.readthedocs.io/en/latest/releases.html#ask-for-feature) section.
+
+The submodules in the `pygad.utils` module are:
+
+1. `crossover`: Has the `Crossover` class that implements the crossover operators.
+2. `mutation`: Has the `Mutation` class that implements the mutation operators.
+3. `parent_selection`: Has the `ParentSelection` class that implements the parent selection operators.
+4. `nsga2`: Has the `NSGA2` class that implements the Non-Dominated Sorting Genetic Algorithm II (NSGA-II).
+
+Note that the `pygad.GA` class extends all of these classes. So, the user can access any of the methods in such classes directly by the instance/object of the `pygad.GA` class.
+
+The next sections discuss each submodule.
+
+# `pygad.utils.crossover` Submodule
+
+The `pygad.utils.crossover` module has a class named `Crossover` with the supported crossover operations which are:
+
+1. Single point: Implemented using the `single_point_crossover()` method.
+2. Two points: Implemented using the `two_points_crossover()` method.
+3. Uniform: Implemented using the `uniform_crossover()` method.
+4. Scattered: Implemented using the `scattered_crossover()` method.
+
+All crossover methods accept these 2 parameters:
+
+1. `parents`: The parents to mate for producing the offspring.
+2. `offspring_size`: The size of the offspring to produce.
+
+# `pygad.utils.mutation` Submodule
+
+The `pygad.utils.mutation` module has a class named `Mutation` with the supported mutation operations which are:
+
+1. Random: Implemented using the `random_mutation()` method.
+2. Swap: Implemented using the `swap_mutation()` method.
+3. Inversion: Implemented using the `inversion_mutation()` method.
+4. Scramble: Implemented using the `scramble_mutation()` method.
+5. Adaptive: Implemented using the `adaptive_mutation()` method.
+
+All mutation methods accept this parameter:
+
+1. `offspring`: The offspring to mutate.
+
+The `pygad.utils.mutation` module has some helper methods to assist in applying the mutation operation:
+
+1. `mutation_by_space()`: Applies the mutation using the `gene_space` parameter.
+2. `mutation_probs_by_space()`: Uses the mutation probabilities in the `mutation_probabilities` instance attribute to apply the mutation using the `gene_space` parameter. For each gene, if its probability is <= the mutation probability, then it will be mutated based on the mutation space.
+3. `mutation_process_gene_value()`: Generates/selects values for the gene that satisfy the constraint. The values could be generated randomly or from the gene space.
+4. `mutation_randomly()`: Applies the random mutation.
+5. `mutation_probs_randomly()`: Uses the mutation probabilities in the `mutation_probabilities` instance attribute to apply the random mutation. For each gene, if its probability is <= the mutation probability, then it will be mutated randomly.
+6. `adaptive_mutation_population_fitness()`: A helper method to calculate the average fitness of the solutions before applying the adaptive mutation.
+7. `adaptive_mutation_by_space()`: Applies the adaptive mutation based on the `gene_space` parameter. A number of genes are selected randomly for mutation. This number depends on the fitness of the solution. The random values are selected from the `gene_space` parameter.
+8. `adaptive_mutation_probs_by_space()`: Uses the mutation probabilities to decide which genes to apply the adaptive mutation by space.
+9. `adaptive_mutation_randomly()`: Applies the adaptive mutation randomly. A number of genes are selected randomly for mutation. This number depends on the fitness of the solution. The random values are selected based on the 2 parameters `random_mutation_min_val` and `random_mutation_max_val`.
+10. `adaptive_mutation_probs_randomly()`: Uses the mutation probabilities to decide which genes to apply the adaptive mutation randomly.
+
+# Adaptive Mutation
+
+In the regular genetic algorithm, the mutation works by selecting a single fixed mutation rate for all solutions regardless of their fitness values. So, regardless of whether a solution has high or low quality, the same number of genes are mutated all the time.
+
+The pitfalls of using a constant mutation rate for all solutions are summarized in the paper [Libelli, S. Marsili, and P. Alba. "Adaptive mutation in genetic algorithms." *Soft computing* 4.2 (2000): 76-80](https://idp.springer.com/authorize/casa?redirect_uri=https://link.springer.com/content/pdf/10.1007/s005000000042.pdf&casa_token=IT4NfJUvslcAAAAA:VegHW6tm2fe3e0R9cRKjuGKkKWXJTQSfNMT6z0kGbMsAllyK1NrEY3cEWg8bj7AJWEQPaqWIJxmHNBHg) as follows:
+
+> The weak point of "classical" GAs is the total randomness of mutation, which is applied equally to all chromosomes, irrespective of their fitness. Thus a very good chromosome is equally likely to be disrupted by mutation as a bad one.
+>
+> On the other hand, bad chromosomes are less likely to produce good ones through crossover, because of their lack of building blocks, until they remain unchanged. They would benefit the most from mutation and could be used to spread throughout the parameter space to increase the search thoroughness. So there are two conflicting needs in determining the best probability of mutation.
+>
+> Usually, a reasonable compromise in the case of a constant mutation is to keep the probability low to avoid disruption of good chromosomes, but this would prevent a high mutation rate of low-fitness chromosomes. Thus a constant probability of mutation would probably miss both goals and result in a slow improvement of the population.
+
+According to the work of [Libelli, S. Marsili, and P. Alba](https://idp.springer.com/authorize/casa?redirect_uri=https://link.springer.com/content/pdf/10.1007/s005000000042.pdf&casa_token=IT4NfJUvslcAAAAA:VegHW6tm2fe3e0R9cRKjuGKkKWXJTQSfNMT6z0kGbMsAllyK1NrEY3cEWg8bj7AJWEQPaqWIJxmHNBHg), the adaptive mutation solves the problems of constant mutation.
+
+Adaptive mutation works as follows:
+
+1. Calculate the average fitness value of the population (`f_avg`).
+2. For each chromosome, calculate its fitness value (`f`).
+3. If `f < f_avg`, then this solution is regarded as a low-quality solution, and thus the mutation rate should be kept high because this increases the chances of improving it.
+4. If `f > f_avg`, then this solution is regarded as a high-quality solution, and thus the mutation rate should be kept low to avoid disrupting it.
+
+In PyGAD, if `f = f_avg`, then the solution is regarded as a high-quality solution.
+
+The next figure summarizes the previous steps.
+
+![Adaptive-Mutation](https://user-images.githubusercontent.com/16560492/103468973-e3c26600-4d2c-11eb-8af3-b3bb39b50540.jpg)
+
+This strategy is applied in PyGAD.
+
+## Use Adaptive Mutation in PyGAD
+
+In [PyGAD 2.10.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-10-0), adaptive mutation is supported. To use it, just follow these 2 simple steps:
+
+1. In the constructor of the `pygad.GA` class, set `mutation_type="adaptive"` to specify that the type of mutation is adaptive.
+2. Specify the mutation rates for the low- and high-quality solutions using one of these 3 parameters according to your preference: `mutation_probability`, `mutation_num_genes`, and `mutation_percent_genes`. Please check the [documentation of each of these parameters](https://pygad.readthedocs.io/en/latest/pygad.html#init) for more information.
+
+When adaptive mutation is used, then the value assigned to any of the 3 parameters can be of any of these data types:
+
+1. `list`
+2. `tuple`
+3. `numpy.ndarray`
+
+Whatever the data type used, the length of the `list`, `tuple`, or the `numpy.ndarray` must be exactly 2. That is, there are just 2 values:
+
+1. The first value is the mutation rate for the low-quality solutions.
+2. The second value is the mutation rate for the high-quality solutions.
+
+PyGAD expects that the first value is higher than the second value, and thus a warning is printed in case the first value is lower than the second one.
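+
+The selection rule can be pictured with a tiny sketch. This is an illustrative snippet, not PyGAD's internal code; the names `f`, `f_avg`, and `adaptive_rate` are used only for demonstration:
+
+```python
+# Illustrative sketch of the adaptive mutation rule described above.
+mutation_probability = [0.25, 0.1]  # [low-quality rate, high-quality rate]
+
+def adaptive_rate(f, f_avg):
+    # Low-quality solution (f < f_avg): mutate more aggressively.
+    if f < f_avg:
+        return mutation_probability[0]
+    # High-quality solution (f >= f_avg): mutate conservatively.
+    return mutation_probability[1]
+```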
+
+Here are some examples to feed the mutation rates:
+
+```python
+# mutation_probability
+mutation_probability = [0.25, 0.1]
+mutation_probability = (0.35, 0.17)
+mutation_probability = numpy.array([0.15, 0.05])
+
+# mutation_num_genes
+mutation_num_genes = [4, 2]
+mutation_num_genes = (3, 1)
+mutation_num_genes = numpy.array([7, 2])
+
+# mutation_percent_genes
+mutation_percent_genes = [25, 12]
+mutation_percent_genes = (15, 8)
+mutation_percent_genes = numpy.array([21, 13])
+```
+
+Assume that the average fitness is 12 and the fitness values of 2 solutions are 15 and 7. If the mutation probabilities are specified as follows:
+
+```python
+mutation_probability = [0.25, 0.1]
+```
+
+Then the mutation probability of the first solution is 0.1 because its fitness of 15 is higher than the average fitness of 12. The mutation probability of the second solution is 0.25 because its fitness of 7 is lower than the average fitness of 12.
+
+Here is an example that uses adaptive mutation.
+
+```python
+import pygad
+import numpy
+
+function_inputs = [4,-2,3.5,5,-11,-4.7] # Function inputs.
+desired_output = 44 # Function output.
+
+def fitness_func(ga_instance, solution, solution_idx):
+    # The fitness function calculates the sum of products between each input and its corresponding weight.
+    output = numpy.sum(solution*function_inputs)
+    # The value 0.000001 is used to avoid the Inf value when the denominator numpy.abs(output - desired_output) is 0.0.
+    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+    return fitness
+
+# Creating an instance of the GA class inside the ga module. Some parameters are initialized within the constructor.
+ga_instance = pygad.GA(num_generations=200,
+                       fitness_func=fitness_func,
+                       num_parents_mating=10,
+                       sol_per_pop=20,
+                       num_genes=len(function_inputs),
+                       mutation_type="adaptive",
+                       mutation_num_genes=(3, 1))
+
+# Running the GA to optimize the parameters of the function.
+ga_instance.run()
+
+ga_instance.plot_fitness(title="PyGAD with Adaptive Mutation", linewidth=5)
+```
+
+# `pygad.utils.parent_selection` Submodule
+
+The `pygad.utils.parent_selection` module has a class named `ParentSelection` with the supported parent selection operations which are:
+
+1. Steady-state: Implemented using the `steady_state_selection()` method.
+2. Roulette wheel: Implemented using the `roulette_wheel_selection()` method.
+3. Stochastic universal: Implemented using the `stochastic_universal_selection()` method.
+4. Rank: Implemented using the `rank_selection()` method.
+5. Random: Implemented using the `random_selection()` method.
+6. Tournament: Implemented using the `tournament_selection()` method.
+7. NSGA-II: Implemented using the `nsga2_selection()` method.
+8. NSGA-II Tournament: Implemented using the `tournament_selection_nsga2()` method.
+
+All parent selection methods accept these parameters:
+
+1. `fitness`: The fitness of the entire population.
+2. `num_parents`: The number of parents to select.
+
+It has the following helper method:
+
+1. `wheel_cumulative_probs()`: A helper function to calculate the wheel probabilities for these 2 methods: 1) `roulette_wheel_selection()` 2) `rank_selection()`.
+
+# `pygad.utils.nsga2` Submodule
+
+The `pygad.utils.nsga2` module has a class named `NSGA2` that implements NSGA-II. The methods inside this class are:
+
+1. `non_dominated_sorting()`: Returns all the pareto fronts by applying non-dominated sorting over the solutions.
+2. `get_non_dominated_set()`: Returns the set of non-dominated solutions from the passed solutions.
+3. `crowding_distance()`: Calculates the crowding distance for all solutions in the current pareto front.
+4. `sort_solutions_nsga2()`: Sorts the solutions. If the problem is single-objective, then the solutions are sorted by sorting the fitness values of the population. If it is multi-objective, then non-dominated sorting and crowding distance are applied to sort the solutions.
+
+# User-Defined Crossover, Mutation, and Parent Selection Operators
+
+Previously, the user could select the type of the crossover, mutation, and parent selection operators only by assigning the name of the operator to the following parameters of the `pygad.GA` class's constructor:
+
+1. `crossover_type`
+2. `mutation_type`
+3. `parent_selection_type`
+
+This way, the user can only use the built-in functions for each of these operators.
+
+Starting from [PyGAD 2.16.0](https://pygad.readthedocs.io/en/latest/releases.html#pygad-2-16-0), the user can create custom crossover, mutation, and parent selection operators and assign these functions to the above parameters. Thus, a new operator can be plugged easily into the [PyGAD Lifecycle](https://pygad.readthedocs.io/en/latest/pygad.html#life-cycle-of-pygad).
+
+This is a sample code that does not use any custom function.
+
+```python
+import pygad
+import numpy
+
+equation_inputs = [4,-2,3.5]
+desired_output = 44
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution * equation_inputs)
+    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+    return fitness
+
+ga_instance = pygad.GA(num_generations=10,
+                       sol_per_pop=5,
+                       num_parents_mating=2,
+                       num_genes=len(equation_inputs),
+                       fitness_func=fitness_func)
+
+ga_instance.run()
+ga_instance.plot_fitness()
+```
+
+This section describes the expected input parameters and outputs. For simplicity, all of these custom functions accept the instance of the `pygad.GA` class as the last parameter.
+
+## User-Defined Crossover Operator
+
+The user-defined crossover function is a Python function that accepts 3 parameters:
+
+1. The selected parents.
+2. The size of the offspring as a tuple of 2 numbers: (the offspring size, number of genes).
+3. The instance from the `pygad.GA` class. This instance helps to retrieve any property like `population`, `gene_type`, `gene_space`, etc.
+
+This function should return a NumPy array of shape equal to the value passed to the second parameter.
+
+The next code creates a template for the user-defined crossover operator. You can use any names for the parameters. Note how a NumPy array is returned.
+
+```python
+def crossover_func(parents, offspring_size, ga_instance):
+    offspring = ...
+    ...
+    return numpy.array(offspring)
+```
+
+As an example, the next code creates a single-point crossover function. By generating a random point (i.e. the index of a gene), the function simply uses 2 parents to produce an offspring by copying the genes before the point from the first parent and the remaining genes from the second parent.
+
+```python
+def crossover_func(parents, offspring_size, ga_instance):
+    offspring = []
+    idx = 0
+    while len(offspring) != offspring_size[0]:
+        # Select 2 consecutive parents, wrapping around the parents array.
+        parent1 = parents[idx % parents.shape[0], :].copy()
+        parent2 = parents[(idx + 1) % parents.shape[0], :].copy()
+
+        # Randomly pick the crossover point (a gene index).
+        random_split_point = numpy.random.choice(range(offspring_size[1]))
+
+        # Take the genes before the point from parent1 and the rest from parent2.
+        parent1[random_split_point:] = parent2[random_split_point:]
+
+        offspring.append(parent1)
+
+        idx += 1
+
+    return numpy.array(offspring)
+```
+
+To use this user-defined function, simply assign its name to the `crossover_type` parameter in the constructor of the `pygad.GA` class. The next code gives an example. In this case, the custom function will be called in each generation rather than calling the built-in crossover functions defined in PyGAD.
+
+```python
+ga_instance = pygad.GA(num_generations=10,
+                       sol_per_pop=5,
+                       num_parents_mating=2,
+                       num_genes=len(equation_inputs),
+                       fitness_func=fitness_func,
+                       crossover_type=crossover_func)
+```
+
+## User-Defined Mutation Operator
+
+A user-defined mutation function/operator can be created the same way a custom crossover operator/function is created. Simply, it is a Python function that accepts 2 parameters:
+
+1. The offspring to be mutated.
+2. The instance from the `pygad.GA` class. This instance helps to retrieve any property like `population`, `gene_type`, `gene_space`, etc.
+
+The template for the user-defined mutation function is given in the next code. According to the user preference, the function should make some random changes to the genes.
+
+```python
+def mutation_func(offspring, ga_instance):
+    ...
+    return offspring
+```
+
+The next code builds the random mutation where a single gene from each chromosome is mutated by adding a random number between 0 and 1 to the gene's value.
+
+```python
+def mutation_func(offspring, ga_instance):
+
+    for chromosome_idx in range(offspring.shape[0]):
+        # Pick a random gene in the chromosome and perturb it.
+        random_gene_idx = numpy.random.choice(range(offspring.shape[1]))
+
+        offspring[chromosome_idx, random_gene_idx] += numpy.random.random()
+
+    return offspring
+```
+
+Here is how this function is assigned to the `mutation_type` parameter.
+
+```python
+ga_instance = pygad.GA(num_generations=10,
+                       sol_per_pop=5,
+                       num_parents_mating=2,
+                       num_genes=len(equation_inputs),
+                       fitness_func=fitness_func,
+                       crossover_type=crossover_func,
+                       mutation_type=mutation_func)
+```
+
+Note that there are other things to take into consideration like:
+
+- Making sure that each gene conforms to the data type(s) listed in the `gene_type` parameter.
+- If the `gene_space` parameter is used, then the new value for the gene should conform to the values/ranges listed.
+- Mutating a number of genes that conforms to the parameters `mutation_percent_genes`, `mutation_probability`, and `mutation_num_genes`.
+- Whether mutation happens with or without replacement based on the `mutation_by_replacement` parameter.
+- The minimum and maximum values from which a random value is generated based on the `random_mutation_min_val` and `random_mutation_max_val` parameters.
+- Whether duplicates are allowed or not in the chromosome based on the `allow_duplicate_genes` parameter.
+
+and more.
+
+It all depends on your objective from building the mutation function. You may neglect or consider some of the considerations according to your objective.
+
+## User-Defined Parent Selection Operator
+
+There is not much to mention about building a user-defined parent selection function, as things are similar to building a crossover or mutation function.
+Just create a Python function that accepts 3 parameters:
+
+1. The fitness values of the current population.
+2. The number of parents needed.
+3. The instance from the `pygad.GA` class. This instance helps to retrieve any property like `population`, `gene_type`, `gene_space`, etc.
+
+The function should return 2 outputs:
+
+1. The selected parents as a NumPy array. Its shape is equal to (the number of selected parents, `num_genes`). Note that the number of selected parents is equal to the value assigned to the second input parameter.
+2. The indices of the selected parents inside the population. It is a 1D list with length equal to the number of selected parents.
+
+The outputs must be of type `numpy.ndarray`.
+
+Here is a template for building a custom parent selection function.
+
+```python
+def parent_selection_func(fitness, num_parents, ga_instance):
+    ...
+    return parents, fitness_sorted[:num_parents]
+```
+
+The next code builds the steady-state parent selection where the best parents are selected. The number of parents is equal to the value in the `num_parents` parameter.
+
+```python
+def parent_selection_func(fitness, num_parents, ga_instance):
+
+    # Sort the solution indices in descending order of fitness.
+    fitness_sorted = sorted(range(len(fitness)), key=lambda k: fitness[k])
+    fitness_sorted.reverse()
+
+    parents = numpy.empty((num_parents, ga_instance.population.shape[1]))
+
+    for parent_num in range(num_parents):
+        parents[parent_num, :] = ga_instance.population[fitness_sorted[parent_num], :].copy()
+
+    return parents, numpy.array(fitness_sorted[:num_parents])
+```
+
+Finally, the defined function is assigned to the `parent_selection_type` parameter as in the next code.
+
+```python
+ga_instance = pygad.GA(num_generations=10,
+                       sol_per_pop=5,
+                       num_parents_mating=2,
+                       num_genes=len(equation_inputs),
+                       fitness_func=fitness_func,
+                       crossover_type=crossover_func,
+                       mutation_type=mutation_func,
+                       parent_selection_type=parent_selection_func)
+```
+
+## Example
+
+Having discussed how to customize the 3 operators, the next code uses the previous 3 user-defined functions instead of the built-in ones.
+
+```python
+import pygad
+import numpy
+
+equation_inputs = [4,-2,3.5]
+desired_output = 44
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution * equation_inputs)
+
+    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+
+    return fitness
+
+def parent_selection_func(fitness, num_parents, ga_instance):
+
+    fitness_sorted = sorted(range(len(fitness)), key=lambda k: fitness[k])
+    fitness_sorted.reverse()
+
+    parents = numpy.empty((num_parents, ga_instance.population.shape[1]))
+
+    for parent_num in range(num_parents):
+        parents[parent_num, :] = ga_instance.population[fitness_sorted[parent_num], :].copy()
+
+    return parents, numpy.array(fitness_sorted[:num_parents])
+
+def crossover_func(parents, offspring_size, ga_instance):
+
+    offspring = []
+    idx = 0
+    while len(offspring) != offspring_size[0]:
+        parent1 = parents[idx % parents.shape[0], :].copy()
+        parent2 = parents[(idx + 1) % parents.shape[0], :].copy()
+
+        random_split_point = numpy.random.choice(range(offspring_size[1]))
+
+        parent1[random_split_point:] = parent2[random_split_point:]
+
+        offspring.append(parent1)
+
+        idx += 1
+
+    return numpy.array(offspring)
+
+def mutation_func(offspring, ga_instance):
+
+    for chromosome_idx in range(offspring.shape[0]):
+        # Pick a random gene index (offspring.shape[1] is the number of genes).
+        random_gene_idx = numpy.random.choice(range(offspring.shape[1]))
+
+        offspring[chromosome_idx, random_gene_idx] += numpy.random.random()
+
+    return offspring
+
+ga_instance = pygad.GA(num_generations=10,
+                       sol_per_pop=5,
+                       num_parents_mating=2,
+                       num_genes=len(equation_inputs),
+                       fitness_func=fitness_func,
+                       crossover_type=crossover_func,
+                       mutation_type=mutation_func,
+                       parent_selection_type=parent_selection_func)
+
+ga_instance.run()
+ga_instance.plot_fitness()
+```
+
+This is the same example but using methods instead of functions.
+
+```python
+import pygad
+import numpy
+
+equation_inputs = [4,-2,3.5]
+desired_output = 44
+
+class Test:
+    def fitness_func(self, ga_instance, solution, solution_idx):
+        output = numpy.sum(solution * equation_inputs)
+
+        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+
+        return fitness
+
+    def parent_selection_func(self, fitness, num_parents, ga_instance):
+
+        fitness_sorted = sorted(range(len(fitness)), key=lambda k: fitness[k])
+        fitness_sorted.reverse()
+
+        parents = numpy.empty((num_parents, ga_instance.population.shape[1]))
+
+        for parent_num in range(num_parents):
+            parents[parent_num, :] = ga_instance.population[fitness_sorted[parent_num], :].copy()
+
+        return parents, numpy.array(fitness_sorted[:num_parents])
+
+    def crossover_func(self, parents, offspring_size, ga_instance):
+
+        offspring = []
+        idx = 0
+        while len(offspring) != offspring_size[0]:
+            parent1 = parents[idx % parents.shape[0], :].copy()
+            parent2 = parents[(idx + 1) % parents.shape[0], :].copy()
+
+            # offspring_size[1] is the number of genes.
+            random_split_point = numpy.random.choice(range(offspring_size[1]))
+
+            parent1[random_split_point:] = parent2[random_split_point:]
+
+            offspring.append(parent1)
+
+            idx += 1
+
+        return numpy.array(offspring)
+
+    def mutation_func(self, offspring, ga_instance):
+
+        for chromosome_idx in range(offspring.shape[0]):
+            random_gene_idx = numpy.random.choice(range(offspring.shape[1]))
+
+            offspring[chromosome_idx, random_gene_idx] += numpy.random.random()
+
+        return offspring
+
+ga_instance = pygad.GA(num_generations=10,
+                       sol_per_pop=5,
+                       num_parents_mating=2,
+                       num_genes=len(equation_inputs),
+                       fitness_func=Test().fitness_func,
+                       parent_selection_type=Test().parent_selection_func,
+                       crossover_type=Test().crossover_func,
+                       mutation_type=Test().mutation_func)
+
+ga_instance.run()
+ga_instance.plot_fitness()
+```
+
diff --git a/docs/md/visualize.md b/docs/md/visualize.md
new file mode 100644
index 0000000..74b0934
--- /dev/null
+++ b/docs/md/visualize.md
@@ -0,0 +1,317 @@
+# `pygad.visualize` Module
+
+This section of the PyGAD's library documentation discusses the **pygad.visualize** module. It offers the methods for results visualization in PyGAD.
+
+This section discusses the different options to visualize the results in PyGAD through these methods:
+
+1. `plot_fitness()`: Creates plots for the fitness.
+2. `plot_genes()`: Creates plots for the genes.
+3. `plot_new_solution_rate()`: Creates plots for the new solution rate.
+4. `plot_pareto_front_curve()`: Creates plots for the Pareto front for multi-objective problems.
+
+In the following code, the `save_solutions` flag is set to `True` which means all solutions are saved in the `solutions` attribute. The code runs for only 10 generations.
+
+```python
+import pygad
+import numpy
+
+equation_inputs = [4, -2, 3.5, 8, -2, 3.5, 8]
+desired_output = 2671.1234
+
+def fitness_func(ga_instance, solution, solution_idx):
+    output = numpy.sum(solution * equation_inputs)
+    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+    return fitness
+
+ga_instance = pygad.GA(num_generations=10,
+                       sol_per_pop=10,
+                       num_parents_mating=5,
+                       num_genes=len(equation_inputs),
+                       fitness_func=fitness_func,
+                       gene_space=[range(1, 10), range(10, 20), range(15, 30), range(20, 40), range(25, 50), range(10, 30), range(20, 50)],
+                       gene_type=int,
+                       save_solutions=True)
+
+ga_instance.run()
+```
+
+Let's explore how to visualize the results using the above-mentioned methods.
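+
+Before walking through each method, here is a minimal usage sketch built on the `ga_instance` created above. It is only an illustration: the `save_dir` value `"."` is an assumed output directory (the parameter itself is documented below), and `plot_genes()` relies on `save_solutions=True` being set in the constructor above.
+
+```python
+# Show the fitness plot and also save it; "." is an illustrative directory.
+ga_instance.plot_fitness(save_dir=".")
+
+# The new solution rate and gene plots work the same way. plot_genes()
+# with the default solutions="all" requires save_solutions=True, as set above.
+ga_instance.plot_new_solution_rate()
+ga_instance.plot_genes()
+```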
+
+# Fitness
+
+## `plot_fitness()`
+
+The `plot_fitness()` method shows the fitness value for each generation. It creates, shows, and returns a figure that summarizes how the fitness value(s) evolve(s) by generation.
+
+It works only after completing at least 1 generation; otherwise, an exception is raised.
+
+This method accepts the following parameters:
+
+1. `title`: Title of the figure.
+2. `xlabel`: X-axis label.
+3. `ylabel`: Y-axis label.
+4. `linewidth`: Line width of the plot. Defaults to `3`.
+5. `font_size`: Font size for the labels and title. Defaults to `14`.
+6. `plot_type`: Type of the plot which can be either `"plot"` (default), `"scatter"`, or `"bar"`.
+7. `color`: Color of the plot which defaults to the greenish color `"#64f20c"`.
+8. `label`: The label used for the legend in the figures of multi-objective problems. It is not used for single-objective problems. It defaults to `None`, which means no label is used.
+9. `save_dir`: Directory to save the figure.
+
+### `plot_type="plot"`
+
+The simplest way to call this method is as follows, leaving `plot_type` at its default value `"plot"` to create a continuous line connecting the fitness values across all generations:
+
+```python
+ga_instance.plot_fitness()
+# ga_instance.plot_fitness(plot_type="plot")
+```
+
+![plot_fitness_plot](https://user-images.githubusercontent.com/16560492/122472609-d02f5280-cf8e-11eb-88a7-f9366ff6e7c6.png)
+
+### `plot_type="scatter"`
+
+The `plot_type` can also be set to `"scatter"` to create a scatter graph with each individual fitness value represented as a dot. The size of these dots can be changed using the `linewidth` parameter.
+
+```python
+ga_instance.plot_fitness(plot_type="scatter")
+```
+
+![plot_fitness_scatter](https://user-images.githubusercontent.com/16560492/122473159-75e2c180-cf8f-11eb-942d-31279b286dbd.png)
+
+### `plot_type="bar"`
+
+The third value for the `plot_type` parameter is `"bar"`, which creates a bar graph with each individual fitness value represented as a bar.
+
+```python
+ga_instance.plot_fitness(plot_type="bar")
+```
+
+![plot_fitness_bar](https://user-images.githubusercontent.com/16560492/122473340-b7736c80-cf8f-11eb-89c5-4f7db3b653cc.png)
+
+# New Solution Rate
+
+## `plot_new_solution_rate()`
+
+The `plot_new_solution_rate()` method presents the number of new solutions explored in each generation. This helps to figure out whether the genetic algorithm is still exploring new solutions, an indication that further evolution is possible. If no new solutions are explored, this is an indication that no further evolution is possible.
+
+It works only after completing at least 1 generation; otherwise, an exception is raised.
+
+The `plot_new_solution_rate()` method accepts the same parameters as the `plot_fitness()` method (it also has 3 possible values for the `plot_type` parameter). Here are all the parameters it accepts:
+
+1. `title`: Title of the figure.
+2. `xlabel`: X-axis label.
+3. `ylabel`: Y-axis label.
+4. `linewidth`: Line width of the plot. Defaults to `3`.
+5. `font_size`: Font size for the labels and title. Defaults to `14`.
+6. `plot_type`: Type of the plot which can be either `"plot"` (default), `"scatter"`, or `"bar"`.
+7. `color`: Color of the plot which defaults to `"#3870FF"`.
+8. `save_dir`: Directory to save the figure.
+
+### `plot_type="plot"`
+
+The default value for the `plot_type` parameter is `"plot"`.
+
+```python
+ga_instance.plot_new_solution_rate()
+# ga_instance.plot_new_solution_rate(plot_type="plot")
+```
+
+The next figure shows that, for example, generation 6 has the fewest new solutions, which is 4. The number of new solutions in the first generation is always equal to the number of solutions in the population (i.e. the value assigned to the `sol_per_pop` parameter in the constructor of the `pygad.GA` class), which is 10 in this example.
+
+![plot_new_solution_rate_plot](https://user-images.githubusercontent.com/16560492/122475815-3322e880-cf93-11eb-9648-bf66f823234b.png)
+
+### `plot_type="scatter"`
+
+The previous graph can be represented as scattered points by setting `plot_type="scatter"`.
+
+```python
+ga_instance.plot_new_solution_rate(plot_type="scatter")
+```
+
+![plot_new_solution_rate_scatter](https://user-images.githubusercontent.com/16560492/122476108-adec0380-cf93-11eb-80ac-7588bf90492f.png)
+
+### `plot_type="bar"`
+
+By setting `plot_type="bar"`, each value is represented as a vertical bar.
+
+```python
+ga_instance.plot_new_solution_rate(plot_type="bar")
+```
+
+![plot_new_solution_rate_bar](https://user-images.githubusercontent.com/16560492/122476173-c2c89700-cf93-11eb-9e77-d39737cd3a96.png)
+
+# Genes
+
+## `plot_genes()`
+
+The `plot_genes()` method is the third option to visualize the PyGAD results. It creates, shows, and returns a figure that describes each gene. It has different options for creating the figures, which help to:
+
+1. Explore the gene value for each generation by creating a normal plot.
+2. Create a histogram for each gene.
+3. Create a boxplot.
+
+It works only after completing at least 1 generation; otherwise, an exception is raised.
+
+This method accepts the following parameters:
+
+1. `title`: Title of the figure.
+2. `xlabel`: X-axis label.
+3. `ylabel`: Y-axis label.
+4. `linewidth`: Line width of the plot. Defaults to `3`.
+5. `font_size`: Font size for the labels and title. Defaults to `14`.
+6. `plot_type`: Type of the plot which can be either `"plot"` (default), `"scatter"`, or `"bar"`.
+7. `graph_type`: Type of the graph which can be either `"plot"` (default), `"boxplot"`, or `"histogram"`.
+8. `fill_color`: Fill color of the graph which defaults to `"#3870FF"`. This has no effect if `graph_type="plot"`.
+9. `color`: Color of the plot which defaults to `"#3870FF"`.
+10. `solutions`: Defaults to `"all"`, which means use all solutions. If `"best"`, then only the best solutions are used.
+11. `save_dir`: Directory to save the figure.
+
+This method has 3 control variables:
+
+1. `graph_type="plot"`: Can be `"plot"` (default), `"boxplot"`, or `"histogram"`.
+2. `plot_type="plot"`: Identical to the `plot_type` parameter explored in the `plot_fitness()` and `plot_new_solution_rate()` methods.
+3. `solutions="all"`: Can be `"all"` (default) or `"best"`.
+
+These 3 parameters control the style of the output figure.
+
+The `graph_type` parameter selects the type of the graph, which helps to explore the gene values as:
+
+1. A normal plot.
+2. A histogram.
+3. A box and whisker plot.
+
+The `plot_type` parameter works only when the type of the graph is set to `"plot"`.
+
+The `solutions` parameter selects whether the genes come from all solutions in the population or from just the best solutions.
+
+An exception is raised if:
+
+* `solutions="all"` while `save_solutions=False` in the constructor of the `pygad.GA` class.
+* `solutions="best"` while `save_best_solutions=False` in the constructor of the `pygad.GA` class.
+
+### `graph_type="plot"`
+
+When `graph_type="plot"`, the figure shows a normal graph where the relationship between the gene values and the generation numbers is represented as a continuous plot, scattered points, or bars.
+
+#### `plot_type="plot"`
+
+Because the default value for both `graph_type` and `plot_type` is `"plot"`, all of the calls below create the same figure. This figure is helpful to know whether a gene value lasts for more generations, an indication of the best value for this gene. For example, the value 16 for the gene with index 5 (at column 2 and row 2 of the next graph) lasted for 83 generations.
+
+```python
+ga_instance.plot_genes()
+
+ga_instance.plot_genes(graph_type="plot")
+
+ga_instance.plot_genes(plot_type="plot")
+
+ga_instance.plot_genes(graph_type="plot",
+                       plot_type="plot")
+```
+
+![plot_genes_plot](https://user-images.githubusercontent.com/16560492/122477158-4a62d580-cf95-11eb-8c93-9b6e74cb814c.png)
+
+As the default value for the `solutions` parameter is `"all"`, the following method calls generate the same plot.
+
+```python
+ga_instance.plot_genes(solutions="all")
+
+ga_instance.plot_genes(graph_type="plot",
+                       solutions="all")
+
+ga_instance.plot_genes(plot_type="plot",
+                       solutions="all")
+
+ga_instance.plot_genes(graph_type="plot",
+                       plot_type="plot",
+                       solutions="all")
+```
+
+#### `plot_type="scatter"`
+
+The following calls of the `plot_genes()` method create the same scatter plot.
+
+```python
+ga_instance.plot_genes(plot_type="scatter")
+
+ga_instance.plot_genes(graph_type="plot",
+                       plot_type="scatter",
+                       solutions='all')
+```
+
+![plot_genes_scatter](https://user-images.githubusercontent.com/16560492/122477273-73836600-cf95-11eb-828f-f357c7b0f815.png)
+
+#### `plot_type="bar"`
+
+```python
+ga_instance.plot_genes(plot_type="bar")
+
+ga_instance.plot_genes(graph_type="plot",
+                       plot_type="bar",
+                       solutions='all')
+```
+
+![plot_genes_bar](https://user-images.githubusercontent.com/16560492/122477370-99106f80-cf95-11eb-8643-865b55e6b844.png)
+
+### `graph_type="boxplot"`
+
+By setting `graph_type` to `"boxplot"`, a box and whisker graph is created. In this case, the `plot_type` parameter has no effect.
+
+The following 2 calls of the `plot_genes()` method create the same figure because the default value for the `solutions` parameter is `"all"`.
+
+```python
+ga_instance.plot_genes(graph_type="boxplot")
+
+ga_instance.plot_genes(graph_type="boxplot",
+                       solutions='all')
+```
+
+![plot_genes_boxplot](https://user-images.githubusercontent.com/16560492/122479260-beeb4380-cf98-11eb-8f08-23707929b12c.png)
+
+### `graph_type="histogram"`
+
+For `graph_type="histogram"`, a histogram is created for each gene. Similar to `graph_type="boxplot"`, the `plot_type` parameter has no effect here.
+
+The following 2 calls of the `plot_genes()` method create the same figure because the default value for the `solutions` parameter is `"all"`.
+
+```python
+ga_instance.plot_genes(graph_type="histogram")
+
+ga_instance.plot_genes(graph_type="histogram",
+                       solutions='all')
+```
+
+![plot_genes_histogram](https://user-images.githubusercontent.com/16560492/122477314-8007be80-cf95-11eb-9c95-da3f49204151.png)
+
+All the previous figures can be created for only the best solutions by setting `solutions="best"`.
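+
+As a minimal sketch (not part of the original example), this requires that the `pygad.GA` instance was created with `save_best_solutions=True`:
+
+```python
+# Assumes the GA instance was created with save_best_solutions=True;
+# otherwise, plot_genes(solutions="best") raises an exception.
+ga_instance.plot_genes(graph_type="histogram",
+                       solutions="best")
+```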
+
+# Pareto Front
+
+## `plot_pareto_front_curve()`
+
+The `plot_pareto_front_curve()` method creates the Pareto front curve for multi-objective optimization problems. It creates, shows, and returns a figure that shows the Pareto front curve and points representing the fitness. It works only when 2 objectives are used.
+
+It works only after completing at least 1 generation; otherwise, an exception is raised.
+
+This method accepts the following parameters:
+
+1. `title`: Title of the figure.
+2. `xlabel`: X-axis label.
+3. `ylabel`: Y-axis label.
+4. `linewidth`: Line width of the plot. Defaults to `3`.
+5. `font_size`: Font size for the labels and title. Defaults to `14`.
+6. `label`: The label used for the legend.
+7. `color`: Color of the plot which defaults to the tomato red color `#FF6347`.
+8. `color_fitness`: Color of the fitness points which defaults to the royal blue color `#4169E1`.
+9. `grid`: Either `True` or `False` to control the visibility of the grid.
+10. `alpha`: The transparency of the Pareto front curve.
+11. `marker`: The marker of the fitness points.
+12. `save_dir`: Directory to save the figure.
+
+This is an example of calling the `plot_pareto_front_curve()` method.
+
+```python
+ga_instance.plot_pareto_front_curve()
+```
+
+![plot_pareto_front_curve](https://github.com/user-attachments/assets/606d853c-7370-41a0-8ddb-857a4c6c7fb9)
+
diff --git a/docs/source/cnn.rst b/docs/source/cnn.rst
index ce2bfe8..eabe5a1 100644
--- a/docs/source/cnn.rst
+++ b/docs/source/cnn.rst
@@ -107,40 +107,40 @@ Using the ``pygad.cnn.Conv2D`` class, convolution (conv) layers can be created.
To create a convolution layer, just create a new instance of the class.
The constructor accepts the following parameters:

-- ``num_filters``: Number of filters.
+- ``num_filters``: Number of filters.

-- ``kernel_size``: Filter kernel size.
+- ``kernel_size``: Filter kernel size.

-- ``previous_layer``: A reference to the previous layer. Using the
-  ``previous_layer`` attribute, a linked list is created that connects
-  all network layers. For more information about this attribute, please
-  check the **previous_layer** attribute section of the ``pygad.nn``
-  module documentation.
+- ``previous_layer``: A reference to the previous layer. Using the
+  ``previous_layer`` attribute, a linked list is created that connects
+  all network layers. For more information about this attribute, please
+  check the **previous_layer** attribute section of the ``pygad.nn``
+  module documentation.

-- ``activation_function=None``: A string representing the activation
-  function to be used in this layer. Defaults to ``None`` which means
-  no activation function is applied while applying the convolution
-  layer. An activation layer can be added separately in this case. The
-  supported activation functions in the conv layer are ``relu`` and
-  ``sigmoid``.
+- ``activation_function=None``: A string representing the activation
+  function to be used in this layer. Defaults to ``None`` which means no
+  activation function is applied while applying the convolution layer.
+  An activation layer can be added separately in this case. The
+  supported activation functions in the conv layer are ``relu`` and
+  ``sigmoid``.

Within the constructor, the accepted parameters are used as instance
attributes. Besides the parameters, some new instance attributes are
created which are:

-- ``filter_bank_size``: Size of the filter bank in this layer.
-- ``initial_weights``: The initial weights for the conv layer. +- ``initial_weights``: The initial weights for the conv layer. -- ``trained_weights``: The trained weights of the conv layer. This - attribute is initialized by the value in the ``initial_weights`` - attribute. +- ``trained_weights``: The trained weights of the conv layer. This + attribute is initialized by the value in the ``initial_weights`` + attribute. -- ``layer_input_size`` +- ``layer_input_size`` -- ``layer_output_size`` +- ``layer_output_size`` -- ``layer_output`` +- ``layer_output`` Here is an example for creating a conv layer with 2 filters and a kernel size of 3. Note that the ``previous_layer`` parameter is assigned to the @@ -215,22 +215,22 @@ The ``pygad.cnn.MaxPooling2D`` class builds a max pooling layer for the CNN architecture. The constructor of this class accepts the following parameter: -- ``pool_size``: Size of the window. +- ``pool_size``: Size of the window. -- ``previous_layer``: A reference to the previous layer in the CNN - architecture. +- ``previous_layer``: A reference to the previous layer in the CNN + architecture. -- ``stride=2``: A stride that default to 2. +- ``stride=2``: A stride that default to 2. Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, some new instance attributes are created which are: -- ``layer_input_size`` +- ``layer_input_size`` -- ``layer_output_size`` +- ``layer_output_size`` -- ``layer_output`` +- ``layer_output`` .. _pygadcnnaveragepooling2d-class: @@ -252,13 +252,13 @@ constructor accepts only the ``previous_layer`` parameter. The following instance attributes exist: -- ``previous_layer`` +- ``previous_layer`` -- ``layer_input_size`` +- ``layer_input_size`` -- ``layer_output_size`` +- ``layer_output_size`` -- ``layer_output`` +- ``layer_output`` .. _pygadcnnrelu-class: @@ -272,13 +272,13 @@ The constructor accepts only the ``previous_layer`` parameter. The following instance attributes exist: -- ``previous_layer`` +- ``previous_layer`` -- ``layer_input_size`` +- ``layer_input_size`` -- ``layer_output_size`` +- ``layer_output_size`` -- ``layer_output`` +- ``layer_output`` .. _pygadcnnsigmoid-class: @@ -297,30 +297,30 @@ function. The ``pygad.cnn.Dense`` class implement the dense layer. Its constructor accepts the following parameters: -- ``num_neurons``: Number of neurons in the dense layer. +- ``num_neurons``: Number of neurons in the dense layer. -- ``previous_layer``: A reference to the previous layer. +- ``previous_layer``: A reference to the previous layer. -- ``activation_function``: A string representing the activation - function to be used in this layer. Defaults to ``"sigmoid"``. - Currently, the supported activation functions in the dense layer are - ``"sigmoid"``, ``"relu"``, and ``softmax``. +- ``activation_function``: A string representing the activation function + to be used in this layer. Defaults to ``"sigmoid"``. Currently, the + supported activation functions in the dense layer are ``"sigmoid"``, + ``"relu"``, and ``softmax``. Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, some new instance attributes are created which are: -- ``initial_weights``: The initial weights for the dense layer. +- ``initial_weights``: The initial weights for the dense layer. -- ``trained_weights``: The trained weights of the dense layer. This - attribute is initialized by the value in the ``initial_weights`` - attribute. 
+- ``trained_weights``: The trained weights of the dense layer. This + attribute is initialized by the value in the ``initial_weights`` + attribute. -- ``layer_input_size`` +- ``layer_input_size`` -- ``layer_output_size`` +- ``layer_output_size`` -- ``layer_output`` +- ``layer_output`` .. _pygadcnnmodel-class: @@ -330,12 +330,12 @@ created which are: An instance of the ``pygad.cnn.Model`` class represents a CNN model. The constructor of this class accepts the following parameters: -- ``last_layer``: A reference to the last layer in the CNN architecture - (i.e. dense layer). +- ``last_layer``: A reference to the last layer in the CNN architecture + (i.e. dense layer). -- ``epochs=10``: Number of epochs. +- ``epochs=10``: Number of epochs. -- ``learning_rate=0.01``: Learning rate. +- ``learning_rate=0.01``: Learning rate. Within the constructor, the accepted parameters are used as instance attributes. Besides the parameters, a new instance attribute named @@ -361,9 +361,9 @@ Trains the CNN model. Accepts the following parameters: -- ``train_inputs``: Training data inputs. +- ``train_inputs``: Training data inputs. -- ``train_outputs``: Training data outputs. +- ``train_outputs``: Training data outputs. This method trains the CNN model according to the number of epochs specified in the constructor of the ``pygad.cnn.Model`` class. @@ -397,7 +397,7 @@ Uses the trained CNN for making predictions. Accepts the following parameter: -- ``data_inputs``: The inputs to predict their label. +- ``data_inputs``: The inputs to predict their label. It returns a list holding the samples predictions. @@ -425,19 +425,19 @@ Steps to Build a Neural Network This section discusses how to use the ``pygad.cnn`` module for building a neural network. The summary of the steps are as follows: -- Reading the Data +- Reading the Data -- Building the CNN Architecture +- Building the CNN Architecture -- Building Model +- Building Model -- Model Summary +- Model Summary -- Training the CNN +- Training the CNN -- Making Predictions +- Making Predictions -- Calculating Some Statistics +- Calculating Some Statistics Reading the Data ---------------- diff --git a/docs/source/conf.py b/docs/source/conf.py index 3ae40c9..3055478 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -22,7 +22,7 @@ author = 'Ahmed Fawzy Gad' # The full version, including alpha/beta/rc tags -release = '3.4.0' +release = '3.5.0' master_doc = 'index' diff --git a/docs/source/helper.rst b/docs/source/helper.rst index dddfaac..dd4a7ed 100644 --- a/docs/source/helper.rst +++ b/docs/source/helper.rst @@ -1,29 +1,114 @@ -.. _pygadhelper-module: +.. _`pygadhelper`-module: ``pygad.helper`` Module ======================= This section of the PyGAD's library documentation discusses the -**pygad.helper** module. +``pygad.helper`` module. -Yet, this module has a submodule called ``unique`` that has a class -named ``Unique`` with the following helper methods. Such methods help to -check and fix duplicate values in the genes of a solution. +The ``pygad.helper`` module has 2 submodules: -- ``solve_duplicate_genes_randomly()``: Solves the duplicates in a - solution by randomly selecting new values for the duplicating genes. +1. ``pygad.helper.unique``: A module of methods for creating unique + genes. -- ``solve_duplicate_genes_by_space()``: Solves the duplicates in a - solution by selecting values for the duplicating genes from the gene - space +2. ``pygad.helper.misc``: A module of miscellaneous helper methods. 
-- ``unique_int_gene_from_range()``: Finds a unique integer value for
-   the gene.
+.. _pygadhelperunique-module:

-- ``unique_genes_by_space()``: Loops through all the duplicating genes
-   to find unique values that from their gene spaces to solve the
-   duplicates. For each duplicating gene, a call to the
-   ``unique_gene_by_space()`` is made.
+``pygad.helper.unique`` Module
+------------------------------

-- ``unique_gene_by_space()``: Returns a unique gene value for a single
-   gene based on its value space to solve the duplicates.
+The ``pygad.helper.unique`` module has a class named ``Unique`` with the
+following helper methods. Such methods help to check and fix duplicate
+values in the genes of a solution.
+
+1. ``solve_duplicate_genes_randomly()``: Solves the duplicates in a
+   solution by randomly selecting new values for the duplicating genes.
+
+2. ``solve_duplicate_genes_by_space()``: Solves the duplicates in a
+   solution by selecting values for the duplicating genes from the gene
+   space.
+
+3. ``unique_int_gene_from_range()``: Finds a unique integer value for
+   the gene out of a range defined by start and end points.
+
+4. ``unique_float_gene_from_range()``: Finds a unique float value for
+   the gene out of a range defined by start and end points.
+
+5. ``select_unique_value()``: Selects a unique value (if possible) from
+   a list of gene values.
+
+6. ``unique_genes_by_space()``: Loops through all the duplicating genes
+   to find unique values from their gene spaces to solve the
+   duplicates. For each duplicating gene, a call to the
+   ``unique_gene_by_space()`` is made.
+
+7. ``unique_gene_by_space()``: Returns a unique gene value for a single
+   gene based on its value space to solve the duplicates.
+
+8. ``find_two_duplicates()``: Identifies the first occurrence of a
+   duplicate gene in the solution.
+
+9. ``unpack_gene_space()``: Unpacks the gene space for selecting a
+   value to resolve duplicates by converting ranges into lists of
+   values.
+
+10. ``solve_duplicates_deeply()``: Sometimes it is impossible to solve
+    the duplicate genes by simply randomly selecting another value for
+    either gene. This method solves the duplicates between 2 genes by
+    searching for a third gene that can assist in the solution.
+
+.. _pygadhelpermisc-module:
+
+``pygad.helper.misc`` Module
+----------------------------
+
+The ``pygad.helper.misc`` module has a class called ``Helper`` with some
+methods to help in different stages of the GA pipeline. It was
+introduced in `PyGAD
+3.5.0 `__.
+
+1. ``change_population_dtype_and_round()``: For each gene in the
+   population, rounds the gene value and changes the data type.
+
+2. ``change_gene_dtype_and_round()``: Rounds the value and changes the
+   data type of a single gene.
+
+3. ``mutation_change_gene_dtype_and_round()``: Decides whether mutation
+   is done by replacement or not. Then it rounds and changes the data
+   type of the new gene value.
+
+4. ``validate_gene_constraint_callable_output()``: Validates the output
+   of the user-defined callable/function that checks whether the gene
+   constraint defined in the ``gene_constraint`` parameter is satisfied
+   or not.
+
+5. ``get_gene_dtype()``: Returns the gene data type from the
+   ``gene_type`` instance attribute.
+
+6. ``get_random_mutation_range()``: Returns the random mutation range
+   using the ``random_mutation_min_val`` and
+   ``random_mutation_max_val`` instance attributes.
+
+7. ``get_initial_population_range()``: Returns the initial population
+   values range using the ``init_range_low`` and ``init_range_high``
+   instance attributes.
+
+8. ``generate_gene_value_from_space()``: Generates/selects a value for
+   a gene using the ``gene_space`` instance attribute.
+
+9. ``generate_gene_value_randomly()``: Generates a random value for the
+   gene. Only used if ``gene_space`` is ``None``.
+
+10. ``generate_gene_value()``: Generates a value for the gene. It checks
+    whether ``gene_space`` is ``None`` and calls either
+    ``generate_gene_value_randomly()`` or
+    ``generate_gene_value_from_space()``.
+
+11. ``filter_gene_values_by_constraint()``: Receives a list of values
+    for a gene. Then it filters such values using the gene constraint.
+
+12. ``get_valid_gene_constraint_values()``: Selects one valid gene value
+    that satisfies the gene constraint. It simply calls
+    ``generate_gene_value()`` to generate some gene values and then
+    filters such values using ``filter_gene_values_by_constraint()``.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 9b2513d..b90ee31 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -15,6 +15,11 @@ open-source Python library for building the genetic algorithm and
optimizing machine learning algorithms. It works with
`Keras `__ and
`PyTorch `__.

+   Try the `Optimization Gadget `__, a free
+   cloud-based tool powered by PyGAD. It simplifies optimization by
+   reducing or eliminating the need for coding while providing
+   insightful visualizations.
+
`PyGAD `__ supports different types of
crossover, mutation, and parent selection operators. `PyGAD
`__ allows
diff --git a/docs/source/pygad.rst b/docs/source/pygad.rst
index df84336..7160c2a 100644
--- a/docs/source/pygad.rst
+++ b/docs/source/pygad.rst
@@ -298,6 +298,26 @@ The ``pygad.GA`` class constructor supports the following parameters:
   from the start to the end of the range specified by the 2 existing
   keys ``"low"`` and ``"high"``.

+- ``gene_constraint=None``: A list of callables (i.e. functions) acting
+  as constraints for the gene values. Before selecting a value for a
+  gene, the callable is called to ensure the candidate value is valid.
+  Added in `PyGAD
+  3.5.0 `__.
+  Check the `Gene
+  Constraint `__
+  section for more information.
+
+- ``sample_size=100``: In some cases where a gene value is to be
+  selected, this variable defines the size of the sample from which a
+  value is selected randomly. Useful if either ``allow_duplicate_genes``
+  or ``gene_constraint`` is used. If PyGAD fails to find a unique value
+  or a value that meets a gene constraint, it is recommended to
+  increase this parameter's value. Added in `PyGAD
+  3.5.0 `__.
+  Check the `sample_size
+  Parameter `__
+  section for more information.
+
 - ``on_start=None``: Accepts a function/method to be called only once
   before the genetic algorithm starts its evolution. If function, then
   it must accept a single parameter representing the instance of the
@@ -616,6 +636,13 @@ Other Methods
 4. ``run_update_population()``: Update the ``population`` attribute
    after completing the processes of crossover and mutation.

+There are many methods that are not designed for direct user usage.
+Some of them are listed above, but this is not a comprehensive list.
+The `release
+history `__
+section usually covers them. Moreover, you can check the `PyGAD GitHub
+repository `__ to
+find more.
+
 The next sections discuss the methods available in the ``pygad.GA``
 class.
diff --git a/docs/source/pygad_more.rst b/docs/source/pygad_more.rst index a992b1e..444f702 100644 --- a/docs/source/pygad_more.rst +++ b/docs/source/pygad_more.rst @@ -1,2454 +1,2636 @@ -More About PyGAD -================ - -Multi-Objective Optimization -============================ - -In `PyGAD -3.2.0 `__, -the library supports multi-objective optimization using the -non-dominated sorting genetic algorithm II (NSGA-II). The code is -exactly similar to the regular code used for single-objective -optimization except for 1 difference. It is the return value of the -fitness function. - -In single-objective optimization, the fitness function returns a single -numeric value. In this example, the variable ``fitness`` is expected to -be a numeric value. - -.. code:: python - - def fitness_func(ga_instance, solution, solution_idx): - ... - return fitness - -But in multi-objective optimization, the fitness function returns any of -these data types: - -1. ``list`` - -2. ``tuple`` - -3. ``numpy.ndarray`` - -.. code:: python - - def fitness_func(ga_instance, solution, solution_idx): - ... - return [fitness1, fitness2, ..., fitnessN] - -Whenever the fitness function returns an iterable of these data types, -then the problem is considered multi-objective. This holds even if there -is a single element in the returned iterable. - -Other than the fitness function, everything else could be the same in -both single and multi-objective problems. - -But it is recommended to use one of these 2 parent selection operators -to solve multi-objective problems: - -1. ``nsga2``: This selects the parents based on non-dominated sorting - and crowding distance. - -2. ``tournament_nsga2``: This selects the parents using tournament - selection which uses non-dominated sorting and crowding distance to - rank the solutions. - -This is a multi-objective optimization example that optimizes these 2 -linear functions: - -1. ``y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + 6wx6`` - -2. ``y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + 6wx12`` - -Where: - -1. ``(x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7)`` and ``y=50`` - -2. ``(x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5)`` and ``y=30`` - -The 2 functions use the same parameters (weights) ``w1`` to ``w6``. - -The goal is to use PyGAD to find the optimal values for such weights -that satisfy the 2 functions ``y1`` and ``y2``. - -.. code:: python - - import pygad - import numpy - - """ - Given these 2 functions: - y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + 6wx6 - y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + 6wx12 - where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y=50 - and (x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5) and y=30 - What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize these 2 functions. - This is a multi-objective optimization problem. - - PyGAD considers the problem as multi-objective if the fitness function returns: - 1) List. - 2) Or tuple. - 3) Or numpy.ndarray. - """ - - function_inputs1 = [4,-2,3.5,5,-11,-4.7] # Function 1 inputs. - function_inputs2 = [-2,0.7,-9,1.4,3,5] # Function 2 inputs. - desired_output1 = 50 # Function 1 output. - desired_output2 = 30 # Function 2 output. 
- - def fitness_func(ga_instance, solution, solution_idx): - output1 = numpy.sum(solution*function_inputs1) - output2 = numpy.sum(solution*function_inputs2) - fitness1 = 1.0 / (numpy.abs(output1 - desired_output1) + 0.000001) - fitness2 = 1.0 / (numpy.abs(output2 - desired_output2) + 0.000001) - return [fitness1, fitness2] - - num_generations = 100 - num_parents_mating = 10 - - sol_per_pop = 20 - num_genes = len(function_inputs1) - - ga_instance = pygad.GA(num_generations=num_generations, - num_parents_mating=num_parents_mating, - sol_per_pop=sol_per_pop, - num_genes=num_genes, - fitness_func=fitness_func, - parent_selection_type='nsga2') - - ga_instance.run() - - ga_instance.plot_fitness(label=['Obj 1', 'Obj 2']) - - solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness) - print(f"Parameters of the best solution : {solution}") - print(f"Fitness value of the best solution = {solution_fitness}") - - prediction = numpy.sum(numpy.array(function_inputs1)*solution) - print(f"Predicted output 1 based on the best solution : {prediction}") - prediction = numpy.sum(numpy.array(function_inputs2)*solution) - print(f"Predicted output 2 based on the best solution : {prediction}") - -This is the result of the print statements. The predicted outputs are -close to the desired outputs. - -.. code:: - - Parameters of the best solution : [ 0.79676439 -2.98823386 -4.12677662 5.70539445 -2.02797016 -1.07243922] - Fitness value of the best solution = [ 1.68090829 349.8591915 ] - Predicted output 1 based on the best solution : 50.59491545442283 - Predicted output 2 based on the best solution : 29.99714270722312 - -This is the figure created by the ``plot_fitness()`` method. The fitness -of the first objective has the green color. The blue color is used for -the second objective fitness. - -|image1| - -.. _limit-the-gene-value-range-using-the-genespace-parameter: - -Limit the Gene Value Range using the ``gene_space`` Parameter -============================================================= - -In `PyGAD -2.11.0 `__, -the ``gene_space`` parameter supported a new feature to allow -customizing the range of accepted values for each gene. Let's take a -quick review of the ``gene_space`` parameter to build over it. - -The ``gene_space`` parameter allows the user to feed the space of values -of each gene. This way the accepted values for each gene is retracted to -the user-defined values. Assume there is a problem that has 3 genes -where each gene has different set of values as follows: - -1. Gene 1: ``[0.4, 12, -5, 21.2]`` - -2. Gene 2: ``[-2, 0.3]`` - -3. Gene 3: ``[1.2, 63.2, 7.4]`` - -Then, the ``gene_space`` for this problem is as given below. Note that -the order is very important. - -.. code:: python - - gene_space = [[0.4, 12, -5, 21.2], - [-2, 0.3], - [1.2, 63.2, 7.4]] - -In case all genes share the same set of values, then simply feed a -single list to the ``gene_space`` parameter as follows. In this case, -all genes can only take values from this list of 6 values. - -.. code:: python - - gene_space = [33, 7, 0.5, 95. 6.3, 0.74] - -The previous example restricts the gene values to just a set of fixed -number of discrete values. In case you want to use a range of discrete -values to the gene, then you can use the ``range()`` function. For -example, ``range(1, 7)`` means the set of allowed values for the gene -are ``1, 2, 3, 4, 5, and 6``. You can also use the ``numpy.arange()`` or -``numpy.linspace()`` functions for the same purpose. 
- -The previous discussion only works with a range of discrete values not -continuous values. In `PyGAD -2.11.0 `__, -the ``gene_space`` parameter can be assigned a dictionary that allows -the gene to have values from a continuous range. - -Assuming you want to restrict the gene within this half-open range [1 to -5) where 1 is included and 5 is not. Then simply create a dictionary -with 2 items where the keys of the 2 items are: - -1. ``'low'``: The minimum value in the range which is 1 in the example. - -2. ``'high'``: The maximum value in the range which is 5 in the example. - -The dictionary will look like that: - -.. code:: python - - {'low': 1, - 'high': 5} - -It is not acceptable to add more than 2 items in the dictionary or use -other keys than ``'low'`` and ``'high'``. - -For a 3-gene problem, the next code creates a dictionary for each gene -to restrict its values in a continuous range. For the first gene, it can -take any floating-point value from the range that starts from 1 -(inclusive) and ends at 5 (exclusive). - -.. code:: python - - gene_space = [{'low': 1, 'high': 5}, {'low': 0.3, 'high': 1.4}, {'low': -0.2, 'high': 4.5}] - -.. _more-about-the-genespace-parameter: - -More about the ``gene_space`` Parameter -======================================= - -The ``gene_space`` parameter customizes the space of values of each -gene. - -Assuming that all genes have the same global space which include the -values 0.3, 5.2, -4, and 8, then those values can be assigned to the -``gene_space`` parameter as a list, tuple, or range. Here is a list -assigned to this parameter. By doing that, then the gene values are -restricted to those assigned to the ``gene_space`` parameter. - -.. code:: python - - gene_space = [0.3, 5.2, -4, 8] - -If some genes have different spaces, then ``gene_space`` should accept a -nested list or tuple. In this case, the elements could be: - -1. Number (of ``int``, ``float``, or ``NumPy`` data types): A single - value to be assigned to the gene. This means this gene will have the - same value across all generations. - -2. ``list``, ``tuple``, ``numpy.ndarray``, or any range like ``range``, - ``numpy.arange()``, or ``numpy.linspace``: It holds the space for - each individual gene. But this space is usually discrete. That is - there is a set of finite values to select from. - -3. ``dict``: To sample a value for a gene from a continuous range. The - dictionary must have 2 mandatory keys which are ``"low"`` and - ``"high"`` in addition to an optional key which is ``"step"``. A - random value is returned between the values assigned to the items - with ``"low"`` and ``"high"`` keys. If the ``"step"`` exists, then - this works as the previous options (i.e. discrete set of values). - -4. ``None``: A gene with its space set to ``None`` is initialized - randomly from the range specified by the 2 parameters - ``init_range_low`` and ``init_range_high``. For mutation, its value - is mutated based on a random value from the range specified by the 2 - parameters ``random_mutation_min_val`` and - ``random_mutation_max_val``. If all elements in the ``gene_space`` - parameter are ``None``, the parameter will not have any effect. - -Assuming that a chromosome has 2 genes and each gene has a different -value space. Then the ``gene_space`` could be assigned a nested -list/tuple where each element determines the space of a gene. 
- -According to the next code, the space of the first gene is ``[0.4, -5]`` -which has 2 values and the space for the second gene is -``[0.5, -3.2, 8.8, -9]`` which has 4 values. - -.. code:: python - - gene_space = [[0.4, -5], [0.5, -3.2, 8.2, -9]] - -For a 2 gene chromosome, if the first gene space is restricted to the -discrete values from 0 to 4 and the second gene is restricted to the -values from 10 to 19, then it could be specified according to the next -code. - -.. code:: python - - gene_space = [range(5), range(10, 20)] - -The ``gene_space`` can also be assigned to a single range, as given -below, where the values of all genes are sampled from the same range. - -.. code:: python - - gene_space = numpy.arange(15) - -The ``gene_space`` can be assigned a dictionary to sample a value from a -continuous range. - -.. code:: python - - gene_space = {"low": 4, "high": 30} - -A step also can be assigned to the dictionary. This works as if a range -is used. - -.. code:: python - - gene_space = {"low": 4, "high": 30, "step": 2.5} - -.. - - Setting a ``dict`` like ``{"low": 0, "high": 10}`` in the - ``gene_space`` means that random values from the continuous range [0, - 10) are sampled. Note that ``0`` is included but ``10`` is not - included while sampling. Thus, the maximum value that could be - returned is less than ``10`` like ``9.9999``. But if the user decided - to round the genes using, for example, ``[float, 2]``, then this - value will become 10. So, the user should be careful to the inputs. - -If a ``None`` is assigned to only a single gene, then its value will be -randomly generated initially using the ``init_range_low`` and -``init_range_high`` parameters in the ``pygad.GA`` class's constructor. -During mutation, the value are sampled from the range defined by the 2 -parameters ``random_mutation_min_val`` and ``random_mutation_max_val``. -This is an example where the second gene is given a ``None`` value. - -.. code:: python - - gene_space = [range(5), None, numpy.linspace(10, 20, 300)] - -If the user did not assign the initial population to the -``initial_population`` parameter, the initial population is created -randomly based on the ``gene_space`` parameter. Moreover, the mutation -is applied based on this parameter. - -.. _how-mutation-works-with-the-genespace-parameter: - -How Mutation Works with the ``gene_space`` Parameter? ------------------------------------------------------ - -Mutation changes based on whether the ``gene_space`` has a continuous -range or discrete set of values. - -If a gene has its **static/discrete space** defined in the -``gene_space`` parameter, then mutation works by replacing the gene -value by a value randomly selected from the gene space. This happens for -both ``int`` and ``float`` data types. - -For example, the following ``gene_space`` has the static space -``[1, 2, 3]`` defined for the first gene. So, this gene can only have a -value out of these 3 values. - -.. code:: python - - Gene space: [[1, 2, 3], - None] - Solution: [1, 5] - -For a solution like ``[1, 5]``, then mutation happens for the first gene -by simply replacing its current value by a randomly selected value -(other than its current value if possible). So, the value 1 will be -replaced by either 2 or 3. - -For the second gene, its space is set to ``None``. So, traditional -mutation happens for this gene by: - -1. Generating a random value from the range defined by the - ``random_mutation_min_val`` and ``random_mutation_max_val`` - parameters. - -2. 
Adding this random value to the current gene's value. - -If its current value is 5 and the random value is ``-0.5``, then the new -value is 4.5. If the gene type is integer, then the value will be -rounded. - -On the other hand, if a gene has a **continuous space** defined in the -``gene_space`` parameter, then mutation occurs by adding a random value -to the current gene value. - -For example, the following ``gene_space`` has the continuous space -defined by the dictionary ``{'low': 1, 'high': 5}``. This applies to all -genes. So, mutation is applied to one or more selected genes by adding a -random value to the current gene value. - -.. code:: python - - Gene space: {'low': 1, 'high': 5} - Solution: [1.5, 3.4] - -Assuming ``random_mutation_min_val=-1`` and -``random_mutation_max_val=1``, then a random value such as ``0.3`` can -be added to the gene(s) participating in mutation. If only the first -gene is mutated, then its new value changes from ``1.5`` to -``1.5+0.3=1.8``. Note that PyGAD verifies that the new value is within -the range. In the worst scenarios, the value will be set to either -boundary of the continuous range. For example, if the gene value is 1.5 -and the random value is -0.55, then the new value is 0.95 which smaller -than the lower boundary 1. Thus, the gene value will be rounded to 1. - -If the dictionary has a step like the example below, then it is -considered a discrete range and mutation occurs by randomly selecting a -value from the set of values. In other words, no random value is added -to the gene value. - -.. code:: python - - Gene space: {'low': 1, 'high': 5, 'step': 0.5} - -Stop at Any Generation -====================== - -In `PyGAD -2.4.0 `__, -it is possible to stop the genetic algorithm after any generation. All -you need to do it to return the string ``"stop"`` in the callback -function ``on_generation``. When this callback function is implemented -and assigned to the ``on_generation`` parameter in the constructor of -the ``pygad.GA`` class, then the algorithm immediately stops after -completing its current generation. Let's discuss an example. - -Assume that the user wants to stop algorithm either after the 100 -generations or if a condition is met. The user may assign a value of 100 -to the ``num_generations`` parameter of the ``pygad.GA`` class -constructor. - -The condition that stops the algorithm is written in a callback function -like the one in the next code. If the fitness value of the best solution -exceeds 70, then the string ``"stop"`` is returned. - -.. code:: python - - def func_generation(ga_instance): - if ga_instance.best_solution()[1] >= 70: - return "stop" - -Stop Criteria -============= - -In `PyGAD -2.15.0 `__, -a new parameter named ``stop_criteria`` is added to the constructor of -the ``pygad.GA`` class. It helps to stop the evolution based on some -criteria. It can be assigned to one or more criterion. - -Each criterion is passed as ``str`` that consists of 2 parts: - -1. Stop word. - -2. Number. - -It takes this form: - -.. code:: python - - "word_num" - -The current 2 supported words are ``reach`` and ``saturate``. - -The ``reach`` word stops the ``run()`` method if the fitness value is -equal to or greater than a given fitness value. An example for ``reach`` -is ``"reach_40"`` which stops the evolution if the fitness is >= 40. - -``saturate`` stops the evolution if the fitness saturates for a given -number of consecutive generations. 
An example for ``saturate`` is -``"saturate_7"`` which means stop the ``run()`` method if the fitness -does not change for 7 consecutive generations. - -Here is an example that stops the evolution if either the fitness value -reached ``127.4`` or if the fitness saturates for ``15`` generations. - -.. code:: python - - import pygad - import numpy - - equation_inputs = [4, -2, 3.5, 8, 9, 4] - desired_output = 44 - - def fitness_func(ga_instance, solution, solution_idx): - output = numpy.sum(solution * equation_inputs) - - fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001) - - return fitness - - ga_instance = pygad.GA(num_generations=200, - sol_per_pop=10, - num_parents_mating=4, - num_genes=len(equation_inputs), - fitness_func=fitness_func, - stop_criteria=["reach_127.4", "saturate_15"]) - - ga_instance.run() - print(f"Number of generations passed is {ga_instance.generations_completed}") - -Multi-Objective Stop Criteria ------------------------------ - -When multi-objective is used, then there are 2 options to use the -``stop_criteria`` parameter with the ``reach`` keyword: - -1. Pass a single value to use along the ``reach`` keyword to use across - all the objectives. - -2. Pass multiple values along the ``reach`` keyword. But the number of - values must equal the number of objectives. - -For the ``saturate`` keyword, it is independent to the number of -objectives. - -Suppose there are 3 objectives, this is a working example. It stops when -the fitness value of the 3 objectives reach or exceed 10, 20, and 30, -respectively. - -.. code:: python - - stop_criteria='reach_10_20_30' - -More than one criterion can be used together. In this case, pass the -``stop_criteria`` parameter as an iterable. This is an example. It stops -when either of these 2 conditions hold: - -1. The fitness values of the 3 objectives reach or exceed 10, 20, and - 30, respectively. - -2. The fitness values of the 3 objectives reach or exceed 90, -5.7, and - 10, respectively. - -.. code:: python - - stop_criteria=['reach_10_20_30', 'reach_90_-5.7_10'] - -Elitism Selection -================= - -In `PyGAD -2.18.0 `__, -a new parameter called ``keep_elitism`` is supported. It accepts an -integer to define the number of elitism (i.e. best solutions) to keep in -the next generation. This parameter defaults to ``1`` which means only -the best solution is kept in the next generation. - -In the next example, the ``keep_elitism`` parameter in the constructor -of the ``pygad.GA`` class is set to 2. Thus, the best 2 solutions in -each generation are kept in the next generation. - -.. code:: python - - import numpy - import pygad - - function_inputs = [4,-2,3.5,5,-11,-4.7] - desired_output = 44 - - def fitness_func(ga_instance, solution, solution_idx): - output = numpy.sum(solution*function_inputs) - fitness = 1.0 / numpy.abs(output - desired_output) - return fitness - - ga_instance = pygad.GA(num_generations=2, - num_parents_mating=3, - fitness_func=fitness_func, - num_genes=6, - sol_per_pop=5, - keep_elitism=2) - - ga_instance.run() - -The value passed to the ``keep_elitism`` parameter must satisfy 2 -conditions: - -1. It must be ``>= 0``. - -2. It must be ``<= sol_per_pop``. That is its value cannot exceed the - number of solutions in the current population. - -In the previous example, if the ``keep_elitism`` parameter is set equal -to the value passed to the ``sol_per_pop`` parameter, which is 5, then -there will be no evolution at all as in the next figure. 
This is because -all the 5 solutions are used as elitism in the next generation and no -offspring will be created. - -.. code:: python - - ... - - ga_instance = pygad.GA(..., - sol_per_pop=5, - keep_elitism=5) - - ga_instance.run() - -|image2| - -Note that if the ``keep_elitism`` parameter is effective (i.e. is -assigned a positive integer, not zero), then the ``keep_parents`` -parameter will have no effect. Because the default value of the -``keep_elitism`` parameter is 1, then the ``keep_parents`` parameter has -no effect by default. The ``keep_parents`` parameter is only effective -when ``keep_elitism=0``. - -Random Seed -=========== - -In `PyGAD -2.18.0 `__, -a new parameter called ``random_seed`` is supported. Its value is used -as a seed for the random function generators. - -PyGAD uses random functions in these 2 libraries: - -1. NumPy - -2. random - -The ``random_seed`` parameter defaults to ``None`` which means no seed -is used. As a result, different random numbers are generated for each -run of PyGAD. - -If this parameter is assigned a proper seed, then the results will be -reproducible. In the next example, the integer 2 is used as a random -seed. - -.. code:: python - - import numpy - import pygad - - function_inputs = [4,-2,3.5,5,-11,-4.7] - desired_output = 44 - - def fitness_func(ga_instance, solution, solution_idx): - output = numpy.sum(solution*function_inputs) - fitness = 1.0 / numpy.abs(output - desired_output) - return fitness - - ga_instance = pygad.GA(num_generations=2, - num_parents_mating=3, - fitness_func=fitness_func, - sol_per_pop=5, - num_genes=6, - random_seed=2) - - ga_instance.run() - best_solution, best_solution_fitness, best_match_idx = ga_instance.best_solution() - print(best_solution) - print(best_solution_fitness) - -This is the best solution found and its fitness value. - -.. code:: - - [ 2.77249188 -4.06570662 0.04196872 -3.47770796 -0.57502138 -3.22775267] - 0.04872203136549972 - -After running the code again, it will find the same result. - -.. code:: - - [ 2.77249188 -4.06570662 0.04196872 -3.47770796 -0.57502138 -3.22775267] - 0.04872203136549972 - -Continue without Losing Progress -================================ - -In `PyGAD -2.18.0 `__, -and thanks for `Felix Bernhard `__ for -opening `this GitHub -issue `__, -the values of these 4 instance attributes are no longer reset after each -call to the ``run()`` method. - -1. ``self.best_solutions`` - -2. ``self.best_solutions_fitness`` - -3. ``self.solutions`` - -4. ``self.solutions_fitness`` - -This helps the user to continue where the last run stopped without -losing the values of these 4 attributes. - -Now, the user can save the model by calling the ``save()`` method. - -.. code:: python - - import pygad - - def fitness_func(ga_instance, solution, solution_idx): - ... - return fitness - - ga_instance = pygad.GA(...) - - ga_instance.run() - - ga_instance.plot_fitness() - - ga_instance.save("pygad_GA") - -Then the saved model is loaded by calling the ``load()`` function. After -calling the ``run()`` method over the loaded instance, then the data -from the previous 4 attributes are not reset but extended with the new -data. - -.. code:: python - - import pygad - - def fitness_func(ga_instance, solution, solution_idx): - ... - return fitness - - loaded_ga_instance = pygad.load("pygad_GA") - - loaded_ga_instance.run() - - loaded_ga_instance.plot_fitness() - -The plot created by the ``plot_fitness()`` method will show the data -collected from both the runs. 
- -Note that the 2 attributes (``self.best_solutions`` and -``self.best_solutions_fitness``) only work if the -``save_best_solutions`` parameter is set to ``True``. Also, the 2 -attributes (``self.solutions`` and ``self.solutions_fitness``) only work -if the ``save_solutions`` parameter is ``True``. - -Change Population Size during Runtime -===================================== - -Starting from `PyGAD -3.3.0 `__, -the population size can changed during runtime. In other words, the -number of solutions/chromosomes and number of genes can be changed. - -The user has to carefully arrange the list of *parameters* and *instance -attributes* that have to be changed to keep the GA consistent before and -after changing the population size. Generally, change everything that -would be used during the GA evolution. - - CAUTION: If the user failed to change a parameter or an instance - attributes necessary to keep the GA running after the population size - changed, errors will arise. - -These are examples of the parameters that the user should decide whether -to change. The user should check the `list of -parameters `__ -and decide what to change. - -1. ``population``: The population. It *must* be changed. - -2. ``num_offspring``: The number of offspring to produce out of the - crossover and mutation operations. Change this parameter if the - number of offspring have to be changed to be consistent with the new - population size. - -3. ``num_parents_mating``: The number of solutions to select as parents. - Change this parameter if the number of parents have to be changed to - be consistent with the new population size. - -4. ``fitness_func``: If the way of calculating the fitness changes after - the new population size, then the fitness function have to be - changed. - -5. ``sol_per_pop``: The number of solutions per population. It is not - critical to change it but it is recommended to keep this number - consistent with the number of solutions in the ``population`` - parameter. - -These are examples of the instance attributes that might be changed. The -user should check the `list of instance -attributes `__ -and decide what to change. - -1. All the ``last_generation_*`` parameters - - 1. ``last_generation_fitness``: A 1D NumPy array of fitness values of - the population. - - 2. ``last_generation_parents`` and - ``last_generation_parents_indices``: Two NumPy arrays: 2D array - representing the parents and 1D array of the parents indices. - - 3. ``last_generation_elitism`` and - ``last_generation_elitism_indices``: Must be changed if - ``keep_elitism != 0``. The default value of ``keep_elitism`` is 1. - Two NumPy arrays: 2D array representing the elitism and 1D array - of the elitism indices. - -2. ``pop_size``: The population size. - -Prevent Duplicates in Gene Values -================================= - -In `PyGAD -2.13.0 `__, -a new bool parameter called ``allow_duplicate_genes`` is supported to -control whether duplicates are supported in the chromosome or not. In -other words, whether 2 or more genes might have the same exact value. - -If ``allow_duplicate_genes=True`` (which is the default case), genes may -have the same value. If ``allow_duplicate_genes=False``, then no 2 genes -will have the same value given that there are enough unique values for -the genes. - -The next code gives an example to use the ``allow_duplicate_genes`` -parameter. A callback generation function is implemented to print the -population after each generation. - -.. 
code:: python - - import pygad - - def fitness_func(ga_instance, solution, solution_idx): - return 0 - - def on_generation(ga): - print("Generation", ga.generations_completed) - print(ga.population) - - ga_instance = pygad.GA(num_generations=5, - sol_per_pop=5, - num_genes=4, - mutation_num_genes=3, - random_mutation_min_val=-5, - random_mutation_max_val=5, - num_parents_mating=2, - fitness_func=fitness_func, - gene_type=int, - on_generation=on_generation, - allow_duplicate_genes=False) - ga_instance.run() - -Here are the population after the 5 generations. Note how there are no -duplicate values. - -.. code:: python - - Generation 1 - [[ 2 -2 -3 3] - [ 0 1 2 3] - [ 5 -3 6 3] - [-3 1 -2 4] - [-1 0 -2 3]] - Generation 2 - [[-1 0 -2 3] - [-3 1 -2 4] - [ 0 -3 -2 6] - [-3 0 -2 3] - [ 1 -4 2 4]] - Generation 3 - [[ 1 -4 2 4] - [-3 0 -2 3] - [ 4 0 -2 1] - [-4 0 -2 -3] - [-4 2 0 3]] - Generation 4 - [[-4 2 0 3] - [-4 0 -2 -3] - [-2 5 4 -3] - [-1 2 -4 4] - [-4 2 0 -3]] - Generation 5 - [[-4 2 0 -3] - [-1 2 -4 4] - [ 3 4 -4 0] - [-1 0 2 -2] - [-4 2 -1 1]] - -The ``allow_duplicate_genes`` parameter is configured with use with the -``gene_space`` parameter. Here is an example where each of the 4 genes -has the same space of values that consists of 4 values (1, 2, 3, and 4). - -.. code:: python - - import pygad - - def fitness_func(ga_instance, solution, solution_idx): - return 0 - - def on_generation(ga): - print("Generation", ga.generations_completed) - print(ga.population) - - ga_instance = pygad.GA(num_generations=1, - sol_per_pop=5, - num_genes=4, - num_parents_mating=2, - fitness_func=fitness_func, - gene_type=int, - gene_space=[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]], - on_generation=on_generation, - allow_duplicate_genes=False) - ga_instance.run() - -Even that all the genes share the same space of values, no 2 genes -duplicate their values as provided by the next output. - -.. code:: python - - Generation 1 - [[2 3 1 4] - [2 3 1 4] - [2 4 1 3] - [2 3 1 4] - [1 3 2 4]] - Generation 2 - [[1 3 2 4] - [2 3 1 4] - [1 3 2 4] - [2 3 4 1] - [1 3 4 2]] - Generation 3 - [[1 3 4 2] - [2 3 4 1] - [1 3 4 2] - [3 1 4 2] - [3 2 4 1]] - Generation 4 - [[3 2 4 1] - [3 1 4 2] - [3 2 4 1] - [1 2 4 3] - [1 3 4 2]] - Generation 5 - [[1 3 4 2] - [1 2 4 3] - [2 1 4 3] - [1 2 4 3] - [1 2 4 3]] - -You should care of giving enough values for the genes so that PyGAD is -able to find alternatives for the gene value in case it duplicates with -another gene. - -There might be 2 duplicate genes where changing either of the 2 -duplicating genes will not solve the problem. For example, if -``gene_space=[[3, 0, 1], [4, 1, 2], [0, 2], [3, 2, 0]]`` and the -solution is ``[3 2 0 0]``, then the values of the last 2 genes -duplicate. There are no possible changes in the last 2 genes to solve -the problem. - -This problem can be solved by randomly changing one of the -non-duplicating genes that may make a room for a unique value in one the -2 duplicating genes. For example, by changing the second gene from 2 to -4, then any of the last 2 genes can take the value 2 and solve the -duplicates. The resultant gene is then ``[3 4 2 0]``. But this option is -not yet supported in PyGAD. - -Solve Duplicates using a Third Gene ------------------------------------ - -When ``allow_duplicate_genes=False`` and a user-defined ``gene_space`` -is used, it sometimes happen that there is no room to solve the -duplicates between the 2 genes by simply replacing the value of one gene -by another gene. 
In `PyGAD
-3.1.0 `__,
-the duplicates are solved by looking for a third gene that will help in
-solving the duplicates. The following examples explain how it works.
-
-Example 1:
-
-Let's assume that this gene space is used and there is a solution with 2
-duplicate genes with the same value 4.
-
-.. code:: python
-
-    Gene space: [[2, 3],
-                 [3, 4],
-                 [4, 5],
-                 [5, 6]]
-    Solution: [3, 4, 4, 5]
-
-By checking the gene space, the second gene can have the values
-``[3, 4]`` and the third gene can have the values ``[4, 5]``. To solve
-the duplicates, we have to change the value of one of these 2 genes.
-
-If the value of the second gene changes from 4 to 3, then it will be a
-duplicate of the first gene. If the value of the third gene changes from
-4 to 5, then it will be a duplicate of the fourth gene. In conclusion,
-just selecting a different value for either the second or third gene
-will introduce new duplicating genes.
-
-When there are 2 duplicate genes but there is no way to solve their
-duplicates, then the solution is to change a third gene that makes room
-to solve the duplicates between the 2 genes.
-
-In our example, duplicates between the second and third genes can be
-solved by, for example:
-
-- Changing the first gene from 3 to 2 then changing the second gene from
-  4 to 3.
-
-- Or changing the fourth gene from 5 to 6 then changing the third gene
-  from 4 to 5.
-
-Generally, this is how to solve such duplicates:
-
-1. For any duplicate gene **GENE1**, select another value.
-
-2. Check which other gene **GENEX** duplicates with this new value.
-
-3. Find if **GENEX** can have another value that will not cause any more
-   duplicates. If so, go to step 7.
-
-4. If all the other values of **GENEX** will cause duplicates, then try
-   another gene **GENEY**.
-
-5. Repeat steps 3 and 4 until exploring all the genes.
-
-6. If there is no possibility to solve the duplicates, then there is no
-   way to solve them and the duplicate value has to be kept.
-
-7. If a value for a gene **GENEM** is found that will not cause more
-   duplicates, then use this value for the gene **GENEM**.
-
-8. Replace the value of the gene **GENE1** by the old value of the gene
-   **GENEM**. This solves the duplicates.
-
-This is an example of solving the duplicate for the solution
-``[3, 4, 4, 5]``:
-
-1. Let's use the second gene with value 4. Because the space of this
-   gene is ``[3, 4]``, the only other value we can select is 3.
-
-2. The first gene also has the value 3.
-
-3. The first gene has another value 2 that will not cause more
-   duplicates in the solution. Then go to step 7.
-
-4. Skip.
-
-5. Skip.
-
-6. Skip.
-
-7. The value 3 of the first gene will be replaced by the new value 2.
-   The new solution is [2, 4, 4, 5].
-
-8. Replace the value 4 of the second gene by the old value of the first
-   gene, which is 3. The new solution is [2, 3, 4, 5]. The duplicate is
-   solved.
-
-Example 2:
-
-.. code:: python
-
-    Gene space: [[0, 1],
-                 [1, 2],
-                 [2, 3],
-                 [3, 4]]
-    Solution: [1, 2, 2, 3]
-
-The quick summary is:
-
-- Change the value of the first gene from 1 to 0. The solution becomes
-  [0, 2, 2, 3].
-
-- Change the value of the second gene from 2 to 1. The solution becomes
-  [0, 1, 2, 3]. The duplicate is solved.
-
-.. _more-about-the-genetype-parameter:
-
-More about the ``gene_type`` Parameter
-======================================
-
-The ``gene_type`` parameter allows the user to control the data type for
-all genes at once or each individual gene. 
In `PyGAD
-2.15.0 `__,
-the ``gene_type`` parameter also supports customizing the precision for
-``float`` data types. As a result, the ``gene_type`` parameter helps to:
-
-1. Select a data type for all genes with or without precision.
-
-2. Select a data type for each individual gene with or without
-   precision.
-
-Let's discuss things by examples.
-
-Data Type for All Genes without Precision
------------------------------------------
-
-The data type for all genes can be specified by assigning the numeric
-data type directly to the ``gene_type`` parameter. This is an example
-that makes all genes of the ``int`` data type.
-
-.. code:: python
-
-    gene_type=int
-
-Given that the supported numeric data types of PyGAD include Python's
-``int`` and ``float`` in addition to all numeric types of ``NumPy``,
-any of these types can be assigned to the ``gene_type`` parameter.
-
-If no precision is specified for a ``float`` data type, then the
-complete floating-point number is kept.
-
-The next code uses an ``int`` data type for all genes, so the genes in
-the initial and final populations are only integers.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    equation_inputs = [4, -2, 3.5, 8, -2]
-    desired_output = 2671.1234
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        output = numpy.sum(solution * equation_inputs)
-        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-        return fitness
-
-    ga_instance = pygad.GA(num_generations=10,
-                           sol_per_pop=5,
-                           num_parents_mating=2,
-                           num_genes=len(equation_inputs),
-                           fitness_func=fitness_func,
-                           gene_type=int)
-
-    print("Initial Population")
-    print(ga_instance.initial_population)
-
-    ga_instance.run()
-
-    print("Final Population")
-    print(ga_instance.population)
-
-.. code:: python
-
-    Initial Population
-    [[ 1 -1  2  0 -3]
-     [ 0 -2  0 -3 -1]
-     [ 0 -1 -1  2  0]
-     [-2  3 -2  3  3]
-     [ 0  0  2 -2 -2]]
-
-    Final Population
-    [[ 1 -1  2  2  0]
-     [ 1 -1  2  2  0]
-     [ 1 -1  2  2  0]
-     [ 1 -1  2  2  0]
-     [ 1 -1  2  2  0]]
-
-Data Type for All Genes with Precision
---------------------------------------
-
-A precision can only be specified for a ``float`` data type and cannot
-be specified for integers. Here is an example that uses a precision of 3
-for the ``float`` data type. In this case, all genes are of type
-``float`` and their maximum precision is 3.
-
-.. code:: python
-
-    gene_type=[float, 3]
-
-The next code prints the initial and final populations where the genes
-are of type ``float`` with precision 3.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    equation_inputs = [4, -2, 3.5, 8, -2]
-    desired_output = 2671.1234
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        output = numpy.sum(solution * equation_inputs)
-        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-
-        return fitness
-
-    ga_instance = pygad.GA(num_generations=10,
-                           sol_per_pop=5,
-                           num_parents_mating=2,
-                           num_genes=len(equation_inputs),
-                           fitness_func=fitness_func,
-                           gene_type=[float, 3])
-
-    print("Initial Population")
-    print(ga_instance.initial_population)
-
-    ga_instance.run()
-
-    print("Final Population")
-    print(ga_instance.population)
-
-.. 
code:: python
-
-    Initial Population
-    [[-2.417 -0.487  3.623  2.457 -2.362]
-     [-1.231  0.079 -1.63   1.629 -2.637]
-     [ 0.692 -2.098  0.705  0.914 -3.633]
-     [ 2.637 -1.339 -1.107 -0.781 -3.896]
-     [-1.495  1.378 -1.026  3.522  2.379]]
-
-    Final Population
-    [[ 1.714 -1.024  3.623  3.185 -2.362]
-     [ 0.692 -1.024  3.623  3.185 -2.362]
-     [ 0.692 -1.024  3.623  3.375 -2.362]
-     [ 0.692 -1.024  4.041  3.185 -2.362]
-     [ 1.714 -0.644  3.623  3.185 -2.362]]
-
-Data Type for each Individual Gene without Precision
-----------------------------------------------------
-
-In `PyGAD
-2.14.0 `__,
-the ``gene_type`` parameter allows customizing the gene type for each
-individual gene. This is done by using a
-``list``/``tuple``/``numpy.ndarray`` with a number of elements equal to
-the number of genes. For each element, a type is specified for the
-corresponding gene.
-
-This is an example for a 5-gene problem where different types are
-assigned to the genes.
-
-.. code:: python
-
-    gene_type=[int, float, numpy.float16, numpy.int8, float]
-
-This is complete code that prints the initial and final populations for
-custom gene data types.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    equation_inputs = [4, -2, 3.5, 8, -2]
-    desired_output = 2671.1234
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        output = numpy.sum(solution * equation_inputs)
-        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-        return fitness
-
-    ga_instance = pygad.GA(num_generations=10,
-                           sol_per_pop=5,
-                           num_parents_mating=2,
-                           num_genes=len(equation_inputs),
-                           fitness_func=fitness_func,
-                           gene_type=[int, float, numpy.float16, numpy.int8, float])
-
-    print("Initial Population")
-    print(ga_instance.initial_population)
-
-    ga_instance.run()
-
-    print("Final Population")
-    print(ga_instance.population)
-
-.. code:: python
-
-    Initial Population
-    [[0 0.8615522360026828 0.7021484375 -2 3.5301821368185866]
-     [-3 2.648189378595294 -3.830078125 1 -0.9586271572917742]
-     [3 3.7729827570110714 1.2529296875 -3 1.395741994211889]
-     [0 1.0490687178053282 1.51953125 -2 0.7243617940450235]
-     [0 -0.6550158436937226 -2.861328125 -2 1.8212734549263097]]
-
-    Final Population
-    [[3 3.7729827570110714 2.055 0 0.7243617940450235]
-     [3 3.7729827570110714 1.458 0 -0.14638754050305036]
-     [3 3.7729827570110714 1.458 0 0.0869406120516778]
-     [3 3.7729827570110714 1.458 0 0.7243617940450235]
-     [3 3.7729827570110714 1.458 0 -0.14638754050305036]]
-
-Data Type for each Individual Gene with Precision
--------------------------------------------------
-
-The precision can also be specified for the ``float`` data types as in
-the next line where the second gene's precision is 2 and the last gene's
-precision is 1.
-
-.. code:: python
-
-    gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]]
-
-This is a complete example where the initial and final populations are
-printed and the genes comply with the data types and precisions
-specified.
-
-.. 
code:: python
-
-    import pygad
-    import numpy
-
-    equation_inputs = [4, -2, 3.5, 8, -2]
-    desired_output = 2671.1234
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        output = numpy.sum(solution * equation_inputs)
-        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-        return fitness
-
-    ga_instance = pygad.GA(num_generations=10,
-                           sol_per_pop=5,
-                           num_parents_mating=2,
-                           num_genes=len(equation_inputs),
-                           fitness_func=fitness_func,
-                           gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]])
-
-    print("Initial Population")
-    print(ga_instance.initial_population)
-
-    ga_instance.run()
-
-    print("Final Population")
-    print(ga_instance.population)
-
-.. code:: python
-
-    Initial Population
-    [[-2 -1.22 1.716796875 -1 0.2]
-     [-1 -1.58 -3.091796875 0 -1.3]
-     [3 3.35 -0.107421875 1 -3.3]
-     [-2 -3.58 -1.779296875 0 0.6]
-     [2 -3.73 2.65234375 3 -0.5]]
-
-    Final Population
-    [[2 -4.22 3.47 3 -1.3]
-     [2 -3.73 3.47 3 -1.3]
-     [2 -4.22 3.47 2 -1.3]
-     [2 -4.58 3.47 3 -1.3]
-     [2 -3.73 3.47 3 -1.3]]
-
-Parallel Processing in PyGAD
-============================
-
-Starting from `PyGAD
-2.17.0 `__,
-parallel processing is supported. This section explains how to use
-parallel processing in PyGAD.
-
-According to the `PyGAD
-lifecycle `__,
-only 2 operations can be parallelized:
-
-1. Population fitness calculation.
-
-2. Mutation.
-
-The reason is that the calculations in these 2 operations are
-independent (i.e. each solution/chromosome is handled independently from
-the others) and can be distributed across different processes or
-threads.
-
-The mutation operation does not do intensive calculations on the CPU.
-Its calculations are simple, like flipping the values of some genes from
-0 to 1 or adding a random value to some genes. So, it does not take much
-CPU processing time. Experiments proved that parallelizing the mutation
-operation across the solutions increases the time instead of reducing
-it. This is because running multiple processes or threads adds overhead
-to manage them. Thus, parallel processing is not applied to the mutation
-operation.
-
-For the population fitness calculation, parallel processing can make a
-difference and reduce the processing time. But this is conditional on
-the type of calculations done in the fitness function. If the fitness
-function makes intensive calculations and takes much processing time
-from the CPU, then it is probable that parallel processing will help to
-cut down the overall time.
-
-The next subsection explains how to use parallel processing in PyGAD.
-
-How to Use Parallel Processing in PyGAD
----------------------------------------
-
-Starting from `PyGAD
-2.17.0 `__,
-a new parameter called ``parallel_processing`` was added to the
-constructor of the ``pygad.GA`` class.
-
-.. code:: python
-
-    import pygad
-    ...
-    ga_instance = pygad.GA(...,
-                           parallel_processing=...)
-    ...
-
-This parameter allows the user to do the following:
-
-1. Enable parallel processing.
-
-2. Select whether processes or threads are used.
-
-3. Specify the number of processes or threads to be used.
-
-These are the 3 possible values for the ``parallel_processing``
-parameter:
-
-1. ``None``: (Default) It means no parallel processing is used.
-
-2. A positive integer referring to the number of threads to be used
-   (i.e. threads, not processes, are used).
-
-3. ``list``/``tuple``: If a list or a tuple of exactly 2 elements is
-   assigned, then:
-
-   1. 
The first element can be either ``'process'`` or ``'thread'`` to
-      specify whether processes or threads are used, respectively.
-
-   2. The second element can be:
-
-      1. A positive integer to select the maximum number of processes or
-         threads to be used.
-
-      2. ``0`` to indicate that 0 processes or threads are used. It
-         means no parallel processing. This is identical to setting
-         ``parallel_processing=None``.
-
-      3. ``None`` to use the default value as calculated by the
-         ``concurrent.futures`` module.
-
-These are examples of the values assigned to the ``parallel_processing``
-parameter:
-
-- ``parallel_processing=4``: Because the parameter is assigned a
-  positive integer, this means parallel processing is activated where 4
-  threads are used.
-
-- ``parallel_processing=["thread", 5]``: Use parallel processing with 5
-  threads. This is identical to ``parallel_processing=5``.
-
-- ``parallel_processing=["process", 8]``: Use parallel processing with 8
-  processes.
-
-- ``parallel_processing=["process", 0]``: As the second element is given
-  the value 0, this means do not use parallel processing. This is
-  identical to ``parallel_processing=None``.
-
-Examples
---------
-
-The examples will help you know the difference between using processes
-and threads. Moreover, they will give an idea of when parallel
-processing would make a difference and reduce the runtime. These are
-dummy examples where the fitness function is made to always return 0.
-
-The first example uses 10 genes, 5 solutions in the population where
-only 3 solutions mate, and 9999 generations. The fitness function uses a
-``for`` loop of 99 iterations just to have some calculations. In the
-constructor of the ``pygad.GA`` class, ``parallel_processing=None``
-means no parallel processing is used.
-
-.. code:: python
-
-    import pygad
-    import time
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        for _ in range(99):
-            pass
-        return 0
-
-    ga_instance = pygad.GA(num_generations=9999,
-                           num_parents_mating=3,
-                           sol_per_pop=5,
-                           num_genes=10,
-                           fitness_func=fitness_func,
-                           suppress_warnings=True,
-                           parallel_processing=None)
-
-    if __name__ == '__main__':
-        t1 = time.time()
-
-        ga_instance.run()
-
-        t2 = time.time()
-        print("Time is", t2-t1)
-
-When parallel processing is not used, the time it takes to run the
-genetic algorithm is ``1.5`` seconds.
-
-For comparison, let's do a second experiment where parallel processing
-is used with 5 threads. In this case, it takes ``5`` seconds.
-
-.. code:: python
-
-    ...
-    ga_instance = pygad.GA(...,
-                           parallel_processing=5)
-    ...
-
-For the third experiment, processes instead of threads are used. Also,
-only 99 generations are used instead of 9999. The time it takes is
-``99`` seconds.
-
-.. code:: python
-
-    ...
-    ga_instance = pygad.GA(num_generations=99,
-                           ...,
-                           parallel_processing=["process", 5])
-    ...
-
-This is the summary of the 3 experiments:
-
-1. No parallel processing & 9999 generations: 1.5 seconds.
-
-2. Parallel processing with 5 threads & 9999 generations: 5 seconds.
-
-3. Parallel processing with 5 processes & 99 generations: 99 seconds.
-
-Because the fitness function does not need much CPU time, the normal
-processing takes the least time. Running processes for this simple
-problem takes 99 seconds compared to only 5 seconds for threads because
-managing processes is much heavier than managing threads. Thus, most of
-the CPU time is spent swapping the processes instead of executing the
-code.
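-
-If you would like to reproduce such a comparison on your own machine,
-here is a minimal hedged harness built on the same dummy problem. The
-``time_run()`` helper and the reduced number of generations are
-illustrative choices, and the absolute timings will differ from one
-machine to another.
-
-.. code:: python
-
-    import time
-    import pygad
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        # Dummy fitness with a little CPU work, as in the first example.
-        for _ in range(99):
-            pass
-        return 0
-
-    def time_run(parallel_processing):
-        # Build and run the same GA under a given parallel_processing setting.
-        ga_instance = pygad.GA(num_generations=999,
-                               num_parents_mating=3,
-                               sol_per_pop=5,
-                               num_genes=10,
-                               fitness_func=fitness_func,
-                               suppress_warnings=True,
-                               parallel_processing=parallel_processing)
-        t1 = time.time()
-        ga_instance.run()
-        return time.time() - t1
-
-    if __name__ == '__main__':
-        # No parallelism, 5 threads, and 5 processes, respectively.
-        for setting in [None, ["thread", 5], ["process", 5]]:
-            print(setting, time_run(setting))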
-
-In the second example, the loop makes 99999999 iterations and only 5
-generations are used. With no parallelization, it takes 22 seconds.
-
-.. code:: python
-
-    import pygad
-    import time
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        for _ in range(99999999):
-            pass
-        return 0
-
-    ga_instance = pygad.GA(num_generations=5,
-                           num_parents_mating=3,
-                           sol_per_pop=5,
-                           num_genes=10,
-                           fitness_func=fitness_func,
-                           suppress_warnings=True,
-                           parallel_processing=None)
-
-    if __name__ == '__main__':
-        t1 = time.time()
-        ga_instance.run()
-        t2 = time.time()
-        print("Time is", t2-t1)
-
-It takes 15 seconds when 10 processes are used.
-
-.. code:: python
-
-    ...
-    ga_instance = pygad.GA(...,
-                           parallel_processing=["process", 10])
-    ...
-
-This is compared to 20 seconds when 10 threads are used.
-
-.. code:: python
-
-    ...
-    ga_instance = pygad.GA(...,
-                           parallel_processing=["thread", 10])
-    ...
-
-Based on the second example, using parallel processing with 10 processes
-takes the least time because there is much CPU work done. Generally,
-processes are preferred over threads when most of the work is on the
-CPU. Threads are preferred over processes in some situations like doing
-input/output operations.
-
-*Before releasing* `PyGAD
-2.17.0 `__\ *,*
-`László
-Fazekas `__
-*wrote an article to parallelize the fitness function with PyGAD. Check
-it:* `How Genetic Algorithms Can Compete with Gradient Descent and
-Backprop `__.
-
-Print Lifecycle Summary
-=======================
-
-In `PyGAD
-2.19.0 `__,
-a new method called ``summary()`` is supported. It prints a Keras-like
-summary of the PyGAD lifecycle showing the steps, callback functions,
-parameters, etc.
-
-This method accepts the following parameters:
-
-- ``line_length=70``: An integer representing the length of a single
-  line in characters.
-
-- ``fill_character=" "``: A character to fill the lines.
-
-- ``line_character="-"``: A character for creating a line separator.
-
-- ``line_character2="="``: A secondary character to create a line
-  separator.
-
-- ``columns_equal_len=False``: Whether the table rows are split into
-  equal-sized columns or split according to the width needed.
-
-- ``print_step_parameters=True``: Whether to print extra parameters
-  about each step inside the step. If ``print_step_parameters=False``
-  and ``print_parameters_summary=True``, then the parameters of each
-  step are printed at the end of the table.
-
-- ``print_parameters_summary=True``: Whether to print a parameters
-  summary at the end of the table. If ``print_step_parameters=False``,
-  then the parameters of each step are printed at the end of the table
-  too.
-
-This is a quick example that creates a ``pygad.GA`` instance.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    function_inputs = [4,-2,3.5,5,-11,-4.7]
-    desired_output = 44
-
-    def genetic_fitness(ga_instance, solution, solution_idx):
-        output = numpy.sum(solution*function_inputs)
-        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-        return fitness
-
-    def on_gen(ga):
-        pass
-
-    def on_crossover_callback(a, b):
-        pass
-
-    ga_instance = pygad.GA(num_generations=100,
-                           num_parents_mating=10,
-                           sol_per_pop=20,
-                           num_genes=len(function_inputs),
-                           on_crossover=on_crossover_callback,
-                           on_generation=on_gen,
-                           parallel_processing=2,
-                           stop_criteria="reach_10",
-                           fitness_batch_size=4,
-                           crossover_probability=0.4,
-                           fitness_func=genetic_fitness)
-
-Then call the ``summary()`` method to print the summary with the default
-parameters. 
Note that entries for the crossover and generation callback
-functions are created because their callback functions are implemented
-through ``on_crossover_callback()`` and ``on_gen()``, respectively.
-
-.. code:: python
-
-    ga_instance.summary()
-
-.. code:: bash
-
-    ----------------------------------------------------------------------
-                               PyGAD Lifecycle
-    ======================================================================
-    Step                   Handler                            Output Shape
-    ======================================================================
-    Fitness Function       genetic_fitness()                  (1)
-    Fitness batch size: 4
-    ----------------------------------------------------------------------
-    Parent Selection       steady_state_selection()           (10, 6)
-    Number of Parents: 10
-    ----------------------------------------------------------------------
-    Crossover              single_point_crossover()           (10, 6)
-    Crossover probability: 0.4
-    ----------------------------------------------------------------------
-    On Crossover           on_crossover_callback()            None
-    ----------------------------------------------------------------------
-    Mutation               random_mutation()                  (10, 6)
-    Mutation Genes: 1
-    Random Mutation Range: (-1.0, 1.0)
-    Mutation by Replacement: False
-    Allow Duplicated Genes: True
-    ----------------------------------------------------------------------
-    On Generation          on_gen()                           None
-    Stop Criteria: [['reach', 10.0]]
-    ----------------------------------------------------------------------
-    ======================================================================
-    Population Size: (20, 6)
-    Number of Generations: 100
-    Initial Population Range: (-4, 4)
-    Keep Elitism: 1
-    Gene DType: [<class 'float'>, None]
-    Parallel Processing: ['thread', 2]
-    Save Best Solutions: False
-    Save Solutions: False
-    ======================================================================
-
-We can set the ``print_step_parameters`` and
-``print_parameters_summary`` parameters to ``False`` to not print the
-parameters.
-
-.. code:: python
-
-    ga_instance.summary(print_step_parameters=False,
-                        print_parameters_summary=False)
-
-.. code:: bash
-
-    ----------------------------------------------------------------------
-                               PyGAD Lifecycle
-    ======================================================================
-    Step                   Handler                            Output Shape
-    ======================================================================
-    Fitness Function       genetic_fitness()                  (1)
-    ----------------------------------------------------------------------
-    Parent Selection       steady_state_selection()           (10, 6)
-    ----------------------------------------------------------------------
-    Crossover              single_point_crossover()           (10, 6)
-    ----------------------------------------------------------------------
-    On Crossover           on_crossover_callback()            None
-    ----------------------------------------------------------------------
-    Mutation               random_mutation()                  (10, 6)
-    ----------------------------------------------------------------------
-    On Generation          on_gen()                           None
-    ----------------------------------------------------------------------
-    ======================================================================
-
-Logging Outputs
-===============
-
-In `PyGAD
-3.0.0 `__,
-the ``print()`` statement is no longer used; the outputs are printed
-using the `logging `__
-module. A new parameter called ``logger`` is supported to accept a
-user-defined logger.
-
-.. code:: python
-
-    import logging
-
-    logger = ...
-
-    ga_instance = pygad.GA(...,
-                           logger=logger,
-                           ...)
-
-The default value for this parameter is ``None``. 
If there is no logger
-passed (i.e. ``logger=None``), then a default logger is created to log
-the messages to the console, exactly like how the ``print()`` statement
-works.
-
-Some advantages of using the
-`logging `__ module
-instead of the ``print()`` statement are:
-
-1. The user has more control over the printed messages, especially if
-   there is a project that uses multiple modules where each module
-   prints its messages. A logger can organize the outputs.
-
-2. Using the proper ``Handler``, the user can log the output messages to
-   files, not only print them to the console. So, it is much easier to
-   record the outputs.
-
-3. The format of the printed messages can be changed by customizing the
-   ``Formatter`` assigned to the logger.
-
-This section gives some quick examples of using the ``logging`` module
-and then gives an example of using the logger with PyGAD.
-
-Logging to the Console
-----------------------
-
-This is an example to create a logger to log the messages to the
-console.
-
-.. code:: python
-
-    import logging
-
-    # Create a logger
-    logger = logging.getLogger(__name__)
-
-    # Set the logger level to debug so that all the messages are printed.
-    logger.setLevel(logging.DEBUG)
-
-    # Create a stream handler to log the messages to the console.
-    stream_handler = logging.StreamHandler()
-
-    # Set the handler level to debug.
-    stream_handler.setLevel(logging.DEBUG)
-
-    # Create a formatter
-    formatter = logging.Formatter('%(message)s')
-
-    # Add the formatter to the handler.
-    stream_handler.setFormatter(formatter)
-
-    # Add the stream handler to the logger
-    logger.addHandler(stream_handler)
-
-Now, we can log messages to the console with the format specified in the
-``Formatter``.
-
-.. code:: python
-
-    logger.debug('Debug message.')
-    logger.info('Info message.')
-    logger.warning('Warn message.')
-    logger.error('Error message.')
-    logger.critical('Critical message.')
-
-The outputs are identical to those printed using the ``print()``
-statement.
-
-.. code::
-
-    Debug message.
-    Info message.
-    Warn message.
-    Error message.
-    Critical message.
-
-By changing the format of the output messages, we can have more
-information about each message.
-
-.. code:: python
-
-    formatter = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
-
-This is a sample output.
-
-.. code:: python
-
-    2023-04-03 18:46:27 DEBUG: Debug message.
-    2023-04-03 18:46:27 INFO: Info message.
-    2023-04-03 18:46:27 WARNING: Warn message.
-    2023-04-03 18:46:27 ERROR: Error message.
-    2023-04-03 18:46:27 CRITICAL: Critical message.
-
-Note that you may need to clear the handlers after finishing the
-execution. This is to make sure no cached handlers are used in the next
-run. If the cached handlers are not cleared, then a single output
-message may be repeated.
-
-.. code:: python
-
-    logger.handlers.clear()
-
-Logging to a File
------------------
-
-This is another example to log the messages to a file named
-``logfile.txt``. The formatter prints the following about each message:
-
-1. The date and time at which the message is logged.
-
-2. The log level.
-
-3. The message.
-
-4. The path of the file.
-
-5. The line number of the log message.
-
-.. 
code:: python
-
-    import logging
-
-    level = logging.DEBUG
-    name = 'logfile.txt'
-
-    logger = logging.getLogger(name)
-    logger.setLevel(level)
-
-    file_handler = logging.FileHandler(name, 'a+', 'utf-8')
-    file_handler.setLevel(logging.DEBUG)
-    file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
-    file_handler.setFormatter(file_format)
-    logger.addHandler(file_handler)
-
-This is what the outputs look like.
-
-.. code:: python
-
-    2023-04-03 18:54:03 DEBUG: Debug message. - c:\users\agad069\desktop\logger\example2.py:46
-    2023-04-03 18:54:03 INFO: Info message. - c:\users\agad069\desktop\logger\example2.py:47
-    2023-04-03 18:54:03 WARNING: Warn message. - c:\users\agad069\desktop\logger\example2.py:48
-    2023-04-03 18:54:03 ERROR: Error message. - c:\users\agad069\desktop\logger\example2.py:49
-    2023-04-03 18:54:03 CRITICAL: Critical message. - c:\users\agad069\desktop\logger\example2.py:50
-
-Consider clearing the handlers if necessary.
-
-.. code:: python
-
-    logger.handlers.clear()
-
-Log to Both the Console and a File
-----------------------------------
-
-This is an example to create a single logger associated with 2 handlers:
-
-1. A file handler.
-
-2. A stream handler.
-
-.. code:: python
-
-    import logging
-
-    level = logging.DEBUG
-    name = 'logfile.txt'
-
-    logger = logging.getLogger(name)
-    logger.setLevel(level)
-
-    file_handler = logging.FileHandler(name, 'a+', 'utf-8')
-    file_handler.setLevel(logging.DEBUG)
-    file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
-    file_handler.setFormatter(file_format)
-    logger.addHandler(file_handler)
-
-    console_handler = logging.StreamHandler()
-    console_handler.setLevel(logging.INFO)
-    console_format = logging.Formatter('%(message)s')
-    console_handler.setFormatter(console_format)
-    logger.addHandler(console_handler)
-
-When a message is logged, it is both printed to the console and saved in
-``logfile.txt``.
-
-Consider clearing the handlers if necessary.
-
-.. code:: python
-
-    logger.handlers.clear()
-
-PyGAD Example
--------------
-
-To use the logger in PyGAD, just create your custom logger and pass it
-to the ``logger`` parameter.
-
-.. 
code:: python
-
-    import logging
-    import pygad
-    import numpy
-
-    level = logging.DEBUG
-    name = 'logfile.txt'
-
-    logger = logging.getLogger(name)
-    logger.setLevel(level)
-
-    file_handler = logging.FileHandler(name, 'a+', 'utf-8')
-    file_handler.setLevel(logging.DEBUG)
-    file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
-    file_handler.setFormatter(file_format)
-    logger.addHandler(file_handler)
-
-    console_handler = logging.StreamHandler()
-    console_handler.setLevel(logging.INFO)
-    console_format = logging.Formatter('%(message)s')
-    console_handler.setFormatter(console_format)
-    logger.addHandler(console_handler)
-
-    equation_inputs = [4, -2, 8]
-    desired_output = 2671.1234
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        output = numpy.sum(solution * equation_inputs)
-        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-        return fitness
-
-    def on_generation(ga_instance):
-        ga_instance.logger.info(f"Generation = {ga_instance.generations_completed}")
-        ga_instance.logger.info(f"Fitness = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}")
-
-    ga_instance = pygad.GA(num_generations=10,
-                           sol_per_pop=40,
-                           num_parents_mating=2,
-                           keep_parents=2,
-                           num_genes=len(equation_inputs),
-                           fitness_func=fitness_func,
-                           on_generation=on_generation,
-                           logger=logger)
-    ga_instance.run()
-
-    logger.handlers.clear()
-
-By executing this code, the logged messages are printed to the console
-and also saved in the text file.
-
-.. code:: python
-
-    2023-04-03 19:04:27 INFO: Generation = 1
-    2023-04-03 19:04:27 INFO: Fitness = 0.00038086960368076276
-    2023-04-03 19:04:27 INFO: Generation = 2
-    2023-04-03 19:04:27 INFO: Fitness = 0.00038214871408010853
-    2023-04-03 19:04:27 INFO: Generation = 3
-    2023-04-03 19:04:27 INFO: Fitness = 0.0003832795907974678
-    2023-04-03 19:04:27 INFO: Generation = 4
-    2023-04-03 19:04:27 INFO: Fitness = 0.00038398612055017196
-    2023-04-03 19:04:27 INFO: Generation = 5
-    2023-04-03 19:04:27 INFO: Fitness = 0.00038442348890867516
-    2023-04-03 19:04:27 INFO: Generation = 6
-    2023-04-03 19:04:27 INFO: Fitness = 0.0003854406039137763
-    2023-04-03 19:04:27 INFO: Generation = 7
-    2023-04-03 19:04:27 INFO: Fitness = 0.00038646083174063284
-    2023-04-03 19:04:27 INFO: Generation = 8
-    2023-04-03 19:04:27 INFO: Fitness = 0.0003875169193024936
-    2023-04-03 19:04:27 INFO: Generation = 9
-    2023-04-03 19:04:27 INFO: Fitness = 0.0003888816727311021
-    2023-04-03 19:04:27 INFO: Generation = 10
-    2023-04-03 19:04:27 INFO: Fitness = 0.000389832593101348
-
-Solve Non-Deterministic Problems
-================================
-
-PyGAD can be used to solve both deterministic and non-deterministic
-problems. Deterministic problems are those that return the same fitness
-for the same solution. For non-deterministic problems, a different
-fitness value may be returned for the same solution.
-
-By default, PyGAD's settings are set to solve deterministic problems.
-PyGAD can save the explored solutions and their fitness to reuse them in
-the future. These instance attributes can save the solutions:
-
-1. ``solutions``: Exists if ``save_solutions=True``.
-
-2. ``best_solutions``: Exists if ``save_best_solutions=True``.
-
-3. ``last_generation_elitism``: Exists if ``keep_elitism`` > 0.
-
-4. ``last_generation_parents``: Exists if ``keep_parents`` > 0 or
-   ``keep_parents=-1``.
-
-To configure PyGAD for non-deterministic problems, we have to disable
-saving the previous solutions. This is done by setting these parameters:
-
-1. 
``keep_elitism=0``
-
-2. ``keep_parents=0``
-
-3. ``save_solutions=False``
-
-4. ``save_best_solutions=False``
-
-.. code:: python
-
-    import pygad
-    ...
-    ga_instance = pygad.GA(...,
-                           keep_elitism=0,
-                           keep_parents=0,
-                           save_solutions=False,
-                           save_best_solutions=False,
-                           ...)
-
-This way PyGAD will not save any explored solution and thus the fitness
-function has to be called for each individual solution.
-
-Reuse the Fitness instead of Calling the Fitness Function
-=========================================================
-
-It may happen that a solution explored in generation X is explored again
-in another generation Y (where Y > X). For some problems, calling the
-fitness function takes much time.
-
-For deterministic problems, it is better not to call the fitness
-function for an already explored solution. Instead, reuse the fitness of
-the old solution. PyGAD supports some options to help you save the time
-of calling the fitness function for a previously explored solution.
-
-The parameters explored in this section can be set in the constructor of
-the ``pygad.GA`` class.
-
-The ``cal_pop_fitness()`` method of the ``pygad.GA`` class checks these
-parameters to see if there is a possibility of reusing the fitness
-instead of calling the fitness function.
-
-.. _1-savesolutions:
-
-1. ``save_solutions``
----------------------
-
-It defaults to ``False``. If set to ``True``, then the population of
-each generation is saved into the ``solutions`` attribute of the
-``pygad.GA`` instance. In other words, every single solution is saved in
-the ``solutions`` attribute.
-
-.. _2-savebestsolutions:
-
-2. ``save_best_solutions``
---------------------------
-
-It defaults to ``False``. If ``True``, then it only saves the best
-solution in every generation.
-
-.. _3-keepelitism:
-
-3. ``keep_elitism``
--------------------
-
-It accepts an integer and defaults to 1. If set to a positive integer,
-then it keeps the elitism of one generation available in the next
-generation.
-
-.. _4-keepparents:
-
-4. ``keep_parents``
--------------------
-
-It accepts an integer and defaults to -1. If set to ``-1`` or a positive
-integer, then it keeps the parents of one generation available in the
-next generation.
-
-Why the Fitness Function is not Called for the Solution at Index 0?
-===================================================================
-
-PyGAD has a parameter called ``keep_elitism`` which defaults to 1. This
-parameter defines the number of best solutions in generation **X** to
-keep in the next generation **X+1**. The best solutions are just copied
-from generation **X** to generation **X+1** without making any change.
-
-.. code:: python
-
-    ga_instance = pygad.GA(...,
-                           keep_elitism=1,
-                           ...)
-
-The best solutions are copied to the beginning of the population. If
-``keep_elitism=1``, this means the best solution in generation X is kept
-in the next generation X+1 at index 0 of the population. If
-``keep_elitism=2``, this means the 2 best solutions in generation X are
-kept in the next generation X+1 at indices 0 and 1 of the population.
-
-Because the fitness of these best solutions is already calculated in
-generation X, their fitness values will not be recalculated in
-generation X+1 (i.e. the fitness function will not be called for these
-solutions again). Instead, their fitness values are just reused. This is
-why you see that no solution with index 0 is passed to the fitness
-function.
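-
-The following minimal sketch makes this behavior visible. It is an
-illustrative example (the tracking dictionary and the equation reuse the
-quick-start problem) that records which solution indices actually reach
-the fitness function in each generation. With ``keep_elitism=1``, index
-0 is expected to appear only while the initial population is evaluated.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    function_inputs = [4, -2, 3.5, 5, -11, -4.7]
-    desired_output = 44
-
-    # Maps each generation to the solution indices that reached the fitness function.
-    indices_per_generation = {}
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        gen = ga_instance.generations_completed
-        indices_per_generation.setdefault(gen, []).append(solution_idx)
-        output = numpy.sum(solution*function_inputs)
-        return 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-
-    ga_instance = pygad.GA(num_generations=3,
-                           num_parents_mating=4,
-                           sol_per_pop=10,
-                           num_genes=len(function_inputs),
-                           keep_elitism=1,
-                           fitness_func=fitness_func)
-    ga_instance.run()
-
-    # Index 0 appears for the initial population only. Afterwards, it holds
-    # the copied elitism solution whose fitness is reused.
-    print(indices_per_generation)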
-
-To force calling the fitness function for each solution in every
-generation, consider setting ``keep_elitism`` and ``keep_parents`` to 0.
-Moreover, keep the 2 parameters ``save_solutions`` and
-``save_best_solutions`` at their default value ``False``.
-
-.. code:: python
-
-    ga_instance = pygad.GA(...,
-                           keep_elitism=0,
-                           keep_parents=0,
-                           save_solutions=False,
-                           save_best_solutions=False,
-                           ...)
-
-Batch Fitness Calculation
-=========================
-
-In `PyGAD
-2.19.0 `__,
-a new optional parameter called ``fitness_batch_size`` is supported to
-calculate the fitness function in batches. Thanks to `Linan
-Qiu `__ for opening the `GitHub issue
-#136 `__.
-
-Its values can be:
-
-- ``1`` or ``None``: If the ``fitness_batch_size`` parameter is assigned
-  the value ``1`` or ``None`` (default), then the normal flow is used
-  where the fitness function is called for each individual solution.
-  That is, if there are 15 solutions, then the fitness function is
-  called 15 times.
-
-- ``1 < fitness_batch_size <= sol_per_pop``: If the
-  ``fitness_batch_size`` parameter is assigned a value satisfying this
-  condition ``1 < fitness_batch_size <= sol_per_pop``, then the
-  solutions are grouped into batches of size ``fitness_batch_size`` and
-  the fitness function is called once for each batch. In this case, the
-  fitness function must return a list/tuple/numpy.ndarray with a length
-  equal to the number of solutions passed.
-
-.. _example-without-fitnessbatchsize-parameter:
-
-Example without ``fitness_batch_size`` Parameter
-------------------------------------------------
-
-This is an example where the ``fitness_batch_size`` parameter is given
-the value ``None`` (which is the default value). This is equivalent to
-using the value ``1``. In this case, the fitness function will be called
-for each solution. This means the fitness function ``fitness_func`` will
-receive only a single solution. This is an example of the arguments
-passed to the fitness function:
-
-.. code::
-
-    solution: [ 2.52860734, -0.94178795, 2.97545704, 0.84131987, -3.78447118, 2.41008358]
-    solution_idx: 3
-
-The fitness function must also return a single numeric value as the
-fitness for the passed solution.
-
-As we have a population of ``20`` solutions, the fitness function is
-called 20 times per generation. For 5 generations, the fitness function
-is called ``20*5 = 100`` times. In PyGAD, the fitness function is called
-after the last generation too, and this adds an additional 20 calls. So,
-the total number of calls to the fitness function is
-``20*5 + 20 = 120``.
-
-Note that the ``keep_elitism`` and ``keep_parents`` parameters are set
-to ``0`` to make sure no fitness values are reused and to force calling
-the fitness function for each individual solution.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    function_inputs = [4,-2,3.5,5,-11,-4.7]
-    desired_output = 44
-
-    number_of_calls = 0
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        global number_of_calls
-        number_of_calls = number_of_calls + 1
-        output = numpy.sum(solution*function_inputs)
-        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-        return fitness
-
-    ga_instance = pygad.GA(num_generations=5,
-                           num_parents_mating=10,
-                           sol_per_pop=20,
-                           fitness_func=fitness_func,
-                           fitness_batch_size=None,
-                           # fitness_batch_size=1,
-                           num_genes=len(function_inputs),
-                           keep_elitism=0,
-                           keep_parents=0)
-
-    ga_instance.run()
-    print(number_of_calls)
-
-.. 
code::
-
-    120
-
-.. _example-with-fitnessbatchsize-parameter:
-
-Example with ``fitness_batch_size`` Parameter
----------------------------------------------
-
-This is an example where the ``fitness_batch_size`` parameter is used
-and assigned the value ``4``. This means the solutions will be grouped
-into batches of ``4`` solutions. The fitness function will be called
-once for each batch (i.e. called once for each 4 solutions).
-
-This is an example of the arguments passed to it:
-
-.. code:: python
-
-    solutions:
-        [[ 3.1129432  -0.69123589  1.93792414  2.23772968 -1.54616001 -0.53930799]
-         [ 3.38508121  0.19890812  1.93792414  2.23095014 -3.08955597  3.10194128]
-         [ 2.37079504 -0.88819803  2.97545704  1.41742256 -3.95594055  2.45028256]
-         [ 2.52860734 -0.94178795  2.97545704  0.84131987 -3.78447118  2.41008358]]
-    solutions_indices:
-        [16, 17, 18, 19]
-
-As we have 20 solutions, there are ``20/4 = 5`` batches. As a result,
-the fitness function is called only 5 times per generation instead of
-20. For each call to the fitness function, it receives a batch of 4
-solutions.
-
-As we have 5 generations, the function will be called ``5*5 = 25``
-times. Given the call to the fitness function after the last generation,
-the total number of calls is ``5*5 + 5 = 30``.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    function_inputs = [4,-2,3.5,5,-11,-4.7]
-    desired_output = 44
-
-    number_of_calls = 0
-
-    def fitness_func_batch(ga_instance, solutions, solutions_indices):
-        global number_of_calls
-        number_of_calls = number_of_calls + 1
-        batch_fitness = []
-        for solution in solutions:
-            output = numpy.sum(solution*function_inputs)
-            fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
-            batch_fitness.append(fitness)
-        return batch_fitness
-
-    ga_instance = pygad.GA(num_generations=5,
-                           num_parents_mating=10,
-                           sol_per_pop=20,
-                           fitness_func=fitness_func_batch,
-                           fitness_batch_size=4,
-                           num_genes=len(function_inputs),
-                           keep_elitism=0,
-                           keep_parents=0)
-
-    ga_instance.run()
-    print(number_of_calls)
-
-.. code::
-
-    30
-
-When batch fitness calculation is used, we save ``120 - 30 = 90`` calls
-to the fitness function.
-
-Use Functions and Methods to Build Fitness and Callbacks
-========================================================
-
-In PyGAD 2.19.0, it is possible to pass user-defined functions or
-methods to the following parameters:
-
-1. ``fitness_func``
-
-2. ``on_start``
-
-3. ``on_fitness``
-
-4. ``on_parents``
-
-5. ``on_crossover``
-
-6. ``on_mutation``
-
-7. ``on_generation``
-
-8. ``on_stop``
-
-This section gives 2 examples of assigning user-defined:
-
-1. Functions.
-
-2. Methods.
-
-Assign Functions
-----------------
-
-This is a dummy example where the fitness function returns a random
-value. Note that the instance of the ``pygad.GA`` class is passed as the
-first parameter of all functions.
-
-.. 
code:: python
-
-    import pygad
-    import numpy
-
-    def fitness_func(ga_instance, solution, solution_idx):
-        return numpy.random.rand()
-
-    def on_start(ga_instance):
-        print("on_start")
-
-    def on_fitness(ga_instance, last_gen_fitness):
-        print("on_fitness")
-
-    def on_parents(ga_instance, last_gen_parents):
-        print("on_parents")
-
-    def on_crossover(ga_instance, last_gen_offspring):
-        print("on_crossover")
-
-    def on_mutation(ga_instance, last_gen_offspring):
-        print("on_mutation")
-
-    def on_generation(ga_instance):
-        print("on_generation\n")
-
-    def on_stop(ga_instance, last_gen_fitness):
-        print("on_stop")
-
-    ga_instance = pygad.GA(num_generations=5,
-                           num_parents_mating=4,
-                           sol_per_pop=10,
-                           num_genes=2,
-                           on_start=on_start,
-                           on_fitness=on_fitness,
-                           on_parents=on_parents,
-                           on_crossover=on_crossover,
-                           on_mutation=on_mutation,
-                           on_generation=on_generation,
-                           on_stop=on_stop,
-                           fitness_func=fitness_func)
-
-    ga_instance.run()
-
-Assign Methods
---------------
-
-The next example has all the methods defined inside the class ``Test``.
-All of the methods accept an additional first parameter ``self``
-representing the instance of the class ``Test``, followed by the
-instance of the ``pygad.GA`` class as the second parameter.
-
-.. code:: python
-
-    import pygad
-    import numpy
-
-    class Test:
-        def fitness_func(self, ga_instance, solution, solution_idx):
-            return numpy.random.rand()
-
-        def on_start(self, ga_instance):
-            print("on_start")
-
-        def on_fitness(self, ga_instance, last_gen_fitness):
-            print("on_fitness")
-
-        def on_parents(self, ga_instance, last_gen_parents):
-            print("on_parents")
-
-        def on_crossover(self, ga_instance, last_gen_offspring):
-            print("on_crossover")
-
-        def on_mutation(self, ga_instance, last_gen_offspring):
-            print("on_mutation")
-
-        def on_generation(self, ga_instance):
-            print("on_generation\n")
-
-        def on_stop(self, ga_instance, last_gen_fitness):
-            print("on_stop")
-
-    ga_instance = pygad.GA(num_generations=5,
-                           num_parents_mating=4,
-                           sol_per_pop=10,
-                           num_genes=2,
-                           on_start=Test().on_start,
-                           on_fitness=Test().on_fitness,
-                           on_parents=Test().on_parents,
-                           on_crossover=Test().on_crossover,
-                           on_mutation=Test().on_mutation,
-                           on_generation=Test().on_generation,
-                           on_stop=Test().on_stop,
-                           fitness_func=Test().fitness_func)
-
-    ga_instance.run()
-
-.. |image1| image:: https://github.com/ahmedfgad/GeneticAlgorithmPython/assets/16560492/7896f8d8-01c5-4ff9-8d15-52191c309b63
-.. |image2| image:: https://user-images.githubusercontent.com/16560492/189273225-67ffad41-97ab-45e1-9324-429705e17b20.png
+More About PyGAD
+================
+
+Multi-Objective Optimization
+============================
+
+In `PyGAD
+3.2.0 `__,
+the library supports multi-objective optimization using the
+non-dominated sorting genetic algorithm II (NSGA-II). The code is
+exactly the same as the regular code used for single-objective
+optimization except for 1 difference: the return value of the fitness
+function.
+
+In single-objective optimization, the fitness function returns a single
+numeric value. In this example, the variable ``fitness`` is expected to
+be a numeric value.
+
+.. code:: python
+
+    def fitness_func(ga_instance, solution, solution_idx):
+        ...
+        return fitness
+
+But in multi-objective optimization, the fitness function returns any of
+these data types:
+
+1. ``list``
+
+2. ``tuple``
+
+3. ``numpy.ndarray``
+
+.. code:: python
+
+    def fitness_func(ga_instance, solution, solution_idx):
+        ...
+        return [fitness1, fitness2, ..., fitnessN]
+
+Whenever the fitness function returns an iterable of these data types,
+the problem is considered multi-objective. This holds even if there is a
+single element in the returned iterable.
+
+Other than the fitness function, everything else could be the same in
+both single and multi-objective problems.
+
+But it is recommended to use one of these 2 parent selection operators
+to solve multi-objective problems:
+
+1. ``nsga2``: This selects the parents based on non-dominated sorting
+   and crowding distance.
+
+2. ``tournament_nsga2``: This selects the parents using tournament
+   selection, which uses non-dominated sorting and crowding distance to
+   rank the solutions.
+
+This is a multi-objective optimization example that optimizes these 2
+linear functions:
+
+1. ``y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6``
+
+2. ``y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12``
+
+Where:
+
+1. ``(x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7)`` and ``y1=50``
+
+2. ``(x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5)`` and ``y2=30``
+
+The 2 functions use the same parameters (weights) ``w1`` to ``w6``.
+
+The goal is to use PyGAD to find the optimal values for such weights
+that satisfy the 2 functions ``y1`` and ``y2``.
+
+.. code:: python
+
+    import pygad
+    import numpy
+
+    """
+    Given these 2 functions:
+        y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
+        y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12
+        where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y1=50
+        and (x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5) and y2=30
+    What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize these 2 functions.
+    This is a multi-objective optimization problem.
+
+    PyGAD considers the problem as multi-objective if the fitness function returns:
+    1) List.
+    2) Or tuple.
+    3) Or numpy.ndarray.
+    """
+
+    function_inputs1 = [4,-2,3.5,5,-11,-4.7]  # Function 1 inputs.
+    function_inputs2 = [-2,0.7,-9,1.4,3,5]    # Function 2 inputs.
+    desired_output1 = 50  # Function 1 output.
+    desired_output2 = 30  # Function 2 output.
+
+    def fitness_func(ga_instance, solution, solution_idx):
+        output1 = numpy.sum(solution*function_inputs1)
+        output2 = numpy.sum(solution*function_inputs2)
+        fitness1 = 1.0 / (numpy.abs(output1 - desired_output1) + 0.000001)
+        fitness2 = 1.0 / (numpy.abs(output2 - desired_output2) + 0.000001)
+        return [fitness1, fitness2]
+
+    num_generations = 100
+    num_parents_mating = 10
+
+    sol_per_pop = 20
+    num_genes = len(function_inputs1)
+
+    ga_instance = pygad.GA(num_generations=num_generations,
+                           num_parents_mating=num_parents_mating,
+                           sol_per_pop=sol_per_pop,
+                           num_genes=num_genes,
+                           fitness_func=fitness_func,
+                           parent_selection_type='nsga2')
+
+    ga_instance.run()
+
+    ga_instance.plot_fitness(label=['Obj 1', 'Obj 2'])
+
+    solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness)
+    print(f"Parameters of the best solution : {solution}")
+    print(f"Fitness value of the best solution = {solution_fitness}")
+
+    prediction = numpy.sum(numpy.array(function_inputs1)*solution)
+    print(f"Predicted output 1 based on the best solution : {prediction}")
+    prediction = numpy.sum(numpy.array(function_inputs2)*solution)
+    print(f"Predicted output 2 based on the best solution : {prediction}")
+
+This is the result of the print statements. The predicted outputs are
+close to the desired outputs.
+
+.. 
code::
+
+    Parameters of the best solution : [ 0.79676439 -2.98823386 -4.12677662  5.70539445 -2.02797016 -1.07243922]
+    Fitness value of the best solution = [  1.68090829 349.8591915 ]
+    Predicted output 1 based on the best solution : 50.59491545442283
+    Predicted output 2 based on the best solution : 29.99714270722312
+
+This is the figure created by the ``plot_fitness()`` method. The fitness
+of the first objective is drawn in green and the fitness of the second
+objective in blue.
+
+|image1|
+
+.. _limit-the-gene-value-range-using-the-genespace-parameter:
+
+Limit the Gene Value Range using the ``gene_space`` Parameter
+=============================================================
+
+In `PyGAD
+2.11.0 `__,
+the ``gene_space`` parameter supported a new feature to allow
+customizing the range of accepted values for each gene. Let's take a
+quick review of the ``gene_space`` parameter to build over it.
+
+The ``gene_space`` parameter allows the user to feed the space of values
+for each gene. This way the accepted values for each gene are restricted
+to the user-defined values. Assume there is a problem that has 3 genes
+where each gene has a different set of values as follows:
+
+1. Gene 1: ``[0.4, 12, -5, 21.2]``
+
+2. Gene 2: ``[-2, 0.3]``
+
+3. Gene 3: ``[1.2, 63.2, 7.4]``
+
+Then, the ``gene_space`` for this problem is as given below. Note that
+the order is very important.
+
+.. code:: python
+
+    gene_space = [[0.4, 12, -5, 21.2],
+                  [-2, 0.3],
+                  [1.2, 63.2, 7.4]]
+
+In case all genes share the same set of values, simply feed a single
+list to the ``gene_space`` parameter as follows. In this case, all genes
+can only take values from this list of 6 values.
+
+.. code:: python
+
+    gene_space = [33, 7, 0.5, 95, 6.3, 0.74]
+
+The previous example restricts the gene values to a fixed set of
+discrete values. In case you want to use a range of discrete values for
+the gene, you can use the ``range()`` function. For example,
+``range(1, 7)`` means the set of allowed values for the gene is
+``1, 2, 3, 4, 5, and 6``. You can also use the ``numpy.arange()`` or
+``numpy.linspace()`` functions for the same purpose.
+
+The previous discussion only covers discrete values, not continuous
+values. In `PyGAD
+2.11.0 `__,
+the ``gene_space`` parameter can be assigned a dictionary that allows
+the gene to have values from a continuous range.
+
+Assuming you want to restrict the gene within this half-open range [1 to
+5) where 1 is included and 5 is not, simply create a dictionary with 2
+items where the keys of the 2 items are:
+
+1. ``'low'``: The minimum value in the range, which is 1 in the example.
+
+2. ``'high'``: The maximum value in the range, which is 5 in the
+   example.
+
+The dictionary will look like this:
+
+.. code:: python
+
+    {'low': 1,
+     'high': 5}
+
+Apart from the optional ``'step'`` key discussed later, it is not
+acceptable to add more items to the dictionary or to use keys other than
+``'low'`` and ``'high'``.
+
+For a 3-gene problem, the next code creates a dictionary for each gene
+to restrict its values to a continuous range. The first gene can take
+any floating-point value from the range that starts from 1 (inclusive)
+and ends at 5 (exclusive).
+
+.. code:: python
+
+    gene_space = [{'low': 1, 'high': 5}, {'low': 0.3, 'high': 1.4}, {'low': -0.2, 'high': 4.5}]
+
+.. _more-about-the-genespace-parameter:
+
+More about the ``gene_space`` Parameter
+=======================================
+
+The ``gene_space`` parameter customizes the space of values of each
+gene. 
+
+Assuming that all genes have the same global space which includes the
+values 0.3, 5.2, -4, and 8, then those values can be assigned to the
+``gene_space`` parameter as a list, tuple, or range. Here is a list
+assigned to this parameter. By doing that, the gene values are
+restricted to those assigned to the ``gene_space`` parameter.
+
+.. code:: python
+
+    gene_space = [0.3, 5.2, -4, 8]
+
+If some genes have different spaces, then ``gene_space`` should accept a
+nested list or tuple. In this case, the elements could be:
+
+1. A number (of ``int``, ``float``, or ``NumPy`` data types): A single
+   value to be assigned to the gene. This means this gene will have the
+   same value across all generations.
+
+2. ``list``, ``tuple``, ``numpy.ndarray``, or any range like ``range``,
+   ``numpy.arange()``, or ``numpy.linspace``: It holds the space for
+   each individual gene. But this space is usually discrete. That is,
+   there is a finite set of values to select from.
+
+3. ``dict``: To sample a value for a gene from a continuous range. The
+   dictionary must have 2 mandatory keys, which are ``"low"`` and
+   ``"high"``, in addition to an optional key, which is ``"step"``. A
+   random value is returned between the values assigned to the items
+   with ``"low"`` and ``"high"`` keys. If the ``"step"`` key exists,
+   then this works like the previous options (i.e. a discrete set of
+   values).
+
+4. ``None``: A gene with its space set to ``None`` is initialized
+   randomly from the range specified by the 2 parameters
+   ``init_range_low`` and ``init_range_high``. For mutation, its value
+   is mutated based on a random value from the range specified by the 2
+   parameters ``random_mutation_min_val`` and
+   ``random_mutation_max_val``. If all elements in the ``gene_space``
+   parameter are ``None``, the parameter will not have any effect.
+
+Assuming that a chromosome has 2 genes where each gene has a different
+value space, the ``gene_space`` could be assigned a nested list/tuple
+where each element determines the space of a gene.
+
+According to the next code, the space of the first gene is ``[0.4, -5]``
+which has 2 values and the space for the second gene is
+``[0.5, -3.2, 8.2, -9]`` which has 4 values.
+
+.. code:: python
+
+    gene_space = [[0.4, -5], [0.5, -3.2, 8.2, -9]]
+
+For a 2-gene chromosome, if the first gene space is restricted to the
+discrete values from 0 to 4 and the second gene is restricted to the
+values from 10 to 19, then it could be specified according to the next
+code.
+
+.. code:: python
+
+    gene_space = [range(5), range(10, 20)]
+
+The ``gene_space`` can also be assigned a single range, as given below,
+where the values of all genes are sampled from the same range.
+
+.. code:: python
+
+    gene_space = numpy.arange(15)
+
+The ``gene_space`` can be assigned a dictionary to sample a value from a
+continuous range.
+
+.. code:: python
+
+    gene_space = {"low": 4, "high": 30}
+
+A step can also be assigned to the dictionary. This works as if a range
+is used.
+
+.. code:: python
+
+    gene_space = {"low": 4, "high": 30, "step": 2.5}
+
+..
+
+   Setting a ``dict`` like ``{"low": 0, "high": 10}`` in the
+   ``gene_space`` means that random values from the continuous range [0,
+   10) are sampled. Note that ``0`` is included but ``10`` is not
+   included while sampling. Thus, the maximum value that could be
+   returned is less than ``10``, like ``9.9999``. But if the user
+   decided to round the genes using, for example, ``[float, 2]``, then
+   this value will become 10. So, the user should be careful about the
+   inputs.
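+
+To see these options side by side, here is a minimal sketch (the dummy
+fitness function and all parameter values are illustrative) that mixes a
+discrete list, a range, a continuous dictionary, and a stepped
+dictionary in a single ``gene_space``:
+
+.. code:: python
+
+    import pygad
+
+    def fitness_func(ga_instance, solution, solution_idx):
+        # Dummy fitness; the goal is only to inspect the sampled gene values.
+        return 0
+
+    ga_instance = pygad.GA(num_generations=3,
+                           num_parents_mating=2,
+                           sol_per_pop=5,
+                           num_genes=4,
+                           fitness_func=fitness_func,
+                           gene_space=[[0.4, -5],                             # discrete list
+                                       range(10, 20),                         # discrete range
+                                       {"low": 4, "high": 30},                # continuous range
+                                       {"low": 4, "high": 30, "step": 2.5}])  # discrete steps
+
+    # Each column of the initial population respects the space of its gene.
+    print(ga_instance.initial_population)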
+
+If ``None`` is assigned to only a single gene, then its value will be
+randomly generated initially using the ``init_range_low`` and
+``init_range_high`` parameters in the ``pygad.GA`` class's constructor.
+During mutation, the value is sampled from the range defined by the 2
+parameters ``random_mutation_min_val`` and ``random_mutation_max_val``.
+This is an example where the second gene is given a ``None`` value.
+
+.. code:: python
+
+    gene_space = [range(5), None, numpy.linspace(10, 20, 300)]
+
+If the user did not assign the initial population to the
+``initial_population`` parameter, the initial population is created
+randomly based on the ``gene_space`` parameter. Moreover, the mutation
+is applied based on this parameter.
+
+.. _how-mutation-works-with-the-genespace-parameter:
+
+How Mutation Works with the ``gene_space`` Parameter
+----------------------------------------------------
+
+Mutation works differently based on whether the ``gene_space`` has a
+continuous range or a discrete set of values.
+
+If a gene has its **static/discrete space** defined in the
+``gene_space`` parameter, then mutation works by replacing the gene
+value by a value randomly selected from the gene space. This happens for
+both ``int`` and ``float`` data types.
+
+For example, the following ``gene_space`` has the static space
+``[1, 2, 3]`` defined for the first gene. So, this gene can only have a
+value out of these 3 values.
+
+.. code:: python
+
+    Gene space: [[1, 2, 3],
+                 None]
+    Solution: [1, 5]
+
+For a solution like ``[1, 5]``, mutation happens for the first gene by
+simply replacing its current value by a randomly selected value (other
+than its current value if possible). So, the value 1 will be replaced by
+either 2 or 3.
+
+For the second gene, its space is set to ``None``. So, traditional
+mutation happens for this gene by:
+
+1. Generating a random value from the range defined by the
+   ``random_mutation_min_val`` and ``random_mutation_max_val``
+   parameters.
+
+2. Adding this random value to the current gene's value.
+
+If its current value is 5 and the random value is ``-0.5``, then the new
+value is 4.5. If the gene type is integer, then the value will be
+rounded.
+
+On the other hand, if a gene has a **continuous space** defined in the
+``gene_space`` parameter, then mutation occurs by adding a random value
+to the current gene value.
+
+For example, the following ``gene_space`` has the continuous space
+defined by the dictionary ``{'low': 1, 'high': 5}``. This applies to all
+genes. So, mutation is applied to one or more selected genes by adding a
+random value to the current gene value.
+
+.. code:: python
+
+    Gene space: {'low': 1, 'high': 5}
+    Solution: [1.5, 3.4]
+
+Assuming ``random_mutation_min_val=-1`` and
+``random_mutation_max_val=1``, then a random value such as ``0.3`` can
+be added to the gene(s) participating in mutation. If only the first
+gene is mutated, then its new value changes from ``1.5`` to
+``1.5+0.3=1.8``. Note that PyGAD verifies that the new value is within
+the range. In the worst scenarios, the value will be set to either
+boundary of the continuous range. For example, if the gene value is 1.5
+and the random value is -0.55, then the new value is 0.95, which is
+smaller than the lower boundary 1. Thus, the gene value will be set to
+1.
+
+If the dictionary has a step like the example below, then it is
+considered a discrete range and mutation occurs by randomly selecting a
+value from the set of values. 
+
+Gene Constraint
+===============
+
+In `PyGAD
+3.5.0 `__,
+a new parameter called ``gene_constraint`` is added to the constructor
+of the ``pygad.GA`` class. An instance attribute of the same name is
+created for any instance of the ``pygad.GA`` class.
+
+The ``gene_constraint`` parameter allows the user to define constraints
+to be enforced (as much as possible) when selecting a value for a gene.
+For example, this constraint is enforced when applying mutation to make
+sure the new gene value after mutation meets the gene constraint.
+
+The default value of this parameter is ``None`` which means no genes
+have constraints. It can be assigned a list but the length of this list
+must be equal to the number of genes as specified by the ``num_genes``
+parameter.
+
+When assigned a list, the allowed values for each element are:
+
+1. ``None``: No constraint for the gene.
+
+2. ``callable``: A callable/function that accepts 2 parameters:
+
+   1. The solution where the gene exists.
+
+   2. A list or NumPy array of candidate values for the gene.
+
+It is the user's responsibility to build such callables to filter the
+passed list of values and return a new list with the values that meet
+the gene constraint. If no value meets the constraint, return an empty
+list or NumPy array.
+
+For example, if the gene must be smaller than 5, then use this callable:
+
+.. code:: python
+
+   lambda solution, values: [val for val in values if val < 5]
+
+The first parameter is the solution where the target gene exists. It is
+passed just in case you would like to compare the gene value with other
+genes. The second parameter is the list of candidate values for the
+gene. The objective of the lambda function is to filter the values and
+return only the valid values that are less than 5.
+
+A lambda function is used in this case but we can use a regular
+function:
+
+.. code:: python
+
+   def constraint_func(solution, values):
+       return [val for val in values if val < 5]
+
+Assuming ``num_genes`` is 2, then here is a valid value for the
+``gene_constraint`` parameter.
+
+.. code:: python
+
+   import pygad
+
+   def fitness_func(...):
+       ...
+       return fitness
+
+   ga_instance = pygad.GA(
+       num_genes=2,
+       sample_size=200,
+       ...
+       gene_constraint=
+       [
+           lambda solution, values: [val for val in values if val < 5],
+           lambda solution, values: [val for val in values if val > solution[0]]
+       ]
+   )
+
+The first lambda function filters the values for the first gene by only
+considering the gene values that are less than 5. If the passed values
+are ``[-5, 2, 6, 13, 3, 4, 0]``, then the returned filtered values will
+be ``[-5, 2, 3, 4, 0]``.
+
+The constraint for the second gene makes sure the selected value is
+larger than the value of the first gene. Assuming the values for the 2
+parameters are:
+
+1. ``solution=[1, 4]``
+
+2. ``values=[17, 2, -1, 0.5, -2.1, 1.4]``
+
+Then the value of the first gene in the passed solution is ``1``. By
+filtering the passed values using the callable corresponding to the
+second gene, the returned values will be ``[17, 2, 1.4]`` because these
+are the only values that are larger than the first gene value of ``1``.
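+
+Here is a minimal runnable sketch that puts these pieces together. The
+fitness function and all numeric values are assumptions used only for
+demonstration; for the official example, check the script referenced in
+the next subsection.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       # A dummy fitness: maximize the sum of the 2 genes.
+       return numpy.sum(solution)
+
+   ga_instance = pygad.GA(num_generations=10,
+                          num_parents_mating=2,
+                          sol_per_pop=5,
+                          num_genes=2,
+                          fitness_func=fitness_func,
+                          sample_size=200,
+                          gene_constraint=[
+                              # Gene 0 must be less than 5.
+                              lambda solution, values: [val for val in values if val < 5],
+                              # Gene 1 must be larger than gene 0.
+                              lambda solution, values: [val for val in values if val > solution[0]]
+                          ])
+   ga_instance.run()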
+Sometimes it is normal for PyGAD to fail to find a gene value that
+satisfies the constraint. For example, if the possible gene values are
+only ``[20, 30, 40]`` and the gene constraint restricts the values to be
+greater than 50, then it is impossible to meet the constraint.
+
+For some other cases, the constraint can be met after some changes. One
+example is increasing the range from which a value is sampled. If the
+``gene_space`` is used and assigned ``range(10)``, then the gene
+constraint can be met by using a wider range like ``range(100)`` so that
+values greater than 50 exist.
+
+Even if the gene space is already assigned ``range(1000)``, PyGAD might
+still not find values meeting the constraint. This is because PyGAD
+samples a number of values equal to the ``sample_size`` parameter which
+defaults to *100*.
+
+Out of the range of *1000* numbers, none of the 100 sampled values might
+satisfy the constraint. This issue could be solved by simply assigning a
+larger value to the ``sample_size`` parameter.
+
+   PyGAD does not yet handle the **dependencies** among the genes in the
+   ``gene_constraint`` parameter.
+
+   For example, gene 0 might depend on gene 1. To efficiently enforce
+   the constraints, the constraint for gene 1 must be enforced first (if
+   not ``None``), then the constraint for gene 0.
+
+   PyGAD applies constraints sequentially, starting from the first gene
+   to the last. To ensure correct behavior when genes depend on each
+   other, structure your GA problem so that if gene X depends on gene Y,
+   then gene Y appears earlier in the chromosome (solution) than gene X.
+
+Full Example
+------------
+
+For a full example, please check the
+``examples/example_gene_constraint.py`` script.
+
+.. _samplesize-parameter:
+
+``sample_size`` Parameter
+=========================
+
+In `PyGAD
+3.5.0 `__,
+a new parameter called ``sample_size`` is added. It is used in some
+situations where PyGAD seeks a single value for a gene out of a range.
+Two of the important use cases are:
+
+1. Find a unique value for the gene. This is when the
+   ``allow_duplicate_genes`` parameter is set to ``False`` to reject the
+   duplicate gene values within the same solution.
+
+2. Find a value that satisfies the ``gene_constraint`` parameter.
+
+Given that we are sampling values from a continuous range as defined by
+the 2 attributes:
+
+1. ``random_mutation_min_val=0``
+
+2. ``random_mutation_max_val=100``
+
+PyGAD samples a fixed number of values out of this continuous range. The
+number of values in the sample is defined by the ``sample_size``
+parameter which defaults to ``100``.
+
+If the objective is to find a unique value or enforce the gene
+constraint, then the 100 values are filtered to keep only the values
+that keep the gene unique or meet the constraint.
+
+Sometimes 100 values are not enough and PyGAD fails to find a good
+value. In this case, it is highly recommended to increase the
+``sample_size`` parameter to create a larger sample and increase the
+chance of finding a value that meets our objectives.
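+
+As a hedged sketch of the typical fix, the snippet below only shows
+where the parameter goes; the surrounding values are assumptions for
+demonstration.
+
+.. code:: python
+
+   import pygad
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       return 0
+
+   # Duplicates are rejected, so PyGAD must search for unique gene
+   # values. Raising sample_size from the default 100 gives the search
+   # a larger pool of candidate values.
+   ga_instance = pygad.GA(num_generations=10,
+                          num_parents_mating=2,
+                          sol_per_pop=5,
+                          num_genes=4,
+                          fitness_func=fitness_func,
+                          gene_type=int,
+                          allow_duplicate_genes=False,
+                          sample_size=500)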
+
+Stop at Any Generation
+======================
+
+In `PyGAD
+2.4.0 `__,
+it is possible to stop the genetic algorithm after any generation. All
+you need to do is to return the string ``"stop"`` in the callback
+function ``on_generation``. When this callback function is implemented
+and assigned to the ``on_generation`` parameter in the constructor of
+the ``pygad.GA`` class, then the algorithm immediately stops after
+completing its current generation. Let's discuss an example.
+
+Assume that the user wants to stop the algorithm either after 100
+generations or when a condition is met. The user may assign a value of
+100 to the ``num_generations`` parameter of the ``pygad.GA`` class
+constructor.
+
+The condition that stops the algorithm is written in a callback function
+like the one in the next code. If the fitness value of the best solution
+exceeds 70, then the string ``"stop"`` is returned.
+
+.. code:: python
+
+   def func_generation(ga_instance):
+       if ga_instance.best_solution()[1] >= 70:
+           return "stop"
+
+Stop Criteria
+=============
+
+In `PyGAD
+2.15.0 `__,
+a new parameter named ``stop_criteria`` is added to the constructor of
+the ``pygad.GA`` class. It helps to stop the evolution based on some
+criteria. It can be assigned one or more criteria.
+
+Each criterion is passed as a ``str`` that consists of 2 parts:
+
+1. Stop word.
+
+2. Number.
+
+It takes this form:
+
+.. code:: python
+
+   "word_num"
+
+The current 2 supported words are ``reach`` and ``saturate``.
+
+The ``reach`` word stops the ``run()`` method if the fitness value is
+equal to or greater than a given fitness value. An example for ``reach``
+is ``"reach_40"`` which stops the evolution if the fitness is >= 40.
+
+``saturate`` stops the evolution if the fitness saturates for a given
+number of consecutive generations. An example for ``saturate`` is
+``"saturate_7"`` which means stop the ``run()`` method if the fitness
+does not change for 7 consecutive generations.
+
+Here is an example that stops the evolution if either the fitness value
+reaches ``127.4`` or the fitness saturates for ``15`` generations.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   equation_inputs = [4, -2, 3.5, 8, 9, 4]
+   desired_output = 44
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution * equation_inputs)
+
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=200,
+                          sol_per_pop=10,
+                          num_parents_mating=4,
+                          num_genes=len(equation_inputs),
+                          fitness_func=fitness_func,
+                          stop_criteria=["reach_127.4", "saturate_15"])
+
+   ga_instance.run()
+   print(f"Number of generations passed is {ga_instance.generations_completed}")
+
+Multi-Objective Stop Criteria
+-----------------------------
+
+When multi-objective optimization is used, there are 2 options to use
+the ``stop_criteria`` parameter with the ``reach`` keyword:
+
+1. Pass a single value with the ``reach`` keyword to apply across all
+   the objectives.
+
+2. Pass multiple values with the ``reach`` keyword. But the number of
+   values must equal the number of objectives.
+
+The ``saturate`` keyword is independent of the number of objectives.
+
+Suppose there are 3 objectives; this is a working example. It stops when
+the fitness values of the 3 objectives reach or exceed 10, 20, and 30,
+respectively.
+
+.. code:: python
+
+   stop_criteria='reach_10_20_30'
+
+More than one criterion can be used together. In this case, pass the
+``stop_criteria`` parameter as an iterable. This is an example. It stops
+when either of these 2 conditions holds:
+
+1. The fitness values of the 3 objectives reach or exceed 10, 20, and
+   30, respectively.
+
+2. The fitness values of the 3 objectives reach or exceed 90, -5.7, and
+   10, respectively.
+
+.. code:: python
+
+   stop_criteria=['reach_10_20_30', 'reach_90_-5.7_10']
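+
+To make the multi-objective case concrete, here is a hedged runnable
+sketch. The 3-objective fitness function below is an assumption for
+demonstration; any fitness function returning a list of 3 values fits.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       # Returning a list of 3 values makes the problem multi-objective.
+       return [numpy.sum(solution), numpy.mean(solution), numpy.max(solution)]
+
+   # Stop once the 3 objectives reach or exceed 10, 20, and 30,
+   # respectively, or after 100 generations.
+   ga_instance = pygad.GA(num_generations=100,
+                          num_parents_mating=4,
+                          sol_per_pop=10,
+                          num_genes=6,
+                          fitness_func=fitness_func,
+                          stop_criteria='reach_10_20_30')
+   ga_instance.run()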
+
+Elitism Selection
+=================
+
+In `PyGAD
+2.18.0 `__,
+a new parameter called ``keep_elitism`` is supported. It accepts an
+integer to define the number of elite (i.e. best) solutions to keep in
+the next generation. This parameter defaults to ``1`` which means only
+the best solution is kept in the next generation.
+
+In the next example, the ``keep_elitism`` parameter in the constructor
+of the ``pygad.GA`` class is set to 2. Thus, the best 2 solutions in
+each generation are kept in the next generation.
+
+.. code:: python
+
+   import numpy
+   import pygad
+
+   function_inputs = [4,-2,3.5,5,-11,-4.7]
+   desired_output = 44
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution*function_inputs)
+       fitness = 1.0 / numpy.abs(output - desired_output)
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=2,
+                          num_parents_mating=3,
+                          fitness_func=fitness_func,
+                          num_genes=6,
+                          sol_per_pop=5,
+                          keep_elitism=2)
+
+   ga_instance.run()
+
+The value passed to the ``keep_elitism`` parameter must satisfy 2
+conditions:
+
+1. It must be ``>= 0``.
+
+2. It must be ``<= sol_per_pop``. That is, its value cannot exceed the
+   number of solutions in the current population.
+
+In the previous example, if the ``keep_elitism`` parameter is set equal
+to the value passed to the ``sol_per_pop`` parameter, which is 5, then
+there will be no evolution at all as in the next figure. This is because
+all the 5 solutions are used as elite solutions in the next generation
+and no offspring will be created.
+
+.. code:: python
+
+   ...
+
+   ga_instance = pygad.GA(...,
+                          sol_per_pop=5,
+                          keep_elitism=5)
+
+   ga_instance.run()
+
+|image2|
+
+Note that if the ``keep_elitism`` parameter is effective (i.e. is
+assigned a positive integer, not zero), then the ``keep_parents``
+parameter will have no effect. Because the default value of the
+``keep_elitism`` parameter is 1, the ``keep_parents`` parameter has no
+effect by default. The ``keep_parents`` parameter is only effective when
+``keep_elitism=0``.
+
+Random Seed
+===========
+
+In `PyGAD
+2.18.0 `__,
+a new parameter called ``random_seed`` is supported. Its value is used
+as a seed for the random number generators.
+
+PyGAD uses random functions in these 2 libraries:
+
+1. NumPy
+
+2. random
+
+The ``random_seed`` parameter defaults to ``None`` which means no seed
+is used. As a result, different random numbers are generated for each
+run of PyGAD.
+
+If this parameter is assigned a proper seed, then the results will be
+reproducible. In the next example, the integer 2 is used as a random
+seed.
+
+.. code:: python
+
+   import numpy
+   import pygad
+
+   function_inputs = [4,-2,3.5,5,-11,-4.7]
+   desired_output = 44
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution*function_inputs)
+       fitness = 1.0 / numpy.abs(output - desired_output)
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=2,
+                          num_parents_mating=3,
+                          fitness_func=fitness_func,
+                          sol_per_pop=5,
+                          num_genes=6,
+                          random_seed=2)
+
+   ga_instance.run()
+   best_solution, best_solution_fitness, best_match_idx = ga_instance.best_solution()
+   print(best_solution)
+   print(best_solution_fitness)
+
+This is the best solution found and its fitness value.
+
+.. code::
+
+   [ 2.77249188 -4.06570662  0.04196872 -3.47770796 -0.57502138 -3.22775267]
+   0.04872203136549972
+
+After running the code again, it will find the same result.
+
+.. code::
+
+   [ 2.77249188 -4.06570662  0.04196872 -3.47770796 -0.57502138 -3.22775267]
+   0.04872203136549972
+
+Continue without Losing Progress
+================================
+
+In `PyGAD
+2.18.0 `__,
+and thanks to `Felix Bernhard `__ for
+opening `this GitHub
+issue `__,
+the values of these 4 instance attributes are no longer reset after each
+call to the ``run()`` method.
+
+1. ``self.best_solutions``
+
+2. ``self.best_solutions_fitness``
+
+3. ``self.solutions``
+
+4. ``self.solutions_fitness``
+
+This helps the user to continue where the last run stopped without
+losing the values of these 4 attributes.
+
+Now, the user can save the model by calling the ``save()`` method.
+
+.. code:: python
+
+   import pygad
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       ...
+       return fitness
+
+   ga_instance = pygad.GA(...)
+
+   ga_instance.run()
+
+   ga_instance.plot_fitness()
+
+   ga_instance.save("pygad_GA")
+
+Then the saved model is loaded by calling the ``load()`` function. After
+calling the ``run()`` method over the loaded instance, the data in the
+previous 4 attributes is not reset but extended with the new data.
+
+.. code:: python
+
+   import pygad
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       ...
+       return fitness
+
+   loaded_ga_instance = pygad.load("pygad_GA")
+
+   loaded_ga_instance.run()
+
+   loaded_ga_instance.plot_fitness()
+
+The plot created by the ``plot_fitness()`` method will show the data
+collected from both runs.
+
+Note that the 2 attributes (``self.best_solutions`` and
+``self.best_solutions_fitness``) only work if the
+``save_best_solutions`` parameter is set to ``True``. Also, the 2
+attributes (``self.solutions`` and ``self.solutions_fitness``) only work
+if the ``save_solutions`` parameter is ``True``.
+
+Change Population Size during Runtime
+=====================================
+
+Starting from `PyGAD
+3.3.0 `__,
+the population size can be changed during runtime. In other words, the
+number of solutions/chromosomes and the number of genes can be changed.
+
+The user has to carefully arrange the list of *parameters* and *instance
+attributes* that have to be changed to keep the GA consistent before and
+after changing the population size. Generally, change everything that
+would be used during the GA evolution.
+
+   CAUTION: If the user fails to change a parameter or an instance
+   attribute necessary to keep the GA running after the population size
+   is changed, errors will arise.
+
+These are examples of the parameters that the user should decide whether
+to change. The user should check the `list of
+parameters `__
+and decide what to change.
+
+1. ``population``: The population. It *must* be changed.
+
+2. ``num_offspring``: The number of offspring to produce out of the
+   crossover and mutation operations. Change this parameter if the
+   number of offspring has to be changed to be consistent with the new
+   population size.
+
+3. ``num_parents_mating``: The number of solutions to select as parents.
+   Change this parameter if the number of parents has to be changed to
+   be consistent with the new population size.
+
+4. ``fitness_func``: If the way of calculating the fitness changes after
+   the new population size, then the fitness function has to be changed.
+
+5. ``sol_per_pop``: The number of solutions per population. It is not
+   critical to change it but it is recommended to keep this number
+   consistent with the number of solutions in the ``population``
+   parameter.
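+
+Below is a hedged sketch, not an official PyGAD example, of how some of
+these parameters (and a couple of the instance attributes listed next)
+might be updated inside an ``on_generation`` callback to grow the
+population. The numbers are assumptions, and depending on the
+configuration, more attributes may need to be updated.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       return numpy.sum(solution)
+
+   def on_generation(ga_instance):
+       # Assumed scenario: grow the population from 5 to 10 solutions
+       # after generation 5.
+       if ga_instance.generations_completed == 5:
+           old_population = ga_instance.population
+           extra = numpy.random.uniform(low=-4, high=4,
+                                        size=(5, old_population.shape[1]))
+           ga_instance.population = numpy.concatenate((old_population, extra))
+           ga_instance.pop_size = ga_instance.population.shape
+           # Recompute the fitness of the enlarged population.
+           ga_instance.last_generation_fitness = ga_instance.cal_pop_fitness()
+
+   ga_instance = pygad.GA(num_generations=10,
+                          num_parents_mating=2,
+                          sol_per_pop=5,
+                          num_genes=3,
+                          fitness_func=fitness_func,
+                          on_generation=on_generation)
+   ga_instance.run()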
+
+These are examples of the instance attributes that might be changed. The
+user should check the `list of instance
+attributes `__
+and decide what to change.
+
+1. All the ``last_generation_*`` attributes
+
+   1. ``last_generation_fitness``: A 1D NumPy array of fitness values of
+      the population.
+
+   2. ``last_generation_parents`` and
+      ``last_generation_parents_indices``: Two NumPy arrays: a 2D array
+      representing the parents and a 1D array of the parents' indices.
+
+   3. ``last_generation_elitism`` and
+      ``last_generation_elitism_indices``: Must be changed if
+      ``keep_elitism != 0``. The default value of ``keep_elitism`` is 1.
+      Two NumPy arrays: a 2D array representing the elite solutions and
+      a 1D array of their indices.
+
+2. ``pop_size``: The population size.
+
+Prevent Duplicates in Gene Values
+=================================
+
+In `PyGAD
+2.13.0 `__,
+a new bool parameter called ``allow_duplicate_genes`` is supported to
+control whether duplicate gene values are allowed in the chromosome or
+not. In other words, whether 2 or more genes might have the same exact
+value.
+
+If ``allow_duplicate_genes=True`` (which is the default case), genes may
+have the same value. If ``allow_duplicate_genes=False``, then no 2 genes
+will have the same value given that there are enough unique values for
+the genes.
+
+The next code gives an example of using the ``allow_duplicate_genes``
+parameter. A callback generation function is implemented to print the
+population after each generation.
+
+.. code:: python
+
+   import pygad
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       return 0
+
+   def on_generation(ga):
+       print("Generation", ga.generations_completed)
+       print(ga.population)
+
+   ga_instance = pygad.GA(num_generations=5,
+                          sol_per_pop=5,
+                          num_genes=4,
+                          mutation_num_genes=3,
+                          random_mutation_min_val=-5,
+                          random_mutation_max_val=5,
+                          num_parents_mating=2,
+                          fitness_func=fitness_func,
+                          gene_type=int,
+                          on_generation=on_generation,
+                          sample_size=200,
+                          allow_duplicate_genes=False)
+   ga_instance.run()
+
+Here are the populations across the 5 generations. Note how there are no
+duplicate values.
+
+.. code:: python
+
+   Generation 1
+   [[ 2 -2 -3  3]
+    [ 0  1  2  3]
+    [ 5 -3  6  3]
+    [-3  1 -2  4]
+    [-1  0 -2  3]]
+   Generation 2
+   [[-1  0 -2  3]
+    [-3  1 -2  4]
+    [ 0 -3 -2  6]
+    [-3  0 -2  3]
+    [ 1 -4  2  4]]
+   Generation 3
+   [[ 1 -4  2  4]
+    [-3  0 -2  3]
+    [ 4  0 -2  1]
+    [-4  0 -2 -3]
+    [-4  2  0  3]]
+   Generation 4
+   [[-4  2  0  3]
+    [-4  0 -2 -3]
+    [-2  5  4 -3]
+    [-1  2 -4  4]
+    [-4  2  0 -3]]
+   Generation 5
+   [[-4  2  0 -3]
+    [-1  2 -4  4]
+    [ 3  4 -4  0]
+    [-1  0  2 -2]
+    [-4  2 -1  1]]
+
+The ``allow_duplicate_genes`` parameter also works with the
+``gene_space`` parameter. Here is an example where each of the 4 genes
+has the same space of values that consists of 4 values (1, 2, 3, and 4).
+
+.. code:: python
+
+   import pygad
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       return 0
+
+   def on_generation(ga):
+       print("Generation", ga.generations_completed)
+       print(ga.population)
+
+   ga_instance = pygad.GA(num_generations=5,
+                          sol_per_pop=5,
+                          num_genes=4,
+                          num_parents_mating=2,
+                          fitness_func=fitness_func,
+                          gene_type=int,
+                          gene_space=[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
+                          on_generation=on_generation,
+                          sample_size=200,
+                          allow_duplicate_genes=False)
+   ga_instance.run()
+
+Even though all the genes share the same space of values, no 2 genes
+duplicate their values, as shown in the next output.
+
+.. code:: python
+
+   Generation 1
+   [[2 3 1 4]
+    [2 3 1 4]
+    [2 4 1 3]
+    [2 3 1 4]
+    [1 3 2 4]]
+   Generation 2
+   [[1 3 2 4]
+    [2 3 1 4]
+    [1 3 2 4]
+    [2 3 4 1]
+    [1 3 4 2]]
+   Generation 3
+   [[1 3 4 2]
+    [2 3 4 1]
+    [1 3 4 2]
+    [3 1 4 2]
+    [3 2 4 1]]
+   Generation 4
+   [[3 2 4 1]
+    [3 1 4 2]
+    [3 2 4 1]
+    [1 2 4 3]
+    [1 3 4 2]]
+   Generation 5
+   [[1 3 4 2]
+    [1 2 4 3]
+    [2 1 4 3]
+    [1 2 4 3]
+    [1 2 4 3]]
+
+You should take care to provide enough values for the genes so that
+PyGAD is able to find an alternative for a gene value in case it
+duplicates another gene's value.
+
+If PyGAD fails to find a unique gene value while there is still room to
+find one, one possible option is to set the ``sample_size`` parameter to
+a larger value. Check the `sample_size
+Parameter `__
+section for more information.
+
+Limitation
+----------
+
+There might be 2 duplicate genes where changing either of the 2
+duplicating genes will not solve the problem. For example, if
+``gene_space=[[3, 0, 1], [4, 1, 2], [0, 2], [3, 2, 0]]`` and the
+solution is ``[3 2 0 0]``, then the values of the last 2 genes
+duplicate. There are no possible changes in the last 2 genes to solve
+the problem.
+
+This problem can be solved by randomly changing one of the
+non-duplicating genes that may make room for a unique value in one of
+the 2 duplicating genes. For example, by changing the second gene from 2
+to 4, then any of the last 2 genes can take the value 2 and solve the
+duplicates. The resulting solution is then ``[3 4 2 0]``. But this
+option is not yet supported in PyGAD.
+
+Solve Duplicates using a Third Gene
+-----------------------------------
+
+When ``allow_duplicate_genes=False`` and a user-defined ``gene_space``
+is used, it sometimes happens that there is no room to solve the
+duplicates between 2 genes by simply replacing the value of one gene by
+another value. In `PyGAD
+3.1.0 `__,
+the duplicates are solved by looking for a third gene that will help in
+solving the duplicates. The following examples explain how it works.
+
+Example 1:
+
+Let's assume that this gene space is used and there is a solution with 2
+duplicate genes with the same value 4.
+
+.. code:: python
+
+   Gene space: [[2, 3],
+                [3, 4],
+                [4, 5],
+                [5, 6]]
+   Solution: [3, 4, 4, 5]
+
+By checking the gene space, the second gene can have the values
+``[3, 4]`` and the third gene can have the values ``[4, 5]``. To solve
+the duplicates, we have to change the value of one of these 2 genes.
+
+If the value of the second gene changes from 4 to 3, then it will
+duplicate with the first gene. If we change the value of the third gene
+from 4 to 5, then it will duplicate with the fourth gene. In conclusion,
+just selecting a different value for either the second or the third gene
+will introduce new duplicating genes.
+
+When there are 2 duplicate genes but there is no way to solve their
+duplicates, then the solution is to change a third gene that makes room
+to solve the duplicates between the 2 genes.
+
+In our example, duplicates between the second and third genes can be
+solved, for example, by:
+
+- Changing the first gene from 3 to 2 then changing the second gene from
+  4 to 3.
+
+- Or changing the fourth gene from 5 to 6 then changing the third gene
+  from 4 to 5.
+
+Generally, this is how to solve such duplicates:
+
+1. For any duplicate gene **GENE1**, select another value.
+
+2. Check which other gene **GENEX** has duplicate with this new value.
+
+3. Find if **GENEX** can have another value that will not cause any more
+   duplicates. If so, go to step 7.
+
+4. If all the other values of **GENEX** will cause duplicates, then try
+   another gene **GENEY**.
+
+5. Repeat steps 3 and 4 until exploring all the genes.
+
+6. If there is no possibility to solve the duplicates, then we have to
+   keep the duplicate value.
+
+7. If a value for a gene **GENEM** is found that will not cause more
+   duplicates, then use this value for the gene **GENEM**.
+
+8. Replace the value of the gene **GENE1** by the old value of the gene
+   **GENEM**. This solves the duplicates.
+
+This is an example to solve the duplicate for the solution
+``[3, 4, 4, 5]``:
+
+1. Let's use the second gene with value 4. Because the space of this
+   gene is ``[3, 4]``, then the only other value we can select is 3.
+
+2. The first gene also has the value 3.
+
+3. The first gene has another value 2 that will not cause more
+   duplicates in the solution. Then go to step 7.
+
+4. Skip.
+
+5. Skip.
+
+6. Skip.
+
+7. The value of the first gene 3 will be replaced by the new value 2.
+   The new solution is [2, 4, 4, 5].
+
+8. Replace the value of the second gene 4 by the old value of the first
+   gene which is 3. The new solution is [2, 3, 4, 5]. The duplicate is
+   solved.
+
+Example 2:
+
+.. code:: python
+
+   Gene space: [[0, 1],
+                [1, 2],
+                [2, 3],
+                [3, 4]]
+   Solution: [1, 2, 2, 3]
+
+The quick summary is:
+
+- Change the value of the first gene from 1 to 0. The solution becomes
+  [0, 2, 2, 3].
+
+- Change the value of the second gene from 2 to 1. The solution becomes
+  [0, 1, 2, 3]. The duplicate is solved.
+
+.. _more-about-the-genetype-parameter:
+
+More about the ``gene_type`` Parameter
+======================================
+
+The ``gene_type`` parameter allows the user to control the data type for
+all genes at once or for each individual gene. In `PyGAD
+2.15.0 `__,
+the ``gene_type`` parameter also supports customizing the precision for
+``float`` data types. As a result, the ``gene_type`` parameter helps to:
+
+1. Select a data type for all genes with or without precision.
+
+2. Select a data type for each individual gene with or without
+   precision.
+
+Let's discuss these options with examples.
+
+Data Type for All Genes without Precision
+-----------------------------------------
+
+The data type for all genes can be specified by assigning the numeric
+data type directly to the ``gene_type`` parameter. This is an example to
+make all genes of the ``int`` data type.
+
+.. code:: python
+
+   gene_type=int
+
+Given that the supported numeric data types of PyGAD include Python's
+``int`` and ``float`` in addition to all numeric types of ``NumPy``, any
+of these types can be assigned to the ``gene_type`` parameter.
+
+If no precision is specified for a ``float`` data type, then the
+complete floating-point number is kept.
+
+The next code uses an ``int`` data type for all genes where the genes in
+the initial and final population are only integers.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   equation_inputs = [4, -2, 3.5, 8, -2]
+   desired_output = 2671.1234
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution * equation_inputs)
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=10,
+                          sol_per_pop=5,
+                          num_parents_mating=2,
+                          num_genes=len(equation_inputs),
+                          fitness_func=fitness_func,
+                          gene_type=int)
+
+   print("Initial Population")
+   print(ga_instance.initial_population)
+
+   ga_instance.run()
+
+   print("Final Population")
+   print(ga_instance.population)
+
+.. code:: python
+
+   Initial Population
+   [[ 1 -1  2  0 -3]
+    [ 0 -2  0 -3 -1]
+    [ 0 -1 -1  2  0]
+    [-2  3 -2  3  3]
+    [ 0  0  2 -2 -2]]
+
+   Final Population
+   [[ 1 -1  2  2  0]
+    [ 1 -1  2  2  0]
+    [ 1 -1  2  2  0]
+    [ 1 -1  2  2  0]
+    [ 1 -1  2  2  0]]
+
+Data Type for All Genes with Precision
+--------------------------------------
+
+A precision can only be specified for a ``float`` data type and cannot
+be specified for integers. Here is an example to use a precision of 3
+for the ``float`` data type. In this case, all genes are of type
+``float`` and their maximum precision is 3.
+
+.. code:: python
+
+   gene_type=[float, 3]
+
+The next code prints the initial and final populations where the genes
+are of type ``float`` with precision 3.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   equation_inputs = [4, -2, 3.5, 8, -2]
+   desired_output = 2671.1234
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution * equation_inputs)
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=10,
+                          sol_per_pop=5,
+                          num_parents_mating=2,
+                          num_genes=len(equation_inputs),
+                          fitness_func=fitness_func,
+                          gene_type=[float, 3])
+
+   print("Initial Population")
+   print(ga_instance.initial_population)
+
+   ga_instance.run()
+
+   print("Final Population")
+   print(ga_instance.population)
+
+.. code:: python
+
+   Initial Population
+   [[-2.417 -0.487  3.623  2.457 -2.362]
+    [-1.231  0.079 -1.63   1.629 -2.637]
+    [ 0.692 -2.098  0.705  0.914 -3.633]
+    [ 2.637 -1.339 -1.107 -0.781 -3.896]
+    [-1.495  1.378 -1.026  3.522  2.379]]
+
+   Final Population
+   [[ 1.714 -1.024  3.623  3.185 -2.362]
+    [ 0.692 -1.024  3.623  3.185 -2.362]
+    [ 0.692 -1.024  3.623  3.375 -2.362]
+    [ 0.692 -1.024  4.041  3.185 -2.362]
+    [ 1.714 -0.644  3.623  3.185 -2.362]]
+
+Data Type for each Individual Gene without Precision
+----------------------------------------------------
+
+In `PyGAD
+2.14.0 `__,
+the ``gene_type`` parameter allows customizing the gene type for each
+individual gene. This is done by using a
+``list``/``tuple``/``numpy.ndarray`` with a number of elements equal to
+the number of genes. For each element, a type is specified for the
+corresponding gene.
+
+This is an example for a 5-gene problem where different types are
+assigned to the genes.
+
+.. code:: python
+
+   gene_type=[int, float, numpy.float16, numpy.int8, float]
+
+This is a complete code that prints the initial and final populations
+for custom per-gene data types.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   equation_inputs = [4, -2, 3.5, 8, -2]
+   desired_output = 2671.1234
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution * equation_inputs)
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=10,
+                          sol_per_pop=5,
+                          num_parents_mating=2,
+                          num_genes=len(equation_inputs),
+                          fitness_func=fitness_func,
+                          gene_type=[int, float, numpy.float16, numpy.int8, float])
+
+   print("Initial Population")
+   print(ga_instance.initial_population)
+
+   ga_instance.run()
+
+   print("Final Population")
+   print(ga_instance.population)
+
+.. code:: python
+
+   Initial Population
+   [[0 0.8615522360026828 0.7021484375 -2 3.5301821368185866]
+    [-3 2.648189378595294 -3.830078125 1 -0.9586271572917742]
+    [3 3.7729827570110714 1.2529296875 -3 1.395741994211889]
+    [0 1.0490687178053282 1.51953125 -2 0.7243617940450235]
+    [0 -0.6550158436937226 -2.861328125 -2 1.8212734549263097]]
+
+   Final Population
+   [[3 3.7729827570110714 2.055 0 0.7243617940450235]
+    [3 3.7729827570110714 1.458 0 -0.14638754050305036]
+    [3 3.7729827570110714 1.458 0 0.0869406120516778]
+    [3 3.7729827570110714 1.458 0 0.7243617940450235]
+    [3 3.7729827570110714 1.458 0 -0.14638754050305036]]
+
+Data Type for each Individual Gene with Precision
+-------------------------------------------------
+
+The precision can also be specified for the ``float`` data types as in
+the next line where the second gene's precision is 2 and the last gene's
+precision is 1.
+
+.. code:: python
+
+   gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]]
+
+This is a complete example that prints the initial and final
+populations, where the genes comply with the specified data types and
+precisions.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   equation_inputs = [4, -2, 3.5, 8, -2]
+   desired_output = 2671.1234
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution * equation_inputs)
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=10,
+                          sol_per_pop=5,
+                          num_parents_mating=2,
+                          num_genes=len(equation_inputs),
+                          fitness_func=fitness_func,
+                          gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]])
+
+   print("Initial Population")
+   print(ga_instance.initial_population)
+
+   ga_instance.run()
+
+   print("Final Population")
+   print(ga_instance.population)
+
+.. code:: python
+
+   Initial Population
+   [[-2 -1.22 1.716796875 -1 0.2]
+    [-1 -1.58 -3.091796875 0 -1.3]
+    [3 3.35 -0.107421875 1 -3.3]
+    [-2 -3.58 -1.779296875 0 0.6]
+    [2 -3.73 2.65234375 3 -0.5]]
+
+   Final Population
+   [[2 -4.22 3.47 3 -1.3]
+    [2 -3.73 3.47 3 -1.3]
+    [2 -4.22 3.47 2 -1.3]
+    [2 -4.58 3.47 3 -1.3]
+    [2 -3.73 3.47 3 -1.3]]
+
+Parallel Processing in PyGAD
+============================
+
+Starting from `PyGAD
+2.17.0 `__,
+parallel processing is supported. This section explains how to use
+parallel processing in PyGAD.
+
+According to the `PyGAD
+lifecycle `__,
+only 2 operations can be parallelized:
+
+1. Population fitness calculation.
+
+2. Mutation.
+
+The reason is that the calculations in these 2 operations are
+independent (i.e. each solution/chromosome is handled independently from
+the others) and can be distributed across different processes or
+threads.
+
+For the mutation operation, it does not do intensive calculations on the
+CPU.
+Its calculations are simple like flipping the values of some genes from
+0 to 1 or adding a random value to some genes. So, it does not take much
+CPU processing time. Experiments showed that parallelizing the mutation
+operation across the solutions increases the time instead of reducing
+it. This is because running multiple processes or threads adds overhead
+to manage them. Thus, parallel processing is not applied to the mutation
+operation.
+
+For the population fitness calculation, parallel processing can help
+make a difference and reduce the processing time. But this is
+conditional on the type of calculations done in the fitness function. If
+the fitness function makes intensive calculations and takes much
+processing time from the CPU, then it is probable that parallel
+processing will help cut down the overall time.
+
+This section explains how parallel processing works in PyGAD and how to
+use it.
+
+How to Use Parallel Processing in PyGAD
+---------------------------------------
+
+Starting from `PyGAD
+2.17.0 `__,
+a new parameter called ``parallel_processing`` is added to the
+constructor of the ``pygad.GA`` class.
+
+.. code:: python
+
+   import pygad
+   ...
+   ga_instance = pygad.GA(...,
+                          parallel_processing=...)
+   ...
+
+This parameter allows the user to do the following:
+
+1. Enable parallel processing.
+
+2. Select whether processes or threads are used.
+
+3. Specify the number of processes or threads to be used.
+
+These are the 3 possible values for the ``parallel_processing``
+parameter:
+
+1. ``None``: (Default) It means no parallel processing is used.
+
+2. A positive integer referring to the number of threads to be used
+   (i.e. threads, not processes, are used).
+
+3. ``list``/``tuple``: If a list or a tuple of exactly 2 elements is
+   assigned, then:
+
+   1. The first element can be either ``'process'`` or ``'thread'`` to
+      specify whether processes or threads are used, respectively.
+
+   2. The second element can be:
+
+      1. A positive integer to select the maximum number of processes or
+         threads to be used.
+
+      2. ``0`` to indicate that 0 processes or threads are used. It
+         means no parallel processing. This is identical to setting
+         ``parallel_processing=None``.
+
+      3. ``None`` to use the default value as calculated by the
+         ``concurrent.futures`` module.
+
+These are examples of the values assigned to the ``parallel_processing``
+parameter:
+
+- ``parallel_processing=4``: Because the parameter is assigned a
+  positive integer, this means parallel processing is activated where 4
+  threads are used.
+
+- ``parallel_processing=["thread", 5]``: Use parallel processing with 5
+  threads. This is identical to ``parallel_processing=5``.
+
+- ``parallel_processing=["process", 8]``: Use parallel processing with 8
+  processes.
+
+- ``parallel_processing=["process", 0]``: As the second element is given
+  the value 0, this means do not use parallel processing. This is
+  identical to ``parallel_processing=None``.
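+
+Putting one of these values into a minimal runnable sketch (the setup
+numbers are assumptions used only for demonstration):
+
+.. code:: python
+
+   import pygad
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       return 0
+
+   # Ask for parallel processing with 4 processes, as in the list above.
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=2,
+                          sol_per_pop=4,
+                          num_genes=3,
+                          fitness_func=fitness_func,
+                          parallel_processing=["process", 4])
+
+   # Guarding run() is required on platforms that spawn new processes.
+   if __name__ == '__main__':
+       ga_instance.run()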
+
+Examples
+--------
+
+The examples will help you understand the difference between using
+processes and threads. Moreover, they will give an idea of when parallel
+processing would make a difference and reduce the time. These are dummy
+examples where the fitness function is made to always return 0.
+
+The first example uses 10 genes, 5 solutions in the population where
+only 3 solutions mate, and 9999 generations. The fitness function uses a
+``for`` loop with 99 iterations just to have some calculations. In the
+constructor of the ``pygad.GA`` class, ``parallel_processing=None``
+means no parallel processing is used.
+
+.. code:: python
+
+   import pygad
+   import time
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       for _ in range(99):
+           pass
+       return 0
+
+   ga_instance = pygad.GA(num_generations=9999,
+                          num_parents_mating=3,
+                          sol_per_pop=5,
+                          num_genes=10,
+                          fitness_func=fitness_func,
+                          suppress_warnings=True,
+                          parallel_processing=None)
+
+   if __name__ == '__main__':
+       t1 = time.time()
+
+       ga_instance.run()
+
+       t2 = time.time()
+       print("Time is", t2-t1)
+
+When parallel processing is not used, the time it takes to run the
+genetic algorithm is ``1.5`` seconds.
+
+For comparison, let's do a second experiment where parallel processing
+is used with 5 threads. In this case, it takes ``5`` seconds.
+
+.. code:: python
+
+   ...
+   ga_instance = pygad.GA(...,
+                          parallel_processing=5)
+   ...
+
+For the third experiment, processes instead of threads are used. Also,
+only 99 generations are used instead of 9999. The time it takes is
+``99`` seconds.
+
+.. code:: python
+
+   ...
+   ga_instance = pygad.GA(num_generations=99,
+                          ...,
+                          parallel_processing=["process", 5])
+   ...
+
+This is the summary of the 3 experiments:
+
+1. No parallel processing & 9999 generations: 1.5 seconds.
+
+2. Parallel processing with 5 threads & 9999 generations: 5 seconds.
+
+3. Parallel processing with 5 processes & 99 generations: 99 seconds.
+
+Because the fitness function does not need much CPU time, the normal
+processing takes the least time. Running processes for this simple
+problem takes 99 seconds compared to only 5 seconds for threads because
+managing processes is much heavier than managing threads. Thus, most of
+the CPU time is spent swapping the processes instead of executing the
+code.
+
+In the second example, the loop makes 99999999 iterations and only 5
+generations are used. With no parallelization, it takes 22 seconds.
+
+.. code:: python
+
+   import pygad
+   import time
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       for _ in range(99999999):
+           pass
+       return 0
+
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=3,
+                          sol_per_pop=5,
+                          num_genes=10,
+                          fitness_func=fitness_func,
+                          suppress_warnings=True,
+                          parallel_processing=None)
+
+   if __name__ == '__main__':
+       t1 = time.time()
+       ga_instance.run()
+       t2 = time.time()
+       print("Time is", t2-t1)
+
+It takes 15 seconds when 10 processes are used.
+
+.. code:: python
+
+   ...
+   ga_instance = pygad.GA(...,
+                          parallel_processing=["process", 10])
+   ...
+
+This is compared to 20 seconds when 10 threads are used.
+
+.. code:: python
+
+   ...
+   ga_instance = pygad.GA(...,
+                          parallel_processing=["thread", 10])
+   ...
+
+Based on the second example, using parallel processing with 10 processes
+takes the least time because there is much CPU work done. Generally,
+processes are preferred over threads when most of the work is on the
+CPU. Threads are preferred over processes in some situations like doing
+input/output operations.
+
+*Before releasing* `PyGAD
+2.17.0 `__\ *,*
+`László
+Fazekas `__
+*wrote an article to parallelize the fitness function with PyGAD. Check
+it:* `How Genetic Algorithms Can Compete with Gradient Descent and
+Backprop `__.
+
+Print Lifecycle Summary
+=======================
+
+In `PyGAD
+2.19.0 `__,
+a new method called ``summary()`` is supported. It prints a Keras-like
+summary of the PyGAD lifecycle showing the steps, callback functions,
+parameters, etc.
+
+This method accepts the following parameters:
+
+- ``line_length=70``: An integer representing the length of the single
+  line in characters.
+
+- ``fill_character=" "``: A character to fill the lines.
+
+- ``line_character="-"``: A character for creating a line separator.
+
+- ``line_character2="="``: A secondary character to create a line
+  separator.
+
+- ``columns_equal_len=False``: Whether the table rows are split into
+  equal-sized columns or sized according to the width needed.
+
+- ``print_step_parameters=True``: Whether to print extra parameters
+  about each step inside the step. If ``print_step_parameters=False``
+  and ``print_parameters_summary=True``, then the parameters of each
+  step are printed at the end of the table.
+
+- ``print_parameters_summary=True``: Whether to print a parameters
+  summary at the end of the table. If ``print_step_parameters=False``,
+  then the parameters of each step are printed at the end of the table
+  too.
+
+This is a quick example that creates a ``pygad.GA`` instance.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   function_inputs = [4,-2,3.5,5,-11,-4.7]
+   desired_output = 44
+
+   def genetic_fitness(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution*function_inputs)
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+       return fitness
+
+   def on_gen(ga):
+       pass
+
+   def on_crossover_callback(a, b):
+       pass
+
+   ga_instance = pygad.GA(num_generations=100,
+                          num_parents_mating=10,
+                          sol_per_pop=20,
+                          num_genes=len(function_inputs),
+                          on_crossover=on_crossover_callback,
+                          on_generation=on_gen,
+                          parallel_processing=2,
+                          stop_criteria="reach_10",
+                          fitness_batch_size=4,
+                          crossover_probability=0.4,
+                          fitness_func=genetic_fitness)
+
+Then call the ``summary()`` method to print the summary with the default
+parameters. Note that entries for the crossover and generation callbacks
+are created because their callback functions,
+``on_crossover_callback()`` and ``on_gen()``, are implemented.
+
+.. code:: python
+
+   ga_instance.summary()
+
+.. code:: bash
+
+   ----------------------------------------------------------------------
+                              PyGAD Lifecycle
+   ======================================================================
+   Step                   Handler                          Output Shape
+   ======================================================================
+   Fitness Function       genetic_fitness()                (1)
+   Fitness batch size: 4
+   ----------------------------------------------------------------------
+   Parent Selection       steady_state_selection()         (10, 6)
+   Number of Parents: 10
+   ----------------------------------------------------------------------
+   Crossover              single_point_crossover()         (10, 6)
+   Crossover probability: 0.4
+   ----------------------------------------------------------------------
+   On Crossover           on_crossover_callback()          None
+   ----------------------------------------------------------------------
+   Mutation               random_mutation()                (10, 6)
+   Mutation Genes: 1
+   Random Mutation Range: (-1.0, 1.0)
+   Mutation by Replacement: False
+   Allow Duplicated Genes: True
+   ----------------------------------------------------------------------
+   On Generation          on_gen()                         None
+   Stop Criteria: [['reach', 10.0]]
+   ----------------------------------------------------------------------
+   ======================================================================
+   Population Size: (20, 6)
+   Number of Generations: 100
+   Initial Population Range: (-4, 4)
+   Keep Elitism: 1
+   Gene DType: [<class 'float'>, None]
+   Parallel Processing: ['thread', 2]
+   Save Best Solutions: False
+   Save Solutions: False
+   ======================================================================
+
+We can set the ``print_step_parameters`` and
+``print_parameters_summary`` parameters to ``False`` to avoid printing
+the parameters.
+
+.. code:: python
+
+   ga_instance.summary(print_step_parameters=False,
+                       print_parameters_summary=False)
+
+.. code:: bash
+
+   ----------------------------------------------------------------------
+                              PyGAD Lifecycle
+   ======================================================================
+   Step                   Handler                          Output Shape
+   ======================================================================
+   Fitness Function       genetic_fitness()                (1)
+   ----------------------------------------------------------------------
+   Parent Selection       steady_state_selection()         (10, 6)
+   ----------------------------------------------------------------------
+   Crossover              single_point_crossover()         (10, 6)
+   ----------------------------------------------------------------------
+   On Crossover           on_crossover_callback()          None
+   ----------------------------------------------------------------------
+   Mutation               random_mutation()                (10, 6)
+   ----------------------------------------------------------------------
+   On Generation          on_gen()                         None
+   ----------------------------------------------------------------------
+   ======================================================================
+
+Logging Outputs
+===============
+
+In `PyGAD
+3.0.0 `__,
+the ``print()`` statement is no longer used and the outputs are printed
+using the `logging `__
+module. A new parameter called ``logger`` is supported to accept a
+user-defined logger.
+
+.. code:: python
+
+   import logging
+
+   logger = ...
+
+   ga_instance = pygad.GA(...,
+                          logger=logger,
+                          ...)
+
+The default value for this parameter is ``None``. If there is no logger
+passed (i.e. ``logger=None``), then a default logger is created to log
+the messages to the console exactly like how the ``print()`` statement
+works.
+
+Some advantages of using the
+`logging `__ module
+instead of the ``print()`` statement are:
+
+1. The user has more control over the printed messages especially if
+   there is a project that uses multiple modules where each module
+   prints its messages. A logger can organize the outputs.
+
+2. Using the proper ``Handler``, the user can log the output messages to
+   files instead of being restricted to printing them to the console.
+   So, it is much easier to record the outputs.
+
+3. The format of the printed messages can be changed by customizing the
+   ``Formatter`` assigned to the Logger.
+
+This section gives some quick examples of using the ``logging`` module
+and then gives an example of using a logger with PyGAD.
+
+Logging to the Console
+----------------------
+
+This is an example of creating a logger that logs the messages to the
+console.
+
+.. code:: python
+
+   import logging
+
+   # Create a logger
+   logger = logging.getLogger(__name__)
+
+   # Set the logger level to debug so that all the messages are printed.
+   logger.setLevel(logging.DEBUG)
+
+   # Create a stream handler to log the messages to the console.
+   stream_handler = logging.StreamHandler()
+
+   # Set the handler level to debug.
+   stream_handler.setLevel(logging.DEBUG)
+
+   # Create a formatter
+   formatter = logging.Formatter('%(message)s')
+
+   # Add the formatter to handler.
+   stream_handler.setFormatter(formatter)
+
+   # Add the stream handler to the logger
+   logger.addHandler(stream_handler)
+
+Now, we can log messages to the console with the format specified in the
+``Formatter``.
+
+.. code:: python
+
+   logger.debug('Debug message.')
+   logger.info('Info message.')
+   logger.warning('Warn message.')
+   logger.error('Error message.')
+   logger.critical('Critical message.')
+
+The outputs are identical to those returned using the ``print()``
+statement.
+
+.. code::
+
+   Debug message.
+   Info message.
+   Warn message.
+   Error message.
+   Critical message.
+
+By changing the format of the output messages, we can have more
+information about each message.
+
+.. code:: python
+
+   formatter = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
+
+This is a sample output.
+
+.. code:: python
+
+   2023-04-03 18:46:27 DEBUG: Debug message.
+   2023-04-03 18:46:27 INFO: Info message.
+   2023-04-03 18:46:27 WARNING: Warn message.
+   2023-04-03 18:46:27 ERROR: Error message.
+   2023-04-03 18:46:27 CRITICAL: Critical message.
+
+Note that you may need to clear the handlers after finishing the
+execution. This is to make sure no cached handlers are used in the next
+run. If the cached handlers are not cleared, then a single output
+message may be repeated.
+
+.. code:: python
+
+   logger.handlers.clear()
+
+Logging to a File
+-----------------
+
+This is another example to log the messages to a file named
+``logfile.txt``. The formatter prints the following about each message:
+
+1. The date and time at which the message is logged.
+
+2. The log level.
+
+3. The message.
+
+4. The path of the file.
+
+5. The line number of the log message.
+
+.. code:: python
+
+   import logging
+
+   level = logging.DEBUG
+   name = 'logfile.txt'
+
+   logger = logging.getLogger(name)
+   logger.setLevel(level)
+
+   file_handler = logging.FileHandler(name, 'a+', 'utf-8')
+   file_handler.setLevel(logging.DEBUG)
+   file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
+   file_handler.setFormatter(file_format)
+   logger.addHandler(file_handler)
+
+This is how the outputs look.
+
+.. code:: python
+
+   2023-04-03 18:54:03 DEBUG: Debug message. - c:\users\agad069\desktop\logger\example2.py:46
+   2023-04-03 18:54:03 INFO: Info message. - c:\users\agad069\desktop\logger\example2.py:47
+   2023-04-03 18:54:03 WARNING: Warn message. - c:\users\agad069\desktop\logger\example2.py:48
+   2023-04-03 18:54:03 ERROR: Error message. - c:\users\agad069\desktop\logger\example2.py:49
+   2023-04-03 18:54:03 CRITICAL: Critical message. - c:\users\agad069\desktop\logger\example2.py:50
+
+Consider clearing the handlers if necessary.
+
+.. code:: python
+
+   logger.handlers.clear()
+
+Log to Both the Console and a File
+----------------------------------
+
+This is an example of creating a single Logger associated with 2
+handlers:
+
+1. A file handler.
+
+2. A stream handler.
+
+.. code:: python
+
+   import logging
+
+   level = logging.DEBUG
+   name = 'logfile.txt'
+
+   logger = logging.getLogger(name)
+   logger.setLevel(level)
+
+   file_handler = logging.FileHandler(name, 'a+', 'utf-8')
+   file_handler.setLevel(logging.DEBUG)
+   file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
+   file_handler.setFormatter(file_format)
+   logger.addHandler(file_handler)
+
+   console_handler = logging.StreamHandler()
+   console_handler.setLevel(logging.INFO)
+   console_format = logging.Formatter('%(message)s')
+   console_handler.setFormatter(console_format)
+   logger.addHandler(console_handler)
+
+When a log message is emitted, it is both printed to the console and
+saved in ``logfile.txt``.
+
+Consider clearing the handlers if necessary.
+
+.. code:: python
+
+   logger.handlers.clear()
+
+PyGAD Example
+-------------
+
+To use the logger in PyGAD, just create your custom logger and pass it
+to the ``logger`` parameter.
+
+.. code:: python
+
+   import logging
+   import pygad
+   import numpy
+
+   level = logging.DEBUG
+   name = 'logfile.txt'
+
+   logger = logging.getLogger(name)
+   logger.setLevel(level)
+
+   file_handler = logging.FileHandler(name, 'a+', 'utf-8')
+   file_handler.setLevel(logging.DEBUG)
+   file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
+   file_handler.setFormatter(file_format)
+   logger.addHandler(file_handler)
+
+   console_handler = logging.StreamHandler()
+   console_handler.setLevel(logging.INFO)
+   console_format = logging.Formatter('%(message)s')
+   console_handler.setFormatter(console_format)
+   logger.addHandler(console_handler)
+
+   equation_inputs = [4, -2, 8]
+   desired_output = 2671.1234
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution * equation_inputs)
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+       return fitness
+
+   def on_generation(ga_instance):
+       ga_instance.logger.info(f"Generation = {ga_instance.generations_completed}")
+       ga_instance.logger.info(f"Fitness = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}")
+
+   ga_instance = pygad.GA(num_generations=10,
+                          sol_per_pop=40,
+                          num_parents_mating=2,
+                          keep_parents=2,
+                          num_genes=len(equation_inputs),
+                          fitness_func=fitness_func,
+                          on_generation=on_generation,
+                          logger=logger)
+   ga_instance.run()
+
+   logger.handlers.clear()
+
+By executing this code, the logged messages are printed to the console
+and also saved in the text file.
+
+.. code:: python
+
+   2023-04-03 19:04:27 INFO: Generation = 1
+   2023-04-03 19:04:27 INFO: Fitness = 0.00038086960368076276
+   2023-04-03 19:04:27 INFO: Generation = 2
+   2023-04-03 19:04:27 INFO: Fitness = 0.00038214871408010853
+   2023-04-03 19:04:27 INFO: Generation = 3
+   2023-04-03 19:04:27 INFO: Fitness = 0.0003832795907974678
+   2023-04-03 19:04:27 INFO: Generation = 4
+   2023-04-03 19:04:27 INFO: Fitness = 0.00038398612055017196
+   2023-04-03 19:04:27 INFO: Generation = 5
+   2023-04-03 19:04:27 INFO: Fitness = 0.00038442348890867516
+   2023-04-03 19:04:27 INFO: Generation = 6
+   2023-04-03 19:04:27 INFO: Fitness = 0.0003854406039137763
+   2023-04-03 19:04:27 INFO: Generation = 7
+   2023-04-03 19:04:27 INFO: Fitness = 0.00038646083174063284
+   2023-04-03 19:04:27 INFO: Generation = 8
+   2023-04-03 19:04:27 INFO: Fitness = 0.0003875169193024936
+   2023-04-03 19:04:27 INFO: Generation = 9
+   2023-04-03 19:04:27 INFO: Fitness = 0.0003888816727311021
+   2023-04-03 19:04:27 INFO: Generation = 10
+   2023-04-03 19:04:27 INFO: Fitness = 0.000389832593101348
+
+Solve Non-Deterministic Problems
+================================
+
+PyGAD can be used to solve both deterministic and non-deterministic
+problems. Deterministic problems are those that return the same fitness
+for the same solution. For non-deterministic problems, a different
+fitness value would be returned for the same solution.
+
+By default, PyGAD settings are set to solve deterministic problems.
+PyGAD can save the explored solutions and their fitness to reuse in the
+future. These instance attributes can save the solutions:
+
+1. ``solutions``: Exists if ``save_solutions=True``.
+
+2. ``best_solutions``: Exists if ``save_best_solutions=True``.
+
+3. ``last_generation_elitism``: Exists if ``keep_elitism`` > 0.
+
+4. ``last_generation_parents``: Exists if ``keep_parents`` > 0 or
+   ``keep_parents=-1``.
+
+To configure PyGAD for non-deterministic problems, we have to disable
+saving the previous solutions. This is done by setting these parameters:
+
+1. ``keep_elitism=0``
+
+2. ``keep_parents=0``
+
+3. ``save_solutions=False``
+
+4. ``save_best_solutions=False``
+
+.. code:: python
+
+   import pygad
+   ...
+   ga_instance = pygad.GA(...,
+                          keep_elitism=0,
+                          keep_parents=0,
+                          save_solutions=False,
+                          save_best_solutions=False,
+                          ...)
+
+This way PyGAD will not save any explored solution and thus the fitness
+function has to be called for each individual solution.
+
+Reuse the Fitness instead of Calling the Fitness Function
+=========================================================
+
+It may happen that a previously explored solution in generation X is
+explored again in another generation Y (where Y > X). For some problems,
+calling the fitness function takes much time.
+
+For deterministic problems, it is better to not call the fitness
+function for an already explored solution. Instead, reuse the fitness of
+the old solution. PyGAD supports some options to help you save the time
+of calling the fitness function for a previously explored solution.
+
+The parameters explored in this section can be set in the constructor of
+the ``pygad.GA`` class.
+
+The ``cal_pop_fitness()`` method of the ``pygad.GA`` class checks these
+parameters to see if there is a possibility of reusing the fitness
+instead of calling the fitness function.
+
+.. _1-savesolutions:
+
+1. ``save_solutions``
+---------------------
+
+It defaults to ``False``. If set to ``True``, then the population of
+each generation is saved into the ``solutions`` attribute of the
+``pygad.GA`` instance.
+In other words, every single solution is saved in the ``solutions``
+attribute.
+
+.. _2-savebestsolutions:
+
+2. ``save_best_solutions``
+--------------------------
+
+It defaults to ``False``. If ``True``, then it only saves the best
+solution in every generation.
+
+.. _3-keepelitism:
+
+3. ``keep_elitism``
+-------------------
+
+It accepts an integer and defaults to 1. If set to a positive integer,
+then it keeps the elite solutions of one generation available in the
+next generation.
+
+.. _4-keepparents:
+
+4. ``keep_parents``
+-------------------
+
+It accepts an integer and defaults to -1. If set to ``-1`` or a positive
+integer, then it keeps the parents of one generation available in the
+next generation.
+
+Why the Fitness Function is not Called for Solution at Index 0?
+===============================================================
+
+PyGAD has a parameter called ``keep_elitism`` which defaults to 1. This
+parameter defines the number of best solutions in generation **X** to
+keep in the next generation **X+1**. The best solutions are just copied
+from generation **X** to generation **X+1** without making any change.
+
+.. code:: python
+
+   ga_instance = pygad.GA(...,
+                          keep_elitism=1,
+                          ...)
+
+The best solutions are copied at the beginning of the population. If
+``keep_elitism=1``, this means the best solution in generation X is kept
+in the next generation X+1 at index 0 of the population. If
+``keep_elitism=2``, this means the 2 best solutions in generation X are
+kept in the next generation X+1 at indices 0 and 1 of the population of
+generation X+1.
+
+Because the fitness of these best solutions is already calculated in
+generation X, their fitness values will not be recalculated at
+generation X+1 (i.e. the fitness function will not be called for these
+solutions again). Instead, their fitness values are just reused. This is
+why you see that no solution with index 0 is passed to the fitness
+function.
+
+To force calling the fitness function for each solution in every
+generation, consider setting ``keep_elitism`` and ``keep_parents`` to 0.
+Moreover, keep the 2 parameters ``save_solutions`` and
+``save_best_solutions`` at their default value ``False``.
+
+.. code:: python
+
+   ga_instance = pygad.GA(...,
+                          keep_elitism=0,
+                          keep_parents=0,
+                          save_solutions=False,
+                          save_best_solutions=False,
+                          ...)
+
+Batch Fitness Calculation
+=========================
+
+In `PyGAD
+2.19.0 `__,
+a new optional parameter called ``fitness_batch_size`` is supported to
+calculate the fitness function in batches. Thanks to `Linan
+Qiu `__ for opening the `GitHub issue
+#136 `__.
+
+Its values can be:
+
+- ``1`` or ``None``: If the ``fitness_batch_size`` parameter is assigned
+  the value ``1`` or ``None`` (default), then the normal flow is used
+  where the fitness function is called for each individual solution.
+  That is, if there are 15 solutions, then the fitness function is
+  called 15 times.
+
+- ``1 < fitness_batch_size <= sol_per_pop``: If the
+  ``fitness_batch_size`` parameter is assigned a value satisfying this
+  condition ``1 < fitness_batch_size <= sol_per_pop``, then the
+  solutions are grouped into batches of size ``fitness_batch_size`` and
+  the fitness function is called once for each batch. In this case, the
+  fitness function must return a list/tuple/numpy.ndarray with a length
+  equal to the number of solutions passed.
+
+Why the Fitness Function is not Called for Solution at Index 0?
+===============================================================
+
+PyGAD has a parameter called ``keep_elitism`` which defaults to 1. This
+parameter defines the number of best solutions in generation **X** to
+keep in the next generation **X+1**. The best solutions are just copied
+from generation **X** to generation **X+1** without making any change.
+
+.. code:: python
+
+   ga_instance = pygad.GA(...,
+                          keep_elitism=1,
+                          ...)
+
+The best solutions are copied to the beginning of the population. If
+``keep_elitism=1``, this means the best solution in generation X is kept
+in the next generation X+1 at index 0 of the population. If
+``keep_elitism=2``, this means the 2 best solutions in generation X are
+kept in the next generation X+1 at indices 0 and 1 of the population of
+generation X+1.
+
+Because the fitness of these best solutions is already calculated in
+generation X, their fitness values will not be recalculated in
+generation X+1 (i.e. the fitness function will not be called for these
+solutions again). Instead, their fitness values are just reused. This is
+why you see that no solution with index 0 is passed to the fitness
+function.
+
+To force calling the fitness function for each solution in every
+generation, set both ``keep_elitism`` and ``keep_parents`` to 0.
+Moreover, keep the 2 parameters ``save_solutions`` and
+``save_best_solutions`` at their default value ``False``.
+
+.. code:: python
+
+   ga_instance = pygad.GA(...,
+                          keep_elitism=0,
+                          keep_parents=0,
+                          save_solutions=False,
+                          save_best_solutions=False,
+                          ...)
+
+Batch Fitness Calculation
+=========================
+
+In `PyGAD
+2.19.0 `__,
+a new optional parameter called ``fitness_batch_size`` is supported to
+calculate the fitness in batches. Thanks to `Linan
+Qiu `__ for opening the `GitHub issue
+#136 `__.
+
+Its values can be:
+
+- ``1`` or ``None``: If the ``fitness_batch_size`` parameter is assigned
+  the value ``1`` or ``None`` (default), then the normal flow is used
+  where the fitness function is called for each individual solution.
+  That is, if there are 15 solutions, then the fitness function is
+  called 15 times.
+
+- ``1 < fitness_batch_size <= sol_per_pop``: If the
+  ``fitness_batch_size`` parameter is assigned a value satisfying this
+  condition ``1 < fitness_batch_size <= sol_per_pop``, then the
+  solutions are grouped into batches of size ``fitness_batch_size`` and
+  the fitness function is called once for each batch. In this case, the
+  fitness function must return a list/tuple/numpy.ndarray with a length
+  equal to the number of solutions passed.
+
+.. _example-without-fitnessbatchsize-parameter:
+
+Example without ``fitness_batch_size`` Parameter
+------------------------------------------------
+
+This is an example where the ``fitness_batch_size`` parameter is given
+the value ``None`` (which is the default value). This is equivalent to
+using the value ``1``. In this case, the fitness function will be called
+for each solution. This means the fitness function ``fitness_func`` will
+receive only a single solution. This is an example of the arguments
+passed to the fitness function:
+
+.. code::
+
+   solution: [ 2.52860734, -0.94178795, 2.97545704, 0.84131987, -3.78447118, 2.41008358]
+   solution_idx: 3
+
+The fitness function must also return a single numeric value as the
+fitness for the passed solution.
+
+As we have a population of ``20`` solutions, the fitness function is
+called 20 times per generation. For 5 generations, the fitness function
+is called ``20*5 = 100`` times. In PyGAD, the fitness function is called
+after the last generation too, and this adds an additional 20 calls. So,
+the total number of calls to the fitness function is ``20*5 + 20 = 120``.
+
+Note that the ``keep_elitism`` and ``keep_parents`` parameters are set
+to ``0`` to make sure no fitness values are reused and to force calling
+the fitness function for each individual solution.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   function_inputs = [4,-2,3.5,5,-11,-4.7]
+   desired_output = 44
+
+   number_of_calls = 0
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       global number_of_calls
+       number_of_calls = number_of_calls + 1
+       output = numpy.sum(solution*function_inputs)
+       fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+       return fitness
+
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=10,
+                          sol_per_pop=20,
+                          fitness_func=fitness_func,
+                          fitness_batch_size=None,
+                          # fitness_batch_size=1,
+                          num_genes=len(function_inputs),
+                          keep_elitism=0,
+                          keep_parents=0)
+
+   ga_instance.run()
+   print(number_of_calls)
+
+.. code::
+
+   120
+
+.. _example-with-fitnessbatchsize-parameter:
+
+Example with ``fitness_batch_size`` Parameter
+---------------------------------------------
+
+This is an example where the ``fitness_batch_size`` parameter is used
+and assigned the value ``4``. This means the solutions will be grouped
+into batches of ``4`` solutions. The fitness function will be called
+once for each batch (i.e. called once for each 4 solutions).
+
+This is an example of the arguments passed to it:
+
+.. code:: python
+
+   solutions:
+       [[ 3.1129432  -0.69123589  1.93792414  2.23772968 -1.54616001 -0.53930799]
+        [ 3.38508121  0.19890812  1.93792414  2.23095014 -3.08955597  3.10194128]
+        [ 2.37079504 -0.88819803  2.97545704  1.41742256 -3.95594055  2.45028256]
+        [ 2.52860734 -0.94178795  2.97545704  0.84131987 -3.78447118  2.41008358]]
+   solutions_indices:
+       [16, 17, 18, 19]
+
+As we have 20 solutions, there are ``20/4 = 5`` batches. As a result,
+the fitness function is called only 5 times per generation instead of
+20. For each call to the fitness function, it receives a batch of 4
+solutions.
+
+As we have 5 generations, the function will be called ``5*5 = 25``
+times. Given the call to the fitness function after the last generation,
+the total number of calls is ``5*5 + 5 = 30``.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   function_inputs = [4,-2,3.5,5,-11,-4.7]
+   desired_output = 44
+
+   number_of_calls = 0
+
+   def fitness_func_batch(ga_instance, solutions, solutions_indices):
+       global number_of_calls
+       number_of_calls = number_of_calls + 1
+       batch_fitness = []
+       for solution in solutions:
+           output = numpy.sum(solution*function_inputs)
+           fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+           batch_fitness.append(fitness)
+       return batch_fitness
+
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=10,
+                          sol_per_pop=20,
+                          fitness_func=fitness_func_batch,
+                          fitness_batch_size=4,
+                          num_genes=len(function_inputs),
+                          keep_elitism=0,
+                          keep_parents=0)
+
+   ga_instance.run()
+   print(number_of_calls)
+
+.. code::
+
+   30
+
+When batch fitness calculation is used, we save ``120 - 30 = 90`` calls
+to the fitness function.
+
+Use Functions and Methods to Build Fitness and Callbacks
+========================================================
+
+In PyGAD 2.19.0, it is possible to pass user-defined functions or
+methods to the following parameters:
+
+1. ``fitness_func``
+
+2. ``on_start``
+
+3. ``on_fitness``
+
+4. ``on_parents``
+
+5. ``on_crossover``
+
+6. ``on_mutation``
+
+7. ``on_generation``
+
+8. ``on_stop``
+
+This section gives 2 examples of assigning user-defined callables to
+these parameters:
+
+1. Functions.
+
+2. Methods.
+
+Assign Functions
+----------------
+
+This is a dummy example where the fitness function returns a random
+value. Note that the instance of the ``pygad.GA`` class is passed as the
+first parameter to all of these functions.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       return numpy.random.rand()
+
+   def on_start(ga_instance):
+       print("on_start")
+
+   def on_fitness(ga_instance, last_gen_fitness):
+       print("on_fitness")
+
+   def on_parents(ga_instance, last_gen_parents):
+       print("on_parents")
+
+   def on_crossover(ga_instance, last_gen_offspring):
+       print("on_crossover")
+
+   def on_mutation(ga_instance, last_gen_offspring):
+       print("on_mutation")
+
+   def on_generation(ga_instance):
+       print("on_generation\n")
+
+   def on_stop(ga_instance, last_gen_fitness):
+       print("on_stop")
+
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=4,
+                          sol_per_pop=10,
+                          num_genes=2,
+                          on_start=on_start,
+                          on_fitness=on_fitness,
+                          on_parents=on_parents,
+                          on_crossover=on_crossover,
+                          on_mutation=on_mutation,
+                          on_generation=on_generation,
+                          on_stop=on_stop,
+                          fitness_func=fitness_func)
+
+   ga_instance.run()
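+
+Because these parameters accept any Python callable, a ``lambda`` with
+the matching signature should work as well. A minimal sketch:
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=4,
+                          sol_per_pop=10,
+                          num_genes=2,
+                          # The same dummy random fitness as above, as a lambda.
+                          fitness_func=lambda ga_instance, solution, solution_idx: numpy.random.rand(),
+                          on_generation=lambda ga_instance: print("on_generation"))
+
+   ga_instance.run()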
+
+Assign Methods
+--------------
+
+The next example has all the methods defined inside the class ``Test``.
+Compared to the functions above, each method accepts one additional
+parameter, ``self``, which represents the ``Test`` instance.
+
+All methods accept ``self`` as the first parameter and the instance of
+the ``pygad.GA`` class as the second parameter.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   class Test:
+       def fitness_func(self, ga_instance, solution, solution_idx):
+           return numpy.random.rand()
+
+       def on_start(self, ga_instance):
+           print("on_start")
+
+       def on_fitness(self, ga_instance, last_gen_fitness):
+           print("on_fitness")
+
+       def on_parents(self, ga_instance, last_gen_parents):
+           print("on_parents")
+
+       def on_crossover(self, ga_instance, last_gen_offspring):
+           print("on_crossover")
+
+       def on_mutation(self, ga_instance, last_gen_offspring):
+           print("on_mutation")
+
+       def on_generation(self, ga_instance):
+           print("on_generation\n")
+
+       def on_stop(self, ga_instance, last_gen_fitness):
+           print("on_stop")
+
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=4,
+                          sol_per_pop=10,
+                          num_genes=2,
+                          on_start=Test().on_start,
+                          on_fitness=Test().on_fitness,
+                          on_parents=Test().on_parents,
+                          on_crossover=Test().on_crossover,
+                          on_mutation=Test().on_mutation,
+                          on_generation=Test().on_generation,
+                          on_stop=Test().on_stop,
+                          fitness_func=Test().fitness_func)
+
+   ga_instance.run()
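+
+Note that the example above binds each callback to a separate ``Test``
+instance (``Test().on_start``, ``Test().on_fitness``, etc.). If the
+callbacks need to share state through ``self``, create a single instance
+first and pass its bound methods; a minimal sketch:
+
+.. code:: python
+
+   test = Test()
+
+   ga_instance = pygad.GA(num_generations=5,
+                          num_parents_mating=4,
+                          sol_per_pop=10,
+                          num_genes=2,
+                          on_start=test.on_start,
+                          on_generation=test.on_generation,
+                          fitness_func=test.fitness_func)
+
+   ga_instance.run()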
+
+.. |image1| image:: https://github.com/ahmedfgad/GeneticAlgorithmPython/assets/16560492/7896f8d8-01c5-4ff9-8d15-52191c309b63
+.. |image2| image:: https://user-images.githubusercontent.com/16560492/189273225-67ffad41-97ab-45e1-9324-429705e17b20.png
diff --git a/docs/source/releases.rst b/docs/source/releases.rst
index fddddf9..254d458 100644
--- a/docs/source/releases.rst
+++ b/docs/source/releases.rst
@@ -1625,6 +1625,136 @@ Release Date 07 January 2025
    fitness before the GA completes.
    https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/291
 
+.. _pygad-350:
+
+PyGAD 3.5.0
+-----------
+
+Release Date 07 July 2025
+
+1. Fix a bug when a minus sign (-) is used inside the ``stop_criteria``
+   parameter for multi-objective problems.
+   https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/314
+
+2. Fix a bug when the ``stop_criteria`` parameter is passed as an
+   iterable (e.g. list) for multi-objective problems (e.g.
+   ``['reach_50_60', 'reach_20, 40']``).
+   https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/314
+
+3. Call the ``get_matplotlib()`` function from the ``plot_genes()``
+   method inside the ``pygad.visualize.plot.Plot`` class to import the
+   matplotlib library.
+   https://github.com/ahmedfgad/GeneticAlgorithmPython/issues/315
+
+4. Create a new helper method called ``select_unique_value()`` inside
+   the ``pygad/helper/unique.py`` script to select a unique gene from
+   an array of values.
+
+5. Create a new helper method called ``get_random_mutation_range()``
+   inside the ``pygad/utils/mutation.py`` script that returns the
+   random mutation range (min and max) for a single gene by its index.
+
+6. Create a new helper method called
+   ``change_random_mutation_value_dtype()`` inside the
+   ``pygad/utils/mutation.py`` script that changes the data type of the
+   value used to apply random mutation.
+
+7. Create a new helper method called ``round_random_mutation_value()``
+   inside the ``pygad/utils/mutation.py`` script that rounds the value
+   used to apply random mutation.
+
+8. Create the ``pygad/helper/misc.py`` script with a class called
+   ``Helper`` that has the following helper methods:
+
+   1. ``change_population_dtype_and_round()``: For each gene in the
+      population, rounds the gene value and changes the data type.
+
+   2. ``change_gene_dtype_and_round()``: Rounds and changes the data
+      type of a single gene.
+
+   3. ``mutation_change_gene_dtype_and_round()``: Decides whether
+      mutation is done by replacement or not. Then it rounds and
+      changes the data type of the new gene value.
+
+   4. ``validate_gene_constraint_callable_output()``: Validates the
+      output of the user-defined callable/function that checks whether
+      the gene constraint defined in the ``gene_constraint`` parameter
+      is satisfied or not.
+
+   5. ``get_gene_dtype()``: Returns the gene data type from the
+      ``gene_type`` instance attribute.
+
+   6. ``get_random_mutation_range()``: Returns the random mutation
+      range using the ``random_mutation_min_val`` and
+      ``random_mutation_max_val`` instance attributes.
+
+   7. ``get_initial_population_range()``: Returns the initial
+      population values range using the ``init_range_low`` and
+      ``init_range_high`` instance attributes.
+
+   8. ``generate_gene_value_from_space()``: Generates/selects a value
+      for a gene using the ``gene_space`` instance attribute.
+
+   9. ``generate_gene_value_randomly()``: Generates a random value for
+      the gene. Only used if ``gene_space`` is ``None``.
+
+   10. ``generate_gene_value()``: Generates a value for the gene. It
+       checks whether ``gene_space`` is ``None`` and calls either
+       ``generate_gene_value_randomly()`` or
+       ``generate_gene_value_from_space()``.
+
+   11. ``filter_gene_values_by_constraint()``: Receives a list of
+       values for a gene. Then it filters such values using the gene
+       constraint.
+
+   12. ``get_valid_gene_constraint_values()``: Selects one valid gene
+       value that satisfies the gene constraint. It simply calls
+       ``generate_gene_value()`` to generate some gene values and then
+       filters them using ``filter_gene_values_by_constraint()``.
+
+9. Create a new helper method called
+   ``mutation_process_random_value()`` inside the
+   ``pygad/utils/mutation.py`` script that generates constrained random
+   values for mutation. It calls either ``generate_gene_value()`` or
+   ``get_valid_gene_constraint_values()`` based on whether the
+   ``gene_constraint`` parameter is used or not.
+
+10. A new parameter called ``gene_constraint`` is added. It accepts a
+    list of callables (i.e. functions) acting as constraints for the
+    gene values. Before selecting a value for a gene, the callable is
+    called to ensure the candidate value is valid. Check the `Gene
+    Constraint `__
+    section for more information. A short sketch is also given after
+    this list.
+
+11. A new parameter called ``sample_size`` is added. To select a gene
+    value that respects a constraint, this variable defines the size of
+    the sample from which a value is selected randomly. Useful if either
+    ``allow_duplicate_genes`` or ``gene_constraint`` is used. An
+    instance attribute of the same name is created in the instances of
+    the ``pygad.GA`` class. Check the `sample_size
+    Parameter `__
+    section for more information.
+
+12. Use the ``sample_size`` parameter instead of ``num_trials`` in the
+    methods ``solve_duplicate_genes_randomly()`` and
+    ``unique_float_gene_from_range()`` inside the
+    ``pygad/helper/unique.py`` script. It is the maximum number of
+    values to generate as the search space when looking for a unique
+    float value out of a range.
+
+13. Fixed a bug in population initialization when
+    ``allow_duplicate_genes=False``. Previously, gene values were
+    checked for duplicates before rounding, which could allow
+    near-duplicates like 7.61 and 7.62 to pass. After rounding (e.g.,
+    both becoming 7.6), this resulted in unintended duplicates. The fix
+    ensures gene values are now rounded before duplicate checks,
+    preventing such cases.
+
+14. More tests are created.
+
+15. More examples are created.
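+
+As a quick illustration of items 10 and 11, here is a minimal, hedged
+sketch of the new parameters. It assumes each ``gene_constraint``
+callable receives the current solution and the candidate values for its
+gene and returns only the values that satisfy the constraint; check the
+Gene Constraint section of the documentation for the exact interface.
+
+.. code:: python
+
+   import pygad
+   import numpy
+
+   function_inputs = [4,-2,3.5,5,-11,-4.7]
+   desired_output = 44
+
+   def fitness_func(ga_instance, solution, solution_idx):
+       output = numpy.sum(solution*function_inputs)
+       return 1.0 / (numpy.abs(output - desired_output) + 0.000001)
+
+   ga_instance = pygad.GA(num_generations=50,
+                          num_parents_mating=4,
+                          sol_per_pop=8,
+                          num_genes=len(function_inputs),
+                          fitness_func=fitness_func,
+                          # One callable (or None) per gene. Here, gene 0 is
+                          # constrained to positive values only (assumed interface).
+                          gene_constraint=[lambda solution, values: [val for val in values if val > 0],
+                                           None, None, None, None, None],
+                          # Size of the random sample searched for a valid value.
+                          sample_size=100)
+
+   ga_instance.run()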
+
 PyGAD Projects at GitHub
 ========================
 
diff --git a/docs/source/utils.rst b/docs/source/utils.rst
index d3a6951..15dcfb3 100644
--- a/docs/source/utils.rst
+++ b/docs/source/utils.rst
@@ -71,12 +71,59 @@ the supported mutation operations which are:
 
 4. Scramble: Implemented using the ``scramble_mutation()`` method.
 
-5. Scramble: Implemented using the ``adaptive_mutation()`` method.
+5. Adaptive: Implemented using the ``adaptive_mutation()`` method.
 
 All mutation methods accept this parameter:
 
 1. ``offspring``: The offspring to mutate.
 
+The ``pygad.utils.mutation`` module has some helper methods that assist
+in applying the mutation operation:
+
+1. ``mutation_by_space()``: Applies the mutation using the
+   ``gene_space`` parameter.
+
+2. ``mutation_probs_by_space()``: Uses the mutation probabilities in
+   the ``mutation_probabilities`` instance attribute to apply the
+   mutation using the ``gene_space`` parameter. For each gene, if its
+   probability is <= the mutation probability, then it will be
+   mutated based on the mutation space.
+
+3. ``mutation_process_gene_value()``: Generates/selects values for the
+   gene that satisfy the constraint. The values could be generated
+   randomly or from the gene space.
+
+4. ``mutation_randomly()``: Applies the random mutation.
+
+5. ``mutation_probs_randomly()``: Uses the mutation probabilities in
+   the ``mutation_probabilities`` instance attribute to apply the
+   random mutation. For each gene, if its probability is <= the
+   mutation probability, then it will be mutated randomly.
+
+6. ``adaptive_mutation_population_fitness()``: A helper method to
+   calculate the average fitness of the solutions before applying the
+   adaptive mutation.
+
+7. ``adaptive_mutation_by_space()``: Applies the adaptive mutation
+   based on the ``gene_space`` parameter. A number of genes are
+   selected randomly for mutation. This number depends on the fitness
+   of the solution. The random values are selected from the
+   ``gene_space`` parameter.
+
+8. ``adaptive_mutation_probs_by_space()``: Uses the mutation
+   probabilities to decide which genes the adaptive mutation by space
+   is applied to.
+
+9. ``adaptive_mutation_randomly()``: Applies the adaptive mutation
+   randomly. A number of genes are selected randomly for mutation.
+   This number depends on the fitness of the solution. The random
+   values are selected based on the 2 parameters
+   ``random_mutation_min_val`` and ``random_mutation_max_val``.
+
+10. ``adaptive_mutation_probs_randomly()``: Uses the mutation
+    probabilities to decide which genes the random adaptive mutation
+    is applied to.
+
 Adaptive Mutation
 =================
 
@@ -132,8 +179,7 @@ In PyGAD, if ``f=f_avg``, then the solution is regarded of high quality.
 
 The next figure summarizes the previous steps.
 
-.. image:: https://user-images.githubusercontent.com/16560492/103468973-e3c26600-4d2c-11eb-8af3-b3bb39b50540.jpg
-   :alt: 
+|image1|
 
 This strategy is applied in PyGAD.
 
@@ -268,7 +314,7 @@ are:
 
 7. NSGA-II: Implemented using the ``nsga2_selection()`` method.
 
 8. NSGA-II Tournament: Implemented using the
-   ``tournament_nsga2_selection()`` method.
+   ``tournament_selection_nsga2()`` method.
 
 All parent selection methods accept these parameters:
 
@@ -276,6 +322,12 @@ All parent selection methods accept these parameters:
 
 2. ``num_parents``: The number of parents to select.
 
+It has the following helper methods:
+
+1. 
``wheel_cumulative_probs()``: A helper function to calculate the + wheel probabilities for these 2 methods: 1) + ``roulette_wheel_selection()`` 2) ``rank_selection()`` + .. _pygadutilsnsga2-submodule: ``pygad.utils.nsga2`` Submodule @@ -472,25 +524,25 @@ parameter. Note that there are other things to take into consideration like: -- Making sure that each gene conforms to the data type(s) listed in the - ``gene_type`` parameter. +- Making sure that each gene conforms to the data type(s) listed in the + ``gene_type`` parameter. -- If the ``gene_space`` parameter is used, then the new value for the - gene should conform to the values/ranges listed. +- If the ``gene_space`` parameter is used, then the new value for the + gene should conform to the values/ranges listed. -- Mutating a number of genes that conforms to the parameters - ``mutation_percent_genes``, ``mutation_probability``, and - ``mutation_num_genes``. +- Mutating a number of genes that conforms to the parameters + ``mutation_percent_genes``, ``mutation_probability``, and + ``mutation_num_genes``. -- Whether mutation happens with or without replacement based on the - ``mutation_by_replacement`` parameter. +- Whether mutation happens with or without replacement based on the + ``mutation_by_replacement`` parameter. -- The minimum and maximum values from which a random value is generated - based on the ``random_mutation_min_val`` and - ``random_mutation_max_val`` parameters. +- The minimum and maximum values from which a random value is generated + based on the ``random_mutation_min_val`` and + ``random_mutation_max_val`` parameters. -- Whether duplicates are allowed or not in the chromosome based on the - ``allow_duplicate_genes`` parameter. +- Whether duplicates are allowed or not in the chromosome based on the + ``allow_duplicate_genes`` parameter. and more. @@ -705,3 +757,5 @@ This is the same example but using methods instead of functions. ga_instance.run() ga_instance.plot_fitness() + +.. |image1| image:: https://user-images.githubusercontent.com/16560492/103468973-e3c26600-4d2c-11eb-8af3-b3bb39b50540.jpg diff --git a/examples/example_gene_constraint.py b/examples/example_gene_constraint.py index ae9db32..2666ae2 100644 --- a/examples/example_gene_constraint.py +++ b/examples/example_gene_constraint.py @@ -3,7 +3,6 @@ """ An example of using the gene_constraint parameter. - """ function_inputs = [4,-2,3.5,5,-11,-4.7] diff --git a/pygad/utils/mutation.py b/pygad/utils/mutation.py index 255cced..d51b7ed 100644 --- a/pygad/utils/mutation.py +++ b/pygad/utils/mutation.py @@ -20,8 +20,10 @@ def random_mutation(self, offspring): """ Applies the random mutation which changes the values of a number of genes randomly. The random value is selected either using the 'gene_space' parameter or the 2 parameters 'random_mutation_min_val' and 'random_mutation_max_val'. - It accepts a single parameter: + + It accepts: -offspring: The offspring to mutate. + It returns an array of the mutated offspring. """ @@ -48,8 +50,8 @@ def random_mutation(self, offspring): def mutation_by_space(self, offspring): """ - Applies the random mutation using the mutation values' space. - It accepts a single parameter: + Applies the mutation using the gene_space parameter. + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring using the mutation space. 
""" @@ -88,8 +90,8 @@ def mutation_by_space(self, offspring): def mutation_probs_by_space(self, offspring): """ - Applies the random mutation using the mutation values' space and the mutation probability. For each gene, if its probability is <= that mutation probability, then it will be mutated based on the mutation space. - It accepts a single parameter: + Applies the random mutation using the mutation values' space and the mutation probability. For each gene, if its probability is <= that the mutation probability, then it will be mutated based on the mutation space. + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring using the mutation space. """ @@ -178,8 +180,8 @@ def mutation_process_gene_value(self, def mutation_randomly(self, offspring): """ - Applies the random mutation the mutation probability. For each gene, if its probability is <= that mutation probability, then it will be mutated randomly. - It accepts a single parameter: + Applies the random mutation. + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -216,7 +218,7 @@ def mutation_probs_randomly(self, offspring): """ Applies the random mutation using the mutation probability. For each gene, if its probability is <= that mutation probability, then it will be mutated randomly. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -254,7 +256,7 @@ def swap_mutation(self, offspring): """ Applies the swap mutation which interchanges the values of 2 randomly selected genes. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -272,7 +274,7 @@ def inversion_mutation(self, offspring): """ Applies the inversion mutation which selects a subset of genes and inverts them (in order). - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -289,7 +291,7 @@ def scramble_mutation(self, offspring): """ Applies the scramble mutation which selects a subset of genes and shuffles their order randomly. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -308,7 +310,7 @@ def adaptive_mutation_population_fitness(self, offspring): """ A helper method to calculate the average fitness of the solutions before applying the adaptive mutation. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns the average fitness to be used in adaptive mutation. """ @@ -460,7 +462,7 @@ def adaptive_mutation(self, offspring): """ Applies the adaptive mutation which changes the values of a number of genes randomly. In adaptive mutation, the number of genes to mutate differs based on the fitness value of the solution. The random value is selected either using the 'gene_space' parameter or the 2 parameters 'random_mutation_min_val' and 'random_mutation_max_val'. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -493,7 +495,7 @@ def adaptive_mutation_by_space(self, offspring): Applies the adaptive mutation based on the 2 parameters 'mutation_num_genes' and 'gene_space'. A number of genes equal are selected randomly for mutation. This number depends on the fitness of the solution. 
The random values are selected from the 'gene_space' parameter. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -565,7 +567,7 @@ def adaptive_mutation_randomly(self, offspring): Applies the adaptive mutation based on the 'mutation_num_genes' parameter. A number of genes equal are selected randomly for mutation. This number depends on the fitness of the solution. The random values are selected based on the 2 parameters 'random_mutation_min_val' and 'random_mutation_max_val'. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -629,7 +631,7 @@ def adaptive_mutation_probs_by_space(self, offspring): Applies the adaptive mutation based on the 2 parameters 'mutation_probability' and 'gene_space'. Based on whether the solution fitness is above or below a threshold, the mutation is applied diffrently by mutating high or low number of genes. The random values are selected based on space of values for each gene. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """ @@ -703,7 +705,7 @@ def adaptive_mutation_probs_randomly(self, offspring): Applies the adaptive mutation based on the 'mutation_probability' parameter. Based on whether the solution fitness is above or below a threshold, the mutation is applied diffrently by mutating high or low number of genes. The random values are selected based on the 2 parameters 'random_mutation_min_val' and 'random_mutation_max_val'. - It accepts a single parameter: + It accepts: -offspring: The offspring to mutate. It returns an array of the mutated offspring. """