# Honors Project Topics¶

Find below a list of projects that are available under my supervision. You are welcome to contact me if you want more detail on these topics.

### AE1: Multi-objective Control Parameter Tuning¶

Just about all optimization and machine learning algorithms have control parameters. The bad thing about these problem parameters is that performance of the algorithm is very sensitive to different control parameter values, and best values are usually very problem dependent. The consequence is that, if one wants to apply an optimized instance of an algorithm, tuning of these control parameters is required. A number of different approaches to control parameter tuning can be found, including factorial design, f-race, iterated f-race, amongst others. However, these approaches tune control parameters with respect to one performance criterion, usually either accuracy or computational effort.

This study will develop a multi-objective approach to control parameter tuning, considering a number of performance criteria, including accuracy, efficiency, robustness, and stability. A very simple approach will be to evaluate performance for a range of parameter values with respect to each of the perfromance criteria. Then to rank performance for the different control parameter values per performance criterion, and then compute an average rank over all of the performance criteria.

A second approach will be developed to use non-dominance based ranking of algorithms based on the performance measures under evaluation.

An alternative will be to adapt and approach such as f-race to drill down to the most optimal regions of the control parameters using the non-dominance relation with respect to all of the performance criteria.

In addition, it is also required to visualize how performance of an algorithm changes for different values of the control parameters, over different axes, one axis per performance measure.

This is a research project, with a significant amount of empirical analysis and coding. The project has very good potential to result in a publication. The control parameter tuning approach need to interface with CIlib, an opensource library of Computational Intelligence algorithms developed in scala. However, it should be general enough to also interface with any other algorithm or algorithm framework.

Prerequisite: Statistics, and students are advised to do Artificial Intelligence 791.

### AE2: Self-Adaptive Meta-Heuristics using Cultural Algorithms¶

Cultural algorithms (CAs) are a form of evolutionary algorithm which maintain a belief space in parallel with a population space. The population space represents a set of candidate solutions to the optimization problem, and the belief space maintains a set of beliefs about where in the search landscape and optimum resides. Any population-based meta-heuristic can be used in the population space to adapt candidate solutions. This meta-heuristic has as its objective to find an optimal solution to the optimization problem under consideration. The belief space is a collection of ``beliefs’’ as formed by a selected few individuals in the population as to where in the search space these individuals believe the optimal solution can be found.

Just about all meta-heuristics have control parameters, with different control parameter configurations resulting in different levels of performance. The best control parameter configuration is then also very problem dependent. In order to find the best control parameter configuration, a computationally expensive parameter tuning can be done prior to solving the problem. For each new problem, the tuning has to be redone. Instead, self-adaptive algorithms have been developed where control parameter values are adjusted during the optimization process.

Considering particle swarm optimization (PSO) specifically, a number of such self-adaptive approaches have been developed. However, a recent study has shown these approaches to be very inefficient.

This research will develop a CA approach to search for the optimal control parameter values of the meta-heuristic (PSO in this case) used in the population space, by defining a belief space to represent th econtrol parameter space. The belief space will effectively indicate parts of control parameter space where the influential (or the best performing) individuals belief the best values for the control parameters can be found. Each individual will then sample values for its control parameters from this belief space. Different staregies to update the belief space, and to utilize the belief space will be developed to prevent premature convergence in both the belief (control parameter) space and the population space. The end result will hopefully be a more efficient self-adaptive PSO algorithm.

The approach will be extensively empirically analyzed.

This is a research project, with a significant amount of empirical analysis and some coding. The project has very good potential to result in a publication. You are strongly advised to make use of CIlib, an opensource library of Computational Intelligence algorithms developed in scala.

Prerequisite: Students are advised to do Artificial Intelligence 791.

### AE3: A Concise Domain Specific Language for the Specification of Evolutionary and Swarm Intelligence Algorithms using CIlib¶

Evolutionary and swarm intelligence algorithms are a subset of nature-inspired optimization algorithms (inspiration comes from birds, ants, the survival of the fittest, bacteria etc). A software library, called CIlib, focuses on the definition, execution and evaluation of these kinds of algorithms.

Within this project, the student will be exposed to functional programming and will learn core abstractions and tools to enable this style of programming. The goal of this project is to create a domain specific language (DSL) to allow for a subset of the algorithms, problems and measurement to be used by individuals that do not have the needed skills to define such algorithms themselves. It will be a challenging problem to solve, requiring Scala code to combine different parts of the CIlib library in order to produce an executable experiment definition for the DSL user. A domain-specific language (DSL) will allow users without the adequate programming skills to define and execute algorithms in order to obtain measurements for analytical work, whilst adding to an ever growing repository of experimental data.

The DSL should be easily extendible, because additional algorithms and functionality are continuously added to CIlib.

The DSL produced within this project will become part of the main interface to a larger data processing pipeline. This pipeline will produce data that will add to an ever-increasing repository of algorithm results – these results will be referenced in subsequent academic publications and will be used as verification data to help prove or disprove hypotheses on algorithm performance and behaviour.

The project will have the following deliverables: 1.The design of a DSL to allow for simplfied experiment definitions – either kind of DSL (external or internal) is acceptable. 2.Implementation of an interpreter (a program that reads the DSL and performs actions) to convert the defined DSL into the required program such that the experiment can execute. 3.A simplified user manual to describe and explain the functionality of the DSL, and to explain how to expand the DSL as new functionality is added to the library. 4.A web-based interface to allow for the processing of the DSL language – the web based interface will not execute the programs, but instead validate the DSL syntax and submit the experiments as a job for processing by an external processing engine.

Important to note: * The student will receive regular mentoring from the main CIlib developers to aid with the process * Knowledge of how CIlib works and operates will be provided to the student * This project will form part of a larger project aimed at simplifying the computaional intelligence research field * Implementations will require knowledge of the Scala programming language for the DSL implementation. The web interface will not require any upfront knowledge.

Some research will be required in order to implement the DSL. The student will need to motivate for a specific type of DSL implementation, either being an internal or external DSL. Some inspiration may be taken from existing projects, but understanding the shortcomings of these projects is even more important.

### AE4: An Automated Empirical Analysis and Analytics Tool for Stochastic Optimization Algorithms¶

When a new optimization algorithm is developed, the performance of that algorithm has to be compared with other algorithms against different performance criteria on an extensive benchmark set. This firstly requires implementation of a number of optimization algorithms, benchmark problems, performance criteria, and then executing tuned versions of these for a number of independent runs on the provided benchmark problems. A task that can be extremely time consuming.

It happens that many researchers will implement the same algorithms (possibly with bugs), the same problems, and will run these algorithms on the same problems. After the research has been done, and a research paper written, the captured performance data is lost. No central repository is kept, and there is usually no archiving of raw performance data and analyzed data, with links to the corresponding algorithms, algorithm configuration, problems, and problem configurations. Should such a results repository exist, much effort and time can be saved. One can simply perform a query and extract algorithm performance data, and use that in the analysis and comparison with a new algorithm.

Empirical analysis and comparisons of algorithms also require that formal statistical tests be done to determine if one algorithm perform significantly better than another, and to rank algorithms based on different performance criteria.

In addition, benchmark problems have certain characteristics, and fitness landscape characteristics, which can be used to predict which algorithm will be best to solve that problem. In order to get to such predictions, performance results need to be correlated with fitness landscape characteristics.

This project will design such a data repository for optimization algorithms. A database has to be designed to characterize benchmark problems, to populate the landscape characteristics of problems, to store performance results and to tag these rsults with the algorithm and problem configurations. In addition, a dynamic approach to ranking algorithms on specific problems or classes of problems will be developed. This will be a dynamic system that will be able to update the ranking when new algorithms are added, and that will be able to rank algorithms based on different performance criteria.

In addition, some basic queries for exploring the raw and analyzed data will be included, to extract specific data and results, and to visualize results will be developed.

The final tool will need a web-based client to interface with the repository and analysis tools. Note that the tool will link to CIlib, a scala library of optimization algorithms and benchmark functions. It will also link to a fitness landscape analysis tool to populate the landscape characteristics of the problems.

Prerequisite: Students are advised to do Artificial Intelligence 791, and need strong experience in database designe and development, and client-server architectures. You will also have to learn scala.

### AE5: Dynamic Multi-Guide Particle Swarm Optimization¶

We have developed a sub-swarm based PSO, referred to as the multi-guide PSO (MGPSO) to solve static multi-objective optimization problems. The algorithm uses multiple swarms, one swarm per objective, and adds to the velocity update equation an archive term to guide the search towards the Pareto front. Results have shown that this MGPSO performs very competitively to other multi-objective optimization algorithms. The goal of this research is to develop approaches to adapt the \gls{mgpso} to solve dynamic multi-objective optimization problems. As a very first attempt, archive update mechanisms will be developed, to ensure that the solutions in the archive remains non-dominated after environment changes. As a second approach, mechanisms will be implemented to maintain diversity in the sub-swarms, to ensure efficient tracking of changing optimum positions.

This is a research project, with a significant amount of empirical analysis and some coding. The project has very good potential to result in a publication. You are strongly advised to make use of CIlib, an opensource library of Computational Intelligence algorithms developed in scala.

Prerequisite: Students are advised to do Artificial Intelligence 791.

### AE6: Diversity Measure for Genetic Programming¶

Diversity is an indication of the extent to which a population is still exploring the search space, or converging to a single solution. Genetic programming populations contain candidate solutions represented as tree structures, which makes it difficult to define diversity measures. This research will develop and validate diversity measures for genetic programming populations, that is to develop measures to quantify the degree of similarity of tree structures.

This is a research project, with the potential of publications.

Prerequisite: Students are advised to do Artificial Intelligence 791, and to have some knowledge of graph theory and tree structures.

### AE7: Active Learning Learning as A Dynamic Optimization Problem¶

Active learning in neural networks (NNs) refers to the ability of a NN to identify the most informative training patterns, and to focus training only on these informative patterns. Due to the training set changing over time, active learning is in essence a dynamic optimization problem (DOP), which will result in concept drift. Existing research on the development of active learning approaches does not consider these approaches as DOPs. This research will formally define active learning as a DOP. Dynamic meta-heuristic algorithms, such as particle swarm optimization or differential evolution, will then be applied to train these neural networks to see if performance can be improved compared to gradient based training.

This is a research project, with the potential of publications.

Prerequisite: Students are advised to do Artificial Intelligence 791 and Machine Learning 741.

### AE8: Set-based Particle Swarm Optimization for Data Clustering¶

Data clustering concerns the grouping of patterns into clusters such that patterns in the same cluster are similar to one another, and such that clusters differ from one another. Clustering has three objectives: (1) to produce compact clusters, i.e. distances between patterns and their cluster centroid should be as small as possible; (2) clusters should be well-separated, i.e. distances between cluster centroids should be as large as possible; (3) the number of clusters should be optimal. Clustering is therefore, in essence, a multi-objective optimization problem, where the optimal set of centroids need to be found.

The set-based particle swarm optimization (SBPSO) algorithm is a variation of particle swarm optimization (PSO) where candidate solutions are represented as sets, and not as vectors as for the standard PSO. Solutions are found by finding optimal sets as combinations of elements from the defined universe.

Because the solution to the clustering optimization problem is a set of centroids, and the universe is a finite set of elements, the centroids will be actual patterns from the provided dataset. That is, the set universe is the actual dataset. The SBPSO's task is then to select the smallest number of centroids such that the resulting clusters are compact and well-separated. This is then very similar to the k-mediods clustering algorithm.

This project will require to develop approaches to cope with multiple objectives in the SBPSO. One such approach is to make use of a weighted linear aggregation of the objectives. An alternative is to make use of a recently developed universal AND-OR fuzzy operator for multi-objective optimization problems. Lastly, a non-dominance (Pareto-optimal) based approach can be used.

It will be expected that the code be implemented in CIlib, an opensource library implemented in scala.

This is a research project, that will very likely lead to publications.

Prerequisite: Students are advised to do Artificial Intelligence 791.

### AE9: A Set-based Particle Swarm Optimization Algorithm for Rule Induction¶

In data analytics, the extraction of if-then rules from a dataset is a very important task in gaining an understanding of the patterns embedded in dataset. A number of rule induction (rule extraction) algorithms have been developed, e.g. the class of covering algorithms. These algorithms consider each class individually, and then extracts a set of rules from the data to predict that class. The set of rules then also serve as an explanation of the correlations and patterns between descriptive features and the target (class) features. Each rule in the rule set is in the form of *IF antecedant THEN consequent*, where the antecedant is a conjunction of selectors (or conditions on descriptive features).

Ultimately, rule induction is a multi-objective optimization problem, where the goals are to extract as few and as simple as possible rules that are as accurate as possible. This then will result in rules with the best possible generalization performance (i.e. rules do not overfit the training data) and that are easily interpretable.

This project will see the development of a set-based particle swarm optimization (SBPSO) algorithm to extract rules from datasets. The universe will be the set of possible selectors, and the candidate solutions (i.e. particles) will antecedants of the rules. As for covering algorithms, rules will be extracted per class. Since each particle will represent only one rule, a challenge will be to develop an approach to produce a set of such rules, i.e. a disjunction of antecedants per class.

It will be expected that the code be implemented in CIlib, an opensource library implemented in scala.

This is a research project, that will very likely lead to publications.

Prerequisite: Students are advised to do Artificial Intelligence 791.

### AE10: Fitness Landscape Analysis of Higher-Order Neural Networks¶

Higher-order neural networks (NNs) have higher-order transformations of input signals. One type of higher-order NNs is a product unit NN (PUNN) or pi-sigma NN. A PUNN has product units in the hidden layer, where the net input signal is a product of highly non-linear transformations of the input signal. Hidden units use linear activation functions in both the input and the output layers. The output layer makes use of summation units, where the net input signal is calculated as a weighted sum over the input signals.

Product units have the advantage that non-linear relationships are easier to approaximate, with more accurate results and less product units are necessary than summation units. The consequence is then less complex NN architectures. Product units do, however, have a significant impact on the NN loss landscape (error surface), resulting in steeper gradients, and possibly more local minima and plateaus.

The objective of this study is to analyze the impact of product units by conducting a fitness landscape and loss-gradient cloud analysis of the NN fitness landscape, in comparison of the fitness landscape produced by summation units. The study will then proceed to evaluate the performance of a number of NN training algorithms, and to map performance of these training algorithms to the observed fitness landscape characteristics.

Prerequisite: Students are advised to do Artificial Intelligence 791 and Machine Learning 741.