The phrase "Incremental Development," might be an all encomepassign term referring many aspects and techniques for the incremental training of a machine learning system, including the other aspects discussed here. The additional phrase "and Testing" is meant to convey a narrower focus, not to add an additional aspect.
In Incremental Development and Testing, the distinguishing characteristic is that the incremental development process includes many rounds of testing performance on development data that has been set aside from the training data. This development testing is used by the human developers and/or the AI in a Learning Management System to make decisions about what to do in later rounds of incremental development. Because repeated testing may cause the system to adapt to properties of the development data, best practice in Incremental Development and Testing is to use disjoint sets of development data for successive rounds of development.
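As a minimal sketch of this practice (the function name and split policy are illustrative assumptions, not part of any specific system), the held-out development pool can be partitioned up front into disjoint folds, one per round:

```python
import numpy as np

def make_disjoint_dev_rounds(dev_indices, num_rounds, seed=0):
    """Partition held-out development data into disjoint folds, one per
    round of incremental development, so that repeated development
    testing does not quietly adapt the system to a single dev set."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(dev_indices)
    return np.array_split(shuffled, num_rounds)

# Round r is tested only on dev_rounds[r]; folds from earlier rounds are retired.
dev_rounds = make_disjoint_dev_rounds(np.arange(10_000), num_rounds=5)
for r, fold in enumerate(dev_rounds):
    print(f"round {r}: {len(fold)} development examples")
```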
As in manual development, in incremental development the testing is used to make design decisions, to detect problems of bias or variance, and to tune hyperparameters. However, in HAT-AI, best practice is to use a cooperative human + AI learning management system capable of handling millions of hyperparameters. For example, a node-to-node knowledge sharing link and any associated hyperparameters may be customized to an individual knowledge-receiving node. Each node-to-node knowledge sharing link may also be data-specific, that is, it may apply only to a selected subset of the data. Once a substantial number of customized hyperparameters have been tuned using a set of development data, that data should no longer be used for further development testing. It may instead be added to the training set, turning it into training data for future rounds of development. The selection of data for a data-specific knowledge sharing link should be done automatically by a subnetwork or companion network that is trained only on training data.
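The following hypothetical sketch, in PyTorch, illustrates one way such a data-specific node-to-node knowledge sharing link could look; the class name, the sigmoid gate, and the tensor shapes are all assumptions for illustration, and the gate stands in for the companion network that would be trained only on training data:

```python
import torch
import torch.nn as nn

class DataSpecificLink(nn.Module):
    """Hypothetical data-specific knowledge sharing link: a small
    companion gate decides, per example, how strongly the sending
    node's activation influences the receiving node. In the scheme
    described above, the gate would be trained only on training data."""
    def __init__(self, feature_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1))  # link strength, customized to the receiving node
        self.gate = nn.Sequential(nn.Linear(feature_dim, 1), nn.Sigmoid())

    def forward(self, sender_activation, receiver_input, features):
        g = self.gate(features)                     # per-example selection weight in (0, 1)
        return receiver_input + g * self.weight * sender_activation

# Usage with assumed shapes: (batch, 1) activations, (batch, 16) features.
link = DataSpecificLink(feature_dim=16)
out = link(torch.randn(8, 1), torch.randn(8, 1), torch.randn(8, 16))
```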
Example paradigms using incremental development:
The ability to add structure without degrading performance is essential to the process of incremental growth described here. This technique seems obvious in hindsight but has been largely overlooked in the development of deep learning. For example, even after it was discovered in the 1980s that training by gradient descent using back propagation could successfully train neural networks with more than one layer, progress with training more than one hidden layer was disappointingly slow, with convolutional neural networks as the main exception. However, the success with convolutional neural networks does not generalize to general-purpose network architectures.
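One common way to realize adding structure without degrading performance (an assumption here, since the mechanism is not specified above) is to initialize the new structure's output weights to zero, so the network computes exactly the same function at the moment of growth:

```python
import torch
import torch.nn as nn

class GrowableBlock(nn.Module):
    """Sketch of growth that cannot degrade performance: a new branch
    is attached with zero-initialized output weights, so the network's
    function is unchanged when the structure is added and can only be
    changed by subsequent training."""
    def __init__(self, dim):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.new_branch = None

    def grow(self, hidden):
        branch = nn.Sequential(
            nn.Linear(self.base.in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, self.base.out_features),
        )
        nn.init.zeros_(branch[-1].weight)  # zero the output layer's weights and bias:
        nn.init.zeros_(branch[-1].bias)    # the new branch initially contributes nothing
        self.new_branch = branch

    def forward(self, x):
        y = self.base(x)
        if self.new_branch is not None:
            y = y + self.new_branch(x)
        return y

block = GrowableBlock(dim=4)
x = torch.randn(2, 4)
before = block(x)
block.grow(hidden=8)
assert torch.allclose(before, block(x))  # growth left the function unchanged
```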
In the twenty-first century, after deep learning had been successful for several years using algorithms that did not exist in the 1980s, it was discovered that the algorithms of the 1980s could also successfully train deeper neural networks on problems with large quantities of data and an amount of computing power that was not available in the 1980s.
However, with techniques such as one-shot learning, incremental development does not require a large amount of data. Furthermore, there are ways to limit the amount of computation to generally no worse than proportional to the number of errors, which is small for problems with only a small amount of data. In other words, incremental development based on incrementally adding structure without degrading performance would have been computationally feasible in the 1980s with the amount of computing power then available.
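No specific computation-limiting technique is named above, but a classic example with this property is perceptron-style, error-driven training, where parameters are updated only on misclassified examples, so the update cost is proportional to the number of errors rather than to the total amount of data:

```python
import numpy as np

def error_driven_training(X, y, num_epochs=10):
    """Perceptron-style sketch: updates happen only on errors, so the
    total update cost is proportional to the number of errors.
    X has shape (n, d); y holds labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(num_epochs):
        errors = 0
        for x, target in zip(X, y):
            if target * (x @ w) <= 0:   # misclassified: the only case that costs an update
                w += target * x
                errors += 1
        if errors == 0:                 # no errors remain: training is done
            break
    return w
```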
by James K Baker and Bradley J Baker