My proposed project is a text-dependent speaker recognition
system using an ART2 network for the actual pattern recognition (ART2 is
a variant of ART designed to take real-valued vector input). I will
train the system on a population of 8 people, 4 male and 4 female.
Each speaker will utter the same short phrase three times to provide training
data and once to provide test data. The training data will be augmented with noisy
versions of the same utterances to make the network more robust to noise. The
speech will then be processed with STRUCT,
a freely available toolkit that performs various forms of feature extraction.
Then the training data will be fed into the ART2 network and its parameters
adjusted to maximize the number of correctly recognized speakers and eliminate
incorrect classifications. Once the network has been trained, it will
be given versions of the training utterances with additional noise, along with the
held-out test utterance from each speaker, to test its stability. It will also be
given utterances from speakers it has not been trained on to test its discrimination and
plasticity.
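
To make the train-then-test procedure above concrete, here is a minimal sketch of the vigilance idea behind ART-style networks. It is not an ART2 implementation: it leaves out ART2's layer dynamics and noise suppression, and uses cosine similarity against stored prototypes as a stand-in for the match test. The names VigilanceClassifier and normalize, the vigilance value of 0.95, the learning rate, and the three-element feature vectors (standing in for whatever STRUCT produces) are all assumptions made for illustration.

```python
import math


def normalize(vector):
    """Scale a real-valued feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class VigilanceClassifier:
    """Simplified ART-like prototype matcher (an illustration, not ART2)."""

    def __init__(self, vigilance=0.95, learning_rate=0.5):
        # Both parameter values are assumptions; in the project they
        # would be tuned on the training data.
        self.vigilance = vigilance
        self.learning_rate = learning_rate
        self.prototypes = []  # unit-length prototype vectors
        self.labels = []      # speaker label attached to each prototype

    def _best_match(self, v, label=None):
        """Return (index, similarity) of the closest prototype.

        If label is given, only prototypes with that label are considered.
        """
        best_index, best_similarity = None, -1.0
        for i, prototype in enumerate(self.prototypes):
            if label is not None and self.labels[i] != label:
                continue
            similarity = sum(a * b for a, b in zip(prototype, v))
            if similarity > best_similarity:
                best_index, best_similarity = i, similarity
        return best_index, best_similarity

    def train(self, features, label):
        v = normalize(features)
        index, similarity = self._best_match(v, label)
        if index is not None and similarity >= self.vigilance:
            # Match is good enough: nudge the existing prototype toward
            # the input, so old knowledge is refined rather than replaced.
            rate = self.learning_rate
            blended = [(1.0 - rate) * a + rate * b
                       for a, b in zip(self.prototypes[index], v)]
            self.prototypes[index] = normalize(blended)
        else:
            # No prototype passes the vigilance test: create a new
            # category instead of overwriting an old one.
            self.prototypes.append(v)
            self.labels.append(label)

    def classify(self, features):
        """Return the matching speaker label, or None if nothing matches."""
        v = normalize(features)
        index, similarity = self._best_match(v)
        if index is None or similarity < self.vigilance:
            return None  # treated as an unknown speaker
        return self.labels[index]


# Toy feature vectors; real input would come from feature extraction.
clf = VigilanceClassifier(vigilance=0.95)
clf.train([1.0, 0.1, 0.0], "speaker_a")
clf.train([0.9, 0.2, 0.0], "speaker_a")
clf.train([0.0, 0.1, 1.0], "speaker_b")

print(clf.classify([1.0, 0.15, 0.05]))  # speaker_a
print(clf.classify([0.5, 0.5, 0.5]))    # None
```

In this sketch, raising the vigilance makes the matcher stricter, so it rejects more unknown speakers but also creates more categories per trained speaker. Lowering it has the opposite effect. That trade-off is the kind of parameter adjustment the training step would explore.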