One of the central challenges in Data Mining and Knowledge Discovery is the development of effective tools for finding regularities in data. A key problem in extracting interesting knowledge from the data at hand is frequent pattern mining, i.e. the discovery of frequent substructures hidden in the available data. In many application fields, data are represented and stored as sequences of generic objects over time or space. Because of noise and uncertainty in the data, the search for frequent subsequences must employ approximate matching techniques, such as edit distances. A common procedure for identifying recurrent patterns in noisy data is based on clustering algorithms that rely on an edit distance between subsequences. However, this plain approach can produce many spurious patterns, because the same pattern can be matched multiple times at close positions in the same sequence excerpt. In this paper, we present a method to overcome this drawback by applying an optimization-based filtering step that identifies the most descriptive patterns among those found by the clustering process and returns more compact and easily interpretable clusters. We evaluate the mining system's performance on synthetic data in two separate cases, corresponding to two different (simulated) sources of noise. In both cases, our method performs well in retrieving the original patterns with acceptable information loss.
2016, Computational intelligence, Pages 131-150 (volume: 620)
Noise sensitivity of an information granules filtering procedure by genetic optimization for inexact sequential pattern mining (02a Book chapter or article)
Maiorino Enrico, Possemato Francesca, Modugno Valerio, Rizzi Antonello
ISBN: 978-3-319-26391-5; 978-3-319-26393-9
Research group: Robotics