[1] C. Furlanello, M. Serafini, S. Merler, and G. Jurman. Advances in Neural Network Research: IJCNN 2003, chapter An accelerated procedure for recursive feature ranking on microarray data. Elsevier, 2003. [ bib ]
[2] C. Furlanello, M. Serafini, S. Merler, and G. Jurman. Control of selection bias in microarray data analysis. Minerva Biotecnologica, 15(4):217-222, 2003. [ bib ]
[3] C. Furlanello, M. Serafini, S. Merler, and G. Jurman. Entropy-Based Gene Ranking without Selection Bias for the Predictive Classification of Microarray Data. BMC Bioinformatics, 4:54, 2003. [ bib | http ]
Background: We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of very small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic because testing is done on samples already considered in the feature selection process).

Results: With E-RFE, we speed up recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes according to an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles.

Conclusions: Without a decrease in classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. A process for gene selection and error estimation is thus made practical, ensuring control of the selection bias and providing additional diagnostic indicators of gene importance.
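
The two-strata evaluation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `rank_fn` and `fit_predict_fn` callables, and all default parameters are assumptions introduced here, and `rank_fn` merely stands in for the internal K-fold E-RFE ranking. The only point shown is that the ranking step never sees the samples held out by the external stratified split, which is what keeps the error estimate free of selection bias.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), holding out a share of every class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def external_resampling(X, y, rank_fn, fit_predict_fn, n_genes,
                        n_runs=10, test_fraction=0.25, seed=0):
    """Average test error over repeated stratified splits.

    rank_fn(X_train, y_train) -> feature indices, best first (hypothetical
    stand-in for the internal ranking step).
    fit_predict_fn(X_train, y_train, X_test) -> predicted labels.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_runs):
        train, test = stratified_split(y, test_fraction, rng)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        # Ranking uses training samples only: test samples stay unseen.
        keep = rank_fn(X_tr, y_tr)[:n_genes]
        X_tr_red = [[row[j] for j in keep] for row in X_tr]
        X_te_red = [[row[j] for j in keep] for row in X_te]
        preds = fit_predict_fn(X_tr_red, y_tr, X_te_red)
        wrong = sum(p != t for p, t in zip(preds, y_te))
        errors.append(wrong / len(y_te))
    return sum(errors) / len(errors)
```

Passing the ranking and the classifier as callables keeps the resampling loop independent of any particular ranking method or learner.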

[4] C. Furlanello, M. Serafini, S. Merler, and G. Jurman. An accelerated procedure for recursive feature ranking on microarray data. Neural Networks, 16(5-6):641-648, 2003. [ bib ]
We describe a new wrapper algorithm for fast feature ranking in classification problems. The E-RFE (Entropy-based Recursive Feature Elimination) method eliminates chunks of uninteresting features according to the entropy of the weights distribution of an SVM classifier. With specific regard to DNA microarray datasets, the method is designed to support computationally intensive model selection in classification problems in which the number of features is much larger than the number of samples. We test E-RFE on synthetic and real data sets, comparing it with other SVM-based methods. The speed-up obtained with E-RFE supports predictive modeling on high-dimensional microarray data.
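
As a rough illustration of entropy-guided chunk elimination, the sketch below computes the Shannon entropy of a histogram of normalized weights and uses it to decide how many features to drop in one step. The decision rule, the bin count, the threshold, and the function names are assumptions made here for illustration; the abstract does not state the actual E-RFE thresholds or elimination rule, so this should not be read as the published procedure.

```python
import math


def weight_histogram(weights, n_bins=10):
    """Bin min-max normalized weights; return (bin index per weight, counts)."""
    lo, hi = min(weights), max(weights)
    span = hi - lo
    counts = [0] * n_bins
    bins = []
    for w in weights:
        x = (w - lo) / span if span > 0 else 0.0
        b = min(int(x * n_bins), n_bins - 1)  # clip x == 1.0 into last bin
        bins.append(b)
        counts[b] += 1
    return bins, counts


def weight_entropy(counts):
    """Shannon entropy (in bits) of the histogram given by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def chunk_to_eliminate(weights, n_bins=10, entropy_threshold=None):
    """Indices of features to drop in one step (hypothetical rule).

    Low entropy means the weights pile up in a few bins, so the whole
    lowest bin is dropped at once; otherwise only the single
    lowest-weight feature is dropped, as in one-at-a-time RFE.
    """
    if entropy_threshold is None:
        entropy_threshold = 0.5 * math.log2(n_bins)  # assumed default
    bins, counts = weight_histogram(weights, n_bins)
    if weight_entropy(counts) < entropy_threshold:
        return [i for i, b in enumerate(bins) if b == 0]
    return [min(range(len(weights)), key=lambda i: weights[i])]
```

In a full ranking loop, the weights would be the absolute (or squared) weights of an SVM retrained on the surviving features after each elimination step.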

[5] C. Furlanello, M. Neteler, S. Merler, S. Menegon, S. Fontanari, A. Donini, A. Rizzoli, and C. Chemini. GIS and the Random Forest Predictor: Integration in R for Tick-borne Disease Risk Assessment. In K. Hornik and F. Leisch, editors, Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, Austria, March 20-22, 2003. [ bib | .pdf ]
We discuss how sophisticated machine learning methods may be rapidly integrated within a GIS for the development of new approaches in landscape epidemiology. A multitemporal predictive map is obtained by modeling in R, analyzing geodata and digital maps in GRASS, and managing biodata samples and weather data in PostgreSQL. In particular, we present a risk mapping system for tick-borne diseases, applied to model the risk of exposure to Lyme borreliosis and tick-borne encephalitis (TBE) in Trentino, Italian Alps.

[6] S. Merler, C. Furlanello, B. Larcher, and A. Sboner. Automatic model selection in cost-sensitive boosting. Information Fusion, 4(1):3-10, 2003. [ bib | .pdf ]
This paper introduces SSTBoost, a predictive classification methodology designed to target the accuracy of a modified boosting algorithm towards required sensitivity and specificity constraints. The SSTBoost method is demonstrated in practice for the automated medical diagnosis of cancer on a set of skin lesions (42 melanomas and 110 naevi) described by geometric and colorimetric features. A cost-sensitive variant of the AdaBoost algorithm is combined with a procedure for the automatic selection of optimal cost parameters. Within each boosting step, different weights are considered for errors on false negatives and false positives, and differently updated for negatives and positives. Given only a target region in the ROC space, the method also completely automates the selection of the cost parameters ratio, which is typically of uncertain definition. On the cancer diagnosis problem, SSTBoost outperformed, in both accuracy and stability, a battery of specialized automatic systems based on different types of multiple classifier combinations and a panel of expert dermatologists. The method can thus be applied for the early diagnosis of melanoma cancer or in other problems in which an automated cost-sensitive classification is required.
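
The idea of treating false negatives and false positives differently within a boosting step can be sketched as follows. This is a generic cost-sensitive variation on the AdaBoost weight update, written under assumptions made here: the function name, the `cost_fn` and `cost_fp` parameters, and the specific update formula are hypothetical and are not taken from the SSTBoost paper, whose exact update and automatic cost selection are not given in the abstract.

```python
import math


def cost_sensitive_round(weights, y_true, y_pred, cost_fn=2.0, cost_fp=1.0):
    """One boosting round with class-dependent penalties (illustrative).

    Labels are in {-1, +1}. A misclassified positive (false negative)
    is upweighted by exp(alpha * cost_fn), a misclassified negative
    (false positive) by exp(alpha * cost_fp); correctly classified
    samples are downweighted by exp(-alpha), as in plain AdaBoost.
    Returns (alpha, normalized new weights).
    """
    missed = [t != p for t, p in zip(y_true, y_pred)]
    err = sum(w for w, m in zip(weights, missed) if m) / sum(weights)
    err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    new_weights = []
    for w, t, m in zip(weights, y_true, missed):
        if m:
            cost = cost_fn if t == 1 else cost_fp
            new_weights.append(w * math.exp(alpha * cost))
        else:
            new_weights.append(w * math.exp(-alpha))
    total = sum(new_weights)
    return alpha, [w / total for w in new_weights]
```

With `cost_fn` larger than `cost_fp`, later weak learners concentrate more on missed positives, which pushes the combined classifier towards higher sensitivity at the expense of specificity.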

[7] C. Furlanello, M. Serafini, S. Merler, and G. Jurman. Gene selection and classification by Entropy-based Recursive Feature Elimination. In International Joint Conference on Neural Networks, pages 3077-3082, Portland, Oregon, July 20-24, 2003. [ bib ]
We analyse E-RFE (Entropy-based Recursive Feature Elimination), a new wrapper algorithm for fast feature ranking in classification problems. The E-RFE method eliminates chunks of uninteresting features according to the entropy of the weights distribution of an SVM classifier. The method is designed to support computationally intensive model selection in classification problems in which the number of features is much larger than the number of samples. We validate the elimination procedure on synthetic data sets, and we demonstrate the applicability of E-RFE for the identification of biomedically important genes in predictive classification of microarray data.

[8] H. Mitasova and M. Neteler. Free General-purpose GIS. A Geographic Resources Analysis Support System. GIM International, 17(11):40-43, 2003. [ bib | http ]
The Geographic Resources Analysis Support System (GRASS), a general-purpose GIS originally developed by the U.S. Army Corps of Engineers Laboratory, has grown into one of the main components of the Open Source and Free Software geospatial computational infrastructure. Current developments, led by an international team of programmers, focus on improving the 2D and 3D raster and vector data processing and analysis tools and the 3D visualization capabilities, following the release of the code under the GPL in 1999. Applications in the areas of epidemiology, coastal management and water flow modelling provide a snapshot of the capabilities.

[9] G. Antoniol, M. Di Penta, and M. Neteler. Moving to smaller libraries via clustering and genetic algorithms. In CSMR 2003, 7th IEEE European Conference on Software Maintenance and Reengineering, pages 307-316, 2003. [ bib | .pdf ]
There may be several reasons to reduce a software system to its bare bones by removing the extra fat introduced during development or evolution. Porting the software system to embedded devices and porting it to palmtops are just two examples.

This paper presents an approach to re-factoring libraries with the aim of reducing the memory requirements of executables. The approach is organized in two steps. The first step defines an initial solution based on clustering methods, while the subsequent phase refines the initial solution via genetic algorithms.

In particular, a novel genetic algorithm approach, considering the initial clusters as the starting population, adopting a knowledge-based mutation function and a multi-objective fitness function, is proposed.

The approach has been applied to several medium and large-size open source software systems such as GRASS, KDE-QT, Samba and MySQL, making it possible to effectively produce smaller, loosely coupled libraries and to reduce the memory requirement for each application.

[10] V. Raghavan, K. Kita, K. Iwao, and M. Neteler. Open source GIS GRASS for developing spatial data infrastructure - Present status and future potential. Journal of Information Science and Technology Association, 4(53):216-222, 2003. [ bib | .html ]
This article outlines the salient features and current state of development of the Open Source GIS GRASS. We discuss the concepts and issues related to the development of GRASS, which represents the only full-fledged, multi-platform GIS available as OSS. Further, we highlight the potential of GRASS GIS in developing spatial data infrastructure and put forth a proposal for establishing a GRASS Consortium to support, nurture and accelerate further developments.