PIVS is a Java program for detecting labeling errors in microarray data. It provides three methods based on the Perturbing Influence Value, which is defined to measure the effect of data perturbation on the regression model. The three methods are:
- CAPIV (Column Algorithm based on the Perturbing Influence Value)
- RAPIV (Row Algorithm based on the Perturbing Influence Value)
- PRAPIV (Progressive Row Algorithm based on the Perturbing Influence Value)
PIVS requires a Java runtime environment.
Downloads:
- Source code: src.zip
- JAR package: pivs.jar
- Executable program (for 32-bit Windows): pivs.exe
On any operating system with a Java environment, the JAR package can be used as follows:
java -jar pivs.jar <method> <input file> [output file]
Windows users with a Java environment can also use the executable program pivs.exe:
pivs <method> <input file> [output file]
where
- <method>: the method name, one of capiv, rapiv or prapiv.
- <input file>: full path of the input file. The input file should contain microarray data with label information; see the input file format section for details.
- [output file]: full path of the output file. This parameter is optional; if it is omitted, no output file is generated.
Examples:
java -jar pivs.jar capiv /home/input.dat
pivs rapiv C://input.dat
pivs prapiv C://input.dat C://output.dat
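The argument rules above can be summarized in a few lines of code. The following is a minimal sketch, assuming Java 9 or later; the class `PivsCommandLine` and its methods are hypothetical and are not taken from the PIVS source, which may validate its arguments differently.

```java
import java.util.List;

// Hypothetical sketch of the PIVS argument contract:
//   <method> <input file> [output file]
// Not the actual PIVS source code.
final class PivsCommandLine {
    // Method names listed in this document.
    static final List<String> METHODS = List.of("capiv", "rapiv", "prapiv");

    final String method;
    final String inputFile;
    final String outputFile; // null when the optional [output file] is omitted

    private PivsCommandLine(String method, String inputFile, String outputFile) {
        this.method = method;
        this.inputFile = inputFile;
        this.outputFile = outputFile;
    }

    // Validates the arguments; throws IllegalArgumentException on a usage error.
    static PivsCommandLine parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                    "usage: java -jar pivs.jar <method> <input file> [output file]");
        }
        if (!METHODS.contains(args[0])) {
            throw new IllegalArgumentException(
                    "unknown method '" + args[0] + "'; expected capiv, rapiv or prapiv");
        }
        String output = (args.length == 3) ? args[2] : null;
        return new PivsCommandLine(args[0], args[1], output);
    }

    // An output file is written only when the third argument was given.
    boolean writesOutput() {
        return outputFile != null;
    }
}
```

For example, parsing `prapiv C://input.dat C://output.dat` selects PRAPIV and requests an output file, while omitting the third argument leaves `writesOutput()` false.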
The input file of PIVS should contain both the gene expression values and the labels of the samples. Each sample is represented on a single line that begins with the sample's label number, in the following format:
[label] [gene index1]:[expression value1] [gene index2]:[expression value2] ...
where
- [label]: Label number of the sample. PIVS currently handles only two-class data, so the label should be +1 for one class and -1 for the other.
- [gene index]: Index of the gene expression value. It is an integer such as 1, 2, 3, ...
- [expression value]: The corresponding expression value of the gene in the sample.
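A minimal Java sketch of reading one sample line in this format follows. The class `SampleLine` is hypothetical and not part of PIVS; it assumes fields are separated by whitespace and rejects labels other than +1 and -1, since PIVS currently handles only two-class data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for one PIVS input line:
//   [label] [gene index1]:[expression value1] [gene index2]:[expression value2] ...
// Illustrative only; not taken from the PIVS source code.
final class SampleLine {
    final int label;                       // +1 or -1
    final Map<Integer, Double> expression; // gene index -> expression value, in file order

    private SampleLine(int label, Map<Integer, Double> expression) {
        this.label = label;
        this.expression = expression;
    }

    static SampleLine parse(String line) {
        // Assumption: fields are separated by one or more whitespace characters.
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty sample line");
        }
        // Integer.parseInt accepts "1", "+1" and "-1" (a leading '+' is allowed since Java 7).
        int label = Integer.parseInt(tokens[0]);
        if (label != 1 && label != -1) {
            throw new IllegalArgumentException("label must be +1 or -1, got " + tokens[0]);
        }
        Map<Integer, Double> expression = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected index:value, got " + tokens[i]);
            }
            int gene = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            // Each gene index should appear at most once per sample.
            if (expression.put(gene, value) != null) {
                throw new IllegalArgumentException("duplicate gene index " + gene);
            }
        }
        return new SampleLine(label, expression);
    }
}
```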
Here is an example input file:
1 1:-0.272877 2:-0.368607 3:0.127899 4:0.473401 5:-0.517057
-1 1:-0.231336 2:-0.490426 3:-0.49687 4:-1 5:-0.528747
1 1:0.0898276 2:-0.394748 3:0.106695 4:0.110892 5:-0.501981
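If expression data are held as a dense array, each sample can be converted to this line format before running PIVS. The sketch below is a hypothetical helper (`SampleWriter` is not part of PIVS); it assumes 1-based gene indices, as in the example above, and writes the label as 1 or -1. Whether PIVS accepts exponent notation such as `1.0E-4` is not documented here, so the sketch writes plain decimals.

```java
import java.math.BigDecimal;

// Hypothetical helper that writes one sample in the PIVS input format
// from a dense expression vector. Not part of PIVS.
final class SampleWriter {
    static String toLine(int label, double[] values) {
        if (label != 1 && label != -1) {
            throw new IllegalArgumentException("label must be +1 or -1");
        }
        StringBuilder sb = new StringBuilder(Integer.toString(label));
        for (int j = 0; j < values.length; j++) {
            // Gene indices are written 1-based. toPlainString avoids exponent
            // notation; BigDecimal.valueOf throws NumberFormatException for NaN
            // or infinite values, which have no meaningful encoding here.
            sb.append(' ').append(j + 1).append(':')
              .append(BigDecimal.valueOf(values[j]).toPlainString());
        }
        return sb.toString();
    }
}
```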
Test Data
You may use the following microarray datasets to test PIVS.
- Colon (2000 genes, 62 samples): The suspect samples are T2, T30, T33, T36, T37, N8, N12, N34 and N36. Reference: Alon et al. (Proc. Natl Acad. Sci., 1999).
- Colon-p (2000 genes, 53 samples): The Colon dataset with the suspect samples T2, T30, T33, T36, T37, N8, N12, N34 and N36 removed. Colon-p is assumed to contain no labeling errors.
- Breast (7129 genes, 49 samples): The suspect samples are 11, 14, 16, 31, 33, 45, 46, 40 and 43. Reference: West et al. (Proc. Natl Acad. Sci., 2001).
- Breast-p (7129 genes, 40 samples): The Breast dataset with the suspect samples 11, 14, 16, 31, 33, 45, 46, 40 and 43 removed. Breast-p is assumed to contain no labeling errors.
You can also randomly flip the labels of some samples in Colon-p or Breast-p to create additional datasets for testing our methods.
An example dataset based on Breast-p is also available, in which the labels of samples 1, 2, 3, 38, 39 and 40 have been artificially flipped.
Note: For large datasets, PIVS may take up to a few hours to run, especially with PRAPIV. We will improve the algorithms in the future.
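Flipping labels to build such a test dataset takes only a few lines. The sketch below is a hypothetical helper, not part of PIVS: it flips the labels of `k` randomly chosen samples in place and returns their 0-based indices, using a fixed seed so the same test dataset can be regenerated. The flipped labels can then be written back into the leading label field of each sample line.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: flips the +1/-1 labels of k randomly chosen samples.
// Not part of PIVS.
final class LabelFlipper {
    // Modifies labels in place; returns the sorted 0-based indices that were flipped.
    static List<Integer> flipRandom(int[] labels, int k, long seed) {
        if (k < 0 || k > labels.length) {
            throw new IllegalArgumentException("k must be between 0 and " + labels.length);
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            indices.add(i);
        }
        // A fixed seed makes the choice of flipped samples reproducible.
        Collections.shuffle(indices, new Random(seed));
        List<Integer> flipped = new ArrayList<>(indices.subList(0, k));
        Collections.sort(flipped);
        for (int i : flipped) {
            labels[i] = -labels[i]; // +1 becomes -1 and vice versa
        }
        return flipped;
    }
}
```

Keeping the returned indices makes it easy to check afterwards whether CAPIV, RAPIV or PRAPIV reports the flipped samples as suspect.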
Reference on PIVS
- Chen Zhang, Chunguo Wu, Enrico Blanzieri, You Zhou, Yan Wang, Wei Du, and Yanchun Liang. Methods for Labeling Error Detection in Microarrays Based on the Effect of Data Perturbation on the Regression Model, Bioinformatics, 2009, 25(20):2708-2714.
Links
- Computational Systems Biology Group (Jilin University): Our research group.
- DMarker: A Bio-Marker Inference System for Human Diseases.
Contact
Please send comments and suggestions to Chen Zhang.