PIVS

 -- Methods Based on Perturbing Influence Value For Labeling Error Detection in Microarray


  Introduction

PIVS is a Java program for detecting labeling errors in microarray data. It provides three methods based on the Perturbing Influence Value, which is defined to measure the effect of data perturbation on the regression model. On the command line, the three methods are selected as capiv, rapiv, and prapiv.


  Downloads

A Java environment (Download) is required to run PIVS.
Source code: src.zip (Download)
JAR package: pivs.jar (Download)
Executable program (Windows, 32-bit): pivs.exe (Download)


  Usage

On any operating system with a Java environment, the JAR package can be run as follows:

java -jar pivs.jar <method> <input file> [output file]

Windows users with a Java environment can also use the executable program pivs.exe:

pivs <method> <input file> [output file]

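where <method> selects the detection method (for example capiv, rapiv, or prapiv), <input file> is the path to the input data file, and [output file] is an optional path for the output.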

Examples:

java -jar pivs.jar capiv /home/input.dat
pivs rapiv C://input.dat
pivs prapiv C://input.dat C://output.dat
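
To run several methods on the same input without retyping the command, the invocation above can be wrapped in a small Java launcher. The following is only a sketch: the class name RunPivs, the helper buildCommand, and the output file naming are hypothetical, and it assumes that java is on the PATH and that pivs.jar is in the working directory. How PIVS reports success or failure is not documented here, so the exit code is simply printed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RunPivs {

    /** Builds "java -jar <jar> <method> <input> [output]"; pass null to omit the output file. */
    public static List<String> buildCommand(String jarPath, String method,
                                            String inputFile, String outputFile) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.add(method);
        command.add(inputFile);
        if (outputFile != null) {
            command.add(outputFile);
        }
        return command;
    }

    /** Starts the command, forwards its console output, and waits for it to finish. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java RunPivs <input file>");
            return;
        }
        // Method names taken from the examples above.
        String[] methods = {"capiv", "rapiv", "prapiv"};
        for (String method : methods) {
            // Hypothetical naming scheme: one output file per method.
            String outputFile = args[0] + "." + method + ".out";
            int exitCode = run(buildCommand("pivs.jar", method, args[0], outputFile));
            System.out.println(method + " finished with exit code " + exitCode);
        }
    }
}
```

The methods are run one after another rather than in parallel, since a single run may already take a long time on large datasets.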


  Input File Format

The input file of PIVS should contain both the gene expression information and the labels of the samples. Every sample should be represented on a single line that begins with the label number of the sample. The format of a sample line is:

[label] [gene index1]:[expression value1] [gene index2]:[expression value2] ...

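where [label] is the label number of the sample (1 or -1 in the example below), each [gene index] is the integer index of a gene, and each [expression value] is the expression value of that gene in the sample.

Here is an example of an input file: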

1 1:-0.272877 2:-0.368607 3:0.127899 4:0.473401 5:-0.517057
-1 1:-0.231336 2:-0.490426 3:-0.49687 4:-1 5:-0.528747
1 1:0.0898276 2:-0.394748 3:0.106695 4:0.110892 5:-0.501981
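
To check a file against this format before running PIVS, each line can be parsed with a few lines of Java. This is a minimal sketch and not the parser used inside PIVS: the class SampleLineParser and its Sample type are hypothetical names, and the sketch assumes integer labels, integer gene indices, and whitespace-separated index:value pairs, as in the example file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SampleLineParser {

    /** One parsed sample: its label and a map from gene index to expression value. */
    public static final class Sample {
        public final int label;
        public final Map<Integer, Double> expression;

        Sample(int label, Map<Integer, Double> expression) {
            this.label = label;
            this.expression = expression;
        }
    }

    /** Parses one line of the form "[label] [index]:[value] [index]:[value] ...". */
    public static Sample parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        // The first token is the label; a blank line fails here with NumberFormatException.
        int label = Integer.parseInt(tokens[0]);
        // LinkedHashMap keeps the genes in the order they appear on the line.
        Map<Integer, Double> expression = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value but got: " + tokens[i]);
            }
            int geneIndex = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            expression.put(geneIndex, value);
        }
        return new Sample(label, expression);
    }

    public static void main(String[] args) {
        Sample sample = parseLine("-1 1:-0.231336 2:-0.490426 3:-0.49687 4:-1 5:-0.528747");
        System.out.println("label=" + sample.label + ", genes=" + sample.expression.size());
    }
}
```

A malformed token raises an exception that names the offending text, which makes it easier to locate a broken line in a large file.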


  Test Data

You may use the following microarray datasets to test PIVS.

Dataset Name | Number of Genes | Number of Samples | Description | Downloads
Colon | 2000 | 62 | Suspect samples are T2, T30, T33, T36, T37, N8, N12, N34, N36. Reference: Alon et al. (Proc. Natl Acad. Sci., 1999) | Download
Colon-p | 2000 | 53 | The Colon dataset with the suspect samples T2, T30, T33, T36, T37, N8, N12, N34, N36 removed. Colon-p is assumed to contain no labeling errors. | Download
Breast | 7129 | 49 | Suspect samples are 11, 14, 16, 31, 33, 45, 46, 40, 43. Reference: West et al. (Proc. Natl Acad. Sci., 2001) | Download
Breast-p | 7129 | 40 | The Breast dataset with the suspect samples 11, 14, 16, 31, 33, 45, 46, 40, 43 removed. Breast-p is assumed to contain no labeling errors. | Download

You can also create further test datasets by randomly flipping the labels of some samples in Colon-p or Breast-p.
Here is an example dataset based on Breast-p, in which the labels of samples 1, 2, 3, 38, 39, and 40 are artificially flipped. Click here to download.
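
Flipping labels by hand is error-prone, so a short Java helper can do it. The sketch below is not part of PIVS: the class LabelFlipper is a hypothetical name, and it assumes that the two labels are 1 and -1 (so that flipping is negation) and that samples are numbered from 1 in the order of the non-empty lines of the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LabelFlipper {

    /** Returns a copy of the lines with the label negated for the given 1-based sample numbers. */
    public static List<String> flipLabels(List<String> lines, Set<Integer> samplesToFlip) {
        List<String> result = new ArrayList<>();
        int sampleNumber = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // Blank lines are copied unchanged and do not count as samples.
                result.add(line);
                continue;
            }
            sampleNumber++;
            if (!samplesToFlip.contains(sampleNumber)) {
                result.add(line);
                continue;
            }
            // Split into the label and the rest of the line (the index:value pairs).
            String[] parts = trimmed.split("\\s+", 2);
            int flipped = -Integer.parseInt(parts[0]);
            result.add(parts.length > 1 ? flipped + " " + parts[1] : String.valueOf(flipped));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1 1:-0.272877 2:-0.368607",
                "-1 1:-0.231336 2:-0.490426",
                "1 1:0.0898276 2:-0.394748");
        Set<Integer> samplesToFlip = new HashSet<>(Arrays.asList(1, 2));
        for (String line : flipLabels(lines, samplesToFlip)) {
            System.out.println(line);
        }
    }
}
```

For a random selection, the sample numbers passed in samplesToFlip can be drawn beforehand, for example by shuffling the list of all sample numbers with java.util.Collections.shuffle and taking the first few.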

Note: On large datasets, PIVS may take up to a few hours to run, especially with PRAPIV. We plan to improve the algorithms in the future.


  Reference on PIVS


  Links


  Contact

Please send comments and suggestions to Chen Zhang.

