LAS Manual

Input File Format
LAS Guide
Output Files Format
- Binary (0/1) Output Format
- Text Output Format

Input File Format

The input file can have zero, one or more rows of column labels and zero, one or more columns of row labels. The sample input file shown below has 1 row of column labels and 2 columns of row labels.

rc1	rc2	col1	col2	col3
rowLabel11	rowLabel12	1	2	3
rowLabel21	rowLabel22	2	1	2
rowLabel31	rowLabel32	3	2	3
rowLabel41	rowLabel42	3	1	2
rowLabel51	rowLabel52	1	2	2
rowLabel61	rowLabel62	3	1	3
rowLabel71	rowLabel72	1	2	3
rowLabel81	rowLabel82	3	1	2
rowLabel91	rowLabel92	1	2	3
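The layout above can be read with a few lines of Python. This is only an illustrative sketch, not the loader used by LAS: the function name load_matrix and its parameters are hypothetical, and the tab delimiter is an assumption taken from the sample.

```python
def load_matrix(text, n_label_rows, n_label_cols, delimiter="\t"):
    """Split delimited text into column labels, row labels and numeric data.

    n_label_rows: number of top rows holding column labels.
    n_label_cols: number of left columns holding row labels.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [ln.split(delimiter) for ln in lines[:n_label_rows]]
    body = [ln.split(delimiter) for ln in lines[n_label_rows:]]
    # Drop the corner cells (rc1, rc2 in the sample) from the column labels.
    col_labels = [row[n_label_cols:] for row in header]
    row_labels = [row[:n_label_cols] for row in body]
    data = [[float(v) for v in row[n_label_cols:]] for row in body]
    return col_labels, row_labels, data


# First two data rows of the sample file shown above.
sample = (
    "rc1\trc2\tcol1\tcol2\tcol3\n"
    "rowLabel11\trowLabel12\t1\t2\t3\n"
    "rowLabel21\trowLabel22\t2\t1\t2\n"
)
col_labels, row_labels, data = load_matrix(sample, n_label_rows=1, n_label_cols=2)
```

With 1 row of column labels and 2 columns of row labels, data holds only the numeric part of the file (here a 2 x 3 matrix).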

LAS Guide

The LAS wizard guides you through the various settings of the algorithm. Thanks to Manish Ranjan Kumar for the C# Wizard Control, which I found very helpful.

The second page of the wizard helps you load the data to be biclustered.

Select the file to be loaded by browsing ("Browse" button) for it or typing or copy-pasting its name in the edit-box.
Select the number of top rows (Rows of column labels) and left columns (Columns of row labels) to be ignored. The breast cancer data has 1 row and 1 column with labels.
Select the delimiter separating the values in the file (File format).
Load the data with the "Load" button.
If the data is loaded successfully, its dimensions are displayed in the box below. Otherwise the box shows the error message. Most often the errors are caused by:
- Wrong choice of the delimiter.
- Wrong choice of the number of row/column labels.
- The file being open in Excel at the same time. Excel restricts access to open files.
If the dimensions of the data matrix are correct, proceed to the next page with the "Next >" button.

The third page of the wizard controls the data normalization.

The LAS model assumes the noise component of the data to have i.i.d. N(0,1) distribution. To bring the data closer to the model, we can apply one of the following normalizations:

No normalization. Use if the data is already normalized.
Column standardization. Standardize each column independently (subtract mean, divide by standard deviation).
Tail transformation. Suppresses very large positive and negative values. It consists of 3 steps: Column standardization, sign(x)ln(1+|x|) transformation, second column standardization.

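The program calculates the average column kurtosis after each of the last two normalizations (column standardization and tail transformation) and suggests the one whose kurtosis is closer to 3, the kurtosis of the normal distribution.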
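The two normalizations and the kurtosis comparison can be sketched as follows. This is a minimal sketch, not the LAS implementation: all function names are hypothetical, and the use of the population (divide by n) variance and of the non-excess kurtosis m4 / m2^2 are assumptions, since the manual does not say which estimators the program uses. Columns with zero variance are not handled.

```python
import math


def standardize(col):
    """Subtract the mean and divide by the standard deviation."""
    n = len(col)
    mean = sum(col) / n
    # Population variance (divide by n): an assumption, not confirmed by the manual.
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]


def tail_transform(col):
    """Standardize, apply sign(x) * ln(1 + |x|), then standardize again."""
    z = standardize(col)
    t = [math.copysign(math.log1p(abs(x)), x) for x in z]
    return standardize(t)


def kurtosis(col):
    """Non-excess kurtosis m4 / m2^2, which equals 3 for a normal distribution."""
    n = len(col)
    mean = sum(col) / n
    m2 = sum((x - mean) ** 2 for x in col) / n
    m4 = sum((x - mean) ** 4 for x in col) / n
    return m4 / m2 ** 2


def mean_kurtosis(columns, transform):
    """Average kurtosis over all columns after applying the given normalization."""
    return sum(kurtosis(transform(c)) for c in columns) / len(columns)


def suggest_normalization(columns):
    """Pick the normalization whose average column kurtosis is closer to 3."""
    k_std = mean_kurtosis(columns, standardize)
    k_tail = mean_kurtosis(columns, tail_transform)
    if abs(k_std - 3) <= abs(k_tail - 3):
        return "column standardization"
    return "tail transformation"


# Made-up example data: each inner list is one column of the matrix.
columns = [[1.0, 2.0, 3.0, 4.0, 100.0], [2.0, 4.0, 4.0, 5.0, 7.0]]
suggestion = suggest_normalization(columns)
```

Because sign(x)ln(1+|x|) grows only logarithmically, it pulls extreme values toward the bulk of the data while preserving their order.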

The fourth page of the wizard controls the parameters of biclustering.

The "Search for biclusters with" option sets whether we are interested in biclusters with only a large positive average, only a large negative average, or both.

The "Stopping Criteria" set the maximum number of biclusters to produce and the "Score cut off". The search stops once a bicluster with a score below the cut off is produced.

Each produced bicluster is the best among those found by the search procedure over the "Number of iterations of the search algorithm per bicluster". The "Number of threads to use" sets the number of processor cores used by the program. To run LAS in the background, set it to at most 1 less than the maximum value (the number of cores on your machine).

The last page of the wizard shows the progress of the bicluster search and controls the location and the type of output files.

The "Search Progress" shows the number of biclusters found so far. The checkbox "Save output in input folder" will fill the "Output folder" field for you with the path to the input file.

You do not have to wait for all biclusters to be found. The "Save Intermediate Results" button allows you to save the biclusters found up to the current moment.

Output Files Format

There are two formats for the output files: Binary (0/1) and Text. The names of output files are formed from the name of the input file:

Let the name of the input file be:

breast.txt

The Binary output consists of 3 files:

breast.txt.Labels.txt
breast.txt.Rows.binary.txt
breast.txt.Columns.binary.txt

The text output consists of 3 files:

breast.txt.Labels.txt
breast.txt.Rows.text.txt
breast.txt.Columns.text.txt
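The naming pattern in the two lists above can be written as a small helper, which is handy when locating the output files from a script. The function name output_file_names is hypothetical and the sketch only reproduces the pattern shown for breast.txt.

```python
import os


def output_file_names(input_path, fmt):
    """Return the three output file names for fmt = 'binary' or 'text'."""
    if fmt not in ("binary", "text"):
        raise ValueError("fmt must be 'binary' or 'text'")
    base = os.path.basename(input_path)
    return [
        f"{base}.Labels.txt",         # summary file, common to both formats
        f"{base}.Rows.{fmt}.txt",     # row sets of the biclusters
        f"{base}.Columns.{fmt}.txt",  # column sets of the biclusters
    ]


binary_names = output_file_names("breast.txt", "binary")
text_names = output_file_names("breast.txt", "text")
```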

The common file "..Labels.txt" contains the summary information about biclusters. Each row contains information about a single bicluster, i.e. its size, average and score. For the breast cancer data it looks like this:

Red_1	bicluster of	1431	x	43	size with average	0.7551	and score	12894.1
Red_2	bicluster of	1520	x	29	size with average	0.8491	and score	11067.5
Red_3	bicluster of	1486	x	26	size with average	0.8061	and score	7803.8
…	…	…	…	…	…	…	…	…
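A line of the summary file can be split into its numeric fields as sketched below. The function name parse_label_line and the field positions are assumptions read off the sample rows above, which are taken to be tab-delimited.

```python
def parse_label_line(line):
    """Extract name, size, average and score from one row of the Labels file."""
    fields = line.rstrip("\n").split("\t")
    # Assumed layout: name, "bicluster of", rows, "x", columns,
    #                 "size with average", average, "and score", score
    return {
        "name": fields[0],
        "rows": int(fields[2]),
        "columns": int(fields[4]),
        "average": float(fields[6]),
        "score": float(fields[8]),
    }


line = "Red_1\tbicluster of\t1431\tx\t43\tsize with average\t0.7551\tand score\t12894.1"
info = parse_label_line(line)
```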

Binary Output Format:

The binary output files "..Rows.binary.txt" and "..Columns.binary.txt" contain information about biclusters' row- and column-sets respectively. For a data matrix with m rows and n columns, each row of the "..Rows.binary.txt" file will contain m 0/1 values and each row of the "..Columns.binary.txt" file will contain n 0/1 values.

Let's consider a data matrix with 4 rows and 5 columns with the following output files.

..Rows.binary.txt file
0	1	1	0
1	0	1	0

..Columns.binary.txt file
1	0	1	1	0
0	0	0	1	1

The first row in both files keeps the information about the first bicluster (matching the first row in the ..Labels.txt file). The rows and columns of the bicluster are indicated by 1s. So the first bicluster contains the 2nd and 3rd rows and the 1st, 3rd and 4th columns:

The data matrix with the first bicluster (rows row2 and row3, columns col1, col3 and col4):
	col1	col2	col3	col4	col5
row1	4	2	3	1	0
row2	6	7	5	9	4
row3	9	1	8	9	1
row4	0	2	4	2	1
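The way the 0/1 indicators select a bicluster can be checked on the example matrix above. This is a short sketch with hypothetical variable and function names. It assumes tab-delimited indicator rows, as in the sample files.

```python
row_labels = ["row1", "row2", "row3", "row4"]
col_labels = ["col1", "col2", "col3", "col4", "col5"]
matrix = [
    [4, 2, 3, 1, 0],
    [6, 7, 5, 9, 4],
    [9, 1, 8, 9, 1],
    [0, 2, 4, 2, 1],
]

# First row of the Rows and Columns binary files from the example.
rows_line = "0\t1\t1\t0"
cols_line = "1\t0\t1\t1\t0"


def selected(indicator_line):
    """Return the zero-based positions marked with 1 in an indicator row."""
    return [i for i, flag in enumerate(indicator_line.split("\t")) if flag == "1"]


r = selected(rows_line)
c = selected(cols_line)
bicluster_rows = [row_labels[i] for i in r]     # ['row2', 'row3']
bicluster_cols = [col_labels[j] for j in c]     # ['col1', 'col3', 'col4']
submatrix = [[matrix[i][j] for j in c] for i in r]  # [[6, 5, 9], [9, 8, 9]]
average = sum(sum(row) for row in submatrix) / (len(r) * len(c))
```

Here the average is taken over the raw example values purely for illustration. The averages reported by LAS refer to the data after the chosen normalization.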

Text Output Format:

Like the binary output files, the text output files contain information about biclusters' rows ("..Rows.text.txt") and columns ("..Columns.text.txt"), one bicluster per row in each file. Namely, the kth row of the "..Rows.text.txt" file contains the labels of all rows of the kth bicluster, and the kth row of the "..Columns.text.txt" file contains the labels of all columns of the kth bicluster. The biclusters described above by the binary files will be represented in the text files in the following way: