Columbia University
Department of Statistics

Statistical Genetics
Research Group

 
  Software

Available Methodologies:

  • For marker selection in case-control studies: BGTA Application (Updated on 12/01/2005)
  • Backward haplotype transmission association algorithm: BHTA Application (Updated on 03/01/2006)

BGTA Application System Document

This is the system document for the BGTA Application. It describes the application's general user interface.

System requirements

- Windows XP Home/Professional Edition
- Microsoft .NET Framework 1.1
- Intel Pentium processor at 1.5 GHz, with at least 512 MB RAM

Reference

Zheng T, Wang H, Lo SH (2005) Backward genotype-trait association (BGTA)-based dissection of complex traits in case-control designs. Under revision.

BGTA program

Input data files: The BGTA method takes only two input files, one for the affected individuals (cases, af.txt) and one for the unaffected individuals (controls, naf.txt). The two files must have the same number of markers but not necessarily the same number of individuals.

The two files have the same format as follows:
ID1, 0, 1, 2, 1, 2, …
ID2, 0, 2, 2, 1, 2, …

Each file should consist of exactly (number of markers plus 1) columns, separated by commas. The first column holds the ID numbers of the individuals, and the remaining columns hold the genotypes at the individual markers. Genotypes should be coded as 0, 1, or 2. BGTA cannot process missing values, so imputation must be done before BGTA is used.
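The format rules above can be checked before the application is run. The following is a minimal Python sketch, not part of the BGTA Application: the function names parse_genotype_lines and check_marker_counts are hypothetical, the example rows are made up, and the sketch assumes the files have no header row.

```python
VALID_GENOTYPES = {"0", "1", "2"}


def parse_genotype_lines(lines):
    """Parse comma-separated rows into (ids, genotype rows).

    Raises ValueError on a missing or invalid genotype, or on a row whose
    number of markers differs from the first row.
    """
    ids, rows = [], []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(",")]
        sample_id, genotypes = fields[0], fields[1:]
        for g in genotypes:
            if g not in VALID_GENOTYPES:
                raise ValueError(f"line {line_no}: invalid genotype {g!r}")
        if rows and len(genotypes) != len(rows[0]):
            raise ValueError(f"line {line_no}: expected {len(rows[0])} markers")
        ids.append(sample_id)
        rows.append([int(g) for g in genotypes])
    return ids, rows


def check_marker_counts(case_rows, control_rows):
    """Return the shared marker count, or raise if the two files differ."""
    n_case, n_control = len(case_rows[0]), len(control_rows[0])
    if n_case != n_control:
        raise ValueError(f"marker counts differ: {n_case} vs {n_control}")
    return n_case


# Example rows in the documented format (made-up data).
case_ids, case_rows = parse_genotype_lines(
    ["ID1, 0, 1, 2, 1, 2", "ID2, 0, 2, 2, 1, 2"]
)
control_ids, control_rows = parse_genotype_lines(["ID3, 1, 1, 0, 2, 0"])
n_markers = check_marker_counts(case_rows, control_rows)
```

To check real files, an open file object can be passed in place of the list of strings, since the parser only iterates over lines.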

Form fields

Users can choose input data either by individual files or by folder. If users choose individual files, they only need to select the two input files, the af file and the naf file, and specify the output path for the log files. If users choose a folder, the input files in it must use the default file names (af.txt, naf.txt).
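When input is chosen by folder, it can help to confirm beforehand that both default file names are present. This is a small Python sketch under that assumption; missing_default_inputs is a hypothetical helper and not part of the BGTA Application.

```python
from pathlib import Path

# Default input file names required when choosing input by folder.
DEFAULT_INPUT_NAMES = ("af.txt", "naf.txt")


def missing_default_inputs(folder):
    """Return the default input file names that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in DEFAULT_INPUT_NAMES if not (folder / name).is_file()]
```

An empty list means the folder contains both af.txt and naf.txt.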

Users need to input four parameters for the BGTA method.

- “Simulation Number” is the number of times the BGTA random subset procedure should be run on this set of data. This is a way to divide a large computational task into smaller blocks.
- “Handle Number” is the size of the random subset used by the BGTA algorithm.
- “Repeat Pick Number” is the number of repeated random subset BGTA screenings.

After users enter all the information and click the "OK" button to continue, an information window is displayed. If all the information is correct, users can click the "OK" button to start the simulations.

Output data files

There will be seven output files generated by the application.

- case-control-cluster-count.txt lists all return clusters, sorted by their frequencies.
- case-control-cluster-matrix.txt lists pair-wise joint returns of all markers.
- case-control-delete.txt lists the deletion sequence of each screening.
- case-control-freq.txt lists the total return frequencies of all markers.
- case-control-score-GTA.txt lists the sequence of GTA scores of the deleted markers during each screening.
- case-control-score-GTD.txt lists the GTD flow during each screening.
- log.txt records all the input information. Users can also use it to check the running time of the current simulation.

BHTA Application System Document

This is the system document for the BHTA Application. It describes the application's general user interface.

System requirements

- Windows XP Home/Professional Edition
- Microsoft .NET Framework 1.1
- Intel Pentium processor at 1.5 GHz, with at least 512 MB RAM

Reference

Lo SH, Zheng T (2002) Backward haplotype transmission association (BHTA) algorithm--a fast multiple-marker screening method. Human Heredity 53 (4):197-215.

BHTA program

Input data files: The BHTA method takes four input files. Sample input data can be found in the application folder.

Form fields

Users can choose input data either by individual files or by folder. If users choose individual files, they need to select the four input files: the father-transmission file, the father-non-transmission file, the mother-transmission file, and the mother-non-transmission file. They also need to specify the output path for the log files. If users choose a folder, they need to enter the file handler (a file name pattern), such as *ther*.txt.
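How the application interprets the file handler is not documented here. Assuming it follows the usual shell-style wildcard rules, where * matches any run of characters, the Python sketch below shows which names a pattern such as *ther*.txt would select. The file names are hypothetical, and select_input_files is an illustrative helper, not part of the BHTA Application.

```python
from fnmatch import fnmatchcase


def select_input_files(filenames, pattern="*ther*.txt"):
    """Return the names matching a shell-style wildcard pattern, sorted."""
    return sorted(name for name in filenames if fnmatchcase(name, pattern))


# Hypothetical folder contents: four input files plus unrelated files.
folder_listing = [
    "father_trans.txt",
    "father_nontrans.txt",
    "mother_trans.txt",
    "mother_nontrans.txt",
    "brieflog.txt",
    "notes.doc",
]
matched = select_input_files(folder_listing)
```

Under these assumptions, the pattern picks up both the "father" and "mother" files because each name contains "ther", while the log file and the .doc file are left out.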

Users need to input four parameters for the BHTA method.

- “Total number of simulation” is the number of times the BHTA random subset procedure should be run on this set of data. This is a way to divide a large computational task into smaller blocks.
- “Repeat number of each simulation” is the number of times each simulation is repeated on this set of input data.
- “Marker number of data files” is the number of markers, used to verify the input data files.

After users enter all the information and click the "OK" button to continue, an information window is displayed. If all the information is correct, users can click the "OK" button to start the simulations.

Output data files

The application generates two output files, brieflog.txt and detaillog.txt. If users run multiple simulations at the same time, the application generates multiple log files, depending on the number of simulations they entered.