Tuesday, April 26, 2011

What statistical analysis should I use?

The following table shows general guidelines for choosing a statistical analysis. We emphasize that these are general guidelines and should not be construed as hard and fast rules. Usually your data could be analyzed in multiple ways, each of which could yield legitimate answers. The table below covers a number of common analyses and helps you choose among them based on the number of dependent variables (sometimes referred to as outcome variables) and the nature of your independent variables (sometimes referred to as predictors). You also want to consider the nature of your dependent variable, namely whether it is an interval, ordinal or categorical variable, and whether it is normally distributed (see What is the difference between categorical, ordinal and interval variables? for more information on this). The table then shows one or more statistical tests commonly used given these types of variables (but not necessarily the only type of test that could be used) and links showing how to do such tests using SAS, Stata and SPSS.

| Number of Dependent Variables | Nature of Independent Variables | Nature of Dependent Variable(s) | Test(s) | How to SAS | How to Stata | How to SPSS |
|---|---|---|---|---|---|---|
| 1 | 0 IVs (1 population) | interval & normal | one-sample t-test | SAS | Stata | SPSS |
| | | ordinal or interval | one-sample median | SAS | Stata | SPSS |
| | | categorical (2 categories) | binomial test | SAS | Stata | SPSS |
| | | categorical | Chi-square goodness-of-fit | SAS | Stata | SPSS |
| 1 | 1 IV with 2 levels (independent groups) | interval & normal | 2 independent sample t-test | SAS | Stata | SPSS |
| | | ordinal or interval | Wilcoxon-Mann Whitney test | SAS | Stata | SPSS |
| | | categorical | Chi-square test | SAS | Stata | SPSS |
| | | | Fisher's exact test | SAS | Stata | SPSS |
| 1 | 1 IV with 2 or more levels (independent groups) | interval & normal | one-way ANOVA | SAS | Stata | SPSS |
| | | ordinal or interval | Kruskal Wallis | SAS | Stata | SPSS |
| | | categorical | Chi-square test | SAS | Stata | SPSS |
| 1 | 1 IV with 2 levels (dependent/matched groups) | interval & normal | paired t-test | SAS | Stata | SPSS |
| | | ordinal or interval | Wilcoxon signed ranks test | SAS | Stata | SPSS |
| | | categorical | McNemar | SAS | Stata | SPSS |
| 1 | 1 IV with 2 or more levels (dependent/matched groups) | interval & normal | one-way repeated measures ANOVA | SAS | Stata | SPSS |
| | | ordinal or interval | Friedman test | SAS | Stata | SPSS |
| | | categorical | repeated measures logistic regression | SAS | Stata | SPSS |
| 1 | 2 or more IVs (independent groups) | interval & normal | factorial ANOVA | SAS | Stata | SPSS |
| | | ordinal or interval | ordered logistic regression | SAS | Stata | SPSS |
| | | categorical | factorial logistic regression | SAS | Stata | SPSS |
| 1 | 1 interval IV | interval & normal | correlation | SAS | Stata | SPSS |
| | | | simple linear regression | SAS | Stata | SPSS |
| | | ordinal or interval | non-parametric correlation | SAS | Stata | SPSS |
| | | categorical | simple logistic regression | SAS | Stata | SPSS |
| 1 | 1 or more interval IVs and/or 1 or more categorical IVs | interval & normal | multiple regression | SAS | Stata | SPSS |
| | | | analysis of covariance | SAS | Stata | SPSS |
| | | categorical | multiple logistic regression | SAS | Stata | SPSS |
| | | | discriminant analysis | SAS | Stata | SPSS |
| 2 or more | 1 IV with 2 or more levels (independent groups) | interval & normal | one-way MANOVA | SAS | Stata | SPSS |
| 2 or more | 2 or more | interval & normal | multivariate multiple linear regression | SAS | Stata | SPSS |
| 2 sets of 2 or more | 0 | interval & normal | canonical correlation | SAS | Stata | SPSS |
| 2 or more | 0 | interval & normal | factor analysis | SAS | Stata | SPSS |

This page was adapted from Choosing the Correct Statistic developed by James D. Leeper, Ph.D.  We thank Professor Leeper for permission to adapt and distribute this page from our site.
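
The guideline table can also be read as a simple lookup: pick the number of dependent variables, the independent-variable design, and the nature of the dependent variable, then read off the suggested tests. The sketch below transcribes only part of the table into a Python dictionary. The names `GUIDELINES` and `suggest_tests` and the key layout are illustrative assumptions, not part of the original page, and the same caveat applies: these are guidelines, not rules.

```python
# Partial transcription of the guideline table (not every row is included).
# The dictionary layout and function name are illustrative assumptions.
# Key: (number of DVs, independent-variable design, nature of the DV).
ONE_POP = "0 IVs (1 population)"
TWO_INDEP = "1 IV with 2 levels (independent groups)"
TWO_MATCHED = "1 IV with 2 levels (dependent/matched groups)"
K_INDEP = "1 IV with 2 or more levels (independent groups)"

GUIDELINES = {
    ("1", ONE_POP, "interval & normal"): ["one-sample t-test"],
    ("1", ONE_POP, "ordinal or interval"): ["one-sample median"],
    ("1", ONE_POP, "categorical (2 categories)"): ["binomial test"],
    ("1", ONE_POP, "categorical"): ["Chi-square goodness-of-fit"],
    ("1", TWO_INDEP, "interval & normal"): ["2 independent sample t-test"],
    ("1", TWO_INDEP, "ordinal or interval"): ["Wilcoxon-Mann Whitney test"],
    ("1", TWO_INDEP, "categorical"): ["Chi-square test", "Fisher's exact test"],
    ("1", K_INDEP, "interval & normal"): ["one-way ANOVA"],
    ("1", K_INDEP, "ordinal or interval"): ["Kruskal Wallis"],
    ("1", K_INDEP, "categorical"): ["Chi-square test"],
    ("1", TWO_MATCHED, "interval & normal"): ["paired t-test"],
    ("1", TWO_MATCHED, "ordinal or interval"): ["Wilcoxon signed ranks test"],
    ("1", TWO_MATCHED, "categorical"): ["McNemar"],
    ("2 or more", K_INDEP, "interval & normal"): ["one-way MANOVA"],
}


def suggest_tests(n_dvs, iv_design, dv_nature):
    """Return the tests listed for this combination, or [] if the row is not transcribed."""
    return GUIDELINES.get((n_dvs, iv_design, dv_nature), [])


if __name__ == "__main__":
    # Two independent groups with a categorical outcome.
    print(suggest_tests("1", TWO_INDEP, "categorical"))
```

An empty result only means the row was not transcribed here; it does not mean no suitable test exists.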

Thursday, April 7, 2011

Scripts and Macros

SPSS Syntax Files -- a large collection of SPSS routines for randomized study design, sampling strategies, meta-analysis, sample size for confidence intervals, correlation tests, psychometry and other areas. The documentation is in Portuguese, but the scripts are usable as-is. You can have AltaVista automatically translate the page into English by going here, but do not use the "translated" scripts! The author has recently added two additional sections (in English) -- one for Dyadic Data Analysis, and one for Simple and Complex Random Assignment for Experimental Designs.

Link-King -- a SAS program to detect duplicate entries in a file, or to link matching records in two files, based on criteria like names (first, middle, last, maiden, nickname), date of birth, gender, and social security number. A graphical interface, a “Link King for Knaves” feature, and a powerful interface for manually reviewing uncertain matches make it easy to use. It features both probabilistic and deterministic record linkage algorithms, phonetic name matching (NYSIIS and Soundex), and many other features for dealing with "mushy matches".
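
To illustrate what phonetic name matching does, here is a minimal Python sketch of the standard American Soundex coding. It is not Link-King's own code (Link-King is a SAS program, and its exact rules are not described here); the function name `soundex` is an illustrative choice. Names that sound alike map to the same four-character code, so they can be treated as candidate matches.

```python
# Minimal sketch of American Soundex; not taken from Link-King.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name):
    """Return the 4-character Soundex code of a name ('' for an empty name)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            # h and w do not separate letters that share a code
            continue
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        # vowels have no code, so they reset prev and allow a repeated digit
        prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for n in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(n, soundex(n))
```

Soundex is deliberately coarse, which is why record linkage tools usually combine it with other evidence such as date of birth.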

Software Packages Manuals and Books

  • Vijay Gupta has authored a number of excellent books, including SPSS for Beginners, Interpreting Regression Output, Comprehensive Excel, Excel for Beginners, Charting in Excel, Excel - Beyond the Basics, Managing & Tabulating Data in Excel, Statistical Analysis with Excel, and Financial Analysis Using Excel. You can purchase and download these books from his web site, and you can download many sections of  the individual books for free.


  • A textbook on evaluation, statistics, and measurement, developed by Bill Miller of Iowa State U, in conjunction with his free OpenStat statistical package. The software and textbook are both available for free download from his web site.


  • Data Analysis with Epi Info -- an online textbook by Bud Gerstman. Describes analyses for continuous or binary outcomes; single group, paired samples, or two or more independent groups; one or more continuous predictors; stratified tables (with confounding and interaction); and much more.


  • Curvefit.com -- a complete guide to nonlinear regression. Most of the information here is excerpted from Analyzing Data with GraphPad Prism, a book that accompanies the program GraphPad Prism. You can download this book as a pdf file.


  • Graphpad web site of statistical resources -- short articles, book chapters, bibliographies, and (commercial) software. Well-written, down-to-earth, and helpful.


  • The Instat Guide to Choosing and Interpreting Statistical Tests, from Graphpad.


  • Electronic Statistics Textbook (by StatSoft) -- very extensive and well-organized (can also be downloaded for quicker access from your hard drive)

Other Links

  • Gene Shackman's page of links to free software packages. Contains sections for Statistical software, CDC/Census Bureau software, R, Other software, Lists of free stat software, Statistics with Excel, Mapping/GIS software, Non-statistical (but still useful) software, Office Suites (word processors -- stand-alone or web-based), Spreadsheets, Databases, Graphics, Web browsers / FTP clients, Survey software, Security software, and Miscellaneous.


  • Citizendium's online article about free statistical software -- lots of links to free packages, but also other material  about free stats software -- a brief history, reviews, advice about using the packages, and limitations of the packages.


  • Links to Econometric Software (and lots of other general packages), maintained by The Econometrics Journal


  • StatLib -- Software Archive, including Fortran source listings of hundreds of statistical and mathematical algorithms.

Additional Packages

    GrafProg -- a Windows graphing program to design, copy and save graphs generated by functions or by spreadsheet; also includes some statistical graphing processes.

    StudioLine Photo Basic -- Photo editing software from H&M Software. Add descriptions to images, re-size photos for efficient e-mail transmission, print high-quality copies, display slide-shows, publish web-galleries, safe-keep images on CD or DVD. Version 2.2 has a new user interface, dual-monitor support, increased speed and other technical improvements. SmartUpdate feature checks for new versions. Has a web-board for user-to-user help.

    WAFO -- Wave Analysis for Fatigue and Oceanography. A toolbox of Matlab (ver. 5.x / 6.x, for Windows & Unix) routines for statistical analysis and simulation of random waves and random loads. Tools are provided for analysis of measured data with routines for estimation of parameters in statistical distributions, estimation of spectra, plotting in probability papers, etc. Has routines for theoretical distributions of characteristic wave parameters from observed or theoretical power spectra of the sea. Another part is related to statistical analysis of fatigue. The theoretical density of rainflow cycles can be computed from parameters of random loads. Routines are included for modelling of switching loads (hidden Markov models). Also contains general statistical tools.

    Sampling SIM: Downloadable program (for Mac or Windows) to explore sampling distributions of sample means and proportions. It provides separate windows for building population distributions, drawing and viewing random samples from the population, exploring the behavior of sampling distributions of sample means, and exploring the behavior of confidence intervals.

    First Bayes -- a free, easy-to-use Windows application for elementary Bayesian Statistics. Performs most standard, elementary Bayesian analyses, including: plotting and summarizing distributions, defining and examining arbitrary mixtures of distributions, analysis of two kinds of linear model (one or more normal samples with common but unknown variance, and simple linear regression), examination of marginal distributions for arbitrary linear combinations of the location parameters, and the generation of predictive distributions.

    IND -- Creation and manipulation of decision trees from data.  For supervised classification and prediction in artificial intelligence and statistical pattern recognition. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which (hopefully) has good prediction of classes on new data. IND improves on standard algorithms and introduces Bayesian and MML methods, producing more accurate class probability estimates that are important in applications like diagnosis. For UNIX systems. Currently available only in beta-test mode, and only to US citizens.

    MANET -- ("Missings Are Now Equally Treated") Macintosh software for interactive graphics tools for data sets with missing values. Generates missing values chart, histograms & barcharts, boxplots & dotplots, scatterplots, mosaic plots, polygon plots, highlighted boxplots, interactive trellis displays, traces, context-sensitive interrogation, cues, redframing, selection sequences.

    Text-Stat -- Free Windows program that analyzes ASCII/ANSI texts and HTML files (directly from the internet) and produces word frequency lists and "concordances" (sorted key-word-in-context listings). Can traverse an entire web site, acquiring pages for analysis.
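
For readers unfamiliar with the two outputs mentioned above, the following Python sketch builds a word frequency list and a small key-word-in-context (KWIC) concordance. It is only an illustration of the idea; the function names and the tokenizing rule are assumptions and say nothing about how Text-Stat itself works.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs."""
    return Counter(tokenize(text))


def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) tuples, sorted by right context."""
    tokens = tokenize(text)
    key = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    sample = "The cat sat on the mat and the dog sat too"
    print(word_frequencies(sample).most_common(3))
    for left, key, right in concordance(sample, "sat", width=1):
        print(f"{left:>10} [{key}] {right}")
```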

    DeltaStat -- performs statistical calculations on data from 2D gel experiments quantified in Delta2D. Makes use of R and MySQL to perform much faster than the functions provided in Delta2D. Currently provides two sample t-test, a highly configurable database query, multiple analyses per query to analyze proteins that have both higher and lower expression in control versus experimental groups, and support for experiments with variable numbers of control and experimental replicates.

    Numerous statistical packages from companies acquired by SPSS Corp. Most of these demonstration versions expire after 30 days, and some have other limitations. Available products include:
    • allCLEAR versions 3.5 and 4.5 (PC)
    • GOLDMineR (PC)
    • DeltaGraph (Macintosh)
    • LogXact 2.1 (PC)
    • PeakFit 4.06 (PC)
    • QI Analyst 3.5 (PC)
    • Remark Office OMR 3.0 (PC)
    • SamplePower 1.2 (PC)
    • SigmaGel 1.0 (PC)
    • SPSS Diamond (PC)
    • SigmaPlot 4.0 (PC)
    • SigmaScan Pro 4.0 (PC)
    • SigmaStat 2.0 (PC)
    • SmartViewer (PC)
    • StatXact 3.1 (PC)
    • SYSTAT 7.0 (PC)
    • TableCurve 2D 4.07 (PC)
    • TableCurve 3D 3.01 (PC)
    • WesVar Complex Samples (PC)

    A large number of software demos are available for downloading from the website of SciencePlus, a distributor of scientific and related software (both full commercial packages and specialist academic tools). The list includes: ACTIV STATS, AGREE, AMOS, AQUAD, BIOFEEDBACK, BOJA, CADEMO, CART, CONTEST, CORWIN, DATADESK, DATA ENGINE, DBMS/Copy, EASYPLOT, EDWIN, ELI, E_PRIME, EQS, EQUITY, EQUIVTEST, ERTS, ERTSLAB, EXAMINER, EXPERT CHOICE, FASTTEST, GB-STAT, GETAREF, GLIMMIX, GOMAP, HIVIEW, HLM, ITEMAN, KWALITAN, LISREL, LPCM, MAPLE V, MELLAB, MEL 2, MICROCAT, MINITAB, MUDFOLD, NQUERY ADV., NSDSTAT+, OBSERVER, PARELLA, PEAKFIT, PLCA, POLYANALYST, RASCAL, REHACOM, SCRUTINY, SIGMA PLOT, SIGMA SCAN PRO, SIGMA STAT, SOLAS, STATISTICA, STRAD, STREAMS, SUPERLAB PRO, SUPERLAB LT, SYSTAT, TABLECURVE 2D/3D, TEXTANALYST, T-RASCH, TRIQ, UNISTAT, Vienna Test System, WINMIRA, WINROSA, XCALIBRE.

    Advanced Grapher (formerly called Serpik Graph) -- a very sophisticated function graphing program - can also plot tables and perform regression. A 30-day full-functioned trial version can be downloaded.

    CoPlot 6.2 -- for publication-quality 2D and 3D scientific graphs (from data and equations), maps, and technical drawings. From CoHort Software. Creates precise technical drawings using drawing objects, genetic maps, field maps, flow charts, apparatus diagrams, circuit diagrams, chemical structures, etc. Text in drawing objects and graphs can include HTML-like text formatting tags and over 1000 special characters. Supports animated graphs. Exports graphs to .eps, .gif, .jpg, .pdf, .png, .svg, .wmf, and others. Has an auto-recorder and macro programming language. Invoke CoPlot from the command line, batch files, shell scripts, pipes, and other programs. Can be used as a graphics server program on a web site. Free time-limited demo version available.

    Programming Languages and Software

    MuPAD -- a very powerful and general computerized algebra system, developed at the University of Paderborn, now distributed by SciFace Software. In the same category as Mathematica and Maple, it does numerical calculations, symbolic manipulation (algebra, differentiation & integration), graphing, and programming. A free "lite" (but still very powerful) version for PC and Mac can be downloaded.

    Statistics101 -- executes programs written in the easy-to-learn Resampling Stats statistical simulation language. You write a short, simple program in the language, describing the process behind a probability or statistics problem. Statistics101 then executes your Resampling Stats model thousands of times, each time with different random numbers or samples, keeping track of the results. When the program completes, you have your answer. Runs on Windows, Mac, Linux -- any system that supports Java.
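
The simulate-and-count approach described above can be sketched in a few lines. The example below uses Python rather than the Resampling Stats language (whose syntax is not shown on this page), and the problem chosen, the chance that at least two of 23 people share a birthday, is only an illustration. The function names and the fixed seed are assumptions made for reproducibility.

```python
import random


def shared_birthday(rng, people=23, days=365):
    """Simulate one group and report whether any two people share a birthday."""
    birthdays = [rng.randrange(days) for _ in range(people)]
    return len(set(birthdays)) < people


def simulate(trials=10_000, seed=1):
    """Repeat the experiment many times and return the share of 'hits'."""
    rng = random.Random(seed)  # seeded so that reruns give the same estimate
    hits = sum(shared_birthday(rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(f"Estimated probability: {simulate():.3f}")
```

The estimate should land near the exact value of about 0.507, with some run-to-run scatter that shrinks as the number of trials grows.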

    R -- a programming language and environment for statistical computing and graphics. Similar to S or S-plus (will run most S code unchanged). Available for Windows, various Unix flavors (including Linux), NextStep and Mac. Provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. Well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. The R environment includes:
    • an effective data handling and storage facility,
    • a suite of operators for calculations on arrays, in particular matrices,
    • a large, coherent, integrated collection of intermediate tools for data analysis,
    • graphical facilities for data analysis and display either on-screen or on hardcopy, and
    • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
    Zelig -- an add-on for R that can estimate, help interpret, and present the results of a large range of statistical methods. It translates hard-to-interpret coefficients into quantities of interest; combines multiply imputed data sets to deal with missing data; automates bootstrapping for all models; uses sophisticated nonparametric matching commands which improve parametric procedures; allows one-line commands to run analyses in all designated strata; automates the creation of replication data files so that you (or anyone else) can replicate the results of your analyses (hence satisfying the replication standard); makes it easy to evaluate counterfactuals; and allows conditional population and superpopulation inferences. It includes many specific methods, based on likelihood, frequentist, Bayesian, robust Bayesian, and nonparametric theories of inference. Zelig comes with detailed, self-contained documentation that minimizes startup costs for Zelig and R, automates graphics and summaries for all models, and, with only three simple commands required, generally makes the power of R accessible for all users. Zelig also works well for teaching, and is designed so that scholars can use the same program with students that they use for their research.

    Apophenia -- a statistics library for C. It provides functions on the same level as those of the typical stats package (OLS, probit, singular value decomposition, &c.) but doesn't tie the user to an ad hoc language or environment.

    Octave -- a high-level mathematical programming language, similar to MATLAB, for numerical computations -- solving common numerical linear algebra problems, finding the roots of nonlinear equations, integrating ordinary functions, manipulating polynomials, and integrating ordinary differential and differential-algebraic equations. It is easily extensible and customizable via user-defined functions written in Octave's own language, or using dynamically loaded modules written in C++, C, Fortran, or other languages. Runs under Linux and Windows.

    J -- a modern, high-level, general-purpose, high-performance programming language. Runs on Windows, Unix, Mac, and PocketPC handhelds. J runs both as a GUI and in a console (command line). Much like APL, but uses "conventional" symbols rather than APL's specialized character set. J is particularly strong in the mathematical, statistical, and logical analysis of arrays of data. J systems have:
    • an integrated development environment
    • standard libraries, utilities, and packages
    • a form designer for your application forms
    • an event-driven graphical user interface to your application
    • interfaces with other programming languages and applications
    • integrated 2d and 3d graphics
    • memory mapped files for high performance data applications
    Matvec -- an object oriented programming language with extensive statistical capabilities. Can handle problems ranging from matrix and vector manipulation to the analysis of linear and generalized linear mixed models. Runs in Linux and Windows environments; has a command-line (non-GUI) user interface, and a strong "Unix-like" flavor.

    mle - Maximum Likelihood Estimation -- a simple programming language for building and estimating parameters of likelihood models. Originally designed for survival models, but the language has evolved into a general-purpose tool for building and estimating  general likelihood models. Available for Windows and Linux; also provides User Manual, Reference Manual, and Quick Reference Card.

    Ox -- an object-oriented matrix programming language with a comprehensive mathematical and statistical function library. Matrices can be used directly in expressions, for example to multiply two matrices, or to invert a matrix. The major features of Ox are its speed, extensive library, and well-designed syntax, which leads to programs which are easier to maintain. Versions of Ox are available for many platforms. The "Console" version can be freely downloaded for academic and research use; the "Professional" version must be purchased.

    Mx -- a matrix algebra interpreter and numerical optimizer for exploration of matrix algebra. Many built-in fit functions for structural equation modeling and other statistical modeling. Has fitting functions like those in LISREL, LISCOMP, EQS and CALIS, along with facilities for maximum likelihood estimation of parameters from missing data structures, under normal theory. Users can easily specify complex 'nonstandard' models, define their own fit functions, and perform optimization subject to linear and nonlinear equality or boundary constraints.

    JDB -- Relational Database and Elementary Statistics for a Unix environment. Useful for manipulating experimental data (joining files, cleaning data, reformatting for input into other programs). Computes basic statistics (mean, std. dev., confidence intervals, quartiles, n-tiles, percentiles, histograms, correlations, z-scores, t-scores).

    B/D -- an interactive programming language for a priori and diagnostic analyses of Bayes linear statistical problems (subjective statistical analyses based on expectation and covariance structures, rather than on distributional assumptions). Quickly and easily specify beliefs about quantities of interest, attach data to some or all of those quantities, and carry out the general process of Bayes linear adjustment. Produces interactive Bayes linear influence diagrams for the adjustments, providing simple graphical summaries of the adjustments and accompanying diagnostics.

    MacANOVA -- comprehensive statistical package for the Mac and PC/Windows. MacAnova has macros which are used just like functions. Several macros are built in, and three files of additional macros (general, time series, design of experiments) are distributed with MacAnova. Like S, MacAnova is a programming language with for and while loops, if, else, elseif, break, and a full range of operations including bit manipulation.

    Lisp-Stat  -- an extensible statistical computing environment for data analysis, statistical instruction and research, and  for exploring the use of dynamic graphical methods. Based on an extended subset of Common Lisp, performs element-wise operations on lists and vectors, and adds a variety of basic statistical and linear algebra functions. Graphics system is object-oriented, and can be customized and adapted. Supports linear and nonlinear regression models and generalized linear models. Runs on Mac, X-window (UNIX), and MS Windows.

    Resampling Stats -- a different approach to learning statistics and performing statistical analyses, using simulation with random numbers instead of complex mathematics. 30-day trial version available for Win 95/NT.

    O-Matrix -- an extensive matrix manipulation system (for Windows) with lots of statistical capability. The "Light" version can be freely downloaded. Some capabilities include:
    • Matrix Functions: determinant, eigenvalues and eigenvectors, systems of equations
    • Statistics: minimum, maximum, mean, median, standard deviation, linear regression, correlation, covariance, sorting, t-distributions, f-distributions, probability, normal distributions, population simulations, Kolmogorov-Smirnov Test
    • Optimization: linear & nonlinear least squares, with and without box constraints and with or without derivatives, quadratic and general nonlinear programming, linear complementarity problems
    • Random Simulations: uniform and normal random number generators, auto-regressive process simulation
    • Special Functions: error, gamma, incomplete beta, Y and J Bessel
    • Also: quadrature, differential equations, Fourier analysis, spectral estimation, convolution, FFT, interpolation, filtering, Kalman-Bucy filtering, wavelets (Haar and Daubechies transforms), polynomials, general functions (trig, hyp, inv trig & hyp, exp, log, roots), and forward & backward difference approximations to the derivatives of vector-valued functions
    Also provides extensive plotting capabilities, with multiple windows, axis scaling & labeling, titling, free-form text, selectable fonts. Plots exportable to word processors, spreadsheets, etc. Plot Types: line, contour, surface, mesh, bar, stair, polar, vector, error bar, smith charts, and histogram; line plots can contain unlimited points per curve and hundreds of curves per plot; two- and three-dimensional plotting is supported which provides additional flexibility with contours and surface plots; multiple colors, markers, and line types.

    Excel Spreadsheets and Additional

    PopTools -- Windows DLL for Excel 97 and 2000 (PC's only). Facilitates analysis of matrix population models & simulation of stochastic processes. Adds a new menu item and installs many powerful functions: matrix decompositions (Cholesky, QR, singular values, LU), eigenanalysis (eigenvalues and eigenvectors of square matrices) and formulas for generation of random variables (Normal, binomial, gamma, exponential, Poisson, logNormal). Also has routines for iterating spreadsheets to run Monte Carlo simulations, conduct randomisation tests (including the Mantel test) and calculate bootstrap statistics. Some facilities for maximum-likelihood parameter estimation, and some other generally useful functions. Free download from website, which also has documentation, examples, and related links.

    SimulAr -- Provides a very elegant point-and-click graphical interface that makes it easy to generate random variables (correlated or uncorrelated) from twenty different distributions, run Monte-Carlo simulations, and generate extensive tabulations and elegant graphical displays of the results.

    EZAnalyze -- enhances Excel (Mac and PC) by adding "point and click" functionality for analyzing data and creating graphs (no formula entry required). Does all basic "descriptive statistics" (mean, median, standard deviation, and range), and "disaggregates" data (breaks it down by categories), with results shown as tables or disaggregation graphs. Advanced features: correlation; one-sample, independent samples, and paired samples t-tests; chi square; and single factor ANOVA.

    Update Available! The latest version can create z-scores, percentile ranks, and random numbers as new variables; has repeated-measures ANOVA; does simple post hoc tests for single factor and repeated-measures ANOVA; can graph multiple variables on a single graph, and can add error bars for +/- 2 SD’s; adds the sum function to the disaggregate and descriptive statistics functions, and the mode function to descriptive stats; adds delete sheets; adds English & Spanish language options, and works better in international environments; incorporates various bug fixes; and contains an updated user manual.

    EZ-R Stats -- supports a variety of analytical techniques, such as Benford's law, univariate stats, cross-tabs, and histograms. Also supports databases such as MySQL, SQLite, MS-Access, and MS-SQL. Simplifies the analysis of large volumes of data, enhances audit planning by better characterizing data, identifies potential audit exceptions and facilitates reporting and analysis. This language is a Computer Assisted Audit Technique (CAAT) in support of COSO, SAS 78, SAS 99 and analysis required by Sarbanes-Oxley.
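
As background on the Benford's law check mentioned above: the law predicts that the leading digit d of many naturally occurring amounts appears with probability log10(1 + 1/d), so a ledger whose first digits stray far from that pattern may deserve a closer look. The Python sketch below compares observed first-digit shares with the expected ones. It is an independent illustration with hypothetical function names, not EZ-R Stats code.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading nonzero digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def first_digit_shares(values):
    """Observed share of each leading digit among the nonzero values."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        return {d: 0.0 for d in range(1, 10)}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    amounts = [123, 0.0042, 19, 0, -250]  # made-up figures for illustration
    expected = benford_expected()
    observed = first_digit_shares(amounts)
    for d in range(1, 10):
        print(d, f"expected {expected[d]:.3f}", f"observed {observed[d]:.3f}")
```

A real audit check would use far more records and a formal goodness-of-fit test rather than eyeballing the two columns.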

    SSC-Stat -- an Excel add-in designed to strengthen those areas where the spreadsheet package is already strong, principally in the areas of data management, graphics and descriptive statistics. SSC-Stat is especially useful for datasets in which there are columns indicating different groups. Menu features within SSC-Stat can:
    • help users manipulate their data (stacking, unstacking columns, 2-way unstacking, lookups, generating factors, etc.);
    • generate good graphs (X-Y Scatter Plot, Category-Value Plot, Boxplot, Normal Probability Plot, Density Estimate), that can be edited and polished like any other Excel graph;
    • provide basic statistical analysis (descriptive statistics, summary statistics, 1- and 2-sample t tests, 1- and 2-sample tests of proportion).
    22 Distribution Functions -- There is one spreadsheet for each of the following distribution functions: Beta, Binomial, Chi-Square, Discrete Uniform, Gamma, Geometric, Hypergeometric, Multivariate Hypergeometric, Laplace, Logistic, Multinomial, Negative Binomial, Normal, Bivariate Normal, Log-normal, Pareto, Poisson, Rectangular, Snedecor F, Student-t, Triangular, and Weibull. Each spreadsheet gives a graph of the distribution, along with the value of various parameters, for whatever shape and scale parameters you specify. You can also download a ZIP file containing all 22 spreadsheets.

    Sample-size calculator for cluster randomized controlled trials, which are used when the outcomes are not completely independent of each other. This independence assumption is violated in cluster randomized trials because subjects within any one cluster are more likely to respond in a similar manner. A measure of this similarity is known as the intracluster correlation coefficient (ICC). Because of the lack of independence, sample sizes have to be increased. This web site contains two tools to aid the design of cluster trials -- a database of ICCs and a sample size calculator (along with instruction manuals).
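
How much the sample size has to grow is commonly summarized by the design effect, 1 + (m - 1) × ICC, where m is the number of subjects per cluster. The Python sketch below applies that textbook formula under the assumption of equal cluster sizes. It is not taken from the calculator on the site, whose exact method is not described here, and the function names are illustrative.

```python
import math


def design_effect(cluster_size, icc):
    """Textbook design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def clusters_needed(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size and convert it to clusters.

    Returns (number of clusters, total subjects), rounding clusters up.
    """
    inflated = n_individual * design_effect(cluster_size, icc)
    # round() guards against floating-point noise before taking the ceiling
    n_clusters = math.ceil(round(inflated / cluster_size, 9))
    return n_clusters, n_clusters * cluster_size


if __name__ == "__main__":
    # 200 subjects would suffice without clustering; clusters of 21, ICC of 0.05.
    print(design_effect(21, 0.05))
    print(clusters_needed(200, 21, 0.05))
```

Even a small ICC matters when clusters are large: with 21 subjects per cluster and an ICC of 0.05, the required sample doubles.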


    DAG_Stat -- calculates an enormous number of quantities from a 2-by-2 table:
    • for diagnostic tests: sensitivity, sensitivity of a random test given the observed prevalence and test level, sensitivity quality index, specificity, specificity of a random test, specificity quality index, efficiency (the correct classification rate), efficiency of a random test, quality index, Youden's index, the predictive value of a positive test, predictive value of a positive random test, predictive value of a negative test, predictive value of a negative random test, likelihood ratios of positive and negative tests, the odds ratio, false positive and false negative rates, prevalence observed in the sample, and test level (proportion of subjects classified as 'positive').
    • for interrater agreement: Cohen's Kappa, observed agreement, chance agreement, agreement about positive and negative cases, Byrt's bias index, Byrt's prevalence asymmetry index, bias adjusted Kappa, prevalence & bias adjusted Kappa. DAG_Stat also calculates Dice's index, Yule's Q (Gamma), Phi, Scott's agreement index, the tetrachoric correlation coefficient, Goodman & Kruskal's tau, Lambda, the Uncertainty Coefficient, Pearson's Chi Square (with and without Yates' correction), the likelihood ratio Chi Square, and McNemar's Test (with and without Yates' correction).
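
A handful of the quantities listed above can be computed directly from the four cell counts. The Python sketch below uses the standard textbook definitions for a few diagnostic-test measures and for Cohen's Kappa. It is an independent illustration, not DAG_Stat's code, and the cell names (`tp`, `fp`, `fn`, `tn`) and function names are assumptions. It does not guard against empty rows or columns.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Common diagnostic measures from a 2-by-2 table of test result vs. true status."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "efficiency": (tp + tn) / n,          # correct classification rate
        "youden": sens + spec - 1,
        "ppv": tp / (tp + fp),                # predictive value of a positive test
        "npv": tn / (tn + fn),                # predictive value of a negative test
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "odds_ratio": (tp * tn) / (fp * fn),
        "prevalence": (tp + fn) / n,
        "test_level": (tp + fp) / n,          # share classified as positive
    }


def cohens_kappa(a, b, c, d):
    """Cohen's Kappa for two raters; a and d are the agreement cells."""
    n = a + b + c + d
    observed = (a + d) / n
    chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - chance) / (1 - chance)


if __name__ == "__main__":
    for name, value in diagnostic_summary(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name:>12}: {value:.3f}")
    print(f"{'kappa':>12}: {cohens_kappa(40, 10, 20, 30):.3f}")
```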

    MIX (Meta-analysis with Interactive eXplanations) -- a statistical add-in for Excel 2000 or later (Windows only). Ideal for learning meta-analysis (reproduces the data, calculations, and graphs of virtually all data sets from the most authoritative meta-analysis books, and lets you analyze your own data "by the book"). Handles datasets with dichotomous & continuous outcomes; calculates Risk Diff, RR, OR, Mean Diff, Hedges's g, Cohen's d; performs standard & cumulative meta-analysis with CI, z & p; fixed and random effects modeling; Cochran's Q with p-value; Higgins's I² and H with CI; and publication bias tests: Rank correlation (tau-b) test with z & p, Egger's and Macaskill's regression tests with CI, and Trim-and-Fill. Generates numerous plots: standard and cumulative forest, p-value function, four funnel types, several funnel regression types, exclusion sensitivity, Galbraith, L'Abbe, Baujat, modeling sensitivity, and Trim-and-Fill.
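
To show what a standard fixed-effect analysis involves, the Python sketch below pools study effects with inverse-variance weights and reports Cochran's Q and Higgins's I-squared. It follows the usual textbook formulas and is not MIX's implementation. The function name, the normal-approximation 95% interval (multiplier 1.96), and the input format (one effect estimate and one variance per study) are assumptions.

```python
import math


def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I-squared."""
    weights = [1 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    se = math.sqrt(1 / total_weight)
    # Cochran's Q: weighted squared deviations of study effects from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of variation beyond what chance alone would produce
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "df": df,
        "I2": i2,
    }


if __name__ == "__main__":
    # Made-up effect sizes and variances for three studies.
    result = fixed_effect_meta([0.20, 0.40, 0.35], [0.01, 0.02, 0.015])
    for key, value in result.items():
        print(key, value)
```

A random-effects model would additionally estimate between-study variance and add it to each study's variance before weighting.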

    OZGRID -- contains over 4000 pages (and growing) of information on Excel and VBA for Excel. Many add-ons are for sale, but there is also an enormous amount of totally free content: downloads, a free 24/7 question and answer support forum for MS Office, and a free monthly Excel newsletter full of detailed tips, tricks, hacks and more for Excel and VBA.

    Spreadsheet123 -- a collection of over 70 free Excel spreadsheets. (These will also run under an almost-free Excel-like program, Spreadsheet Software Developer.) Spreadsheets include: capital budgeting, acquisition/buyout, company valuation, risk analysis, FCFE and FCFF, lease or buy a car, NPV & IRR, cash flow, capital structure, stock & bond valuation, financial projections, foreign market exchange, income statement what-if analysis, historical & pro-forma financial statements, template for assessing risk of information technology and data warehousing, IPO timeline, Malcolm Baldrige quality model, and risk return optimization, among many others.

    Very-high-precision Statistical Probability Functions -- provides double-precision (16 significant figures) mass, density, cumulative, and inverse probability distributions, critical values, and confidence bounds for the geometric, negative binomial, binomial, Poisson, hypergeometric, negative hypergeometric, exponential, normal, chi-square, gamma, Student t, Fisher F and beta; non-central gamma, chi-square, beta, t and F; and the mixed Gamma-Poisson, Beta-Binomial, and Beta-Negative-binomial distributions. The routines are programmed in VBA, embedded within an Excel spreadsheet that illustrates the usage of each of them.

    DE Histograms -- an Excel add-in that provides comprehensive descriptive statistics, histograms, outlier detection, normality testing, and much more.

    Exact confidence intervals for samples from the Binomial and Poisson distributions -- an Excel spreadsheet with several built-in functions for calculating probabilities and confidence intervals. (42k long).
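
    As a rough illustration of what an "exact" binomial interval means, here is a minimal sketch of the Clopper-Pearson interval found by bisection on the binomial tail probabilities. It is not the spreadsheet's implementation; the function names are hypothetical, and the Poisson case is left out.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo, hi, tol=1e-12):
    """Find a root of f on [lo, hi], assuming f changes sign exactly once."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) interval for k successes in n trials."""
    tail = (1 - conf) / 2
    # Lower limit: the p at which P(X >= k) equals the tail area.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - tail, 0.0, 1.0)
    # Upper limit: the p at which P(X <= k) equals the tail area.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - tail, 0.0, 1.0)
    return lower, upper


print(clopper_pearson(3, 20))
```

    Bisection is used here only to keep the sketch free of third-party libraries; the same limits can be written as quantiles of beta distributions.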

    BiPlot -- by Ilya Lipkovich and Eric P. Smith, of Virginia Tech. A user-friendly add-in for Excel to draw a biplot display (a graph of row and column markers from data that forms a two-way table) based on results from principal components analysis, correspondence analysis, canonical discriminant analysis, metric multidimensional scaling, redundancy analysis, canonical correlation analysis or canonical correspondence analysis. Allows for a variety of transformations of the data prior to the singular value decomposition and scaling of the markers following the decomposition.

    Statistical Process Control (SPC) and Reliability spreadsheets from John Zorich's web site -- designed to simplify activities in Production and R&D. Formally validated to be "GMP" and "Part 11" compliant. Free spreadsheets include:
    • Self-made Sampling Plans -- Examine the OC curves for your own custom sampling plans. Use either binomial or hypergeometric calculations. Lets you explain the "valid statistical rationale" of the sampling plans you already use.
    • Sequential Sampling Plans -- Provides an analysis and planning tool for sample sizes in situations where lots undergo sequential inspections (e.g., 1st by Manufacturing, 2nd by QC, and finally by QA).
    Lifetable -- does a full abridged current life table analysis to obtain the life expectancy of a population. It can also calculate Potential Gains in Life Expectancy (PGLE) after removing cause k, considering competing causes of death; the (Premature) Years of Potential Life Lost (YPLL), i.e., the number of person-years that would be added to the total number of person-years lived in a population if cause of death k were removed; the Standardized Mortality Ratio (SMR); standardized numbers per 100,000; and the Comparative Mortality Figure (CMF). From the Downloads section of the QuantitativeSkills web site.

    Intracorrelation -- does intra-correlation calculations for dichotomous (binary yes/no) outcome variables according to two different methods proposed for the single cluster, one by Fleiss and another by Bennett et al. A third spreadsheet concerns a method for two clusters by Donner and Klar. You will have to insert your own data by overwriting the tables in the second (total number of positive responses) and third (total number of negative responses) or fourth (total number) columns. From the Downloads section of the QuantitativeSkills web site.

    Weighted Least Squares Linear Fits -- an Excel add-in from Philip Kromer (Univ. of Texas)
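
    For readers unfamiliar with the technique, a weighted least-squares straight-line fit can be sketched in a few lines. This is a generic closed-form version, not the add-in's code; the function name is hypothetical, and the weights are assumed to be supplied by the user (commonly 1/variance of each y value).

```python
def weighted_linear_fit(x, y, w):
    """Fit y = intercept + slope * x by minimizing sum(w * residual**2)."""
    total_w = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted sums of squares and cross-products about the means.
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data: the last point is noisy, so it gets a small weight.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.0, 12.0]
ws = [1.0, 1.0, 1.0, 0.1]
print(weighted_linear_fit(xs, ys, ws))
```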

    XLMathematics -- a set of Excel (ver 5+) workbooks for mathematical computations: graphing, calculus (computing limits, computing and graphing derivatives and/or tangent lines, evaluating integrals using various techniques), and linear algebra (Gauss-Jordan elimination, allowing step-by-step views).

    Analyse-it -- includes over 30 parametric & non-parametric statistical functions, including multiple linear regression analysis, ANOVA, & chi-square statistics. A separate specialized package for clinical method evaluation provides NCCLS and IFCC procedures for accuracy & imprecision.

    Statistical Process Control (SPC) and Reliability spreadsheets from John Zorich's web site -- designed to simplify activities in Production and R&D. Formally validated to be "GMP" and "Part 11" compliant. Demos of spreadsheets include:
    • Variables Data SPC -- XbarR, XbarS, XmR, histograms, capability indices, preformatted customizable printable report. Automatically identifies out-of-control points.
    • Count Data SPC -- P and U SPC charts, Pareto chart, preformatted customizable printable report. Automatically identifies out-of-control points.
    • Reliability Statistics Basics -- component reliability using K-factors, stress/strength analysis, failure analyses, for "normally distributed" and unknown distributions. Stress / strength formula has been modified to allow input of a "confidence" level, if desired.
    • Reliability Plotting -- Component reliability using "Reliability plotting" ("probability plotting", "rectification", etc.). Can confirm normality, or can identify normalizing transformation.
    • Power Curves for t-Tests  -- Power vs. Sample Size, Power vs. Hypothesized Difference, Power vs. Alpha, and Power vs. Population SD.
    • Statistical Analysis of Gages -- for quantifying measurement uncertainty. Methods include Gage R&R (up to 3 persons, 3 gages, 3 replicates, and 10 parts), Gage Correlation (up to 3 gages), Gage Bias, Gage Linearity, Spec/Inaccuracy Ratios, and Guardbanding.
    • C = 0 Sampling Plans -- two types of OC curves, and AOQL for chosen plan. Calculates the exact absolute smallest sample size that gives the desired protection level for a given exact size lot (up to 1000).
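
    The idea behind a C = 0 (accept-on-zero) plan for a lot of known size can be sketched with the hypergeometric distribution: find the smallest sample in which the chance of seeing no defectives is at most the risk you will tolerate. The code below is an illustration under that assumption only, not the spreadsheet's method; the function names and the example numbers are hypothetical.

```python
from math import comb


def accept_prob_c0(lot_size, defectives, n):
    """Probability that a random sample of n from the lot has zero defectives."""
    return comb(lot_size - defectives, n) / comb(lot_size, n)


def smallest_c0_sample(lot_size, defectives, confidence=0.95):
    """Smallest n such that a lot containing `defectives` bad units is
    accepted (zero defectives found) with probability <= 1 - confidence."""
    max_accept = 1 - confidence
    for n in range(1, lot_size + 1):
        if accept_prob_c0(lot_size, defectives, n) <= max_accept:
            return n
    return None  # only reached when defectives == 0


# Hypothetical example: a lot of 100 units, protecting against 10 defectives.
n = smallest_c0_sample(100, 10, confidence=0.95)
print(n, accept_prob_c0(100, 10, n))
```
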
    XLStatistics -- a set of Excel (ver 5+) workbooks for statistical analysis of data. A step-by-step guide to data analysis with separate workbooks for handling data with different numbers and types of variables. Contains most standard analyses, analyses using only summary data, power / sample size, nonparametrics, curve fitting, non-linear regression, and analysis for 2x2 tables. XLStatistics is not an Excel add-in and all the working and code is visible. A free version for analysis of 1- and 2-variable data is available.

    Surveys, Testing, and Measurement software

    CCOUNT -- a package for market research data cleaning, manipulation, cross tabulation and data analysis. Similar to, and uses the same syntax as, SPSS-MR "Quantum", a well known commercial package for processing market research data. Available for Windows, Linux, and SunOS. C++ source code also available, under the GNU General Public License.

    ProtoGenie -- a free extensible web-based environment for research design and data collection for surveys, experiments, clinical trials, time series, cognitive and vision research, and methods courses. Lets you specify groups and define measurement and treatment events and their sequencing. The goal is to let users move smoothly from research design and data collection to interim and final statistical analysis.

    GGUM2004 (Item Response Theory Models for Unfolding) -- a Windows-based program that estimates parameters in the generalized graded unfolding model (GGUM; Roberts, Donoghue, & Laughlin, 2000). Has a user-friendly interface to prepare command files, run the core estimation program, and display results. Allows different questionnaire items to have varying numbers of response categories (useful when sparse responses require recoding into fewer response categories). Handles sporadically missing responses. Provides item fit statistics and diagnostic graphics of performance.

    Rasch Measurement Software -- deals with the various nuances of constructing optimal rating scales from a number of (usually) dichotomous measurements, such as responses to questions in a survey or test. Several free student/demo software packages are available. These may be freely downloaded, used, and distributed, and they do not expire. They are:
    • BIGSTEPS -- a DOS-based precursor to the Windows-based WINSTEPS Rasch measurements program.
    • MINISTEP -- a free evaluation/student version of WINSTEPS. It has complete WINSTEPS functionality, but is limited to 25 items and 100 persons (cases).
    • MINIFAC -- a free evaluation/student version of FACETS (Many-Facet Rasch Analysis). Contains all features except limited to 2,000 data points (responses).
    Q-Method -- a statistical program for analyzing data from the Q-Sort Technique. Enter data (Q-Sorts) the way they are collected, i.e. as 'piles' of statement numbers. It computes intercorrelations among Q-Sorts, which are then factor-analysed with the Centroid (or, alternatively, PCA) method. Resulting factors can be rotated either analytically (Varimax), or judgmentally with the help of two-dimensional plots. Finally, after selecting the relevant factors and 'flagging' the entries that define the factors, the analysis step produces an extensive report with a variety of tables on factor loadings, statement factor scores, discriminating statements for each of the factors as well as consensus statements across factors, etc.

    AnSWR -- Analysis Software for Word-based Records -- a free software system from the C.D.C. for coordinating and conducting large-scale, team-based analysis projects that integrate qualitative and quantitative techniques (for Windows).

    ez-text -- a software program from the C.D.C. developed to assist researchers create, manage, and analyze semi-structured qualitative databases.

    CSPro (Census and Survey Processing System) -- a public-domain software package for entering, tabulating and mapping census and survey data.

    IMPS (Integrated Microcomputer Processing System) -- performs the major tasks in survey and census data processing: data entry, data editing, tabulation, data dissemination, statistical analysis and data capture control. (from CDC)

    WebQ -- a set of HTML files for performing Q-Sorts online and collecting the data for subsequent analysis.

    Stats -- Windows program for several commonly-needed statistical functions for marketing researchers: random numbers; sample sizes needed for surveys; mean, standard deviation, standard error and range for keyboard-entered data; standard error of a proportion; significance testing between two percentages from independent samples; significance testing between two percentages from dependent samples; significance testing between two averages from independent samples; contingency table analysis (i.e., Chi-Square).
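
    Two of the calculations named above are simple enough to sketch. The code below shows a textbook survey sample-size formula for a proportion and a pooled two-proportion z-test for independent samples; it assumes large-sample normal approximations and is not taken from the Stats program. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def survey_sample_size(margin, p=0.5, conf=0.95):
    """Respondents needed to estimate a proportion p to within +/- margin
    (simple random sampling, no finite-population correction)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two independent percentages.

    x1 of n1 and x2 of n2 respondents gave the answer of interest.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


print(survey_sample_size(0.05))              # worst case p = 0.5, 95% confidence
print(two_proportion_z_test(60, 100, 40, 100))
```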

    SABRE -- for the statistical analysis of multi-process random effect response data. Responses can be binary, ordinal, count and linear recurrent events; response sequences can be of different types. Such multi-process data is common in many research areas, e.g. the analysis of work and life histories. Sabre has been used intensively on many longitudinal survey datasets, either with recurrent information collected over time or with a clustered sampling scheme.

    POSDEM -- Uses simulation techniques to analyze and compare alternate sampling strategies for surveys. Performs power / sample size / precision analyses for different sampling methods: systematic, stratified, random, etc. Windows versions available in Spanish and English.

    WISC-III Profile Calculator for Macintosh and Windows -- uses generalized distance method to determine if the subtest profile of a single case is multivariately unusual or common in comparison to subtest clusters found in the WISC-III standardization sample. (Mac, 360K; Win anticipated in September)

    DEMETRA -- user-friendly interface to TRAMO/SEATS and X-12-ARIMA.

    Sociological Insights -- displays statistical information in an easy-to-use format, designed for teaching quantitative sociological reasoning. It uses aggregate data from the 50 U.S. states to teach the principles of distribution, correlation, and regression. It uses questionnaire data from the 2000 and 1994 General Social Surveys to teach distribution and cross-tabulation. The States module has 289 variables in all. The Survey module displays 249 variables from the 2000 GSS, plus (as a separate data set) 113 variables from the 1994 GSS.

    ConTEST -- a decision support system for assembly of educational and psychological tests from item banks.

    MUDFOLD (Multiple UniDimensional unFOLDing) -- for analyzing proximity data (e.g., attitudes, preferences, or choices) with the Coombsian unfolding model.

    WINMIRA -- Latent Class Analysis (LCA), the Rasch model (RM), and the Mixed Rasch model (MRM) and Hybrid models (HYBRID).

    T-Rasch -- exact or non-parametric tests for the Rasch model.

    LPCM-WIN -- a menu-driven program to apply ‘Linear Partial Credit Models’ in item analysis and measurement of change.

    Kwalitan -- for analysis of qualitative data, such as protocols of interviews, articles, and annual reports.

    StatPac Survey Software -- to design and implement surveys, and to acquire, manage and analyze data from surveys. Supports multiple data types and question formats, multi-language spell-checking, large files (2,000 variables & 10,000,000 cases), basic statistics (crosstab & banner tables) & graphics, automatic coding of text responses, and data import / export capabilities. Optional Web Survey Module and Advanced Statistics Module (curve fitting, multiple regression, logistic regression, factor, analysis of variance, discriminant function, cluster, and canonical correlation). A demo version is available (limited to 35 cases).

    NewMDSX -- software for Multidimensional Scaling (MDS), a term that refers to a family of models where the structure in a set of data is represented graphically by the relationships between a set of points in a space. MDS can be used on a variety of data, using different models and allowing different assumptions about the level of measurement. This site offers a free one-month trial of the Windows version; a completely free copy of constituent programs, notes, documentation, and test Input & Output in MS-DOS; a not-for-profit full Windows copy priced at cost; and a site with a range of data, cross-reference, & information.

    GLIMMIX -- a powerful approach to segmentation based on latent class models. Analysis of brand choice, purchase frequency and preference data.

    CORWIN -- a program for correspondence analysis, which decomposes relations in a two-way table.

    Form Artist -- lets you design and create online forms for data collection via the Web. Forms and surveys run on any web server (Microsoft, Unix, Linux), and work with all browsers (no plugins required). WYSIWYG interface gives complete control over the appearance of forms (any shape, size, number of pages, color scheme). Create multi-page forms on the same web page without reloading. Supports the usual data entry fields (text, numbers, lists, checkboxes etc.), as well as unique objects such as picture grids and emoticons. Forms can be filled in online or offline. Completed data can then be sent back via email or by file. Free evaluation version available.

    AssiStat -- a Windows-based package of calculations and analyses useful in educational and psychological research, practice, and in measurement and statistics courses. Designed as a complement to typical statistical packages rather than as a primary analysis tool, it picks up where primary analysis packages usually fall short--in performing secondary analyses like correction of correlations for restriction in range or less-than-perfect reliability, and other specialized analyses and calculations usually not available in standard packages without special programming. Free demo available.

    Biostatistics and Epidemiology software

    OpenEpi Version 2.2.1 -- OpenEpi is a free, web-based, open source, operating-system-independent series of programs for use in public health and medicine, providing a number of epidemiologic and statistical tools. It is written in JavaScript and HTML and operates similar to a calculator. OpenEpi can be thought of as an important companion to Epi Info, EpiData, SAS, SPSS, and Stata.

    M.D. Anderson Statistical Software Library -- a large collection of free statistical software (almost 70 programs!) from the Biostatistics and Applied Mathematics department of the M.D. Anderson Cancer Center. Software is distributed in the form of program source files and/or self-extracting archives of executable programs for Windows, Mac, Unix/Linux environments.

    Lifetables -- Windows program for Mortality Analysis for Demography and Epidemiology. The program will calculate the life expectancy, including all intermediary statistics, variance and confidence interval for the life expectancy, Potential Gains in Life Expectancy (PGLE), Years of Potential Life Lost (YPLL) and Lifetime Years of Potential Life Lost (LYPLL). YPLL can be calculated adjusted for competing causes of mortality, and both YPLL and LYPLL can also be discounted. Two populations can be compared using direct and indirect standardization, the SMR and CMF, and by comparing two lifetables. Confidence intervals and statistical tests are provided. There is an extensive help file in which everything is explained. From the Downloads section of the QuantitativeSkills web site.

    Sample Size for Microarray Experiments -- computes how many samples are needed for a microarray experiment to find genes that are differentially expressed between two kinds of samples (e.g., cancer vs. normal tissue), by performing separate gene-by-gene t-tests. You specify how many genes you're looking at, how many false positives you are willing to accept, how large a difference you want to be able to detect (as the fold difference between the two kinds of samples), the power of the test (% of differentially expressed genes likely to be detected by the experiment), and an estimate of the logarithmic SD of the gene intensities.
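
    To give a feel for how those inputs combine, here is a rough sketch using the usual normal-approximation sample-size formula for a two-sample comparison, with the per-gene significance level set to (acceptable false positives) / (number of genes). This is an assumption made for illustration and not necessarily the formula the calculator itself uses; the function name is hypothetical, and both the fold difference and the SD are assumed to be on the log2 scale.

```python
from math import ceil, log2
from statistics import NormalDist


def microarray_samples_per_group(n_genes, false_positives, fold_change,
                                 power, sd_log2):
    """Approximate samples needed in EACH group for gene-by-gene t-tests.

    Assumes equal group sizes, a common SD of log2 intensities (sd_log2),
    and a normal approximation to the t-test.
    """
    alpha = false_positives / n_genes            # per-gene significance level
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = log2(fold_change)                    # difference to detect, log2 scale
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd_log2 ** 2 / delta ** 2)


# Hypothetical inputs: 10,000 genes, 1 false positive tolerated,
# 2-fold difference, 90% power, log2 SD of 0.7.
print(microarray_samples_per_group(10000, 1, 2.0, 0.90, 0.7))
```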

    MIX (Meta-analysis with Interactive eXplanations) -- a statistical add-in for Excel 2000 or later (Windows only). Ideal for learning meta-analysis (reproduces the data, calculations, and graphs of virtually all data sets from the most authoritative meta-analysis books, and lets you analyze your own data "by the book"). Handles datasets with dichotomous & continuous outcomes; calculates Risk Diff, RR, OR, Mean Diff, Hedges's g, Cohen's d; performs standard & cumulative meta-analysis with CI, z & p; fixed and random effects modeling; Cochran's Q with p-value; Higgins's I2 and H with CI; and publication bias tests: Rank correlation (tau-b) test with z & p, Egger's and Macaskill's regression tests with CI, and Trim-and-Fill. Generates numerous plots: standard and cumulative forest, p-value function, four funnel types, several funnel regression types, exclusion sensitivity, Galbraith, L'Abbe, Baujat, modeling sensitivity, and Trim-and-Fill.

    EWOC - Escalation With Overdose Control -- a Bayesian method for selecting dose levels in Phase I Clinical Trials while controlling the probability of exceeding the maximum tolerated dose. This is a stand-alone Windows (95 through XP) program that receives information about dose-limiting toxicities (DLTs) observed at some starting dose, and calculates the doses to be administered next. DLT information obtained at each dosing level guides the calculation of the next dose level. (For some strange reason, the EWOC download web site does not work properly with the Firefox web browser, but it does work with MS Internet Explorer.)

    STPLAN -- Performs power, sample size, and related calculations needed to plan studies. Covers a wide variety of situations, including studies whose outcomes involve the Binomial, Poisson, Normal, and log-normal distributions, or are survival times or correlation coefficients. Available for MS-DOS and Mac; also as Fortran and C source code.

    Epi Info Version 3.5.1 -- Public domain statistical software for epidemiology developed by Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia (USA). Epi Info has been in existence for over 20 years and is currently available for Microsoft Windows. The program allows for data entry and analysis. Within the analysis module, analytic routines include t-tests, ANOVA, nonparametric statistics, cross tabulations and stratification with estimates of odds ratios, risk ratios, and risk differences, logistic regression (conditional and unconditional), survival analysis (Kaplan Meier and Cox proportional hazard), and analysis of complex survey data. The software is in the public domain, free, and can be downloaded from http://www.cdc.gov/epiinfo. Limited support is available.

    PEPI -- a collection of 43 small DOS / Windows programs that perform a large assortment of statistical tests. They can be downloaded individually, or as a single ZIP file.

    Free Public Health & Epidemiology Software -- written by Mark Myatt and others:
    PAMCOMP (Person-years And Mortality COMputation Program) -- a free Windows 95/98/NT application for calculating person-years and standardised mortality ratios (SMRs). The calculation of person-years allows flexible stratification by sex, and self-defined and unrestricted calendar periods and age groups, and can lag person-years to account for latency periods. The SMR computation includes calculation of 90%, 95%, and 99% confidence intervals. Has filters for ASCII, dBase, Excel, Access, and Paradox to import cohort and reference data and to export distributions of person-years and deaths.
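
    The core SMR arithmetic can be sketched as follows: expected deaths are the stratum-wise person-years multiplied by reference rates, and the SMR is observed over expected deaths. The confidence limits here use Byar's approximation to the exact Poisson limits, which is a choice made for this illustration; the source does not say which method PAMCOMP uses. Function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def expected_deaths(person_years, reference_rates):
    """Sum over strata of person-years times the reference mortality rate."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def smr_with_ci(observed, expected, conf=0.95):
    """SMR = observed / expected, with Byar's approximate Poisson limits."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    if observed > 0:
        lower_count = observed * (
            1 - 1 / (9 * observed) - z / (3 * sqrt(observed))) ** 3
    else:
        lower_count = 0.0
    upper_base = observed + 1
    upper_count = upper_base * (
        1 - 1 / (9 * upper_base) + z / (3 * sqrt(upper_base))) ** 3
    return observed / expected, lower_count / expected, upper_count / expected


# Made-up cohort: two age strata with reference rates per person-year.
exp_d = expected_deaths([1000.0, 2000.0], [0.004, 0.003])
for level in (0.90, 0.95, 0.99):
    print(level, smr_with_ci(20, exp_d, conf=level))
```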

    ARIMA -- a seasonal adjustment program for PC and Unix, developed by the Census Bureau.

    DEMETRA -- (Win 9x/NT) a user-friendly interface to the seasonal adjustment methods TRAMO/SEATS and X-12-ARIMA. Developed by Eurostat to facilitate the application of these modern time series techniques to large-scale sets of time series and in the explicit consideration of the needs of production units in statistical institutes. Client/server architecture can access various kinds of databases and files. Contains two main modules: seasonal adjustment and trend estimation with an automated procedure (e.g. for inexperienced users or for large-scale sets of time series), and with a user-friendly procedure for detailed analysis of single time series.

    Meta-analysis 5.3 -- Free DOS statistics software for meta-analysis. Probably still the most frequently used meta-analysis software in the world. Can select the analysis of exact p values or effect sizes (d or r, with a cluster size option). Can plot a stem-and-leaf display of correlation coefficients. A utility menu is provided that allows various transformations and preliminary computations that are typically required before the final meta-analysis can be performed.

    EasyMA -- a free user-friendly MS-DOS program for the meta-analysis of clinical trials results. Developed to help physicians and medical researchers to synthesize evidence in clinical or therapeutic research.

    EPIMETA (from CDC) -- a DOS-based meta-analysis program that features a Windows-like interface which makes data entry, file manipulation, and subgroup analysis easy.

    Life Table -- available in Lotus and Excel formats.

    ABSRISK -- a program (MS-DOS) for estimating absolute risks from relative risks. Uses age-specific mortality and morbidity data to convert relative risk estimates into absolute risk estimates. That is, it estimates the probability that a patient will suffer a specific morbid or mortal outcome in a given time interval. The user first specifies a data file that contains the needed mortality and morbidity data for the disease of interest. She then gives her patient's age and relative risk, and the time interval over which the risk estimate is to be derived. The program derives this risk, which is given both interactively and in a log file.

    Biodiversity Research Software -- Five software packages, with documentation:
    1. LUMP, LINK, and JOIN: Utility Programs for Biodiversity Research
    2. COLLECT1 and COLLECT2: Programs for Calculating Statistics of Collectors' Curves
    3. BOUNDARY: A Program for Detecting Boundaries in Ecological Landscapes
    4. EXTSPP1 and EXTSPP2: Programs for Comparing and Performance-Testing Eight Extrapolation-Based Estimators of Total Taxonomic Richness
    5. RARE, SPPDISS, and SPPRANK: Programs for Detecting Between- Sample Differences in Community Structure
    HICAST -- a PC-based program for rapid entry of clinical and laboratory parameters needed for the calculation of ten internationally applied scoring systems used in an Intensive Care Unit. Allows sharing of relevant data, so multiple entries of the same data are not necessary.

    Curve-fitting & Modeling software

    EasyReg (Easy Regression Analysis), by Herman J. Bierens. Incredibly powerful and multi-featured program for data manipulation and analysis. Designed for econometrics, but useful in many other disciplines as well. For Win 95/98/NT4.

    Compumine Rule Discovery System -- easy to use data mining software for developing high-quality rule based prediction models, such as classification and regression trees, rule sets and ensemble models. This program is licensed under the P3 license model, which means that it is free to use forever for developing rule-based predictive models, and can be freely downloaded.

    gretl -- a cross-platform (Linux, Windows, Mac, etc.) package for econometric analysis. Has an intuitive interface (English, French, Italian & Spanish). Supports a wide variety of least-squares based estimators, including two-stage & nonlinear least squares, augmented Dickey-Fuller test, Chow test for structural stability, Vector Autoregressions, and ARMA estimation. Creates output models as LaTeX files, in tabular or equation format. Has an integrated scripting language: enter commands either via the GUI or via script, command loop structure for Monte Carlo simulations and iterative estimation procedures, GUI controller for fine-tuning Gnuplot graphs, link to GNU R for further data analysis. Reads own format XML data files, Comma Separated Values files, Excel and Gnumeric worksheets, BOX1 files, own format binary databases (allowing mixed data frequencies and series lengths) and RATS 4 databases. Includes a sample US macro database. See also the gretl data page.

    mle - Maximum Likelihood Estimation -- a simple programming language for building and estimating parameters of likelihood models. Originally designed for survival models, but the language has evolved into a general-purpose tool for building and estimating  general likelihood models. Available for Windows and Linux; also provides User Manual, Reference Manual, and Quick Reference Card.

    WinSAAM -- Windows implementation of SAAM (System Analysis and Modeling Software). Lets you create mathematical models, design and simulate experiments, and analyze data. Models can contain differential equations, which will be numerically integrated and fit to data. Graphic and tabular output is provided.

    Boomer -- Non-linear Regression Program for Analysis of Pharmacokinetic and Pharmacodynamic Data. Includes normal fitting, Bayesian estimation, or simulation-only, with integrated or differential equation models. Allows selection of weighting schemes and methods for numerical integration. Free downloads for Macintosh and Windows; online manual, tutorial, sample data sets.

    DEMETRA -- user-friendly interface to TRAMO/SEATS and X-12-ARIMA.

    JoinPoint Regression Program (from the National Cancer Institute) -- for the analysis of trends using joinpoint models (where several different lines are connected together at the "joinpoints"). Takes trend data (e.g., cancer rates) and fits the simplest joinpoint model that the data allow, using a Monte Carlo Permutation method. Models may incorporate estimated variation for each point (e.g., when the responses are age-adjusted rates) or use a Poisson model of variation. In addition, the models may also be linear on the log of the response (e.g., for calculating annual percentage rate change). The software also allows viewing one graph for each joinpoint model, from the model with the minimum number of joinpoints to the model with the maximum number of joinpoints.
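
    The "linear on the log of the response" option is what makes an annual percentage change easy to read off: if log(rate) rises by a slope b per year, the rate changes by 100 * (exp(b) - 1) percent per year. The sketch below fits a single unweighted line segment only; it does not search for joinpoints or run the permutation test, and it is not the NCI program's code. The function name and data are hypothetical.

```python
from math import exp, log


def annual_percent_change(years, rates):
    """Annual percent change from an ordinary least-squares fit of
    log(rate) on calendar year (one segment, equal weights)."""
    logs = [log(r) for r in rates]
    n = len(years)
    year_mean = sum(years) / n
    log_mean = sum(logs) / n
    slope = (
        sum((x - year_mean) * (y - log_mean) for x, y in zip(years, logs))
        / sum((x - year_mean) ** 2 for x in years)
    )
    return 100 * (exp(slope) - 1)


# Made-up rates per 100,000, for illustration only.
years = [2000, 2001, 2002, 2003, 2004, 2005]
rates = [50.0, 51.2, 53.1, 54.0, 56.2, 57.5]
print(f"APC: {annual_percent_change(years, rates):.2f}% per year")
```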

    NeuroSolutions -- applies neural network technology to many situations, including regression. Free evaluation version does everything except print or save networks.

    LOCFIT -- a software system for fitting curves and surfaces to data, using the local regression and likelihood methods (from Bell Labs). Runs on various platforms under R or S statistical systems; also available as a stand-alone package for Win95/98/NT.

    Origin -- technical graphics and data analysis software for Windows. Includes 3D and contour plotting, FFT filtering; works closely with Excel. 30-day evaluation.

    CART -- Salford Systems' flagship decision-tree software; combines an easy-to-use GUI with advanced features for data mining, data pre-processing and predictive modeling.

    CurveExpert -- comprehensive curve fitting system for Windows. Handles linear regression models, nonlinear regression models, interpolation, or splines. Over 30 models built-in; custom user-defined regression models. Full-featured graphing capability. Supports an automated process that compares your data to each model to choose the best curve. 30-day evaluation of shareware package.

    DTREG generates classification and regression decision trees. It uses V-fold cross-validation with pruning to generate the optimal size tree, and it uses surrogate splitters to handle missing data. A free demonstration copy is available for download.

    NLREG performs general nonlinear regression. NLREG will fit a general function, whose form you specify, to a set of data values. A free demonstration copy is available for download.

    Partitionator -- a fast recursive partitioning engine that uses a learning set to generate rules by which a dependent variable can be predicted, by optimally splitting continuous predictors. Free 30-day evaluation.