Thursday, April 7, 2011

Analysis or a limited set of tests: statistical packages 2

EasySample -- a tool for statistical sampling. Supports several types of attribute and variable sampling and includes a random number generator and standard deviation calculator. Has a consistent, easy-to-use interface. Results may be saved or read in CSV (spreadsheet compatible) or XML (Internet compatible) file formats or printed.

EpiData -- a comprehensive yet simple tool for documented data entry. Overall frequency tables (codebook) and listing of data included, but no statistical analysis tools.

Calculate sample size required for a given confidence interval, or confidence interval for a given sample size. Can handle finite populations. Online calculator also available.
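
As a rough illustration of the two calculations described above, here is a minimal sketch for a proportion, assuming the normal approximation and the usual finite population correction. The function names and defaults (95% confidence, p = 0.5) are hypothetical and are not taken from the tool itself.

```python
import math
from statistics import NormalDist


def sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Sample size needed so the confidence interval half-width is <= margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    n0 = z * z * p * (1 - p) / (margin * margin)    # infinite-population size
    if population is not None:
        # Finite population correction shrinks the required sample.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def margin_of_error(n, confidence=0.95, p=0.5, population=None):
    """Half-width of the confidence interval obtained from a sample of size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


if __name__ == "__main__":
    print(sample_size(0.05))                    # unlimited population
    print(sample_size(0.05, population=1000))   # finite population of 1000
    print(margin_of_error(385))
```

Using p = 0.5 gives the most conservative (largest) sample size when the true proportion is unknown.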

Grocer -- a free econometrics toolbox that runs under Scilab. It contains most standard econometric capabilities: ordinary least squares, autocorrelated models, instrumental variables, nonlinear least squares, limited dependent variables, robust methods, specification tests (multicollinearity, autocorrelation, heteroskedasticity, normality, predictive failure,...), simultaneous equations methods (SUR, two and three stage least squares,...), VAR, VECM, VARMA and GARCH estimation, the Kalman filter and time varying parameters estimation, unit root tests (ADF, KPSS,...) and cointegration methods (CADF, Johansen,...), HP, Baxter-King and Christiano-Fitzgerald filters. It also contains some rare (and useful) features: a pc-gets device that performs automatic general-to-specific estimations, and a contributions device that provides contributions of exogenous variables to an endogenous one for any dynamic equation. Has a (rough) interface with Excel and, unlike Gauss or Matlab, deals with true time series objects.

Biomapper -- a kit of GIS and statistical tools designed to build habitat suitability (HS) models and maps for any kind of animal or plant. Deals with: preparing ecogeographical maps for use as input for ENFA (e.g. computing frequency of occurrence map, standardisation, masking, etc.); exploring and comparing them by means of descriptive statistics (distribution analysis, etc.); computing the Ecological Niche Factor Analysis and exploring its output; and computing and evaluating a Habitat Suitability map.

ROC Curves -- a set of downloadable programs and Excel spreadsheets to calculate and graph various kinds of ROC (Receiver Operator Characteristic) curves.

BKD: Bayesian Knowledge Discoverer -- a computer program able to learn Bayesian Belief Networks from (possibly incomplete) databases. Based on a new estimation method called Bound and Collapse. Developed within the Bayesian Knowledge Discovery project. See also the commercial product, called Bayesware Discoverer, available free for non-commercial use.

RoC: The Robust Bayesian Classifier -- a computer program able to perform supervised Bayesian classification from incomplete databases, with no assumption about the pattern of missing data. Based on a new estimation method called Robust Bayesian Estimator. Developed within the Bayesian Knowledge Discovery project.

DQO-PRO -- a sample-size calculator for MS Windows that performs three types of calculations:
  • determining the rate at which an event occurs (confidence levels versus numbers of false positive or negative conclusions),
  • determining an estimate of an average within a tolerable error (given the standard deviation of individual measurements), and
  • determining the sampling grid necessary to detect “hot spots” of various assumed shapes.
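
The second calculation in the list above (estimating an average within a tolerable error) can be sketched with the textbook formula n = (z·σ / E)². This is only an illustration under a normal approximation with a known standard deviation; the function name is hypothetical and DQO-PRO's own method may differ.

```python
import math
from statistics import NormalDist


def samples_for_mean(sigma, tolerable_error, confidence=0.95):
    """Number of measurements so the mean is within +/- tolerable_error.

    Assumes a known standard deviation `sigma` of individual measurements
    and a normal approximation (an assumption, not DQO-PRO's documented method).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sigma / tolerable_error) ** 2)


if __name__ == "__main__":
    print(samples_for_mean(sigma=10, tolerable_error=2))
```

Halving the tolerable error roughly quadruples the number of measurements required.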

Binomial Probability Program (BPP) -- a menu-driven program that performs a variety of functions related to the success/failure situation. Given the probability of occurrence for a specific event, this program calculates the probability that EXACTLY, NO MORE THAN, or AT LEAST a certain number of events occur in a given number of trials for all possible outcomes, and will generate plots for each of these.
The program allows the user to repeatedly combine probabilities in series or in parallel, and at any time will show a trail of the calculations which led to the current probability value. Other program capabilities are the calculation of probabilities from input data, Gaussian approximation, and the generation of a mean time between failure (MTBF) table for various levels of confidence. Up to 2200 trials may be run, limited by IBM PC BASIC memory utilization. It is assumed that the user is familiar with the theory behind the binomial probability distribution.
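
The EXACTLY / NO MORE THAN / AT LEAST calculations described above can be sketched as follows. The series and parallel helpers assume the usual reliability reading (series: all independent events must occur; parallel: at least one occurs), which is an assumption about what BPP means by those terms. All function names are hypothetical.

```python
from math import comb, prod


def exactly(k, n, p):
    """P(exactly k events in n independent trials with event probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def no_more_than(k, n, p):
    """P(at most k events in n trials)."""
    return sum(exactly(i, n, p) for i in range(k + 1))


def at_least(k, n, p):
    """P(k or more events in n trials)."""
    return sum(exactly(i, n, p) for i in range(k, n + 1))


def series(*probs):
    """Probability that all independent events occur (assumed meaning)."""
    return prod(probs)


def parallel(*probs):
    """Probability that at least one independent event occurs (assumed meaning)."""
    return 1 - prod(1 - p for p in probs)


if __name__ == "__main__":
    n, p = 4, 0.5
    for k in range(n + 1):
        print(k, exactly(k, n, p), no_more_than(k, n, p), at_least(k, n, p))
```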

ADE-4 -- multivariate analysis and graphical display software package for Mac and Win 95/NT. Includes component analysis and correspondence analysis, spatial data analysis methods (analogous to Moran and Geary indices), discriminant analysis and within/between groups analyses, many linear regression methods including lowess and polynomial regression, multiple and PLS (partial least squares) regression and orthogonal (principal component) regression, projection methods like principal component analysis on instrumental variables, canonical correspondence analysis and many other variants, coinertia analysis and the RLQ method, and several three-way table (k-table) analysis methods. Graphical displays include an automatic collection of elementary graphics corresponding to groups of rows or to columns in the data table, automatic k-table graphics and geographical mapping options, searching, zooming, selection of points, and display of data values on factor maps. Simple and homogeneous user interface.

Weibull Trend Toolkit -- Fits a Weibull distribution function (like a normal distribution, but more flexible) to a set of data points by matching the skewness of the data. (Windows)
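
One way skewness matching could work is sketched below. The skewness of a Weibull distribution depends only on its shape parameter, so the shape can be found by bisection and the scale then chosen to match the sample mean. This is an assumed procedure for illustration, not the toolkit's documented algorithm, and the function names and search range are hypothetical.

```python
import math


def weibull_skewness(k):
    """Theoretical skewness of a Weibull distribution with shape k."""
    g1 = math.gamma(1 + 1 / k)
    g2 = math.gamma(1 + 2 / k)
    g3 = math.gamma(1 + 3 / k)
    var = g2 - g1 * g1
    return (g3 - 3 * g1 * g2 + 2 * g1**3) / var**1.5


def sample_skewness(data):
    """Moment-based (population) skewness of the data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2**1.5


def fit_weibull_by_skewness(data, lo=0.2, hi=50.0, iters=200):
    """Return (shape, scale); assumes skewness falls as shape rises on [lo, hi]."""
    target = sample_skewness(data)
    if not (weibull_skewness(hi) < target < weibull_skewness(lo)):
        raise ValueError("sample skewness outside the searchable range")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if weibull_skewness(mid) > target:
            lo = mid   # skewness still too high: need a larger shape
        else:
            hi = mid
    shape = (lo + hi) / 2
    # Match the sample mean: mean = scale * Gamma(1 + 1/shape).
    scale = (sum(data) / len(data)) / math.gamma(1 + 1 / shape)
    return shape, scale


if __name__ == "__main__":
    print(fit_weibull_by_skewness([1, 2, 2, 3, 3, 3, 4, 5, 8]))
```

A shape of 1 corresponds to the exponential distribution, whose skewness is 2, which gives a quick sanity check on the formula.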

TURNER -- Macintosh software for interactively analysing multidimensional discrete data. Applies interactive paradigms from exploratory graphical data analysis to the concise treatment of categorical data, typically arranged in two- or multi-way contingency tables. In addition to standard features for categorical data, like Pearson's chi-squared test and log-linear models, it offers the whole goodness-of-fit family of power divergence statistics and the N-value. Interactive contingency tables let the user switch easily between all two-dimensional views of multivariate data. All displays dealing with the same data set are fully linked and may be interacted with directly.

BUGS -- Bayesian inference Using Gibbs Sampling. Software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. Command-line interface versions are available for major computer platforms; a Windows version, WinBUGS, supports a graphical user interface, on-line monitoring and convergence diagnostics.

MSBNx -- a component-based Windows application for creating, assessing, and evaluating Bayesian Networks, created at Microsoft Research. Includes complete help files and sample networks. Bayesian Networks are encoded in an XML file format.

QUEST (Quick, Unbiased and Efficient Statistical Tree) and CRUISE (Classification Rule with Unbiased Interaction Selection and Estimation) -- two statistical decision tree algorithms for classification and data mining, by Wei-Yin Loh and Yu-Shan Shih.

AMELIA -- a program for substituting reasonable values for missing data (called "imputation").

A collection of MS-DOS programs from the Downloads section of the QuantitativeSkills web site:
  • Hypergeometric -- calculates the hypergeometric probability distribution to evaluate hypotheses about sampling without replacement in small populations.
  • Binomial -- calculates probabilities for sampling with replacement in small populations or without replacement in very large populations. Can be used to approximate the hypergeometric distribution. The binomial is probably the best known discrete distribution.
  • Poisson -- calculates probabilities for samples which are very large in an even larger population. It is used to approximate the binomial distribution; try comparing it with the binomial! The distribution is more often used in a completely different way: to analyse how rare events, such as accidents, accumulate for a single individual. For example, you can use it to estimate your chances of having one, two, three or more accidents in any one year, given that on average people have 'U' accidents per year.
  • Negative binomial -- also used to study accidents. It is a more general case than the Poisson: it allows for accidents clustering differently in subgroups of the population, so that the probability of having an accident differs between subgroups. However, the theoretical properties of this distribution and its possible relationship to real events are not well known.
  • Negative binomial -- another version of the negative binomial, this one is used to compute the marginal distribution of binomials (try it!). Often used to predict the termination of real-time events. An example is the probability of giving up on an unanswered phone after n rings.
  • Multinomial -- Same as the multinomial above, this one for DOS computers.
  • Fisher -- used to calculate the exact p-value in 2×2 tables. It is fine for one-sided testing but not so exact for two-sided testing, where there are different theories about how to do it. Summing the small p-values is the most used method, but there does not seem to be a good rationale for it. Use the Fisher exact test instead of the Chi-square when you have a small value in one cell or a very uneven marginal distribution.
  • SPRT -- this method of analysis is not often used, which is a pity because it is actually quite good. It applies when phenomena are observed or tested, or data are collected, sequentially in time. Testing or data collection stops as soon as the proportion of positive or negative events or outcomes, relative to the total number observed, crosses some upper or lower limit. It was originally developed to keep the costs of 'destructive' testing low. It is sometimes used in medical trials to monitor the number of negative side effects and to decide whether the trial should be stopped because the number of side effects is considered unacceptably high.
  • Chi-square -- calculates the Chi-square and some other measures for two-dimensional tables.
  • CASRO -- Calculates response rates according to different procedures. The CASRO (Council of American Survey Research Organizations) procedure is the 'accepted' procedure for surveys.
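
To try the comparison suggested in the Hypergeometric, Binomial and Poisson entries above, here is a minimal sketch of the three probability functions using only the Python standard library. The function names and the example population figures are made up for illustration.

```python
from math import comb, exp, factorial


def hypergeom_pmf(k, pop, marked, n):
    """P(k marked items in a sample of n drawn without replacement)."""
    return comb(marked, k) * comb(pop - marked, n - k) / comb(pop, n)


def binom_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events) when events occur at an average rate of lam."""
    return exp(-lam) * lam**k / factorial(k)


if __name__ == "__main__":
    # Hypothetical example: 100 marked items in a population of 10000,
    # sample of 50, so p = 0.01 and the Poisson rate is n * p = 0.5.
    pop, marked, n = 10000, 100, 50
    p = marked / pop
    for k in range(4):
        print(k,
              hypergeom_pmf(k, pop, marked, n),
              binom_pmf(k, n, p),
              poisson_pmf(k, n * p))
```

With a large population and a small event probability the three columns should come out close to each other, which is the sense in which each distribution approximates the one before it.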
