Multivariate Data Analysis, Sixth Edition, by Hair, Black, Babin, Anderson and Tatham


A number of datasets are available to enable students and faculty to perform the multivariate analyses described in the textbook.  While some techniques require specialized datasets (e.g., multidimensional scaling, conjoint analysis and structural equation modeling), many of the techniques are performed using conventional survey data. 


We have collected all of the datasets needed for each edition, along with some supplemental datasets and documentation. Each dataset is in SPSS format (.SAV), which most statistical packages can read. The basic data files are also provided in Excel format for use with other statistical packages.


Select the edition below for a compressed (zipped) file of all datasets.  Descriptions of the datasets are provided in documentation within each file as well as in the section below.

Download the Complete Set of Datasets



Descriptions of the Individual Datasets

HBAT: a series of datasets used with many of the techniques.






HBAT: the primary dataset, with multiple metric and nonmetric variables, usable with most of the multivariate techniques.
HBAT_200: an expanded dataset comparable to HBAT but with 200 rather than 100 respondents; used in MANOVA.
HBAT_MISSING: a reduced dataset of 70 respondents with missing data in the variables; used with the techniques for diagnosing and remedying missing data (Chapter 2).
HBAT_SPLITS: contains two variables that split the HBAT dataset into 50/50 and 60/40 subsamples. This dataset can be merged with the original HBAT dataset if desired.
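Two routine steps with these files are summarizing missing data (HBAT_MISSING) and attaching the split variables to HBAT to select a subsample (HBAT_SPLITS). The following is a minimal sketch in plain Python, assuming each respondent is a dict keyed by variable name, missing values appear as None, and HBAT and HBAT_SPLITS share a respondent identifier. The names `id`, `x6`, `x7`, `split50` and the 0/1 split codes are hypothetical placeholders; the documentation inside each file gives the actual variable names and codes.

```python
from typing import Any

Record = dict[str, Any]  # one respondent: variable name -> value (None = missing)


def missing_summary(rows: list[Record], variables: list[str]) -> dict[str, float]:
    """Percentage of respondents missing each variable (absent or None counts as missing)."""
    n = len(rows)
    return {v: 100.0 * sum(r.get(v) is None for r in rows) / n for v in variables}


def complete_cases(rows: list[Record], variables: list[str]) -> list[Record]:
    """Listwise deletion: keep only respondents with a value for every variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def merge_splits(hbat: list[Record], splits: list[Record], key: str = "id") -> list[Record]:
    """Attach the split-indicator variables to each HBAT respondent, matched on `key`."""
    by_id = {s[key]: s for s in splits}
    return [{**r, **by_id.get(r[key], {})} for r in hbat]


def subsample(rows: list[Record], split_var: str, code: Any) -> list[Record]:
    """Select the respondents whose split variable equals `code`."""
    return [r for r in rows if r.get(split_var) == code]
```

Listwise deletion is shown only because it is the simplest remedy; the missing-data chapter discusses when it is and is not appropriate.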
Conjoint Analysis
HBAT_CPLAN: details the "full-profile" stimulus descriptions.
HBAT_CONJOINT: contains the actual responses to the stimulus profiles.
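In a full-profile design each stimulus combines one level of every attribute. The sketch below enumerates a full-factorial set of profiles and, for a balanced design, recovers main-effects part-worths and relative importances from one respondent's ratings. The attributes and levels are hypothetical, not those in HBAT_CPLAN (which may use a fractional design), and the mean-difference shortcut holds only for balanced, orthogonal designs; an unbalanced design would call for a dummy-coded regression instead.

```python
from itertools import product
from statistics import mean

# Hypothetical attributes and levels for illustration only; the actual
# HBAT_CPLAN profiles are documented in that file.
ATTRIBUTES = {
    "price": ["low", "high"],
    "delivery": ["1 day", "3 days"],
    "warranty": ["1 year", "2 years", "3 years"],
}


def full_factorial(attributes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every combination of attribute levels, one dict per stimulus profile."""
    names = list(attributes)
    return [dict(zip(names, combo)) for combo in product(*attributes.values())]


def part_worths(profiles: list[dict[str, str]], ratings: list[float]):
    """Main-effects part-worths for a balanced full-factorial design.

    With every level appearing equally often and attributes orthogonal, the
    effects-coded part-worth of a level equals the mean rating of the profiles
    containing that level minus the grand mean of all ratings.
    """
    grand = mean(ratings)
    worths: dict[str, dict[str, float]] = {}
    for attr in profiles[0]:
        levels = list(dict.fromkeys(p[attr] for p in profiles))  # first-seen order
        worths[attr] = {
            level: mean(r for p, r in zip(profiles, ratings) if p[attr] == level) - grand
            for level in levels
        }
    return grand, worths


def importances(worths: dict[str, dict[str, float]]) -> dict[str, float]:
    """Relative importance (%): each attribute's part-worth range over the sum of ranges."""
    ranges = {a: max(w.values()) - min(w.values()) for a, w in worths.items()}
    total = sum(ranges.values())
    return {a: 100.0 * r / total for a, r in ranges.items()}
```

For example, `full_factorial(ATTRIBUTES)` yields 2 × 2 × 3 = 12 profiles, and passing those profiles with one respondent's 12 ratings to `part_worths` gives that respondent's utility for each level.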


HBAT_MDS: used in MDS (multidimensional scaling).
HBAT_CORRESP: used for correspondence analysis.
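Correspondence analysis starts from a cross-tabulation (for example, firms by attributes) and decomposes its total inertia, which is the chi-square statistic divided by the grand total. A minimal sketch of those starting quantities follows. It stops short of the singular value decomposition that produces the perceptual-map coordinates, the table layout is assumed rather than taken from HBAT_CORRESP, and it assumes no row or column sums to zero.

```python
def correspondence_basics(table: list[list[float]]) -> dict:
    """Row/column masses, chi-square, and total inertia of a contingency table."""
    n = sum(sum(row) for row in table)                 # grand total
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n     # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return {
        "row_mass": [t / n for t in row_tot],
        "col_mass": [t / n for t in col_tot],
        "chi2": chi2,
        "inertia": chi2 / n,                           # total inertia to be decomposed
    }
```

A table whose rows are proportional to one another has zero inertia, meaning there is no association for the map to display.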
Structural Equation Modeling (SEM)


Download the set of five datasets or individual datasets.

HBAT_SEM: the original responses from 400 individuals, used to derive the input matrices for SEM programs (e.g., LISREL, EQS or AMOS).
HBAT_SEM_NOMISSING: the original dataset of 400 responses includes two individuals with missing data. This dataset replaces those missing values so that the sample consists of 400 complete responses.
HBAT.COV, HBATF.COV and HBATM.COV: these three covariance matrices represent the overall sample, female respondents and male respondents, respectively.
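HBAT.COV, HBATF.COV and HBATM.COV are supplied ready-made. The sketch below shows how matrices of this kind are typically derived from complete raw responses such as HBAT_SEM_NOMISSING: one sample covariance matrix for the whole sample and one per group. It assumes complete data, an n - 1 denominator (the supplied files may follow a different convention), and a hypothetical list of group codes such as "F" and "M". It does not read or write the .COV file format.

```python
def covariance_matrix(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix (n - 1 denominator); rows are respondents, columns variables."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1) for b in range(k)]
        for a in range(k)
    ]


def group_covariances(rows: list[list[float]], group_of: list[str]) -> dict[str, list[list[float]]]:
    """Separate covariance matrices per group (e.g., hypothetical codes "F" and "M")."""
    groups: dict[str, list[list[float]]] = {}
    for row, g in zip(rows, group_of):
        groups.setdefault(g, []).append(row)
    return {g: covariance_matrix(members) for g, members in groups.items()}
```

Per-group matrices like these are the usual input for a multi-group comparison, such as testing whether a model holds equally for female and male respondents.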
Other Datasets:
Two additional datasets are provided to give students access to data other than the HBAT data files described in the textbook.
HATCO: this dataset was used in earlier editions of the textbook and provides a simplified set of variables amenable to all of the basic multivariate techniques.
SALES: this dataset concerns sales training and consists of 80 respondents, representing a portion of the data collected by academic researchers.

Drop us an e-mail if you have a comment, suggestion, or online resource you would like to share.


Multivariate Data Analysis
Hair, Black, Babin and Anderson