
Sta113: Lab2 (Friday, January 17, 2003)


Download the CPUtime data and save it as a file named "CPUtime.dat" in your home directory.
cpu = load('CPUtime.dat');  % load data
Summary Statistics
n = length(cpu)  % sample size               

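mean(cpu)                        % sample mean
median(cpu)                      % sample median

range(cpu)                       % range of the data
max(cpu) - min(cpu)              % same value, computed directly
var(cpu)                         % sample variance (divides by n-1)
sum((cpu-mean(cpu)).^2)/(n-1)    % same value, computed from the definition
std(cpu)                         % sample standard deviation
sqrt(var(cpu))                   % same value, as the square root of the variance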

prctile(cpu, 0:25:100)  % five-number summary
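
As a quick check on the formulas above, the following minimal sketch recomputes the mean, variance, and standard deviation by hand on a small vector. The values in x are made up for illustration (they are not the CPU data), and the names nx, xbar, dev2, s2, and sd are arbitrary. Only base MATLAB functions are used.

```matlab
% Hypothetical toy data, chosen so the arithmetic is easy to follow by hand
x  = [2 4 4 4 5 5 7 9];
nx = numel(x);                 % sample size: 8

xbar = sum(x)/nx;              % sample mean: 40/8 = 5
dev2 = (x - xbar).^2;          % squared deviations from the mean, summing to 32
s2   = sum(dev2)/(nx - 1);     % sample variance: 32/7
sd   = sqrt(s2);               % sample standard deviation

% Each row shows the hand computation next to the built-in function
disp([xbar mean(x)])
disp([s2   var(x)])
disp([sd   std(x)])
```

The two numbers in each displayed row should agree, because var and std divide by the sample size minus one by default.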
Histogram

MATLAB provides two commands for histograms, "hist" and "histc". Read the help files to learn the difference between them.
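
The difference can be seen on a small example. This is only a sketch on made-up values (x is not the CPU data, and the variable names are arbitrary): hist is given a number of classes and chooses the class limits itself, while histc is given the class edges.

```matlab
% Hypothetical toy data (not the CPU data)
x = [1 2 2 3 5 6 6 7 9];

% hist: supply the number of classes; the classes have equal width and
% span min(x) to max(x). It returns the counts and the class centers.
[cnt1, centers] = hist(x, 3);

% histc: supply the class edges yourself. Entry k counts the values with
% edges(k) <= x < edges(k+1); the last entry counts x == edges(end).
edges = [0 3 6 9];
cnt2  = histc(x, edges);

disp(cnt1)      % one count per automatic class
disp(centers)   % centers of the automatic classes
disp(cnt2)      % one count per edge, so one more entry than classes
```

Both calls account for all nine values, but histc returns one entry per edge, so its last entry is nonzero only when some value equals the final edge exactly.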

hist(cpu,7);  % draw histogram with 7 classes

edg = 0.015:0.7:4.915;
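[counts,bin] = histc(cpu, edg);  % count the observations in each class defined by "edg"
h = bar(edg, counts/25, 'histc');  % draw the histogram of relative frequencies (counts divided by the sample size, 25)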
set(h, 'facecolor', [1 1 1]);  % change the color of the bins

% change axes labels and ticks 
set(gca, 'XLim', [0 5]);         
set(gca, 'XTick', edg);
xlabel('CPU Time')
ylabel('Relative Frequency');  % the bar heights are counts divided by 25; use 'Frequency' instead if you plot the raw counts
Time Series Plot

Download the birth rate data, birthrate.dat, and save it in your home directory.

y = load('birthrate.dat')       
plot(y(:,1), y(:,2), 'ko')      
% plot the second column of y (birthrate) against its first column (year);
% the first argument of plot goes on the horizontal axis.
% 'k' means black; 'o' means the data points are marked by circles
 
xlabel('Year');
ylabel('Birthrate');
Boxplot

We will draw a boxplot for the hair color data shown in class.

% enter the data, one row per hair color; 'NaN' marks a missing value
pain= [ 62 60 71 55 48; 63 57 52 41 43; 42 50 41 37 NaN; 32 39 51 30 35];
pain = pain'  % transpose of the original matrix
boxplot(pain)
 
set(gca, 'XTicklabel', {'Light Blonde', 'Dark Blonde', 'Light Brunette', 'Dark Brunette'})
xlabel('Hair Color');
ylabel('Pain threshold score');
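
The boxplot can still be drawn with the NaN entry present, but some summary functions do not skip missing values on their own. The sketch below uses the same data and shows one way to drop the missing score before summarizing the third group; the names g3 and g3obs are arbitrary, and only base MATLAB functions are used.

```matlab
% Same data as above: one column per hair color after transposing
pain = [62 60 71 55 48; 63 57 52 41 43; 42 50 41 37 NaN; 32 39 51 30 35]';

g3 = pain(:, 3);            % third group (light brunette), contains the NaN
mean(g3)                    % returns NaN because of the missing value

g3obs = g3(~isnan(g3));     % keep only the observed scores
numel(g3obs)                % 4 observed scores remain
mean(g3obs)                 % (42 + 50 + 41 + 37)/4 = 42.5
median(g3obs)               % 41.5, halfway between 41 and 42
```
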
Empirical Rule and Outliers

Download the data on 144 contaminated fish specimens from Appendix III. One quantitative variable measured for each specimen is length (in centimeters), which is recorded in the 4th column of the data file. We will do an analysis similar to Example 2.6 in the textbook.

fish = load('fish.dat');
length = fish(:, 4);  % take the 4th column
% note: this variable hides MATLAB's built-in length function until you run 'clear length'

s = std(length);

% check empirical rule, where '&' is a logical operator, standing for 'AND' 
sum((length > mean(length) - s) & (length < mean(length) + s))/144
sum((length > mean(length) - 2*s) & (length < mean(length) + 2*s))/144
sum((length > mean(length) - 3*s) & (length < mean(length) + 3*s))/144
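
% The same proportions can be computed without typing the sample size by
% hand, since the mean of a logical vector is the proportion of true entries.
% A minimal sketch on made-up values (x is hypothetical, not the fish data;
% the names m, sx and within are arbitrary):

```matlab
% Hypothetical toy data (not the fish lengths)
x  = [2 4 4 4 5 5 7 9];
m  = mean(x);               % 5
sx = std(x);                % sqrt(32/7), about 2.14

% Proportion of values within k standard deviations of the mean, k = 1, 2, 3
within = zeros(1, 3);
for k = 1:3
    within(k) = mean(abs(x - m) < k*sx);
end
disp(within)
```

% For mound-shaped data the empirical rule suggests roughly 68%, 95% and
% 99.7%; a toy sample of eight values is not expected to match closely.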

% how to identify outliers
boxplot(length);
qu = prctile(length, 75);
ql = prctile(length, 25);

upper = qu +  1.5*(qu-ql);
lower = ql - 1.5*(qu-ql); 

% number of outliers, where '|' is a logical operator, standing for 'OR'
sum(length > upper | length < lower)
% remove outliers, now you can do analysis on the new data set
newlength = length(length <= upper & length >= lower);
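
The fence calculation can also be written with a single logical mask, which makes it easy to list the flagged values as well as count them. This is a sketch on made-up values (x is hypothetical, not the fish data, and the variable names are arbitrary); like the commands above, it assumes prctile is available.

```matlab
% Hypothetical toy data with one obviously extreme value
x = [1 2 3 4 5 6 7 8 9 50];

q     = prctile(x, [25 75]);      % lower and upper quartiles
iqrx  = q(2) - q(1);              % interquartile range
lower = q(1) - 1.5*iqrx;          % lower fence
upper = q(2) + 1.5*iqrx;          % upper fence

isout  = x < lower | x > upper;   % logical mask, true for outliers
x(isout)                          % the flagged values
sum(isout)                        % number of outliers
xclean = x(~isout);               % data with the outliers removed
```

Keeping the mask in its own variable means the same rule is used for counting, listing, and removing the outliers.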