cpu = load('CPUtime.dat'); % load data

Summary Statistics
n = length(cpu)          % sample size
mean(cpu)
median(cpu)
range(cpu)               % same as max(cpu) - min(cpu)
var(cpu)                 % same as sum((cpu-mean(cpu)).^2)/(n-1)
std(cpu)                 % same as sqrt(var(cpu))
prctile(cpu, 0:25:100)   % five-number summary

Histogram
MATLAB has two histogram commands, "hist" and "histc". Read the help file for each to learn the difference between them.
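As a quick illustration of that difference, here is a minimal sketch on a small made-up vector (the values in x and the edges are hypothetical, chosen only for this example, and are not the CPU time data). "hist" is given a number of classes and spaces them evenly between the smallest and largest value, while "histc" is given the class edges explicitly.

```matlab
% Made-up data, used only to show how the two commands bin values.
x = [0.5 1.2 1.9 2.4 3.1 3.3 4.8];

% hist: ask for a number of classes. MATLAB spreads them evenly between
% min(x) and max(x) and can also return the class centers.
[countsHist, centers] = hist(x, 2);

% histc: supply the class edges yourself. countsHistc(k) counts values with
% edges(k) <= x < edges(k+1); the last entry counts values equal to edges(end).
edges = [0 2 4 6];
countsHistc = histc(x, edges);

disp(countsHist);    % one count per class
disp(countsHistc);   % one count per edge
```

Note that "histc" returns one count per edge rather than one per class, which is why the last entry is usually zero.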
hist(cpu, 7); % draw histogram with 7 classes

% draw histogram with classes defined by "edg"
edg = 0.015:0.7:4.915;
[cnt, bin] = histc(cpu, edg);   % "cnt" rather than "n", so the sample size n is not overwritten
h = bar(edg, cnt/25, 'histc');  % divide the counts by the sample size (25) to get relative frequencies
set(h, 'facecolor', [1 1 1]);   % change the color of the bins

% change axes labels and ticks
set(gca, 'XLim', [0 5]);
set(gca, 'XTick', edg);
xlabel('CPU Time')
ylabel('Relative Frequency');   % use ylabel('Frequency') instead when plotting the raw counts

Time Series Plot
Download the birth rate data: birthrate.dat and save it in your home directory.
y = load('birthrate.dat')
% plot the second column (birthrate) of y against its first column (year);
% plot(x, y, ...) takes the horizontal-axis values first.
% 'k' means black; 'o' means data will be marked by circles
plot(y(:,1), y(:,2), 'ko')
xlabel('Year');
ylabel('Birthrate');

Boxplot
We will draw a boxplot for the hair color data shown in class.
% read data where 'NaN' means missing data
pain = [62 60 71 55 48;
        63 57 52 41 43;
        42 50 41 37 NaN;
        32 39 51 30 35];
pain = pain' % transpose of the original matrix, so that each column is one hair color group
boxplot(pain)
set(gca, 'XTickLabel', {'Light Blonde', 'Dark Blonde', 'Light Brunette', 'Dark Brunette'})
xlabel('Hair Color');
ylabel('Pain threshold score');

Empirical Rule and Outliers
Download the data on 144 contaminated fish specimens from Appendix III. One quantitative variable measured for each specimen is length (in centimeters), which is recorded in the 4th column of the data file. We will do an analysis similar to that of Example 2.6 in the textbook.
fish = load('fish.dat');
len = fish(:, 4); % take the 4th column; named "len" so that the built-in function length() is not shadowed
s = std(len);

% check empirical rule, where '&' is a logical operator, standing for 'AND'
% (144 is the number of specimens)
sum((len > mean(len) - s) & (len < mean(len) + s))/144
sum((len > mean(len) - 2*s) & (len < mean(len) + 2*s))/144
sum((len > mean(len) - 3*s) & (len < mean(len) + 3*s))/144

% how to identify outliers
boxplot(len);
qu = prctile(len, 75);
ql = prctile(len, 25);
upper = qu + 1.5*(qu-ql);
lower = ql - 1.5*(qu-ql);

% number of outliers, where '|' is a logical operator, standing for 'OR'
sum(len > upper | len < lower)

% remove outliers, now you can do analysis on the new data set
newlength = len(len <= upper & len >= lower);
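
The three proportions computed in the empirical rule check can be compared with the rule's usual benchmarks for mound-shaped data: roughly 68%, 95%, and 99.7% of observations within 1, 2, and 3 standard deviations of the mean. The following is a minimal sketch of that comparison on simulated normal data, not on the fish data; the sample size, mean, standard deviation, and random seed are all assumptions made for illustration.

```matlab
% Sketch: check the empirical rule on simulated normal data (not the fish data).
rng(0);                        % fix the seed so the run is repeatable
x = 10 + 2*randn(10000, 1);    % hypothetical sample: mean 10, standard deviation 2
m = mean(x);
s = std(x);

expected = [0.68 0.95 0.997];  % empirical-rule benchmarks for k = 1, 2, 3
observed = zeros(1, 3);
for k = 1:3
    % proportion of observations within k standard deviations of the mean
    observed(k) = sum(abs(x - m) < k*s) / numel(x);
end

disp([expected; observed]);    % first row: benchmarks, second row: simulated proportions
```

Real data such as the fish lengths need not match these benchmarks closely; large gaps suggest the distribution is skewed or has heavy tails.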