[Return to Sta113 Home]

STA113: Lab 4 (Friday, 24 September, 2004)



M-files

In Matlab, strings of instructions may be written in a file and called upon by the name of the file. There are two basic types of these: scripts and functions. A "script" is simply a series of commands written in a file. The filename must end in the extension .m. For example, write a file called makeparabolas.m containing the following lines of commands:

% A script file to demonstrate how m-files work.
% It is always a good idea to begin your .m files
% With a comment about what they do.
x = [-10:0.1:10]';
y = x.^2;
plot(x,y,'r')
hold on
plot(x,-y,'g')
hold off
title('A red parabola and a green parabola')
Once you have saved that file, you can simply type makeparabolas at the Matlab prompt, and it will execute all the lines of code as if you had entered them sequentially. Be sure to save the file in the same directory from which you are running Matlab.

The second basic type of .m file in Matlab is a function. Functions are far more powerful than simple scripts, for three reasons. First, they can take inputs. Second, they can return outputs. Third, the variables created in the function are local, not global, so they won't clutter up your set of variables and you need not worry about re-using variables of the same name. (See help global if you want to learn how to declare variables global.)

A function .m file must begin with the following syntax:

% Comments stating what the function does.
function returnvalue = functionname(input1, input2)
The name of the function should be the same as the name of the file, without the .m extension. There can be as many inputs as you like. Multiple outputs are also possible, but we'll just use a single output for now. (The word output, by the way, is a reserved word in Matlab, so avoid using that one.) Somewhere in the file, the return value variable must be defined. (Alternatively, if the return variable remains undefined, the function will still work but will not return any value.) The function may, in addition to returning an output, also make graphs or serve other purposes.

Practice by writing the following file, calling it parabolavertex.m:

% Draws a parabola for the function f(x)=ax^2+bx+c
% over the range of values from xmin to xmax
% and returns the y-value of the vertex.

function vertex = parabolavertex(a,b,c,xmin,xmax)

vertexx = -b/(2*a);
vertexy = a*vertexx^2+b*vertexx+c;
x = [xmin:0.01:xmax]';
y = a*x.^2+b*x+c;
plot(x,y,'r')
grid on
title('Here is your parabola')
vertex = [vertexx, vertexy];

Be sure to save that file in the directory from which you're running Matlab. Now at the Matlab prompt, type:
myvertex = parabolavertex(1, -5, 4, 0, 6)
who
You should see that the variable vertexy does not exist because that was local to your function. Instead, you now have a variable called myvertex which is the vector of coordinates of the vertex of a parabola defined by y=a^2-5x+4. And you got a picture of the parabola as well.


The following function draws a histogram of a set of data adjusted so that its total area is 1. That makes it useful for comparison with a probability density function. This is not a built-in Matlab function, like the absolute frequency historgam function hist. You may notice in the function that the return variable dummy is never defined. That means that the function cannot return a value.
function dummy = histadjusted(data, numbins)
% relhist(data, numbins) shows the histogram of the data
% with the number of bins specified by "numbins".
%
% The height of the histogram is equal to 
% (relative freq)/(bin width)
%
  n = length(data);
  binwidth = range(data)/numbins;
  edg= min(data):binwidth:max(data);
  [count, bin] = histc(data, edg);
  h = bar(edg, count./(n*binwidth), 'histc');
  set(h, 'facecolor', [0.8 0.8 1]);
  set(h, 'edgecolor', [0.8 0.8 1]);

Copy and paste the function into a text editor and save it as histadjusted.m in the directory where you started Matlab. Now the function histadjusted can be called as any other Matlab functions.


Random Number Generator

You can generate random numbers from each distribution in Matlab. The random number generator command is the name of the distribution plus rnd (for example, unifrnd is the command for generating random numbers from uniform distribution). This function provides a single random number or a matrix of random numbers, depending on the arguments you specify in the function call. To figure out how to specify the arguments, read the help file.

Suppose we want to simualate 10000 draws from a standard normal distribution:

sims = normrnd(0, 1, 10000, 1);
Let's see what we got:
histadjusted(sims,50)
Now let's perform a linear transformation on the simulated data:
simstransformed = 3+4*sims;
histadjusted(simstransformed,50)
Can you see what happened? The new simulants have a distribution centered on 3 (because of the translational shift) with a standard deviation of 4 (because of the "stretching" multiplier). Let's see how this compares to the normal pdf with mean 3 and standard deviation 4:
hold on
x = [-15:0.1:20]';
y = normpdf(x,3,4);
plot(x,y,'r')
hold off
Hopefully, you got a reasonably good fit.


Normal probability plots

As you have seen in your reading (section 4.6), a normal probability plot (sometimes called a "normal quantile plot", just a "normal plot" or a "qq plot") is a graph plotting the actual data against the standard normal quantiles. If the data are from a normal distribution, then the normal probability plot should look roughly linear. The Matlab command for creating this plot is simple:

qqplot(data)
Let's check it out with the simulated data simstransformed that you already created:
qqplot(simstransformed)
Hopefully, that looks very linear. Will normality be preserved if we square the data?
sims2 = simstransformed.^2;
qqplot(sims2)
Not very linear, so the data must not be very normal.
histadjusted(sims2,25)
Confirmed.

You can download data from your text (example 4.29 from p. 193) at this link exp4-29.dat and save it as a file named "exp4-29.dat" in your home directory. You might need to refer back to the first lab to see how to read the data. (This data file has a header line.) Use Matlab to reproduce the normal quantile plot shown in your text.


Check your mastery

These problems do not need to be turned in. They are here for your practice. Some topics not covered in this lab are included. These may be from earlier in the course or they may foreshadow future topics.

  1. Major leage baseball players

  2. Simulated sampling distributions of statistics

    You can simulate n random integers from 1 to J like this:

    sims = ceil(rand(n,1)*J);
    
    The ceil function stands for "ceiling". It rounds up numbers to the next-higher integer. You can sample n items at random from a list of J items like this:
    sample = listname(ceil(rand(n,1)*J));
    
    If you wanted lots (say, 10,000) of samples of size n, you could do it like this:
    samplesmatrix = listname(ceil(rand(n,10000)*J));
    
    There are 10,000 columns in this matrix, each one representing a different sample of size n. The functions mean, median, max, and min all return a row vector of column statistics (such as column means, column medians, etc.) when their argument is a matrix.