programming research matlab cluster ETH Zurich

We all know it: Matlab is not the cleanest programming language, yet many of us use it, mostly because it is quite efficient for prototyping new ideas. Once you have convinced yourself that your new idea works, you will normally want to submit your work to a scientific conference or journal. For this you probably need to run more extensive experiments. In machine learning you might have to average over different seeds or choose the regularization weight via cross-validation. Running the algorithm for all the different configurations can become quite time-consuming. Wouldn't it be great if you could use a cluster for this? This post documents my best practices for running Matlab scripts on a cluster and describes a set of scripts, called matluster, that I have developed to simplify this task.

For this post, let's assume you have developed a new machine learning algorithm. Because the algorithm involves some randomization, you want to evaluate it on several datasets with different seeds. I generally split the workflow into three Matlab scripts: prepare.m, main.m and collect.m. The filenames should be fairly self-explanatory.

The preparation script generates a shell script that submits a job for each of the different configurations.

prepare.m
function prepare()
 
addpath('matluster');
 
run_idx = 0;
 
% initialization
if (~exist('./local', 'dir'))
    mkdir('local');
end
fid = fopen('submit.sh', 'w');
 
% formatting
format = [];
format.dataset = '%s';
format.algorithm = '%s';
format.seed = '%d';
 
% reporting
reporting = [];
reporting.groupby = {'dataset'};
 
% init options
options = [];
options.format = format;
options.reporting = reporting;
 
% configurations
datasets = {'ds1', 'ds2', 'ds3'};
algorithms = {'algo1','algo2','algo3'};
seeds = [1, 7, 13, 31, 37, 41, 43, 47, 53, 59];
for dataset_idx=1:numel(datasets)
    options.dataset = datasets{dataset_idx};
    
    for algorithm_idx=1:numel(algorithms)
        options.algorithm = algorithms{algorithm_idx};
        for seed=seeds
            options.seed = seed;
 
            filename = sprintf('local/options_%d.mat', run_idx);
            save(filename, 'options');
            timelimit = '08:00';
            matluster_addJobToQueue(fid, options, run_idx, './run_main.sh /cluster/apps/matlab/7.14/', timelimit);
        
            run_idx = run_idx+1;
        end
    end
end
 
num_runs = run_idx;
save('local/num_runs.mat', 'num_runs');
 
fclose(fid);
unix('chmod +x submit.sh');
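
To make the result of prepare.m more concrete, the following dry run mirrors its three nested loops in plain shell and prints one submission line per configuration instead of submitting anything. It is only a sketch: the bsub -W syntax assumes an LSF-style queuing system, and the line that matluster_addJobToQueue actually writes to submit.sh may look different.

```shell
#!/bin/sh
# Dry run: prints one hypothetical submission line per configuration.
# Assumption: an LSF-style "bsub -W <hh:mm>" submit command; the real
# output of matluster_addJobToQueue is not shown in this post.
datasets="ds1 ds2 ds3"
algorithms="algo1 algo2 algo3"
seeds="1 7 13 31 37 41 43 47 53 59"
timelimit="08:00"

run_idx=0
for dataset in $datasets; do
    for algorithm in $algorithms; do
        for seed in $seeds; do
            # The job is identified by run_idx only; main.m loads dataset,
            # algorithm and seed from local/options_<run_idx>.mat.
            echo "bsub -W $timelimit ./run_main.sh /cluster/apps/matlab/7.14/ $run_idx"
            run_idx=$((run_idx + 1))
        done
    done
done
echo "# $run_idx runs in total"
```

With the lists above, the dry run ends at 90 runs (3 datasets x 3 algorithms x 10 seeds), indexed 0 to 89. Each job is identified by its run index alone, and all other settings travel through the saved options file.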

The main script calls the machine learning algorithm and performs the evaluation, e.g. computes a test error.

main.m
function main(run_idx)
 
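% the compiled executable receives run_idx as a string; convert it if needed
if ischar(run_idx)
    run_idx = str2double(run_idx);
end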
 
% load the options
load(sprintf('local/options_%d.mat', run_idx));
 
if (~isdeployed())
    addpath('matluster');
end
 
testerr = runalgo(options.dataset, options.algorithm, options.seed);
 
result = [];
result.err = testerr;
 
% save all the results
if (~exist('./output', 'dir'))
    mkdir('output');
end
conf_str = matluster_generateStringFromOptions(options);
filename = sprintf('output/%s.mat', conf_str);
save(filename, 'result');

The collection script loads all the intermediate results and compiles them into a summary and/or generates plots.

collect.m
function collect()
 
% for plotting
addpath('matluster');
 
% collect results
load('local/num_runs.mat');
 
collection = matluster_collect(num_runs);
 
for idx=1:numel(collection)
    collectDataset(collection{idx});
end
 
 
function collectDataset(collection)
 
% get the results into a table
tab_err = matluster_reshapeResults(collection, 'err');
 
% shows a table with algorithm and seed as the different dimensions
tab_err
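
Jobs on a cluster can fail or run into the time limit, so before running collect.m it can be worth checking that every run has written its result file. The helper below is a hypothetical addition, not part of matluster. It assumes that each run saves exactly one uniquely named .mat file into output/, as main.m does above, and that you pass in the expected number of runs yourself.

```shell
#!/bin/sh
# check_results EXPECTED OUTDIR
# Hypothetical helper (not part of matluster): counts the .mat files in
# OUTDIR and compares the count with the expected number of runs.
check_results() {
    expected=$1
    outdir=$2
    found=0
    for f in "$outdir"/*.mat; do
        # If nothing matches, the pattern stays literal and -e is false.
        if [ -e "$f" ]; then
            found=$((found + 1))
        fi
    done
    if [ "$found" -eq "$expected" ]; then
        echo "all $expected results present"
        return 0
    fi
    echo "only $found of $expected results present"
    return 1
}
```

For the configuration in prepare.m the call would be check_results 90 output. A non-zero exit status signals that some runs are still missing and should be resubmitted before collecting.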

I always compile the main script, which explains the strange call via ./run_main.sh. I use variations of the brutus_compile.sh script below to perform the compilation. Here we assume that ${TMPDIR} is set by the queuing system and points to a local scratch folder.

brutus_compile.sh
mcc -R -singleCompThread -R -nodisplay -R -nojvm -m main.m -a matluster
sed -i '7i\export MCR_CACHE_ROOT=${TMPDIR};' run_main.sh
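
The sed line is easy to misread, so here is a small sketch of what it does, run against a stand-in file rather than the real wrapper (the contents of the run_main.sh that mcc generates are not reproduced here). It assumes GNU sed, which accepts the one-line form of the i command together with -i.

```shell
#!/bin/sh
# Stand-in for the generated wrapper: eight numbered placeholder lines.
tmp=$(mktemp -d)
wrapper="$tmp/run_main.sh"
for i in 1 2 3 4 5 6 7 8; do
    echo "# original line $i"
done > "$wrapper"

# Same command as in brutus_compile.sh: insert the text before line 7.
# The single quotes keep ${TMPDIR} literal, so it is expanded only when
# the wrapper runs on the compute node, not at compile time.
sed -i '7i\export MCR_CACHE_ROOT=${TMPDIR};' "$wrapper"

# Show the neighbourhood of the inserted line.
sed -n '6,8p' "$wrapper"
```

The three printed lines are the original line 6, the new export line, and the original line 7, which has moved down to line 8. Whether line 7 is the right position depends on the wrapper that your mcc version generates, so it is worth checking the file once after compiling.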

The example code here might also help to understand the matluster scripts a bit better.




