Posts Tagged ‘Matlab’

Linear indexing and hash tables in MATLAB

Monday, October 10th, 2011

Let’s consider the following database table:

Date Id Amount
29.10.2011 1001 10.2
30.10.2011 1001 15.6
29.10.2011 1002 17.5
30.10.2011 1002 12.2
31.10.2011 1002 7.9
1.11.2011 1001 19.3
1.11.2011 1002 17.9

Suppose that from this table we want to generate the following MATLAB matrix storing amount information.

10.2 15.6 NaN 19.3
17.5 12.2 7.9 17.9

To do that we need to map each date and ID with an index. We use containers.Map objects. See Matlab Map Containers page to learn more.

1001 1
1002 2
29.10.2011 1
30.10.2011 2
31.10.2011 3
1.11.2011 4

Next step is to fill matrix Amount. Consider a m x n matrix A. Suppose we want to use matrix B = (1,1;1 2) for selecting elements of A. Unfortunetely A(B(:,1), B(:,2)) will not work. We should use linear indexing. See Matrix indexing to learn more.

MATLAB code:

T1 = ['29.10.2011'; '30.10.2011'; '29.10.2011'; '30.10.2011'; '31.10.2011'; '1.11.2011'; '1.11.2011'];
T2 = [1001; 1001; 1002; 1002; 1002; 1001; 1002];
T3 = [10.2; 15.6; 17.5; 12.2; 7.9; 19.3; 17;9 ];

keySet = {'29.10.2011', '30.10.2011', '31.10.2011', '01.11.2011'};
valueSet = 1:4;

date_index = containers.Map(keySet,valueSet);

keySet = {1001, 1002};
valueSet = 1:2;

id_index = containers.Map(keySet,valueSet);

Amount = ones(2,4)*NaN;

Amount(sub2ind(size(Amount), values(date_index,T1), values(id_index,T2) )  ) = T3

You can also index dates by using Matlab’s datenum function. This function returns how many days is passed from a reference point (1-Jan-0000). Suppose D matrix contains a list of dates then the following code will also work:

D = {'29.10.2011', '30.10.2011', '29.10.2011', '30.10.2011', '31.10.2011', '01.11.2011'}
datenum(D) - min(datenum(D))

Tags:
Posted in General | No Comments »

A matlab implementation of Havel-Hakimi algorithm

Saturday, June 11th, 2011

In graph theory, degree of a given vertex is the number of vertices which are neighbour to this vertex. Let (d_0, d_1, \ldots , d_n) be a degree sequence then we can check whether it is possible to construct a graph such that the degrees of vertices follows this degree sequence. We call such a degree sequence graphical. The first simple check is to sum all the degree values. This number should even by hand-shakking lemma. This is a necessary condition but not sufficient. For example it is impossible to construct a graph with a degree sequence [3 3 3 1].

Erdős–Gallai theorem gives us necessary and sufficient conditions for this problem.

the sum of the sequence is even and
\sum_{i=1}^{k}d_i \leq k(k-1) + \sum_{i=k+1}^n  \min(d_i,k) \quad \text{for } k \in \{1,\dots,n\}

Suppose the degree sequence is graphical then the following simple procedure construct a possible graph implementation from this given degree sequence.

Pseudocode: Havel-Hakimi

Matlab Code: Havel-Hakimi

function E = mcmc_graph(degree_sequence)

n = size(degree_sequence,2);
% Check if the given degree sequence is graphical
degree_sequence_feasible = 1;
if mod(sum(degree_sequence),2)
    degree_sequence_feasible = 0;
else
    sorted_degree_sequence = sort(degree_sequence,'descend');
    for k=1:n
        temp_sum = 0;
        for i=k+1:n
            temp_sum = temp_sum + min(sorted_degree_sequence(i),k);
        end
        if (sum( sorted_degree_sequence(1:k)) > k*(k-1) + temp_sum )
            degree_sequence_feasible = 0;
        end
    end
end

m = sum(degree_sequence)/2;
E = zeros(m,2);

% Construct an initial graph with Havel-Hakini
if degree_sequence_feasible
    ds = degree_sequence;
    i = 0;
    while sum(ds) > 0
        [residual_degree vertices] = sort(ds,'descend');
        selected_vertices = vertices(2:residual_degree(1)+1);
        E(i+1:i+residual_degree(1),:) = ...
            [repmat(vertices(1),residual_degree(1)) selected_vertices'];
        i = i + residual_degree(1);
        ds(vertices(1)) = ds(vertices(1)) - residual_degree(1);
        ds(selected_vertices) = ds(selected_vertices) - 1;
    end

Tags: ,
Posted in General | No Comments »

Latex Template for Assignments/Homeworks

Saturday, March 13th, 2010

I am not using MS Word since I started my master degree; all of the documents that I produced is done in LaTeX. If you are also using LaTeX for your homeworks or you will start writing your thesis soon, I recommend you to visit Ted’s latex template page. Probably the most exciting feature of this template is easiness of including matlab scripts, plus it looks fancy!

Tags: , ,
Posted in PHD | No Comments »

How to deal with missing values in data analysis

Friday, February 19th, 2010

Missing values in a data set can be occured from several reasons:

One can try to ignore those missing entries however this approach may be misleading especially for small data sets. Sample containing missing values also hold some value and the information can be easily recovered by imputation. The following site contains useful resources, tutorials and links. In this study, I tried to compare different techniques for the imputation process, corresponding matlab files.
Basically there are three methods:

1. Random Fill

In this approach missing fiels are filled with a number generated according to a probability distribition and the parameters of this distribition can be determined with the rest of that column which are available.

2. Regression

One can also think filling missing value problem as a regression problem. In this case, column containing missing values should be considered as a output values and other columns are dependant variables. Missing values can be estimated with this regression model.

3. Multiple Imputation

In this modern approach, not only one filled tableau is generated but several and then we take the average of them. If we knew the missing values, then estimating the model parameters would be
straightforward. Similarly, if we knew parameters of the data model, then it would be
possible to obtain unbiased prediction for the missing values. An iterative method can
be used: first predict the missing values based on assumed values for the parameters,
use these predictions to update the parameter estimates, and repeat. The sequence
of parameters converges to maximum-likelihood estimates.

Tags: , ,
Posted in Optimization | 2 Comments »

Traveling Salesman Problem

Friday, February 12th, 2010

Traveling Salesman Problem also known as TSP can be defined as follows: Given a number of cities and the costs of travelling from any city to any other city, what is the least-cost round-trip route that visits each city exactly once and then returns to the starting city. This is a good book about TSP. You can find TPS instances in this site. These instances can be used in order to benchmark algorithms. In this report some contructive heuristic methods (nearest neighbor, arbitrary insertion, farthest insertion) as well as a improvement heuristic (2-opt) are compared. Corresponding matlab files are also available.

Tags: , ,
Posted in Optimization | No Comments »