Tutorial: How to read a file using textscan?


I have a large tab-delimited file (10000 rows, 15000 columns) and would like to import it into MATLAB.

I've tried to import it using the textscan function as follows:

function [C_text, C_data] = ReadDataFile(filename, header, attributesCount, delimiter, attributeFormats, attributeFormatCount)
    AttributeTypes = SetAttributeTypeMatrix(attributeFormats, attributeFormatCount);
    fid = fopen(filename);
    if (header == 1)
        %read column headers
        C_text = textscan(fid, '%s', attributesCount, 'delimiter', delimiter);
        C_data = textscan(fid, AttributeTypes{1, 1}, 'headerlines', 1);
    else
        C_text = '';
        C_data = textscan(fid, AttributeTypes{1, 1});
    end
    fclose(fid);
end

AttributeTypes{1, 1} is a string which describes the variable type of each column. In this case there are 14740 float and 260 string variables, so the value of AttributeTypes{1, 1} is '%f%f...%f%s%s...%s', where %f is repeated 14740 times and %s 260 times.
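As an aside, a format string this long does not have to be typed out; it can be built with repmat. A minimal sketch, assuming (as described above) that all numeric columns come first and all string columns last; the names nFloat, nString and fmt are illustrative, not from the original code:

```matlab
% Column counts taken from the question: 14740 numeric, 260 text columns
nFloat  = 14740;
nString = 260;

% Build '%f%f...%f%s...%s' programmatically instead of by hand
fmt = [repmat('%f', 1, nFloat), repmat('%s', 1, nString)];

% Every specifier is two characters long, so this should equal
% nFloat + nString, i.e. the number of columns in the file
nColumns = numel(fmt) / 2;
```

Checking nColumns against the real column count is a cheap way to catch a format string that is one specifier too long or too short.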

When I try to execute

>> [header, data] = ReadDataFile('data/orange_large_train.data.chunk1', 1, 15000, '\t', types, size);  

The header array seems to be correct (the column names have been read correctly).

data is a 1 x 15000 cell array (it looks as if only the first row has been imported instead of all 10000), and I don't know what is causing this behavior.

I suspect the problem is in this line:

C_data = textscan(fid, AttributeTypes{1, 1});  

but I don't know what could be wrong, because a similar example is described in the help reference.

I would be very grateful if anyone could suggest a fix: how can I read all 10000 rows?


I believe all your data are there. textscan returns one cell per column, so if you look inside data, every cell should contain a whole column (10000x1). You can extract the i-th column as an array with data{i}.
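A minimal, self-contained sketch of the same behavior on a tiny file. It writes a hypothetical three-column file (two numeric columns, one text column) to a temporary location, reads the header row with fgetl and strsplit (a swapped-in approach, not the asker's first textscan call), and then reads every remaining row with a single textscan call. The file layout and variable names are made up for illustration:

```matlab
% Write a tiny tab-delimited file: one header row and three data rows
filename = [tempname, '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'a\tb\tlabel\n');
fprintf(fid, '1.5\t2\tx\n');
fprintf(fid, '3\t4\ty\n');
fprintf(fid, '5\t6.25\tz\n');
fclose(fid);

fid = fopen(filename, 'r');
% Read the whole header line and split it on tab characters
headerLine = fgetl(fid);
names = strsplit(headerLine, sprintf('\t'));
% Read all remaining rows in one call, passing the delimiter explicitly
data = textscan(fid, '%f%f%s', 'Delimiter', '\t');
fclose(fid);
delete(filename);

% data is a 1x3 cell array: one cell per column, each holding every row
numRows = numel(data{1});
```

Here size(data) is 1x3 even though three rows were read; the row count only shows up inside each cell, which is the same pattern as the 1 x 15000 result in the question.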

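You would probably want to separate the double and string data. I don't know what attributeFormats contains; you can probably use that array. But you can also use AttributeTypes{1, 1}: since every specifier in it is two characters long (%f or %s), every second character gives the type of one column.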

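% (2:2:end) picks the type letter of each two-character specifier
isdouble = strfind(AttributeTypes{1, 1}(2:2:end), 'f');   % indices of the %f columns
data_double = cell2mat(data(isdouble));                    % 10000 x 14740 numeric matrix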

To combine string data into one cell array of strings you can do:

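% named isStringCol rather than isstring to avoid shadowing the built-in isstring function
isStringCol = strfind(AttributeTypes{1, 1}(2:2:end), 's');   % indices of the %s columns
data_string = horzcat(data{isStringCol});                     % 10000 x 260 cell array of strings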
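The splitting step can be tried on a small in-memory example. This sketch uses a hypothetical four-column format with numeric and text columns interleaved, and selects columns with find on a logical comparison instead of strfind (both give the same indices for single-character patterns). It still assumes every specifier is exactly two characters long:

```matlab
% sprintf turns \t and \n into real tab and newline characters
txt = sprintf('1\tx\t2\tp\n3\ty\t4\tq\n');
fmt = '%f%s%f%s';   % hypothetical layout: numeric and text columns interleaved
data = textscan(txt, fmt, 'Delimiter', '\t');

% One type letter per column: 'fsfs'
typeChars = fmt(2:2:end);
dblIdx = find(typeChars == 'f');   % numeric column indices
strIdx = find(typeChars == 's');   % text column indices

dataDouble = cell2mat(data(dblIdx));   % numeric columns side by side
dataString = horzcat(data{strIdx});    % text columns as one cell array
```

dataDouble ends up as the 2x2 matrix [1 2; 3 4] and dataString as the 2x2 cell array {'x','p'; 'y','q'}, mirroring the 10000 x 14740 and 10000 x 260 results for the full file.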
