Tutorial: Large data store (NoSQL or not)


I have a large amount of scientific data that I need to store (150 TB+ of starting data), and I want to know the best way to store it (NoSQL, RDBMS, etc.).

Any tips?



There are specialized databases for scientific data: http://www.dbms2.com/2009/09/12/xldb-scid/


To choose between NoSQL and an RDBMS, answer this question: "Is my data structured in relationships?"


This really depends on what you need to do with the data later. If the data is a collection of a few very large files, then a normal file system would be fine. If you need to be able to search and analyse the data, then a database might be the best solution.

I also work with large datasets in a scientific environment. Most of this data is tabular, and when we started we stored every data point as a row in a database table. In the end we found it much easier to zip each table and store it as a binary blob in the database. In a separate table we stored the metadata about these tables.
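The blob-plus-metadata layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual setup from the answer: it uses SQLite and zlib from the Python standard library in place of whatever database and zip tool were really used, serializes each table as CSV, and the table names (table_blobs, table_metadata) and function names are made up for the example.

```python
import csv
import io
import sqlite3
import zlib


def create_schema(conn):
    """Create one table for compressed payloads and one for their metadata.

    Table and column names here are hypothetical, chosen for illustration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_blobs ("
        "id INTEGER PRIMARY KEY, payload BLOB NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_metadata ("
        "blob_id INTEGER NOT NULL REFERENCES table_blobs(id), "
        "name TEXT NOT NULL UNIQUE, "
        "n_rows INTEGER NOT NULL, "
        "columns TEXT NOT NULL)"
    )


def store_table(conn, name, header, rows):
    """Serialize a table to CSV, compress it, and store it as a single blob."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    payload = zlib.compress(buf.getvalue().encode("utf-8"))

    cur = conn.execute(
        "INSERT INTO table_blobs (payload) VALUES (?)", (payload,)
    )
    blob_id = cur.lastrowid
    # Searchable facts about the table live outside the blob.
    conn.execute(
        "INSERT INTO table_metadata (blob_id, name, n_rows, columns) "
        "VALUES (?, ?, ?, ?)",
        (blob_id, name, len(rows), ",".join(header)),
    )
    conn.commit()
    return blob_id


def load_table(conn, name):
    """Look a table up by its metadata name and decompress its contents."""
    row = conn.execute(
        "SELECT b.payload FROM table_blobs AS b "
        "JOIN table_metadata AS m ON m.blob_id = b.id "
        "WHERE m.name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    text = zlib.decompress(row[0]).decode("utf-8")
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # CSV does not keep types, so every value comes back as a string.
    return header, list(reader)
```

The trade-off is that individual data points can no longer be queried with SQL; only the metadata table is searchable, and a table has to be decompressed as a whole before it can be analysed.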


Does it have to be one database type? Part of the idea behind NoSQL is that one size does not fit all, so why not use two or more NoSQL stores? How about one column store and one graph database?


You should look at NetCDF and HDF5. Also, consider using PyTables for accessing and extracting the data.
