Tutorial :How to store data with dynamic number of attributes in a database



Question:

I have a number of different objects with a varying number of attributes. Until now I have saved the data in XML files which easily allow for an ever changing number of attributes. But I am trying to move it to a database.

What would be your preferred way to store this data?

A few strategies I have identified so far:

  • Having one single field named "attributes" in the object's table and store the data serialized or json'ed in there.
  • Storing the data in two tables (objects, attributes) and using a third to save the relations, making it a true n:m relation. Very clean solution, but possibly very expensive to fetch an entire object and all its attributes
  • Identifying attributes all objects have in common and creating fields for these to the object's table. Store the remaining attributes as serialized data in another field. This has an advantage over the first strategy, making searches easier.

Any ideas?


Solution:1

If you ever plan on searching for specific attribtes, it's a bad idea to serialize them into a single column, since you'll have to use per-row functions to get the information out - this never scales well.

I would opt for your second choice. Have a list of attributes in an attribute table, the objects in their own table, and a many-to-many relationship table called object attributes.

For example:

objects:      object_id    integer      object_name  varchar(20)      primary key  (object_id)  attributes:      attr_id      integer      attr_name    varchar(20)      primary key  (attr_id)  object_attributes:      object_id    integer  references (objects.object_id)      attr_id      integer  references (attributes.attr_id)      primary key (object_id,attr_id)  

Your concern about performance is noted but, in my experience, it's always more costly to split a column than to combine multiple columns. If it turns out that there are performance problems, it's perfectly acceptable to break 3NF for performance reasons.

In that case I would store it the same way but also have a column with the raw serialized data. Provided you use insert/update triggers to keep the columnar and combined data in sync, you won't have any problems. But you shouldn't worry about that until an actual problem surfaces.

By using those triggers, you minimize the work required to only when the data changes. By trying to extract sub-column information, you do unnecessary work on every select.


Solution:2

A variation on your 2d solution is just two tables (assuming all attributes are of a single type):

T1: |Object data columns|Object_id|

T2: |Object id|attribute_name|attribute value| (unique index on first 2 columns)

This is even more efficient when combined with 3rd solution, e.g. all of the common fields go into T1.

Sstuffing >1 attribute into the same blob is no recommended - you can not filter by attributes, you can not efficiently update them


Solution:3

Let me give some concreteness to what DVK was saying.

Assuming values are of same type the table would look like (good luck, I feel you're going to need it):

  dynamic_attribute_table  ------------------------  id         NUMBER  key        VARCHAR  value      SOMETYPE?  

example (cars):

  |id|    key   |   value   |  ---------------------------  | 1|'Make'    |'Ford'     |  | 1|'Model'   |'Edge'     |  | 1|'Color'   |'Blue'     |  | 2|'Make'    |'Chevrolet'|  | 2|'Model'   |'Malibu'   |  | 2|'MaxSpeed'|'110mph'   |  

Thus,
entity 1 = { ('Make', 'Ford'), ('Model', 'Edge'), ('Color', 'Blue') }
and,
entity 2 = { ('Make', 'Chevrolet'), ('Model', 'Malibu'), ('MaxSpeed', '110mph') }.


Solution:4

I used to implement this scheme:

t_class (id RAW(16), parent RAW(16)) -- holds class hierachy.  t_property (class RAW(16), property VARCHAR) -- holds class members.  t_declaration (id RAW(16), class RAW(16)) -- hold GUIDs and types of all class instances  t_instance (id RAW(16), class RAW(16), property VARCHAR2(100), textvalue VARCHAR2(200), intvalue INT, doublevalue DOUBLE, datevalue DATE) -- holds 'common' properties    t_class1 (id RAW(16), amount DOUBLE, source RAW(16), destination RAW(16)) -- holds 'fast' properties for class1.  t_class2 (id RAW(16), comment VARCHAR2(200)) -- holds 'fast' properties for class2  --- etc.  

RAW(16) is where Oracle holds GUIDs

If you want to select all properties for an object, you issue:

SELECT  i.*  FROM    (          SELECT  id           FROM    t_class          START WITH                  id = (SELECT class FROM t_declaration WHERE id = :object_id)          CONNECT BY                  parent = PRIOR id          ) c  JOIN    property p  ON      p.class = c.id  LEFT JOIN          t_instance i  ON      i.id = :object_id          AND i.class = p.class          AND i.property = p.property  

t_property hold stuff you normally don't search on (like, text descriptions etc.)

Fast properties are in fact normal tables you have in the database, to make the queries efficient. They hold values only for the instances of a certain class or its descendants. This is to avoid extra joins.

You don't have to use fast tables and limit all your data to these four tables.


Solution:5

sounds like you need something lick couchdb, not an RDBMS.


Solution:6

if you are going to edit/manipulate/delete the attributes in later point, making a true n:m (second option) will be the one which I go for. (Or try to make it 2 table where the same attribute repeats.But data size will be high)

If you are not dealing with attributes(just capturing and showing the data) then you can go ahead and store in one field with some separator(Make sure the separator wont occur in the attribute value)


Solution:7

If you are using a relational db, then I think you did a good job listing the options. They each have their pros and cons. YOU are in the best position to decide what works best for your circumstances.

The serialized approach is probably the fastest (depending on your code for de-serializing), but it means that you won't be able to query the data with SQL. If you say that you don't need to query the data with SQL, then I agree with @longneck, maybe you should use a key/value style db instead of a relational db.

EDIT - reading more of your comments, WHY are you switching to a db if speed is your main concern. What's BAD about your current XML implementation?


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »