Tutorial: How to recover from broken files?



Question:

My colleague and I are trying to implement a mechanism to recover from broken files on an embedded device.

This can happen under certain circumstances, e.g. the user removes the battery while a file is being written.

Unfortunately, so far we have only one idea:

  • Create duplicate backup files, and copy them back if a dangerous file I/O operation did not finish properly.

This is kind of crude: if the backup files are also broken, we are simply stuck.

Do you have any suggestions or good articles on this?

Thanks in advance.


Solution 1:

It depends on the OS, etc., but in most cases what you can do is copy to a temporary file name and, as the final step, rename the file to the correct name.
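On a POSIX-like system, the write-then-rename step might look like the following minimal sketch (write_file_safely() and the ".tmp" naming are illustrative, not part of the answer):

    /* Minimal sketch, assuming POSIX. Write the new contents to a
     * temporary name, force them to the medium, then rename() --
     * which atomically replaces the target -- so readers only ever
     * see the complete old file or the complete new one. */
    #include <stdio.h>
    #include <unistd.h>

    int write_file_safely(const char *path, const void *data, size_t len)
    {
        char tmp[256];
        snprintf(tmp, sizeof tmp, "%s.tmp", path);

        FILE *f = fopen(tmp, "wb");
        if (!f)
            return -1;

        if (fwrite(data, 1, len, f) != len ||
            fflush(f) != 0 ||
            fsync(fileno(f)) != 0) {    /* force data to the medium */
            fclose(f);
            remove(tmp);
            return -1;
        }
        fclose(f);

        if (rename(tmp, path) != 0) {   /* the atomic step */
            remove(tmp);
            return -1;
        }
        return 0;
    }

For full durability you may also need to fsync() the containing directory, so that the rename itself survives a power cut.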

This means the WOOPS (Window Of Opportunity for Potential S****p) is confined to the interval during which the rename takes place.

If the OS supports a proper directory structure and you lay out the files intelligently, you can refine this further: copy the new files to a temp directory and rename the directory, so that the WOOPS becomes the interval between "rename target to save" and "rename temp to target".
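A sketch of that double rename, with hypothetical directory names "target", "save", and "temp" (the new files have already been written into "temp"):

    /* The WOOPS is exactly the gap between the two rename() calls:
     * if power fails there, "target" is missing, but "save" still
     * holds the old copy, so startup code can rename it back. */
    #include <stdio.h>

    int swap_directories(void)
    {
        if (rename("target", "save") != 0)    /* move old copy aside */
            return -1;
        if (rename("temp", "target") != 0) {  /* move new copy in place */
            rename("save", "target");         /* roll back on failure */
            return -1;
        }
        /* success: the old copy stays in "save" until you delete it */
        return 0;
    }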

This gets even better if the OS supports symbolic links to directories: then you can "ln -s target temp". On most OSes, replacing a softlink is an "atomic" operation that either works or doesn't, without any messy halfway states.
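A sketch of that swap on POSIX, assuming a hypothetical live link named "current": because rename() atomically replaces the destination, readers always see a complete directory, either the old version or the new one.

    #include <stdio.h>
    #include <unistd.h>

    int publish_version(const char *new_dir)
    {
        /* build the new link under a temporary name first... */
        remove("current.tmp");                /* clear any leftover */
        if (symlink(new_dir, "current.tmp") != 0)
            return -1;

        /* ...then atomically move it over the live link */
        if (rename("current.tmp", "current") != 0) {
            remove("current.tmp");
            return -1;
        }
        return 0;
    }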

All these options depend on having enough storage to keep a complete old and new copy on the file system.


Solution 2:

Read up on database logging and database journal files.

A database (like Oracle) has very, very robust file writing. Do not actually use Oracle; use its design pattern. You can borrow these ideas without using the actual product. The pattern goes something like this:

  1. Your transaction (e.g., an INSERT) fetches the block to be updated. Usually this is in the memory cache; if not, it is read from disk into the cache.

  2. A "before image" (or rollback segment) copy is made of the block you're about to write.

  3. You change the cache copy, write a journal entry, and queue up a DB write.

  4. You commit the change, which makes the cache change visible to other transactions.

  5. At some point, the DB writer will finalize the DB file change (the ordering of these steps is sketched below).
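As a rough sketch of that ordering -- every helper here (cache_fetch(), save_before_image(), apply_change(), mark_committed(), queue_lazy_write(), journal_fd, next_seq()) is hypothetical, and journal_append() is fleshed out in the next sketch:

    /* The key invariant: the journal entry is forced to disk before
     * the commit is acknowledged; the DB-file write itself can
     * happen lazily afterwards. */
    void update_block(uint32_t block_no, const uint8_t *new_data)
    {
        struct db_block *b = cache_fetch(block_no);   /* step 1 */
        save_before_image(b);                         /* step 2 */
        apply_change(b, new_data);                    /* step 3: cache */
        journal_append(journal_fd, next_seq(),
                       block_no, b->payload);         /* step 3: journal */
        mark_committed(b);                            /* step 4 */
        queue_lazy_write(block_no);                   /* step 5 */
    }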

The journal is a simple circular queue file -- the records are just a history of changes with little structure to them. It can be replicated on multiple devices.
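To make this concrete, here is a minimal sketch of such a journal record and its append path (the layout, field names, and checksum are illustrative, not Oracle's actual format):

    /* Each record is self-describing: a sequence number, the block
     * it modifies, the new contents, and a checksum so a record that
     * was only half-written at power loss is detectable. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK_SIZE 512

    struct journal_record {
        uint32_t seq;               /* transaction number */
        uint32_t block_no;          /* which DB-file block this updates */
        uint8_t  data[BLOCK_SIZE];  /* new block contents */
        uint32_t checksum;          /* over all preceding fields */
    };

    static uint32_t checksum32(const void *p, size_t n)
    {
        /* trivial additive checksum; a real system would use a CRC */
        const uint8_t *b = p;
        uint32_t sum = 0;
        while (n--)
            sum = sum * 31 + *b++;
        return sum;
    }

    int journal_append(int fd, uint32_t seq, uint32_t block_no,
                       const uint8_t data[BLOCK_SIZE])
    {
        struct journal_record r;
        r.seq = seq;
        r.block_no = block_no;
        memcpy(r.data, data, BLOCK_SIZE);
        r.checksum = checksum32(&r, offsetof(struct journal_record, checksum));

        /* append the record and force it to the medium before the
         * transaction is allowed to commit */
        if (write(fd, &r, sizeof r) != (ssize_t)sizeof r)
            return -1;
        return fsync(fd);
    }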

The DB files are more complex structures. They have a "transaction number" -- a simple sequential count of overall transactions. This is encoded in the block (two different ways) as well as written to the control file.
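The "two different ways" idea can be sketched roughly like this (a hypothetical block layout, not Oracle's): the same transaction number is stored at both the head and the tail of each block, so a write torn in the middle shows up as a mismatch.

    #include <stdbool.h>
    #include <stdint.h>

    #define BLOCK_SIZE 512
    #define PAYLOAD (BLOCK_SIZE - 2 * sizeof(uint32_t))

    struct db_block {
        uint32_t seq_head;         /* transaction number, copy 1 */
        uint8_t  payload[PAYLOAD];
        uint32_t seq_tail;         /* transaction number, copy 2 */
    };

    bool block_is_consistent(const struct db_block *b)
    {
        /* a partially written block fails this comparison */
        return b->seq_head == b->seq_tail;
    }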

A good DBA assures that the control file is replicated across devices.

When Oracle starts up, it checks the control file(s) to find which one is likely to be correct. Others may be corrupted. Oracle checks the DB files to see which match the control file. It checks the journal to see if transactions need to be applied to get the files up to the correct transaction number.

Of course, if it crashes while writing all of the journal copies, that transaction will be lost -- not much can be done about that. However, if it crashes after the journal entry is written, it will probably recover cleanly with no problems.

If you lose the media and restore a backup, there's a chance that the current journal file can be applied to the restored backup file to bring it up to date. Otherwise, older journal files have to be replayed to do so.
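Putting the pieces together, a recovery pass over the journal might look like this sketch (reusing the structs above; journal_read_next(), block_read(), apply_record_to_block(), and block_write() are hypothetical helpers):

    /* Replay every journal record whose sequence number is newer
     * than the block it targets, and repair any torn block. */
    void recover(void)
    {
        struct journal_record r;
        while (journal_read_next(&r)) {      /* stops at a bad checksum */
            struct db_block b;
            block_read(r.block_no, &b);
            if (!block_is_consistent(&b) || b.seq_head < r.seq) {
                apply_record_to_block(&r, &b);
                block_write(r.block_no, &b);
            }
        }
    }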

