Tutorial: Paranoia, excessive logging and exception handling in simple scripts dealing with files. Is this normal?



Question:

I find myself using Python for a lot of file-management scripts like the one below. While looking for examples on the net, I am surprised by how little logging and exception handling is featured in them. Every time I write a new script my intention is not to end up with something like the one below, but if it deals with files then, no matter what, my paranoia takes over and the end result is nothing like the examples I see on the net. As I am a newbie, I would like to know if this is normal or not. If not, how do you deal with the unknowns and the fear of deleting valuable info?

def flatten_dir(dirname):
    '''Flattens a given root directory by moving all files from its sub-directories and nested
    sub-directories into the root directory and then deletes all sub-directories and nested
    sub-directories. Creates a backup directory preserving the original structure of the root
    directory and restores this in case of errors.
    '''
    RESTORE_BACKUP = False
    log.info('processing directory "%s"' % dirname)
    backup_dirname = str(uuid.uuid4())
    try:
        shutil.copytree(dirname, backup_dirname)
        log.debug('directory "%s" backed up as directory "%s"' % (dirname, backup_dirname))
    except shutil.Error:
        log.error('shutil.Error: Error while trying to back up the directory')
        sys.stderr.write('the program is terminating with an error\n')
        sys.stderr.write('please consult the log file\n')
        sys.stderr.flush()
        time.sleep(0.25)
        print 'Press any key to quit this program.'
        msvcrt.getch()
        sys.exit()

    for root, dirs, files in os.walk(dirname, topdown=False):
        log.debug('os.walk passing: (%s, %s, %s)' % (root, dirs, files))
        if root != dirname:
            for file in files:
                full_filename = os.path.join(root, file)
                try:
                    shutil.move(full_filename, dirname)
                    log.debug('"%s" copied to directory "%s"' % (file, dirname))
                except shutil.Error:
                    RESTORE_BACKUP = True
                    log.error('file "%s" could not be copied to directory "%s"' % (file, dirname))
                    log.error('flagging directory "%s" for reset' % dirname)
            if not RESTORE_BACKUP:
                try:
                    shutil.rmtree(root)
                    log.debug('directory "%s" deleted' % root)
                except shutil.Error:
                    RESTORE_BACKUP = True
                    log.error('directory "%s" could not be deleted' % root)
                    log.error('flagging directory "%s" for reset' % dirname)
        if RESTORE_BACKUP:
            break

    if RESTORE_BACKUP:
        RESTORE_FAIL = False
        try:
            shutil.rmtree(dirname)
        except shutil.Error:
            log.error('modified directory "%s" could not be deleted' % dirname)
            log.error('manual restoration from backup directory "%s" necessary' % backup_dirname)
            RESTORE_FAIL = True
        if not RESTORE_FAIL:
            try:
                os.renames(backup_dirname, dirname)
                log.debug('back up of directory "%s" restored' % dirname)
                print '>'
                print '>******WARNING******'
                print '>There was an error while trying to flatten directory "%s"' % dirname
                print '>back up of directory "%s" restored' % dirname
                print '>******WARNING******'
                print '>'
            except WindowsError:
                log.error('backup directory "%s" could not be renamed to original directory name' % backup_dirname)
                log.error('manual renaming of backup directory "%s" to original directory name "%s" necessary' % (backup_dirname, dirname))
                print '>'
                print '>******WARNING******'
                print '>There was an error while trying to flatten directory "%s"' % dirname
                print '>back up of directory "%s" was NOT restored successfully' % dirname
                print '>no information is lost'
                print '>check the log file for information on manually restoring the directory'
                print '>******WARNING******'
                print '>'
    else:
        try:
            shutil.rmtree(backup_dirname)
            log.debug('back up of directory "%s" deleted' % dirname)
            log.info('directory "%s" successfully processed' % dirname)
            print '>directory "%s" successfully processed' % dirname
        except shutil.Error:
            log.error('backup directory "%s" could not be deleted' % backup_dirname)
            log.error('manual deletion of backup directory "%s" necessary' % backup_dirname)
            print '>'
            print '>******WARNING******'
            print '>directory "%s" successfully processed' % dirname
            print '>cleanup of backup directory "%s" failed' % backup_dirname
            print '>manual cleanup necessary'
            print '>******WARNING******'
            print '>'


Solution:1

Learning to let go (or how I learned to live with the bomb)...

Ask yourself this: what exactly are you afraid of, and how will you handle it if it happens? In the example that you provide you want to avoid data-loss. The way that you've handled it is by looking for every combination of conditions that you think is an error and placing huge amounts of logging over it. Things will still go wrong and it's not clear that having a large amount of logging will be a good way to deal with it. Sketching out what you are trying to achieve:

for each file in a tree
    if file is below the root
        move it into the root
if nothing went wrong
    delete empty subtrees

So what kind of things could go wrong in this process? Well, there are many ways in which the move file operations could barf due to the underlying file-system. Can we list them all and provide nice ways to deal with them? No... but in general you are going to deal with them all the same way. Sometimes an error is just an error regardless of what it is.
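For illustration, here is a minimal sketch of that "deal with them all the same way" approach (written in Python 3 syntax, unlike the asker's Python 2 script; the function name `move_all_or_abort` is made up for this example). Every way a move can fail, whatever the cause, funnels into one handler:

```python
import os
import shutil
import tempfile

def move_all_or_abort(filenames, dest):
    """Move files into dest; on any file-system error, stop and report
    failure. All errors get the same treatment, whatever their cause."""
    for name in filenames:
        try:
            shutil.move(name, dest)
        except (OSError, shutil.Error):
            # permissions, missing source, full disk, name clash, ...
            return False
    return True

# Tiny demonstration in a throwaway directory.
root = tempfile.mkdtemp()
sub = os.path.join(root, 'sub')
os.mkdir(sub)
open(os.path.join(sub, 'a.txt'), 'w').close()

ok = move_all_or_abort([os.path.join(sub, 'a.txt')], root)
print(ok, sorted(os.listdir(root)))   # -> True ['a.txt', 'sub']
shutil.rmtree(root)
```

The caller only needs to know whether the whole batch succeeded, not which of the many possible errors occurred.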

So in this case, if any error occurs you want to abort and undo any changes. The way that you've decided to do that is by creating a backup copy and restoring it when something goes wrong. But your most likely error is the file-system being full, in which case these recovery steps are themselves likely to fail. OK, so it is a common enough problem: if you are worried about unknown errors at any point, how do you stop your restoration path from going wrong?

The general answer is make sure that you do any intermediate work first, and then take a single troublesome (hopefully atomic) step. In your case you need to flip your recovery around. Instead of building a copy as a backup, build a copy of the result. If everything succeeds you can then swap the new result in for the old original tree. Or, if you are really paranoid you can leave that step for a human. The advantage here is that if something goes wrong you can just abort and throw away the partial state that you have constructed.

Your structure then becomes:

make empty result directory
for every file in the tree
    copy file into new result
on failure, abort
otherwise, move result over old source directory

By the way, there is a bug in your current script that this pseudo-code makes more obvious: if you have files with identical names in different branches, they will overwrite each other in the new flattened version.
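The collision is easy to reproduce. Here is a small self-contained demonstration (Python 3 syntax; the file and directory names are illustrative), including the kind of guard that would catch it:

```python
import os
import shutil
import tempfile

# Build a tree with two files sharing a basename in different branches.
root = tempfile.mkdtemp()
for branch in ('a', 'b'):
    os.mkdir(os.path.join(root, branch))
    with open(os.path.join(root, branch, 'notes.txt'), 'w') as f:
        f.write(branch)

# Naively moving both into the root would clobber the first file with the
# second (silently, on POSIX); a simple existence check guards against it.
for branch in ('a', 'b'):
    src = os.path.join(root, branch, 'notes.txt')
    dst = os.path.join(root, 'notes.txt')
    if os.path.exists(dst):
        print('collision: refusing to overwrite %s' % dst)
    else:
        shutil.move(src, dst)

with open(os.path.join(root, 'notes.txt')) as f:
    print(f.read())   # -> 'a': the second copy was stopped by the guard
shutil.rmtree(root)
```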

The second point about this pseudo-code is that all of the error handling is in the same place (i.e. wrap the make-new-directory and recursive-copy steps inside a single try block and catch all the errors after it). This solves your original issue about the large ratio of logging and error-checking to actual work code.

import os, shutil, sys, uuid

backup_dirname = str(uuid.uuid4())
try:
    os.mkdir(backup_dirname)
    for root, dirs, files in os.walk(dirname, topdown=False):
        for file in files:
            full_filename = os.path.join(root, file)
            target_filename = os.path.join(backup_dirname, file)
            shutil.copy(full_filename, target_filename)
except Exception, e:
    print >>sys.stderr, "Something went wrong: %s" % e
    sys.exit(-1)
shutil.move(backup_dirname, dirname)   # I would do this bit by hand really


Solution:2

It's ok to be a little paranoid. But there are different kinds of paranoia :). During the development phase, I use a lot of debug statements so I can see where I am going wrong (if I do go wrong). Sometimes I will leave these statements in, but use a flag to control whether they need to be displayed or not (pretty much a debug flag). You could also have a "verbosity" flag to control how much logging you do.

The other type of paranoia comes with sanity checks. This paranoia comes into play when you rely on external data or tools - pretty much anything that doesn't come out of your program. In this case, it never hurts to be paranoid (especially with data that you receive - never trust it).

It's also ok to be paranoid if you're checking to see if a particular operation completed successfully. This is just part of normal error-handling. I notice that you're performing functions like deleting directories and files. These are operations that could potentially fail, and so you must deal with the scenario where they fail. If you simply ignore it, your code could end up in an indeterminate/undefined state and could potentially do bad (or at the least, undesirable) things.
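As a concrete illustration of that last point, here is a minimal sketch (Python 3 syntax; the helper name `remove_dir` is made up for this example) of a delete operation whose failure is reported rather than ignored, so the caller always knows what state it is in:

```python
import shutil
import tempfile

def remove_dir(path):
    """Delete a directory tree; tell the caller whether it worked instead
    of silently ignoring failure."""
    try:
        shutil.rmtree(path)
    except OSError:
        return False
    return True

doomed = tempfile.mkdtemp()
print(remove_dir(doomed))            # existing directory -> True
print(remove_dir('/no/such/dir'))    # missing directory  -> False
```

The caller can then branch on the result (retry, log, or abort) instead of carrying on with a directory that may or may not still exist.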

As far as the log files and debug statements are concerned, you can leave them in if you wish. I usually perform a decent amount of logging; just enough to tell me what's going on. Of course, that is subjective. The key is to make sure you don't drown yourself in logging, with so much information that you can't easily pick out what matters. Logging in general helps you figure out what went wrong when a script you wrote suddenly stops working. Instead of stepping through the program to figure it out, you can get a rough idea of where the problem is by going through your logs.


Solution:3

Paranoia can definitely obscure what your code is trying to do. That is a very bad thing, for several reasons. It hides bugs. It makes the program harder to modify when you need it to do something else. It makes it harder to debug.

Assuming Amoss can't cure you of your paranoia, here is how I might rewrite the program. Note that:

  • Each block of code that contains a lot of paranoia is split out into its own function.

  • Each time an exception is caught, it is re-raised, until it is finally caught in the main function. This eliminates the need for variables like RESTORE_BACKUP and RESTORE_FAIL.

  • The heart of the program (in flatten_dir) is now just 17 lines long and paranoia-free.


import logging, os, shutil, sys, time, uuid
import msvcrt  # Windows-only, as in the original script

log = logging.getLogger(__name__)

def backup_tree(dirname, backup_dirname):
    try:
        shutil.copytree(dirname, backup_dirname)
        log.debug('directory "%s" backed up as directory "%s"' % (dirname, backup_dirname))
    except:
        log.error('Error trying to back up the directory')
        raise

def move_file(full_filename, dirname):
    try:
        shutil.move(full_filename, dirname)
        log.debug('"%s" moved to directory "%s"' % (full_filename, dirname))
    except:
        log.error('file "%s" could not be moved to directory "%s"' % (full_filename, dirname))
        raise

def remove_empty_dir(dirname):
    try:
        os.rmdir(dirname)
        log.debug('directory "%s" deleted' % dirname)
    except:
        log.error('directory "%s" could not be deleted' % dirname)
        raise

def remove_tree_for_restore(dirname, backup_dirname):
    try:
        shutil.rmtree(dirname)
    except:
        log.error('modified directory "%s" could not be deleted' % dirname)
        log.error('manual restoration from backup directory "%s" necessary' % backup_dirname)
        raise

def restore_backup(backup_dirname, dirname):
    try:
        os.renames(backup_dirname, dirname)
        log.debug('back up of directory "%s" restored' % dirname)
        print '>'
        print '>******WARNING******'
        print '>There was an error while trying to flatten directory "%s"' % dirname
        print '>back up of directory "%s" restored' % dirname
        print '>******WARNING******'
        print '>'
    except:
        log.error('backup directory "%s" could not be renamed to original directory name' % backup_dirname)
        log.error('manual renaming of backup directory "%s" to original directory name "%s" necessary' % (backup_dirname, dirname))
        print '>'
        print '>******WARNING******'
        print '>There was an error while trying to flatten directory "%s"' % dirname
        print '>back up of directory "%s" was NOT restored successfully' % dirname
        print '>no information is lost'
        print '>check the log file for information on manually restoring the directory'
        print '>******WARNING******'
        print '>'
        raise

def remove_backup_tree(dirname, backup_dirname):
    try:
        shutil.rmtree(backup_dirname)
        log.debug('back up of directory "%s" deleted' % dirname)
        log.info('directory "%s" successfully processed' % dirname)
        print '>directory "%s" successfully processed' % dirname
    except shutil.Error:
        log.error('backup directory "%s" could not be deleted' % backup_dirname)
        log.error('manual deletion of backup directory "%s" necessary' % backup_dirname)
        print '>'
        print '>******WARNING******'
        print '>directory "%s" successfully processed' % dirname
        print '>cleanup of backup directory "%s" failed' % backup_dirname
        print '>manual cleanup necessary'
        print '>******WARNING******'
        print '>'
        raise

def flatten_dir(dirname):
    '''Flattens a given root directory by moving all files from its sub-directories and nested
    sub-directories into the root directory and then deletes all sub-directories and nested
    sub-directories. Creates a backup directory preserving the original structure of the root
    directory and restores this in case of errors.
    '''
    log.info('processing directory "%s"' % dirname)
    backup_dirname = str(uuid.uuid4())
    backup_tree(dirname, backup_dirname)
    try:
        for root, dirs, files in os.walk(dirname, topdown=False):
            log.debug('os.walk passing: (%s, %s, %s)' % (root, dirs, files))
            if root != dirname:
                for file in files:
                    full_filename = os.path.join(root, file)
                    move_file(full_filename, dirname)
                remove_empty_dir(root)
    except:
        remove_tree_for_restore(dirname, backup_dirname)
        restore_backup(backup_dirname, dirname)
        raise
    else:
        remove_backup_tree(dirname, backup_dirname)

def main(dirname):
    try:
        flatten_dir(dirname)
    except:
        import traceback
        logging.exception('error flattening directory "%s"' % dirname)
        traceback.print_exc()
        sys.stderr.write('the program is terminating with an error\n')
        sys.stderr.write('please consult the log file\n')
        sys.stderr.flush()
        time.sleep(0.25)
        print 'Press any key to quit this program.'
        msvcrt.getch()
        sys.exit()


Solution:4

That seems reasonable to me. It depends how important your data is.

I often start out like that and make the logging optional, with a flag set at the top of the file (or by the caller) turning it on or off. You could also have a verbosity level.

Generally, after something has been working for a while and is no longer in development, I stop reading the logs, and build up giant log files which I never read. However, if something does go wrong, it's good to know that they're there.


Solution:5

If it's OK to leave the job half-done on error (only some files moved), as long as no files are lost, then the backup directory is unnecessary. So you can write dramatically simpler code:

import os, logging

def flatten_dir(dirname):
    for root, dirs, files in os.walk(dirname, topdown=False):
        assert len(dirs) == 0
        if root != dirname:
            for file in files:
                full_filename = os.path.join(root, file)
                target_filename = os.path.join(dirname, file)
                if os.path.exists(target_filename):
                    raise Exception('Unable to move file "%s" because "%s" already exists'
                                    % (full_filename, target_filename))
                os.rename(full_filename, target_filename)
            os.rmdir(root)

def main():
    try:
        flatten_dir(somedir)
    except:
        logging.exception('Failed to flatten directory "%s".' % somedir)
        print "ERROR: Failed to flatten directory. Check log files for details."

Each individual system call here makes progress without destroying data that you wanted to keep. There's no need for a backup directory because there's never anything you need to "recover".

