Tutorial :How to do a non-blocking URL fetch in Python



Question:

I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.

I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?

I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.


Solution:1

You'll probably benefit from threading or multiprocessing modules. You don't actually need to create all those Thread-based classes by yourself, there is a simpler method using Pool.map:

from multiprocessing import Pool    def fetch_url(url):      # Fetch the URL contents and save it anywhere you need and      # return something meaningful (like filename or error code),      # if you wish.      ...    pool = Pool(processes=4)  result = pool.map(f, image_url_list)  


Solution:2

As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in python.


Solution:3

As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.

Here is a tutorial on threading in python: http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf


Solution:4

Here's an example of how to use threading.Thread. Just replace the class name with your own and the run function with your own. Note that threading is great for IO restricted applications like your's and can really speed it up. Using pythong threading strictly for computation in standard python doesn't help because only one thread can compute at a time.

import threading, time  class Ping(threading.Thread):      def __init__(self, multiple):          threading.Thread.__init__(self)          self.multiple = multiple      def run(self):          #sleeps 3 seconds then prints 'pong' x times          time.sleep(3)          printString = 'pong' * self.multiple    pingInstance = Ping(3)  pingInstance.start() #your run function will be called with the start function  print "pingInstance is alive? : %d" % pingInstance.isAlive() #will return True, or 1  print "Number of threads alive: %d" % threading.activeCount()  #main thread + class instance  time.sleep(3.5)  print "Number of threads alive: %d" % threading.activeCount()  print "pingInstance is alive?: %d" % pingInstance.isAlive()  #isAlive returns false when your thread reaches the end of it's run function.  #only main thread now  


Solution:5

You have these choices:

  • Threads: easiest but doesn't scale well
  • Twisted: medium difficulty, scales well but shares CPU due to GIL and being single threaded.
  • Multiprocessing: hardest. Scales well if you know how to write your own event loop.

I recommend just using threads unless you need an industrial scale fetcher.


Solution:6

You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »