Tutorial :A question on python sorting efficiency



Question:

Alright so I am making a commandline based implementation of a website search feature. The website has a list of all the links I need in alphabetical order.

Usage would be something like

./find.py  LinkThatStartsWithB  

So it would navigate to the webpage associated with the letter B. My questions is what is the most efficient/smartest way to use the input by the user and navigate to the webpage?

What I was thinking at first was something along the lines of using a list and then getting the first letter of the word and using the numeric identifier to tell where to go in list index.

(A = 1, B = 2...) Example code:

#Use base url as starting point then add extension on end.  Base_URL = "http://www.website.com/"    #Use list index as representation of letter  Alphabetic_Urls = [         "/extensionA.html",         "/extensionB.html",         "/extensionC.html",         ]  

Or would Dictionary be a better bet?

Thanks


Solution:1

How are you getting this list of URLS?

If your commandline app is crawling the website for links, and you are only looking for a single item, building a dictionary is pointless. It will take at least as long to build the dict as it would to just check as you go! eg, just search as:

for link in mysite.getallLinks():      if link[0] == firstletter:          print link  

If you are going to be doing multiple searches (rather than just a single commandline parameter), then it might be worth building a dictionary using something like:

import collections  d=collections.defaultdict(list)  for link in mysite.getallLinks():      d[link[0]].append(link)             # Dict of first letter -> list of links    # Print all links starting with firstletter  for link in d[firstletter]:      print link  

Though given that there are just 26 buckets, it's not going to make that much of a difference.


Solution:2

The smartest way here will be whatever makes the code simplest to read. When you've only got 26 items in a list, who cares what algorithm it uses to look through it? You'd have to use something really, really stupid to make it have an impact on performance.

If you're really interested in the performance though, you'd need to benchmark different options. Looking at just the complexity doesn't tell the whole story, because it hides the factors involved. For instance, a dictionary lookup will involve computing the hash of the key, looking that up in tables, then checking equality. For short lists, a simple linear search can sometimes be more efficient, depending on how costly the hashing algorithm is.

If your example is really accurate though, can't you just take the first letter of the input string and predict the URL from that? ("/extension" + letter + ".html")


Solution:3

Dictionary! O(1)


Solution:4

Dictionary would be a good choice if you have (and will always have) a small number of items. If the list of URL's is going to expand in the future you will probably actually want to sort the URL's by their letter and then match the input against that instead of hard-coding the dictionary for each one.


Solution:5

Since it sounds like you're only talking about 26 total items, you probably don't have to worry too much about efficiency. Anything you come up with should be fast enough.

In general, I recommend trying to use the data structure that is the best approximation of your problem domain. For example, it sounds like you are trying to map letters to URLs. E.g., this is the "A" url and this is the "B" url. In that case, a mapping data structure like a dict sounds appropriate:

html_files = {      'a': '/extensionA.html',      'b': '/extensionB.html',      'c': '/extensionC.html',  }  

Although in this exact example you could actually cheat it and skip the data structure altogether -- '/extension%s.html' % letter.upper() :)


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »