
Friday, April 1, 2011

Sorting a list of complicated tuples in Python

Consider a tuple of the form:

("URL of page", "Title of web page", ['word1', 'word2', 'word3'], len(words), priority index (int))

where word1, word2, etc. are the words in the web page, and the priority index is an integer representing the priority of the page.

We have a list of such tuples.
Example:
[
("www.yahoo.com", "Yahoo! - International", ['yahoo', 'mail', 'news', 'sports', 'messenger'], 5, 2),
("www.google.com", "Google Web search", ['google', 'mail', 'search', 'docs'], 4, 1),
("timesofindia.indiatimes.com", "Times of india - Latest news", ['timesofindia', 'news', 'india', 'sports', 'world'], 5, 3),
...
...
]

We are to sort this list based on the following criteria (in the order given):

  1. Number of words in the words list (the len(words) field)
  2. Priority index
  3. Alphabetical order of titles

The simplest way would be to make the tuples conform to this format:
(number of words, priority index, title, ...)
By default, list.sort() and sorted(list) compare tuples element by element, from first to last, so the list would then be sorted by exactly these criteria, in this order.
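A minimal sketch of the same idea without physically rebuilding the tuples: pass a key function that returns the fields in the required order, and Python compares those key tuples element by element. The variable names pages and ascending are hypothetical; the data is the example list above.

```python
# The example list from above (the name `pages` is hypothetical)
pages = [
    ("www.yahoo.com", "Yahoo! - International",
     ['yahoo', 'mail', 'news', 'sports', 'messenger'], 5, 2),
    ("www.google.com", "Google Web search",
     ['google', 'mail', 'search', 'docs'], 4, 1),
    ("timesofindia.indiatimes.com", "Times of india - Latest news",
     ['timesofindia', 'news', 'india', 'sports', 'world'], 5, 3),
]

# Key = (number of words, priority index, title). Tuples compare element
# by element, so this sorts by word count, then priority, then title,
# all in ascending order.
ascending = sorted(pages, key=lambda t: (t[3], t[4], t[1]))

for page in ascending:
    print(page[3], page[4], page[1])
```

This prints the Google entry first (4 words), then Yahoo! (5 words, priority 2), then Times of India (5 words, priority 3).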

But what if we want the numbers sorted in descending order while the titles are sorted in ascending order?
This is where a custom comparison function like the one below proves useful:

def tupleSort(t1, t2):
    # first compare by the number of words (element 3)
    t1wordlength = t1[3]
    t2wordlength = t2[3]
    if t1wordlength > t2wordlength: return 1
    elif t2wordlength > t1wordlength: return -1
    # next compare by the priority index (element 4)
    t1priority = t1[4]
    t2priority = t2[4]
    if t1priority > t2priority: return 1
    if t2priority > t1priority: return -1
    # finally, if both of the above are equal, compare the titles (element 1).
    # The result is inverted here so that, once reverse=True flips every
    # comparison, the titles come out in ascending alphabetical order.
    t1title = t1[1]
    t2title = t2[1]
    if t1title > t2title: return -1
    if t2title > t1title: return 1
    # if even the titles are equal, the tuples tie, so return 0
    return 0

Passing this function as the cmp argument to list.sort() or to the sorted() built-in, with reverse=True, sorts the list given above the way we want: the number of words and the priority index in descending order, and ties broken by title in ascending order.
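Note that the cmp argument exists only in Python 2; Python 3 removed it. A minimal sketch for Python 3, assuming the same tuple layout, wraps a comparison function with functools.cmp_to_key. Here tuple_cmp is a hypothetical condensed restatement of tupleSort, pages is a hypothetical variable name, and the www.example.com entry is made-up data added only so that the title rule actually gets exercised. The last sort shows a common alternative: negating the numeric fields inside a plain key function gives the same mixed ordering without any comparison function.

```python
from functools import cmp_to_key

def tuple_cmp(t1, t2):
    # Same rules as tupleSort: word count, then priority, then title.
    # The title pair is swapped so that reverse=True leaves titles ascending.
    for a, b in ((t1[3], t2[3]), (t1[4], t2[4]), (t2[1], t1[1])):
        if a > b:
            return 1
        if a < b:
            return -1
    return 0

pages = [
    ("www.yahoo.com", "Yahoo! - International",
     ['yahoo', 'mail', 'news', 'sports', 'messenger'], 5, 2),
    ("www.google.com", "Google Web search",
     ['google', 'mail', 'search', 'docs'], 4, 1),
    ("timesofindia.indiatimes.com", "Times of india - Latest news",
     ['timesofindia', 'news', 'india', 'sports', 'world'], 5, 3),
    # Hypothetical entry: ties with Times of India on word count and
    # priority, so the title decides the order between them
    ("www.example.com", "Example Domain",
     ['example', 'domain', 'for', 'illustrative', 'use'], 5, 3),
]

# Python 3 replacement for sort(cmp=tupleSort, reverse=True)
by_cmp = sorted(pages, key=cmp_to_key(tuple_cmp), reverse=True)

# Equivalent plain key: negate the numbers for descending order and
# leave the title as-is for ascending order
by_key = sorted(pages, key=lambda t: (-t[3], -t[4], t[1]))

for page in by_cmp:
    print(page[3], page[4], page[1])
```

Negation only works for numeric fields, which is why the title stays un-negated in the key. The key version is usually simpler and faster as well, because Python computes each key once per item instead of calling a comparison function for every pair it compares.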
