The other day I was asked if there was a way to sort a dictionary by value. If you use Python regularly, then you know that the dictionary data structure is by definition an unsorted mapping type. Some would define a dict as a hash table. Regardless, I needed a way to sort a nested dictionary (i.e. a dictionary of dictionaries) based on a value in the nested dictionaries so I could iterate over the keys in the specified order. We’ll spend some time looking at an implementation I found.
After Googling for ideas, I came across an answer on Stack Overflow that did most of what I wanted. I had to modify it slightly to make it sort on my nested dictionary values, but that was surprisingly easy. Before we get to the answer, we should take a quick look at the data structure. Here is a variation of the beast, minus the private parts that were removed for your safety:
mydict = {
    '0d6f4012-16b4-4192-a854-fe9447b3f5cb': {
        'CLAIMID': '123456789', 'CLAIMDATE': '20120508',
        'AMOUNT': '365.64', 'EXPDATE': '20120831'},
    'fe614868-d0c0-4c62-ae02-7737dea82dba': {
        'CLAIMID': '45689654', 'CLAIMDATE': '20120508',
        'AMOUNT': '185.55', 'EXPDATE': '20120831'},
    'ca1aa579-a9e7-4ade-80a3-0de8af4bcb21': {
        'CLAIMID': '98754651', 'CLAIMDATE': '20120508',
        'AMOUNT': '93.00', 'EXPDATE': '20120831'},
    'ccb8641f-c1bd-45be-8f5e-e39b3be2e0e3': {
        'CLAIMID': '789464321', 'CLAIMDATE': '20120508',
        'AMOUNT': '0.00', 'EXPDATE': ''},
    'e1c445c2-5148-4a08-9b7e-ff5ed51c43ed': {
        'CLAIMID': '897987945', 'CLAIMDATE': '20120508',
        'AMOUNT': '62.66', 'EXPDATE': '20120831'},
    '77ad6dd4-5704-4060-9c38-6a93721ef98e': {
        'CLAIMID': '23212315', 'CLAIMDATE': '20120508',
        'AMOUNT': '41.05', 'EXPDATE': '20120831'},
}
Now we know what we’re dealing with. Let’s take a quick look at the slightly modified answer I came up with:
sorted_keys = sorted(mydict.keys(), key=lambda y: (mydict[y]['CLAIMID']))
That’s a pretty spiffy one-liner, but I think it’s a little confusing. Here’s my understanding of how it works. The sorted function sorts an iterable (here, the dict’s outer keys) using the key argument, which in this case is an anonymous function (the lambda). The anonymous function is called once for each outer key and uses it to look up the inner value we want to sort on, which in this case is ‘CLAIMID’. The dictionary itself isn’t passed in; the lambda simply reads mydict from the enclosing scope. Once everything is sorted, sorted returns a new list of the outer keys. Personally I find lambdas a little confusing, so I usually spend a little time deconstructing them into a named function just so I can understand them a little better. So without further ado, here’s a function version of the same script:
#----------------------------------------------------------------------
def func(key):
    """Return the CLAIMID stored under the given outer key."""
    return mydict[key]['CLAIMID']

sorted_keys = sorted(mydict.keys(), key=func)
for key in sorted_keys:
    print mydict[key]['CLAIMID']
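If you want to convince yourself of what the key function actually receives, it can help to run the same pattern on something tiny. Here’s a minimal sketch using a made-up two-level dict; the names `people` and `by_age` are just for illustration and aren’t part of my real data. It shows that sorted calls the key function once per outer key and hands back a new list of those outer keys:

```python
# Hypothetical sample data, only for illustration
people = {
    'a': {'age': 30},
    'b': {'age': 25},
    'c': {'age': 35},
}

def by_age(outer_key):
    """Return the value to sort on for one outer key."""
    return people[outer_key]['age']

# Iterating a dict yields its keys, so this sorts the outer keys
sorted_keys = sorted(people, key=by_age)
print(sorted_keys)  # ['b', 'a', 'c']
```

Note that the dictionary itself is left untouched; only the returned list is in sorted order.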
And just for fun, let’s write a script that can sort the nested dictionary by ANY of the keys inside it.
mydict = {
    '0d6f4012-16b4-4192-a854-fe9447b3f5cb': {
        'CLAIMID': '123456789', 'CLAIMDATE': '20120508',
        'AMOUNT': '365.64', 'EXPDATE': '20120831'},
    'fe614868-d0c0-4c62-ae02-7737dea82dba': {
        'CLAIMID': '45689654', 'CLAIMDATE': '20120508',
        'AMOUNT': '185.55', 'EXPDATE': '20120831'},
    'ca1aa579-a9e7-4ade-80a3-0de8af4bcb21': {
        'CLAIMID': '98754651', 'CLAIMDATE': '20120508',
        'AMOUNT': '93.00', 'EXPDATE': '20120831'},
    'ccb8641f-c1bd-45be-8f5e-e39b3be2e0e3': {
        'CLAIMID': '789464321', 'CLAIMDATE': '20120508',
        'AMOUNT': '0.00', 'EXPDATE': ''},
    'e1c445c2-5148-4a08-9b7e-ff5ed51c43ed': {
        'CLAIMID': '897987945', 'CLAIMDATE': '20120508',
        'AMOUNT': '62.66', 'EXPDATE': '20120831'},
    '77ad6dd4-5704-4060-9c38-6a93721ef98e': {
        'CLAIMID': '23212315', 'CLAIMDATE': '20120508',
        'AMOUNT': '41.05', 'EXPDATE': '20120831'},
}

outer_keys = mydict.keys()
print "outer keys:"
for outer_key in outer_keys:
    print outer_key

print "*" * 40
# outer_key still holds the last key from the loop above; any entry
# works here because every nested dict has the same inner keys
inner_keys = mydict[outer_key].keys()

# Sort the outer keys once per inner key and print the results
for key in inner_keys:
    sorted_keys = sorted(mydict.keys(), key=lambda y: (mydict[y][key]))
    print "sorted by: " + key
    print sorted_keys
    for outer_key in sorted_keys:
        print mydict[outer_key][key]
    print "*" * 40
    print
This code works, but it doesn’t give the results I expected. Try running this and you’ll notice that the output is kind of weird. The sorting is being done on strings, so all the values that look like numbers are sorted like strings. Oops! Most people would want the numbers sorted like numbers, so we need to do a quick conversion of the number-like values into integers or floats. Here’s the final version of the code (yes, it’s a little sloppy):
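To make the problem concrete, here’s a small sketch using three of the AMOUNT strings from the dictionary above. Sorted as strings, '93.00' lands last because the character '9' compares greater than '1' and '3'. Passing `key=float` compares the numeric values instead:

```python
amounts = ['365.64', '185.55', '93.00']

as_strings = sorted(amounts)             # character-by-character comparison
as_numbers = sorted(amounts, key=float)  # compare the numeric values

print(as_strings)  # ['185.55', '365.64', '93.00']
print(as_numbers)  # ['93.00', '185.55', '365.64']
```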
mydict = {
    '0d6f4012-16b4-4192-a854-fe9447b3f5cb': {
        'CLAIMID': '123456789', 'CLAIMDATE': '20120508',
        'AMOUNT': '365.64', 'EXPDATE': '20120831'},
    'fe614868-d0c0-4c62-ae02-7737dea82dba': {
        'CLAIMID': '45689654', 'CLAIMDATE': '20120508',
        'AMOUNT': '185.55', 'EXPDATE': '20120831'},
    'ca1aa579-a9e7-4ade-80a3-0de8af4bcb21': {
        'CLAIMID': '98754651', 'CLAIMDATE': '20120508',
        'AMOUNT': '93.00', 'EXPDATE': '20120831'},
    'ccb8641f-c1bd-45be-8f5e-e39b3be2e0e3': {
        'CLAIMID': '789464321', 'CLAIMDATE': '20120508',
        'AMOUNT': '0.00', 'EXPDATE': ''},
    'e1c445c2-5148-4a08-9b7e-ff5ed51c43ed': {
        'CLAIMID': '897987945', 'CLAIMDATE': '20120508',
        'AMOUNT': '62.66', 'EXPDATE': '20120831'},
    '77ad6dd4-5704-4060-9c38-6a93721ef98e': {
        'CLAIMID': '23212315', 'CLAIMDATE': '20120508',
        'AMOUNT': '41.05', 'EXPDATE': '20120831'},
}

outer_keys = mydict.keys()
print "outer keys:"
for outer_key in outer_keys:
    print outer_key

print "*" * 40
# outer_key still holds the last key from the loop above
inner_keys = mydict[outer_key].keys()

# Convert number-like strings to int or float in place; skip blanks
for outer_key in outer_keys:
    for inner_key in inner_keys:
        if mydict[outer_key][inner_key] == "":
            continue
        try:
            mydict[outer_key][inner_key] = int(mydict[outer_key][inner_key])
        except ValueError:
            mydict[outer_key][inner_key] = float(mydict[outer_key][inner_key])

# Sort the outer keys once per inner key and print the results
for key in inner_keys:
    sorted_keys = sorted(mydict.keys(), key=lambda y: (mydict[y][key]))
    print "sorted by: " + key
    print sorted_keys
    for outer_key in sorted_keys:
        print mydict[outer_key][key]
    print "*" * 40
    print
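One caveat: the conversion skips the empty EXPDATE string, so that field ends up holding a mix of integers and one string. Python 2 will happily compare those, but Python 3 raises a TypeError when it tries to order an int against a str. Here’s a rough sketch of one way around that, assuming you’re fine with blank values sorting first. The helper names `to_number` and `sort_keys_by` and the `sample` data are my own inventions for this example, and it converts values on the fly instead of modifying the dictionary:

```python
def to_number(value):
    """Convert a number-like string to int or float; blanks become None."""
    if value == "":
        return None
    try:
        return int(value)
    except ValueError:
        return float(value)

def sort_keys_by(data, field):
    """Return the outer keys of data ordered by the nested field."""
    def sort_key(outer_key):
        number = to_number(data[outer_key][field])
        # Tuples compare element by element, so blanks (False, 0) sort
        # before real numbers (True, n) and None is never compared.
        return (number is not None, 0 if number is None else number)
    return sorted(data, key=sort_key)

# Trimmed-down, made-up sample in the same shape as mydict
sample = {
    'k1': {'AMOUNT': '365.64', 'EXPDATE': '20120801'},
    'k2': {'AMOUNT': '93.00', 'EXPDATE': ''},
    'k3': {'AMOUNT': '185.55', 'EXPDATE': '20120830'},
}

print(sort_keys_by(sample, 'AMOUNT'))   # ['k2', 'k3', 'k1']
print(sort_keys_by(sample, 'EXPDATE'))  # ['k2', 'k1', 'k3']
```

Wrapping the sort in a function also avoids the lambda closing over a loop variable, which is one less thing to puzzle over.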
So now we have it sorted in a way that’s more natural to human perceptions. There’s one other way we could do this: sort the data the way we want BEFORE we put it into our data structure. However, that only works if we use an OrderedDict, which was added to the collections module in Python 2.7, because a regular dict in Python 2 doesn’t preserve insertion order. You can read about it in the official documentation.
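As a rough sketch of that idea, assuming the records arrive as (key, record) pairs that you can sort before storing them, an OrderedDict remembers the order in which the pairs were inserted. The `records` list below is made-up sample data, not my real claims:

```python
from collections import OrderedDict

# Hypothetical incoming records as (outer key, nested dict) pairs
records = [
    ('k1', {'CLAIMID': '123456789'}),
    ('k2', {'CLAIMID': '45689654'}),
    ('k3', {'CLAIMID': '98754651'}),
]

# Sort the pairs by numeric CLAIMID first, then build the ordered mapping
records.sort(key=lambda pair: int(pair[1]['CLAIMID']))
ordered = OrderedDict(records)

print(list(ordered.keys()))  # ['k2', 'k3', 'k1']
```

Iterating over `ordered` then yields the keys in CLAIMID order without sorting again.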
Now you know what I know about this topic. I’m sure my readers will have other solutions or ways to do it too. Feel free to mention them or link to them in the comments.
Further Reading
I’m not really sure what you mean by “sorting a dictionary” since a dictionary is not an ordered data structure. I guess you mean iterating either keys and/or values in some sorted order. In any case, if I understand what you are doing, you can side-step the sorting of the main keys by a field in the value dict and just sort the values you want:
  print sorted(float(_['CLAIMID']) for _ in mydict.values())
This just extracts all the CLAIMID values from the sub-dict and sorts them as floats. Likewise you could do it by AMOUNT or any other field:
  print sorted(float(_['AMOUNT']) for _ in mydict.values())
There is also the extremely useful itemgetter function from the operator module.
from operator import itemgetter

for thing in sorted(mydict.values(), key=itemgetter(key)):
    print thing
This does have the disadvantage of not keeping the keys of the original dictionary, but I find I rarely need them, which is why I keep this sort of data in xml, or as a list of dictionaries.
@Josh – I think I saw something about that while I was working on my article. Thanks for bringing it up.
Any advantage(s) over using sorted_dict = sorted(myDict, key=myDict.get), as suggested in the same Stack Overflow post?
…lambdas scare me.