Perlfect Solutions
 

Sorting Techniques

Introduction

Sorting is a commonly needed operation in all kinds of programs. Luckily, for us perl programmers, perl provides a very simple yet extremely powerful mechanism to accomplish any sort you might think of. This article is about teaching the novice programmer how to sort lists of things, while showing to the more experienced folks certain techniques and ideas that could be new to them if they are migrating from a different language.

Moving to the meat of the matter staight away, we'll start from talking about comparison. Obviously, in order to put a list of things in order, you'll have to first define that order. Order is defined by how things compare to each other. If I give you two items from the list, can you tell me which one is bigger / better / nicer / sexier ... [insert you favourite adjective here] than the other? Or tell me thet they are both of equal order? Well, that's just about it! If you give me a list of items and promise me that you can answer this question for any pair of them, I can make a sorted list of them. All I have to do is take all possible pairs and ask you "how do these two compare?" and arrange them accordingly to finally come up with a sorted list. Actually there are even smarter ways to do it, minimising the amount of comparisons needed, but that is not an issue here, as we will see soon that perl performs that task for us, and we trust perl that it uses the least expensive method.

Now, the issue in question being comparison, I assume you must be familiar with all (or at least most) of perl's comparison operators. There's a list of them:

Numbers Strings
< lt
> gt
<= le
>= gr
== eq
<=> cmp

Now the first five rows should be ok, they're just like math. But what are the <=> and cmp operators? Basically, the expression $a <=> $b (or $a cmp $b for strings) returns one of the values 1, 0, -1 if $a is, respectively, larger, equal or lower than $b. (see table below)

Relation of $a and $b Value Returned by $a <=>
$a greater than $b 1
$a equal to $b 0
$a less than $b -1

Does that ring a bell? Coming to think of it, the <=> and cmp operators actually provide the answer to the question we were investigating earlier when we talked about how to sort by using a comparative criterion. So, if we already have an operator that answers this question ("how do two items compare?") all we need is a function that will take a list of items and perform the necessary comparisons to arrive at a sorted list. And guess what? That's exactly what perl's sort operator does. So, if you have an unsorted list @not_sorted and want to created a sorted list @sorted, you just say:

@sorted = sort { $a <=> $b } @not_sorted # numerical sort or @sorted = sort { $a cmp $b } @not_sorted # ASCII-betical sort or better @sorted = sort { lc($a) cmp lc($b) } @not_sorted # alphabetical sort

Looking at the sort function, we notice that it is exactly as we described it in words, earlier in this article. Perl needs just two things: the list of items to sort, and a subroutine that answers the question "how do these two compare?" for any pair of items in the list. Perl puts the two items it want to compare int the special variables $a and $b and your function is responsible to give a return value that corresponds to the existing relationship of the two, as shown in the table shown earlier.

Here, in this simple example, we used perl's built-in comparison operators that work for numerical and alphabetical (not realy... to be correct it is ASCII order) sorting. Of course, you can roll your own comparison function to create sorts for any kind of ordering you wish to have. Before you start coding your own functions, take a look to the following examples:

Get a list of hash keys sorted by value.

@sorted = sort { $hash{$a} cmp $hash{$b} } keys %hash;

Get a reverse sort of a list.

@sorted = sort { $b cmp $a } @list; Which can also be done with @sorted = reverse sort { $a cmp $b } @list;

Get an alphabetical sort of words, but make 'aardvark' always come last.

(Now, why you would want to do that is another question...)

@sorted = sort { if ($a eq 'aardvark') { return 1; } elsif ($b eq 'aardvark') { return -1; } else { return $a cmp $b; } } @words;

Digg! Save This Page

Comments

Dr Vee   

Posted at 5:39pm on Thursday, March 8th, 2007

Googled over a dozen of other pages, nothing comes close to this. THANKS ALOT!

sakthi   

Posted at 1:52am on Tuesday, March 13th, 2007

how to sort the the array of items which contains list of file names. these can distinguished by date

dre   

Posted at 2:18pm on Friday, March 23rd, 2007

yahooed one page, and this was it, i did not have to google 20 pages.

amsanjeev   

Posted at 12:34am on Saturday, March 31st, 2007

simple and to the point article . Thanks a lot.

Todd   

Posted at 1:34pm on Sunday, April 1st, 2007

Wonderful article, helped a lot.

willtriv   

Posted at 7:02am on Thursday, April 5th, 2007

reverse sorting was made very clear after seeing this example :)

Prabhu   

Posted at 3:07pm on Thursday, April 5th, 2007

sakthi,

I had an array like this:
$fileNames[0] = logfile200704050000
$fileNames[1] = logfile200704050050
$fileNames[2] = logfile200704050812
$fileNames[3] = logfile200704050225

and sorted them like this:

@sorted = sort { substr($a,8,12) substr($b,8,12) } @fileNames;

Prabhu   

Posted at 3:10pm on Thursday, April 5th, 2007

The website took out the operator on the last line. It should look like this without the spaces on either side of the '='
@sorted = sort { substr($a,8,12) < = > substr($b,8,12) } @fileNames;

Stan   

Posted at 1:33am on Wednesday, April 18th, 2007

Great page, very useful! Simplified some pretty tricky problems I was having.

Randal L. Schwartz   

Posted at 4:52am on Thursday, April 19th, 2007

I have plenty of examples of using Perl's sort (including the now-famous "Schwartzian Transform", named after me, but not BY me) at my 250-magazine articles available on my website. Use a google search of "sort site:stonehenge.com" (but without the quotes) and have fun reading.

Dan   

Posted at 2:07am on Friday, April 27th, 2007

How do you sort for example chapters in a book? They go like 1, 1.1, 1.1.1, 1.1.2, 1.2, 1.3, 2, 2.1, etc...
Numeric sort just won't put it in that order.

Al   

Posted at 7:15am on Friday, June 15th, 2007

Depending on how deep the book is, you could strip the .'s, then pad right with 0's so all values are 4 digits long. e.g:

1000 (was 1)
1100 (was 1.1)
1120 (was 1.1.2)
1200 etc
1300
2000
2100

So you need to strip the ".", then pad. Look for replace and sprintf.

Milind   

Posted at 8:34am on Friday, July 13th, 2007

Awesome article. Poured over ton of sites, and this article is like the most concise, to the point article. Read first 8 lines, implemented my sort function inline and I was good to go. You should bill my company!

Chris   

Posted at 2:54am on Wednesday, July 18th, 2007

schweeeeeeeeeeeeet!!!!!!!!!!!!

Anonymous   

Posted at 11:18am on Wednesday, July 18th, 2007

good job, also would be nice to add a paragraph about how the interpreter compares the strings.
for example
baaaaaa> azzzzzzz

mcb   

Posted at 1:54pm on Wednesday, July 25th, 2007

If you have a list of hash refs and you want to sort by two values (for example, sort primarily by people's ages, but if the ages are the same, sort alphabetically), do something like this:
@sorted = sort { ($a->{age} $b->{age}) || ($a->{name} cmp $b->{name}) } @not_sorted;

Willem Kernkamp   

Posted at 8:46am on Monday, July 30th, 2007

To the point and concise! Thanks a lot.

Paul   

Posted at 1:44am on Thursday, August 9th, 2007

"Get an alphabetical sort of words, but make 'aardvark' always come last.
(Now, why you would want to do that is another question...) "

This was exactly what I needed, thanks very much!

Daniel   

Posted at 12:14am on Wednesday, August 15th, 2007

You can also sort a hash of hashes (or hash of arrays or whatever) based on values in any subscript as such:
sort { $not_sorted{$a}{name} cmp $not_sorted{$b}{name} } keys %$not_sorted;

... useful where you might have multiple properties for each element, as it allows you to sort on any given property. Can of course be combined with mcb's method for sorting on multiple criteria.

Daniel   

Posted at 12:15am on Wednesday, August 15th, 2007

typo in the above, should be:
sort { $not_sorted{$a}{name} cmp $not_sorted{$b}{name} } keys %not_sorted;

oscii   

Posted at 11:46pm on Wednesday, August 15th, 2007

thank you! teh 1 minute bugfix before the release ;)

David   

Posted at 4:57am on Friday, August 17th, 2007

This is one kind of sort you did not talk about. You want to sort the following: Die Hard, Die Hard 2, Die Hard 3. The sort function will put Die Hard last. I do not want to add a 1 for Die hard. How would you make a sort to fix this problem? Thanks

David   

Posted at 4:27am on Tuesday, August 21st, 2007

So clear and so helpful! Thanks!

Daniel   

Posted at 7:44pm on Wednesday, August 22nd, 2007

There appears to be a mistake in the last section of the article (making 'aardvark' always come last). I've recently had to use something similar to this, and the way it's written here will actually force 'aardvark' to always come _first_, not last. Reverse the return values (-1, 1) to correct this.

perlfect   

Posted at 7:28am on Thursday, August 23rd, 2007

@Daniel: Thanks for spotting that. I have updated the article above.

Nubee   

Posted at 4:13pm on Wednesday, August 29th, 2007

How can I sort a sub-array based on the order another array?

David   

Posted at 7:39am on Tuesday, September 4th, 2007

My application requires sorting of a large hash (> 100,000,000), but I am only interested in the largest 50,000 entries. Is there an economical way to first get the largest 50,000 entries (all floats) and then sort only those rather than sorting the whole hash?

Pacman   

Posted at 11:35pm on Tuesday, October 2nd, 2007

To decide to which are the largest 50000 entry you need to short the whole thing first.

Martin   

Posted at 7:12am on Tuesday, October 9th, 2007

David, the most efficient algorithm to find the largest i elements out of n elements works in expected linear time O(n) (as opposed to at least O(n log n) for sorting). Pacman is wrong. The algorithm is similar to quick sort, partitioning the array repeatedly but always making just one recursive call to the half where the split should occur (in your case where the position 50000) is. You should check out the Intro to Algorithms book or find information on Selection (order statistics) on the web somewhere to get details. This procedure is significantly faster especially when you deal with such a huge array of 100 mil. elements.

Sujeet   

Posted at 1:20am on Wednesday, October 17th, 2007

How Sort numeric array or Character array without using sort function in Perl

gpm1982   

Posted at 12:51am on Friday, October 26th, 2007

"Get an alphabetical sort of words, but make 'aardvark' always come last.
(Now, why you would want to do that is another question...) "

Seriously, this helped me a lot for my taks. Thanks a lot

Brenda   

Posted at 7:03am on Tuesday, October 30th, 2007

So how do I get the sort to ignore the case of the characters?

Brenda   

Posted at 4:45am on Friday, November 2nd, 2007

Continuing on from the the previous comment I used @sorted = sort { lc $hash{$a} cmp lc $hash{$b} } keys %hash; to sort with upper and lower cases mixed and it does not work.

Perlfect   

Posted at 6:53am on Friday, November 2nd, 2007

@Brenda: Is your intention to sort the *values* of the hash according to its *keys* in a case insensitive way?

Perhaps what you intended to do is:
@sorted = sort { lc $a cmp lc $b } keys %hash;
which will give you the keys of the hash sorted as you desire.

TJ   

Posted at 10:37am on Thursday, November 29th, 2007

MCB - your comment really helped...thanks a TON

Sneha   

Posted at 11:19pm on Tuesday, January 15th, 2008

when sorting the list the words in uppercase are given highest priority than words in lowercase. why is it so???

John W. Krahn   

Posted at 5:57pm on Wednesday, January 16th, 2008

Your table of comparison operators has one error and one omission.

>= gr

That should be:

>= ge

And you are missing:

!= ne

HTH.

jhon   

Posted at 5:47am on Monday, January 21st, 2008

cool!!!!!!!

Mark T   

Posted at 3:14am on Thursday, January 31st, 2008

When sorting words and digits, do the words always appear first?

Matt   

Posted at 6:43am on Thursday, January 31st, 2008

I am trying to sort on a value within a hash of hashes. 'Trade' is the key to the value which is a number. Pls can anyone advise on the syntax?

The key is at the following level:
%tradeSummary{top25}{Trade}

foreach my $instrument( sort{$tradeSummary{top25}{Trade}{$a}$tradeSummary{top25}{$instrument}{Trade}{$b}} keys %{$tradeSummary{top25}} ){

Matt   

Posted at 7:36am on Thursday, January 31st, 2008

I've managed to resolve this referencing issue. The Trade value actually belonged in the first item of an array, so the following worked:

foreach my $instrument (sort {$tradeSummary{top25}{$b}[0]$tradeSummary{top25}{$a}[0]} keys %{$tradeSummary{top25}} ){

Andre   

Posted at 9:41pm on Thursday, February 21st, 2008

Fantastic article! Helped me in my Perl/CGI programming course.

Gaborik   

Posted at 7:53pm on Thursday, March 6th, 2008

Very nice article. Helped me a lot.

Anupam Nandan   

Posted at 4:47am on Wednesday, March 19th, 2008

gud work ..

almitra   

Posted at 11:14am on Friday, March 28th, 2008

I want to sort a file where the starting is the file name followed by the sequence
, how do i sort these two
>ugast 1234....
agtctgsft
> ecoli12k12
gcttctag

I just want the sequence and not the line staring with >

et   

Posted at 4:59am on Tuesday, April 22nd, 2008

how about random order:

sort { int(rand(3))-1 } @table

Thor Is my Real Name and I'm not Swedish   

Posted at 11:40am on Friday, April 25th, 2008

THANK YOU WRITER OF THIS POST

Anonymous   

Posted at 8:48am on Tuesday, April 29th, 2008

I am new to perl
how can we do sorting in hash value - currently using following code

for (my $i = 0 ; $i< $#docs ; $i++) {
for (my $j = 0 ; $j< $#docs-$i ; $j++) {
if( $docs[$j]->{"LAST_MODIFIED"} > $docs[($j+1)]->{"LAST_MODIFIED"} )
{
my $temp = $docs[$j+1];
@docs[$j+1] =$docs[$j] ;
@docs[$j] = $temp;
}
}
}
thanks

Hasan   

Posted at 11:17am on Wednesday, May 14th, 2008

Thanks.

me   

Posted at 11:48am on Friday, June 6th, 2008

google 1 time
string = "perl sort"
got me this link on 1 link! hauhauhuau
xD
it all depends on how is googling! ;)

raghu   

Posted at 5:01am on Tuesday, July 29th, 2008

excellent article on the cmp operator. thank you. :)

Anonymous   

Posted at 11:54pm on Monday, August 11th, 2008

How to remove duplicate entries from an array?

My task is to remove duplicate entries from $PATH

ken   

Posted at 12:22pm on Sunday, August 17th, 2008

sort by T#

perl god   

Posted at 11:30am on Tuesday, August 26th, 2008

nice website. I will tell my mommy about it.

Michael Martin   

Posted at 3:53am on Wednesday, August 27th, 2008

More About.com: Perl articles need to be written in a similar style and format. This is an excellent article.

vortex   

Posted at 2:46pm on Wednesday, August 27th, 2008

I want to sort the following alphanumeric list:
@list = {C20,RL20,C3,BAT10,C9,RL190,BAT8,C101,RL4}

I do:
@sorted = sort { $a cmp $b } @list;

The sorted result is:
BAT10,BAT8,C101,C20,C3,C9,RL190,RL20,RL4

Instead:
BAT8,BAT10,C3,C9,C20,C101,RL4,RL20,RL190

What I'm doing wrong ??

DUncan   

Posted at 4:08am on Wednesday, September 3rd, 2008

You're sorting on the string and, for instance, BAT1 < BAT8
and so BAT10 < BAT 8 the same way alps comes before alt in a dictionary.

What you want is natural sort (?) If you've not found it already, look at http://www.perlmonks.org/?node_id=466268

Anonymous   

Posted at 12:41pm on Monday, September 8th, 2008

My situation is slightly different than vortex's:

@datesearch = sort {$b $a}} @datesearch;

where each record in @datesearch is:

$yr$mon$day$name

so it sorts these two data points as follows:

07101418mm duration version
080906semroc space

Where as is should be the other way around. I tried the recommendations using substr($a,0,6), and cmp verses , but can't seem to influence change here.

Help!? and thanks.

Anonymous   

Posted at 1:12pm on Monday, September 8th, 2008

As long as I'm here and asking... does perl sort have a limit in the size that it compares? If I have have a text that is 256 charaters and only different on the last character.. will it sort it? Should I work to reduce the length of the strings to increase speed?

HrZ   

Posted at 12:44am on Tuesday, September 30th, 2008

Hi
How should i sort array @fxed=[1,23][2,21][3,87]...
[occurance,number]
Array is populated on loop ($fxed[$i])=join(",",$i,$e[$i]);

Jaime   

Posted at 8:16am on Monday, October 6th, 2008

Help, I have these loops.....

foreach $i (0..$counter){
foreach $j ($i..$counter2){
open DAT, ">>$vec2[$j][2].dat";
push (@data, $vec2[$j][3], $vec1[$i][1]);
printf DAT "$vec2[$j][3] $vec1[$i][1]n";
last;
}
}

And it brings me out these info into a file :
1 391.692
4 -167.747
2 2.82568
3 200

What I want its that info sorted from min to max
1 391.692
2 2.82568 and so on, etc etc....

I have tried this inside the loops but nothing happens! please what can I do? THANKS A LOT !

@data_s= sort {$a->[0]$b->[0]}@data;
printf DAT $data_s[$j][$i];

yakeen   

Posted at 2:04pm on Wednesday, October 8th, 2008

hi can any one sort the numbers without using sorting techniques

Daniel   

Posted at 4:24pm on Thursday, October 9th, 2008

@Jamie:
First thing I notice is the unconditional 'last' call in the inner loop, which would seem to defeat the purpose of even having an inner loop. The following code is functionally equivalent to what you had above:

foreach $i (0..$counter){
open DAT, ">>$vec2[$i][2].dat";
push (@data, $vec2[$i][3], $vec1[$i][1]);
printf DAT "$vec2[$i][3] $vec1[$i][1]n";
}

However I suspect that this would be a lot simpler if you:
A) drop the counter variables and loop over you array elements directly (after sorting as required), optionally use the 'for $i ( 0 .. $#vec2[.. )' construct if you really need to know $i
B) drop the [3] element in the final subscript of your array - it appears to just be a line number for the output, which would be better to generate inside the print loop rather than have it tied to the array.

Keep in mind this is based on some assumptions which may not be correct, so you might be better to go back and read the following tutorials at perldoc.perl.org:
http://perldoc.perl.org/perldsc.html
http://perldoc.perl.org/perllol.html

Sorry I didn't answer your question directly, but I really think that without a better understanding of the topics discussed in those tutorials above you'll just run into more problems.

Cheers,
Daniel

Daniel   

Posted at 4:37pm on Thursday, October 9th, 2008

@yakeeen & Sujeet:
You're asking if you can sort without sorting? I'm confused. If you want a list sorted, you use these sorting techniques. You can of course roll-your-own sorting method instead, but using perl's native 'sort' function is by far the quickest, easiest and most efficient, so in almost all cases you are best to use it. Read http://perldoc.perl.org/functions/sort.html and then this page again if you don't understand how best to use it.

Jonathan Leto   

Posted at 11:06pm on Friday, October 10th, 2008

If you are doing numeric sorts you can get a 2x speedup by using Math::GSL::Sort (part of Math::GSL), which also has functions like "find the k smallest" and "find the k largest".

Cheers

pravenn   

Posted at 10:57pm on Wednesday, October 22nd, 2008

i have one big file contain data like
Term0|Term2: 0000023|0000103 0044237 2 0.463338365057
Term0|Term3: 0000023|0000105 0044238 2 0.422143662
Term0|Term4: 0000023|0000154 0044238 2 0.422143662
Term0|Term5: 0000023|0000160 0008150 0 1.0
Term0|Term6: 0000023|0000162 0044238 2 0.422143662
Term0|Term8: 0000023|0000256 0044237 2 0.463338365057
Term0|Term9: 0000023|0000270 0005975 3 0.0445726374991
Term0|Term10: 0000023|0000271 0005975 3 0.0445726374991
Term0|Term11: 0000023|0000272 0005975 3 0.0445726374991
Term0|Term12: 0000023|0000746 0008150 0 1.0
Term0|Term13: 0000023|0000902 0008150 0 1.0
Term0|Term14: 0000023|0000917 0008150 0 1.0
Term0|Term15: 0000023|0000918 0008150 0 1.0
Term0|Term16: 0000023|0001514 0044238 2 0.422143662
Term0|Term17: 0000023|0001522 0044238 2 0.422143662
Term0|Term18: 0000023|0001539 0008150 0 1.0

I wants to extract the values in between 0 to 0.05
The value meens the last Column value....

frinz rodney   

Posted at 9:50pm on Wednesday, November 12th, 2008

hi!!!!!!!!!!!!!!!!!!!!!!!good job..............
thnkz 4 ur comcrn

graeme   

Posted at 2:39pm on Thursday, November 20th, 2008

cheers!

paul   

Posted at 5:54pm on Thursday, December 11th, 2008

thanks for really nice article!
i m facing a little prblem now. pls kindly help me.
i want to sort list which contanins lists as folows

@data1=(12000,"cccc");
@data2=(10000,"dddd");
@data3=(90000,"bbbb");
@data4=(1200,"aaaa");

@mainArray=(@data1,@data2,@data3,@data4);

@list2 = sort customerSortByName (@mainArray);

pls kindly reply me how can i sort this?

thanks again..

lawson   

Posted at 10:51am on Monday, January 5th, 2009

OK, how about this one:

I have a %hash where the keys are strings, and the values are integers. I want to turn this into a sorted @list based on the integer values stored in %hash, largest to smallest.

Example:
${"key1"} = 1;
${"key2"} = 2;
${"key3"} = 3;
...
${"keyN"} = N;

The integer values are conveniently chosen in the example for illustration purposes only. Any valid integer could be stored in any location of %hash;

sorting should yield the following:

$list[0] = N;
$list[1] = N-1;
$list[2] = N-2;
...
$list[N-1] = 1;

I'm sure this is probably easy, but it's my first day back on the job since the holidays, so my head is still elsewhere. :-)

Thanks for the help!

Perlfect   

Posted at 11:08am on Monday, January 5th, 2009

sort {$b <=> $a} values %hash; should do it.

lawson   

Posted at 11:18am on Monday, January 5th, 2009

Oops! The sorted result actually needs to contain the keys (not the stored integers), and so should look like the following:

$list[0] = "keyN";
...
$list[N-3] = "key3";
$list[N-2] = "key2";
$list[N-1] = "key1";

The result doesn't need to be a @list. Another %hash is OK, as long as the keys in the new %hash are (obviously) unique and easily sorted, such as integers (but not the stored integers of the original hash, as they are not necessarily unique):

$hash2{0} = "keyN";
...
$hash2{N-3} = "key3";
$hash2{N-2} = "key2";
$hash2{N-1} = "key1";

Sorry for mis-stating the desired result of the original problem.

Perlfect   

Posted at 11:22am on Monday, January 5th, 2009

So you want to sort the keys of the hash in decreasing value order:

sort {$hash{$b} <=> $hash{$a}} keys %hash;

lawson   

Posted at 4:15pm on Monday, January 5th, 2009

I'm sorry. I'm not stating the problem clearly enough.

I'm going to use a very concrete example instead of trying to state the problem in the abstract.

Suppose I go to the zoo and count the animals. The keys for %hash are the animal names, and the values stored in %hash are the counts for each type of animal:

$hash{"monkey"} = 12
$hash{"duck"} = 3
$hash{"horse"} = 9
$hash{"pig"} = 27
$hash{"alligator"} = 9

I want to output the animal names based on the number of animals I counted of each type, ranked most to least:

$list[0] = "pig"; # 27
$list[1] = "monkey"; # 12
$list[2] = "horse"; # 9
$list[3] = "alligator"; # 9
$list[4] = "duck"; # 3

The "horse" and "alligator" entries may be swapped and still yield a valid result, as they both have an assigned count of 9.

Hope this is more clear. Sorry for the confusion.

Perlfect   

Posted at 6:32pm on Monday, January 5th, 2009

Look at my previous response. It should so exactly what you want, ie, return a list of animals ordered by decreasing count.

Comments to date: 75.

Your name:
Your comments:

Security check *

 

Like it? Share it!

  Post to del.icio.us
Post to
del.icio.us
   

Hosted Perlfect Search(beta)

New
Don't have the time or the expertise to install and maintain Perlfect Search? Then our freehosted Pelrfect Search service is for you!

Suggested Reading

Effective Perl Programming Effective Perl Programming is a thoroughly involved and enjoyable book about perl programming. This book does not merely show you how to get your job done, but also shows you effective and elegant solutions to perl programming problems. Effective perl programming is not a manual. It is a book about good programming in the mindset of the perl language. If you like perl and you enjoy programming then you are sure to like this book.

The Regex Tutor

Try your regular expressions online with the regex tutor.