Never use ORDER BY RAND() again!

Posted on March 5, 2008. Filed under: mysql+, PHP | Tags: , , , , , |

I have been guilty of using ORDER BY RAND() in MySQL queries to return random records from time to time, but everyone knows to avoid it like the plague. For those who agree with me, aside from using it on very small tables [1 – 500 records], I have an elegant workaround using PHP.


<?php
    $condition = true;
    while ($condition) {
        $randID = rand(1,$totalRecordsInTblExample);
        $r = mysql_query("SELECT * FROM `tblexample` WHERE `exampleID`=$randID LIMIT 1");
        if (mysql_num_rows($r) == 1) {
            $condition = false;
        }
    }
    $exampleTitle = mysql_result($r,0,"exampleTitle");
?>

The very first thing we need to know is how many records there are within the table in question, if the table is MyISAM, you’re in luck. Getting the total number of records is quick and inexpensive since MyISAM keeps an index of how many records a given table contains:


<?php
    $r = mysql_query("SELECT COUNT(*) AS `count` FROM `tblexample`");
    $numRecords = mysql_result($r,0,"count");
?>

However, don’t attempt this on an InnoDB table! InnoDB would need to count each and every row every time, which would be expensive on large tables. I keep a reference of record counts in a MyISAM table [for stats purposes], so I query that table to get the record count instead.

Once we have determined how many records are in the table, we can use PHP to [duh] select a random number between 1 and the total number of records.

Once we have that random number, query the DB to see if the record exists, and if not try again. Since the column in the where clause is the Primary Key [I assume], the lookups will be very fast and cheap.

Advertisements

Make a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

19 Responses to “Never use ORDER BY RAND() again!”

RSS Feed for Adventures in PHP / DHTML / CSS and MySQL Comments RSS Feed

“I have been guilty of using ORDER BY RAND() in MySQL queries to return random records from time to time, but everyone knows to avoid it like the plague”

I dont understand with your statement, what’s wrong with using ORDER BY RAND() ?

Using ORDER BY RAND() on medium to large table is extremely inefficient. Google “ORDER BY RAND” to see what I mean.

Hi there, My table doesn’t have any Id like 1,2,3, but last developer using employee ID for main Id (such as 2007.122, 2007.125, etc.). How to randomize it by row ordering? is it possible in MySQL?

You could apply the same concept to the LIMIT clause. If you know how many rows there are in the table, just append LIMIT $randID,1 … it wouldn’t be a PK lookup, but the performance difference would be trivial in most cases.

If you add/delete rows in a table the $totalRecordsInTbleExample could be lower than the exampleID field actual numbers. Also there could be missing numbers within that field. So there is possibility of not getting your random result. I don’t see the problem using ORDER BY RAND(), never has caused me a problem.

I’ve used this technique as well and it works great. Just wanted to add that for people who don’t have row primary ids like 1,2,3 etc due to deletes. You could easily just do “WHERE `exampleID` >=$randID LIMIT 1” This will grab where greater than if the random is missing and still be much faster than ORDER BY RAND() on larger data sets. :)

Hello!
Very Interesting post! Thank you for such interesting resource!
PS: Sorry for my bad english, I’v just started to learn this language ;)
See you!
Your, Raiul Baztepo

why use either? “Hey I just built this really cool function that randomly orders things!!” “Why did you take the time / energy to do that instead of just using what the language already provides?” “Because this function is super awesome, dude!” “Yeah ok. *fires you*”

getMyRows(); // or whatever
shuffle( $r );
output( $r ); // or whatever
?>

It’s more expensive ( or at least slower given intervening networking of any kind ) to keep getting one row @ a time than to just get them all once and reorder in memory once. Plus, if your developer is getting paid for time, it takes less time, and developers are generally more expensive than cpu cycles. By alot.

If the result set is something of a size that would make the above approach unfeasible, then it’s still better to have php deal with it instead of constantly pinging the db:

getMyStuff(); // or whatever. Point being I only went to the trough once.
$r = array();
$used = array();
$total = count( $temp );
while ( count( $used )

Not sure if that’s all technically correct, just coding it on the fly, but the idea is there.

For small sets, put the load on the db. For large sets, find inexpensive ways to put the load on php. Either way, 1 query per page is still the idealized goal. ( 10 is fine, 10000 is just stupid ). The above solution mixes the two, resulting in more load and latency than choosing either one or the other.

Follow that rule of thumb leaving room for judgment calls and you’ll be in alot better shape.

you seem to be mixing number of rows and maximum id, which only makes sense if you never delete anything. the two ways which make a little more sense:

1. get number of rows in table (count(*))
2. use LIMIT r,1. this is fast if you order by primary key

1. get max id (select id from table order by id desc limit 1)
2. use your row-hunting method

Agreed… I had forgotten to mention that this was intended for a table without any sequential PKs missing — which is where I’m using it. I’ve been meaning to overhaul this post with this in mind :-)

opps, didn’t mean to reply to the last comment. confusing!

How do you use this technqiue for queries with inner joins, left joins, right joins — simply counting the records and randomizing the ID just won’t work.

This is indeed a simple and elegant solution – if you don’t have holes in your exampleID. If some records may be missing then you have a problem. But that can be solved if you update exampleID the smart way (if you insert a record, put in max(exampleI)+1; if you delete it, get the record with highest exampleID and put the deleted record’s exampleID in it).

Also, your solution only retrieves 1 record from DB, not multiple. It can be changed to do so, though.

But not much can help you if you have a where constraint:
select * from mytable where somefile=3 and someotherfield=5 order by rand() limit 3;

I am still looking for a solution to that case. Currently I am assigning a rand() every few minutes to some field and selecting like this:
$rand=rand(1,99999999)’;
select * from mytable where somefield=3 and someotherfield=5 and randfield>=$rand order by randfield limit 1

If anyone has a better idea… Do share. :)

Here is the bug report, let’s hope MySQL developers step up and solve this issue for us so we don’t have to reinvent this functionality:
http://bugs.mysql.com/bug.php?id=33837

You can help by voting for it (hint ;).

I agree with you 100%… this is great if you don’t have holes in your PKs…. and now since I do [for the same application I wrote this for, I’m doing something like this:

maxPK = SELECT MAX(`id`) AS `maxPK` FROM `tbl`
minPK = SELECT MIN(`id`) AS `maxPK` FROM `tbl` -- Can't assume 1 is the lowest PK [primary key]
randPK = rand(minPK,numPK)
row = SELECT * FROM `tbl` WHERE `id`<=randPK -- worst case scenario, it returns the last record in the DB.

It might be better to use rand(1,numPK) and get rid of one select. Then if the last query fails you can rerun it with id>randPK.
In worst case this solution is equivalent to yours, in best case it saves one select.
Of course, this is only true because the selects are more or less equally expensive.

But this does not really matter, because the distribution of this approach is not even. If you have ids 1,2 and 4 then ‘2’ is twice as likely to be chosen. This is the reason I haven’t mentioned this approach – it is not random.

There are up to 10 different approaches, and they all suck one way or the other. :)

Another one is:
select count(*) from mytable
$rand=rand(1,numRec)
select * from mytable order by id limit $rand,1

BUT please don’t use this. Yes you can use ‘where’ with it. Yes you can have holes. Yes it is simple. Yes you can get more IDs at once, if you build a function that returns N different randoms from X values. Yes the distribution is even.
But it is also (probably) terribly inefficient. In worst case it is probably as inefficient as rand(). If you must take this approach, at least benchmark it. i haven’t done it because… Well, I think it is inefficient. :) But yes, MySQL could optimize that kind of query, so YMMV.

I’m still struggling to find a simple alternative to ORDER BY RAND(). Whats the answer?

Doesn’t work on tables that have from time to time skipped id values from deleting records.

This one stops the table scan as soon as 1000 rows have been found and therefore returns sooner. Of course this bogs down the randomness a bit, but perhaps this is good enough in your case.


Where's The Comment Form?

Liked it here?
Why not try sites on the blogroll...

%d bloggers like this: