Never use ORDER BY RAND() again!
I have been guilty of using ORDER BY RAND() in MySQL queries to return random records from time to time, but everyone knows to avoid it like the plague. For those who agree with me, aside from using it on very small tables [1 - 500 records], I have an elegant workaround using PHP.
<?php
$condition = true;
while ($condition) {
$randID = rand(1,$totalRecordsInTblExample);
$r = mysql_query("SELECT * FROM `tblexample` WHERE `exampleID`=$randID LIMIT 1");
if (mysql_num_rows($r) == 1) {
$condition = false;
}
}
$exampleTitle = mysql_result($r,0,"exampleTitle");
?>
The very first thing we need to know is how many records there are within the table in question, if the table is MyISAM, you’re in luck. Getting the total number of records is quick and inexpensive since MyISAM keeps an index of how many records a given table contains:
<?php
$r = mysql_query("SELECT COUNT(*) AS `count` FROM `tblexample`");
$numRecords = mysql_result($r,0,"count");
?>
However, don’t attempt this on an InnoDB table! InnoDB would need to count each and every row every time, which would be expensive on large tables. I keep a reference of record counts in a MyISAM table [for stats purposes], so I query that table to get the record count instead.
Once we have determined how many records are in the table, we can use PHP to [duh] select a random number between 1 and the total number of records.
Once we have that random number, query the DB to see if the record exists, and if not try again. Since the column in the where clause is the Primary Key [I assume], the lookups will be very fast and cheap.
“I have been guilty of using ORDER BY RAND() in MySQL queries to return random records from time to time, but everyone knows to avoid it like the plague”
I dont understand with your statement, what’s wrong with using ORDER BY RAND() ?
cevarief
March 6, 2008
Using ORDER BY RAND() on medium to large table is extremely inefficient. Google “ORDER BY RAND” to see what I mean.
pureform
March 6, 2008
Hi there, My table doesn’t have any Id like 1,2,3, but last developer using employee ID for main Id (such as 2007.122, 2007.125, etc.). How to randomize it by row ordering? is it possible in MySQL?
Irfander
June 2, 2008
You could apply the same concept to the LIMIT clause. If you know how many rows there are in the table, just append
LIMIT $randID,1… it wouldn’t be a PK lookup, but the performance difference would be trivial in most cases.pureform
June 3, 2008
If you add/delete rows in a table the $totalRecordsInTbleExample could be lower than the exampleID field actual numbers. Also there could be missing numbers within that field. So there is possibility of not getting your random result. I don’t see the problem using ORDER BY RAND(), never has caused me a problem.
joe
June 26, 2008
I’ve used this technique as well and it works great. Just wanted to add that for people who don’t have row primary ids like 1,2,3 etc due to deletes. You could easily just do “WHERE `exampleID` >=$randID LIMIT 1″ This will grab where greater than if the random is missing and still be much faster than ORDER BY RAND() on larger data sets. :)
Bill
February 10, 2009
Hello!
Very Interesting post! Thank you for such interesting resource!
PS: Sorry for my bad english, I’v just started to learn this language ;)
See you!
Your, Raiul Baztepo
RaiulBaztepo
March 28, 2009
why use either? “Hey I just built this really cool function that randomly orders things!!” “Why did you take the time / energy to do that instead of just using what the language already provides?” “Because this function is super awesome, dude!” “Yeah ok. *fires you*”
getMyRows(); // or whatever
shuffle( $r );
output( $r ); // or whatever
?>
It’s more expensive ( or at least slower given intervening networking of any kind ) to keep getting one row @ a time than to just get them all once and reorder in memory once. Plus, if your developer is getting paid for time, it takes less time, and developers are generally more expensive than cpu cycles. By alot.
If the result set is something of a size that would make the above approach unfeasible, then it’s still better to have php deal with it instead of constantly pinging the db:
getMyStuff(); // or whatever. Point being I only went to the trough once.
$r = array();
$used = array();
$total = count( $temp );
while ( count( $used )
Not sure if that’s all technically correct, just coding it on the fly, but the idea is there.
For small sets, put the load on the db. For large sets, find inexpensive ways to put the load on php. Either way, 1 query per page is still the idealized goal. ( 10 is fine, 10000 is just stupid ). The above solution mixes the two, resulting in more load and latency than choosing either one or the other.
Follow that rule of thumb leaving room for judgment calls and you’ll be in alot better shape.
yinrunning
April 8, 2009
you seem to be mixing number of rows and maximum id, which only makes sense if you never delete anything. the two ways which make a little more sense:
1. get number of rows in table (count(*))
2. use LIMIT r,1. this is fast if you order by primary key
1. get max id (select id from table order by id desc limit 1)
2. use your row-hunting method
cal
May 6, 2009
Agreed… I had forgotten to mention that this was intended for a table without any sequential PKs missing — which is where I’m using it. I’ve been meaning to overhaul this post with this in mind :-)
pureform
May 6, 2009
opps, didn’t mean to reply to the last comment. confusing!
cal
May 6, 2009
How do you use this technqiue for queries with inner joins, left joins, right joins — simply counting the records and randomizing the ID just won’t work.
AD
September 19, 2009
This is indeed a simple and elegant solution – if you don’t have holes in your exampleID. If some records may be missing then you have a problem. But that can be solved if you update exampleID the smart way (if you insert a record, put in max(exampleI)+1; if you delete it, get the record with highest exampleID and put the deleted record’s exampleID in it).
Also, your solution only retrieves 1 record from DB, not multiple. It can be changed to do so, though.
But not much can help you if you have a where constraint:
select * from mytable where somefile=3 and someotherfield=5 order by rand() limit 3;
I am still looking for a solution to that case. Currently I am assigning a rand() every few minutes to some field and selecting like this:
$rand=rand(1,99999999)’;
select * from mytable where somefield=3 and someotherfield=5 and randfield>=$rand order by randfield limit 1
If anyone has a better idea… Do share. :)
Here is the bug report, let’s hope MySQL developers step up and solve this issue for us so we don’t have to reinvent this functionality:
http://bugs.mysql.com/bug.php?id=33837
You can help by voting for it (hint ;).
Anze
November 11, 2009
I agree with you 100%… this is great if you don’t have holes in your PKs…. and now since I do [for the same application I wrote this for, I'm doing something like this:
maxPK = SELECT MAX(`id`) AS `maxPK` FROM `tbl`
minPK = SELECT MIN(`id`) AS `maxPK` FROM `tbl` -- Can't assume 1 is the lowest PK [primary key]
randPK = rand(minPK,numPK)
row = SELECT * FROM `tbl` WHERE `id`<=randPK — worst case scenario, it returns the last record in the DB.
pureform
November 11, 2009
It might be better to use rand(1,numPK) and get rid of one select. Then if the last query fails you can rerun it with id>randPK.
In worst case this solution is equivalent to yours, in best case it saves one select.
Of course, this is only true because the selects are more or less equally expensive.
But this does not really matter, because the distribution of this approach is not even. If you have ids 1,2 and 4 then ’2′ is twice as likely to be chosen. This is the reason I haven’t mentioned this approach – it is not random.
There are up to 10 different approaches, and they all suck one way or the other. :)
Another one is:
select count(*) from mytable
$rand=rand(1,numRec)
select * from mytable order by id limit $rand,1
BUT please don’t use this. Yes you can use ‘where’ with it. Yes you can have holes. Yes it is simple. Yes you can get more IDs at once, if you build a function that returns N different randoms from X values. Yes the distribution is even.
But it is also (probably) terribly inefficient. In worst case it is probably as inefficient as rand(). If you must take this approach, at least benchmark it. i haven’t done it because… Well, I think it is inefficient. :) But yes, MySQL could optimize that kind of query, so YMMV.
Anze
November 12, 2009
[...] [...]
Anonymous
July 25, 2010
I’m still struggling to find a simple alternative to ORDER BY RAND(). Whats the answer?
John Ryan
July 9, 2011
Doesn’t work on tables that have from time to time skipped id values from deleting records.
Matt DeKok
December 28, 2011
This one stops the table scan as soon as 1000 rows have been found and therefore returns sooner. Of course this bogs down the randomness a bit, but perhaps this is good enough in your case.
Mitchell X. Owens
January 31, 2013