Let's get a list of all duplicate postcodes (could be any string of course) out of a specific field in the database. This could be handled in PHP but, with a decently spec'd server especially, it's quicker to handle it in the database.
The basic query looks like this: (this example is from WordPress but the principle is the same regardless)
global $wpdb;
$meta_key = 'customer_address_postcode';
$postcode_counts = $wpdb->get_results( $wpdb->prepare(
"
SELECT meta_value as postcode, COUNT(*) as count
FROM $wpdb->postmeta
WHERE meta_key = %s
GROUP BY meta_value
HAVING count > 1
ORDER BY count DESC
",
$meta_key
) );
This will return an array of objects each with 2 values – postcode and count, in descending order of the number of times they occur, and only if they occur more than once.
However, spaces have no meaning in a string like this. BS1 3XX, BS13XX and even B S 1 3 X X are all the same postcode and need to be considered duplicates.
Ignoring spaces
I'm not a SQL expert, and there are a lot of functions that I have never heard of, but there is the very sensibly named REPLACE
function that replaces all occurances of a specified string within a string. It would perhaps be more common to see it used when updating the database, but we can use it to ignore spaces in our GROUP BY
.
global $wpdb;
$meta_key = 'customer_address_postcode';
$postcode_counts = $wpdb->get_results( $wpdb->prepare(
"
SELECT meta_value as postcode, COUNT(*) as count
FROM $wpdb->postmeta
WHERE meta_key = %s
GROUP BY REPLACE( meta_value, ' ', '' )
HAVING count > 1
ORDER BY count DESC
",
$meta_key
) );
This comes with a performance penalty of course but depending on how frequently you need to run the query, and if you can cache the results, it is an effective solution to matching like this. You could also add the same REPLACE into the SELECT if you want the results to be spaceless as well.
Leave a Reply