bnks.xyz

Menu

Category: Blog

Mapping series – an aside, a note on views in MySQL

While working on a project recently – a mapping one which required complex queries – I worked more with MySQL views. They undeniably make looking at complex data sources easier – you can see the aggregated data while masking the complexity of queries behind them. However they have 2 significant downsides:

  1. They do not actually make the queries any faster. Without anything additional, they are just a presentational aid that makes things appear simpler.
  2. They store the username and password that initialises them; this means that when migrating the database – from staging to live for example – if the same credentials don’t exist on both locations you will get errors. Nothing insurmountable, but care must be taken.

That’s not to say there’s anything wrong with views. However this experience gave me the idea that their place, in the work I do, should be very limited. When you are writing both the database and code to call it, a well written interface provides all the benefits of abstraction with none of the risk.

Mapping series – MySQL queries against latitude and longitude

I’m starting this series with the simplest aspect – a MySQL query to return rows based on radius from a point.
This is well trodden territory, although not always explained clearly.

The maths of the query is all done for you, all you need is to decide if you are using miles of kilometres, and supply a radius in the same units, along with a latitude and longitude for the centre of the search.

Units of search

Consistence matters here, rather than whether you choose miles or kilometres.
If you choose miles, then the earths radius value is 3959, and for kilometres use 6371. Then you must ensure that the radius is the in the same units otherwise you won’t get the results you expect.

The MySQL query

Here is an example query; select an ID, the latitude, longitude and distance from the centre of the search, ordered by distance from the centre of the search.

SELECT
	'ID',
	'latitude' as 'lat',
	'longitude' as 'lng',
	(
		{earths radius - miles or kilometres} * acos(
			cos(
				radians( {center for search - latitude} )
			) * cos(
				radians(
					'latitude'
				)
			) * cos(
				radians(
					'longitude'
				) - radians( {center for search - longitude} )
			) + sin(
				radians( {center for search - latitude} )
			) * sin(
				radians(
					'latitude'
				)
			)
		)
	) AS distance
FROM data_table
HAVING distance < {radius for search - units to match earths radius}
ORDER BY distance

I won’t attempt to explain the math of it – for that I suggest you read up on the Haversine formula – but implementation is very straight forward as you can see.

In a later part fo this series I’ll show how this can modified to work with data stored in WordPress and it’s table structure, as well as complicate it further with an Advanced Custom Fields repeater system.

Mapping series – Working with Algolia Places address autocompletion api

For a recent mapping project I implemented Algolia Places for address autocompletion to turn an address into latitude and longitude for querying the database. In the past we’ve used Google Maps, but since this project wasn’t using Google Maps for the map display, using the Places API just for Geocoding is against their terms of use. It turns out this was a blessing in disguise – Algolia is fast, easy to implement, and very affordable. There was however one small hitch – the documentation get’s a bit patchy when you go past a basic implementation. To be fair to them – it’s actually because they start presuming that you’ll be using their multipurpose algoliasearchLite.js library rather than the simpler places.js.

Setting up autocompletion

The example from the documentation only needs a small extension to setup – populating hidden latitude and longitude fields from the returned data by using the ‘change’ event:

<script>
var places = places({
    appId: 'YOUR_PLACES_APP_ID',
    apiKey: 'YOUR_PLACES_API_KEY',
    container: address_input
});
places.on('change', function(e) {
    address_input.textContent = e.suggestion.value
    latitude_input.value = e.suggestion.latlng.lat || '';
    longitude_input.value = e.suggestion.latlng.lng || '';
});
</script>

Reverse geocoding

To give users a number of options, we also provided a geolocation button that uses the Geolocation API to let them search using the location reported by their system. The API returns latitude and longitude coordinates. While this is all that is needed to query the database – the UX isn’t ideal as it wouldn’t give a user readable representation of the location. This is important in case the returned location is wrong. Converting the coordinates into an address is called reverse geocoding.

The Places documentation has an example of reverse geocoding but unfortunately this is one that uses the wrong library. While there isn’t official support for Places, Algolia staff do monitor StackOverflow and help where they can. Luckily one such employee, Oliver, saw my query and got me on the right track.

To make a query you pass a compound latitude/longitude string value, and then an object of any options you want to change. For example:

places.reverse(
    '52.817078,-4.5697185',
    { hitsPerPage: 1 }
)

In another difference from algoliasearchLite.js – the response when using places.js is just the array of results. This makes utilising the results trivial. For example:

places.reverse(
    '52.817078,-4.5697185',
    { hitsPerPage: 1 }
).then(function(response){
    var suggestion = response[0];
    if (suggestion && (suggestion.suburb || suggestion.city)) {
        address_input.value = suggestion.suburb || suggestion.city || suggestion.county;
        address_input.value += ', ' + suggestion.country;
    }
});

Here I’ve chosen to populate the address text input field with the town (aka suburb) if available, and then the country. This gives enough information to orientate the user with what search is being done, without distracting them with the potentially/likely inaccurate house number and road level data.

From my experience so far I’d highly recommend you evaluate Algolia Places for your next autocompletion or geocoding project. The only downside I’ve found, common to all providers that rely on OSM data, is that you can’t reliable search by UK postcodes. In a subsequent post I’ll cover implementing getAddress.io – an API to turn UK postcodes into addresses using the Royal Mail PAF data.

New theme, dark mode, upcoming blogs

Trying to give myself a nudge to complete some 1/2 written posts, and write the unwritten ones, I decided to switch out the theme (again).

I’ve gone with a very minimal theme called Susty by Jack Lenox. It’s a very interesting concept – bytes matter, not just to be the fastest loading or get the highest score on a metric, but because websites have an impact on the planet. Datacentres use a lot of power, carbon, and land. Not to mention the impact on individual users – loading time affects how long a device has to be used as well as how long the radio has to be active (and so how much power is used). Design also has an impact, with lighter colours and low contrast needing more brightness from the screen to be readable. And don’t forget those bytes – unreliable and slow connections are common (“near dial-up speeds” are still common in the U.S.A.), and it’s well known that “users hate slow sites“. Leaner is better.

So. This is Susty – well, my fork of Susty. So far I’ve made some minor aesthetic changes, some accessibility changes, and introduced support for ‘dark mode’.

Dark mode

In technical terms this was just a few lines of CSS using the experimental prefers-color-scheme media query. This is still in the draft stage, but with popularity for dark modes shooting up after Mac OS 10.14 introduced them OS-wide, and support already in place in Firefox and Safari, it looks likely to stay. If your browser reports that you prefer a dark interface, the colour scheme of the site changes (basically inverts) automatically. I’m considering inverting this so dark is the default, but that’s for another day.

Plans for Susty

I have a couple of items on my todo list for Susty –

There are no guarantees – but I hope to keep some momentum and have my Susty polished to my liking over the next month.

Future blogs

In no particular order, these should get published over the next 3 months:

Exclude certain posts from search

Excluding posts (or custom post types) from search results when they meet certain criteria is easy in theory. Just use the pre_get_posts action to modify the WP_Query arguments used to get the results. However there are a couple of small pitfalls to watch out for. Below are brief explanations and snippets for excluding posts based on a taxonomy term and a custom meta field value.

Excluding posts by custom meta field

This seems obvious – a meta_query with a != compare statement. However if you stop there, you will only get posts which have the custom meta field you are matching, but not the value. Posts – e.g. different post types – that don’t even have the meta field will be excluded. To work around this we need to add another part to the query that uses the a NOT EXISTS compare and join them with an OR relation.

Exclude posts by taxonomy term

Taxonomy terms are slightly easier – but the trick is in the choice of operator – it has to be NOT IN rather than NOT EXISTS or  !=. While they all sound like they would work, only using NOT IN will allow you to exclude posts that have that term, while including all posts with other terms in the same taxonomy and also posts which do not have that taxonomy at all.

Here’s the full snippet:

 

Selecting title, meta and multiple term names with MySQL

I often run one off reports for clients directly from the MySQL database behind their (normally WordPress) websites. Most are straightforward, perhaps requiring data from the wp_posts and wp_postmeta tables. However it get's a lot more complicated if term data is needed too. This is because determining which terms belong to an object, what taxonomy they are, and what the human readable values are requires accessing 3 seperate database tables.

I'm going to step you through an example that generates a list of WooCommerce products with thier title, SKU, price and categories. You can skip the the full code if you don't want the explanation. While this example is based around WooCommerce, it would equally apply to creating a list of and post type with meta and taxonomy data.

The aim

For each product we are going to be retrieving:

  • post_title from the wp_posts table
  • meta_value from the wp_postmeta table where the meta_key is _sku
  • meta_value from the wp_postmeta table where the meta_key is _regular_price
  • name value(s) from the wp_terms table

Post title

To start with, let's get the Product names – this is the post_title field. We'll also restrict it to published Products.

SELECT
	wp_posts.post_title
	FROM wp_posts
WHERE wp_posts.post_type = 'product'
AND wp_posts.post_status = 'publish'

SKU and Price

The SKU and Price can be pulled in from the wp_postmeta table. To make sure that we only get the postmeta values we need we'll use a LEFT JOIN that matches the ID column from wp_posts to the post_id column, and restrict it to values where the meta_key is the one we are looking for. This is done once for each piece of meta.

SELECT
	wp_posts.post_title
	wp_postmeta1.meta_value
	wp_postmeta2.meta_value
FROM wp_posts
LEFT JOIN wp_postmeta wp_postmeta1
	ON wp_postmeta1.post_id = wp_posts.ID
	AND wp_postmeta1.meta_key = '_sku'
LEFT JOIN wp_postmeta wp_postmeta2
	ON wp_postmeta2.post_id = wp_posts.ID
	AND wp_postmeta2.meta_key = '_regular_price'
WHERE wp_posts.post_type = 'product'
AND wp_posts.post_status = 'publish'

You can see that the LEFT JOIN is also naming the table during the join as wp_postmeta1/2 – this means that we can reference them seperately in the SELECT.

Product category

This is where it gets a bit more complex. As before, LEFT JOINs are used to access specific data from other tables – wp_term_relationships, wp_term_taxonomy and wp_terms.

First we match object_id from wp_term_relationships against the ID of each post. This allows access to the term_taxonomy_id for each term assigned to each product.

LEFT JOIN wp_term_relationships
	ON wp_term_relationships.object_id = wp_posts.ID

Next we match the term_taxonomy_id against the same column of the wp_term_taxonomy table so that we can access the relevant term_id. Additionally we restrict it to terms that belong to the right taxonomy – product_cat.

LEFT JOIN wp_term_taxonomy
	ON wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
	AND wp_term_taxonomy.taxonomy = 'product_cat'

Finally we can match that term_id against the term_id in the wp_terms table which gives us access to the human readable name of each term.

LEFT JOIN wp_terms
	ON wp_term_taxonomy.term_id = wp_terms.term_id

Our SELECT statement can now be extended to get the term names with wp_terms.name

Putting that all together gives us:

SELECT
	wp_posts.post_title,
	wp_postmeta1.meta_value,
	wp_postmeta2.meta_value,
	wp_terms.name
FROM wp_posts
LEFT JOIN wp_postmeta wp_postmeta1
	ON wp_postmeta1.post_id = wp_posts.ID
	AND wp_postmeta1.meta_key = '_sku'
LEFT JOIN wp_postmeta wp_postmeta2
	ON wp_postmeta2.post_id = wp_posts.ID
	AND wp_postmeta2.meta_key = '_regular_price'
LEFT JOIN wp_term_relationships
	ON wp_term_relationships.object_id = wp_posts.ID
LEFT JOIN wp_term_taxonomy
	ON wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
	AND wp_term_taxonomy.taxonomy = 'product_cat'
LEFT JOIN wp_terms
	ON wp_term_taxonomy.term_id = wp_terms.term_id
WHERE wp_posts.post_type = 'product'
AND wp_posts.post_status = 'publish'

Tidying up

We have all the data now, but the returned results are very messy. There will be multiple rows for each Product if they have more than one category, and they are in no particular order.

Sorting out the ordering is as simple as ORDER BY wp_posts.post_title ASC. To aggregate the rows we can use GROUP BY wp_posts.ID, but this then means that only 1 category will ever be shown. To solve that we can use the GROUP_CONCAT operator. Combining it with ORDER BY and supplying the optional SEPERATOR allows the categories to appear in a single column, in alphabetical order, and seperated with a comma and space.

GROUP_CONCAT( wp_terms.name ORDER BY wp_terms.name SEPARATOR ', ' )

The complete code

Remember – this is useful for ad hoc generation. If you want to retrieve this data regularly – whether to show it on the front end of the site or in the admin area, you should atleast use the native WordPress functions for accessing the database, and potentially take it from database queries to WP_Query and related abstrations.

Ignore spaces when group MySQL query results

Let's get a list of all duplicate postcodes (could be any string of course) out of a specific field in the database. This could be handled in PHP but, with a decently spec'd server especially, it's quicker to handle it in the database.

The basic query looks like this: (this example is from WordPress but the principle is the same regardless)

global $wpdb;
$meta_key = 'customer_address_postcode';
$postcode_counts = $wpdb->get_results( $wpdb->prepare(
"
SELECT meta_value as postcode, COUNT(*) as count
FROM $wpdb->postmeta
WHERE meta_key = %s
GROUP BY meta_value
HAVING count > 1
ORDER BY count DESC
",
$meta_key
) );

This will return an array of objects each with 2 values – postcode and count, in descending order of the number of times they occur, and only if they occur more than once.

However, spaces have no meaning in a string like this. BS1 3XX, BS13XX and even B S 1 3 X X are all the same postcode and need to be considered duplicates.

Ignoring spaces

I'm not a SQL expert, and there are a lot of functions that I have never heard of, but there is the very sensibly named REPLACE function that replaces all occurances of a specified string within a string. It would perhaps be more common to see it used when updating the database, but we can use it to ignore spaces in our GROUP BY.

global $wpdb;
$meta_key = 'customer_address_postcode';
$postcode_counts = $wpdb->get_results( $wpdb->prepare(
"
SELECT meta_value as postcode, COUNT(*) as count
FROM $wpdb->postmeta
WHERE meta_key = %s
GROUP BY REPLACE( meta_value, ' ', '' )
HAVING count > 1
ORDER BY count DESC
",
$meta_key
) );

This comes with a performance penalty of course but depending on how frequently you need to run the query, and if you can cache the results, it is an effective solution to matching like this. You could also add the same REPLACE into the SELECT if you want the results to be spaceless as well.

Case insensitivity on Apache servers

Regularly when we launch a new site we also put up a static scrape of the old site so that users can access content that has been removed from the new site. Commonly this is old news content that the client want’s to keep available for a transition period, but doesn’t want cluttering up the new site.

This is fairly straightforward – feed an application such as SiteSucker a URL, tweak a few settings and let it loose. The resulting files can be dropped into a /archive/ folder and a link put on the site to direct users there for old content.

The most common hitch we hit, is when the old site was hosted on a Windows server as IIS is, largely, case insensitive. This means that /folderone/image.jpg and /FolderOne/image.jpg will both resolve to the same file. Apache however is case sensitive and will treat those 2 as different paths resulting in broken links in our archive. This won’t necessarily be a problem, but the fact IIS doesn’t mind means that the developers and site maintainers can be a little sloppy with their capitalisation, and there are often multiple versions used.

Configuring apache to be case insensitive

I personally consider case sensitivity to be appropriate normally, but it is possible to allow Apache to match case insensitivly using mod_speling (and no, I haven’t mistyped ‘spelling’…). Mod_speling is included as part of the standard Apache module bundle, but may not be activated. You can check with apache2ctl -M and if you don’t see it, enable it with a2enmod speling and then restart apache with service apache2 restart (these commands might vary depending on which OS you are running).

Now it’s available for use, it needs to be enabled within your vhost configuration or in .htaccess. Since I only want it to affect /archive/ I created a .htaccess file there and added:

<IfModule mod_speling.c>
    CheckCaseOnly on
    CheckSpelling on
</IfModule>

Counterintuitively this tells mod_speling to only check for case mis-matches and not attempt to correct misspellings – there’s a full explanation of how to use mod_speling in the official documentation for it.

Problems with rewrites

This works fine, unless the root site is using rewrites – as any installation of a CMS will be. This is because mod_rewrite prevents mod_speling from working; but as long as you don’t need rewrites in /archive/ the solution is simple – turn off the rewrite engine in that directory. Just add RewriteEngine off to the start of /archive/.htaccess before the mod_speling block:

RewriteEngine off
<IfModule mod_speling.c>
    CheckCaseOnly on
    CheckSpelling on
</IfModule>

Reverse relationship query for last ACF repeater sub-field

Advanced Custom Fields (ACF) is a great WordPress plugin for adding custom meta fields. It has a very useful relationship field that can be used to denote a connection from one post to another – importantly this is a one-way relationship. When you are on PostA you can generate a list of all the posts that it is linked to.

Going in reverse

ACF documentation highlights how a clever WP_Query can be used to do a `reverse query`, i.e. when you view PostB you can get a list of all the posts that link to PostB.

What about sub-fields

The reverse query works fine as it is for top level fields, but does not work for sub-fields within, for example, a repeater. Luckily Elliot on the ACF support forum shared some code for doing a reverse query against a sub-field.
The key is just using a LIKE for the meta query.

Note – for a Relationship field, where the value is a serialised array, use:

'meta_query' => array(
	array(
		'key' => 'fieldName', // name of custom field
		'value' => '"' . get_the_ID() . '"', // matches exaclty "123", not just 123. This prevents a match for "1234"
		'compare' => 'LIKE'
	)
)

but for a Post Object field, where the value is an integer, use:

'meta_query' => array(
	array(
		'key' => 'fieldName', // name of custom field
		'value' => get_the_ID(),
		'compare' => '='
	)
)

Latest item in repeater only

Just to get more complicated, now let’s do a reverse relationship query, against a sub-field, but only the sub-field within the latest item in the repeater…
Imagine a post type of Business that has a repeater called ‘Audit’, and within it sub-fields for ‘Audit firm’ and ‘Fee’. The ‘Audit firm’ sub-field is a relationship to another post type called Auditor. On the single Auditor pages I want to show the name of each Business who they are currently auditing, i.e., where they are the ‘Audit firm’ in the last repeater entry.

To get a list of Business post IDs we have to use a $wpdb query; the key is the use of MAX(meta_key) to get the last item in the repeater. This works because ACF names it’s repeater fields repeaterName_X_fieldName, where X is the number denoting when the item was added.

The solution

The code below is heavily based on a Stack Overflow answer from Elliot (coincidence?) with added WordPress and ACF magic and help from Luke Oatham.

$meta_key = 'audit_%_audit_firm'; // meta_key for repeater sub-field.
$meta_value = '%"'. get_the_id() . '"%'; // meta_value, wrapped for use in a LIKE.
$post_status = 'publish';
$businesses = $wpdb->get_col( $wpdb->prepare(
	"
	SELECT businessMeta.post_id // Field we want in the returned column.
	FROM $wpdb->postmeta businessMeta
	INNER JOIN
		(SELECT post_id, MAX(meta_key) AS latestAuditRepeater
		FROM $wpdb->postmeta
		WHERE meta_key LIKE '%s'
		GROUP BY post_id) groupedBusinessMeta
	ON businessMeta.post_id = groupedBusinessMeta.post_id
	AND businessMeta.meta_key = groupedBusinessMeta.latestAuditRepeater
	WHERE meta_value LIKE '%s'
	AND abMeta.post_id IN
		(SELECT ID
		FROM $wpdb->posts
		WHERE post_status = '%s')
	",
	$meta_key,
	$meta_value,
	$post_status
) );

Avoiding permissions problems when creating Zip files in PHP

A typical PHP snippet to create a Zip file looks something like this:

$zip = new ZipArchive();
$zipname = 'package_name.zip';
if ( true === $zip->open( $zipname, ZipArchive::CREATE ) ) {
    $zip->addFromString( 'file_name.txt', $file_contents );
    $zip->close();
    header( 'Content-Type: application/zip' );
    header( 'Content-disposition: attachment; filename=' . $zipname );
    header( 'Content-Length: ' . filesize( $zipname ) );
    readfile( $zipname );
    unlink( $zipname );
    exit;
}

But did you ever stop to think about where the temporary ‘package_name.zip’ file is created? Neither have I. What I have come across, are permissions errors when creating zip files as well as the slightly more mysterious 0Kb Zip files.

Watching the filesystem and removing the unlink() reveals that the Zip file is created in the current working directory, i.e. “the path of the ‘main’ script referenced in the URL”. For most web applications or CMS this will mean the root of the application/CMS.

This is fine when you have fairly relaxed permissions, but if things are more locked down you end up with the errors described above as the temporary file can’t be created.

A simple solution

As is so often the case the solution is actually very simple – tell PHP to move it’s working directory using the chdir() function. To keep things re-usable, we can combine chdir() with another function that returns the temporary directory specified in php.ini – which should always be writable by PHP – sys_get_temp_dir().

Below is the updated snippet, with one extra feature – it stores the temp file with a random name to reduce the liklihood of collisions if multiple people access your script at once.

chdir( sys_get_temp_dir() ); // Zip always get's created in current working dir so move to tmp.
$zip = new ZipArchive;
$tmp_zipname = uniqid(); // Generate a temp UID for the file on disk.
$zipname = 'package_name.zip'; // True filename used when serving to user.
if ( true === $zip->open( $tmp_zipname, ZipArchive::CREATE ) ) {
    $zip->addFromString( 'file_name.txt', $file_contents );
    $zip->close();
    header( 'Content-Type: application/zip' );
    header( 'Content-disposition: attachment; filename=' . $zipname );
    header( 'Content-Length: ' . filesize( $tmp_zipname ) );
    readfile( $tmp_zipname );
    unlink( $tmp_zipname );
    exit;
}