bnks.xyz

Author: Phil

Working with Algolia Places address autocompletion api

For a recent mapping project I implemented Algolia Places for address autocompletion, to turn an address into latitude and longitude for querying the database. In the past we’ve used Google Maps, but since this project wasn’t using Google Maps for the map display, using the Places API just for geocoding is against their terms of use. It turns out this was a blessing in disguise – Algolia is fast, easy to implement, and very affordable. There was however one small hitch – the documentation gets a bit patchy when you go past a basic implementation. To be fair to them, that’s because they start presuming that you’ll be using their multipurpose algoliasearchLite.js library rather than the simpler places.js.

Setting up autocompletion

The example from the documentation only needs a small extension – populating hidden latitude and longitude fields from the returned data using the ‘change’ event:

<script>
// Hypothetical form fields – swap the selectors for your own markup.
var address_input = document.querySelector('#address');
var latitude_input = document.querySelector('#latitude');
var longitude_input = document.querySelector('#longitude');

var placesAutocomplete = places({
    appId: 'YOUR_PLACES_APP_ID',
    apiKey: 'YOUR_PLACES_API_KEY',
    container: address_input
});

placesAutocomplete.on('change', function(e) {
    // places.js fills the visible input itself; we just store the coordinates.
    latitude_input.value = e.suggestion.latlng.lat || '';
    longitude_input.value = e.suggestion.latlng.lng || '';
});
</script>

Reverse geocoding

To give users a number of options, we also provided a geolocation button that uses the Geolocation API to let them search from the location reported by their system. The API returns latitude and longitude coordinates. While this is all that is needed to query the database, the UX isn’t ideal because it doesn’t give a user-readable representation of the location – which matters in case the returned location is wrong. Converting the coordinates into an address is called reverse geocoding.
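
A sketch of such a handler, reusing the hidden fields from the snippet above (the geolocation_button element and the error handling are illustrative):

geolocation_button.addEventListener('click', function() {
    navigator.geolocation.getCurrentPosition(function(position) {
        // Store the reported coordinates in the same hidden fields
        // populated by the autocomplete.
        latitude_input.value = position.coords.latitude;
        longitude_input.value = position.coords.longitude;
    }, function(error) {
        console.warn('Geolocation failed: ' + error.message);
    });
});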

The Places documentation has an example of reverse geocoding but unfortunately this is one that uses the wrong library. While there isn’t official support for Places, Algolia staff do monitor StackOverflow and help where they can. Luckily one such employee, Oliver, saw my query and got me on the right track.

To make a query you pass a compound latitude/longitude string value, and then an object of any options you want to change. For example:

placesAutocomplete.reverse(
    '52.817078,-4.5697185',
    { hitsPerPage: 1 }
)

Another difference from algoliasearchLite.js is that the response when using places.js is just the array of results, which makes utilising them trivial. For example:

placesAutocomplete.reverse(
    '52.817078,-4.5697185',
    { hitsPerPage: 1 }
).then(function(response) {
    var suggestion = response[0];
    // Fall back through increasingly broad place names.
    if (suggestion && (suggestion.suburb || suggestion.city || suggestion.county)) {
        address_input.value = suggestion.suburb || suggestion.city || suggestion.county;
        address_input.value += ', ' + suggestion.country;
    }
});

Here I’ve chosen to populate the address text input with the town (aka suburb) if available, falling back to the city or county, and then appending the country. This gives the user enough information to orientate themselves with the search being done, without distracting them with potentially inaccurate house number and road level data.

From my experience so far I’d highly recommend evaluating Algolia Places for your next autocompletion or geocoding project. The only downside I’ve found, common to all providers that rely on OSM data, is that you can’t reliably search by UK postcodes. In a subsequent post I’ll cover implementing getAddress.io – an API that turns UK postcodes into addresses using the Royal Mail PAF data.

New theme, dark mode, upcoming blogs

Trying to give myself a nudge to complete some half-written posts, and write the unwritten ones, I decided to switch out the theme (again).

I’ve gone with a very minimal theme called Susty by Jack Lenox. It’s a very interesting concept – bytes matter, not just for loading fastest or scoring highest on a metric, but because websites have an impact on the planet. Datacentres use a lot of power and land, and account for a lot of carbon. Then there’s the impact on individual users – loading time affects how long a device has to be in use, and how long its radio has to be active (and so how much power is drawn). Design has an impact too, with lighter colours and low contrast needing more brightness from the screen to be readable. And don’t forget those bytes – unreliable and slow connections are common (“near dial-up speeds” are still common in the USA), and it’s well known that “users hate slow sites”. Leaner is better.

So. This is Susty – well, my fork of Susty. So far I’ve made some minor aesthetic changes, some accessibility changes, and introduced support for ‘dark mode’.

Dark mode

In technical terms this was just a few lines of CSS using the experimental prefers-color-scheme media query. The query is still at the draft stage, but with the popularity of dark modes shooting up after macOS 10.14 introduced one OS-wide, and support already in place in Firefox and Safari, it looks likely to stay. If your browser reports that you prefer a dark interface, the colour scheme of the site changes (basically inverts) automatically. I’m considering flipping this so dark is the default, but that’s for another day.
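
A minimal sketch of the approach – the colours here are illustrative rather than this site’s actual palette:

@media (prefers-color-scheme: dark) {
    /* Applied only when the OS/browser reports a dark preference. */
    body {
        background: #111;
        color: #eee;
    }
    a {
        color: #9cf;
    }
}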

Plans for Susty

I have a couple of items still on my todo list for Susty.

There are no guarantees – but I hope to keep some momentum and have my fork of Susty polished to my liking over the next month.

Future blogs

In no particular order, these should get published over the next 3 months:

Exclude certain posts from search

Excluding posts (or custom post types) from search results when they meet certain criteria is easy in theory. Just use the pre_get_posts action to modify the WP_Query arguments used to get the results. However there are a couple of small pitfalls to watch out for. Below are brief explanations and snippets for excluding posts based on a taxonomy term and a custom meta field value.

Excluding posts by custom meta field

This seems obvious – a meta_query with a != compare. However if you stop there, you will only get posts which have the custom meta field with a different value – posts that don’t have the meta field at all (e.g. other post types) will be excluded. To work around this we need to add a second clause that uses the NOT EXISTS compare, and join the two with an OR relation.

Exclude posts by taxonomy term

Taxonomy terms are slightly easier – the trick is in the choice of operator: it has to be NOT IN rather than NOT EXISTS or !=. While they all sound like they would work, only NOT IN will exclude posts that have the term, while including posts with other terms in the same taxonomy and also posts which do not have that taxonomy at all.

Here’s the full snippet – treat the meta key, taxonomy and term names below as placeholders to swap for your own:
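
add_action( 'pre_get_posts', function ( $query ) {
	// Only alter front-end searches, not admin or secondary queries.
	if ( is_admin() || ! $query->is_main_query() || ! $query->is_search() ) {
		return;
	}

	// Exclude posts where the meta field is '1', but keep posts that
	// don't have the meta field at all – hence the OR + NOT EXISTS.
	$query->set( 'meta_query', array(
		'relation' => 'OR',
		array(
			'key'     => 'hide_from_search', // Placeholder meta key.
			'compare' => 'NOT EXISTS',
		),
		array(
			'key'     => 'hide_from_search',
			'value'   => '1',
			'compare' => '!=',
		),
	) );

	// NOT IN excludes posts with this term while keeping posts with
	// other terms, and posts without the taxonomy entirely.
	$query->set( 'tax_query', array(
		array(
			'taxonomy' => 'listing_type', // Placeholder taxonomy.
			'field'    => 'slug',
			'terms'    => array( 'internal' ), // Placeholder term.
			'operator' => 'NOT IN',
		),
	) );
} );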


Selecting title, meta and multiple term names with MySQL

I often run one-off reports for clients directly from the MySQL database behind their (normally WordPress) websites. Most are straightforward, perhaps requiring data from the wp_posts and wp_postmeta tables. However it gets a lot more complicated if term data is needed too, because determining which terms belong to an object, what taxonomy they are in, and what the human readable values are requires three separate database tables.

I'm going to step you through an example that generates a list of WooCommerce products with their title, SKU, price and categories. You can skip to the full code if you don't want the explanation. While this example is based around WooCommerce, it applies equally to listing any post type with meta and taxonomy data.

The aim

For each product we are going to be retrieving:

  • post_title from the wp_posts table
  • meta_value from the wp_postmeta table where the meta_key is _sku
  • meta_value from the wp_postmeta table where the meta_key is _regular_price
  • name value(s) from the wp_terms table

Post title

To start with, let's get the Product names – this is the post_title field. We'll also restrict it to published Products.

SELECT
	wp_posts.post_title
FROM wp_posts
WHERE wp_posts.post_type = 'product'
AND wp_posts.post_status = 'publish'

SKU and Price

The SKU and Price can be pulled in from the wp_postmeta table. To make sure that we only get the postmeta values we need, we'll use a LEFT JOIN that matches the ID column from wp_posts to the post_id column, restricted to rows where the meta_key is the one we are looking for. This is done once for each piece of meta.

SELECT
	wp_posts.post_title,
	wp_postmeta1.meta_value,
	wp_postmeta2.meta_value
FROM wp_posts
LEFT JOIN wp_postmeta wp_postmeta1
	ON wp_postmeta1.post_id = wp_posts.ID
	AND wp_postmeta1.meta_key = '_sku'
LEFT JOIN wp_postmeta wp_postmeta2
	ON wp_postmeta2.post_id = wp_posts.ID
	AND wp_postmeta2.meta_key = '_regular_price'
WHERE wp_posts.post_type = 'product'
AND wp_posts.post_status = 'publish'

You can see that each LEFT JOIN also aliases the table as wp_postmeta1/wp_postmeta2 – this means that we can reference them separately in the SELECT.

Product category

This is where it gets a bit more complex. As before, LEFT JOINs are used to access specific data from other tables – wp_term_relationships, wp_term_taxonomy and wp_terms.

First we match object_id from wp_term_relationships against the ID of each post. This allows access to the term_taxonomy_id for each term assigned to each product.

LEFT JOIN wp_term_relationships
	ON wp_term_relationships.object_id = wp_posts.ID

Next we match the term_taxonomy_id against the same column of the wp_term_taxonomy table so that we can access the relevant term_id. Additionally we restrict it to terms that belong to the right taxonomy – product_cat.

LEFT JOIN wp_term_taxonomy
	ON wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
	AND wp_term_taxonomy.taxonomy = 'product_cat'

Finally we can match that term_id against the term_id in the wp_terms table which gives us access to the human readable name of each term.

LEFT JOIN wp_terms
	ON wp_term_taxonomy.term_id = wp_terms.term_id

Our SELECT statement can now be extended to get the term names with wp_terms.name.

Putting that all together gives us:

SELECT
	wp_posts.post_title,
	wp_postmeta1.meta_value,
	wp_postmeta2.meta_value,
	wp_terms.name
FROM wp_posts
LEFT JOIN wp_postmeta wp_postmeta1
	ON wp_postmeta1.post_id = wp_posts.ID
	AND wp_postmeta1.meta_key = '_sku'
LEFT JOIN wp_postmeta wp_postmeta2
	ON wp_postmeta2.post_id = wp_posts.ID
	AND wp_postmeta2.meta_key = '_regular_price'
LEFT JOIN wp_term_relationships
	ON wp_term_relationships.object_id = wp_posts.ID
LEFT JOIN wp_term_taxonomy
	ON wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
	AND wp_term_taxonomy.taxonomy = 'product_cat'
LEFT JOIN wp_terms
	ON wp_term_taxonomy.term_id = wp_terms.term_id
WHERE wp_posts.post_type = 'product'
AND wp_posts.post_status = 'publish'

Tidying up

We have all the data now, but the returned results are very messy. There will be multiple rows for each Product if they have more than one category, and they are in no particular order.

Sorting out the ordering is as simple as ORDER BY wp_posts.post_title ASC. To aggregate the rows we can use GROUP BY wp_posts.ID, but on its own that means only one category will ever be shown. To solve that we can use the GROUP_CONCAT function. Combining it with ORDER BY and the optional SEPARATOR allows the categories to appear in a single column, in alphabetical order, separated with a comma and space.

GROUP_CONCAT( wp_terms.name ORDER BY wp_terms.name SEPARATOR ', ' )

The complete code

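Putting the ordering and grouping together with the query built up above gives the full report query:

SELECT
	wp_posts.post_title,
	wp_postmeta1.meta_value,
	wp_postmeta2.meta_value,
	GROUP_CONCAT( wp_terms.name ORDER BY wp_terms.name SEPARATOR ', ' )
FROM wp_posts
LEFT JOIN wp_postmeta wp_postmeta1
	ON wp_postmeta1.post_id = wp_posts.ID
	AND wp_postmeta1.meta_key = '_sku'
LEFT JOIN wp_postmeta wp_postmeta2
	ON wp_postmeta2.post_id = wp_posts.ID
	AND wp_postmeta2.meta_key = '_regular_price'
LEFT JOIN wp_term_relationships
	ON wp_term_relationships.object_id = wp_posts.ID
LEFT JOIN wp_term_taxonomy
	ON wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
	AND wp_term_taxonomy.taxonomy = 'product_cat'
LEFT JOIN wp_terms
	ON wp_term_taxonomy.term_id = wp_terms.term_id
WHERE wp_posts.post_type = 'product'
AND wp_posts.post_status = 'publish'
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_title ASC
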
Remember – this is useful for ad hoc report generation. If you want to retrieve this data regularly – whether to show it on the front end of the site or in the admin area – you should at least use the native WordPress functions for accessing the database, and potentially move from raw database queries to WP_Query and related abstractions.

Ignore spaces when grouping MySQL query results

Let's get a list of all duplicate postcodes (it could be any string of course) out of a specific field in the database. This could be handled in PHP but, especially with a decently spec'd server, it's quicker to handle it in the database.

The basic query looks like this: (this example is from WordPress but the principle is the same regardless)

global $wpdb;
$meta_key = 'customer_address_postcode';
$postcode_counts = $wpdb->get_results( $wpdb->prepare(
"
SELECT meta_value as postcode, COUNT(*) as count
FROM $wpdb->postmeta
WHERE meta_key = %s
GROUP BY meta_value
HAVING count > 1
ORDER BY count DESC
",
$meta_key
) );

This will return an array of objects each with 2 values – postcode and count, in descending order of the number of times they occur, and only if they occur more than once.

However, spaces have no meaning in a string like this. BS1 3XX, BS13XX and even B S 1 3 X X are all the same postcode and need to be considered duplicates.

Ignoring spaces

I'm not a SQL expert, and there are a lot of functions I have never heard of, but there is the very sensibly named REPLACE function, which replaces all occurrences of a specified string within a string. It's perhaps more commonly seen when updating the database, but we can use it to ignore spaces in our GROUP BY.

global $wpdb;
$meta_key = 'customer_address_postcode';
$postcode_counts = $wpdb->get_results( $wpdb->prepare(
"
SELECT meta_value as postcode, COUNT(*) as count
FROM $wpdb->postmeta
WHERE meta_key = %s
GROUP BY REPLACE( meta_value, ' ', '' )
HAVING count > 1
ORDER BY count DESC
",
$meta_key
) );

This comes with a performance penalty of course, but depending on how frequently you need to run the query – and whether you can cache the results – it is an effective way of matching like this. You could also add the same REPLACE into the SELECT if you want the returned values to be spaceless as well.
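
For example, the equivalent raw SQL with a spaceless postcode in the results would look something like this:

SELECT REPLACE( meta_value, ' ', '' ) as postcode, COUNT(*) as count
FROM wp_postmeta
WHERE meta_key = 'customer_address_postcode'
GROUP BY REPLACE( meta_value, ' ', '' )
HAVING count > 1
ORDER BY count DESC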

Case insensitivity on Apache servers

Regularly when we launch a new site we also put up a static scrape of the old site so that users can access content that has been removed from the new site. Commonly this is old news content that the client wants to keep available for a transition period, but doesn’t want cluttering up the new site.

This is fairly straightforward – feed an application such as SiteSucker a URL, tweak a few settings and let it loose. The resulting files can be dropped into a /archive/ folder and a link put on the site to direct users there for old content.

The most common hitch we hit is when the old site was hosted on a Windows server, as IIS is largely case insensitive. This means that /folderone/image.jpg and /FolderOne/image.jpg will both resolve to the same file. Apache however is case sensitive and will treat those 2 as different paths, resulting in broken links in our archive. This wouldn’t necessarily be a problem, but because IIS doesn’t mind, developers and site maintainers can be a little sloppy with their capitalisation, and there are often multiple versions in use.

Configuring Apache to be case insensitive

I personally consider case sensitivity to be appropriate normally, but it is possible to make Apache match case insensitively using mod_speling (and no, I haven’t mistyped ‘spelling’…). mod_speling is included as part of the standard Apache module bundle, but may not be activated. You can check with apache2ctl -M and, if you don’t see it, enable it with a2enmod speling and then restart Apache with service apache2 restart (these commands might vary depending on which OS you are running).
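
On a Debian-flavoured setup that sequence looks something like this:

apache2ctl -M | grep speling   # check whether the module is already loaded
a2enmod speling                # enable it if it isn't
service apache2 restart        # restart Apache to pick up the change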

Now it’s available for use, it needs to be enabled within your vhost configuration or in .htaccess. Since I only wanted it to affect /archive/, I created a .htaccess file there and added:

<IfModule mod_speling.c>
    CheckCaseOnly on
    CheckSpelling on
</IfModule>

Counterintuitively, this tells mod_speling to only check for case mismatches and not to attempt to correct misspellings – there’s a full explanation of how to use mod_speling in its official documentation.

Problems with rewrites

This works fine unless the root site is using rewrites – as any CMS installation will be – because mod_rewrite prevents mod_speling from working. As long as you don’t need rewrites in /archive/ the solution is simple: turn off the rewrite engine in that directory. Just add RewriteEngine off to the start of /archive/.htaccess, before the mod_speling block:

RewriteEngine off
<IfModule mod_speling.c>
    CheckCaseOnly on
    CheckSpelling on
</IfModule>

Reverse relationship query for last ACF repeater sub-field

Advanced Custom Fields (ACF) is a great WordPress plugin for adding custom meta fields. It has a very useful relationship field that can be used to denote a connection from one post to another – importantly this is a one-way relationship. When you are on PostA you can generate a list of all the posts that it is linked to.

Going in reverse

The ACF documentation highlights how a clever WP_Query can be used to do a ‘reverse query’, i.e. when you view PostB you can get a list of all the posts that link to PostB.

What about sub-fields

The reverse query works fine as it is for top level fields, but does not work for sub-fields within, for example, a repeater. Luckily Elliot on the ACF support forum shared some code for doing a reverse query against a sub-field. The key is simply using a LIKE compare for the meta query.

Note – for a Relationship field, where the value is a serialised array, use:

'meta_query' => array(
	array(
		'key' => 'fieldName', // name of custom field
		'value' => '"' . get_the_ID() . '"', // Matches exactly "123", not just 123 – this prevents a match on "1234".
		'compare' => 'LIKE'
	)
)

but for a Post Object field, where the value is an integer, use:

'meta_query' => array(
	array(
		'key' => 'fieldName', // name of custom field
		'value' => get_the_ID(),
		'compare' => '='
	)
)

Latest item in repeater only

Just to get more complicated, now let’s do a reverse relationship query, against a sub-field, but only against the sub-field within the latest item in the repeater…

Imagine a post type of Business that has a repeater called ‘Audit’, and within it sub-fields for ‘Audit firm’ and ‘Fee’. The ‘Audit firm’ sub-field is a relationship to another post type called Auditor. On the single Auditor pages I want to show the name of each Business that the firm is currently auditing, i.e. where they are the ‘Audit firm’ in the last repeater entry.

To get a list of Business post IDs we have to use a $wpdb query; the key is the use of MAX(meta_key) to get the last item in the repeater. This works because ACF names its repeater fields repeaterName_X_fieldName, where X is the number denoting when the item was added (note that as this is a string comparison, it only holds reliably while the repeater has fewer than ten rows).
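
For example, a Business with two rows in its ‘Audit’ repeater stores meta keys like this, which is why MAX(meta_key) surfaces the newest row:

audit_0_audit_firm
audit_0_fee
audit_1_audit_firm
audit_1_fee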

The solution

The code below is heavily based on a Stack Overflow answer from Elliot (coincidence?), with added WordPress and ACF magic, and help from Luke Oatham.

$meta_key = 'audit_%_audit_firm'; // meta_key pattern for the repeater sub-field.
$meta_value = '%"' . get_the_ID() . '"%'; // meta_value, wrapped for use in a LIKE.
$post_status = 'publish';
$businesses = $wpdb->get_col( $wpdb->prepare(
	"
	SELECT businessMeta.post_id -- The column we want returned.
	FROM $wpdb->postmeta businessMeta
	INNER JOIN
		(SELECT post_id, MAX(meta_key) AS latestAuditRepeater
		FROM $wpdb->postmeta
		WHERE meta_key LIKE %s
		GROUP BY post_id) groupedBusinessMeta
	ON businessMeta.post_id = groupedBusinessMeta.post_id
	AND businessMeta.meta_key = groupedBusinessMeta.latestAuditRepeater
	WHERE businessMeta.meta_value LIKE %s
	AND businessMeta.post_id IN
		(SELECT ID
		FROM $wpdb->posts
		WHERE post_status = %s)
	",
	$meta_key,
	$meta_value,
	$post_status
) );

Avoiding permissions problems when creating Zip files in PHP

A typical PHP snippet to create a Zip file looks something like this:

$zip = new ZipArchive();
$zipname = 'package_name.zip';
if ( true === $zip->open( $zipname, ZipArchive::CREATE ) ) {
    $zip->addFromString( 'file_name.txt', $file_contents );
    $zip->close();
    header( 'Content-Type: application/zip' );
    header( 'Content-disposition: attachment; filename=' . $zipname );
    header( 'Content-Length: ' . filesize( $zipname ) );
    readfile( $zipname );
    unlink( $zipname );
    exit;
}

But did you ever stop to think about where the temporary ‘package_name.zip’ file is created? Neither had I. What I have come across are permissions errors when creating Zip files, as well as the slightly more mysterious 0Kb Zip files.

Watching the filesystem (and removing the unlink()) reveals that the Zip file is created in the current working directory, i.e. “the path of the ‘main’ script referenced in the URL”. For most web applications or CMSs this will mean the root of the application.

This is fine when you have fairly relaxed permissions, but if things are more locked down you end up with the errors described above as the temporary file can’t be created.

A simple solution

As is so often the case, the solution is very simple – tell PHP to move its working directory using the chdir() function. To keep things reusable, we can combine chdir() with sys_get_temp_dir(), which returns the temporary directory specified in php.ini – a directory that should always be writable by PHP.

Below is the updated snippet, with one extra feature – it stores the temp file under a random name to reduce the likelihood of collisions if multiple people access your script at once.

chdir( sys_get_temp_dir() ); // The Zip always gets created in the current working dir, so move to tmp.
$zip = new ZipArchive;
$tmp_zipname = uniqid(); // Generate a temp UID for the file on disk.
$zipname = 'package_name.zip'; // True filename used when serving to user.
if ( true === $zip->open( $tmp_zipname, ZipArchive::CREATE ) ) {
    $zip->addFromString( 'file_name.txt', $file_contents );
    $zip->close();
    header( 'Content-Type: application/zip' );
    header( 'Content-disposition: attachment; filename=' . $zipname );
    header( 'Content-Length: ' . filesize( $tmp_zipname ) );
    readfile( $tmp_zipname );
    unlink( $tmp_zipname );
    exit;
}

Extending WP CLI – wp config update

One gap in WP CLI at the moment is the ability to modify an already existing wp-config.php file.

v1.2 introduced the --force flag to overwrite an existing one, but that is the sledgehammer option – so I started working on it myself. I have put a very early version on GitHub and would welcome feedback and pull requests. Please don’t use this on a live site!

Is this something you would use? What features do you think it should have? Let me know in the comments.

Automatic SOCKS proxy for a single domain on Mac

VPNs and proxies are great – but they are almost always limited to funnelling all traffic through them. What if you want to access only a single site/domain without affecting the rest of your browsing? Perhaps to reach a staging site that isn’t publicly available.

While what I discuss below could be used as a privacy measure, that’s not the focus of this blog post.

You can do clever things with the Web Proxy Auto-Discovery Protocol (WPAD) or browser-specific tools like Firefox’s Proxy Auto-Configuration (PAC) files, but these aren’t exactly user friendly to set up. What I was looking for was something as near to zero-configuration as possible, so I could share it with my colleagues without causing problems.

In the end I came to a zip file that contains a copy of Google Chrome Canary and an Automator script to launch and configure everything.

Why Canary? Chrome lets me configure proxy settings when launching it from a bash script; but since many of us already use Chrome day to day, bundling Canary allows me to avoid any clashes.

The zip is structured like this:
– ProxyAccess.app (the Automator app)
– Resources (folder)
– – Google Chrome Canary.app

The magic is really a one-line command in the Automator app that opens the SOCKS proxy, launches the browser at the destination URL, and then closes everything when you quit Canary:

./Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --temp-profile --proxy-server='socks5://localhost:5555' 'http://dev.site.co.uk' | ssh -D 5555 proxy.server.co.uk cat

This is based on Tim Weber’s work, which suggested using cat and piping the commands together to create the automatically closing proxy. http://dev.site.co.uk is the URL to be opened when Canary launches, 5555 is the port to run the proxy on, and proxy.server.co.uk is the server that we are routing traffic through.

The Automator app itself is very simple: it uses AppleScript to get the current working directory, then passes that to bash so it can find and run the bundled copy of Canary.

This falls firmly in the quick and dirty category, but it gets the job done with minimal overhead or potential impact on a user’s machine.