While browsing the advice from top SEO and translation firms around the world on how best to manage a site aimed at users of various languages, I came across this page from Google, which basically states that the search giant ignores language declarations in HTML (such as lang attributes) because they are no longer reliable. Instead, it uses its own systems to determine a page's language on a page-by-page basis.
This was great news for me when considering how to structure any French posts I'll make on this site (subdomain, subdirectory, etc.). As long as I keep the main body of content on each page in one language, there is no need for complex multi-site mirror installs or code-heavy plug-ins.
This is ideal in my case, as readers will mostly be arriving from Google to read the one post they are seeking information on. They can easily avoid clicking through to posts in other languages, yet still have the option to do so if they are able to read them (perhaps more likely on a language-related blog). They wouldn't have that option if I were forced to separate out the content.
I would, of course, advise clients with web shops or fully localised brands to split their sites by subdomain or subdirectory so as not to confuse their buyers, but it seems I'm clear of that concern as long as I keep the majority of a page's content in a single language.
PS: The Google Webmaster Central blog post linked above also includes a great line on using automated machine translation:
We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam. More importantly, the point of making a multilingual website is to reach a larger audience by providing valuable content in several languages. If your users can’t understand an automated translation or if it feels artificial to them, you should ask yourself whether you really want to present this kind of content to them.