TuttoSemplice Multilingual v2.6.5: Architecture of a Production-Grade WordPress Multilingual Plugin

Figure: Architecture diagram of a WordPress multilingual plugin showing AI-powered translation routing across language nodes (IT, EN, FR, DE) with an async job queue and hreflang SEO layer.

Building a multilingual WordPress site beyond the WPML/Polylang commodity model requires rethinking content routing, translation pipelines, and SEO signal propagation from first principles. TuttoSemplice Multilingual (v2.6.5) is a self-contained plugin that handles subfolder URL routing, AI-powered translation via Google Gemini on Vertex AI, a persistent async job queue backed by a custom MySQL table, and full hreflang/canonical injection. Apart from the Vertex AI API itself, it depends on no third-party SaaS. This analysis dissects its 18-class architecture to surface the engineering decisions and performance trade-offs relevant to any independent publisher running a multilingual AI-driven content operation at scale.

Plugin Architecture

Singleton Orchestrator and the 18-Class Component Graph

The entire plugin is bootstrapped through a central singleton, TuttoSemplice_Multilingual, instantiated once at request time via tsm_multilingual_init() at the bottom of the main entry file. The constructor executes four sequential phases:

  • load_dependencies(): requires the 18+ class files; admin-only classes (TSM_Metabox, TSM_Bulk_Translate_Page, TSM_Notifications) are required only when is_admin() is true, which avoids loading and instantiating them on frontend requests.
  • init_hooks(): registers WordPress actions and filters for URL rewriting, archive query filtering, permalink transformation, and social text localization.
  • init_components(): instantiates every subsystem and stores it as a public property on the singleton (e.g. $this->background_translator, $this->google_ai_translator), providing a centralized service locator accessible via TuttoSemplice_Multilingual::get_instance()->component.
  • init_cron(): registers the tsm_update_translated_links_cron event and schedules it if no future execution is already queued.

The component graph is deliberately flat: no dependency injection container, no lazy loading. Every object is constructed at plugin boot. For a site with a moderate content volume (hundreds to low thousands of posts), this is an acceptable trade-off — the memory footprint of 18 objects is negligible compared to the overhead of a DI container bootstrap.

// Simplified boot sequence (tuttosemplice-multilingual.php)
function tsm_multilingual_init() {
    return TuttoSemplice_Multilingual::get_instance(); // singleton
}
tsm_multilingual_init(); // called at file scope, triggers full init

// Inside init_components(): every subsystem is eagerly instantiated
$this->rewrite            = new TSM_Rewrite();
$this->seo                = new TSM_SEO();
$this->background_translator = TSM_Background_Translator::get_instance();
$this->google_ai_translator  = new TSM_Google_AI_Translator();
// ... 14 more classes

URL Routing

Subfolder URL Strategy: Why /fr/ Beats fr.domain.com

TSM_Rewrite implements the industry-standard subfolder pattern (e.g. example.com/fr/article-slug) rather than subdomain separation. The choice is architecturally significant and has concrete implications for PageRank consolidation, CDN cache partitioning, and deployment complexity.

| Approach | Domain Authority | CDN Cache | SSL Overhead | WordPress Config |
|---|---|---|---|---|
| Subfolder /fr/ | Single domain, full consolidation | Single origin, path-based routing | One certificate | Single WP install, rewrite rules |
| Subdomain fr. | Split authority per hreflang | Separate CNAME per region | Wildcard cert required | Multisite or separate installs |
| ccTLD .fr | Geo-restricted authority signal | Separate origin | One cert per TLD | Separate installs + cross-linking |

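The rewrite rules are prepended to WordPress’s rewrite array through a rewrite_rules_array filter callback registered at priority 1. Two mechanisms are at work here: priority 1 only makes the callback run early, while precedence comes from array order, because WordPress matches a request against the rules top to bottom and stops at the first match. Prepending therefore lets the plugin’s rules win over conflicting rules already in the array, although a callback that runs later and prepends its own rules could still take precedence. A custom query variable lang is registered through the query_vars filter so WordPress does not strip it from the parsed request object.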

A particularly important detail is the aggressive redirect mechanism in TSM_Rewrite::aggressive_redirect(), hooked into parse_request at priority 1, before any template loading. When a translated post is accessed without its language prefix (e.g. /some-french-slug instead of /fr/some-french-slug), the plugin issues a 301 redirect to the canonical prefixed URL. This keeps a single indexable URL per translation, which avoids duplicate-content dilution and consolidates link equity on the language-prefixed version.

// TSM_Rewrite — permalink filter applies the /lang-code/ prefix
// Only for non-default languages; default lang URLs remain clean
add_filter( 'post_link', array( $this, 'translate_permalink' ), 10, 2 );
add_filter( 'page_link', array( $this, 'translate_permalink' ), 10, 2 );
add_filter( 'post_type_link', array( $this, 'translate_permalink' ), 10, 2 );

The filter_home_url filter ensures that any internal call to home_url() also returns the language-prefixed base when the current context is a non-default language page — a subtle but critical correctness requirement for pagination links and feed URLs.
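
The prefixing and redirect rules described in this section can be sketched in plain PHP, outside WordPress. This is a minimal illustration rather than the plugin's code: the function names tsm_sketch_prefix_path() and tsm_sketch_redirect_target() are hypothetical, and Italian ('it') is assumed to be the default language.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: prepend /{lang}/ to a path for non-default languages.
 * Default-language paths stay clean; already-prefixed paths are left alone.
 */
function tsm_sketch_prefix_path(string $path, string $lang, string $default_lang = 'it'): string
{
    $path = '/' . ltrim($path, '/');
    if ($lang === $default_lang) {
        return $path;
    }
    $prefix = '/' . $lang . '/';
    if ($path === '/' . $lang || strncmp($path, $prefix, strlen($prefix)) === 0) {
        return $path; // already prefixed: avoid /fr/fr/slug
    }
    return '/' . $lang . $path;
}

/**
 * Hypothetical helper: return the path to 301-redirect to, or null when the
 * requested path is already the canonical one for the post's language.
 */
function tsm_sketch_redirect_target(string $requested_path, string $post_lang, string $default_lang = 'it'): ?string
{
    $normalized = '/' . ltrim($requested_path, '/');
    $canonical  = tsm_sketch_prefix_path($normalized, $post_lang, $default_lang);
    return $canonical === $normalized ? null : $canonical;
}
```

Under these assumptions, a French post requested at /some-french-slug yields the redirect target /fr/some-french-slug, while a request that already starts with /fr/ yields null and is served directly.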

AI Translation Engine

Gemini on Vertex AI: Chunked Translation with 28,000-Codepoint Segments

The primary translation engine is TSM_Google_AI_Translator, which calls Google Gemini via the Vertex AI API. It originally targeted the gemini-2.5-pro-preview model family and was updated to gemini-3.1-pro-preview in v2.5.0, together with global location routing for reduced latency. Authentication is handled via a service account JSON key file referenced by the TSM_GOOGLE_KEY_PATH PHP constant, which is defined outside the plugin and keeps credentials out of the plugin’s own option store.

A key engineering constraint is the API’s limit of 30,720 codepoints per request. The plugin chunks at a conservative 28,000-codepoint threshold (MAX_CODEPOINTS = 28000) to maintain a safety margin. Long-form articles are split into segments, translated independently, and reassembled. Chunking bounds the size and latency of each individual API call, but it does not shorten the total time a worker is busy: in synchronous mode (foreground translation), a 10,000-word article still occupies one PHP-FPM process for the duration of several sequential API calls.

// class-google-ai-translator.php
private const MAX_CODEPOINTS = 28000; // ~8.9% safety margin vs. API 30,720 limit (2,720 / 30,720)

// The payload sent to Vertex AI includes:
// - post title, post content (chunked if > 28K codepoints)
// - Yoast SEO fields (_yoast_wpseo_title, _yoast_wpseo_metadesc)
// - ACF fields (auto-detected via TSM_Shortcode_Detector)
// - custom meta fields (configurable via Settings > Custom Meta Fields)
// - taxonomy terms (categories and tags) for AI translation
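
The chunking step can be illustrated with a short standalone function. This is a sketch under stated assumptions, not the plugin's implementation: the name tsm_sketch_chunk_text() is hypothetical, paragraph boundaries are assumed to be blank lines, and the mbstring extension is assumed to be available for codepoint-accurate lengths.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: split text into chunks of at most $max codepoints,
 * preferring paragraph boundaries ("\n\n"). Requires the mbstring extension.
 *
 * @return string[]
 */
function tsm_sketch_chunk_text(string $text, int $max = 28000): array
{
    $chunks  = [];
    $current = '';

    foreach (explode("\n\n", $text) as $paragraph) {
        // A single oversized paragraph is hard-split at the codepoint limit.
        while (mb_strlen($paragraph, 'UTF-8') > $max) {
            if ($current !== '') {
                $chunks[] = $current;
                $current  = '';
            }
            $chunks[]  = mb_substr($paragraph, 0, $max, 'UTF-8');
            $paragraph = mb_substr($paragraph, $max, null, 'UTF-8');
        }

        $candidate = $current === '' ? $paragraph : $current . "\n\n" . $paragraph;
        if (mb_strlen($candidate, 'UTF-8') <= $max) {
            $current = $candidate; // paragraph still fits in the open chunk
        } else {
            $chunks[] = $current;  // close the chunk, start a new one
            $current  = $paragraph;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

Measuring in codepoints (mb_strlen) rather than bytes (strlen) matters for accented or non-Latin text, where one character can occupy several bytes. The hard-split fallback can cut through an HTML tag, so a production splitter would probably need to be markup-aware.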

The prepare_translation_payload() method constructs a rich translation context that goes well beyond the post body. It collects Yoast SEO metadata, ACF fields, custom meta fields, and taxonomy terms — building a unified JSON payload that the LLM can translate holistically. This is architecturally superior to field-by-field translation because it allows the model to maintain terminology consistency across title, meta description, and body in a single inference pass.

The API timeout is explicitly set to 180 seconds (increased from the WordPress default of 5 seconds). This is a deliberate trade-off: a long-running synchronous translation will hold one PHP-FPM child process until completion. With pm.max_children = 10 on the target server, a single bulk operation could saturate the process pool if triggered concurrently. This is precisely the problem that the background queue system was designed to solve.

Shortcode Handling: The Mapping-Over-Translation Pivot

Prior to v2.0.0, shortcodes within post content were detected and expanded before sending the payload to the LLM, allowing the AI to see the rendered output rather than the shortcode tag. This approach was abandoned in v2.0.0 in favor of a mapping system: shortcodes are left in place, and an administrator defines per-language equivalents (e.g. [quiz_it] → [quiz_en]). The rationale is correctness: shortcodes often contain IDs or references to language-specific assets that a generic LLM cannot resolve; a deterministic mapping is safer than an LLM guess.

Background Processing

Persistent Async Translation Queue: MySQL-Backed Job Table

The most infrastructure-critical component is TSM_Background_Translator (introduced in v2.2.0), which decouples translation requests from the HTTP request lifecycle. Rather than blocking the admin user’s browser while Gemini processes a 3,000-word article, the plugin inserts a job record into a dedicated database table and processes it on the next cron tick.

Database Schema

The queue is persisted in a custom table ({prefix}tsm_translation_queue) created with dbDelta() on plugin activation:

CREATE TABLE IF NOT EXISTS {prefix}tsm_translation_queue (
    id              BIGINT(20)   NOT NULL AUTO_INCREMENT,
    post_id         BIGINT(20)   NOT NULL,
    target_language VARCHAR(10)  NOT NULL,
    status          VARCHAR(20)  NOT NULL DEFAULT 'pending',  -- pending|processing|completed|failed|cancelled
    priority        INT(11)      NOT NULL DEFAULT 10,
    attempts        INT(11)      NOT NULL DEFAULT 0,
    max_attempts    INT(11)      NOT NULL DEFAULT 3,
    error_message   TEXT,
    translated_post_id BIGINT(20),
    user_id         BIGINT(20)   NOT NULL,
    created_at      DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    started_at      DATETIME,
    completed_at    DATETIME,
    PRIMARY KEY (id),
    KEY post_id (post_id),
    KEY status (status),
    KEY priority_status (priority DESC, status, created_at)  -- composite for efficient dequeue
);

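The composite index priority_status mirrors the sort order of the dequeue query (ORDER BY priority DESC, created_at ASC), so the worker can read the next job in index order instead of running a filesort on every tick. Two caveats apply. Descending key parts are honored only from MySQL 8.0 onward; earlier versions parse DESC in an index definition but ignore it, and a mixed-direction sort then falls back to a filesort. The index also leads with priority rather than status, so the status filter is applied while scanning rather than used to narrow the range; an index led by status would likely serve that equality filter better.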
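
The dequeue rule itself (pending jobs only, highest priority first, oldest first within a priority) can be modelled in memory to make the ordering explicit. The plugin does this in SQL; the function tsm_sketch_next_job() and the array shape below are hypothetical and serve only as an illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory model of the dequeue rule:
 * pending jobs only, highest priority first, oldest first within a priority.
 *
 * @param array<int, array{id:int, status:string, priority:int, created_at:string}> $jobs
 */
function tsm_sketch_next_job(array $jobs): ?array
{
    $pending = array_values(array_filter(
        $jobs,
        static fn(array $job): bool => $job['status'] === 'pending'
    ));

    usort($pending, static function (array $a, array $b): int {
        // priority DESC, then created_at ASC
        // ("Y-m-d H:i:s" strings sort chronologically as plain strings)
        return [$b['priority'], $a['created_at']] <=> [$a['priority'], $b['created_at']];
    });

    return $pending[0] ?? null;
}
```

With two pending jobs at priority 20, the one with the earlier created_at is returned; jobs in processing or completed status are never candidates, whatever their priority.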

Worker Concurrency Control: Transient-Based Mutex

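process_queue() guards against overlapping runs by using a WordPress transient as a lock. The check (get_transient) and the write (set_transient) are separate calls, so this is a best-effort guard rather than an atomic mutex: two workers starting at almost the same moment could both pass the check.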

public function process_queue() {
    $lock_key = 'tsm_queue_processing_lock';

    if ( get_transient( $lock_key ) ) {
        return; // another worker is active — exit immediately, no double processing
    }

    set_transient( $lock_key, true, 300 ); // lock expires after 5 minutes

    try {
        // ... dequeue and execute one job per tick ...
    } finally {
        delete_transient( $lock_key ); // always release, even on exception
    }
}

This is a single-consumer model: only one translation runs per cron tick (every 5 minutes). It is a sensible default for a server with pm.max_children = 10, where a translation that can hold an FPM worker for up to 180 seconds per API call is too expensive to parallelize without a process budget strategy. The design trades throughput for stability. One caveat: the 300-second lock TTL is shorter than two consecutive 180-second calls, so a long multi-chunk job could in principle outlive its own lock.

Watchdog: Stuck Job Recovery

The system includes a watchdog cron event (tsm_check_stuck_queue, fired every 30 minutes) that queries for jobs stuck in processing status for more than 15 minutes. On detection, the watchdog either resets the job to pending (if below max retries) or marks it as failed and issues a failure notification. An orphan lock cleanup is also performed: if the transient mutex is set but no job has been in processing for the last 5 minutes, it is assumed to be a stale lock and is deleted.
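
The watchdog's per-job decision can be written as a small pure function. This is a sketch, not the plugin's code: tsm_sketch_watchdog_action() is a hypothetical name, started_at is assumed to be a Unix timestamp (the real table stores a DATETIME), and the 900-second threshold encodes the 15-minute window described above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the watchdog decision for one job.
 * Returns 'ok' (leave alone), 'retry' (reset to pending) or 'fail'.
 *
 * @param array{status:string, started_at:?int, attempts:int, max_attempts:int} $job
 */
function tsm_sketch_watchdog_action(array $job, int $now, int $stuck_after = 900): string
{
    if ($job['status'] !== 'processing' || $job['started_at'] === null) {
        return 'ok'; // only jobs in processing can be stuck
    }
    if ($now - $job['started_at'] <= $stuck_after) {
        return 'ok'; // still within the 15-minute window
    }
    // Stuck: retry while attempts remain, otherwise give up.
    return $job['attempts'] < $job['max_attempts'] ? 'retry' : 'fail';
}
```

Keeping the decision separate from the database update makes the retry policy easy to test: a job stuck after its first attempt is reset to pending, while one that has used all three attempts is marked failed.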

A critical nuance in the worker implementation: WP-Cron runs with user ID 0 (no authenticated user), and a user without the unfiltered_html capability has post content passed through KSES filtering on wp_insert_post(). In cron context this would strip any markup that KSES does not allow from the translated content. The plugin solves this by temporarily switching the current user to the requesting user (or the post author) before executing the translation, then restoring the original user ID afterward.

// Impersonate the requester so wp_insert_post() preserves raw HTML blocks
$original_user_id    = get_current_user_id(); // 0 in cron context
$translation_user_id = $job->user_id ?: get_post( $job->post_id )->post_author;

wp_set_current_user( $translation_user_id );
try {
    $result = $this->execute_translation( $job->post_id, $job->target_language );
} finally {
    wp_set_current_user( $original_user_id ); // restore even if the translation throws
}

International SEO

SEO Infrastructure: hreflang, Canonical Signals, and Yoast Integration

TSM_SEO injects <link rel="alternate" hreflang="..."> tags into wp_head at priority 1 (ahead of callbacks registered at the default priority of 10), covering three content contexts: singular posts/pages, taxonomy archive pages (categories and tags), and the site homepage. The implementation reads the translation map stored in _tsm_translation_map post meta on the original post and constructs valid hreflang tags for every available language, including x-default pointing to the default-language version.

A documented regression that required a dedicated fix (v2.0.5) is the double-prefix URL contamination: when building hreflang URLs for translated posts, naive use of get_permalink() would return URLs already containing the language prefix (because the post_link filter was active), and the hreflang builder would then prepend the prefix again, producing /fr/fr/slug. The fix bypasses the permalink filters and constructs the canonical URL manually from the site base URL and the post slug.
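
The manual URL construction can be sketched as follows. This is a hedged illustration, not the plugin's code: tsm_sketch_hreflang_tags() is a hypothetical name, Italian ('it') is assumed to be the default language, and a trailing-slash permalink structure is assumed. Each URL is composed from the site base and the slug, so no permalink filter can add the language prefix a second time.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: build hreflang <link> tags from a language => slug map.
 *
 * @param array<string, string> $slugs_by_lang e.g. ['it' => 'ciao', 'fr' => 'salut']
 * @return string[]
 */
function tsm_sketch_hreflang_tags(string $base_url, array $slugs_by_lang, string $default_lang = 'it'): array
{
    $base        = rtrim($base_url, '/');
    $tags        = [];
    $default_url = null;

    foreach ($slugs_by_lang as $lang => $slug) {
        $lang   = (string) $lang;
        $prefix = $lang === $default_lang ? '' : '/' . $lang; // prefix added exactly once
        $url    = $base . $prefix . '/' . trim($slug, '/') . '/';
        if ($lang === $default_lang) {
            $default_url = $url;
        }
        $tags[] = sprintf(
            '<link rel="alternate" hreflang="%s" href="%s" />',
            htmlspecialchars($lang, ENT_QUOTES),
            htmlspecialchars($url, ENT_QUOTES)
        );
    }

    if ($default_url !== null) {
        // x-default points at the default-language version
        $tags[] = sprintf(
            '<link rel="alternate" hreflang="x-default" href="%s" />',
            htmlspecialchars($default_url, ENT_QUOTES)
        );
    }
    return $tags;
}
```

For a base of https://example.com and the map ['it' => 'ciao', 'fr' => 'salut'], the French entry resolves to https://example.com/fr/salut/ and both the 'it' and x-default entries resolve to https://example.com/ciao/.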

Integration with Yoast SEO is handled through three dedicated filters:

  • wpseo_canonical — corrects the canonical URL for translated posts to include the language prefix;
  • wpseo_opengraph_url — fixes the og:url meta tag for social sharing of translated content;
  • wpseo_og_locale — replaces the og:locale value with the correct locale for the target language (e.g. fr_FR instead of it_IT).

Since v2.3.1, hreflang tags use only ISO 639-1 two-letter language codes without country suffixes (e.g. hreflang="fr" instead of hreflang="fr-FR"). This deliberately targets all speakers of a language globally rather than a specific regional variant — a sensible default for an independent publisher without country-specific content variants.

Sitemap Correctness for Multilingual Taxonomy Pages

TSM_SEO_Sitemap addresses a subtle but important Yoast SEO conflict (fixed in v2.5.2): when Yoast generates the XML sitemap for categories and tags, it calls get_term_link() internally, which for multilingual terms (carrying a _lang slug suffix) returned 404-equivalent URLs that then appeared as broken entries in the sitemap. The fix intercepts Yoast’s sitemap generation hooks and constructs the correct language-prefixed taxonomy archive URLs manually, bypassing the term link resolution path.

Content Taxonomy

Taxonomy Translation: Per-Language Terms with Slug Suffixing

One of the more architecturally interesting problems in multilingual WordPress is taxonomy term management: should a French translation of a post use the same Informatica category as the Italian original, or a separate Informatique category? TuttoSemplice Multilingual takes the latter approach: each non-default language gets its own set of term objects with a _{lang-code} suffix appended to the slug (e.g. category Technology with slug technology_fr for French).

This design has three concrete advantages:

  • Archive isolation: /fr/category/technology_fr/ returns only French posts, while /category/technology/ returns only Italian posts, because the main query filter (filter_main_query_by_language()) already partitions posts by the tsm_language internal taxonomy. Separate term slugs reinforce this isolation at the URL level.
  • Redirect prevention: if two terms in different languages share a slug, WordPress canonical redirect logic can incorrectly redirect language-prefixed category URLs to the non-prefixed version of the other language’s term. Slug suffixing eliminates this ambiguity (v2.0.14).
  • Sitemap integrity: distinct term objects map to distinct sitemap entries, preventing term link collisions in Yoast’s sitemap builder.

A parallel manual mapping system (TSM_Mapping_Handler, TSM_Taxonomy_Mapping) allows administrators to define explicit category-to-category and tag-to-tag correspondences between languages — overriding the AI-translated term creation. When a manual mapping is found for a term, it takes absolute priority over AI-generated terms. This is the correct approach for editorially curated classification systems where machine translation of category names is insufficiently precise.
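
The slug suffixing rule and the precedence of manual mappings can be condensed into two small functions. Both names (tsm_sketch_term_slug(), tsm_sketch_resolve_term()) and the shape of the mapping array are hypothetical; the sketch only illustrates the rules described above, with Italian ('it') assumed as the default language.

```php
<?php
declare(strict_types=1);

/** Hypothetical: append _{lang} to a term slug for non-default languages. */
function tsm_sketch_term_slug(string $slug, string $lang, string $default_lang = 'it'): string
{
    $suffix = '_' . $lang;
    if ($lang === $default_lang || substr($slug, -strlen($suffix)) === $suffix) {
        return $slug; // default language, or suffix already present
    }
    return $slug . $suffix;
}

/**
 * Hypothetical: pick the target-language term slug for a source term.
 * A manual mapping, when present, wins over the AI-translated term.
 *
 * @param array<string, array<string, string>> $manual_map source slug => [lang => target slug]
 */
function tsm_sketch_resolve_term(string $source_slug, string $lang, array $manual_map, string $ai_slug): string
{
    return $manual_map[$source_slug][$lang] ?? tsm_sketch_term_slug($ai_slug, $lang);
}
```

With a manual mapping from informatica to informatique_fr for French, the mapped slug is returned regardless of what the model proposed; without a mapping, the AI-proposed slug is used and receives the _fr suffix.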

Query Filtering

Archive Query Partitioning via Internal Taxonomy

The plugin registers a private, non-public taxonomy tsm_language and assigns language codes as term slugs to every post (both original and translated). The main query filter filter_main_query_by_language() modifies WP_Query::tax_query on every frontend main query to enforce language partitioning:

  • Non-default language context (e.g. /fr/): adds a TAX IN [fr] clause, which shows only posts tagged with the French language term;
  • Default language context (e.g. /): adds a TAX NOT IN [fr, de, es, ...] OR TAX NOT EXISTS clause, which excludes all translated posts while including posts with no language tag at all (legacy content).

// Default language: exclude translated content with an OR compound clause
$lang_tax_query = array(
    'relation' => 'OR',
    array(
        'taxonomy' => 'tsm_language',
        'operator' => 'NOT EXISTS',    // include posts with no language tag (legacy)
    ),
    array(
        'taxonomy' => 'tsm_language',
        'field'    => 'slug',
        'terms'    => $other_lang_codes, // all non-default language codes
        'operator' => 'NOT IN',
    ),
);

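WP_Tax_Query turns these clauses into joins and subqueries against wp_term_relationships and wp_term_taxonomy in the generated SQL. That is acceptable for sites with hundreds to low thousands of posts, but worth monitoring on databases with tens of thousands of posts, where the taxonomy lookups become a significant share of query time. Note that wp_term_relationships already has a primary key on (object_id, term_taxonomy_id) in the core schema, so adding that index is not an available mitigation; if query time grows above acceptable thresholds, inspect the generated SQL with EXPLAIN before changing the schema.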

Link Consistency

Internal Link Consistency: Regex Scan and WP-Cron Batch Updater

When a French translation of post A is created, any French translation of post B that contains an internal link to the Italian version of A now has a stale link. The plugin addresses this through a two-pronged mechanism:

On-save filter (wp_insert_post_data, v2.1.4): when a translated post is saved, the post content is scanned with a regex to find all href attributes pointing to the site domain. For each internal URL, url_to_postid() resolves the post ID, then get_translated_post_id() looks up the corresponding translation in the target language. If found, the href is replaced with the translated post’s permalink before the post is written to the database.
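
The on-save rewrite can be sketched without WordPress by injecting the lookup as a callable. This is an illustration under stated assumptions: tsm_sketch_rewrite_links() is a hypothetical name, and the $resolve callback stands in for the url_to_postid() plus get_translated_post_id() lookup described above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: rewrite internal href values in an HTML fragment.
 * $resolve receives an internal URL and returns the translated URL, or null
 * to leave the link untouched.
 */
function tsm_sketch_rewrite_links(string $html, string $site_host, callable $resolve): string
{
    // Matches href="..." or href='...' and captures the quote and the URL.
    $pattern = '/href=(["\'])(.*?)\1/i';

    $result = preg_replace_callback(
        $pattern,
        static function (array $m) use ($site_host, $resolve): string {
            $host = parse_url($m[2], PHP_URL_HOST);
            if (!is_string($host) || strcasecmp($host, $site_host) !== 0) {
                return $m[0]; // external or relative link: keep as-is
            }
            $translated = $resolve($m[2]);
            return $translated === null ? $m[0] : 'href=' . $m[1] . $translated . $m[1];
        },
        $html
    );

    return $result ?? $html; // preg_replace_callback returns null on regex error
}
```

Links to other hosts and links without a known translation pass through unchanged, so the function is safe to run repeatedly on the same content.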

WP-Cron batch updater (v2.1.5, enhanced in v2.2.1): a scheduled event processes translated posts in configurable batches (default 100, configurable between 10–500). The batch updater skips execution entirely if no new translations were created in the past 24 hours — a demand-driven pattern that avoids unnecessary DB queries during periods with no translation activity.

The look-up function get_translated_post_id() implements a two-layer caching strategy: a static PHP array for same-request deduplication, and a wp_cache_set() object cache entry with a 12-hour TTL. The second layer persists across requests only when a persistent object cache backend such as Redis or Memcached is installed; without one, the WordPress object cache lasts for the current request only. This layering suits a function that may be called hundreds of times on a single request (once per internal link in the post content).

// Two-layer cache for translated post ID lookups
static $tsm_translated_id_cache = array(); // per-request static cache
$cache_key = $original_post_id . '_' . $language_code;

// Layer 1: static in-memory (zero DB hits on repeated lookups within same request)
if ( isset( $tsm_translated_id_cache[ $cache_key ] ) ) {
    return $tsm_translated_id_cache[ $cache_key ];
}

// Layer 2: persistent object cache (Redis/Memcached) — 12h TTL
$persist_cache_key = 'tsm_translated_id_' . $cache_key;
$cached_id = wp_cache_get( $persist_cache_key, 'tsm_translations' );
if ( false !== $cached_id ) {
    $tsm_translated_id_cache[ $cache_key ] = (int) $cached_id;
    return (int) $cached_id;
}

// Layer 3: double-JOIN MySQL query (only on cache miss)
$query = $wpdb->prepare(
    "SELECT p.ID FROM {$wpdb->posts} p
     INNER JOIN {$wpdb->postmeta} m1 ON p.ID = m1.post_id
     INNER JOIN {$wpdb->postmeta} m2 ON p.ID = m2.post_id
     WHERE m1.meta_key = '_tsm_original_post_id' AND m1.meta_value = %d
     AND   m2.meta_key = '_tsm_language_code'    AND m2.meta_value = %s
     AND   p.post_status = 'publish'
     LIMIT 1",
    $original_post_id, $language_code
);

Geo-Targeting

Geo-Redirect and Browser Language Detection

TSM_Geo_Redirect implements client language detection and automatic redirect, gated behind a feature flag (tsm_enable_geo_redirect) that defaults to false. When enabled, it fires on template_redirect at priority 1 — before any template is loaded — and checks a cookie (tsm_language_preference) for an explicit user preference. In the absence of a cookie, it falls back to the Accept-Language HTTP header to infer the preferred language and issues a 302 redirect to the corresponding language-prefixed URL.

The cookie-based override persists for 30 days (30 * DAY_IN_SECONDS) and is set via an AJAX endpoint available to both authenticated and anonymous users, allowing a floating language switcher widget to persist the user’s choice without requiring login. This is the correct UX pattern: an initial auto-redirect based on language preference, with user-controlled override that is sticky for a month.
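
The selection logic (cookie first, then Accept-Language) can be sketched as a pure function. The name tsm_sketch_pick_language() and the list of supported codes are hypothetical, and the header parsing is deliberately simplified: it reads the primary two-letter subtag and the q weight and ignores everything else.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: choose a language from a cookie value or, failing that,
 * from the Accept-Language header. Returns null when nothing supported matches.
 *
 * @param string[] $supported two-letter codes, e.g. ['it', 'en', 'fr', 'de']
 */
function tsm_sketch_pick_language(?string $cookie, string $accept_language, array $supported): ?string
{
    if ($cookie !== null && in_array($cookie, $supported, true)) {
        return $cookie; // explicit user preference wins
    }

    $best   = null;
    $best_q = 0.0;
    foreach (explode(',', $accept_language) as $part) {
        $pieces = explode(';', trim($part));
        $code   = strtolower((string) substr(trim($pieces[0]), 0, 2)); // "fr-CH" -> "fr"
        $q      = 1.0; // default weight when no q parameter is given
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q=([0-9.]+)\s*$/i', $param, $m) === 1) {
                $q = (float) $m[1];
            }
        }
        if ($q > $best_q && in_array($code, $supported, true)) {
            $best   = $code;
            $best_q = $q;
        }
    }
    return $best;
}
```

A caller would then skip the redirect when the result is null, equals the default language, or matches the language of the page already being served, and issue the 302 only in the remaining case.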

Additional Subsystems

Supplementary Subsystems: Image Metadata, Schema.org, WPCode

Image Metadata Translation

TSM_Image_Metadata_Handler (v2.3.2) extends translation scope to media attachments: when a post is translated, the plugin also creates translated versions of the image alt text and title stored in WordPress’s attachment post meta. This ensures that translated pages do not carry source-language alt attributes — a crawlability and accessibility issue that multilingual solutions frequently overlook.

Schema.org inLanguage and Wikipedia/Wikidata Mapping

TSM_Schema_About (v2.4.0) injects inLanguage into the Yoast SEO JSON-LD graph and maps content entities to their Wikipedia/Wikidata equivalents in the target language. This enables structured data consumers (Google’s Knowledge Graph, Bing’s entity graph) to correctly associate translated content with the appropriate language-specific entity references rather than defaulting to the source-language Wikipedia article.

WPCode Snippet Multilingualization

TSM_WPCode_Multilingual (v2.3.0) integrates with the WPCode plugin (formerly Insert Headers and Footers) to support language-conditional code snippet injection. This enables serving language-specific structured data snippets, conversion pixels, or chat widget configurations based on the active language — without duplicating snippet logic across multiple WPCode entries.

Performance Analysis

Performance Characteristics Under Production Load

On a PHP-FPM server with pm.max_children = 10 and pm = ondemand (the reference configuration for this deployment), the plugin’s per-request overhead consists of:

| Operation | Frequency | Estimated Cost |
|---|---|---|
| Plugin bootstrap (18 class instantiations) | Every request | ~1–3ms (no DB calls at boot) |
| get_current_language() resolution | Every request, cached statically | ~0.1ms after first call |
| Main query tax_query injection | Every archive/home request | MySQL JOIN cost (varies) |
| filter_locale() with static cache | Hundreds of calls/request | ~0ms after first call (static cache) |
| Internal link regex scan (the_content) | Every singular translated post view | O(n) regex on content length |
| Translated post ID lookups (per link) | N calls per translated post view | ~0ms with warm Redis cache |

The most operationally significant performance constraint is WP-Cron reliability. In a server configuration where OPcache JIT is enabled (opcache.jit=1254, opcache.jit_buffer_size=32M), PHP execution overhead is minimal. However, WP-Cron fires are triggered by incoming HTTP requests: if the site has low organic traffic (common during a multilingual bootstrapping phase), cron jobs may fire with significant delay relative to their scheduled interval. The correct mitigation is to disable WP-Cron from the HTTP request path (define('DISABLE_WP_CRON', true) in wp-config.php) and schedule a real system cron job to call wp-cron.php directly at the desired interval.

System Stability

Stability Patterns: Manual Process Isolation, Cron Pause, and Cache Invalidation

The manual link update process (triggered from the admin UI) correctly pauses the automatic cron job before starting, preventing two concurrent link-update processes from operating on overlapping post sets. It accomplishes this by unsetting the scheduled cron event and setting a flag (tsm_cron_enabled = 'no') that the cron handler respects before executing. On completion of the manual process, the cron is rescheduled with the previously configured frequency — a clean pause-and-resume state machine.

Cache invalidation after bulk link updates covers three caching plugins: WP-Optimize (wpo_cache_flush()), W3 Total Cache (w3tc_flush_all()), and WP Super Cache (wp_cache_clear_cache()). Individual post cache entries are invalidated immediately with clean_post_cache() after each direct $wpdb->update() call. The plugin bypasses wp_update_post() intentionally to avoid triggering save hooks, which would fire the translation-creation hooks and could create an infinite loop.

Conclusions

Engineering Assessment: What Works, What Scales, What Needs Attention

TuttoSemplice Multilingual makes a coherent set of engineering bets: single-install simplicity over multisite complexity, LLM translation over rule-based MT, persistent queue over transient async patterns, and eager component initialization over lazy loading. For a publisher operating at the scale of hundreds to low thousands of posts across 3–6 languages, these bets are well-calibrated.

The areas that demand attention as content volume scales are:

  • Internal link regex on the_content: a PHP regex scan on every singular translated post view is O(n) on content length. With warm cache, the DB cost is near zero, but the regex itself allocates memory proportional to content size. For very long posts (10,000+ words), pre-processing links at save time (already partially implemented via replace_links_on_save) is preferable to filtering on every render.
  • Tax query JOIN depth: adding multiple tsm_language terms per post (one per supported language in some edge cases) would increase the JOIN complexity. Keeping exactly one language term per post — as the plugin is designed — is the correct invariant.
  • WP-Cron as translation worker: the 5-minute cron interval with single-consumer processing caps throughput at 288 jobs per day (one job per tick, 12 ticks per hour, 24 hours), which becomes a ceiling for a high-throughput bulk translation scenario (hundreds of posts per day across several languages). A real-time queue consumer (e.g. a persistent PHP process via Supervisor, or a serverless function triggered by a webhook) would remove this ceiling while keeping the queue schema intact.

Overall, TuttoSemplice Multilingual demonstrates that a well-structured WordPress plugin can implement a production-grade multilingual pipeline (AI translation, persistent async queues, and correct SEO signal propagation) without requiring an external translation-management SaaS, multisite configuration, or modifications to the WordPress core; the Vertex AI API is its only outside service. The architecture is a solid reference point for any engineering team building a multilingual content infrastructure on top of WordPress.
