PLATFORM-9062 | Allow Cargo to use replica databases
The Cargo extension currently uses a single database connection to
access and manage its database tables, managed entirely by CargoUtils
without going through the usual MW DBAL layers. This ipso facto
precludes the use of a proper primary-replica setup, as this
connection must allow database writes. We want to do just that, so that
we can offload read queries to replicas where possible and avoid putting
undue load on the primary DB in an active-active world. So, first make
it possible for Cargo to utilize replicas at all by factoring out its DB
connection management logic into a new class that can be optionally
configured to use the MW DBAL to obtain DB connections while keeping
backwards compatibility with the existing setup.
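
As a rough illustration of the opt-in, the fragment below sketches how a wiki might point Cargo at an external cluster. It assumes MediaWiki's default load balancer factory, where $wgExternalServers maps a cluster name to a server list shaped like $wgDBservers. The cluster name "cargo", the host names and the credentials are placeholders, not values taken from this change.

```php
// LocalSettings.php (fragment). All names and credentials are placeholders.

// Hypothetical external cluster called "cargo". As with $wgDBservers, the
// first entry is treated as the primary and the rest as replicas.
$wgExternalServers = [
	'cargo' => [
		[
			'host' => 'cargo-primary.example.internal',
			'user' => 'cargo_user',
			'password' => 'change-me',
			'type' => 'mysql',
			'load' => 0,
		],
		[
			'host' => 'cargo-replica.example.internal',
			'user' => 'cargo_user',
			'password' => 'change-me',
			'type' => 'mysql',
			'load' => 1,
		],
	],
];

// Tell Cargo to obtain its connections from that cluster through the DBAL.
$wgCargoDBCluster = 'cargo';
```

Leaving $wgCargoDBCluster at its default of null keeps the existing single-connection behaviour.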

With this done, the main source of primary DB access in Cargo - the
cargo_query parser function, which often gets run in case of parser
cache misses - can actually be switched to use a replica, as it does not
really need to read from the primary. However, one may imagine a case
where it is necessary for a query to read the latest and greatest data
from the primary, e.g. on a page that stores some data via cargo_store
then queries some of that data via cargo_query further down the line.
Sidestep this by leveraging MW's hasOrMadeRecentPrimaryChanges() helper
to always return a primary DB connection irrespective of the caller, if
writes have been detected on the configured Cargo LB.
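
The fallback rule can be sketched in isolation as follows. This is a minimal standalone model, not the extension's code: SketchLoadBalancer, pickConnectionType() and the SKETCH_* constants are hypothetical stand-ins for MediaWiki's load balancer and its DB_REPLICA / DB_PRIMARY constants.

```php
<?php

// Hypothetical stand-ins for MediaWiki's DB_REPLICA / DB_PRIMARY constants.
const SKETCH_DB_REPLICA = -1;
const SKETCH_DB_PRIMARY = -2;

/**
 * Hypothetical stub exposing only the load balancer method the rule depends on.
 */
final class SketchLoadBalancer {
	private bool $recentWrites;

	public function __construct( bool $recentWrites ) {
		$this->recentWrites = $recentWrites;
	}

	public function hasOrMadeRecentPrimaryChanges(): bool {
		return $this->recentWrites;
	}
}

/**
 * Return the connection type to use: the primary whenever recent writes were
 * detected on the load balancer, otherwise whatever the caller asked for.
 */
function pickConnectionType( SketchLoadBalancer $lb, int $requested ): int {
	return $lb->hasOrMadeRecentPrimaryChanges() ? SKETCH_DB_PRIMARY : $requested;
}

// A read-only request keeps the replica; a request that follows a write is
// upgraded to the primary regardless of what the caller requested.
$afterReadOnly = pickConnectionType( new SketchLoadBalancer( false ), SKETCH_DB_REPLICA );
$afterWrite = pickConnectionType( new SketchLoadBalancer( true ), SKETCH_DB_REPLICA );
```

The upgrade is one-directional: a caller that explicitly asks for the primary always gets it, whether or not writes have happened.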

This code has not actually been tested yet - I am submitting this to
gather opinions and suggestions.

Change-Id: I5ed6661f46be257d1ea6b194aaccbbc5b02c406a
mszabo-wikia committed Feb 15, 2024
1 parent 71f36dd commit 5f918be
Showing 7 changed files with 512 additions and 112 deletions.
4 changes: 3 additions & 1 deletion extension.json
@@ -170,6 +170,8 @@
"CargoLinksUpdateHandler": "includes/hooks/CargoLinksUpdateHandler.php",
"CargoSearchMySQL": "includes/search/CargoSearchMySQL.php",
"CargoPageSchemas": "includes/CargoPageSchemas.php",
"CargoConnectionProvider": "includes/CargoConnectionProvider.php",
"CargoServices": "includes/CargoServices.php",
"CargoAppliedFilter": "drilldown/CargoAppliedFilter.php",
"CargoFilter": "drilldown/CargoFilter.php",
"CargoFilterValue": "drilldown/CargoFilterValue.php",
@@ -483,7 +485,7 @@
"CargoDBpassword": null,
"CargoDBprefix": null,
"CargoDBRowFormat": null,
"CargoDBIndex": null,
"CargoDBCluster": null,
"CargoDefaultStringBytes": 300,
"CargoDefaultQueryLimit": 100,
"CargoMaxQueryLimit": 5000,
182 changes: 182 additions & 0 deletions includes/CargoConnectionProvider.php
@@ -0,0 +1,182 @@
<?php

use MediaWiki\Config\ServiceOptions;
use Wikimedia\Rdbms\Database;
use Wikimedia\Rdbms\DatabaseFactory;
use Wikimedia\Rdbms\DatabaseMysqli;
use Wikimedia\Rdbms\IDatabase;
use Wikimedia\Rdbms\ILBFactory;
use Wikimedia\Rdbms\ILoadBalancer;

/**
* Class to manage access to the Cargo database.
*
* By default, this class creates and manages a single connection to the local wiki DB.
* It can be configured to connect to a different DB using the 'wgCargoDB*' settings,
* or to eschew manual connection management in favor of MediaWiki's DBAL by setting the 'CargoDBCluster' option.
*/
class CargoConnectionProvider {
public const CONSTRUCTOR_OPTIONS = [
// MediaWiki DB setup variables.
'DBuser',
'DBpassword',
'DBport',
'DBprefix',
'DBservers',

// Optional Cargo-specific DB setup variables.
'CargoDBserver',
'CargoDBname',
'CargoDBuser',
'CargoDBpassword',
'CargoDBprefix',
'CargoDBtype',

// Optional external cluster name to use for Cargo.
// Supersedes all above configuration if present.
'CargoDBCluster'
];

private ILBFactory $lbFactory;

/**
* Connection factory to use for creating DB connections on MW 1.39 and newer.
* @var DatabaseFactory|null
*/
private ?DatabaseFactory $databaseFactory;

/**
* Configuration options used by this service.
* @var ServiceOptions
*/
private ServiceOptions $serviceOptions;

/**
* The database connection to use for accessing Cargo data, if 'CargoDBCluster' is not set.
* @var IDatabase|null
*/
private ?IDatabase $connection = null;

public function __construct(
ILBFactory $lbFactory,
?DatabaseFactory $databaseFactory,
ServiceOptions $serviceOptions
) {
$serviceOptions->assertRequiredOptions( self::CONSTRUCTOR_OPTIONS );

$this->lbFactory = $lbFactory;
$this->databaseFactory = $databaseFactory;
$this->serviceOptions = $serviceOptions;
}

/**
* Get a database connection for accessing Cargo data.
* @param int $dbType Connection role to request (DB_PRIMARY or DB_REPLICA)
* @return IDatabase
*/
public function getConnection( int $dbType ): IDatabase {
$cluster = $this->serviceOptions->get( 'CargoDBCluster' );

// If a cluster is specified, let MediaWiki's DBAL manage the lifecycle of Cargo-related connections.
if ( $cluster !== null ) {
$lb = $this->lbFactory->getExternalLB( $cluster );

// Fall back to the primary DB if there were recent writes, to ensure that Cargo sees its own changes.
$dbType = $lb->hasOrMadeRecentPrimaryChanges() ? ILoadBalancer::DB_PRIMARY : $dbType;
$conn = $lb->getConnection( $dbType );

// Fandom change: Ensure Cargo DB connections use 4-byte UTF-8 client character set (UGC-4625).
self::setClientCharacterSet( $conn );
return $conn;
}

if ( $this->connection === null ) {
$this->connection = $this->initConnection();

// Fandom change: Ensure Cargo DB connections use 4-byte UTF-8 client character set (UGC-4625).
self::setClientCharacterSet( $this->connection );
}

return $this->connection;
}

/**
* Get the DB type (e.g. 'postgres') of the Cargo database.
* This is mainly useful for code that needs to generate platform-specific SQL.
* @return string
*/
public function getDBType(): string {
return $this->serviceOptions->get( 'CargoDBtype' ) ?? $this->getConnection( DB_REPLICA )->getType();
}

/**
* Create a database connection for Cargo data managed entirely by this class.
* @return IDatabase
*/
private function initConnection(): IDatabase {
$lb = $this->lbFactory->getMainLB();
$dbr = $lb->getConnection( DB_REPLICA );

$dbServers = $this->serviceOptions->get( 'DBservers' );
$dbUser = $this->serviceOptions->get( 'DBuser' );
$dbPassword = $this->serviceOptions->get( 'DBpassword' );

$dbServer = $this->serviceOptions->get( 'CargoDBserver' ) ?? $dbr->getServer();
$dbName = $this->serviceOptions->get( 'CargoDBname' ) ?? $dbr->getDBname();
$dbType = $this->serviceOptions->get( 'CargoDBtype' ) ?? $dbr->getType();

// Server (host), db name, and db type can be retrieved from $dbr via
// public methods, but username and password cannot. If these values are
// not set for Cargo, get them from either $wgDBservers or $wgDBuser and
// $wgDBpassword, depending on whether or not there are multiple DB servers.
$dbUsername = $this->serviceOptions->get( 'CargoDBuser' ) ?? $dbServers[0]['user'] ?? $dbUser;
$dbPassword = $this->serviceOptions->get( 'CargoDBpassword' ) ?? $dbServers[0]['password'] ?? $dbPassword;
$dbTablePrefix = $this->serviceOptions->get( 'CargoDBprefix' )
?? $this->serviceOptions->get( 'DBprefix' ) . 'cargo__';

$params = [
'host' => $dbServer,
'user' => $dbUsername,
'password' => $dbPassword,
'dbname' => $dbName,
'tablePrefix' => $dbTablePrefix,
];

if ( $dbType === 'sqlite' ) {
/** @var \Wikimedia\Rdbms\DatabaseSqlite $dbr */
$params['dbFilePath'] = $dbr->getDbFilePath();
} elseif ( $dbType === 'postgres' ) {
// @TODO - a $wgCargoDBport variable is still needed.
$params['port'] = $this->serviceOptions->get( 'DBport' );
}

if ( $this->databaseFactory !== null ) {
return $this->databaseFactory->create( $dbType, $params );
}

return Database::factory( $dbType, $params );
}

/**
* Set the client character set of a database connection handle to 4-byte UTF-8.
* This is necessary because Cargo utilizes functions such as REGEXP_LIKE(),
* which fail if the client character set is "binary".
*
* @param IDatabase $dbw Database connection handle.
*/
private static function setClientCharacterSet( IDatabase $dbw ): void {
if ( $dbw instanceof DatabaseMysqli ) {
// Force open the database connection so that we can obtain the underlying native connection handle.
$dbw->ping();

$ref = new ReflectionMethod( $dbw, 'getBindingHandle' );
$ref->setAccessible( true );

/** @var mysqli $mysqli */
$mysqli = $ref->invoke( $dbw );
if ( $mysqli->character_set_name() !== 'utf8mb4' ) {
$mysqli->set_charset( 'utf8mb4' );
}
}
}

}
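
The credential and prefix fallbacks in initConnection() above rest on chained null-coalescing. The standalone sketch below isolates that behaviour. resolveCargoUser() and resolveCargoPrefix() are hypothetical helpers written for illustration only. They assume, as in a default MediaWiki setup, that $wgDBservers is false when no server list is configured.

```php
<?php

/**
 * Hypothetical helper mirroring the username fallback order:
 * Cargo-specific setting, then the first $wgDBservers entry, then $wgDBuser.
 *
 * @param string|null $cargoUser Value of $wgCargoDBuser, or null if unset
 * @param array|false $dbServers Value of $wgDBservers (false when unused)
 * @param string $dbUser Value of $wgDBuser
 */
function resolveCargoUser( ?string $cargoUser, $dbServers, string $dbUser ): string {
	// "??" suppresses the warning for indexing into false or a missing key,
	// so no explicit is_array() / isset() checks are needed here.
	return $cargoUser ?? $dbServers[0]['user'] ?? $dbUser;
}

/**
 * Hypothetical helper mirroring the table prefix fallback.
 * "." binds tighter than "??", so the suffix is only appended to the
 * MediaWiki prefix, never to an explicit Cargo prefix.
 */
function resolveCargoPrefix( ?string $cargoPrefix, string $dbPrefix ): string {
	return $cargoPrefix ?? $dbPrefix . 'cargo__';
}

$explicitUser = resolveCargoUser( 'cargo_user', false, 'wiki_user' );
$serverListUser = resolveCargoUser( null, [ [ 'user' => 'lb_user' ] ], 'wiki_user' );
$defaultUser = resolveCargoUser( null, false, 'wiki_user' );
$defaultPrefix = resolveCargoPrefix( null, 'mw_' );
```

The password lookup in initConnection() follows the same order as the username lookup, using the 'password' key instead.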
3 changes: 1 addition & 2 deletions includes/CargoSQLQuery.php
@@ -36,7 +36,7 @@ class CargoSQLQuery {
public $mDateFieldPairs = [];

public function __construct() {
$this->mCargoDB = CargoUtils::getDB();
$this->mCargoDB = CargoServices::getCargoConnectionProvider()->getConnection( DB_REPLICA );
}

/**
@@ -56,7 +56,6 @@ public static function newFromValues( $tablesStr, $fieldsStr, $whereStr, $joinOn
$havingStr, $orderByStr, $limitStr, $offsetStr, $allowFieldEscaping );

$sqlQuery = new CargoSQLQuery();
$sqlQuery->mCargoDB = CargoUtils::getDB();
$sqlQuery->mTablesStr = $tablesStr;
$sqlQuery->setAliasedTableNames();
$sqlQuery->mFieldsStr = $fieldsStr;
12 changes: 12 additions & 0 deletions includes/CargoServices.php
@@ -0,0 +1,12 @@
<?php

use MediaWiki\MediaWikiServices;

/**
* Typed service locator class for Cargo service classes.
*/
class CargoServices {
public static function getCargoConnectionProvider(): CargoConnectionProvider {
return MediaWikiServices::getInstance()->getService( 'CargoConnectionProvider' );
}
}
116 changes: 7 additions & 109 deletions includes/CargoUtils.php
@@ -9,117 +9,17 @@
use MediaWiki\Linker\LinkRenderer;
use MediaWiki\Linker\LinkTarget;
use MediaWiki\MediaWikiServices;
use Wikimedia\Rdbms\DatabaseMysqli;
use Wikimedia\Rdbms\IDatabase;

class CargoUtils {

private static $CargoDB = null;

/**
* @return Database or DatabaseBase
*/
public static function getDB() {
if ( self::$CargoDB != null && self::$CargoDB->isOpen() ) {
return self::$CargoDB;
}

global $wgDBuser, $wgDBpassword, $wgDBprefix, $wgDBservers;
global $wgCargoDBserver, $wgCargoDBname, $wgCargoDBuser, $wgCargoDBpassword,
$wgCargoDBprefix, $wgCargoDBtype, $wgCargoDBIndex;

$services = MediaWikiServices::getInstance();
$lb = $services->getDBLoadBalancer();
$dbIndex = $wgCargoDBIndex !== null ?: DB_PRIMARY;
$dbr = $lb->getConnectionRef( $dbIndex );

$server = $dbr->getServer();
$name = $dbr->getDBname();
$type = $dbr->getType();

// We need $wgCargoDBtype for other functions.
if ( $wgCargoDBtype === null ) {
$wgCargoDBtype = $type;
}
$dbServer = $wgCargoDBserver === null ? $server : $wgCargoDBserver;
$dbName = $wgCargoDBname === null ? $name : $wgCargoDBname;

// Server (host), db name, and db type can be retrieved from $dbw via
// public methods, but username and password cannot. If these values are
// not set for Cargo, get them from either $wgDBservers or wgDBuser and
// $wgDBpassword, depending on whether or not there are multiple DB servers.
if ( $wgCargoDBuser !== null ) {
$dbUsername = $wgCargoDBuser;
} elseif ( is_array( $wgDBservers ) && isset( $wgDBservers[0] ) ) {
$dbUsername = $wgDBservers[0]['user'];
} else {
$dbUsername = $wgDBuser;
}
if ( $wgCargoDBpassword !== null ) {
$dbPassword = $wgCargoDBpassword;
} elseif ( is_array( $wgDBservers ) && isset( $wgDBservers[0] ) ) {
$dbPassword = $wgDBservers[0]['password'];
} else {
$dbPassword = $wgDBpassword;
}

if ( $wgCargoDBprefix !== null ) {
$dbTablePrefix = $wgCargoDBprefix;
} else {
$dbTablePrefix = $wgDBprefix . 'cargo__';
}

$params = [
'host' => $dbServer,
'user' => $dbUsername,
'password' => $dbPassword,
'dbname' => $dbName,
'tablePrefix' => $dbTablePrefix,
// MySQL >= 8.0.22 rejects using binary strings in regular expression functions
// such as REGEXP_LIKE(), heavily used across Cargo, so force UTF-8 client charset here.
'utf8Mode' => true,
];

if ( $type === 'sqlite' ) {
$params['dbFilePath'] = $dbr->getDbFilePath();
} elseif ( $type === 'postgres' ) {
global $wgDBport;
// @TODO - a $wgCargoDBport variable is still needed.
$params['port'] = $wgDBport;
}

if ( method_exists( $services, 'getDatabaseFactory' ) ) {
// MW 1.39+
self::$CargoDB = $services->getDatabaseFactory()->create( $wgCargoDBtype, $params );
} else {
self::$CargoDB = Database::factory( $wgCargoDBtype, $params );
}

// Fandom change: Ensure Cargo DB connections use 4-byte UTF-8 client character set (UGC-4625).
self::setClientCharacterSet( self::$CargoDB );

return self::$CargoDB;
}

/**
* Set the client character set of a database connection handle to 4-byte UTF-8.
* This is necessary because Cargo utilizes functions such as REGEXP_LIKE(),
* which fail if the client character set is "binary".
*
* @param IDatabase $dbw Database connection handle.
* Get the Cargo database connection.
* @deprecated Use {@link CargoConnectionProvider::getConnection()} directly instead.
* @param int $dbType
* @return \Wikimedia\Rdbms\IDatabase
*/
private static function setClientCharacterSet( IDatabase $dbw ): void {
if ( $dbw instanceof DatabaseMysqli ) {
// Force open the database connection so that we can obtain the underlying native connection handle.
$dbw->ping();

$ref = new ReflectionMethod( $dbw, 'getBindingHandle' );
$ref->setAccessible( true );

/** @var mysqli $mysqli */
$mysqli = $ref->invoke( $dbw );
$mysqli->set_charset( 'utf8mb4' );
}
public static function getDB( int $dbType = DB_PRIMARY ) {
return CargoServices::getCargoConnectionProvider()->getConnection( $dbType );
}

/**
@@ -500,13 +400,11 @@ public static function isSQLStringLiteral( $string ) {
}

public static function getDateFunctions( $dateDBField ) {
global $wgCargoDBtype;

// Unfortunately, date handling in general - and date extraction
// specifically - is done differently in almost every DB
// system. If support was ever added for SQLite,
// that would require special handling as well.
if ( $wgCargoDBtype == 'postgres' ) {
if ( CargoServices::getCargoConnectionProvider()->getDBType() == 'postgres' ) {
$yearValue = "EXTRACT(YEAR FROM $dateDBField)";
$monthValue = "EXTRACT(MONTH FROM $dateDBField)";
$dayValue = "EXTRACT(DAY FROM $dateDBField)";
16 changes: 16 additions & 0 deletions includes/ServiceWiring.php
@@ -0,0 +1,16 @@
<?php

use MediaWiki\Config\ServiceOptions;
use MediaWiki\MediaWikiServices;

return [
'CargoConnectionProvider' => static function ( MediaWikiServices $services ): CargoConnectionProvider {
// DatabaseFactory only exists on MW 1.39 and newer.
$databaseFactory = $services->hasService( 'DatabaseFactory' ) ? $services->getDatabaseFactory() : null;
return new CargoConnectionProvider(
$services->getDBLoadBalancerFactory(),
$databaseFactory,
new ServiceOptions( CargoConnectionProvider::CONSTRUCTOR_OPTIONS, $services->getMainConfig() )
);
}
];