Repligard configuration

Repligard uses XML configuration files to meet two requirements: flexibility, and minimizing the amount of code needed to process external files. Using an existing XML parser was easier than implementing a custom configuration format.

In its current version, Midgard supports only 8-bit encoded information, so Unicode can't be used in storage. This limitation stems both from Midgard's design and from the current capabilities of MySQL. Repligard itself is Unicode-safe, however, so Unicode can be used as soon as Midgard and the database support it. The XML parser already supports Unicode.

Since XML is flexible, information can be stored in whichever encoding is desired. Repligard handles translating XML files between Latin-1 and other encodings, so users don't need to deal with these matters themselves. The only requirement is ICONV(3) support on your system, which is standard on GNU systems and can easily be added to other systems with the libiconv package. The ICONV(3) interface is part of the X/Open Portability Guide version 2 standard and is supported, to varying degrees, by most commercial and open source Unix-like systems.

The encoding needs to be specified in Repligard's configuration file using an encoding name understood by the ICONV library. (ICONV generally provides a command-line tool named 'iconv' which can be used to query the names of supported encodings. On GNU systems the corresponding call is 'iconv -l'.)

The configuration file consists of several elements. The first one is database, which describes the database to use: the schema file, the database's location, and the database administrator account, similar to the MidgardDatabase directive in the Apache configuration. Example:


<database
   schema="/path/to/schema.xml"
   database="midgard"
   username="midgard"
   password="midgard"
   encoding="ISO-8859-1"
   blobdir="/var/www/blobs"
/>

All information from the imported XML file will be translated to the specified encoding before it is entered into the database.
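The translation step can be illustrated with a minimal Python sketch. It reads the encoding attribute from the database element shown above and encodes a piece of Unicode text into that encoding. This uses Python's own codecs rather than the ICONV library, and the function names storage_encoding and to_storage are hypothetical; how Repligard itself reacts to characters that have no representation in the target encoding is not described here, so the sketch simply lets Python raise an error in that case.

```python
import xml.etree.ElementTree as ET

# The <database> element from the example above.
DATABASE_XML = """<database
   schema="/path/to/schema.xml"
   database="midgard"
   username="midgard"
   password="midgard"
   encoding="ISO-8859-1"
   blobdir="/var/www/blobs"
/>"""


def storage_encoding(config_xml):
    """Return the value of the encoding attribute of a <database> element."""
    return ET.fromstring(config_xml).get("encoding")


def to_storage(text, encoding):
    """Encode Unicode text into the storage encoding.

    Raises UnicodeEncodeError if a character cannot be represented
    (an assumption of this sketch, not documented Repligard behaviour).
    """
    return text.encode(encoding)


encoding = storage_encoding(DATABASE_XML)
stored = to_storage("K\u00e4se", encoding)
print(encoding)  # ISO-8859-1
print(stored)    # b'K\xe4se'
```

With an 8-bit target encoding such as ISO-8859-1, every character that survives the translation occupies exactly one byte, which is what the current storage limitation requires.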

Having stated which database to use, we need to specify which Midgard administrative account will be used for accessing it. Example:


<login
   username="admin"
   password="password"
/>

If you want to log in to a sitegroup, you need to specify which sitegroup to use, since Repligard can't determine it from host information. The format is the same as on a regular admin site: username+sitegroup. All modifiers described in the Midgard manual can be used here. Repligard performs exports and imports with the privileges of the specified user.

Because of this, administrators of sitegroups in a co-hosted setup can maintain their own replication configurations. In such a setup, however, their configuration files should not contain database administrator accounts. For this purpose there is an include directive, which pulls in global configuration files. Example:


<include name="/path/to/global/config.xml" />

Repligard supports two general operations, export and import. Import is the simpler one: when importing, the user doesn't need to specify anything in the configuration file besides the database and login information.

For export, users need to specify which objects to export. This is done with the replicate element. The simplest case is replicating everything in the database. Example:


<replicate all="yes" />

Another way is to list individual resources using resource elements within the replicate element. Example:


<replicate>
   <resource id="1" type="article" />
   <resource guid="3e6729abe2891fca92fad03" />
</replicate>

Make sure that the IDs specified here are the local IDs of the objects, as object IDs vary between Midgard installations.

Another possibility is to specify the object using its GUID. Since GUIDs don't change between databases, the same resource string can be used on many databases. The GUID is a 32-character string, currently produced by an algorithm designed to ensure the uniqueness of a resource in a 128-bit space. This gives us about 3.4*10^38 different objects (the exact value is 2^128 = 340282366920938463463374607431768211456).
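The size of the GUID space can be checked with a few lines of Python. The sketch assumes that each of the 32 characters is a hexadecimal digit carrying 4 bits; the text above only states the length and the 128-bit space, so the digit alphabet is an assumption.

```python
GUID_LENGTH = 32          # characters in a GUID, as stated above
BITS_PER_HEX_DIGIT = 4    # assumption: GUID characters are hexadecimal digits

guid_bits = GUID_LENGTH * BITS_PER_HEX_DIGIT
guid_space = 2 ** guid_bits

print(guid_bits)            # 128
print(guid_space)           # 340282366920938463463374607431768211456
print(f"{guid_space:.2e}")  # 3.40e+38
```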

As many resources can be specified as needed. However, if a resource has dependencies, Repligard will replicate the whole dependency tree under it. For example, topics form trees, so if a topic is specified, its subtopics and articles will be exported as well.
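The effect of dependency expansion can be sketched as a depth-first walk over a topic tree. The Topic class, the collect_export function, and the sample names below are purely illustrative and not part of Repligard; the order in which Repligard actually emits objects is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a Midgard topic."""
    name: str
    articles: list = field(default_factory=list)
    subtopics: list = field(default_factory=list)


def collect_export(topic):
    """Return the topic, its articles, and everything under its subtopics."""
    exported = [("topic", topic.name)]
    exported += [("article", article) for article in topic.articles]
    for sub in topic.subtopics:
        exported += collect_export(sub)  # recurse into the subtree
    return exported


news = Topic("news", ["welcome"], [Topic("archive", ["old-story"])])
for kind, name in collect_export(news):
    print(kind, name)
```

Naming only the top-level topic is enough here: the subtopic and both articles are picked up by the walk, which mirrors why a single resource entry can result in a large export.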

Repligard can export either only the changes since the last replication, or everything. This is chosen with Repligard's command-line arguments. If the option '-a' is used, Repligard will export the complete database. Otherwise, only changes will be replicated.

The configuration file also specifies the location of the BLOB directory (the blobdir attribute of the database element). Repligard embeds BLOB files in the replication XML file, so they do not need to be transferred separately. Repligard uses streams for handling objects, so it doesn't matter whether an object is 5 KB or 5 GB long.
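The idea of streaming a BLOB into a text document during export can be sketched as follows. The sketch assumes base64 as the text encoding, which is an assumption made for illustration since the encoding Repligard uses is not stated here; embed_blob and CHUNK_SIZE are hypothetical names. Only one chunk is held in memory at a time, so memory use does not depend on the size of the BLOB.

```python
import base64
import io

CHUNK_SIZE = 3 * 1024  # bytes read per step (hypothetical value)


def embed_blob(source, sink, chunk_size=CHUNK_SIZE):
    """Copy a binary stream to a text stream as base64, chunk by chunk.

    Base64 is an assumption of this sketch. Returns the number of bytes read.
    """
    total = 0
    pending = b""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        pending += chunk
        # Encode only a multiple of 3 bytes so no padding appears mid-stream.
        usable = len(pending) - len(pending) % 3
        sink.write(base64.b64encode(pending[:usable]).decode("ascii"))
        pending = pending[usable:]
    # Encode the remaining 0 to 2 bytes; padding is added only here.
    sink.write(base64.b64encode(pending).decode("ascii"))
    return total


data = bytes(range(256)) * 50
out = io.StringIO()
copied = embed_blob(io.BytesIO(data), out, chunk_size=1000)
print(copied)  # 12800
```

In real use the source would be a file opened in binary mode from the BLOB directory and the sink would be the replication file being written.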

Currently, BLOBs are not streamed in the import phase, so a BLOB that doesn't fit into system memory can't be imported. However, since only one object is processed at a time, BLOBs can still be quite big, for example 50 MB or even more, provided you have enough memory to hold one.