How to solve CDB Disk Issues
What is the CDB in brief?
The CDB (also known as the Embedded Database) is an in-memory database that speeds up (caches) searches in the censhare system. It is a central part of censhare and is used by all commands and modules as well as by the Java client and the web client. Queries to the database are executed locally on the server. It supports full-text and faceted search with fuzzy mode.
The CDB contains information about every asset in the system (current version and checked-out version only). A modification (e.g. check-out and check-in) results in a write operation to the CDB file(s). If a CDB file reaches a certain size limit, a new file is created. A cleaner mechanism runs periodically to clean up CDB files if necessary: the remaining data is copied to a new CDB file and the old file is deleted. Over time, this keeps the total size of the CDB roughly constant.
The censhare Server uses a cache system for the CDB; the cache is part of the heap. It is configured in the Embedded Database service. A node cache and a larger data cache are used for this. On systems before 2018.3 the sizes were 300 MB for the node cache and 800 MB for the data cache. Since 2018.3 they are 2 GB and 5 GB, which better suits modern installations. The actual sizes need to be adapted to the requirements of the customer.
Why does a CDB (total file size on disk) grow?
When assets change, new data is written to existing or new CDB files. The CDB is an append-only database: new files are written and old files are cleaned. A mass import results in the creation of new CDB files, and the total CDB size grows. A mass change of many existing assets results in temporary CDB growth. It may take a while for the CDB cleaner to reduce the total file size.
Note: Temporary growth is normal for the CDB. The size to which it grows cannot currently be predicted. We therefore recommend a partition large enough to buffer growth to at least 3 times the size of the current production CDB.
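The sizing rule in the note above can be checked with a short script. The following is a minimal sketch, assuming Python is available on the server. The function names are hypothetical and not part of censhare, and the path passed in must be your actual CDB directory.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def recommended_partition_bytes(cdb_bytes, factor=3):
    """Partition size that buffers growth to `factor` times the current CDB size."""
    return cdb_bytes * factor


def partition_is_large_enough(cdb_path, factor=3):
    """Compare the size of the filesystem holding cdb_path with the recommendation."""
    cdb_bytes = directory_size_bytes(cdb_path)
    partition_bytes = shutil.disk_usage(cdb_path).total
    return partition_bytes >= recommended_partition_bytes(cdb_bytes, factor)
```

For example, a 40 GB production CDB would call for a partition of at least 120 GB under this rule.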
Where are CDB Files located?
The CDB files reside on a dedicated censhare volume, usually in ~/work/cdb.
• The CDB volume should be a separate Unix filesystem
• Use fast disk drives (SSD) for the embedded database (CDB)
• We recommend that the files for the embedded database (CDB) are located on a separate mount point
For more details, see the system requirements.
What happens if the CDB filesystem runs out of space?
The following scenarios can occur:
1. The CDB is growing very fast and threatens to run out of space.
• This is no problem as long as free space is still available.
What is recommended:
• Increase the disk space on the volume where the CDB is located.
2. The CDB filesystem has already run out of space.
• Search results are no longer reliable.
• The server log will contain many error messages that can hint at the CDB disk space issue but never clearly identify it.
What is recommended:
• Shut down the server (to close the CDB).
• Increase the disk space.
• Start up the server and check the server logs for CDB error messages. If there are none, keep the server running. If the log is full of error messages, the CDB is probably corrupt. In that case, shut down the server and either run a complete CDB rebuild (which can take hours, depending on size) or rsync the CDB from a remote server or a backup.
It is almost impossible to tell from the server logs why the CDB is growing, because every change to an asset (current version and checked-out version) is written to the CDB. Mass changes to assets may result in temporary growth of the CDB.
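Because the server logs do not clearly identify a full CDB volume, it can help to watch the free space on the filesystem directly. The following is a minimal sketch; the warning and critical thresholds (30% and 10% free) are assumptions chosen for illustration, not censhare recommendations, and the function names are hypothetical.

```python
import shutil


def classify_free_space(total_bytes, free_bytes,
                        warn_fraction=0.30, critical_fraction=0.10):
    """Return 'ok', 'warning' or 'critical' based on the free fraction."""
    free_fraction = free_bytes / total_bytes
    if free_fraction <= critical_fraction:
        return "critical"
    if free_fraction <= warn_fraction:
        return "warning"
    return "ok"


def check_cdb_volume(path):
    """Classify the filesystem that holds the given CDB directory."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.total, usage.free)
```

A call such as `check_cdb_volume(os.path.expanduser("~/work/cdb"))` (with `import os`) could be run from a scheduled job so that disk space can be increased before the filesystem fills up.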
Parameters to check
Cleaner usage threshold (cleaner-usage-threshold)
The default value is 30%; it can be increased up to 45% as needed. Why: to minimize the size of the CDB files, or rather to improve their fill rate.
IMPORTANT: If you increase this parameter, the number of CDB files increases at first, so more space is needed initially. Increasing it is therefore not recommended if you are already running out of space. The change should be followed immediately by a CDB rebuild.
Cleaner files to evict (cleaner-max-files-to-evict-per-run)
The default value is 2; it can be increased carefully, e.g. to 4. Why: it helps the cleaner to clean up faster and thereby reduce the CDB size. Pro: in the long term the CDB file size will decrease.
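The interplay of the two cleaner parameters can be pictured with a toy model. This is only a sketch under stated assumptions: it assumes that a file becomes a cleaning candidate when its share of live data falls below the usage threshold, and that at most the configured number of files is evicted per run. The real cleaner may select files differently, and all names here are hypothetical.

```python
def cleaner_run(files, usage_threshold=0.30, max_files_to_evict=2):
    """Simulate one cleaner run.

    files: list of (size_mb, live_mb) tuples with size_mb > 0.
    Returns (remaining_files, evicted_count).
    """
    # Assumption: files with the lowest share of live data are cleaned first.
    candidates = sorted(
        (f for f in files if f[1] / f[0] < usage_threshold),
        key=lambda f: f[1] / f[0],
    )
    evicted = candidates[:max_files_to_evict]

    remaining = list(files)
    for f in evicted:
        remaining.remove(f)

    # Remaining live data is copied to a new file; the old files are deleted.
    live_mb = sum(f[1] for f in evicted)
    if live_mb > 0:
        remaining.append((live_mb, live_mb))
    return remaining, len(evicted)


files = [(100, 10), (100, 20), (100, 25), (100, 90)]
after_default, evicted_default = cleaner_run(files)
after_four, evicted_four = cleaner_run(files, max_files_to_evict=4)
```

In this model, the default of 2 files per run shrinks the four example files from 400 MB to 230 MB, while allowing 4 files per run reaches 155 MB in a single run.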
File size (file-size)
The default of 100 MB can be increased to 500 MB for larger CDBs in order to have fewer files on disk, which makes the work easier for the cleaner thread. A CDB rebuild is recommended, as otherwise the current CDB size will grow.
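To illustrate why a larger file size means fewer files on disk, here is a rough calculation. The total CDB size of 50,000 MB is an assumed example figure, and the function name is hypothetical.

```python
import math


def cdb_file_count(total_cdb_mb, file_size_mb=100):
    """Approximate number of CDB files needed for a given total size."""
    return math.ceil(total_cdb_mb / file_size_mb)


files_at_default = cdb_file_count(50_000)        # file-size = 100 MB
files_at_500_mb = cdb_file_count(50_000, 500)    # file-size = 500 MB
```

For this example the count drops from 500 files to 100 files.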
Thread Pool Size (thread-pool-size in config.xml)
By default 2 cleaner threads are running, which is the recommended number. It may be increased moderately if needed, for example if many asset changes are constantly taking place, resulting in a growing CDB. Please note that increasing the number of threads reduces server performance. Therefore consider the cleaner-max-files-to-evict-per-run parameter first.
Contra: each additional parallel cleaner thread costs performance.
Relationship between cache size and CDB write actions: if the cache is too small, intermediate states must often be written to the files and the CDB grows (temporarily). If the cache is large, writes are less frequent.
The hit rate of the caches can be checked in the Admin Client with the server action "Embedded Database statistics". The hit rate should be as high as possible (it is often 99%); otherwise search/query performance will decrease.
The same statistics show the actual usage of the CDB cache in two fields:
If the size is very close to the configured cache size, consider increasing the cache (and the heap).
CDB write-to-disk performance can be assessed with the same statistics. The listed node-read/write and data-read/write numbers can be analysed as described in the following article:
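The two checks described above amount to simple arithmetic. The sketch below assumes you have copied hit and miss counters and the used cache size out of the statistics by hand; the parameter names and the 5% margin are hypothetical and do not correspond to actual field names in the statistics output.

```python
def hit_rate(hits, misses):
    """Fraction of cache lookups that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def cache_nearly_full(used_mb, configured_mb, margin=0.05):
    """True if usage is within `margin` of the configured cache size."""
    return used_mb >= configured_mb * (1 - margin)
```

For instance, 990 hits and 10 misses give a hit rate of 99%, and a data cache using 4,900 MB of a configured 5,000 MB would count as nearly full under the assumed margin.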
How to Get and Interpret Embedded Database (cdb) Statistics