Ceph – Some Useful Tips


Looking back at some of the Ceph deployments CYBOSOL has done recently, here are some tips we learned (some of them were really costly) while deploying Ceph for a couple of our OpenStack installations.

Never combine Ceph monitor nodes with the OpenStack controller nodes. Although it might seem very tempting to put monitors on the OpenStack controllers, you will soon realise that OpenStack scheduling and operations can greatly affect the overall performance of the Ceph cluster.

Always keep the Ceph cluster on a separate, dedicated network with its own 10G or 40G Ethernet or InfiniBand switches.

Make sure you have enough network capacity on each node, both the OSDs and the clients. We would recommend anything above 10Gbps.

Remember that the more OSDs you have per node, the less network bandwidth is available to each of them, which can affect overall performance.

Keep the replication factor at 3 or higher for better resilience against data loss when a node or OSD fails.

Always keep spare capacity equal to at least two of the largest OSDs.

Never fill the OSDs to their maximum capacity. Once an OSD reaches 100%, you will most probably have to mark it as down, unless you can expand the disk to gain some more capacity.
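The bandwidth and capacity tips above come down to simple arithmetic. Here is a minimal Python sketch of it. The function names are ours and purely illustrative, and it assumes that "spare capacity" means free raw space (before replication) and that every OSD on a node shares one network link equally.

```python
def per_osd_bandwidth_gbps(link_gbps, osds_per_node):
    """Rough share of the node's network link available to each OSD."""
    return link_gbps / osds_per_node


def spare_reserve_tb(osd_sizes_tb, spare_osds=2):
    """Raw space to keep free: the combined size of the largest OSDs."""
    return sum(sorted(osd_sizes_tb, reverse=True)[:spare_osds])


def has_spare_capacity(osd_sizes_tb, used_raw_tb, spare_osds=2):
    """True if the free raw space still covers the reserve."""
    free_tb = sum(osd_sizes_tb) - used_raw_tb
    return free_tb >= spare_reserve_tb(osd_sizes_tb, spare_osds)


def usable_capacity_tb(osd_sizes_tb, replicas=3, spare_osds=2):
    """Data you can store once the reserve and the replicas are accounted for."""
    raw_tb = sum(osd_sizes_tb) - spare_reserve_tb(osd_sizes_tb, spare_osds)
    return raw_tb / replicas
```

For example, twelve 4 TB OSDs give 48 TB raw. With two OSDs' worth of space held back and a replication factor of 3, that is 40 / 3, or a little over 13 TB of actual data.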

Related to the point above, never try to remove PGs (Placement Groups) manually from the OSD filesystem, and do not try to move them to a different filesystem hoping to bring them back once the cluster is online again. Ceph keeps special filesystem attributes (extended attributes) on each file stored on the OSD disk, and these are not preserved by default by copy (cp) or by the other POSIX file archiving utilities shipped with Linux. The newer Ceph release, Hammer, has utilities to recover failed PGs.

Use a separate disk, preferably an SSD, to keep OSD journals.
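To see why copying PG files around is risky, you can compare the extended attributes of an original file and its copy. The sketch below is only an illustration: the helper name is ours, it relies on Python's os.listxattr (available on Linux only), and it compares attribute names, not their values.

```python
import os


def missing_xattrs(src, dst):
    """Names of extended attributes set on src but absent on dst (Linux only)."""
    return sorted(set(os.listxattr(src)) - set(os.listxattr(dst)))
```

If this returns a non-empty list for a file copied out of an OSD data directory, the copy has lost metadata and should not be put back into the OSD.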

Never run OSDs on the OpenStack Compute (Nova) nodes, as OSDs are sometimes known to consume a good amount of CPU time, especially while the cluster is rebuilding after OSD failures. As stated before, it is always better to keep the Ceph cluster on a separate network.

If you are shutting down a server for regular maintenance, use the command ceph osd set noout to keep the cluster running without the bandwidth cost of rebalancing to re-replicate the “failed” OSDs. Once the server is back, clear the flag with ceph osd unset noout.

Some useful commands for administering Ceph can be found
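The maintenance tip above is easy to get half right: the noout flag gets set and then forgotten. Here is a small Python sketch that wraps the maintenance work so the flag is always cleared afterwards. The helper names are ours, and it assumes the ceph CLI is on the PATH with admin credentials; only the two ceph osd set/unset noout commands are real Ceph commands.

```python
import subprocess
from contextlib import contextmanager


def run_ceph(args):
    """Run a ceph CLI command and raise CalledProcessError if it fails."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def noout_maintenance(runner=run_ceph):
    """Set the noout flag for the duration of the block, then unset it."""
    runner(["osd", "set", "noout"])
    try:
        yield
    finally:
        # Always clear the flag, even if the maintenance step fails,
        # so the cluster can rebalance normally again.
        runner(["osd", "unset", "noout"])
```

Usage would look like "with noout_maintenance(): ..." around whatever stops and restarts the server. Passing a different runner (for example one that only prints the commands) gives a dry run.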