When creating a container, a partition key needs to be chosen. This is a property of the document whose value Azure Cosmos DB uses to group documents into logical partitions. Each server in the cluster is a physical partition, which can host any number of logical partitions. Each logical partition, in turn, stores the documents that share the same partition key value.
All the documents in a logical partition are stored on the same physical partition. A logical partition is never spread across multiple physical partitions/servers in the cluster. So the general rule of thumb is to choose a partition key whose value will be known for most typical queries. When the partition key is known, Azure Cosmos DB can route the query directly to the physical partition that holds the matching logical partition. This is called a single-partition query.
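To make the routing idea concrete, here is a minimal sketch in Python. It is only a toy model: the `customerId` property, the four physical partitions, and the MD5-plus-modulo mapping are all assumptions for illustration. The real service uses its own hashing and range assignment.

```python
import hashlib

PHYSICAL_PARTITIONS = 4  # assumed cluster size, for illustration only


def physical_partition_for(partition_key_value: str) -> int:
    """Toy routing: hash the key value and map it to one physical partition."""
    digest = hashlib.md5(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


# Hypothetical documents, partitioned by the "customerId" property.
documents = [
    {"id": "1", "customerId": "alice", "total": 30},
    {"id": "2", "customerId": "bob", "total": 12},
    {"id": "3", "customerId": "alice", "total": 7},
]

# Store each document on the physical partition chosen by its partition key.
storage = {p: [] for p in range(PHYSICAL_PARTITIONS)}
for doc in documents:
    storage[physical_partition_for(doc["customerId"])].append(doc)


def single_partition_query(customer_id: str) -> list:
    """The key is known, so only one physical partition is visited."""
    target = physical_partition_for(customer_id)
    return [d for d in storage[target] if d["customerId"] == customer_id]


alice_orders = single_partition_query("alice")
```

Because the mapping depends only on the key value, both of Alice's documents land on the same physical partition, and the query never has to look anywhere else.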
If the partition key is not known, it is a cross-partition query, also called a fan-out query. In such a scenario, Azure Cosmos DB needs to visit every physical partition and aggregate the results for the query. This is fine occasionally but should not be the case for the majority of queries.
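A fan-out can be pictured with a small Python sketch. The layout below is hypothetical: three physical partitions, documents partitioned by `customerId`, and a query that filters on `status`, which is not the partition key.

```python
# Hypothetical layout: physical partition id -> documents stored there.
# The container is partitioned by customerId, so each customer's
# documents sit together on one physical partition.
physical_partitions = {
    0: [
        {"id": "1", "customerId": "alice", "status": "shipped"},
        {"id": "3", "customerId": "alice", "status": "pending"},
    ],
    1: [{"id": "2", "customerId": "bob", "status": "shipped"}],
    2: [{"id": "4", "customerId": "carol", "status": "pending"}],
}


def fan_out_query(status: str):
    """Filter on a non-key property: every physical partition is checked."""
    results = []
    visited = 0
    for docs in physical_partitions.values():
        visited += 1
        # Each partition contributes its matches; results are merged at the end.
        results.extend(d for d in docs if d["status"] == status)
    return results, visited


shipped, partitions_visited = fan_out_query("shipped")
```

Here the query touches all three physical partitions even though only two of them hold matching documents, which is why the cost of a fan-out grows with the number of physical partitions.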
The partition key chosen should result in a uniform distribution of storage and throughput. A logical partition has a storage limit of 20 GB, so you should avoid a key where some logical partitions grow large while others stay small. Similarly, from a throughput perspective, you don’t want some logical partitions being heavily used while others sit idle most of the time. That results in “hot partition” scenarios, which degrade performance.
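One rough way to compare candidate keys before creating the container is to total the expected storage per key value and see how even it is. The sketch below assumes made-up data where each row stands for a batch of documents with an estimated size. The names `tenantId`, `userId` and `sizeGb` are hypothetical, and the same counting could be applied to request volumes instead of storage.

```python
from collections import Counter

LOGICAL_PARTITION_LIMIT_GB = 20  # storage limit per logical partition


def storage_by_key(rows, key):
    """Total estimated storage (GB) for each value of a candidate key."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row["sizeGb"]
    return dict(totals)


def over_limit(totals):
    """Key values whose logical partition would exceed the limit."""
    return [k for k, gb in totals.items() if gb > LOGICAL_PARTITION_LIMIT_GB]


def skew(totals):
    """Largest partition divided by the average; 1.0 means perfectly even."""
    average = sum(totals.values()) / len(totals)
    return max(totals.values()) / average


# Made-up estimates: each row is a batch of documents, not a single document.
rows = [
    {"tenantId": "contoso", "userId": "u1", "sizeGb": 8},
    {"tenantId": "contoso", "userId": "u2", "sizeGb": 8},
    {"tenantId": "contoso", "userId": "u3", "sizeGb": 8},
    {"tenantId": "fabrikam", "userId": "u4", "sizeGb": 8},
]

by_tenant = storage_by_key(rows, "tenantId")
by_user = storage_by_key(rows, "userId")
```

With this sample data, `tenantId` puts 24 GB into one logical partition, which is over the limit, while `userId` spreads the same data evenly at 8 GB each.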
Azure CosmosDB – Understanding partitioning
18 Wednesday Nov 2020